Introduction to WebSockets :: Part - 2
Bhaskar S | 02/14/2014
Introduction
In Part-1 we got our hands dirty with a practical example in WebSockets.
But how do WebSockets really work behind the scenes?
In this part, we look under the hood to understand the mechanics behind WebSockets.
WebSockets - Under the Hood
Launch Firefox and enter the following URL:
The browser should look like the one shown in Figure-1 below:
When we click on the Start Monitor button, an HTTP GET request is issued as shown in Figure-2 below:
This is the opening handshake request message from the client to the server.
The Request URI /polarsparc/SimpleMonitor after GET is used to identify the WebSocket server endpoint.
The Host request header names the host the client is connecting to, so that both the client and the server can verify that they agree on which host is in use.
The Sec-WebSocket-Version request header allows the client to indicate the WebSocket protocol version it plans to communicate with. If that version is incompatible with the versions supported by the server, the server terminates the handshake with an HTTP error response.
The Origin request header indicates the origin of the client that is initiating the WebSocket connection. The server can use this information to determine whether to accept the incoming client connection. If the server does not wish to accept this client connection, it will terminate the handshake.
The Sec-WebSocket-Key request header is a BASE64 encoded random value (a nonce) generated by the client. The server uses it to compute a corresponding response header that indicates acceptance of the client WebSocket connection.
The Connection request header from the client must include the Upgrade string.
The Upgrade request header from the client must contain the protocol string websocket. Upon receiving this header from a client, the server will attempt to switch to the requested protocol, which is WebSocket in this case.
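The checks described above can be sketched from the server's point of view. The following is a minimal illustration, not the code of any particular WebSocket server: the helper names `parse_request` and `is_websocket_upgrade` are hypothetical, the sample request is hand-written (the host `localhost:8080`, the Origin value and the key are assumptions, not values captured from Figure-2), and the supported version is assumed to be 13, the version defined by RFC 6455.

```python
import base64

# Hand-written sample request (assumed values, not a capture from Figure-2).
SAMPLE_REQUEST = (
    "GET /polarsparc/SimpleMonitor HTTP/1.1\r\n"
    "Host: localhost:8080\r\n"
    "Origin: http://localhost:8080\r\n"
    "Sec-WebSocket-Version: 13\r\n"
    "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==\r\n"
    "Connection: keep-alive, Upgrade\r\n"
    "Upgrade: websocket\r\n"
    "\r\n"
)

def parse_request(raw):
    """Split a raw HTTP request into (method, uri, headers)."""
    head = raw.split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")
    method, uri, _version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return method, uri, headers

def is_websocket_upgrade(method, headers, supported_version="13"):
    """Return True if the request looks like a valid opening handshake."""
    if method != "GET" or "host" not in headers:
        return False
    # Connection may carry several comma-separated tokens; one must be "Upgrade".
    tokens = [t.strip().lower() for t in headers.get("connection", "").split(",")]
    if "upgrade" not in tokens:
        return False
    if headers.get("upgrade", "").lower() != "websocket":
        return False
    if headers.get("sec-websocket-version") != supported_version:
        return False
    # The key must be valid BASE64 that decodes to a 16-byte nonce.
    try:
        key = base64.b64decode(headers.get("sec-websocket-key", ""), validate=True)
    except ValueError:
        return False
    return len(key) == 16

method, uri, headers = parse_request(SAMPLE_REQUEST)
print(uri, is_websocket_upgrade(method, headers))
```

A real server would also inspect the Origin header at this point and reject origins it does not trust; that policy is application specific, so it is left out of the sketch.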
In order to successfully complete the client WebSocket connection request, the server will respond with a message as shown in Figure-3 below:
This is the opening handshake response message from the server to the client.
The server must send an HTTP status line of HTTP/1.1 101 Switching Protocols to indicate the server is switching protocols from HTTP to WebSocket.
The Upgrade response header from the server must contain the protocol string websocket. Upon receiving this header from the server, the client will attempt to switch to the requested protocol, which is WebSocket in this case.
The Connection response header from the server must include the Upgrade string.
The Sec-WebSocket-Accept response header is a BASE64 encoded value. It is computed by taking the value from the client request header Sec-WebSocket-Key, appending the fixed Globally Unique Identifier (GUID) string "258EAFA5-E914-47DA-95CA-C5AB0DC85B11" to it, computing the SHA-1 hash of the concatenated value, and BASE64 encoding that hash. The presence of this response header indicates that the server has accepted the client WebSocket connection. The client will validate this value upon receiving the response from the server.
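The Sec-WebSocket-Accept computation is short enough to sketch directly. The function names `compute_accept` and `build_handshake_response` below are hypothetical, and the sample key is the one used as an example in RFC 6455, not the key from our capture.

```python
import base64
import hashlib

WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def compute_accept(sec_websocket_key):
    """BASE64(SHA-1(key + GUID)), as described above."""
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")

def build_handshake_response(sec_websocket_key):
    """Assemble the opening handshake response message from the server."""
    return (
        "HTTP/1.1 101 Switching Protocols\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        "Sec-WebSocket-Accept: " + compute_accept(sec_websocket_key) + "\r\n"
        "\r\n"
    )

# Sample key taken from the RFC 6455 example.
print(compute_accept("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

The client performs the same computation on the key it sent and compares the result with the header it received, which is how it validates that the response really comes from a WebSocket-aware server.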
This completes the initial handshake between the client and the server and at this point a WebSocket connection is established between the client and the server.
The client and the server can now communicate with each other in full-duplex using WebSocket messages. At the protocol level the WebSocket messages are known as WebSocket Data Frames.
In our example, the client sends a text WebSocket message once, after which the server starts sending CPU metrics as text WebSocket messages to the client at regular intervals (every 5 seconds in our example). The following is an example of a WebSocket message from the server to the client captured at the network protocol level:
A WebSocket Data Frame at the protocol level is defined as shown in Figure-5 below:
The following is the explanation of the various fields from the WebSocket Data Frame:
FIN - 1 Bit :: If this bit is set to a 1, it indicates this is the final frame of a message
RSV1 - 1 Bit :: Reserved bit. It must be set to a 0 unless an extension that defines a meaning for it has been negotiated
RSV2 - 1 Bit :: Reserved bit. It must be set to a 0 unless an extension that defines a meaning for it has been negotiated
RSV3 - 1 Bit :: Reserved bit. It must be set to a 0 unless an extension that defines a meaning for it has been negotiated
opcode - 4 Bits :: These bits define how to interpret the data frame. The following are the definitions for each value:
0x0 :: Means this is a continuation frame
0x1 :: Means this is a text frame
0x2 :: Means this is a binary frame
0x8 :: Means this is a connection close frame
0x9 :: Means this is a ping frame
0xA :: Means this is a pong frame
MASK - 1 Bit :: If this bit is set to a 1, then the Masking-key field has a value that is used to mask (using XOR) the payload
Payload len - 7 Bits :: If the value of these bits is less than or equal to 125, it is the length of the Payload Data in bytes. If the value is equal to 126, then the following 2 bytes, interpreted as a 16-bit unsigned integer, are the length of the Payload Data in bytes. If the value is equal to 127, then the following 8 bytes, interpreted as a 64-bit unsigned integer (the most significant bit MUST be 0), are the length of the Payload Data in bytes
Masking-key - 4 Bytes :: If the MASK bit is set to a 1, then the 32-bit value in this field is used to mask (using XOR) the value in the Payload Data
Payload Data - in Bytes :: Actual application data. If the MASK bit is set to a 1, then the data is masked (using XOR) using Masking-key
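To make the field layout concrete, here is a minimal sketch of a decoder for a single, complete WebSocket Data Frame. The `Frame` type and the `parse_frame` function are hypothetical names used only for illustration; the sketch assumes the whole frame is already in memory and ignores the RSV bits and message fragmentation. The two sample byte sequences are the "Hello" text frames given as examples in RFC 6455, not the frames from our capture.

```python
import struct
from typing import NamedTuple

class Frame(NamedTuple):
    fin: bool       # FIN bit
    opcode: int     # 4-bit opcode
    masked: bool    # MASK bit
    payload: bytes  # unmasked Payload Data

def parse_frame(data):
    """Decode one complete frame held in a bytes object."""
    first, second = data[0], data[1]
    fin = bool(first & 0x80)      # top bit of the first byte
    opcode = first & 0x0F         # low 4 bits of the first byte
    masked = bool(second & 0x80)  # top bit of the second byte
    length = second & 0x7F        # 7-bit Payload len
    offset = 2
    if length == 126:             # next 2 bytes hold the real length
        (length,) = struct.unpack_from("!H", data, offset)
        offset += 2
    elif length == 127:           # next 8 bytes hold the real length
        (length,) = struct.unpack_from("!Q", data, offset)
        offset += 8
    key = b""
    if masked:                    # 4-byte Masking-key precedes the payload
        key = data[offset:offset + 4]
        offset += 4
    payload = data[offset:offset + length]
    if masked:                    # XOR each byte with key[i mod 4]
        payload = bytes(b ^ key[i % 4] for i, b in enumerate(payload))
    return Frame(fin, opcode, masked, payload)

# Unmasked and masked "Hello" text frames (examples from RFC 6455).
print(parse_frame(bytes.fromhex("810548656c6c6f")))
print(parse_frame(bytes.fromhex("818537fa213d7f9f4d5158")).payload)  # b'Hello'
```

Because XOR is its own inverse, the same loop that unmasks a payload also masks it, so an encoder can reuse the expression unchanged.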
When we click on the Stop Monitor button, a close WebSocket message is sent by the client to the server as shown in Figure-6 below:
When the server endpoint receives a close WebSocket message from a client, the server will respond with a close WebSocket message to the client as shown in Figure-7 below:
At this point the WebSocket connection between the client and the server is closed.
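The closing exchange uses the same frame layout as any other message, only with opcode 0x8. The sketch below shows one way to encode frames, including a close frame in each direction; `build_frame` is a hypothetical helper, the fixed masking key is chosen only to make the output reproducible (a real client picks a fresh random key per frame), and the sketch does not enforce the 125-byte payload limit that applies to control frames such as close. Per RFC 6455, frames sent by the client are masked and frames sent by the server are not.

```python
import struct

OPCODE_TEXT = 0x1
OPCODE_CLOSE = 0x8

def build_frame(opcode, payload=b"", mask_key=None):
    """Encode a single unfragmented frame (FIN=1, RSV1-3=0)."""
    header = bytearray([0x80 | opcode])
    mask_bit = 0x80 if mask_key is not None else 0x00
    n = len(payload)
    if n <= 125:                  # length fits in the 7-bit field
        header.append(mask_bit | n)
    elif n <= 0xFFFF:             # 126 marker + 16-bit length
        header.append(mask_bit | 126)
        header += struct.pack("!H", n)
    else:                         # 127 marker + 64-bit length
        header.append(mask_bit | 127)
        header += struct.pack("!Q", n)
    if mask_key is not None:      # client-to-server frames carry a key
        header += mask_key
        payload = bytes(b ^ mask_key[i % 4] for i, b in enumerate(payload))
    return bytes(header) + payload

# Close frame from the client (masked) and the server's reply (unmasked).
client_close = build_frame(OPCODE_CLOSE, mask_key=b"\x01\x02\x03\x04")
server_close = build_frame(OPCODE_CLOSE)
print(client_close.hex(), server_close.hex())
```

An empty close frame is therefore only two bytes on the wire from the server, and six bytes from the client because of the Masking-key.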
References