WebSockets

In this article we take a look at WebSockets and how they have transformed the web, allowing for bi-directional realtime communication between server and client.

The early days of the realtime web

Let's start at the beginning. You've probably heard of HTTP, right? Initially, the web was centred around the concept of an HTTP request. A client sends an HTTP request to a server for some resource or another. The server does some processing and returns some useful data to the client.

Importantly, this meant that the web was essentially a one-way stream: the client requests, the server fulfils. This underlying architecture of the web stayed in place for many years. People didn't question it at first, but after some time the idea of a bi-directional web was floated. Imagine the power of having a server communicate data to clients in realtime without them first requesting it; would it somehow be possible to achieve such independent, active communication between both parties?

The invention of JavaScript by Brendan Eich in 1995 was a major step towards making the web realtime. With JavaScript, a web page's user interface could be updated without a page refresh, meaning that for the first time, users could have a seamless web experience.

This was enhanced greatly by the invention of XMLHttpRequest (XHR) and AJAX. With AJAX, an HTTP request could be made to a server behind the scenes without user intervention, and the result of such a request could be used in a myriad of ways, most commonly to change the state (UI) of the page using JavaScript. As AJAX could be used to get data from a server on demand at regular intervals, not only could users have a seamless web experience, but now they could have a quasi-realtime web experience too.

The technique of requesting data from a server at regular intervals as described above is known as polling. Although initially used by many sites out of necessity, polling is now considered unsuitable for client-server interaction due to the large overhead of each request, in the form of headers and cookies which are (in many cases, needlessly) sent across. Surely there must be a better way?
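To make the idea concrete, here is a minimal polling sketch. The names startPolling and fetchUpdates are our own and purely illustrative: fetchUpdates stands for any function that makes one HTTP request and resolves with the latest data.

```javascript
// Minimal polling sketch (illustrative names, not a library API).
// `fetchUpdates` is assumed to perform one HTTP request and return a promise.
function startPolling(fetchUpdates, onData, intervalMs) {
    const timer = setInterval(async function () {
        try {
            // Every tick is a brand new request, headers and cookies included
            onData(await fetchUpdates());
        } catch (err) {
            console.error("Poll failed:", err);
        }
    }, intervalMs);

    // Return a function that stops the polling loop
    return function stop() {
        clearInterval(timer);
    };
}
```

In a browser, fetchUpdates might be something like `() => fetch("/updates").then((r) => r.json())`, where "/updates" is a hypothetical endpoint. Note that a request is made on every tick whether or not the server has anything new to say, which is exactly where the overhead comes from.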

XMLHttpRequest proved to be quite versatile. The desire for realtime, bi-directional communication between server and clients meant that some inventive techniques arose which repurposed XMLHttpRequest to allow for data to be passed from server to clients seemingly on demand. These techniques, exploiting long-held HTTP connections, were early examples of what would become known as Comet.

Long polling is perhaps the most well-known Comet technique. With long polling, an asynchronous HTTP request is initiated by the client and kept open until the server has some data to send. When the client has received a response (data) from the server, it immediately sends another request to the server and the process continues. It follows that with this method a connection is always open between the client and server, allowing for a sense of bi-directionality and realtime.
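The request-wait-repeat loop described above can be sketched as follows. Again, the names are our own: requestUpdate is assumed to be a function that makes one HTTP request which the server holds open until it has data, and shouldContinue is a hypothetical callback that lets the caller end the loop.

```javascript
// Long polling sketch (illustrative names, not a library API).
// `requestUpdate` is assumed to resolve only when the server finally responds.
async function longPoll(requestUpdate, onData, shouldContinue, retryDelayMs = 1000) {
    while (shouldContinue()) {
        try {
            // The request stays pending until the server has something to send
            const data = await requestUpdate();
            onData(data);
        } catch (err) {
            // Back off briefly so a failing server is not hammered with requests
            await new Promise(function (resolve) {
                setTimeout(resolve, retryDelayMs);
            });
        }
        // Loop round: the next request is issued straight away
    }
}
```

Unlike plain polling, no timer is involved: the server's response itself triggers the next request.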

Another version of long polling involves using dynamically created script tags to point to a remote resource that returns JavaScript that is executed in the current document.

Although somewhat useful, the long polling technique is prone to errors and is unsuitable for small messages due to the large overhead; at its core, it uses technology which was not designed for bi-directionality, and the request object is the same as that used with polling.

Enter WebSockets

Beginning in 2008, it was suggested that there could potentially be a better way to achieve realtime, full-duplex (simultaneous two-way) communication between a given client and server. The WebSocket protocol was devised for this very purpose.

WebSockets are a thin layer on top of an underlying TCP/IP stack. This means that they actually act over a TCP connection, the same as HTTP. They can operate over the same ports as HTTP too, 80 and 443, and have their own URI scheme, ws:// and wss://, the latter being used for secure connections, in the same way that https:// is used for secure HTTP connections.
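As a small illustration of how the schemes mirror each other, the sketch below derives a WebSocket URL from an HTTP one. The helper name toWebSocketUrl is our own invention, and it assumes the WebSocket endpoint lives at the same host and path as the HTTP one, which need not be true for a real service.

```javascript
// Map http -> ws and https -> wss, keeping host, port, path and query.
// `toWebSocketUrl` is an illustrative helper, not a built-in function.
function toWebSocketUrl(httpUrl) {
    const url = new URL(httpUrl);
    let scheme;
    if (url.protocol === "https:") {
        scheme = "wss:";   // secure, like https://
    } else if (url.protocol === "http:") {
        scheme = "ws:";    // unencrypted, like http://
    } else {
        throw new Error("Expected an http: or https: URL");
    }
    return scheme + "//" + url.host + url.pathname + url.search;
}

console.log(toWebSocketUrl("https://example.com/live"));
// prints "wss://example.com/live"
```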

WebSockets aim to provide a means for two-way, client-server communication to happen in realtime, with minimal overhead. In other words, they facilitate quick message exchanges between a client and a server; a conversation, if you will.

WebSockets were introduced in Google Chrome 4 in 2010 and most other browsers followed suit. As of 2019, all major browsers support WebSockets (97% coverage worldwide).

Opening a WebSocket connection

It's straightforward to open a WebSocket connection using JavaScript. The following code is all you need:

var socket = new WebSocket("wss://example.com");

The WebSocket handshake

Opening a WebSocket connection starts what's called a handshake between the client and the server. The handshake does a few things, but primarily it negotiates upgrading the underlying HTTP connection to use the WebSocket protocol and ensures that no third party has been involved in falsely requesting a WebSocket connection in place of the client.

To establish a WebSocket connection, a client must send a request with an Upgrade header as follows:

GET / HTTP/1.1
Host: www.example.com
Connection: Upgrade
Upgrade: websocket

The server then responds as follows:

HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: a8raz3Lr22hfqAjtCxWigVwhpaB=

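Not displayed above, but in addition to the Upgrade header, the client sends a Sec-WebSocket-Key header containing a random, base64-encoded value. The server appends the GUID 258EAFA5-E914-47DA-95CA-C5AB0DC85B11 to this key, hashes the resulting string with SHA-1 and returns the base64-encoded hash in the response as the Sec-WebSocket-Accept header. This lets the client verify that the response really comes from a server that understood the WebSocket handshake, rather than from a cache or a server that does not speak the protocol.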
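For the curious, the key-to-accept transformation can be reproduced in a few lines. This is a sketch for Node.js using its built-in crypto module (in the browser the handshake is done for you); the function name computeAcceptValue is our own. Per RFC 6455, the hash is SHA-1 and the result is base64-encoded.

```javascript
const nodeCrypto = require("node:crypto");

// GUID fixed by the WebSocket protocol (RFC 6455)
const WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

// Compute the Sec-WebSocket-Accept value for a given Sec-WebSocket-Key.
// `computeAcceptValue` is an illustrative name, not a built-in function.
function computeAcceptValue(secWebSocketKey) {
    return nodeCrypto
        .createHash("sha1")
        .update(secWebSocketKey + WEBSOCKET_GUID)
        .digest("base64");
}

// Sample key taken from RFC 6455
console.log(computeAcceptValue("dGhlIHNhbXBsZSBub25jZQ=="));
// prints "s3pPLMBiTxaQ9kYGzzhZRbK+xOo="
```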

Binding to events

Once the handshake is complete, both parties can use the WebSocket connection to send data to one another. Client-side, we can bind to events on the WebSocket object to get notified when the connection is open and when a message is received as follows:

socket.onopen = function () {
    // Send a message to the server
    socket.send("Hello World!");
};

socket.onmessage = function (e) {
    // Log the data received
    console.log(e.data);
};

You can also bind to the onerror and onclose events, if desired.
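A sketch of binding all four events at once might look like this. The helper attachHandlers and its log parameter are hypothetical names of our own; the code, reason and wasClean properties come from the standard CloseEvent passed to onclose.

```javascript
// Attach all four handlers to a WebSocket (or any WebSocket-like object).
// `attachHandlers` and `log` are illustrative names, not part of the API.
function attachHandlers(socket, log = console.log) {
    socket.onopen = function () {
        log("Connection open");
    };
    socket.onmessage = function (e) {
        log("Received: " + e.data);
    };
    socket.onerror = function () {
        // The error event carries little detail; onclose usually follows
        log("A connection error occurred");
    };
    socket.onclose = function (e) {
        // e.code and e.reason describe why the connection closed;
        // e.wasClean is true when the closing handshake completed
        log("Closed (code " + e.code + ", clean: " + e.wasClean + ")");
    };
    return socket;
}
```

In a browser, this could be used as `attachHandlers(new WebSocket("wss://example.com"));`.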

Messages and frames

Messages between the client and server over a WebSocket connection are exchanged in the form of frames. Messages can span multiple frames, in which case it is the responsibility of both parties to keep track of the frames they receive. An added benefit is that the total number of frames does not have to be known in advance, so a party can get started processing the information they receive straight away.
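Browsers handle framing for you, but to give a feel for what happens on the wire, here is a rough sketch of reading frames from a Node.js Buffer according to the frame layout in RFC 6455. The function names are our own, and the sketch deliberately ignores control frames (ping, pong, close) and extension bits, so treat it as an illustration rather than a working server.

```javascript
// Parse one frame from the start of a Buffer (RFC 6455 layout).
// Returns null if the buffer does not yet hold a complete frame.
function parseFrame(buf) {
    if (buf.length < 2) return null;
    const fin = (buf[0] & 0x80) !== 0;     // is this the final frame of the message?
    const opcode = buf[0] & 0x0f;          // 0 = continuation, 1 = text, 2 = binary
    const masked = (buf[1] & 0x80) !== 0;  // client-to-server frames are masked
    let length = buf[1] & 0x7f;
    let offset = 2;

    if (length === 126) {                  // 16-bit extended payload length
        if (buf.length < 4) return null;
        length = buf.readUInt16BE(2);
        offset = 4;
    } else if (length === 127) {           // 64-bit extended payload length
        if (buf.length < 10) return null;
        length = Number(buf.readBigUInt64BE(2));
        offset = 10;
    }

    let maskKey = null;
    if (masked) {
        if (buf.length < offset + 4) return null;
        maskKey = buf.subarray(offset, offset + 4);
        offset += 4;
    }
    if (buf.length < offset + length) return null;

    // Copy the payload so unmasking does not modify the input buffer
    const payload = Buffer.from(buf.subarray(offset, offset + length));
    if (maskKey) {
        for (let i = 0; i < payload.length; i++) payload[i] ^= maskKey[i % 4];
    }
    return { fin, opcode, payload, bytesUsed: offset + length };
}

// Reassemble a text message that may span several frames
function readTextMessage(buf) {
    const parts = [];
    let pos = 0;
    while (pos < buf.length) {
        const frame = parseFrame(buf.subarray(pos));
        if (!frame) return null;           // incomplete frame: wait for more data
        parts.push(frame.payload);
        pos += frame.bytesUsed;
        if (frame.fin) return Buffer.concat(parts).toString("utf8");
    }
    return null;                           // final frame not yet received
}
```

The FIN bit is what allows a message to be sent without knowing its total length up front: the sender simply sets it on the last frame.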

What are WebSockets used for?

WebSockets are most commonly used for realtime communication between a client and a server. They are inherently low-latency and full-duplex, and over the last few years they have transformed the web, making realtime, bi-directional user experiences a reality.

There are a number of real-world practical applications for WebSockets, including realtime communication and instant messaging, screen sharing and collaboration, live updating content and streamed media.

WebSocket servers: Use a hosted solution

The client-side code presented in this article is only a fraction of the picture. To start using WebSockets you will need to have a platform that handles client connections, probably with multiple servers for scalability and extra handling for pinging sockets, not to mention end-to-end encryption and presence functionality.

Don't go it alone: use our solution.

PushRadar is a realtime notifications API service that provides a fully managed solution for realtime communication between your server and clients. Built with WebSockets at its core, our API solves all the infrastructure problems for you, so you can concentrate on what matters - your business.

Get started today!

You can get started with PushRadar for free.