Networking Fundamentals for Web Optimization

Jul 22, 2020 by Nicklas Envall

Optimizing web applications is not only about tweaking code. A great deal of optimizing is about defusing the limitations of the World Wide Web (the Web). The Web was designed for exchanging text documents over the Internet - not today's complex web applications. So, if you want to create high-performance web applications, it'll be crucial to understand how the Web works and its flaws.

The Internet is a Network that the Web uses

The World Wide Web (WWW) is an Internet application, just like Spotify and Netflix. An Internet application is a program that runs on different hosts that communicate with each other over the Internet. The Internet itself is a network that connects billions of devices, and a network is, in simple terms, a set of hosts (computers) that can exchange data with each other because they are connected.

The Web consists of two different programs:

The Browser program (a client process)
The Web server program (a server process)

To visit websites, we use browsers like Firefox, Chrome, or Internet Explorer to request objects from Web servers like Apache, Nginx, and IIS. Web servers receive requests and then respond to Web browsers with the resource, or with a reason why it could not be returned. A Web page is just a document written in HTML that usually contains URLs that point to objects like images, videos, and other HTML files. Objects like the ones just mentioned are stored on Web servers, where each can be retrieved with a unique URL. A document for a web page with an image could look like this:

<!DOCTYPE html>
<html>
  <head></head>
  <body>
    <img src="the url here">
  </body>
</html>

By examining the HTML document above, it would be reasonable to think that the image gets transferred with the document - but that's incorrect. Once the browser receives the document, it starts parsing it and only then discovers that it needs to request the image before it can be displayed. That extra request is a major flaw and performance issue, and to make sense of it, we need to look at how the Web has evolved over the years. In its early days, hypertext documents (simple plain-text files) dominated the Web. Then Hypertext Markup Language (HTML) added support for hypermedia resources such as images and videos, which allowed us to create Web pages. Today, we use JavaScript and the XMLHttpRequest API to do AJAX programming, which lets us create complex web applications.
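The discovery step described above can be sketched with Python's standard HTML parser. This is only a rough illustration of the idea, not how a real browser engine is implemented; the class name, the helper function, and the example paths (/style.css, /logo.png, /app.js) are made up for the example.

```python
from html.parser import HTMLParser

class SubresourceFinder(HTMLParser):
    """Collects URLs that would have to be fetched with separate requests."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag in ("img", "script") and attr_map.get("src"):
            self.urls.append(attr_map["src"])
        elif tag == "link" and attr_map.get("href"):
            self.urls.append(attr_map["href"])

def find_subresources(html):
    finder = SubresourceFinder()
    finder.feed(html)
    return finder.urls

# Hypothetical page: the document itself is one request,
# and every URL found below costs at least one more.
document = """<!DOCTYPE html>
<html>
  <head><link rel="stylesheet" href="/style.css"></head>
  <body>
    <img src="/logo.png">
    <script src="/app.js"></script>
  </body>
</html>"""

print(find_subresources(document))  # ['/style.css', '/logo.png', '/app.js']
```

None of these URLs are known to the client until the document has arrived and been parsed, which is why each level of nesting tends to add round trips.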

Web applications consist of many resources such as images, JavaScript, stylesheets, and many more. Loading resources is costly, and more resources mean more requests. Even worse, each resource can reference its own resources that need to be fetched. As mentioned earlier, the Web was never intended for these kinds of websites, so our performance suffers. Now let's delve into what causes these problems and what we can do about them.

Performance Measurements and Units

In our industry, there's a lot of jargon, and performance optimization is no exception. Let's take a quick detour and look at some common keywords:

Throughput: is the actual amount of data passing through a medium or connection per unit of time.
Bandwidth: is the maximum throughput capacity of a network or data-transmission medium.
Latency: is the time it takes for data to travel from source to destination.
Speed: is a very generic word but in this context it's the combination of throughput and latency.
Round-trip time (RTT): is the time it takes for a packet to reach the receiver and for the response to come back.
Socket: is an internal endpoint for receiving or sending data within a node on a network.
Simplex operation: is when data can only be sent in one direction.
Half-duplex: is when data can be sent in two directions but only one at a time.
Full-duplex: is when data can be sent in two directions simultaneously.

Then we need to know our bits and bytes. A big B stands for byte while a small b stands for bit. So, 100Mb refers to megabits while 100MB refers to megabytes. Network throughput is almost always expressed in bits per second (bps), not bytes per second. On top of that, network rates use the decimal versions of kilo, mega, and giga (powers of 1000, not 1024). Now with some acquired jargon, we'll shift our attention to TCP and how it causes extra round trips and limits how much data we may send initially.
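To see why the bit/byte distinction matters, here is a small sketch that converts between the two. It is an idealized calculation that ignores latency and protocol overhead; the function name is just for illustration.

```python
def transfer_seconds(size_megabytes, rate_megabits_per_s):
    """Idealized transfer time, ignoring latency and protocol overhead."""
    size_bits = size_megabytes * 1_000_000 * 8        # decimal mega, 8 bits per byte
    rate_bits_per_s = rate_megabits_per_s * 1_000_000
    return size_bits / rate_bits_per_s

# A 100 MB file over a 100 Mb/s link takes 8 seconds, not 1.
print(transfer_seconds(100, 100))  # 8.0
```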

TCP/IP and TCP

Confusingly enough, TCP/IP is a set of protocols or a protocol suite - not a protocol. TCP/IP contains protocols such as TCP, UDP, DNS, HTTP, and many more. It has four layers where each layer takes care of a specific thing:

Link layer (sometimes called the network access layer): handles the physical communications media and the local network link.
Internet layer: accepts and delivers packets for the network.
Transport layer: maintains end-to-end communications.
Application layer: provides standardized data exchanges between hosts.

A protocol is a set of rules that decide how hosts may communicate with each other at a certain layer. As the name "TCP/IP" implies, TCP and IP are its main protocols. TCP stands for Transmission Control Protocol and is a protocol at the transport layer. TCP is on top of IP and ensures reliable in-order transmission of bytes between two hosts. TCP is connection-based and must do a three-way handshake to set up a connection between two hosts. To send data reliably, TCP controls the traffic being sent with certain mechanisms such as flow control and congestion control.
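As a minimal sketch of what "connection-based" means in practice, the snippet below opens a TCP connection over the loopback interface using Python's socket module. The echo server is a throwaway helper written for this example. The operating system performs the three-way handshake inside the call to create_connection, before any application data is sent.

```python
import socket
import threading

def echo_once(server):
    """Accept one connection and send back whatever arrives first."""
    conn, _ = server.accept()
    with conn:
        # The payload is tiny, so a single recv is enough for this sketch.
        conn.sendall(conn.recv(1024))

server = socket.create_server(("127.0.0.1", 0))  # port 0: let the OS pick a free port
port = server.getsockname()[1]
worker = threading.Thread(target=echo_once, args=(server,), daemon=True)
worker.start()

# create_connection returns only after the handshake has completed.
with socket.create_connection(("127.0.0.1", port)) as client:
    client.sendall(b"hello")
    reply = client.recv(1024)

worker.join()
server.close()
print(reply)  # b'hello'
```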

Flow control makes sure that the receiver is not overwhelmed by data. Each host advertises a receive window (rwnd) that indicates how much data it can currently accept. The rwnd value allows the sender to adjust the rate at which it sends data. Throughout the lifetime of the TCP connection, the window decreases and increases based on feedback from the receiver. Congestion control, on the other hand, makes sure that the network does not get congested. In other words, it makes sure routers can forward packets without losing them. It does so by ensuring that the sender does not overwhelm the network. Let's look at two of the most fundamental algorithms for congestion control:

Slow-start: adds a congestion window (cwnd) for the sender. The variable cwnd is a sender-side limit on how much data may be in flight. Its initial value is low, and it doubles every round trip until it gives a rough estimate of the available bandwidth. We also have another variable called ssthresh (slow-start threshold), which is initially set to a high value. Slow-start stops, and the congestion avoidance algorithm takes over, when either cwnd >= ssthresh or congestion occurs.
Congestion avoidance: is activated when congestion occurs or cwnd >= ssthresh. This algorithm increases cwnd linearly instead of doubling it. If it was triggered by duplicate acknowledgments (a sign of a lost packet), ssthresh is set to cwnd divided by two. A timeout also sets ssthresh to half of cwnd and then sets cwnd to 1. The value is halved because, with cwnd doubling every round trip, half of the current value is the last one known to be safe.
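The two algorithms above can be sketched as a toy simulation. This is a simplified model under stated assumptions, not a real TCP implementation: cwnd is counted in whole segments, there is no packet loss during the simulation, growth is capped at ssthresh when slow-start ends, and the default values (initial cwnd of 1, ssthresh of 16) are picked only to keep the output short.

```python
def simulate_cwnd(round_trips, initial_cwnd=1, ssthresh=16):
    """Return cwnd (in segments) at the start of each round trip, assuming no loss."""
    cwnd = initial_cwnd
    history = []
    for _ in range(round_trips):
        history.append(cwnd)
        if cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)  # slow-start: doubles every round trip
        else:
            cwnd += 1                       # congestion avoidance: linear growth
    return history

def on_timeout(cwnd):
    """Timeout reaction: ssthresh becomes half of cwnd, cwnd goes back to 1.

    Returns (new_cwnd, new_ssthresh). The floor of 2 segments is an
    assumption of this sketch so that ssthresh never collapses to zero.
    """
    return 1, max(cwnd // 2, 2)

print(simulate_cwnd(8))  # [1, 2, 4, 8, 16, 17, 18, 19]
print(on_timeout(16))    # (1, 8)
```

Notice how many round trips pass before the window becomes large; that ramp-up is what hurts short-lived connections.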

Fine, flow control and congestion control don't seem so bad; they're great for reliability. But reliability comes at a price, and that price is speed. TCP is designed for reliability, not speed. Let's take a closer look at the implications this has for our performance.

Let's start with the three-way handshake. Every TCP connection begins with one: the sender sends a SYN packet, the receiver responds with a SYN-ACK, and once the sender receives the SYN-ACK it sends an ACK and can start sending data right away. This means that setting up a new TCP connection costs us a round trip of latency. On top of that, flow control and congestion control limit our throughput at the beginning of a new connection even when the bandwidth allows more, which means that we need even more round trips. With that in mind, we can do the following to optimize our TCP:

Update your server to get the latest TCP improvements.
Increase TCP's initial congestion window.
Reuse TCP connections to avoid three-way handshakes.
Send fewer bits by utilizing compression.
Move the receiver closer to the sender to decrease latency.
See if TCP fast open is an option for you - it allows you to send data in the SYN packet.
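To get a feel for why the initial congestion window and connection reuse matter, here is a rough back-of-the-envelope model. It assumes a segment payload of 1460 bytes, a window that doubles every round trip, no packet loss, and a receive window that never becomes the bottleneck; the function name and the example sizes are made up for illustration.

```python
import math

MSS_BYTES = 1460  # assumed segment payload size

def round_trips_to_send(size_bytes, initial_cwnd, handshake=True):
    """Rough number of round trips needed to deliver size_bytes."""
    segments_left = math.ceil(size_bytes / MSS_BYTES)
    cwnd = initial_cwnd
    trips = 1 if handshake else 0   # the three-way handshake costs one round trip
    while segments_left > 0:
        segments_left -= cwnd       # send one full window per round trip
        cwnd *= 2                   # slow-start doubling
        trips += 1
    return trips

size = 64 * 1024  # a hypothetical 64 KB response
print(round_trips_to_send(size, initial_cwnd=4))                    # 5
print(round_trips_to_send(size, initial_cwnd=10))                   # 4
print(round_trips_to_send(size, initial_cwnd=10, handshake=False))  # 3
```

With a round-trip time of, say, 100 ms, each saved round trip is 100 ms off the response time, no matter how much bandwidth is available.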

So more bandwidth is not the solution to all our problems. Bandwidth is important, but when it comes to "everyday web browsing" it's the round-trip latency and how TCP is built that gets us. Streaming video is bandwidth-limited, while loading a web page is latency-limited. Our networks already move data at a large fraction of the speed of light, so there is little left to gain there. And even at the speed of light, going around the globe still takes about 134 ms. So hopefully, this section made it clear how TCP isn't designed for our modern everyday web browsing. We're no longer fetching only one document - things have changed.
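The 134 ms figure can be checked with a quick calculation. The constants below are the equatorial circumference of the Earth and the speed of light in a vacuum; the refractive index of 1.5 for optical fiber is an assumed typical value, included only to show that real cables are slower still.

```python
EARTH_CIRCUMFERENCE_KM = 40_075   # equatorial circumference
LIGHT_SPEED_KM_PER_S = 299_792    # speed of light in a vacuum
FIBER_REFRACTIVE_INDEX = 1.5      # assumption: typical value for optical fiber

def propagation_ms(distance_km, speed_km_per_s=LIGHT_SPEED_KM_PER_S):
    """One-way propagation delay in milliseconds."""
    return distance_km / speed_km_per_s * 1000

around_globe = propagation_ms(EARTH_CIRCUMFERENCE_KM)
around_globe_fiber = propagation_ms(
    EARTH_CIRCUMFERENCE_KM, LIGHT_SPEED_KM_PER_S / FIBER_REFRACTIVE_INDEX
)

print(round(around_globe))  # 134
print(round(around_globe_fiber))
```

Under that fiber assumption the same trip takes roughly 200 ms, and that is before any routing or queuing delay is added.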

Hypertext Transfer Protocol (HTTP)

HTTP stands for Hypertext Transfer Protocol. HTTP is the protocol that defines the structure of messages between clients and servers. HTTP is an application layer protocol that uses TCP for transporting the messages. HTTP is not strictly required to use TCP and can in principle run over UDP as well, but essentially all HTTP traffic is transmitted via TCP. So to be able to send HTTP messages between a client and a server, a TCP connection needs to be established.

HTTP/0.9 was the first version, and it was designed to transfer simple text documents, with the connection closed after every request. HTTP/1.0 later came along, expanded the features, and made sure that the response object no longer had to be hypertext. Since HTTP/1.0 our resources can be many different types of data, so we use MIME (Multipurpose Internet Mail Extensions) types to understand how to handle the received resource. An HTML file's MIME type is text/html, while a JPEG image's is image/jpeg. But it was not until the release of HTTP/1.1 that keep-alive, which lets us reuse the TCP connection for more requests than just the initial one, became the default. HTTP/1.1 also gave us a bunch more performance improvements. Here are some optimizations that came along during the HTTP/1.1 era:

Persistent connections: helps us avoid doing new handshakes.
Six TCP connections per host: modern browsers often give us 6 TCP connections per host.
Domain sharding: is when you use subdomains {shard1, shard2}.example.com to get more than 6 TCP connections.
Concatenation: is when we bundle our JavaScript or CSS files into bundles, which reduces the number of requests.
Spriting: is when we create one big image that contains images to avoid more requests.
Minification: removes unnecessary bytes by removing things that the computer doesn't need - like comments.
HTTP pipelining: allows us to send several requests over the same TCP connection without having to wait for each response. Sadly, responses must still be returned in order, so if request A takes longer than request B, then response B will have to wait for A. We refer to this as head-of-line blocking.
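Head-of-line blocking with pipelining can be illustrated with a toy model. The sketch assumes the server works on all pipelined requests in parallel and that transfer time is negligible, so the only constraint is that responses leave in request order; the function name and timings are hypothetical.

```python
def pipelined_finish_times(service_times):
    """Time at which each response is delivered over one pipelined connection.

    service_times[i] is how long the server needs to prepare response i.
    """
    finish_times = []
    previous_done = 0.0
    for ready_at in service_times:
        # A response cannot be sent before every earlier response has been sent.
        done = max(ready_at, previous_done)
        finish_times.append(done)
        previous_done = done
    return finish_times

# Request A takes 3 seconds; B and C are ready after 0.5 seconds but must wait.
print(pipelined_finish_times([3.0, 0.5, 0.5]))  # [3.0, 3.0, 3.0]
```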

The items in the list above have been proven to improve performance. However, these are all hacks derived from the limitations of TCP and previous HTTP versions. Understanding that these optimizations were imperfect but a step in the right direction is important. For example, bundling gives fewer requests, but it hurts caching: if we bundle A.js and B.js into C.js and then update A.js, the user will have to download a new C.js even though the user only uses B.js.
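The caching trade-off of bundling can be put into numbers with a small sketch. The file sizes are hypothetical, and the model assumes a returning visitor who already has everything cached.

```python
def bytes_to_refetch(file_sizes, changed, bundled):
    """Bytes a returning visitor must download again after some files change."""
    if bundled:
        # One bundle: any change invalidates the whole cached bundle.
        return sum(file_sizes.values()) if changed else 0
    # Separate files: only the changed ones are invalidated.
    return sum(size for name, size in file_sizes.items() if name in changed)

sizes = {"A.js": 40_000, "B.js": 60_000}  # hypothetical sizes in bytes

print(bytes_to_refetch(sizes, {"A.js"}, bundled=True))   # 100000
print(bytes_to_refetch(sizes, {"A.js"}, bundled=False))  # 40000
```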

Then, after nearly two decades, HTTP/2 arrived and contained fixes for many of the hacks introduced during the HTTP/1.1 era. Instead of giving us more TCP connections, HTTP/2 tries to solve the problem at its core by introducing multiplexing, which not only removes head-of-line blocking at the HTTP level but also makes requests cheaper. HTTP/2 brings more benefits, such as header compression. Before HTTP/2, request and response headers were not compressed, which was a missed opportunity to remove unnecessary bytes. HTTP/2 focuses heavily on decreasing latency, with new techniques such as multiplexing and server push. In essence, HTTP/2 undoes a lot of the hacks that HTTP/1.1 introduced.
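As a toy illustration of multiplexing, the sketch below interleaves frames from several responses on a single connection. The round-robin order is an assumption made for simplicity (real implementations can prioritize streams differently), and the file names and frame counts are made up.

```python
from collections import deque

def multiplexed_finish_order(frames_per_response):
    """Names of the responses in the order they finish, one frame sent per turn."""
    pending = deque(frames_per_response.items())
    finished = []
    while pending:
        name, frames_left = pending.popleft()
        frames_left -= 1                  # send one frame of this response
        if frames_left <= 0:
            finished.append(name)
        else:
            pending.append((name, frames_left))
    return finished

# The large response is requested first but no longer blocks the small ones.
print(multiplexed_finish_order({"big.js": 5, "small.css": 1, "icon.png": 2}))
# ['small.css', 'icon.png', 'big.js']
```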

We won't cover all of HTTP - but you should now understand that HTTP and TCP were never designed for the way we use the Web these days.

So, what optimizations should I do?

The short answer is that it depends. The easiest way is to build with performance in mind from the start - but that is often not an option if you're working in an existing code base. So start by abiding by the number one rule of performance: do not optimize what cannot be measured. Then follow the two universal rules:

Reduce network latency
Eliminate or minimize the number of transferred bytes
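To make the "measure first" rule concrete, here is a minimal timing helper. It is a sketch only: the function name is made up, and the lambda is a stand-in for whatever request or operation you actually want to measure. The median is used because single network timings tend to be noisy.

```python
import statistics
import time

def measure_ms(operation, runs=5):
    """Median wall-clock time of operation() in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        operation()
        samples.append((time.perf_counter() - start) * 1000)
    return statistics.median(samples)

# Stand-in workload; replace it with the operation you want to measure.
baseline = measure_ms(lambda: sum(range(100_000)))
print(f"median: {baseline:.2f} ms")
```

Measure before and after each change, so you know whether the optimization actually helped.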

Remember to focus your efforts where they matter by measuring where your application's problem is. This article was never about giving you solutions to all problems. It was meant to make it easier to identify problems once they occur, so you can take corrective action. Many of our optimizations might be done on the application layer, but they're often derived from limitations of the transport layer. We're in a dilemma where TCP wants long-lived connections, while HTTP is all about being stateless and sending short bursts of data. So what should you optimize? Well, it truly depends.