Optimizing web applications is not only about tweaking code. A great deal of optimizing is about working around the limitations of the World Wide Web (the Web). The Web was designed for exchanging text documents over the Internet - not for today's complex web applications. So, if you want to create high-performance web applications, it's crucial to understand how the Web works and where its flaws lie.
The World Wide Web (WWW) is an Internet application, just like Spotify and Netflix. An Internet application is a program that runs on different hosts that communicate with each other over the Internet. The Internet itself is a network connecting billions of devices, and a network, in simple terms, is a set of hosts/computers that can exchange data with each other because they are connected.
The Web consists of two different programs: Web browsers (clients) and Web servers.
To visit websites, we use browsers such as Firefox, Chrome, or Internet Explorer to request objects from Web servers such as Apache, Nginx, and IIS. A Web server receives a request and responds to the browser with the resource, or with a reason why it can't be served. A Web page is just a document written in HTML that usually contains URLs pointing to objects such as images, videos, and other HTML files. Objects like these are stored on Web servers, where each one can be retrieved with a unique URL. A document for a web page with an image could look like this:
```html
<!DOCTYPE html>
<html>
  <head></head>
  <body>
    <img src="the url here">
  </body>
</html>
```
In our industry, there's a lot of jargon, and performance optimization is no exception. Let's take a quick detour and look at some common keywords:
Then we need to know our bits and bytes. An uppercase B stands for byte, while a lowercase b stands for bit. So, 100 Mb refers to megabits, while 100 MB refers to megabytes. Network throughput is almost always expressed in bits per second (bps), not bytes per second. On top of that, throughput uses the decimal versions of kilo, mega, and giga. With some jargon acquired, we'll shift our attention to TCP and how it causes extra round trips and limits how much data we may send initially.
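To make the bits-versus-bytes distinction concrete, here's a small sketch that converts a link speed in megabits per second to megabytes per second (the function names and the 100 Mb/s figure are illustrative, not from the text above):

```python
# Network throughput is quoted in bits; file sizes are quoted in bytes.
# 1 megabit = 1_000_000 bits (decimal prefix); 1 byte = 8 bits.

def mbps_to_megabytes_per_s(mbps: float) -> float:
    """Convert a link speed in megabits/s to megabytes/s."""
    return mbps / 8

def download_time_s(file_mb: float, link_mbps: float) -> float:
    """Best-case seconds to transfer a file of `file_mb` megabytes."""
    return file_mb / mbps_to_megabytes_per_s(link_mbps)

print(mbps_to_megabytes_per_s(100))  # a "100 Mb/s" link moves 12.5 MB/s
print(download_time_s(50, 100))      # a 50 MB file takes 4.0 s at best
```

So a marketed "100 Mb" connection moves at most 12.5 MB of data per second - an eight-fold difference that's easy to misread.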
Confusingly enough, TCP/IP is a set of protocols, a protocol suite - not a single protocol. TCP/IP contains protocols such as TCP, UDP, DNS, HTTP, and many more. It has four layers, where each layer takes care of a specific concern:
A protocol is a set of rules that decides how hosts may communicate with each other at a certain layer. As the name "TCP/IP" implies, TCP and IP are its main protocols. TCP stands for Transmission Control Protocol and sits at the transport layer. TCP runs on top of IP and ensures reliable, in-order transmission of bytes between two hosts. TCP is connection-based and must perform a three-way handshake to set up a connection between two hosts. To send data reliably, TCP controls the traffic being sent with mechanisms such as flow control and congestion control.
Flow control makes sure that the receiver is not overwhelmed by data. Each host advertises a receive window (`rwnd`) that represents how much data it can currently receive. The `rwnd` variable lets the sender adjust the rate at which it sends data, and throughout the lifetime of the TCP connection the window shrinks and grows based on feedback. Congestion control, on the other hand, makes sure that the network itself does not get congested. In other words, it makes sure routers can forward packets without dropping them. It does so by ensuring that the sender does not overwhelm the network. Let's look at two of the most fundamental algorithms for congestion control:
- Slow-start introduces a congestion window (`cwnd`) for the sender. The variable `cwnd` is a sender-side limit on how much data may be sent. Its initial value is low, and it ramps up until TCP has a rough estimate of the available bandwidth: the congestion window doubles every round trip. We also have another variable called `ssthresh` (slow-start threshold), which is initially set to a high value. Slow-start stops and congestion avoidance takes over when either `cwnd >= ssthresh` or congestion occurs.
- Once `cwnd >= ssthresh`, congestion avoidance is activated. This algorithm increases `cwnd` linearly instead of doubling it. If congestion is signaled by duplicate ACKs, `ssthresh` is set to half of `cwnd`. A timeout also sets `ssthresh` to half of `cwnd`, and additionally resets `cwnd` to 1. It's divided by two because that is the last known safe value.
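The growth pattern described above can be sketched as a toy simulation. The initial `cwnd` of 1 and `ssthresh` of 64 segments are illustrative defaults, not values from any real TCP stack:

```python
def simulate_cwnd(round_trips: int, ssthresh: int = 64, initial_cwnd: int = 1):
    """Toy model: cwnd doubles each round trip during slow-start, then
    grows by one segment per round trip once cwnd >= ssthresh
    (congestion avoidance). Loss events are not modeled here; on a
    timeout, a real stack would set ssthresh = cwnd / 2 and cwnd = 1."""
    cwnd = initial_cwnd
    history = []
    for _ in range(round_trips):
        history.append(cwnd)
        if cwnd < ssthresh:
            cwnd *= 2   # slow-start: exponential growth
        else:
            cwnd += 1   # congestion avoidance: linear growth
    return history

print(simulate_cwnd(10))  # [1, 2, 4, 8, 16, 32, 64, 65, 66, 67]
```

Notice how many round trips it takes before the window is large: this is exactly why a brand-new connection can't use the full bandwidth right away.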
Fine - flow control and congestion control don't seem so bad; they're great for reliability. But reliability comes at a price, and that price is speed. TCP is designed for reliability, not speed. Let's take a closer look at the implications this has for our performance.
Starting with the three-way handshake: TCP connections begin with a three-way handshake. The sender sends a `SYN` packet, the receiver responds with a `SYN-ACK`, and once the sender receives the `SYN-ACK` it can send an `ACK` and start sending data right away. This means that setting up a new TCP connection costs us a round trip of latency. On top of that, flow control and congestion control limit our throughput at the beginning of a new connection even when the bandwidth would allow more, which costs us even more round trips. With that in mind, we can do the following to optimize our TCP:
So more bandwidth is not the solution to all our problems. Bandwidth is important, but when it comes to everyday web browsing, it's the round-trip latency and how TCP is built that gets us. Streaming video is bandwidth-limited, while loading a web page is latency-limited. Latency is already approaching the limits of physics: even at the speed of light, going around the globe takes about 134 ms. Hopefully, this section made it clear how TCP isn't designed for our modern everyday web browsing. We're no longer fetching only one document - things have changed.
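A quick sketch makes both points tangible: the 134 ms figure falls straight out of distance divided by signal speed, and a TCP `connect()` can be timed directly, since it completes only after the `SYN`/`SYN-ACK`/`ACK` handshake. The throwaway local listener below is just there so the example has no external dependencies:

```python
import socket
import threading
import time

# Physics puts a floor under latency: distance divided by signal speed.
EARTH_CIRCUMFERENCE_KM = 40_075   # around the equator (rounded)
SPEED_OF_LIGHT_KM_S = 299_792     # in a vacuum

one_way_ms = EARTH_CIRCUMFERENCE_KM / SPEED_OF_LIGHT_KM_S * 1000
print(f"around the globe at light speed: {one_way_ms:.0f} ms")  # ~134 ms

# connect() returns after the three-way handshake, so timing it gives
# a rough round-trip measurement. Here we connect over loopback.
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
threading.Thread(target=server.accept, daemon=True).start()

start = time.perf_counter()
with socket.create_connection(server.getsockname(), timeout=5):
    handshake_ms = (time.perf_counter() - start) * 1000
server.close()

print(f"loopback handshake: {handshake_ms:.3f} ms")
```

Over loopback the handshake is nearly free; over a real transatlantic link, the same `connect()` call would cost a full round trip before a single byte of payload is sent.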
HTTP stands for Hypertext Transfer Protocol. HTTP is the protocol that defines the structure of messages between clients and servers. It is an application-layer protocol that uses TCP to transport those messages. HTTP is not required to use TCP - it could also use UDP - but essentially all HTTP traffic is transmitted via TCP. So, to send HTTP messages between a client and a server, a TCP connection needs to be established.
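To make the layering concrete, here's a sketch that speaks HTTP/1.1 by hand over a raw TCP socket - HTTP messages are just text carried inside a TCP connection. A throwaway local server with a canned response stands in for a real Web server:

```python
import socket
import threading

# A canned HTTP/1.1 response our toy server will return.
CANNED = (b"HTTP/1.1 200 OK\r\n"
          b"Content-Type: text/html\r\n"
          b"Content-Length: 13\r\n"
          b"Connection: close\r\n\r\n"
          b"<h1>Hi!</h1>\n")

def serve_once(server: socket.socket) -> None:
    conn, _ = server.accept()
    conn.recv(4096)          # read (and ignore) the request
    conn.sendall(CANNED)     # reply with the canned response
    conn.close()

server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
threading.Thread(target=serve_once, args=(server,), daemon=True).start()

# The client side: write the request bytes, read the response bytes.
with socket.create_connection(server.getsockname()) as sock:
    sock.sendall(b"GET / HTTP/1.1\r\n"
                 b"Host: localhost\r\n"
                 b"Connection: close\r\n\r\n")
    response = b""
    while chunk := sock.recv(4096):
        response += chunk
server.close()

status_line = response.split(b"\r\n")[0]
print(status_line)  # b'HTTP/1.1 200 OK'
```

Everything a browser does on top - caching, rendering, connection pooling - ultimately boils down to exchanges like this one.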
HTTP/0.9 was the first version; it was designed to transfer simple text documents, and the connection was closed after every request. HTTP/1.0 later came along, expanded the feature set, and made sure that the response object no longer had to be hypertext. Since HTTP/1.0, our resources can be many different types of data, so we use MIME (Multipurpose Internet Mail Extensions) types to understand how to handle a received resource. An HTML file's MIME type is `text/html`, while a JPEG image's is `image/jpeg`. But it was not until the release of HTTP/1.1 that we got `keep-alive`, which lets us reuse the TCP connection for more requests than just the initial one. Keep-alive is on by default in HTTP/1.1, which also brought a bunch of other performance improvements. Here are some optimizations that came along during the HTTP/1.1 era:
The items in the list above have been proven to improve performance. However, they are all hacks derived from the limitations of TCP and previous HTTP versions. It's important to understand that these optimizations were imperfect, but a step in the right direction. For example, bundling gives us fewer requests, but it hurts caching: if we bundle A.js and B.js into C.js and then update A.js, the user has to download a new C.js even if the page they're on only uses B.js.
Now, after about 20 years, HTTP/2.0 arrived with fixes for many of the hacks introduced during the HTTP/1.1 era. Instead of opening more TCP connections, HTTP/2.0 tries to solve the problem at its core by introducing multiplexing, which not only removes head-of-line blocking but also makes requests cheaper. HTTP/2.0 brings more benefits, such as header compression: before HTTP/2.0, request and response headers were not compressed, a missed opportunity to remove unnecessary bytes. HTTP/2.0 focuses deeply on decreasing latency, with techniques such as multiplexing and server push. In essence, HTTP/2.0 undoes a lot of the hacks that HTTP/1.1 introduced.
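A toy arithmetic sketch shows why multiplexing matters. Assume one slow response queued ahead of five quick ones on a single connection; all the timings below are made up for illustration:

```python
# Toy model of head-of-line blocking on a single connection.
response_times_ms = [400, 20, 20, 20, 20, 20]  # first response is slow

# In-order delivery (the HTTP/1.1 situation): each response can finish
# only after everything queued ahead of it has finished.
finish, elapsed = [], 0
for t in response_times_ms:
    elapsed += t
    finish.append(elapsed)
print("in-order finish times:", finish)

# HTTP/2 multiplexing: streams are independent, so (ignoring shared
# bandwidth) each response finishes on its own schedule.
print("multiplexed finish times:", sorted(response_times_ms))
```

In the in-order case, the single 400 ms response drags every later response past the 400 ms mark; with multiplexing, the five quick responses are unaffected by the slow one.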
We won't cover all of HTTP - but you should now understand that HTTP and TCP were never designed for the way we use the Web these days.
The short answer is that it depends. The easiest way is to build with performance in mind from the start - but that's often not an option if you're working in an existing code base. So start by abiding by the number one rule of performance: do not optimize what cannot be measured. Then follow the two universal rules:
Remember to direct your efforts correctly by measuring where your application's problem actually is. This article was never about giving you solutions to all problems; it was meant to make it easier to identify problems once they occur, so you can take corrective action. Many of our optimizations may be done at the application layer, but they're often derived from limitations of the transport layer. We're in a dilemma where TCP wants long-lived connections, while HTTP is all about being stateless and sending short bursts of data. So what should you optimize? Well, it truly depends.