Browser Rendering: JS + DOM + CSSOM

Feb 28, 2020 by Nicklas Envall

There's a thing called the Critical Rendering Path (CRP), which is the pipeline for rendering a page. In other words, it's the steps the browser needs to take to turn your HTML, CSS, and JavaScript code into pixels on the screen. Now, CSS and HTML are so-called render-blocking resources, meaning they need to be fetched and parsed before we can render our page. The parsing of HTML and CSS creates two trees, the DOM and the CSSOM. Furthermore, during this process, JavaScript can be both loaded and executed. While the browser waits for a resource to be fetched, parsed, or executed, rendering is blocked, which slows down the page load.

Crane building a web page

We will cover what these parts are all about and then, in the end, we'll look at how knowing the CRP can help us build high-performance web apps. However, this article's sole focus is not on the CRP, but rather a closer look at some of its parts. So fear not if you have no idea what the DOM or the CSSOM is, we'll cover that.

Table of Contents:

Document Object Model (DOM)
Cascading Style Sheets Object Model (CSSOM)
Loading and Executing JavaScript
Critical Rendering Path (CRP)

Document Object Model (DOM)

The Document Object Model (DOM) is the result of parsing HTML code. When we write HTML documents, we nest elements inside other elements, which creates a hierarchy. The browser parses that hierarchy (the document’s structure) into a tree of nodes, which we may refer to as a parse tree. To be clear, our HTML code is parsed into a tree of node objects that represents the document, while the DOM itself is the interface that scripts use to work with those objects. The browser creates the nodes during the initial load of the HTML document.

DOM tree visual example
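
To make the idea of a tree of nodes concrete, here is a minimal sketch that models it with plain JavaScript objects, so it runs outside a browser too. The element() and text() helpers and the outline() function are hypothetical names made up for this illustration; they are not browser APIs, and a real parser would produce richer objects (and a head element as well).

```javascript
// Plain-object stand-ins for DOM nodes (hypothetical helpers, not the browser's Node objects).
const ELEMENT_NODE = 1; // same numeric value as Node.ELEMENT_NODE
const TEXT_NODE = 3;    // same numeric value as Node.TEXT_NODE

function element(nodeName, ...childNodes) {
  return { nodeType: ELEMENT_NODE, nodeName: nodeName.toUpperCase(), childNodes };
}

function text(data) {
  return { nodeType: TEXT_NODE, nodeName: '#text', data, childNodes: [] };
}

// Roughly the hierarchy described by <html><body><h1>hello</h1><p>world</p></body></html>
const tree = element('html',
  element('body',
    element('h1', text('hello')),
    element('p', text('world'))));

// Depth-first walk that records one indented line per node.
function outline(node, depth = 0, lines = []) {
  const label = node.nodeType === TEXT_NODE ? `"${node.data}"` : node.nodeName;
  lines.push('  '.repeat(depth) + label);
  for (const child of node.childNodes) outline(child, depth + 1, lines);
  return lines;
}

console.log(outline(tree).join('\n'));
```

The nesting of the element() calls mirrors the nesting of the tags, which is exactly the hierarchy the parser turns into parent and child nodes.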

We also have the Browser Object Model (BOM), which allows our JavaScript code to interact with the browser. The BOM is a collection of browser objects. The top-level object in the BOM is the window object, which represents an open browser window. Examples of browser objects are navigator, screen, history and also the DOM (document); they are all accessible via the window object. Unlike the DOM, the BOM does not have a standard for implementation and is not strictly defined, which means that the BOM varies depending on which browser you use. Furthermore, the window object is also the global object, so global variables declared with var and global function declarations become its properties. But now we'll focus on the DOM, starting with what a node is.

What is a Node?

All node objects inherit from the Node interface. This interface has the essential properties and methods for manipulating, inspecting, and traversing the DOM. There are different kinds of nodes, for example Document, Element, CharacterData, and DocumentFragment. They all inherit from Node, but each also has its own properties and methods. You can use nodeType and nodeName to find out what kind of node you have, and therefore which properties are available on it:

// <p>hello</p>
const p = document.getElementsByTagName('p')[0];

p.nodeType;  // 1 === Node.ELEMENT_NODE
p.nodeName;  // P

1. Document Node

The document itself is represented by a node object, the Document node. It typically holds two child nodes. The first is the DocumentType node object, which represents the doctype (<!DOCTYPE>). The doctype declares the markup language, and which version of it, the document uses. The second is an Element node object, the root <html> element.

<!DOCTYPE html>
<html></html>

We can easily get references to these nodes via document, like document.doctype, document.documentElement, document.body and so on. Lastly, even though the browser parses our HTML on the initial load, we can still create nodes afterwards with document.createElement(tag: string).

2. Element Node

There are many different types of Element nodes like html, body, span, h1, and the list goes on. These elements can have attributes. For example, <div id="1" class="my-class"></div> has two attributes, id and class. The attributes themselves are Attr nodes, which we can access through the .attributes property. If you want to get, set, remove, or check for an attribute, you can use the following:

Element.getAttribute(attrName: string)
Element.setAttribute(attrName: string, attrValue: string)
Element.removeAttribute(attrName: string)
Element.hasAttribute(attrName: string)
Or just use the .attributes property directly.

Furthermore, you've probably used more than one class on an element at some point, like <span class="class1 class2"></span>. When working with the class attribute, you can use Element.classList to get the classes as a DOMTokenList, an array-like object along the lines of { length: 2, value: "class1 class2", 0: "class1", 1: "class2" }, or simply use Element.className to get the string "class1 class2". classList also provides the following methods:

Element.classList.add(className: string)
Element.classList.remove(className: string)
Element.classList.toggle(className: string)
Element.classList.contains(className: string)
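
To show how these four methods relate to the className string, here is a rough sketch of a token list over a plain object. The createTokenList() function and the span object are hypothetical stand-ins made up for this illustration; the real classList is a DOMTokenList provided by the browser.

```javascript
// Hypothetical stand-in: only the className string of the "element" matters here.
function createTokenList(owner) {
  const tokens = () => owner.className.split(/\s+/).filter(Boolean);
  const write = (list) => { owner.className = list.join(' '); };
  return {
    contains: (name) => tokens().includes(name),
    add: (name) => { if (!tokens().includes(name)) write([...tokens(), name]); },
    remove: (name) => write(tokens().filter((token) => token !== name)),
    toggle(name) {
      const present = this.contains(name);
      if (present) this.remove(name); else this.add(name);
      return !present; // true when the class ends up present
    },
  };
}

const span = { className: 'class1 class2' };
const classList = createTokenList(span);

classList.add('class3');
classList.remove('class1');
console.log(span.className);             // class2 class3
console.log(classList.toggle('class2')); // false
console.log(span.className);             // class3
```

Every method reads and rewrites the same space-separated string, which is why className and classList always agree with each other.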

3. Text Node

The text scattered around in your HTML code is parsed into Text node objects. This also applies to whitespace, since whitespace consists of characters too. So <div> </div> would, for example, have a child node that's a text node. Furthermore, when you read Element.textContent, it concatenates all the text nodes within that element (including those of its descendants) and returns the result as a string. You can also assign to Element.textContent, which removes all of the element's existing children and replaces them with a single new text node.
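
The concatenation that reading textContent performs can be sketched with plain objects. The collectText() function and the text()/element() helpers below are hypothetical names for this illustration only, and the sketch ignores comment nodes and other node types.

```javascript
const TEXT_NODE = 3; // same numeric value as Node.TEXT_NODE

// Plain-object stand-ins for text and element nodes (not real DOM nodes).
const text = (data) => ({ nodeType: TEXT_NODE, data, childNodes: [] });
const element = (nodeName, ...childNodes) => ({ nodeType: 1, nodeName, childNodes });

// Concatenate the data of every descendant text node, in document order.
function collectText(node) {
  if (node.nodeType === TEXT_NODE) return node.data;
  return node.childNodes.map(collectText).join('');
}

// Stand-in for: <div> <b>hello</b> world</div>
const div = element('DIV', text(' '), element('B', text('hello')), text(' world'));

console.log(JSON.stringify(collectText(div))); // " hello world"
```

Note that the leading space survives: the whitespace between <div> and <b> is a text node like any other.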

On a side note, innerText is similar to textContent, but it ignores the text if it's hidden by CSS or inside script or style tags, while textContent does not.

How to create, insert, replace, remove and clone Nodes

The properties/methods innerHTML, outerHTML, textContent, and insertAdjacentHTML() allow us to use strings when adding content to the DOM. However, a word of caution: all of them except textContent invoke a comparatively expensive HTML parsing process. The following code removes all the content in your body tag and then adds some content:

document.body.innerHTML = '';
document.body.innerHTML += 'add1'; // add1
document.body.innerHTML += 'add2'; // add1add2

We also have the appendChild() and insertBefore() methods that we can use to add a node object as a child to another node object. In the example below, we create an h1 object, adding text to it and then appending it to the body tag:

let h1 = document.createElement('h1');
h1.innerText = 'hi all';
document.body.appendChild(h1);

Then we have the methods, removeChild() and replaceChild() which both do just as their name implies. The methods return the reference to the node object, so the node gets removed from its parent, not memory. These methods can be somewhat tricky to invoke since we pass the reference of the node object to the methods, but the methods are invoked on the parent, like parentOfNodeObj.removeChild(referenceToNodeObj).

…
<body>
  <h1>hello</h1>
  <p>world</p>

<script>
  let h1 = document.body.firstElementChild; // firstChild would be the whitespace text node before <h1>
  h1.parentNode.removeChild(h1); // same as document.body.removeChild(h1)
</script>
...

To clone a node you simply do let newNode = oldNode.cloneNode(). If you also want to clone its children you must pass true, like let newNode = oldNode.cloneNode(true). A word of caution: since the attributes and their values are copied, you might end up with duplicate element IDs in a document.
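
The difference between a shallow and a deep clone can be sketched with plain objects. The cloneSketch() function and the shape of the list object are assumptions made for this illustration; in a browser you would call cloneNode() on a real node instead.

```javascript
// Plain-object stand-in for an element (hypothetical shape, not a real DOM node).
const list = {
  nodeName: 'UL',
  attributes: { id: 'menu' },
  childNodes: [
    { nodeName: 'LI', attributes: {}, childNodes: [] },
    { nodeName: 'LI', attributes: {}, childNodes: [] },
  ],
};

// Mimics the idea behind cloneNode(deep): attributes are always copied,
// children only when deep is true.
function cloneSketch(node, deep = false) {
  return {
    nodeName: node.nodeName,
    attributes: { ...node.attributes },
    childNodes: deep ? node.childNodes.map((child) => cloneSketch(child, true)) : [],
  };
}

const shallow = cloneSketch(list);
const deep = cloneSketch(list, true);

console.log(shallow.childNodes.length); // 0
console.log(deep.childNodes.length);    // 2
console.log(deep.attributes.id);        // menu (the id now exists twice)
```

The last line is the duplicate-ID caveat in action: both the original and the clone carry id="menu" until you change one of them.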

Selecting and Traversing Element Nodes

When we want to get a single element we can use getElementById or querySelector(). The querySelector() has the following characteristics:

Returns the first node that’s found.
You pass it a CSS3 selector.
Can be used like Element.querySelector(), which means it only searches that particular part of the DOM tree.

As you see, they return the first found node. So, how do we select multiple nodes? Well, the following methods can help us out:

querySelectorAll(CSSselector: string)
getElementsByTagName(tag: string)
getElementsByClassName(className: string)

These methods return a list containing the elements. But be aware that they can cause unexpected behaviour: querySelectorAll() returns a snapshot of the current state, while the other two return lists that always represent the current state of the DOM. Study the following code:

// Creating and setting up our Element node
let div = document.createElement('div');
div.setAttribute('class', 'classValue');
div.appendChild(document.createTextNode('hello'));

// Adding the div to the body
document.body.appendChild(div);

// Getting the div in two lists
let queryList = document.querySelectorAll('.classValue');
let classList = document.getElementsByClassName('classValue');

// We get a list back containing our div
console.log(queryList); // NodeList [div.classValue]
console.log(classList); // HTMLCollection [div.classValue]

// Remove the div from the DOM
document.body.removeChild(div);

// As we see queryList contains a snapshot while classList the current state
console.log(queryList); // NodeList [div.classValue]
console.log(classList); // HTMLCollection []

NodeList vs HTMLCollection

When trying to get multiple elements, you'll likely encounter NodeList and HTMLCollection. They are both read-only array-like objects, in other words, both are collections of DOM nodes. They are similar but differ slightly. For example, an HTMLCollection only contains elements, while a NodeList may also contain other kinds of nodes (childNodes, for instance, includes text and comment nodes). It's important to know that a NodeList can be either live or static. If it's live, it updates according to the DOM's state; if it's static, it's a snapshot of the state of the DOM at the time it was created.
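
The live versus static distinction can be modelled without a browser. In the sketch below, fakeDocument, staticList() and liveList() are hypothetical stand-ins invented for this illustration; the point is only that a static list copies its matches once, while a live list looks them up again on every access.

```javascript
// Hypothetical stand-in for a document: just an array of elements with class names.
const fakeDocument = { elements: [{ className: 'classValue' }] };

const matches = (className) =>
  fakeDocument.elements.filter((el) => el.className === className);

// Static: the matches are collected once, at call time.
function staticList(className) {
  return matches(className);
}

// Live: every read re-runs the query against the current state.
function liveList(className) {
  return {
    get length() { return matches(className).length; },
    item: (index) => matches(className)[index] ?? null,
  };
}

const snapshot = staticList('classValue');
const live = liveList('classValue');

fakeDocument.elements.pop(); // "remove the div from the DOM"

console.log(snapshot.length); // 1
console.log(live.length);     // 0
```

This is also why looping over a live collection while removing elements from the DOM can skip items: the collection shrinks underneath the loop.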

Traversing the DOM with Node properties

Properties like childNodes, firstChild, and nextSibling give us a way to traverse (travel around in) the DOM tree. A word of caution: traversing with these properties will include text and comment nodes (non-element nodes), which can cause unexpected behaviour. You can avoid traversing text and comment nodes by using firstElementChild, lastElementChild, nextElementSibling, previousElementSibling, childElementCount, children, and parentElement.
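
As a rough illustration of why the element-only properties exist, the sketch below filters a list of child nodes by nodeType. The body object is a plain-object stand-in and elementChildren() is a hypothetical helper; in a browser, body.children and body.firstElementChild do this work for you.

```javascript
// Same numeric values as the Node.ELEMENT_NODE, Node.TEXT_NODE and Node.COMMENT_NODE constants.
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;
const COMMENT_NODE = 8;

// Stand-in for a <body> containing an <h1>, a comment and a <p>, each on its own line.
// The line breaks and indentation between them become whitespace text nodes.
const body = {
  nodeName: 'BODY',
  childNodes: [
    { nodeType: TEXT_NODE, nodeName: '#text' },
    { nodeType: ELEMENT_NODE, nodeName: 'H1' },
    { nodeType: TEXT_NODE, nodeName: '#text' },
    { nodeType: COMMENT_NODE, nodeName: '#comment' },
    { nodeType: TEXT_NODE, nodeName: '#text' },
    { nodeType: ELEMENT_NODE, nodeName: 'P' },
    { nodeType: TEXT_NODE, nodeName: '#text' },
  ],
};

// Keep only element nodes, similar in spirit to the children property.
const elementChildren = (node) =>
  node.childNodes.filter((child) => child.nodeType === ELEMENT_NODE);

console.log(body.childNodes[0].nodeName);       // #text
console.log(elementChildren(body)[0].nodeName); // H1
console.log(elementChildren(body).length);      // 2
```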

Cascading Style Sheets Object Model (CSSOM)

CSS stands for Cascading Style Sheets and is a language that describes the visual representation of our HTML elements. Our CSS gets parsed into a tree called CSSOM which could look something like this:

CSSOM tree visual example

Now, the process of both fetching and parsing the CSS is render-blocking because the CSSOM is needed to create a render tree (more on this later). We can supply the CSS in three different ways: inline, internal/embedded, or external.

1. Inline

Inline styling entails passing a string containing the CSS to the attribute style on HTML elements:

<div style="color: green;"></div>

2. Internal/embedded

Internal stylesheets are embedded in a <head> tag with <style> (HTMLStyleElement):

<html>
    <head>
        <style>
            … style here …
        </style>
    </head>
</html>

3. External

External stylesheets are files containing CSS that usually have a .css extension. External stylesheets have to be downloaded, which lengthens the time rendering is blocked. To include external files in our document, we use the HTMLLinkElement like:

<html>
    <head>
        <link href="/style.css" rel="stylesheet" type="text/css">
    </head>
</html>

Cascading, Specificity, Inheritance

One of the reasons why we have to create the entire CSSOM, other than avoiding a flash of unstyled content (FOUC), is the complexity that the browser must handle due to cascading, specificity and inheritance. Cascading is an algorithm that combines stylesheets into one style. The algorithm uses the priority of the stylesheets to know which rules apply if a conflict occurs. Note that there are more stylesheets than just the ones you define; styles may come from three origins:

user-agent (browser's default style, think headings)
author (you the developer)
user (the user can override and customize)
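
To illustrate how the origin of a declaration settles a conflict, here is a deliberately small sketch. It assumes the usual priority for normal declarations (user-agent lowest, then user, then author) and ignores !important, specificity and source order entirely. The cascade() function and the declaration objects are hypothetical.

```javascript
// Priority for normal (non-!important) declarations, lowest first.
const ORIGIN_PRIORITY = ['user-agent', 'user', 'author'];

// Pick the value for one property from the highest-priority origin that declares it.
function cascade(declarations, property) {
  const candidates = declarations
    .filter((declaration) => declaration.property === property)
    .sort((a, b) =>
      ORIGIN_PRIORITY.indexOf(a.origin) - ORIGIN_PRIORITY.indexOf(b.origin));
  return candidates.length ? candidates[candidates.length - 1].value : undefined;
}

const declarations = [
  { origin: 'author', property: 'font-size', value: '24px' },
  { origin: 'user-agent', property: 'font-size', value: '32px' },
  { origin: 'user-agent', property: 'font-weight', value: 'bold' },
];

console.log(cascade(declarations, 'font-size'));   // 24px
console.log(cascade(declarations, 'font-weight')); // bold
```

This is why a heading keeps its bold weight from the browser's default stylesheet even when your own stylesheet only changes its size.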

Furthermore, we have inheritance, which entails that children can inherit style from their parent, so the style flows down the CSSOM tree. Lastly, specificity is all about determining which style should be applied based on how specific the rule's selector is. Our CSS selectors are prioritized in the following manner: ID > CLASS > TAG. We can annotate this as ID:CLASS:TAG, starting from 0:0:0. Look at the following example:

<!DOCTYPE html>
<html>
   <head>
       <style>
           h1 { font-size: 10px; }                     /* 0:0:1 */
 
           .className { font-size: 20px; }             /* 0:1:0 */

           .className.className2 { font-size: 25px; }  /* 0:2:0 */
 
           #myId { font-size: 30px; }                  /* 1:0:0 */
       </style>
   </head>
   <body>
       <h1 id="myId" class="className className2">hello</h1>
   </body>
</html>

In the example above, our h1 element will get a font size of 30px. Lastly, it's also good to know that inline styles override all stylesheets, both external and internal (unless a stylesheet declaration uses !important). However, it's often recommended not to use inline styles, IDs, or !important, for maintainability purposes.
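
The ID:CLASS:TAG annotation can be computed mechanically. The sketch below is a simplified calculator that only understands #id, .class and tag selectors; attribute selectors, pseudo-classes and pseudo-elements are deliberately left out, so treat it as an illustration rather than a full implementation. The specificity() and compareSpecificity() functions are hypothetical names.

```javascript
// Counts only #id, .class and tag selectors (a deliberate simplification).
function specificity(selector) {
  const ids = (selector.match(/#[\w-]+/g) || []).length;
  const classes = (selector.match(/\.[\w-]+/g) || []).length;
  // Tag names appear at the start, or right after whitespace or a combinator.
  const tags = (selector.match(/(^|[\s>+~])[a-zA-Z][\w-]*/g) || []).length;
  return [ids, classes, tags];
}

// Compare column by column, left to right: IDs first, then classes, then tags.
function compareSpecificity(a, b) {
  const sa = specificity(a);
  const sb = specificity(b);
  for (let i = 0; i < 3; i++) {
    if (sa[i] !== sb[i]) return sa[i] - sb[i];
  }
  return 0;
}

const selectors = ['h1', '.className', '.className.className2', '#myId'];
for (const selector of selectors) {
  console.log(`${selector} -> ${specificity(selector).join(':')}`);
}

const winner = [...selectors].sort(compareSpecificity).pop();
console.log(winner); // #myId
```

Because the comparison goes column by column, a single ID beats any number of classes, which is why #myId wins over the two-class selector.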

Interacting with CSSOM

CSSOM also gives us APIs to find out things like the size and position of our elements. It's possible to interact with the CSSOM via JavaScript by accessing the style property on HTML elements. But the style property only contains the inline CSS that's defined via the element's style attribute. Luckily, you can use window.getComputedStyle(el: element) to get a read-only CSSStyleDeclaration object that contains the element's final values, from both the inline CSS and the cascade. The stylesheets themselves are represented by CSSStyleSheet objects, which you can reach via document.styleSheets. A CSSStyleSheet object contains CSSStyleRule objects that you can manipulate with CSSStyleSheet.insertRule() and CSSStyleSheet.deleteRule(), but that's very uncommon. Lastly, since each stylesheet corresponds to a CSSStyleSheet object, you can disable and enable them by toggling their disabled boolean property.

Examples of using the style property:

// get specific
element.style.height;

// set
element.style.height = '100px';

Loading and Executing JavaScript

JavaScript lets us manipulate both the DOM and CSSOM. JavaScript is parser-blocking, which means that when the parser encounters a <script> tag, it stops the construction of the DOM. The following will happen:

Stop the construction of DOM.
Fetch the JS code if external.
Construct the CSSOM if not constructed (CSS is script blocking).
Execute the JS code.
Resume the construction of DOM.

As you can see, JavaScript can block parsing, so we must carefully consider where we put our script tags. We can add JavaScript in three ways: external, element inline, and page inline. Now, on a side note, it might seem strange that the CSSOM must be created before JavaScript can be executed. But this is because a script might ask for style information, like document.body.style, and the answer has to reflect the CSS that comes before the script.

1. External

We have to fetch external JavaScript files, which increases the blocking time. We use the src attribute on a script tag to specify where the file is located. Note that all code inside the script tag is ignored when using the src attribute.

<script src="app.js"></script>

With externals, we can use the defer and async attributes. Both tell the browser to keep constructing the DOM while the script is fetched in the background. What separates them is when the script runs: a deferred script executes once the whole document has been parsed (and deferred scripts run in document order), while an async script executes as soon as it has been fetched, pausing the parser if it's still running, and with no guaranteed order.
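
To get a feel for the difference in ordering, here is a small timing model. It is a simplification made up for this illustration: times are in milliseconds, executing a script takes no time, only defer and async scripts are modelled, and deferred scripts are assumed to run in document order after parsing. The executionOrder() function and the file names are hypothetical.

```javascript
// Simplified model: returns script names in the order they would execute.
function executionOrder(scripts, parsingDoneAt) {
  let deferClock = parsingDoneAt; // deferred scripts cannot run before parsing is done
  const runs = [];
  scripts.forEach((script, index) => {
    if (script.mode === 'async') {
      // async: runs as soon as its download finishes, wherever the parser is.
      runs.push({ name: script.name, runsAt: script.fetchedAt, index });
    } else {
      // defer: runs after parsing, in document order, once it has been downloaded.
      deferClock = Math.max(deferClock, script.fetchedAt);
      runs.push({ name: script.name, runsAt: deferClock, index });
    }
  });
  return runs
    .sort((a, b) => a.runsAt - b.runsAt || a.index - b.index)
    .map((run) => run.name);
}

const order = executionOrder([
  { name: 'a-defer.js', mode: 'defer', fetchedAt: 300 },
  { name: 'b-defer.js', mode: 'defer', fetchedAt: 100 },
  { name: 'c-async.js', mode: 'async', fetchedAt: 50 },
  { name: 'd-async.js', mode: 'async', fetchedAt: 900 },
], 400);

console.log(order); // [ 'c-async.js', 'a-defer.js', 'b-defer.js', 'd-async.js' ]
```

In this model b-defer.js is downloaded first but still runs after a-defer.js, while the two async scripts run whenever their downloads happen to finish. That is the reason async suits independent scripts, whereas defer suits scripts that depend on each other or on the DOM.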

2. Element inline

Adding JavaScript with element inline entails using the event handler attribute:

<div onclick="code"></div>

3. Page inline

If you do not use the src attribute on the script tag, then you can add your JavaScript code between the opening and closing tags.

<script>
    console.log('JavaScript code goes here!');
</script>

Critical Rendering Path (CRP)

The Critical Rendering Path is, in essence, all the steps required by the browser to put pixels on the screen. So far, we’ve covered the DOM, CSSOM, and how JavaScript is loaded and executed. They are all part of the Critical Rendering Path. The image below shows the pipeline for rendering a page:

CRP visual example

As we see, the browser uses the constructed CSSOM and DOM to create a render tree. The render tree only contains the nodes needed to render the page, which means we exclude script tags, meta tags, elements hidden with display: none, etc. Once a render tree is created, the layout is computed, followed by the actual paint. Our goal should be to make this process as smooth and quick as possible. This is why we need to be aware of render-blocking resources.
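
The way the render tree is derived from the DOM and the CSSOM can be sketched roughly as follows. The dom object is a plain-object stand-in whose display field plays the role of each node's computed style, and buildRenderTree() is a hypothetical function; real browsers do considerably more work here.

```javascript
// Plain-object stand-ins; `display` stands for the computed style of each node.
// head, meta and script get display: none from the browser's default stylesheet.
const dom = {
  nodeName: 'HTML', display: 'block', childNodes: [
    { nodeName: 'HEAD', display: 'none', childNodes: [
      { nodeName: 'META', display: 'none', childNodes: [] },
      { nodeName: 'SCRIPT', display: 'none', childNodes: [] },
    ] },
    { nodeName: 'BODY', display: 'block', childNodes: [
      { nodeName: 'H1', display: 'block', childNodes: [] },
      { nodeName: 'P', display: 'none', childNodes: [] }, // hidden by the page's own CSS
    ] },
  ],
};

// Keep a node only if it is displayed; a display: none node drops its whole subtree.
function buildRenderTree(node) {
  if (node.display === 'none') return null;
  return {
    nodeName: node.nodeName,
    children: node.childNodes.map(buildRenderTree).filter(Boolean),
  };
}

const names = (node) => [node.nodeName, ...node.children.flatMap(names)];
console.log(names(buildRenderTree(dom))); // [ 'HTML', 'BODY', 'H1' ]
```

The sketch needs both inputs to produce its answer: the structure comes from the DOM and the display values come from the CSSOM, which is why the browser cannot build the render tree until the CSS has been parsed.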

In the code below, we are using an external stylesheet and an external script:

<!DOCTYPE html>
<html>

<head>
  <link href="https://stackpath.bootstrapcdn.com/bootstrap/4.4.1/css/bootstrap.min.css" rel="stylesheet">
  <script src="https://ajax.googleapis.com/ajax/libs/jquery/3.4.1/jquery.min.js"></script>
</head>

<body>
  <h1>Hello</h1>
  <p>World</p>
</body>

</html>

I audited the code with Lighthouse multiple times, in three different cases, to measure how long the first paint takes. The results were:

By changing nothing, it gave me 2.1s.
By having defer on the script tag, it gave me 1.5s.
By having defer on the script tag and removing the stylesheet, it gave me 0.8s.

As we see, just by adding defer, we made our first paint 600ms quicker. By removing Bootstrap, we made it even quicker. It might seem obvious, but not sending unnecessary bytes can have a massive impact on your website's page load.

Closing remarks

We've learned that it matters how we structure our HTML elements. A common phrase is, "put the CSS at the top and the script at the bottom", which hopefully makes sense to you now. After looking at how the browser renders our HTML, CSS, and JavaScript, we can put them into three main layers:

Structure (HTML)
Presentation (CSS)
Behaviour (JavaScript + DOM + CSSOM)

In this article, we've looked at how the CRP correlates to a quick first paint (FP). But CRP also correlates to achieving high FPS, which might seem strange because we are creating websites, not "real games". But having low FPS causes page jank, which leads to bad user experience. Nevertheless, now you should have a good foundation on browser rendering and how to work with the DOM.