Browser Rendering: JS + DOM + CSSOM
by Nicklas Envall
There's a thing called the Critical Rendering Path (CRP), which is the pipeline for rendering a page. In other words, it's the steps the browser takes to turn your HTML, CSS, and JavaScript code into pixels on the screen. Now, CSS and HTML are so-called render-blocking resources, meaning they need to be fetched and parsed before the browser can render the page. The parsing of HTML and CSS creates two trees, the DOM and the CSSOM. Furthermore, during this process, JavaScript can be both loaded and executed. The time spent fetching, parsing, and executing these resources before anything can be painted is known as blocking, and it slows down the page load.
We will cover what these parts are all about and then, in the end, we'll look at how knowing the CRP can help us build high-performance web apps. However, this article does not focus solely on the CRP; it takes a closer look at some of its parts. So fear not if you have no idea what the DOM or the CSSOM is, we'll cover that.
Table of Contents:
- Document Object Model (DOM)
- Cascading Style Sheets Object Model (CSSOM)
- Loading and Executing JavaScript
- Critical Rendering Path (CRP)
Document Object Model (DOM)
The Document Object Model (DOM) is the result of parsing HTML code. When we write HTML documents we nest HTML elements within other elements, which creates a hierarchy. The browser parses that hierarchy (the document’s structure) into a tree of nodes. We may refer to the result as a syntax tree or parse tree. To be clear, the HTML code is parsed into a tree of node objects that represents the document, and the DOM itself is the interface that scripts use to work with that tree. It's the browser that creates the nodes during the initial loading of the HTML document.
We also have the Browser Object Model (BOM), which allows our JavaScript code to interact with the browser. The BOM is a collection of browser objects. The top-level object in the BOM is the window object, which represents an open browser window. Examples of browser objects are navigator, screen, history and also the DOM (document); they are all accessible via the window object. Unlike the DOM, the BOM does not have a standard for implementation and is not strictly defined, which means that the BOM varies depending on which browser you use. Furthermore, the window object is where globals end up. But now we'll focus on the DOM, starting with what a node is.
What is a Node?
All node objects inherit from the Node interface. This interface has the essential properties and methods for manipulating, inspecting, and traversing the DOM. There are different kinds of nodes, for example Document, Element, CharacterData, and DocumentFragment. They all inherit from Node, but each kind also has its own properties and methods. You can use nodeType and nodeName to find out what kind of node you are dealing with, and therefore which properties are available on it:
// <p>hello</p>
const p = document.getElementsByTagName('p')[0];
p.nodeType; // 1 === Node.ELEMENT_NODE
p.nodeName; // "P"
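To make the numeric nodeType values a little easier to read, here is a minimal sketch that maps them to their constant names. The helper name describeNode is made up for illustration and is not part of the DOM, and the plain objects at the bottom only stand in for real nodes so the snippet also runs outside a browser.

```javascript
// A few of the numeric nodeType values defined by the DOM standard.
const NODE_TYPE_NAMES = {
  1: 'ELEMENT_NODE',
  3: 'TEXT_NODE',
  8: 'COMMENT_NODE',
  9: 'DOCUMENT_NODE',
  10: 'DOCUMENT_TYPE_NODE',
  11: 'DOCUMENT_FRAGMENT_NODE',
};

// describeNode is a hypothetical helper, not a DOM API.
// It only relies on the nodeType and nodeName properties every node has.
function describeNode(node) {
  const typeName = NODE_TYPE_NAMES[node.nodeType] || 'UNKNOWN';
  return `${node.nodeName} (${typeName})`;
}

// Plain objects standing in for a <p> element and its text node.
console.log(describeNode({ nodeType: 1, nodeName: 'P' }));     // P (ELEMENT_NODE)
console.log(describeNode({ nodeType: 3, nodeName: '#text' })); // #text (TEXT_NODE)
```

In a browser you could pass real nodes instead, for example describeNode(document) or describeNode(document.body).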
1. Document Node
The root of the DOM is itself a node object, a Document node. The Document node typically holds two different types of nodes. The first is the DocumentType node object, which represents a DocType (<!DOCTYPE>). The DocType declares the markup language, and which version of it, the document uses. The second is an Element node object, the root <html> element.
<!DOCTYPE html>
<html></html>
We can get references to these nodes easily via document, like document.doctype, document.documentElement, document.body and so on. Lastly, even though the browser parses our HTML on the initial load, we can still create nodes afterwards with document.createElement(tag: string).
2. Element Node
There are many different types of Element nodes like html, body, span, h1, and the list goes on. These elements can have attributes. For example, <div id="1" class="my-class"></div> has two attributes, id and class. The attributes themselves are Attr nodes, and we can access them through the .attributes property. If you want to get, set, remove, or check for an attribute you can use the following:
Element.getAttribute(attrName: string)
Element.setAttribute(attrName: string, attrValue: string)
Element.removeAttribute(attrName: string)
Element.hasAttribute(attrName: string)
- Or just use the .attributes property directly.
Furthermore, you've probably used more than one class on an element at some point, like <span class="class1 class2"></span>. When working with the class attribute you can use Element.classList to get the classes as a DOMTokenList, an array-like object such as { length: 2, value: "class1 class2", 0: "class1", 1: "class2" }, or simply read Element.className to get "class1 class2". The classList object also has the following methods for working with classes:
Element.classList.add(className: string)
Element.classList.remove(className: string)
Element.classList.toggle(className: string)
Element.classList.contains(className: string)
3. Text Node
The text scattered around in your HTML code will be parsed into Text node objects. This also applies to whitespace, since whitespace consists of characters too. So <div> </div> would, for example, have a child node that's a text node. Furthermore, when you read Element.textContent it'll concatenate all the text nodes within that element (including the children's text nodes, etc.) and return the result as a string. You can also assign to Element.textContent, which removes all of the element's child nodes and replaces them with a single new text node.
On a side note, innerText is similar to textContent, but it ignores text that is hidden by CSS or sits inside script or style tags, while textContent does not.
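The way reading textContent gathers text can be sketched as a small recursive walk. This is only an illustration of the idea, not how browsers implement it: collectText is a made-up helper, it handles only element and text nodes, and plain objects stand in for real DOM nodes so the snippet runs outside a browser.

```javascript
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

// collectText is a hypothetical helper that imitates reading textContent
// on an element: it concatenates the data of every descendant text node.
function collectText(node) {
  if (node.nodeType === TEXT_NODE) return node.data;
  if (node.nodeType !== ELEMENT_NODE) return ''; // e.g. comments are skipped
  let text = '';
  for (const child of node.childNodes) {
    text += collectText(child);
  }
  return text;
}

// Stand-in for: <div>Hello <b>big</b> world</div>
const div = {
  nodeType: ELEMENT_NODE,
  childNodes: [
    { nodeType: TEXT_NODE, data: 'Hello ' },
    { nodeType: ELEMENT_NODE, childNodes: [{ nodeType: TEXT_NODE, data: 'big' }] },
    { nodeType: TEXT_NODE, data: ' world' },
  ],
};

console.log(collectText(div)); // Hello big world
```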
How to create, insert, replace, remove and clone Nodes
The properties/methods innerHTML, outerHTML, textContent, and insertAdjacentHTML() allow us to use strings when adding content to the DOM. However, a word of caution: all of these except textContent invoke an expensive HTML parsing process. The following code would remove all the content in your body tag and then add some content:
document.body.innerHTML = '';
document.body.innerHTML += 'add1'; // add1
document.body.innerHTML += 'add2'; // add1add2
We also have the appendChild() and insertBefore() methods that we can use to add a node object as a child of another node object. In the example below, we create an h1 element, add text to it, and then append it to the body tag:
let h1 = document.createElement('h1');
h1.innerText = 'hi all';
document.body.appendChild(h1);
Then we have the methods removeChild() and replaceChild(), which both do just what their names imply. Both return a reference to the node that was removed or replaced, so the node is removed from its parent but not from memory. These methods can be somewhat tricky to invoke since we pass the reference of the node object to the method, but the method is invoked on the parent, like parentOfNodeObj.removeChild(referenceToNodeObj).
…
<body>
  <h1>hello</h1>
  <p>world</p>
  <script>
    // firstElementChild skips the whitespace text node that precedes <h1>
    let h1 = document.body.firstElementChild;
    h1.parentNode.removeChild(h1); // same as document.body.removeChild(h1)
  </script>
...
To clone a node you simply do let newNode = oldNode.cloneNode(). If you also want to clone its descendants you must pass true, like let newNode = oldNode.cloneNode(true). A word of caution: since the attributes and their values are copied, you might end up with duplicate element IDs in the document.
Selecting and Traversing Element Nodes
When we want to get a single element we can use getElementById() or querySelector(). The querySelector() method has the following characteristics:
- Returns the first matching element, or null if nothing matches.
- You pass it a CSS selector.
- Can be used like Element.querySelector(), which means it only searches that particular part of the DOM tree.
As you see, both return a single element. So, how do we select multiple nodes? Well, the following methods can help us out:
querySelectorAll(CSSselector: string)
getElementsByTagName(tag: string)
getElementsByClassName(className: string)
These methods return a list containing the matching elements. But be aware that they can cause unexpected behaviour: querySelectorAll() returns a snapshot of the current state, while the other two return live lists that always represent the current state of the DOM. Study the following code:
// Creating and setting up our Element node
let div = document.createElement('div');
div.setAttribute('class', 'classValue');
div.appendChild(document.createTextNode('hello'));

// Adding the div to the body
document.body.appendChild(div);

// Getting the div in two lists
let queryList = document.querySelectorAll('.classValue');
let classList = document.getElementsByClassName('classValue');

// We get a list back containing our div
console.log(queryList); // NodeList [div.classValue]
console.log(classList); // HTMLCollection [div.classValue]

// Remove the div from the DOM
document.body.removeChild(div);

// As we see, queryList contains a snapshot while classList reflects the current state
console.log(queryList); // NodeList [div.classValue]
console.log(classList); // HTMLCollection []
NodeList vs HTMLCollection
When trying to get multiple elements, you'll likely encounter NodeList and HTMLCollection. Both are read-only, array-like collections of DOM nodes. They are similar but differ slightly. For example, an HTMLCollection only contains elements, while a NodeList may contain other kinds of nodes as well (the NodeList from childNodes includes text and comment nodes). It's important to know that a NodeList can be either live or static. If it's live it will update according to the DOM's state; if it's static it'll be a snapshot of the DOM's state at the time it was created.
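If a live collection gets in your way, one common approach is to copy it into a regular array before changing the DOM. The sketch below shows the idea with Array.from(). The helper name toStaticArray is made up, and fakeLive is only a stand-in that behaves like a live collection (its length and items are read from backing on every access) so that the snippet runs outside a browser.

```javascript
// toStaticArray is a hypothetical helper: it copies any array-like
// collection into a plain array that no longer follows the source.
function toStaticArray(collection) {
  return Array.from(collection);
}

// Stand-in for a live collection such as the one returned by
// getElementsByClassName(): every read goes back to the backing store.
const backing = ['div.classValue'];
const fakeLive = {
  get length() { return backing.length; },
  get 0() { return backing[0]; },
};

const snapshot = toStaticArray(fakeLive);

// Simulate removing the element from the DOM.
backing.pop();

console.log(fakeLive.length); // 0, the live view follows the change
console.log(snapshot.length); // 1, the copy still holds the old state
```

In a browser you would pass a real collection instead, for example toStaticArray(document.getElementsByClassName('classValue')).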
Traversing the DOM with Node properties
Properties like childNodes, firstChild, and nextSibling give us a way to traverse (travel around in) the DOM tree. A word of caution: traversing with these properties will include text and comment nodes (non-element nodes), which can cause unexpected behaviour. You can avoid traversing text and comment nodes by using firstElementChild, lastElementChild, nextElementSibling, previousElementSibling, childElementCount, children, and parentElement.
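To illustrate why the non-element nodes matter, here is a small sketch of a depth-first traversal over childNodes that skips everything except elements. The generator name walkElements is made up for this example, and the tree is built from plain objects (with only the nodeType, nodeName, and childNodes properties) so it can run outside a browser.

```javascript
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;
const COMMENT_NODE = 8;

// walkElements is a hypothetical helper: it yields every descendant
// element of root in document order, ignoring text and comment nodes.
function* walkElements(root) {
  for (const child of root.childNodes || []) {
    if (child.nodeType !== ELEMENT_NODE) continue;
    yield child;
    yield* walkElements(child);
  }
}

// Stand-in for:
// <body> <h1>hello</h1><!-- note --><p><span></span></p></body>
const body = {
  nodeType: ELEMENT_NODE,
  nodeName: 'BODY',
  childNodes: [
    { nodeType: TEXT_NODE, nodeName: '#text' },
    { nodeType: ELEMENT_NODE, nodeName: 'H1', childNodes: [{ nodeType: TEXT_NODE, nodeName: '#text' }] },
    { nodeType: COMMENT_NODE, nodeName: '#comment' },
    {
      nodeType: ELEMENT_NODE,
      nodeName: 'P',
      childNodes: [{ nodeType: ELEMENT_NODE, nodeName: 'SPAN', childNodes: [] }],
    },
  ],
};

const names = [...walkElements(body)].map((el) => el.nodeName);
console.log(names); // [ 'H1', 'P', 'SPAN' ]
```

With real DOM nodes the same result is usually easier to get through children and the other element-only properties listed above.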
Cascading Style Sheets Object Model (CSSOM)
CSS stands for Cascading Style Sheets and is a language that describes the visual presentation of our HTML elements. Our CSS gets parsed into a tree called the CSSOM.
Now, the process of both fetching and parsing the CSS is render-blocking because the CSSOM is needed to create a render tree (more on this later). We can add CSS to a page in three different ways: inline, internal/embedded, or external.
1. Inline
Inline styling entails passing a string containing the CSS to the style attribute on HTML elements:
<div style="color: green;"></div>
2. Internal/embedded
Internal stylesheets are embedded in the <head> tag with <style> (HTMLStyleElement):
<html>
  <head>
    <style>
      … style here …
    </style>
  </head>
</html>
3. External
External stylesheets are files containing CSS that usually have a .css extension. External files have to be downloaded, which adds to the time rendering is blocked. To include external files in our document, we use the HTMLLinkElement like:
<html>
  <head>
    <link href="/style.css" rel="stylesheet" type="text/css">
  </head>
</html>
Cascading, Specificity, Inheritance
One of the reasons why we have to create the entire CSSOM before rendering, other than avoiding a flash of unstyled content (FOUC), is the complexity that the browser must handle due to cascading, specificity and inheritance. Cascading is an algorithm that combines stylesheets into one style. The algorithm uses the priority of the stylesheets to know which rules apply if a conflict occurs. Note that there are more stylesheets than just the ones you define; they can come from three origins:
- user-agent (browser's default style, think headings)
- author (you the developer)
- user (the user can override and customize)
Furthermore, we have inheritance, which means that children can inherit styles from their parent, so style flows down the tree. Lastly, specificity is all about determining which style should be applied based on how specific the rule's selector is. Our CSS selectors are prioritized in the following manner: ID > CLASS > TAG. We can annotate this as three counters, 0:0:0 (IDs:classes:tags), as in the following example:
<!DOCTYPE html>
<html>
  <head>
    <style>
      h1 { font-size: 10px; } /* 0:0:1 */
      .className { font-size: 20px; } /* 0:1:0 */
      .className.className2 { font-size: 25px; } /* 0:2:0 */
      #myId { font-size: 30px; } /* 1:0:0 */
    </style>
  </head>
  <body>
    <h1 id="myId" class="className className2">hello</h1>
  </body>
</html>
In the example above, our h1 element will get a font size of 30px. Lastly, it's also good to know that inline styles override all stylesheets, both external and internal (unless a stylesheet rule uses !important). However, it's often recommended not to use inline styles, IDs, or !important, for maintainability purposes.
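The ID > CLASS > TAG comparison can be sketched in a few lines of JavaScript. This is a deliberately simplified model and not the browser's algorithm: the function names specificity and compareSpecificity are made up, only ID, class, and tag selectors are counted (no attribute selectors, pseudo-classes, or pseudo-elements), and every rule in the list is assumed to match the element. The selectors are similar to those in the example above.

```javascript
// Returns [ids, classes, tags] for a simple selector string.
function specificity(selector) {
  const ids = (selector.match(/#[\w-]+/g) || []).length;
  const classes = (selector.match(/\.[\w-]+/g) || []).length;
  // A tag name starts the selector or follows whitespace or a combinator.
  const tags = (selector.match(/(^|[\s>+~])[a-zA-Z][\w-]*/g) || []).length;
  return [ids, classes, tags];
}

// Positive when a is more specific than b, negative when less, 0 on a tie.
function compareSpecificity(a, b) {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

const rules = [
  { selector: 'h1', fontSize: '10px' },                    // 0:0:1
  { selector: '.className', fontSize: '20px' },            // 0:1:0
  { selector: '.className.className2', fontSize: '25px' }, // 0:2:0
  { selector: '#myId', fontSize: '30px' },                 // 1:0:0
];

// On a tie the later rule wins, which mirrors source order in a stylesheet.
const winner = rules.reduce((best, rule) =>
  compareSpecificity(specificity(rule.selector), specificity(best.selector)) >= 0
    ? rule
    : best
);

console.log(winner.selector, winner.fontSize); // #myId 30px
```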
Interacting with CSSOM
The CSSOM also gives us APIs to find out things like the size and position of our elements. It's possible to interact with the CSSOM via JavaScript by accessing the style property on HTML elements. But the style property only contains the inline CSS that's defined via the element's style attribute. Luckily, you can use window.getComputedStyle(el: element) to get a read-only CSSStyleDeclaration object that contains the element's final values, from both the inline CSS and the cascade. The stylesheets themselves are available through document.styleSheets as CSSStyleSheet objects. A CSSStyleSheet object contains CSSStyleRule objects that you can manipulate with CSSStyleSheet.insertRule() and CSSStyleSheet.deleteRule(), but that's very uncommon. Lastly, since each stylesheet corresponds to a CSSStyleSheet object, you can disable and enable them by toggling their disabled boolean property.
Examples of using the style property:
// get specific
element.style.height;
// set
element.style.height = '100px';
Loading and Executing JavaScript
JavaScript lets us manipulate both the DOM and CSSOM. JavaScript is parser-blocking, which means that when the parser encounters a <script> tag, it'll stop the construction of the DOM. The following will happen:
- Stop the construction of DOM.
- Fetch the JS code if external.
- Construct the CSSOM if not constructed (CSS is script blocking).
- Execute the JS code.
- Resume the construction of DOM.
As you see, JavaScript can block parsing, so we must carefully consider where we put our script tags. We can add JavaScript in three ways: external, element inline, and page inline. Now, on a side note, it might seem strange that the CSSOM must be created before JavaScript can be executed. But this is because the script might ask for style information that depends on CSS that has not been processed yet, for example via getComputedStyle(document.body).
1. External
We have to fetch external JavaScript files, which increases the blocking time. We use the src attribute on a script tag to specify where the file is located. Note that all code inside the script tag is ignored when using the src attribute.
<script src="app.js"></script>
With externals, we can use the defer and async attributes. Both tell the browser to continue constructing the DOM while the script is fetched in the background. What separates them is that a defer script executes once the document has been fully parsed (and deferred scripts run in document order), while an async script executes as soon as it has been fetched, which may interrupt parsing.
2. Element inline
Adding JavaScript with element inline entails using the event handler attribute:
<div onclick="code"></div>
3. Page inline
If you do not use the src attribute on the script tag, then you can add your JavaScript code between the opening and closing tags.
<script>
  console.log('JavaScript code goes here!');
</script>
Critical Rendering Path (CRP)
The Critical Rendering Path is, in essence, all the steps required by the browser to put pixels on the screen. So far, we’ve covered the DOM, CSSOM, and how JavaScript is loaded and executed. They are all part of the Critical Rendering Path, whose pipeline goes from the DOM and CSSOM to the render tree, then to layout, and finally to paint.
As we see, the browser uses the constructed CSSOM and DOM to create a render tree. The render tree only contains the visible nodes, which means we exclude script tags, meta tags, elements whose style makes them invisible, etc. Once a render tree is created, the layout is computed, followed by the actual paint. Our goal should be to make this process as smooth and quick as possible. This is why we need to be aware of render-blocking resources.
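The filtering step that produces the render tree can be illustrated with a toy model. The sketch below is not how a browser engine does it. It assumes a simplified node shape of { tag, style, children } that already has its final style attached, a made-up function name buildRenderTree, and a hand-picked list of tags that never produce visible output.

```javascript
// Tags that never produce visible output in this toy model.
const NON_VISUAL_TAGS = new Set(['HEAD', 'SCRIPT', 'META', 'LINK', 'STYLE', 'TITLE']);

// buildRenderTree is a hypothetical helper: it returns a copy of the tree
// that keeps only nodes which would be rendered, or null for a hidden node.
function buildRenderTree(node) {
  if (NON_VISUAL_TAGS.has(node.tag)) return null;
  if (node.style && node.style.display === 'none') return null;
  const children = (node.children || [])
    .map((child) => buildRenderTree(child))
    .filter((child) => child !== null);
  return { tag: node.tag, children };
}

// Stand-in for a small document whose styles have already been resolved.
const html = {
  tag: 'HTML',
  children: [
    { tag: 'HEAD', children: [{ tag: 'META' }] },
    {
      tag: 'BODY',
      children: [
        { tag: 'H1' },
        { tag: 'P', style: { display: 'none' } },
        { tag: 'SCRIPT' },
      ],
    },
  ],
};

console.log(JSON.stringify(buildRenderTree(html)));
// {"tag":"HTML","children":[{"tag":"BODY","children":[{"tag":"H1","children":[]}]}]}
```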
In the code below, we are using an external stylesheet and an external script:
<!DOCTYPE html>
<html>
  <head>
    <link href="https://stackpath.bootstrapcdn.com/bootstrap/4.4.1/css/bootstrap.min.css" rel="stylesheet">
    <script src="https://ajax.googleapis.com/ajax/libs/jquery/3.4.1/jquery.min.js"></script>
  </head>
  <body>
    <h1>Hello</h1>
    <p>World</p>
  </body>
</html>
I audited the code with Lighthouse multiple times, in three different cases, to measure how long the first paint takes. The results were:
- By changing nothing, it gave me 2.1s.
- By having defer on the script tag, it gave me 1.5s.
- By having defer on the script tag and removing the stylesheet, it gave me 0.8s.
As we see, just by adding defer, we made our first paint 600ms quicker. By also removing Bootstrap, we made it another 700ms quicker. It might seem obvious, but not sending unnecessary bytes can have a massive impact on your website's page load.
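The numbers above came from Lighthouse. If you want a rough reading from inside the page itself, one option (different from what was used for the measurements above) is the browser's Paint Timing entries. The sketch below assumes a browser that reports a first-paint entry, which not every browser does. The helper name firstPaintMs is made up, and fakePerformance with its numbers is purely illustrative so that the snippet runs outside a browser.

```javascript
// firstPaintMs is a hypothetical helper. It returns the first-paint time in
// milliseconds since navigation start, or null when no such entry exists.
function firstPaintMs(perf = globalThis.performance) {
  const entry = perf
    .getEntriesByType('paint')
    .find((e) => e.name === 'first-paint');
  return entry ? entry.startTime : null;
}

// Stand-in for window.performance with made-up values.
const fakePerformance = {
  getEntriesByType: (type) =>
    type === 'paint'
      ? [
          { name: 'first-paint', startTime: 812.4 },
          { name: 'first-contentful-paint', startTime: 830.1 },
        ]
      : [],
};

console.log(firstPaintMs(fakePerformance)); // 812.4
```

In a browser console you would call firstPaintMs() without an argument after the page has loaded.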
Closing remarks
We've learned that it matters how we structure our HTML elements. A common phrase is, "put the CSS at the top and the script at the bottom", which hopefully makes sense to you now. After looking at how the browser renders our HTML, CSS, and JavaScript, we can put them into three main layers:
- Structure (HTML)
- Presentation (CSS)
- Behaviour (JavaScript + DOM + CSSOM)
In this article, we've looked at how the CRP correlates to a quick first paint (FP). But the CRP also correlates to achieving a high frame rate (FPS), which might seem strange because we are creating websites, not "real games". But a low FPS causes page jank, which leads to a bad user experience. Nevertheless, now you should have a good foundation in browser rendering and how to work with the DOM.