Data Visualization Fundamentals
by Nicklas Envall
Data is a collection of information that can take many forms, such as text, numbers, images, or audio. Data in its original, unprocessed form is referred to as raw; once it has been analyzed and transformed, we call it processed. Data can be collected in many ways, for example through surveys or tracking software. The approach to collecting, storing, and sharing data can be ethical or unethical, and in some cases even illegal.
We often use data to make decisions and predictions. But this requires being able to interpret the data, and once it has been analyzed we may want to communicate it clearly to others. This can be done by visualizing the data to make it more accessible and digestible.
In this article we'll go through the basics of data visualization and how we as developers can approach it. The target audience is developers who use JavaScript. It covers the following:
- Graphical formats & Datasets
- Behind the libraries: SVG, Canvas, and WebGL
- Keep it simple with Libraries
Graphical formats & Datasets
Data visualization is a process of representing data in graphical formats like:
- Line graph
- Bar chart
- Pie chart
- Scatter plots
- Histograms
- Maps
They differ, but they have at least one thing in common: they all need data. So let's shift our focus to what the data for different diagrams can look like, and define our datasets.
1. Pie Chart Dataset
The dataset for our pie chart is a collection of companies and their market share in percent.
Example of how the data can look with JSON as the data format:
[
  { "companyName": "company A", "marketShare": 40 },
  { "companyName": "company B", "marketShare": 30 },
  { "companyName": "company C", "marketShare": 20 },
  { "companyName": "company D", "marketShare": 10 }
]
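Because a pie chart shows parts of a whole, it can be worth checking that the shares actually add up to 100 before drawing anything. The following is a minimal sketch of such a check; the helper names `totalShare` and `isCompletePie` are hypothetical and not part of any library.

```javascript
// The pie chart dataset from above.
const marketShares = [
  { companyName: "company A", marketShare: 40 },
  { companyName: "company B", marketShare: 30 },
  { companyName: "company C", marketShare: 20 },
  { companyName: "company D", marketShare: 10 },
];

// Hypothetical helper: sums the marketShare field of every row.
const totalShare = (rows) =>
  rows.reduce((sum, row) => sum + row.marketShare, 0);

// Hypothetical helper: true when the shares add up to 100 percent,
// allowing a small tolerance for rounding in real-world data.
const isCompletePie = (rows, tolerance = 0.001) =>
  Math.abs(totalShare(rows) - 100) <= tolerance;

console.log(totalShare(marketShares)); // 100
console.log(isCompletePie(marketShares)); // true
```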
2. Bar Chart Dataset
The dataset for our bar chart is a collection of quarterly sales figures for each restaurant this year.
Example of how the data can look with JSON as the data format:
[
  { "quarter": "Q1", "restaurant": "Luxury Food", "salesInDollar": 300000 },
  { "quarter": "Q1", "restaurant": "Sun Café", "salesInDollar": 10000 },
  { "quarter": "Q1", "restaurant": "Korvmoj", "salesInDollar": 5000 },
  { "quarter": "Q2", "restaurant": "Luxury Food", "salesInDollar": 400000 },
  { "quarter": "Q2", "restaurant": "Sun Café", "salesInDollar": 20000 },
  { "quarter": "Q2", "restaurant": "Korvmoj", "salesInDollar": 6000 },
  { "quarter": "Q3", "restaurant": "Luxury Food", "salesInDollar": 500000 },
  { "quarter": "Q3", "restaurant": "Sun Café", "salesInDollar": 30000 },
  { "quarter": "Q3", "restaurant": "Korvmoj", "salesInDollar": 6000 },
  { "quarter": "Q4", "restaurant": "Luxury Food", "salesInDollar": 700000 },
  { "quarter": "Q4", "restaurant": "Sun Café", "salesInDollar": 30000 },
  { "quarter": "Q4", "restaurant": "Korvmoj", "salesInDollar": 100000 }
]
3. Line Graph Dataset
The dataset for our line graph is a collection of how many visitors all the restaurants combined have each month.
Example of how the data can look with JSON as the data format:
[
  { "month": "January", "visitors": 10000 },
  { "month": "February", "visitors": 40000 },
  { "month": "March", "visitors": 30000 },
  { "month": "April", "visitors": 20000 },
  { "month": "May", "visitors": 60000 },
  { "month": "June", "visitors": 60000 },
  { "month": "July", "visitors": 70000 },
  { "month": "August", "visitors": 20000 },
  { "month": "September", "visitors": 40000 },
  { "month": "October", "visitors": 50000 },
  { "month": "November", "visitors": 90000 },
  { "month": "December", "visitors": 80000 }
]
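To get a feeling for what a charting library does with a dataset like this, here is a rough sketch of how values could be mapped to pixel coordinates for a line graph. The `linearScale` and `toPolylinePoints` helpers and the 550 by 200 drawing area are assumptions made for illustration; real libraries ship their own, more capable scales.

```javascript
const visitorsPerMonth = [
  { month: "January", visitors: 10000 },
  { month: "February", visitors: 40000 },
  { month: "March", visitors: 30000 },
  { month: "April", visitors: 20000 },
  { month: "May", visitors: 60000 },
  { month: "June", visitors: 60000 },
  { month: "July", visitors: 70000 },
  { month: "August", visitors: 20000 },
  { month: "September", visitors: 40000 },
  { month: "October", visitors: 50000 },
  { month: "November", visitors: 90000 },
  { month: "December", visitors: 80000 },
];

// Hypothetical helper: returns a function that maps a value from
// [domainMin, domainMax] onto [rangeMin, rangeMax].
const linearScale = (domainMin, domainMax, rangeMin, rangeMax) => (value) =>
  rangeMin +
  ((value - domainMin) / (domainMax - domainMin)) * (rangeMax - rangeMin);

// Assumed drawing area in pixels.
const width = 550;
const height = 200;

const maxVisitors = Math.max(...visitorsPerMonth.map((row) => row.visitors));
const x = linearScale(0, visitorsPerMonth.length - 1, 0, width);
// The range is flipped because the y-axis grows downward on screen.
const y = linearScale(0, maxVisitors, height, 0);

// Builds "x,y x,y ..." pairs, the format used by the points attribute
// of an SVG <polyline>.
const toPolylinePoints = (rows) =>
  rows
    .map((row, i) => `${Math.round(x(i))},${Math.round(y(row.visitors))}`)
    .join(" ");

console.log(toPolylinePoints(visitorsPerMonth));
```

The busiest month (November) ends up at the top of the drawing area, since its value equals the maximum of the domain.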
Behind the libraries: SVG, Canvas, and WebGL
Before we continue to visualize our data we will briefly look at SVG, Canvas, and WebGL to understand the technologies used by most libraries for data visualization.
SVG
SVG stands for Scalable Vector Graphics. According to the World Wide Web Consortium (W3C), the first draft of SVG was released on 1999-02-11. SVG is XML-based, unlike GIF, PNG, and JPEG, which rely on bitmaps (raster format). In other words, SVGs use mathematical equations to define shapes, while raster graphics rely on a grid of pixels to define an image. This comes with numerous benefits, such as scalability, file size, editability, interactivity, and accessibility.
To break it down: because SVGs use mathematical equations, they can scale without losing quality. This is not the case for bitmaps, which have a fixed grid of pixels: the image gets blurry when the pixels are stretched, and it loses detail when shrunk because pixels are combined. Furthermore, the higher the resolution of a raster image, the more pixels the bitmap requires, which results in more memory use. Beyond memory considerations, SVGs are also easy to edit manually because they are XML-based.
SVG elements are DOM elements, which means that they can be styled with CSS and manipulated with JavaScript. This is not the case for bitmaps (note that GIFs are "just" a sequence of images). More importantly, this allows assistive technologies to interact with the elements, which results in greater accessibility.
Here's an example of a circle in SVG:
<svg viewBox="0 0 100 100" xmlns="http://www.w3.org/2000/svg">
  <circle cx="50" cy="50" r="50" />
</svg>
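Since SVG is just XML text, it can also be generated from data with plain string building. The sketch below turns the pie chart dataset into a very small bar chart, with a `<title>` per bar so that each shape carries a text description. The `toSvgBarChart` and `escapeXml` helpers, as well as the default sizes, are hypothetical and only meant to illustrate the idea.

```javascript
const marketShares = [
  { companyName: "company A", marketShare: 40 },
  { companyName: "company B", marketShare: 30 },
  { companyName: "company C", marketShare: 20 },
  { companyName: "company D", marketShare: 10 },
];

// Hypothetical helper: escapes characters that are special in XML text.
const escapeXml = (text) =>
  text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");

// Hypothetical helper: returns SVG markup with one <rect> per row.
// Bar heights are relative to the largest value in the dataset.
const toSvgBarChart = (rows, { barWidth = 20, gap = 5, height = 100 } = {}) => {
  const maxValue = Math.max(...rows.map((row) => row.marketShare));
  const width = rows.length * (barWidth + gap) - gap;

  const bars = rows.map((row, i) => {
    const barHeight = (row.marketShare / maxValue) * height;
    const x = i * (barWidth + gap);
    const y = height - barHeight; // y grows downward in SVG
    return (
      `  <rect x="${x}" y="${y}" width="${barWidth}" height="${barHeight}">` +
      `<title>${escapeXml(row.companyName)}: ${row.marketShare}%</title></rect>`
    );
  });

  return [
    `<svg viewBox="0 0 ${width} ${height}" xmlns="http://www.w3.org/2000/svg" role="img">`,
    `  <title>Market share per company</title>`,
    ...bars,
    `</svg>`,
  ].join("\n");
};

console.log(toSvgBarChart(marketShares));
```

Because the output uses a viewBox, the same markup can be displayed at any size without becoming blurry.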
Canvas
The <canvas> element is part of the HTML5 specification. It was created by Apple in 2004 to, among other things, enable graphics rendering in their Safari browser. It was later adopted by other browsers such as Google Chrome and Firefox, and standardized by WHATWG (the Web Hypertext Application Technology Working Group). The specification concisely says:
"The canvas element provides scripts with a resolution-dependent bitmap canvas, which can be used for rendering graphs, game graphics, art, or other visual images on the fly."
In short, the <canvas> element allows us to draw on it via a JavaScript API. Contrary to SVG, which is vector-based, <canvas> is bitmap-based, which means that it's a fixed grid of pixels. As a result, while SVG elements are part of the DOM, the content drawn on the <canvas> is not. Only the <canvas> element itself is part of the DOM, because it's a bitmap rendering surface. Although you can add elements inside the <canvas> element, that does not change the fact that the drawn content is not part of the DOM. This has serious consequences for accessibility and SEO, as MDN explains:
"Canvas content is not exposed to accessibility tools as semantic HTML is. In general, you should avoid using canvas in an accessible website or app."
In summary, the benefits of <canvas> show when working with advanced 3D graphics or when performance matters, since drawing does not require re-rendering the DOM. However, this comes at a cost, mainly to accessibility and SEO.
Just to get an idea of how the code looks, here's an example in which a new circle is created every 5 seconds and the circles move up and down.
<!DOCTYPE html>
<html>
  <head>
    <meta charset="UTF-8" />
    <title>Canvas Example</title>
  </head>
  <body>
    <canvas
      height="500"
      width="500"
      style="border: solid 1px black;"
      id="circles"
    ></canvas>
    <script>
      const canvas = document.getElementById("circles");
      // Returns an object that exposes an API for drawing on the canvas.
      // contextId specifies the desired API
      const ctx = canvas.getContext("2d");
      const circles = [];

      const createCircle = () => {
        const circle = {
          x: Math.random() * canvas.width, // Random x position
          y: 0, // top
          color: "green",
          size: 10,
          speed: 10,
          direction: 1,
        };
        circles.push(circle);
      };

      const animateCircles = () => {
        ctx.clearRect(0, 0, canvas.width, canvas.height);
        circles.forEach((circle) => {
          circle.y += circle.speed * circle.direction;
          if (circle.y >= canvas.height) {
            circle.y -= circle.size;
            circle.direction = -1;
          }
          if (circle.y <= 0) {
            circle.y = circle.size;
            circle.direction = 1;
          }
          // Draw circle
          ctx.beginPath();
          ctx.arc(circle.x, circle.y, circle.size, 0, 2 * Math.PI);
          ctx.fillStyle = circle.color;
          ctx.fill();
        });
        requestAnimationFrame(animateCircles);
      };

      setInterval(() => createCircle(), 5000);
      animateCircles();
    </script>
  </body>
</html>
Note that for the <canvas> element we have a context. The context is an object that provides the API required to draw and update graphics on the <canvas>.
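One practical consequence of the drawn content not being part of the DOM is that individual shapes cannot receive events; only the <canvas> element can. If we wanted to know which circle a user clicked, we would have to work it out ourselves. Below is a minimal sketch of such a hit test, assuming circles shaped like the ones in the example above (`x`, `y`, `size`). The function names are hypothetical.

```javascript
// Hypothetical helper: true when the point lies inside (or on) the circle.
const isPointInCircle = (pointX, pointY, circle) => {
  const dx = pointX - circle.x;
  const dy = pointY - circle.y;
  return dx * dx + dy * dy <= circle.size * circle.size;
};

// Hypothetical helper: returns the circle drawn last (and therefore on top)
// at the given point, or undefined when nothing was hit.
const findCircleAt = (pointX, pointY, circles) =>
  [...circles].reverse().find((circle) => isPointInCircle(pointX, pointY, circle));

// Example data: two overlapping circles.
const exampleCircles = [
  { x: 50, y: 50, size: 10, color: "green" },
  { x: 55, y: 50, size: 10, color: "blue" },
];

console.log(findCircleAt(52, 50, exampleCircles).color); // "blue"
console.log(findCircleAt(100, 100, exampleCircles)); // undefined
```

In a browser, the point would typically come from a click event on the <canvas>, with the coordinates adjusted by the element's position (for instance via getBoundingClientRect()).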
WebGL
WebGL is a web standard for a low-level 3D graphics API based on OpenGL ES. It enables rasterization, which is the process of converting vector graphics (lines, shapes, and polygons) into a pixel-based image (bitmap) for display on the screen. By providing WebGL with code written in GLSL (OpenGL Shading Language), which is a C-like language, we can run code on the GPU. When writing GLSL, we create shaders, which come in two types: the vertex shader and the fragment shader. Together, these two shaders form a shader program. In simple terms, their jobs are:
- Vertex Shaders: handle the positioning of 3D objects.
- Fragment Shaders: compute the color of each pixel.
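To make the idea of rasterization a bit more tangible, here is a toy version written in plain JavaScript that runs on the CPU. It is not WebGL and not how a GPU is actually programmed; it only mimics the principle of turning a mathematical shape into a grid of pixels. The `isCovered` function stands in for the coverage test, and `shadeFragment` stands in for a fragment shader that picks a "color" (here a character) per pixel. All names are made up for this sketch.

```javascript
// Vector description of a shape: resolution-independent.
const circleShape = { cx: 4, cy: 4, r: 3 };

// Coverage test: is the centre of pixel (px, py) inside the circle?
const isCovered = (px, py, shape) => {
  const dx = px + 0.5 - shape.cx;
  const dy = py + 0.5 - shape.cy;
  return dx * dx + dy * dy <= shape.r * shape.r;
};

// Stand-in for a fragment shader: decides what a single pixel looks like.
const shadeFragment = (covered) => (covered ? "#" : ".");

// Produces one string per pixel row of a width x height grid.
const rasterize = (shape, width, height) =>
  Array.from({ length: height }, (_, py) =>
    Array.from({ length: width }, (_, px) =>
      shadeFragment(isCovered(px, py, shape))
    ).join("")
  );

console.log(rasterize(circleShape, 8, 8).join("\n"));
```

Running the same shape through a larger grid gives a smoother circle, which is the vector-versus-bitmap trade-off from the SVG section in miniature.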
By using the HTML5 <canvas> element, we create a bitmap that we later draw on with the help of an API. We have a context for the <canvas> element, which is an object that provides the API needed to draw and update graphics on the <canvas>. For instance, calling getContext("webgl2") returns an object that implements the WebGL2RenderingContext interface. However, in most cases, you'll use a 3D library instead of working with the WebGL API directly. Three.js is an example of a library that primarily (not exclusively) uses WebGL to draw 3D content. Three.js makes it easier to create 3D content by providing a user-friendly API that hides a lot of the complexity.
For this article, WebGL is out of scope as it's more relevant to games and advanced 3D graphics, which typically aren't part of web developers' day-to-day work. Instead, you're more likely interested in visualizations such as line graphs and pie charts. That said, libraries like Plotly.js actually use both SVG and WebGL, utilizing WebGL for faster rendering and advanced 3D graphics.
Keep it simple with Libraries
Libraries can save us time and effort, and that is especially true when it comes to data visualization. In this section, we'll explore three different libraries (D3.js, Chart.js, and Recharts) to get an idea of how to work with libraries that assist us with data visualization.
Let's start by creating an app where we can experiment with these libraries. We'll use create-vite which offers a wide array of templates that let us quickly start a project:
$ npm create vite@latest
Ok to proceed? (y) y
✔ Project name: … vite-project
✔ Select a framework: › React
✔ Select a variant: › JavaScript + SWC
$ cd vite-project
$ npm install
$ npm run dev
D3.js
D3.js (Data-Driven Documents) is a library used for creating data visualizations. It offers fine-grained control which enables more customization, unlike other libraries that provide pre-built charts. Instead of giving you standard charts with a limited API, D3.js gives you a "D3 toolbox" that allows you to glue everything together yourself from scratch. This gives you more freedom and control over both the appearance and behaviour of your visualizations.
I also appreciate that D3.js is divided into multiple libraries/modules, allowing you to only download what you actually need and use.
Now, let's create a pie chart. Start by installing d3-shape, which provides access to pie:
npm install d3-shape
Then, we create an arc generator, which generates path data for an arc. To avoid having to calculate the startAngle and endAngle ourselves, we'll use a pie generator:
import { pie, arc } from "d3-shape";

const data = [
  { companyName: "company A", marketShare: 40, fill: "red" },
  { companyName: "company B", marketShare: 30, fill: "green" },
  { companyName: "company C", marketShare: 20, fill: "blue" },
  { companyName: "company D", marketShare: 10, fill: "purple" },
];

// The accessor receives one row of data (named d so it doesn't shadow the arc import)
const pieGenerator = pie().value((d) => d.marketShare);
const arcGenerator = arc().innerRadius(0).outerRadius(100).padAngle(0.02);

const generatePaths = (data) => {
  const arcs = pieGenerator(data); // The resulting arcs is an array of objects
  return arcs.map((x, i) => ({
    d: arcGenerator(x),
    fill: data[i].fill,
  }));
};

function App() {
  const paths = generatePaths(data);
  return (
    <svg width="250" height="250">
      <g transform="translate(125, 125)">
        {paths.map((path, i) => (
          <path key={i} d={path.d} fill={path.fill} />
        ))}
      </g>
    </svg>
  );
}

export default App;
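To see what the pie generator spares us from, here is a rough hand-rolled sketch of the angle calculation. It is not D3's implementation: it assumes no sorting and no padding between slices, and the name `computePieAngles` is made up. Each slice gets a share of the full circle (2π radians) proportional to its value.

```javascript
const shares = [
  { companyName: "company A", marketShare: 40 },
  { companyName: "company B", marketShare: 30 },
  { companyName: "company C", marketShare: 20 },
  { companyName: "company D", marketShare: 10 },
];

// Hypothetical helper: returns one slice per row, in input order,
// with startAngle and endAngle expressed in radians.
const computePieAngles = (rows, getValue) => {
  const total = rows.reduce((sum, row) => sum + getValue(row), 0);
  let startAngle = 0;
  return rows.map((row) => {
    const value = getValue(row);
    const endAngle = startAngle + (value / total) * 2 * Math.PI;
    const slice = { data: row, value, startAngle, endAngle };
    startAngle = endAngle; // the next slice starts where this one ends
    return slice;
  });
};

const slices = computePieAngles(shares, (row) => row.marketShare);
slices.forEach((slice) =>
  console.log(
    slice.data.companyName,
    slice.startAngle.toFixed(2),
    slice.endAngle.toFixed(2)
  )
);
```

The first slice covers 40% of the circle, so it ends at 0.8π radians, and the last slice ends at a full turn of 2π.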
As mentioned before, D3.js offers a lot of customizability. For example, we can add effects to individual arcs:
import { useState } from "react";
import { pie, arc } from "d3-shape";

const data = [
  { companyName: "company A", marketShare: 40, fill: "red" },
  { companyName: "company B", marketShare: 30, fill: "green" },
  { companyName: "company C", marketShare: 20, fill: "blue" },
  { companyName: "company D", marketShare: 10, fill: "purple" },
];

// The accessor receives one row of data (named d so it doesn't shadow the arc import)
const pieGenerator = pie().value((d) => d.marketShare);
const arcGenerator = arc().innerRadius(0).outerRadius(100).padAngle(0.02);
// Larger radii and padding for the arc that is currently hovered
const selectedArcGenerator = arc()
  .innerRadius(10)
  .outerRadius(110)
  .padAngle(0.2);

const generatePaths = (data, selectedIndex) => {
  const arcs = pieGenerator(data); // The resulting arcs is an array of objects
  return arcs.map((x, i) => ({
    d: selectedIndex === i ? selectedArcGenerator(x) : arcGenerator(x),
    fill: data[i].fill,
  }));
};

function App() {
  const [mouseOverIndex, setMouseOverIndex] = useState(null);
  const paths = generatePaths(data, mouseOverIndex);
  return (
    <svg width="250" height="250">
      <g transform="translate(125, 125)">
        {paths.map((path, i) => (
          <path
            key={i}
            d={path.d}
            fill={path.fill}
            onMouseOver={() => setMouseOverIndex(i)}
            onMouseLeave={() => setMouseOverIndex(null)}
          />
        ))}
      </g>
    </svg>
  );
}

export default App;
D3.js provides greater flexibility but also introduces complexity. Consider whether creating visualizations from scratch is where your time should be spent. Additionally, ask yourself whether it's easier for your team to maintain a codebase that uses a library with pre-built charts or one that enables custom-built charts. It truly depends on your needs. In most cases, I'd say that the former is the better alternative, although if the project requires a high level of customizability, then D3.js might be the right choice.
Chart.js
Chart.js is a library for simple HTML charts. It renders graphics on a <canvas> element, unlike most other libraries, which are SVG-based. This tends to give better performance with larger datasets and complex visualizations.
Start by installing chart.js, followed by react-chartjs-2 for React components (the Chart.js documentation covers integration in more detail):
npm install react-chartjs-2 chart.js
As we'll see in the code below, we will modify the original data to better suit the API's requirements. However, I recommend considering whether the mapping logic can be moved to the backend when you are working with your own data. In this case, I decided to do a simple .reduce:
import { Bar } from "react-chartjs-2";
import {
  Chart as ChartJS,
  CategoryScale,
  LinearScale,
  BarElement,
  Title,
  Tooltip,
  Legend,
} from "chart.js";

// Registering the chart elements for Chart.js
ChartJS.register(
  CategoryScale,
  LinearScale,
  BarElement,
  Title,
  Tooltip,
  Legend
);

const dataJson = [
  { quarter: "Q1", restaurant: "Luxury Food", salesInDollar: 300000 },
  { quarter: "Q1", restaurant: "Sun Café", salesInDollar: 10000 },
  { quarter: "Q1", restaurant: "Korvmoj", salesInDollar: 5000 },
  { quarter: "Q2", restaurant: "Luxury Food", salesInDollar: 400000 },
  { quarter: "Q2", restaurant: "Sun Café", salesInDollar: 20000 },
  { quarter: "Q2", restaurant: "Korvmoj", salesInDollar: 6000 },
  { quarter: "Q3", restaurant: "Luxury Food", salesInDollar: 500000 },
  { quarter: "Q3", restaurant: "Sun Café", salesInDollar: 30000 },
  { quarter: "Q3", restaurant: "Korvmoj", salesInDollar: 6000 },
  { quarter: "Q4", restaurant: "Luxury Food", salesInDollar: 700000 },
  { quarter: "Q4", restaurant: "Sun Café", salesInDollar: 30000 },
  { quarter: "Q4", restaurant: "Korvmoj", salesInDollar: 100000 },
];

const INDEX_TO_COLOR = new Map([
  [0, "red"],
  [1, "green"],
  [2, "blue"],
]);

const datasets = dataJson.reduce((acc, restaurantData) => {
  acc[restaurantData.restaurant] ??= [];
  acc[restaurantData.restaurant].push(restaurantData.salesInDollar);
  return acc;
}, {});

const data = {
  labels: ["Q1", "Q2", "Q3", "Q4"],
  datasets: Object.entries(datasets).map(([key, dataset], index) => ({
    label: key,
    data: dataset,
    backgroundColor: INDEX_TO_COLOR.get(index),
  })),
};

function App() {
  return (
    <div>
      <Bar data={data} />
    </div>
  );
}

export default App;
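Note that the .reduce above relies on the rows arriving in quarter order and on the labels being hard-coded. If the data comes from a source where that isn't guaranteed, the mapping could derive the labels from the data and look each value up explicitly. The following is one possible sketch; `toChartData` is a hypothetical helper, and filling missing values with null is an assumption about how gaps should be represented.

```javascript
// Rows deliberately out of order, with one missing entry (Sun Café, Q2).
const salesRows = [
  { quarter: "Q2", restaurant: "Luxury Food", salesInDollar: 400000 },
  { quarter: "Q1", restaurant: "Luxury Food", salesInDollar: 300000 },
  { quarter: "Q1", restaurant: "Sun Café", salesInDollar: 10000 },
];

// Hypothetical helper: builds { labels, datasets } from flat rows.
const toChartData = (rows) => {
  // Unique quarters, sorted so that "Q1" comes before "Q2", and so on.
  const labels = [...new Set(rows.map((row) => row.quarter))].sort();
  // Unique restaurants, in order of first appearance.
  const restaurants = [...new Set(rows.map((row) => row.restaurant))];
  // Lookup table keyed by "quarter|restaurant".
  const lookup = new Map(
    rows.map((row) => [`${row.quarter}|${row.restaurant}`, row.salesInDollar])
  );

  return {
    labels,
    datasets: restaurants.map((restaurant) => ({
      label: restaurant,
      // One value per label; null marks a missing value (an assumption).
      data: labels.map(
        (quarter) => lookup.get(`${quarter}|${restaurant}`) ?? null
      ),
    })),
  };
};

console.log(JSON.stringify(toChartData(salesRows), null, 2));
```

The result has the same shape as the data object in the example above, minus the colors, so each dataset lines up with the labels regardless of the order of the input rows.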
A word of caution: since Chart.js uses <canvas> instead of SVG, it most likely does not fulfill the accessibility requirements of most web apps. This matters especially if you work in a country that adheres to EU regulations, such as the European Accessibility Act (EAA).
Recharts
Recharts is a charting library built for React. It's built on top of D3.js and provides pre-built charts via components.
Start by installing recharts:
npm install recharts
The following code is a simple example of a line chart:
import {
  LineChart,
  Line,
  CartesianGrid,
  XAxis,
  YAxis,
  Tooltip,
  Legend,
} from "recharts";

const data = [
  { month: "January", visitors: 10000 },
  { month: "February", visitors: 40000 },
  { month: "March", visitors: 30000 },
  { month: "April", visitors: 20000 },
  { month: "May", visitors: 60000 },
  { month: "June", visitors: 60000 },
  { month: "July", visitors: 70000 },
  { month: "August", visitors: 20000 },
  { month: "September", visitors: 40000 },
  { month: "October", visitors: 50000 },
  { month: "November", visitors: 90000 },
  { month: "December", visitors: 80000 },
];

function App() {
  return (
    <LineChart
      width={500}
      height={250}
      data={data}
      margin={{ top: 5, right: 30, left: 20, bottom: 5 }}
    >
      <CartesianGrid strokeDasharray="3 3" />
      <XAxis dataKey="month" tickFormatter={(month) => month.slice(0, 3)} />
      <YAxis />
      <Tooltip />
      <Legend />
      <Line type="monotone" dataKey="visitors" stroke="green" />
    </LineChart>
  );
}

export default App;
It's possible to customize the charts to some extent. For example, I noticed that displaying full month names looked strange, so I used the tickFormatter function to shorten each name. In short, Recharts can be a time-saver, but it's limiting compared to D3.js.
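Tick formatters are plain functions, so they are easy to write and test on their own. As a small sketch, here is the month formatter from the example next to a hypothetical `compactNumber` formatter that shortens values such as 90000 to "90k". The latter is not part of Recharts; it is merely the kind of function that could be passed to the tickFormatter prop of the YAxis component.

```javascript
// Same logic as the tickFormatter used for the x-axis above.
const shortMonth = (month) => month.slice(0, 3);

// Hypothetical formatter: shortens thousands, leaves smaller values as-is.
const compactNumber = (value) =>
  value >= 1000 ? `${value / 1000}k` : String(value);

console.log(shortMonth("September")); // "Sep"
console.log(compactNumber(90000)); // "90k"
console.log(compactNumber(500)); // "500"
```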