Visual Regression Testing
What is it?
Visual regression testing is a form of automated testing that checks whether your latest code changes caused any unintended visual changes.
A perfect example of this comes from my day job. I am building a component library, which means we build base components and then build bigger and bigger components out of them, much the same as you would with Lego. To make our lives easier, we use Material UI for React (MUI) as a base. We do this because we don’t wish to reinvent the wheel, but we do need the components to look like our design system. We don’t need any very specific functionality from a button; we just need the button to look like our design system. As a result, we style MUI by overriding its default styles with ours. This is done by editing a theme file, which contains all the base components from MUI and gives you access to style each one as you need to.
The problem with this appears when the same element is used in two different form controls. Take, for example, a radio button with a heading and a checkbox with a heading. The headings in our system have different left margins, depending on how we use them throughout our applications. This means I have to override the two separately so that changing one does not affect the other. This is done, more or less, with CSS specificity.
This is a perfect example of how you override the styles in MUI. Take line 12 for example. This is how you alter the `FormHelperText`’s margin when it’s inside the `FormGroup` component. If I override the `marginLeft` property on `FormHelperText` somewhere else in the theme, it will alter all of them. This makes sense when you think about it; it is just CSS after all. However, newly onboarded developers might not know this, or might not understand how the component library works.
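To make the difference between a global and a scoped override concrete, here is a minimal sketch. It is not our real theme file: the margin values are made up, and the local `ComponentOverrides` type is a hypothetical stand-in for the `components` section of an MUI theme so that the snippet runs without MUI installed. The `MuiFormHelperText` and `MuiFormGroup` keys and the `.MuiFormHelperText-root` class follow MUI's naming.

```typescript
// Hypothetical, trimmed-down stand-in for the `components` section of an MUI theme.
type StyleRule = { [property: string]: string | number | StyleRule };
type ComponentOverrides = {
  [componentName: string]: { styleOverrides: { root: StyleRule } };
};

const components: ComponentOverrides = {
  // Global override: hits every FormHelperText in the system.
  MuiFormHelperText: {
    styleOverrides: {
      root: { marginLeft: 0 }, // illustrative value
    },
  },
  // Scoped override: only a FormHelperText rendered inside a FormGroup.
  // The nested selector is what provides the extra CSS specificity.
  MuiFormGroup: {
    styleOverrides: {
      root: {
        "& .MuiFormHelperText-root": { marginLeft: 16 }, // illustrative value
      },
    },
  },
};

// In a real project this object would be handed to MUI, roughly:
// createTheme({ components })
```

Changing the first rule touches every helper text in every application that uses the library, while the second only applies inside a `FormGroup`. It is very easy to edit the wrong one.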
In React, if I edit the CSS on one page, the likelihood that it affects a completely different page is small because of the component-based system. In MUI, and in component systems in general, you are altering CSS that is reused all around the system. Changing the value of `marginLeft` can have far-reaching effects. It’s exactly the same if you have a component in your application that is used extensively: in one part of the application, you might override some styles to align it a little differently than in another. This is where you get your visual regressions.
This is where visual regression testing comes into play. There are a number of ways you can do this. I will focus on the simplest and, in my mind, the coolest one. A little bit of background: we use Storybook. This tool allows you to set up pages that demonstrate, in an interactive playground, how to use the components we build. The team behind Storybook also created a tool called Chromatic. With next to no effort on your part, Chromatic takes a pixel-perfect snapshot of each of your stories, much like a camera, and then compares the snapshots between Git commits.
This results in a flow like so: make changes, push up the code, create a PR, and Chromatic runs all the stories again, checking whether each one matches its previous snapshot. If they don’t match, you have to approve or reject the changes manually. This allows you to skip the manual “Yep, that looks about right” on every component and all of its variations. You let the computer do the hard work, and it tells you if anything changed. This makes it dead simple to see when you weren’t specific enough in your CSS, so it catches many regressions you would otherwise miss.
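The comparison step in that flow can be sketched in a few lines. This is only an illustration of the idea, not how Chromatic is implemented: the `Snapshot` shape (a flat RGBA byte array plus dimensions) and the `compareSnapshots` function are my own hypothetical names.

```typescript
// Assumed, simplified snapshot format: 4 bytes per pixel (R, G, B, A).
interface Snapshot {
  width: number;
  height: number;
  data: Uint8Array;
}

interface DiffResult {
  changedPixels: number;
  needsReview: boolean; // true means a human must approve or reject
}

function compareSnapshots(baseline: Snapshot, candidate: Snapshot): DiffResult {
  // A size change (e.g. a taller header pushing content down) always counts as a change.
  if (baseline.width !== candidate.width || baseline.height !== candidate.height) {
    const changedPixels = Math.max(
      baseline.width * baseline.height,
      candidate.width * candidate.height
    );
    return { changedPixels, needsReview: true };
  }

  let changedPixels = 0;
  for (let i = 0; i < baseline.data.length; i += 4) {
    // A pixel differs if any of its four channels differs.
    const differs =
      baseline.data[i] !== candidate.data[i] ||
      baseline.data[i + 1] !== candidate.data[i + 1] ||
      baseline.data[i + 2] !== candidate.data[i + 2] ||
      baseline.data[i + 3] !== candidate.data[i + 3];
    if (differs) changedPixels++;
  }
  return { changedPixels, needsReview: changedPixels > 0 };
}

// Usage: a 2x1 image where the second pixel went from white to black.
const white = [255, 255, 255, 255];
const black = [0, 0, 0, 255];
const baseline: Snapshot = { width: 2, height: 1, data: new Uint8Array([...white, ...white]) };
const candidate: Snapshot = { width: 2, height: 1, data: new Uint8Array([...white, ...black]) };
const result = compareSnapshots(baseline, candidate);
console.log(result.changedPixels, result.needsReview);
```

A strict "any pixel differs" rule like this one is the simplest choice. Real tools usually have to deal with things like anti-aliasing and animations, which I have left out here.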
Why should I care?
In most cases, you might not actually care. If you make a trading application, it’s unlikely that changing CSS in one part of the application changes it everywhere. However, you might change the spacing of a button, which changes how a layout in another part of the application looks. This goes back to the older days of the web: targeting a `p` tag is a bad idea, as this can hit all `p` tags on the site. That immediately results in a regression that you might not pick up. So this kind of tooling is not only vital for component libraries; it is extremely useful for a regular web application as well.
Gone are the days when designers come storming into the room yelling “WHO CHANGED THE ALIGNMENT OF THIS FORM?”. This gets picked up at PR time, where a designer, other developers, or you yourself can check it. Designers notice nigh on everything, but a pixel-by-pixel comparison is more thorough still. Designers will be much happier knowing that any slight deviations will be picked up early, before they make their way to production.
Making the most out of it
Making the most out of this tool is important. Much like all testing, it’s important to stay on top of it and to make it a critical part of each ticket: something easy to perform, with as little setup as possible. This is why I personally use Chromatic; I have tried other tools with little success. There are other options, such as Cypress or Playwright. These are frontend automation and e2e testing tools, but you can also use them to test components. In my setup, I was unable to get them to work with our tooling, and we relied so heavily on Storybook anyway that it made sense to use a zero-config tool like Chromatic. If you have Cypress already, you could use its ‘Component’ testing, coupled with a package that takes snapshots and compares them between commits. This gets you some of the way there, but Chromatic just takes the stories and runs a pixel-perfect match on them without any setup. You could do a similar thing with Playwright, but again I struggled to get this working, so your mileage may vary. I would like to note that Cypress and Playwright are primarily used for e2e testing; their component testing capabilities are new or only semi-supported.
Below are some examples of how you can see the differences between two instances of a component. In this example, I made a PR with some changes to the header component. Not only did it show me the differences between the header stories, but it also highlighted the differences that other stories had as a result of the header changes. This is tremendously helpful in making sure a small change doesn’t cascade into many other areas.
Each time you create a PR, you can have your stories tested, and if they change you can confirm that they were meant to change. This goes a step further if you then use it with a designer. The designer can look at the end result of your PR and approve it or make suggestions based on what they see. Each PR also gets a shareable link, so you can send that version of the Storybook to others, such as a manager or a lead, to verify the end result.
As you can see, the benefits of testing are quite substantial. In my example, all I did was change the size of a button and “accidentally” add an extra g to the word log. These are all things that could easily happen as you go about your PR. A human would struggle to notice this; in fact, if it’s in a header, it might simply shift the content of the page down a few pixels. A computer can see this without an issue, and at PR time it’s clearly highlighted to you.
When building something like a component library, this kind of investment in visual regression testing is critical. Furthermore, it can save hundreds of hours of developer time and make it easier for new developers to avoid breaking an existing design. In summary: visual regression testing gives you a safety net, much like other forms of testing do.