Errol Hassall

The cost of quality

Martin Fowler writes about software quality in his article "Is High Quality Software Worth the Cost?". From personal experience, I tend to believe his conclusion: not only is quality free, it actually pays you back. In fact, the cost of quality is negative on any project you intend to stick with for more than five minutes. I will break down, from my experience, how this comes to be.

I will break down two codebases I have worked on that highlight the difference between good and bad quality software. These two projects are night and day, and both are real examples; for obvious reasons I will call the great codebase Project A and the bad one Project B.

Project A (The great codebase)

This was not always a great codebase. At first it was rough: we attempted to follow domain driven design, which we did fairly well, but a lack of understanding of some of the technologies we were using led us into a bit of a mess. At around the 3 month mark we noticed that the codebase was getting pretty rough. We had lots of tests, which was great, but the overall quality of the solution was poor, it was hard for new people to join, and few conventions were followed consistently.

We decided that enough was enough and that we would spend 5-10% of each ticket refactoring something, anything, to make the codebase just 1% better. On top of that, we did some major work in which we restructured the directory layout to follow a much simpler pattern. These large refactors took about 2 weeks and produced maybe 30% of the improvement, which at first glance isn't huge given the time, but it was the future savings we were after. The restructure meant we now had a stable structure to follow, so every time you created anything new it already had a place without you having to think about it. We added the entire directory structure to the readme, along with an infrastructure diagram, so that anyone could easily see how the whole thing was deployed. A sketch of that kind of directory map is below.
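To make that concrete, here is a hypothetical directory map in the spirit of the one we put in the readme. The folder names are invented for illustration, not the real project's, but the idea is the same: every kind of artifact has exactly one obvious home.

```
src/
  domain/          # entities, value objects, domain services (pure logic, no I/O)
  application/     # use cases that orchestrate the domain objects
  infrastructure/  # database, queues, third-party clients
  api/             # HTTP handlers, request/response mapping
tests/
  integration/     # one test per major user flow, top down
  unit/            # mirrors src/ one-to-one
```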

Then began the incremental 1% improvements. We made it a team-wide rule that every time you pick up a ticket you must improve something, no matter how small. That could be as small as a comment or a function name, or moving some logic out of a function that had grown too big, or as big as breaking a large file into smaller, more specific ones. All of this was only possible because we had a great test structure: everything was tested from the top down, meaning each major user flow was covered from the highest point (an integration test), and we had excellent unit test coverage underneath. That made it extremely easy to refactor anything, because at a moment's notice our tests told us exactly what was broken and what wasn't. The sketch below shows the kind of tiny refactor I mean.
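As a minimal sketch of a "1% improvement", here is a hypothetical TypeScript example (the functions and names are invented for illustration, not from the real project): a too-big function gives up a chunk of logic to a well-named helper, with behaviour unchanged.

```typescript
// Before: the validation rule is buried inside a function that also does the real work.
function registerUser(email: string, password: string): void {
  if (!email.includes("@") || password.length < 8) {
    throw new Error("invalid registration details");
  }
  // ...persist the user, send the welcome email, etc.
}

// After: the rule gets a name, the big function gets smaller,
// and the validation can now be unit tested on its own.
function assertValidRegistration(email: string, password: string): void {
  if (!email.includes("@")) throw new Error("email must contain '@'");
  if (password.length < 8) throw new Error("password must be 8+ characters");
}

function registerUserRefactored(email: string, password: string): void {
  assertValidRegistration(email, password);
  // ...persist the user, send the welcome email, etc.
}
```

The change takes minutes, but compounded over every ticket it is exactly how the 1% improvements added up.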

Fast forward 6 months of 1% improvements, and to this day that codebase is the closest thing to perfection I have ever seen. None of us were top 1% developers; it came simply from the one rule: refactor something, no matter how small, on every ticket. In a regular codebase, entropy slowly decays your solution until it becomes impossible to add anything new. If instead you make a habit of fighting that entropy, you leave the project in such a state that whoever picks it up next finds it easy to add new features or maintain existing ones.

Project B (The bad codebase)

This codebase isn't the worst I've seen in terms of the quality of the files, the breakdown of functions, or the separation of concerns. For the most part it was okay: some files were very large and could have been broken down further, and others had duplicated code, but to someone who already knew the codebase it was fairly straightforward to understand what the code was doing and where it was going. If you were new to it, though, getting started could be pretty rough.

So what are my issues with this project? Well, the testing, or the lack thereof. We had zero tests. Well, that's a lie: we had 4 unit tests, which is close enough to zero in a codebase approaching 100k lines. This meant that adding anything to this giant ball of spaghetti would break about 3 other things. It was expected that when you finished one feature of, say, 10 tickets in length, you would get about 2-3 defects back, mostly around something in the ticket not being completed. That's fine, but across those 10 tickets you could also pretty much bet on breaking another 4 areas of the codebase.

It was extremely common to finish a feature, have 2 weeks go by (for manual QA to test it), and then right before release discover that some unrelated part of the site was broken, or maybe not even unrelated, it could have been something else on the same page. One classic feature broke about every week: partly because its requirements kept changing, but mostly because it was extremely complicated and there wasn't a single test around it. By the end of the project we would complete 1 feature a sprint if we were lucky and it wasn't a big feature, compared to the start, where we could finish multiple features in a sprint. It got so bad that we would end up spending 80% of a sprint on defects, of which 60% were regressions of bugs we had already fixed. It was not uncommon to fix the same bug 5 times!

The project ran 100% over its initial deadline. This was a huge problem: the client was extremely unimpressed that we kept missing deadlines, and no matter how much work we put in we could never hit one, because all we did was fix bugs in features that kept breaking. Now, it was not entirely our fault, we had to deal with some interesting business decisions that made things significantly harder, but the biggest issue was that we were never allowed to spend time fixing anything up and never allowed to write tests. As the project kept getting delayed, the development team kept getting blamed for the quality of the solution, to which all we could say was "add tests", but no matter what was said the response was always "we are already late, we don't have time". Herein lies the problem.

Testing is something you have to do from the start. At first it takes extra time: you have to set everything up and factor into each ticket the time needed to write tests. But after the first 2 months or so, as the codebase decays under more and more features, the tests pay for themselves: when adding code x breaks y, you see it instantly. And when the time comes, and it always comes, to refactor a large section or even a small one, you know immediately whether it broke something else, exactly what was broken, and by which change, which makes the fix easy and the turnaround quick.
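As a minimal sketch of what "you see it instantly" looks like, here is a hypothetical unit test using Node's built-in test runner (the pricing function and its rules are invented for illustration): if a later change breaks the discount rule, this test fails on the very next run and points straight at the culprit.

```typescript
import test from "node:test";
import assert from "node:assert/strict";

// Hypothetical business rule: orders of $100 or more get a 10% discount.
function totalWithDiscount(subtotal: number): number {
  return subtotal >= 100 ? subtotal * 0.9 : subtotal;
}

test("orders of $100 or more get a 10% discount", () => {
  assert.equal(totalWithDiscount(100), 90);
  assert.equal(totalWithDiscount(200), 180);
});

test("orders under $100 pay full price", () => {
  assert.equal(totalWithDiscount(99), 99);
});
```

The framework doesn't matter; the point is that the expected behaviour is written down somewhere a machine can check it on every change, instead of waiting 2 weeks for manual QA to stumble over it.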

On Project B there was no way we could refactor anything, because unless we manually tested every single section of the site we would almost certainly introduce defects. So no one was willing to refactor, because you would add defects and then get blamed for missing the deadline. The result is a terrible codebase that is impossible to add anything to. If the project continues at this pace for the next 6 months, the rate of new features will be 10 times lower than it is now, which is already 10 times lower than it was 6 months ago, and the defect rate will be through the roof: even breathing on the code will break something.

In retrospect

These two projects are night and day in their quality. After 12 months, Project A was dead easy to add any sort of functionality to, and even to remove things from, because we knew exactly what broke whenever we changed something, and the simple philosophy of small incremental refactors meant you could easily follow what was going on. Everything was broken into small pieces, so changing one function didn't ripple through large portions of other code.

Project B was impossible to add anything to. Something that previously worked was always breaking no matter how careful you were, and relying on manual QA meant things fell through the cracks for weeks. Case in point: one feature was broken in 2 of the 7 sections of the project, but no one picked it up for months because manual QA simply missed it; there were no tests to check that it worked and no automated QA to verify the user flows.

You can't build quality software without tests, and anyone who says the cost of quality software is not worth it is wrong. Not only is it cheaper to build with quality, it is faster as well. If you need any more convincing, read Martin Fowler's article; he says it much better than I ever could, and it was the inspiration for this one.