Proof your thousand-line pull requests result in more bugs

May 14, 2020

As developers we instinctively know that large pull requests are not great. The bigger the change, the harder it is to inspect, the longer it takes to review, and the greater the chance bugs will slip through. And let’s not forget merge conflicts, or the pain and suffering huge pull requests inflict on the reviewer.

In an ideal world we should aim to merge small changes, often.

But exactly how large is too large? Is there anything we can use as a guide to tell us when our change is getting too big? Literally, what is the maximum number of lines of code our change should be?

Let’s look at the evidence…

Turns out there have been a number of studies⁽¹⁾ looking at the effectiveness of code review, and one of the findings common across a lot of them is that after around 60 to 90 minutes of review, the rate at which defects are discovered (bugs, or aspects of the code that could be improved) starts to fall. i.e. after an hour or so we become less effective at spotting problems.

There are a number of reasons why this might be, but the most obvious is that the longer we stare at something, the more blind to it we become. After an hour of looking at the same thing our brain becomes saturated and less effective at spotting issues.

So if we can conclude that we should spend at most an hour to 90 minutes reviewing a change, the next question is: how much code can we effectively review in that time frame?

Lines of Code…

Another 2006 study⁽²⁾ carried out by Smartbear at Cisco found that there was a link between the size of the change under review and the density of defects found. Below 200 lines of change, the density of defects discovered (defects per lines of code) was highest, and generally the more lines under review, the fewer defects per line of code where discovered.

And when the defects discovered per hour was measured against the size of the change they found the rate of defect discovery starts to fall above 300 lines, with a significant tailing off above 500 LOC.

Based on this data and the finding that review efficiency falls after 90 minutes, they concluded that a reviewer will be most effective reviewing no more than 400 lines of code in one go. Above 400 lines the rate of defect detection starts to fall and the chance that bugs will slip through increases significantly.

So there you have it. If you want to minimise the amount of bugs you ship, keep your pull requests below 400 lines.

Hold on one second…

Of course, it’s not quite that simple. This number is based on data specific to the codebase and team in the study. Depending on the particular dynamics of your codebase and team, the actual number that applies to you could be quite different.

But this study and the others it drew from provide us with a good indication that keeping the size of our changes down will result in fewer bugs.

How to keep your pull requests small

Sometimes the thing we’re working on is too big or complicated to realistically ship in a single 400 line change. So what can we do in these situations to keep our PRs small?

Here are a few suggestions for ways you can split larger pieces of work into smaller pull requests:

Split refactoring work into separate PRs that lay the groundwork for any new behaviour. Any refactoring work should preserve existing behaviour, which means it can reviewed and merged without impacting the end user.
Keep unrelated changes out of your feature branches. If you spot something unrelated to the work at hand and want to fix it, either note it down to deal with later, or if it needs dealing with now, make the change in a fresh branch and get it reviewed separately.
Can you split or slice the scope of the thing you are working on in such a way as to release it in smaller incremental steps?
For big, complex features that are likely to take some time to develop, use feature switches to enable you to progressively merge and release the changes in chunks whilst still keeping it hidden from end users until it’s ready.
If all else fails, make sure to create small, atomic commits that do one thing. That way a reviewer will have the option to tackle each commit individually as a way to reduce the scope of the change they are reviewing at one time.

References

1: Object-oriented inspection in the face of delocalisation, 2000, Alastair Dunsmore, Marc Roper, Murray Ian Wood; Further investigations into the development and evaluation of reading techniques for object-oriented code inspection, 2002 Alastair Dunsmore, Marc Roper, Murray Ian Wood; A Case Study of Code Inspection, 1991, Frank W. Blakely, Mark E. Boles.

2: Unfortunately the Smartbear study findings are behind an email capture form, but it can be downloaded here.

tekin.co.uk