Jekyll2023-08-08T16:25:50+00:00https://tekin.co.uk/atom.xmltekin.co.ukThis is the personal blog of Tekin Süleyman, Manchester UK based Ruby on Rails developer available for hire.tekinHow to use introspection to discover what is exhausting your ActiveRecord connection pool2023-08-06T00:00:00+00:002023-08-06T00:00:00+00:00https://tekin.co.uk/2023/08/introspecting-active-records-connection-pool<p>This week I wrote about the reasons why you might need an <a href="/2023/07/active-record-connection-timeout-errors-with-puma">ActiveRecord connection pool larger than the number of configured Puma threads</a>.</p>
<p>Since then <a href="https://ruby.social/@bensheldon">Ben Sheldon</a> has pointed out that apps running embedded job workers (for example Sidekiq in <a href="https://github.com/sidekiq/sidekiq/wiki/Embedding">embedded mode</a>, GoodJob in <a href="https://github.com/bensheldon/good_job#execute-jobs-async--in-process">async mode</a> or <a href="https://github.com/brandonhilkert/sucker_punch">Sucker Punch</a>) will also be creating extra threads and therefore may need a larger connection pool.</p>
<p>In this follow-up post I’m going to describe the technique I used to uncover the source of the connection pool contention for the app I’m working on, and how you can do the same if you’re seeing mysterious <code class="language-plaintext highlighter-rouge">ActiveRecord::ConnectionTimeoutError</code> exceptions and don’t know where they’re coming from.</p>
<h2 id="uncovering-the-source-of-activerecord-connection-pool-contention">Uncovering the source of ActiveRecord connection pool contention</h2>
<p>These connection timeouts were a long-standing mystery in our app. Although they didn’t happen with great frequency, they were happening often enough to warrant some further digging rather than letting them become another broken window.</p>
<p>To figure out what was going on I employed some introspection on the connection pool. Whenever a thread asks for a connection from ActiveRecord, it is assigned one from the connection pool. The connection itself stores a reference to the assigned thread as its <code class="language-plaintext highlighter-rouge">owner</code>. We can inspect the assigned threads in the connection pool to learn a bit more about them:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="no">ActiveRecord</span><span class="o">::</span><span class="no">Base</span><span class="p">.</span><span class="nf">connection_pool</span><span class="p">.</span><span class="nf">connections</span><span class="p">.</span><span class="nf">map</span> <span class="k">do</span> <span class="o">|</span><span class="n">connection</span><span class="o">|</span>
<span class="n">connection</span><span class="p">.</span><span class="nf">owner</span><span class="p">.</span><span class="nf">present?</span> <span class="p">?</span> <span class="n">connection</span><span class="p">.</span><span class="nf">owner</span><span class="p">.</span><span class="nf">inspect</span> <span class="p">:</span> <span class="s2">"[UNUSED]"</span>
<span class="k">end</span><span class="p">.</span><span class="nf">join</span><span class="p">(</span><span class="s2">"</span><span class="se">\n</span><span class="s2">"</span><span class="p">)</span>
</code></pre></div></div>
<p>You can try this in the rails console, although you’ll want to fire off a quick query first, otherwise there won’t be any connections assigned!</p>
<p>On the actual Puma web server, if connections are only assigned to Puma threads the output will look something like this:</p>
<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#<Thread:0x00007f0bbab3b740@puma srv tp 003 /app/.../gems/puma-6.2.2/lib/puma/thread_pool.rb:106 sleep>
#<Thread:0x00007f0bbab3b3d0@puma srv tp 004 /app/.../gems/puma-6.2.2/lib/puma/thread_pool.rb:106 run>
#<Thread:0x00007f0bbab3b8f8@puma srv tp 002 /app/.../gems/puma-6.2.2/lib/puma/thread_pool.rb:106 run>
#<Thread:0x00007f0bbab3bbc8@puma srv tp 001 /app/.../gems/puma-6.2.2/lib/puma/thread_pool.rb:106 sleep_forever>
[UNUSED]
</code></pre></div></div>
<p>The <code class="language-plaintext highlighter-rouge">inspect</code> output tells us a bit about each thread. The key part is the path that points at the line of code where the thread was started, which in this case is inside the Puma thread pool <a href="https://github.com/puma/puma/blob/d79f59d69dd91cd1ea401ad5e9051e74b1ce0ebf/lib/puma/thread_pool.rb#L106">here</a>.</p>
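<p>If you want to see where that information comes from, here is a minimal sketch in plain Ruby (no Rails or Puma required). The thread name is made up for illustration; Puma assigns its own names to its worker threads.</p>

```ruby
# Spawn a thread and give it a name, much as a server names its workers.
worker = Thread.new { sleep }
worker.name = "demo worker 001" # hypothetical name, for illustration only

# Wait until the thread has started and parked itself in `sleep`.
sleep 0.01 until worker.status == "sleep"

# The inspect string combines the object address, the name (after the @),
# the file and line of the block the thread is running, and its status.
description = worker.inspect
puts description

# Tidy up the demo thread.
worker.kill
worker.join
```

<p>The file and line in the output point at the <code class="language-plaintext highlighter-rouge">Thread.new</code> block itself, which is why the same detail in the connection pool output leads straight to the code that spawned each thread.</p>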
<p>To narrow down the search for the mystery threads using up our connections I wired this code into our bug tracker such that we’d log this debug output whenever an <code class="language-plaintext highlighter-rouge">ActiveRecord::ConnectionTimeoutError</code> was raised:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="no">Bugsnag</span><span class="p">.</span><span class="nf">configure</span> <span class="k">do</span> <span class="o">|</span><span class="n">config</span><span class="o">|</span>
<span class="n">config</span><span class="p">.</span><span class="nf">add_on_error</span><span class="p">(</span><span class="nb">proc</span> <span class="k">do</span> <span class="o">|</span><span class="n">event</span><span class="o">|</span>
<span class="k">if</span> <span class="n">event</span><span class="p">.</span><span class="nf">errors</span><span class="p">.</span><span class="nf">first</span><span class="p">.</span><span class="nf">error_class</span> <span class="o">==</span> <span class="s2">"ActiveRecord::ConnectionTimeoutError"</span>
<span class="n">connection_pool_info</span> <span class="o">=</span> <span class="no">ActiveRecord</span><span class="o">::</span><span class="no">Base</span><span class="p">.</span><span class="nf">connection_pool</span><span class="p">.</span><span class="nf">connections</span><span class="p">.</span><span class="nf">map</span> <span class="k">do</span> <span class="o">|</span><span class="n">connection</span><span class="o">|</span>
<span class="n">connection</span><span class="p">.</span><span class="nf">owner</span><span class="p">.</span><span class="nf">present?</span> <span class="p">?</span> <span class="n">connection</span><span class="p">.</span><span class="nf">owner</span><span class="p">.</span><span class="nf">inspect</span> <span class="p">:</span> <span class="s2">"[UNUSED]"</span>
<span class="k">end</span><span class="p">.</span><span class="nf">join</span><span class="p">(</span><span class="s2">"</span><span class="se">\n</span><span class="s2">"</span><span class="p">)</span>
<span class="n">event</span><span class="p">.</span><span class="nf">add_metadata</span><span class="p">(</span><span class="ss">:app</span><span class="p">,</span> <span class="ss">:connection_pool_info</span><span class="p">,</span> <span class="n">connection_pool_info</span><span class="p">)</span>
<span class="k">end</span>
<span class="k">end</span><span class="p">)</span>
<span class="k">end</span>
</code></pre></div></div>
<p>It didn’t take long for the culprit to materialise:</p>
<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#<Thread:0x00007f32eca13ad0@puma srv tp 002 /app/.../gems/puma-6.2.2/lib/puma/thread_pool.rb:106 sleep_forever>
#<Thread:0x00007f32f23cd6e8 /app/.../gems/actionpack-7.0.5.1/lib/action_controller/metal/live.rb:341 sleep>
#<Thread:0x00007f32eca12a40@puma srv tp 003 /app/.../gems/puma-6.2.2/lib/puma/thread_pool.rb:106 sleep_forever>
#<Thread:0x00007f32f2936288 /app/.../gems/actionpack-7.0.5.1/lib/action_controller/metal/live.rb:341 sleep>
#<Thread:0x00007f32eca13e40@puma srv tp 001 /app/.../gems/puma-6.2.2/lib/puma/thread_pool.rb:106 run>
</code></pre></div></div>
<p>We can see that two of the connections are assigned to threads spawned by <code class="language-plaintext highlighter-rouge">ActionController::Metal::Live</code> <a href="https://github.com/rails/rails/blob/fabd0b5827a3af1f189d726fbc7669f9fbdeef5e/actionpack/lib/action_controller/metal/live.rb#L341">here</a>.</p>
<p>This was the missing piece of the puzzle. It didn’t take much tracing of code to discover that it was <code class="language-plaintext highlighter-rouge">ActiveStorage</code> proxy requests that spin up extra threads for streaming responses, and it was these threads that were putting the extra pressure on our connection pool and causing the sporadic timeout errors.</p>
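<p>With a pool of five connections the culprit is easy to spot by eye. For a larger pool it can help to count the entries by where each thread was started. The following is a rough sketch that works on the text of the report itself, using the output above as sample input; the regular expression is my own assumption about the shape of the <code class="language-plaintext highlighter-rouge">inspect</code> output rather than anything Ruby guarantees.</p>

```ruby
# Sample report, copied from the output shown above.
report = <<~'REPORT'
  #<Thread:0x00007f32eca13ad0@puma srv tp 002 /app/.../gems/puma-6.2.2/lib/puma/thread_pool.rb:106 sleep_forever>
  #<Thread:0x00007f32f23cd6e8 /app/.../gems/actionpack-7.0.5.1/lib/action_controller/metal/live.rb:341 sleep>
  #<Thread:0x00007f32eca12a40@puma srv tp 003 /app/.../gems/puma-6.2.2/lib/puma/thread_pool.rb:106 sleep_forever>
  #<Thread:0x00007f32f2936288 /app/.../gems/actionpack-7.0.5.1/lib/action_controller/metal/live.rb:341 sleep>
  #<Thread:0x00007f32eca13e40@puma srv tp 001 /app/.../gems/puma-6.2.2/lib/puma/thread_pool.rb:106 run>
REPORT

# Pull out the "path/to/file.rb:line" part of each entry. Entries without
# a path (for example "[UNUSED]") are kept as they are.
origins = report.lines.map do |line|
  line[/\S+\.rb:\d+/] || line.strip
end

# Count how many connections are held by threads from each origin.
counts = origins.tally
counts.each { |origin, count| puts "#{count} x #{origin}" }
```

<p>For the report above this counts three connections held by Puma’s thread pool and two held by threads started in <code class="language-plaintext highlighter-rouge">live.rb</code>.</p>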
<h2 id="introspection-is-good-actually">Introspection is good actually</h2>
<p>I wanted to write this post partly to serve as a pointer to others that might be seeing these connection timeouts, but also to make a broader point about the blurred lines between our code and the libraries our code depends on. Because when our app is running, it <em>all effectively becomes our code</em>. Getting comfortable diving into the source for gems you use will serve you well, and will sometimes be the only way to get to the origin of a strange bug, or unexpected behaviour.</p>
<p>One of my favourite ways to do this is using <a href="https://bundler.io/man/bundle-open.1.html"><code class="language-plaintext highlighter-rouge">bundle open</code></a>. It gets you to the literal gem code that your app will be using right there in your preferred editor, ready for exploring. You can even make changes to the gem code for a spot of <code class="language-plaintext highlighter-rouge">puts</code> debugging (although do so with care and always remember to undo your changes when you’re done!)</p>
<p>Aside: If you’re a VSCode user, you may want to set the <code class="language-plaintext highlighter-rouge">BUNDLER_EDITOR</code> environment variable to <code class="language-plaintext highlighter-rouge">"code -n"</code> to stop <code class="language-plaintext highlighter-rouge">bundle open</code> replacing your current project window.</p>Tekin SüleymanThis week I wrote about the reasons why you might need an ActiveRecord connection pool larger than the number of configured Puma threads.Why the advice to have a connection pool the same size as your Puma threads is (probably) wrong for you2023-07-31T00:00:00+00:002023-07-31T00:00:00+00:00https://tekin.co.uk/2023/07/active-record-connection-timeout-errors-with-puma<p>The <a href="https://devcenter.heroku.com/articles/deploying-rails-applications-with-the-puma-web-server#database-connections">standard advice</a> goes: <a href="https://devcenter.heroku.com/articles/concurrency-and-database-connections#threaded-servers">set your Rails database’s connection pool to have as many connections as you have Puma threads</a>. The idea being that you should only need as many connections as you have concurrent threads. This advice is coming from a good place, as most of the time you will be constrained on the number of connections you have available to your database.</p>
<p>The nuance missing from that advice is that it assumes no additional threads are ever spawned during your app’s operation!</p>
<p>Now although you might know your code inside and out and be 100% certain that you don’t create additional threads anywhere in your code, chances are Rails is creating additional threads without you realising it…</p>
<h2 id="an-aside-on-threads-and-activerecords-connection-pool">An aside on threads and ActiveRecord’s connection pool</h2>
<p>On a basic level, the way ActiveRecord’s connection pool works is that it assigns each thread that asks for a connection its <a href="https://github.com/rails/rails/blob/35a614c227620a62d7a2a242e375a43e7e2affc5/activerecord/lib/active_record/connection_adapters/abstract/connection_pool.rb#L178-L185">own separate connection</a>, which the thread then releases once it’s finished. Using a pool like this means that many threads can be querying the database at the same time, so many more requests can be processed in parallel.</p>
<p>If your app’s Puma config sets the maximum number of threads to 5, then you can normally expect there to be at most 5 threads asking for their own database connection, hence the advice to set the pool size to the same size as the number of threads.</p>
<p>If however one of those threads creates another thread of its own, and that thread needs a database connection to hit the database, it will need its own connection. <strong>The spawned thread does not share the connection of the thread that spawned it</strong>.</p>
<p>So if we have threads that spawn their own threads, it’s possible to end up in a situation where there are many more threads wanting a connection than there are available connections, and you could end up seeing <code class="language-plaintext highlighter-rouge">ActiveRecord::ConnectionTimeoutError</code> exceptions being raised.</p>
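<p>To make that concrete, here is a toy model in plain Ruby. <code class="language-plaintext highlighter-rouge">ToyPool</code> and <code class="language-plaintext highlighter-rouge">handle_request</code> are hypothetical names invented for this sketch; the pool is a deliberately simplified stand-in and not how ActiveRecord’s pool is actually implemented. It only illustrates the two behaviours described above: a spawned thread needs its own connection, and a checkout fails once the wait exceeds the timeout.</p>

```ruby
require "timeout"

# ToyPool is a hypothetical, much-simplified stand-in for ActiveRecord's
# connection pool. It is not the real implementation.
class ToyPool
  CheckoutTimeout = Class.new(StandardError)

  def initialize(size:, checkout_timeout:)
    @checkout_timeout = checkout_timeout
    @available = Queue.new
    size.times { |n| @available << "connection-#{n}" }
  end

  # Lend a connection to the calling thread for the duration of the block,
  # then return it to the pool.
  def with_connection
    connection = checkout
    yield connection
  ensure
    @available << connection if connection
  end

  private

  # Wait for a free connection, giving up after the configured timeout.
  def checkout
    Timeout.timeout(@checkout_timeout) { @available.pop }
  rescue Timeout::Error
    raise CheckoutTimeout, "no connection available within #{@checkout_timeout}s"
  end
end

# A request thread that holds a connection while a thread it spawned does
# work needing a connection of its own (as a streaming response might).
def handle_request(pool)
  pool.with_connection do |parent_connection|
    child = Thread.new do
      Thread.current.report_on_exception = false # keep the demo output quiet
      pool.with_connection { |child_connection| [parent_connection, child_connection] }
    end
    child.value # waits for the child and re-raises any error it hit
  end
end

# With room for both threads, each one gets a different connection.
roomy = handle_request(ToyPool.new(size: 2, checkout_timeout: 0.2))
puts "pool of 2: #{roomy.inspect}"

# With a single connection the child can never check one out, because the
# parent holds the only connection while it waits for the child.
cramped = begin
  handle_request(ToyPool.new(size: 1, checkout_timeout: 0.2))
rescue ToyPool::CheckoutTimeout => error
  error.message
end
puts "pool of 1: #{cramped}"
```

<p>In the second case the child thread cannot borrow the parent’s connection, so it waits for a free one until the timeout expires. That is the toy equivalent of the <code class="language-plaintext highlighter-rouge">ActiveRecord::ConnectionTimeoutError</code> described above.</p>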
<h2 id="where-your-rails-app-might-be-spinning-up-additional-threads-without-you-realising-it">Where your Rails app might be spinning up additional threads without you realising it</h2>
<p>So, back to your app. Even if you are not explicitly spawning your own threads, Rails itself could be spinning up additional threads without you realising. The main culprit here is <code class="language-plaintext highlighter-rouge">ActiveStorage</code>, or more specifically, <a href="https://edgeguides.rubyonrails.org/active_storage_overview.html#proxy-mode">ActiveStorage configured in proxy mode</a> (commonly used if you’re serving assets via a CDN).</p>
<h3 id="threads-created-by-activestorage">Threads created by ActiveStorage</h3>
<p>ActiveStorage’s two <a href="https://github.com/rails/rails/blob/main/activestorage/app/controllers/active_storage/blobs/proxy_controller.rb">proxy</a> <a href="https://github.com/rails/rails/blob/main/activestorage/app/controllers/active_storage/representations/proxy_controller.rb">controllers</a> return <em>streamed responses</em>, which (you guessed it) are <em>processed in their own threads!</em> So to handle a request for an <code class="language-plaintext highlighter-rouge">ActiveStorage</code> file via one of the proxy controllers, two threads will be called into action, both of which need their own connection from the ActiveRecord connection pool.</p>
<p>Normally (hopefully?) your app is processing requests fast enough for you to expect connections to be freed up from one of the other threads and become available for the streaming thread before a timeout occurs. And in this way, most of the time, the limited connections available in the pool can be shared between more threads than there are connections. The problems start if your app receives many successive/concurrent <code class="language-plaintext highlighter-rouge">ActiveStorage</code> proxy requests, each of which spins up an additional thread to stream a response, and they take a long time to complete their work and free up their connections; either because they’re doing expensive/slow work server-side (downloading a large file or processing an image representation for the first time), or they’re streaming the response to a slow client. At the time of writing at least, both threads hang on to their connection until the entire response is complete.</p>
<h3 id="threads-created-by-activerecords-load_async">Threads created by ActiveRecord’s load_async</h3>
<p>Another place Rails will spin up additional threads is with the new <a href="https://edgeapi.rubyonrails.org/classes/ActiveRecord/Relation.html#method-i-load_async">load_async</a> method. This was introduced in Rails 7 as a way to parallelise expensive database queries. It’s less likely that these are going to cause connection timeout errors as (again, hopefully?) the expensive queries aren’t so expensive that they hang onto connections for an excessive amount of time. That said, if you’re making heavy use of <code class="language-plaintext highlighter-rouge">load_async</code>, you could again end up in a situation where you have much higher demand for connections than there are available connections.</p>
<h2 id="so-what-should-you-do-about-all-this">So what should you do about all this?</h2>
<p>The upshot of all this is: if you’re making use of either <code class="language-plaintext highlighter-rouge">ActiveStorage</code> in proxy mode, or calling <code class="language-plaintext highlighter-rouge">load_async</code> in your app, you probably want a connection pool size that is higher than the number of configured Puma threads. The theoretical maximum number of connections you’ll need will be whatever Puma’s configured thread count is x 2, based on the assumption that each thread can potentially spawn one additional thread, but you might be able to get away with a smaller multiple depending on the performance and load characteristics of your app.</p>
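<p>As a back-of-the-envelope sketch of that arithmetic: the helper name below is hypothetical, and reading the thread count from <code class="language-plaintext highlighter-rouge">RAILS_MAX_THREADS</code> assumes your app follows the convention used in Rails’ generated config files. Adjust both to match your own setup.</p>

```ruby
# Hypothetical helper: the worst-case number of connections one process
# needs if every Puma thread can spawn `extra_threads_per_request` threads
# that also need a database connection.
def worst_case_pool_size(puma_max_threads, extra_threads_per_request: 1)
  puma_max_threads * (1 + extra_threads_per_request)
end

# Assumes the thread count comes from RAILS_MAX_THREADS, defaulting to 5.
puma_max_threads = Integer(ENV.fetch("RAILS_MAX_THREADS", 5))
puts worst_case_pool_size(puma_max_threads)
```

<p>Bear in mind that each process has its own pool, so the total number of connections your database sees is this figure multiplied by the number of processes you run.</p>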
<p>And if you’re close to reaching or exceeding the connection limit offered by your database, consider using a separate connection pool like <a href="http://www.pgbouncer.org">pgbouncer</a> (available as a <a href="https://github.com/heroku/heroku-buildpack-pgbouncer">Heroku buildpack</a>) to increase the number of connections you can configure.</p>
<h2 id="update">Update</h2>
<p>There’s now a <a href="/2023/08/introspecting-active-records-connection-pool">follow-up post</a> outlining one of the techniques I employed to help figure out why our connection pool was being exhausted in our Rails app.</p>Tekin SüleymanThe standard advice goes: set your Rails database’s connection pool to have as many connections as you have Puma threads. The idea being that you should only need as many connections as you have concurrent threads. This advice is coming from a good place, as most of the time you will be constrained on the number of connections you have available to your database.List your Git branches by recent activity2021-11-25T00:00:00+00:002021-11-25T00:00:00+00:00https://tekin.co.uk/2021/11/listing-most-recent-git-branches<p>Even if you’re diligent and regularly <a href="/2020/01/clean-up-your-git-branches-and-repositories">delete merged and stale
branches</a>
you may still find it hard to pick out a particular branch from the
alphabetically-sorted output of <code class="language-plaintext highlighter-rouge">git branch</code>. How about something
more useful, like seeing them listed based on their freshness?</p>
<p>The <code class="language-plaintext highlighter-rouge">git branch</code> command accepts a <code class="language-plaintext highlighter-rouge">--sort</code> option which we can use to
list our branches based on the last committer date:</p>
<pre class="terminal"><code> <strong>$ git branch --sort=-committerdate</strong>
* main
redact-abandoned
log-downloads
i18n-lint
</code></pre>
<p>We can also use the <code class="language-plaintext highlighter-rouge">--format</code> option to include the exact time and
see just how fresh each branch is:</p>
<pre class="terminal"><code> <strong>$ git branch --sort=-committerdate --format="%(committerdate)%09%(refname:short)"</strong>
Thu Nov 25 10:29:48 2021 +0000 main
Fri Nov 19 16:08:49 2021 +0000 redact-abandoned
Fri Nov 19 16:04:11 2021 +0000 log-downloads
Thu Nov 18 21:53:37 2021 +0000 i18n-lint
Sun Jul 18 12:49:18 2021 +0100 without-routing-key-overrides
</code></pre>
<p>Or for something more friendly and easy-to-parse we can ask for a relative date:</p>
<pre class="terminal"><code> <strong>$ git branch --sort=-committerdate --format="%(committerdate:relative)%09%(refname:short)"</strong>
5 hours ago main
6 days ago redact-abandoned
6 days ago log-downloads
7 days ago i18n-lint
4 months ago without-routing-key-overrides
</code></pre>
<p>That’s better! Let’s add this as a <code class="language-plaintext highlighter-rouge">git recent</code> alias to our Git config:</p>
<pre class="terminal"><code> <strong>$ git config --global alias.recent 'branch --sort=-committerdate --format="%(committerdate:relative)%09%(refname:short)"'</strong>
</code></pre>
<p>Hat tip to <a href="https://twitter.com/tenderlove/status/1392957802163802112">Tenderlove</a>
for this particularly tasty snippet.</p>
<p>Here are some other Git aliases you might find useful:</p>
<ul>
<li><a href="/2020/06/jump-from-a-git-commit-to-the-pr-in-one-command">Jumping from a commit SHA to the PR on GitHub</a></li>
<li><a href="/2020/01/git-alias-to-push-and-set-upstream-trackng-on-a-branch">Upstreaming and track your current branch</a></li>
<li><a href="/2020/01/git-alias-for-amending-your-last-commit">Amending your most recent commit</a></li>
</ul>Tekin SüleymanEven if you’re diligent and regularly delete merged and stale branches you may still find it hard to pick out a particular branch from the alphabetically-sorted output of git branch. How about something more useful, like seeing them listed based on their freshness?How focused commits make you a better coder2021-01-29T00:00:00+00:002021-01-29T00:00:00+00:00https://tekin.co.uk/2021/01/how-atomic-commits-make-you-a-better-coder<p>One of the core practices I encourage in developers when joining a new team is shaping changes into small, focused, atomic commits. For folks that are used to committing code in a haphazard, laissez faire manner this is sometimes dismissed as pedantic fussiness: if the code works, it works and that’s what’s important! Well I strongly disagree. Here I want to present a few of the ways taking the time to shape small focused commits helps you to be a more effective developer and ship better software.</p>
<p>First things first, let’s acknowledge the fact that doing this well does take a certain level of tooling knowledge and practice. That means getting comfortable selectively staging changes (<code class="language-plaintext highlighter-rouge">git add --patch</code>) and revising your working history as you go (<code class="language-plaintext highlighter-rouge">git rebase --interactive</code>). But once you get the hang of it, it becomes second nature and <em>not</em> doing it will start to feel a bit icky, like skipping test coverage for a new piece of code, or not brushing your teeth before bed.</p>
<h2 id="1-slowing-down">1. Slowing down</h2>
<p>As well as tooling and practice, it also takes a level of discipline. That’s because to understand how a piece of work might be delivered as small focused changes, you have to slow down and think about how it could be sliced up and delivered iteratively. This is actually the first benefit in disguise! Figuring out a plan for the work that breaks it down in terms of small, iterative steps is a useful strategy for making a large or difficult problem manageable.</p>
<p>I personally do this with a checklist of tasks that I write down before starting a piece of work. I then update and adjust the list as I progress through the work and my understanding grows and changes. The tasks don’t always map one-to-one to commits, but they often do.</p>
<p><img src="/images/20210129/todo-list.jpg" alt="A text file containing a todo list of tasks" class="post-image" /></p>
<p><a href="https://twitter.com/tomstuart">@tomstuart</a> talks more about this and other strategies for breaking down large and difficult problems in his talk <a href="https://www.youtube.com/watch?v=TdBELZG0UMY">Get Off the Tightrope</a>.</p>
<h2 id="2-taking-small-steps">2. Taking small steps</h2>
<p>Whilst figuring out a series of small, digestible steps to tackle a problem can help get over the anxiety of a large or complex task, it’s not always possible to figure out the whole plan before you start; sometimes you just need to take that first step. You might not be able to see the whole picture yet, but you see one small change that will get you a little closer. A small, focused commit is the perfect way to “bank” some progress, capturing any insight in the commit message, before continuing to chip away at the problem. Countless times I’ve found myself solving a seemingly complicated and difficult task this way, with only a rough idea of where I’m going, but edging ever closer to the solution one small commit at a time until <em>poof!</em>, the problem is solved.</p>
<h2 id="3-banking-your-progress">3. Banking your progress</h2>
<p>I love the idea of banking progress in neatly packaged focused commits. It’s a bit like reaching a save point in a video game. You feel a sense of progress. It gives you a perfect reset point if you happen to get lost (<code class="language-plaintext highlighter-rouge">git reset --hard HEAD</code>). It also allows you to unload what you’ve learnt (in the commit message) and reset your working memory, freeing up brain cycles to tackle the next part of the challenge.</p>
<h2 id="4-spotting-problems-early">4. Spotting problems early</h2>
<p>Making small, focused changes that you then wrap up in a commit also helps you get ahead of bugs and unintended side-effects. A banked commit is a perfect opportunity to trigger a build and make sure the change hasn’t broken something elsewhere, rather than waiting until all the work is finished and potentially face an avalanche of unforeseen failures that could have stemmed from any number of changes. Spotting and resolving a bug or issue is much easier when the surface area of the change is small and focused vs a 1000-line Hail Mary commit of unrelated changes. Which leads nicely into…</p>
<h2 id="5-easier-debugging">5. Easier debugging</h2>
<p>With small focused commits you unlock the full power of Git’s secret bug-smashing tool: <code class="language-plaintext highlighter-rouge">git bisect</code>. I’ve not had to use it very often, but when I have it’s felt like a super-power for finding the source of regressions and bugs. <code class="language-plaintext highlighter-rouge">git bisect</code> will still work without small focused commits, just not quite as effectively.</p>
<blockquote class="twitter-tweet"><p lang="en" dir="ltr">I’ve successfully taught another person `git bisect` and used it to find an obscure bug in minutes that surely would’ve taken hours/days to find otherwise. If you find yourself saying “this used to work…” in your code, use `git bisect`.</p>— Mitchell Hashimoto (@mitchellh) <a href="https://twitter.com/mitchellh/status/1352011054377697281?ref_src=twsrc%5Etfw">January 20, 2021</a></blockquote>
<p>More about git bisect over in <a href="https://git-scm.com/docs/git-bisect">the docs</a>.</p>
<h2 id="6-less-painful-merge-conflicts">6. Less painful merge conflicts</h2>
<p>This one only dawned on me recently: resolving merge conflicts becomes more straightforward when commits are small and focused. Perhaps I assumed I’d just got better at doing it (which is still partly true; the more you do it, the better you get at doing it) but on reflection I’d now attribute most of it to those small, focused commits. Because each commit is small and doing one thing, it’s more straightforward figuring out what the delta between the upstream change and the one being applied in the commit should be.</p>
<h2 id="7-writing-more-disposable-code">7. Writing more disposable code</h2>
<p>This one sounds weird, but changes that are shaped into focused atomic commits tend to be easier to revert cleanly, making code written that way easier to delete! You might need to do this whilst you’re still working on the feature, perhaps if you decide to change direction, or even much later in the future as the software evolves. As <a href="https://twitter.com/tef_ebooks">@tef_ebooks</a> puts it: <a href="https://programmingisterrible.com/post/139222674273/how-to-write-disposable-code-in-large-systems">Write code that is easy to delete</a>. Shaping changes into small and focused commits is one thing that will help you do that.</p>
<h2 id="8-bringing-forward-changes">8. Bringing forward changes</h2>
<p>Related to that, focused commits are also easier to pull out and use elsewhere. Maybe you’ve fixed a bug or performed a refactoring whilst still working on a larger piece of work. You can cherry-pick and ship those commits early to get the benefit now whilst also shrinking the size of your work-in-progress.</p>
<blockquote class="twitter-tweet"><p lang="en" dir="ltr">Spent a couple days working methodically through a Rails 6 upgrade, banking small, focused commits as we went. Made for calm, steady progress, with most of the commits shippable against Rails 5, shrinking the size and risk of the actual upgrade. Small, focused commits, always.</p>— Tekin Süleyman (@tekin) <a href="https://twitter.com/tekin/status/1352215564090015745?ref_src=twsrc%5Etfw">January 21, 2021</a></blockquote>
<h2 id="9-better-code-review">9. Better code review</h2>
<p>Saving one of my favourites for (almost) last, small focused commits lead to an infinitely better code review process, both for you and the reviewer. The reviewer gets the option to tackle the review in smaller commit-sized chunks, making it easier to review and less likely bugs will slip through.</p>
<p><img src="/images/20210129/pull-request-review.png" alt="A pull request review that reads: Really nice flow of commits, very easy to follow despite working in a messier area of the codebase. Everything looks good, have just left a few observations." class="post-image" /></p>
<p>For an example of what I mean, imagine trying to understand all the changes in this <a href="https://github.com/DFE-Digital/claim-additional-payments-for-teaching/pull/597">pull request</a> in one go versus reviewing the changes commit by commit, with the commit messages as guiding commentary.</p>
<p>The reviewer’s feedback can also be more targeted, and because the changes are in focused commits, chances are you’ll have a simpler time incorporating any feedback into the existing commits (for those of you that aren’t afraid of or against rewriting the history of an open PR to avoid those “PR feedback” commits that add noise and distractions to the history).</p>
<h2 id="10-a-searchable-history-for-future-code-archaeologists">10. A searchable history for future code archaeologists</h2>
<p>I won’t go into this one too much as I cover it extensively in <a href="/2019/02/a-talk-about-revision-histories">A Branch in time (a story about revision histories)</a>, but small focused commits with well-written commit messages make the software more maintainable long-term as they provide a commentary to the code’s evolution that <a href="/2020/11/patterns-for-searching-git-revision-histories">can be searched</a> by future developers to better understand the software and the way it has changed over time.</p>
<h2 id="further-exploration">Further exploration</h2>
<p>I’ll be writing more on the specific techniques that you can use to shape focused commits in the future. In the meantime if you want to explore these ideas further I’d recommend you check out the following talks:</p>
<ul>
<li><a href="/2019/02/a-talk-about-revision-histories">A Branch in time (a story about revision histories)</a> a talk I’ve done illustrating some of these ideas</li>
<li><a href="https://www.youtube.com/watch?v=mE8DZUfhdm4">Simplify writing code with deliberate commits</a> by <a href="https://twitter.com/joelchippindale">@joelchippindale</a></li>
<li><a href="https://www.youtube.com/watch?v=FQ4IdcrOUz0">Getting more from Git</a> by <a href="https://twitter.com/alicebartlett">@alicebartlett</a></li>
</ul>
<script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>Tekin SüleymanOne of the core practices I encourage in developers when joining a new team is shaping changes into small, focused, atomic commits. For folks that are used to committing code in a haphazard, laissez faire manner this is sometimes dismissed as pedantic fussiness: if the code works, it works and that’s what’s important! Well I strongly disagree. Here I want to present a few of the ways taking the time to shape small focused commits help you to be a more effective developer and ship better software.Why Git blame sucks for understanding WTF code (and what to use instead)2020-11-13T00:00:00+00:002020-11-13T00:00:00+00:00https://tekin.co.uk/2020/11/patterns-for-searching-git-revision-histories<p>You’re happily working your way through a codebase when you happen upon some code that makes you stop and think: What the…!? Maybe it’s a method that’s doing something surprising. Or perhaps it’s doing something completely unsurprising, it’s just doing it in a surprising way.</p>
<p><img src="/images/20201113/math-lady.jpg" alt="" class="post-image" /></p>
<h2 id="git-blame-to-the-rescue">Git blame to the rescue?</h2>
<p>When this happens you might instinctively reach for <code class="language-plaintext highlighter-rouge">git blame</code> to help you figure out what’s going on. After all, <code class="language-plaintext highlighter-rouge">git blame</code> gives you the commit that most recently touched the line, and often that’s enough to point you in the right direction. But often it isn’t and you’re none the wiser. That’s because:</p>
<ul>
<li><strong>git blame is too coarse</strong>: it reports against the whole line. If the most recent change isn’t related to the part of the line you’re interested in, you’re out of luck.</li>
<li><strong>git blame is too shallow</strong>: it only reports a single change; the most recent one. The story of the particular piece of code you’re interested in may have evolved over several commits.</li>
<li><strong>git blame is too narrow</strong>: it only considers the file you are running blame against. The code you are interested in may also appear in other files, but to get the relevant commits on those you’ll need to run blame several times.</li>
</ul>
<p>If you only use <code class="language-plaintext highlighter-rouge">git blame</code> you’re limiting yourself to a one-dimensional perspective of the code you’re trying to understand. Wouldn’t it be better to view things in 3D!?</p>
<h2 id="a-more-powerful-way-to-trace-the-history-of-wtf-code">A more powerful way to trace the history of WTF code</h2>
<p>Thankfully Git has some pretty powerful search tools built right in. Let’s take a closer look at some of the tools at our disposal.</p>
<h2 id="find-all-commits-containing-a-particular-piece-code">Find all commits containing a particular piece of code</h2>
<p>If <code class="language-plaintext highlighter-rouge">git blame</code> is entry-level history search, <code class="language-plaintext highlighter-rouge">git log -S</code> (also known as “the pickaxe”) is how you take things to the next level. It lets you search for all commits that added or removed a given string:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>git log <span class="nt">-S</span> <span class="s2">"method_name"</span>
</code></pre></div></div>
<p>This will return all the commits that added or removed <code class="language-plaintext highlighter-rouge">method_name</code> anywhere in the repository, giving you a bespoke history of a piece of code across both multiple commits and files. You can use the pickaxe to examine the entire history of a particular snippet: all the places it’s been used; how its usage has changed; if it’s moved file; when it was first introduced, and more.</p>
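<p>To see the pickaxe in action, here is a minimal sketch that builds a throwaway repository and searches its history. The file name, method name and commit messages are made up for illustration; the point is that only commits which change the number of occurrences of the string are listed:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"

# Commit 1 introduces method_name
printf 'def method_name\nend\n' > user.rb
git add user.rb
git commit -q -m "Add method_name"

# Commit 2 touches the file but leaves the string alone
printf '# frozen_string_literal: true\ndef method_name\nend\n' > user.rb
git commit -q -am "Add magic comment"

# Commit 3 removes method_name
printf '# frozen_string_literal: true\n' > user.rb
git commit -q -am "Remove method_name"

# Lists commits 3 and 1 (newest first); commit 2 is skipped
pickaxe_subjects=$(git log -S "method_name" --format=%s)
echo "$pickaxe_subjects"
</code></pre></div></div>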
<h2 id="see-the-diffs-alongside-the-commit-messages">See the diffs alongside the commit messages</h2>
<p>If you include the <code class="language-plaintext highlighter-rouge">-p</code> option (short for <code class="language-plaintext highlighter-rouge">--patch</code>) you get the full diff alongside the commit messages, giving you even more context and making it easier to spot the relevant changes:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ git log -S "method_name" -p
</code></pre></div></div>
<h2 id="find-the-commit-that-first-added-some-code">Find the commit that first added some code</h2>
<p>Say you want to find the first commit that introduced a particular class, method or snippet of code. You can use the pickaxe combined with the <code class="language-plaintext highlighter-rouge">--reverse</code> option to list the commits oldest first, so the commit where the code first appears is listed at the top:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ git log -S "method_name" -p --reverse
</code></pre></div></div>
<h2 id="find-when-a-piece-of-code-was-deleted">Find when a piece of code was deleted</h2>
<p>Because the pickaxe search finds both additions and deletions, you can even use it to find code that has since been deleted:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ git log -S "deleted code" -p
</code></pre></div></div>
<p>Assuming the snippet no longer exists in the codebase the first commit returned will be the one where it was removed.</p>
<h2 id="limit-the-scope-of-the-search">Limit the scope of the search</h2>
<p>Because we’re using <code class="language-plaintext highlighter-rouge">git log</code> you can also provide a path to focus the search to a given file or folder:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ git log -S "some code" -p app/models/user.rb
</code></pre></div></div>
<h2 id="search-deeper-with-a-regular-expression">Search deeper with a regular expression</h2>
<p>I mostly stick to the pickaxe because I find working with regular expressions cumbersome, but if you want to perform a more advanced search using a regular expression you can do so with <code class="language-plaintext highlighter-rouge">-G</code> instead:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ git log -G "REGEX HERE"
</code></pre></div></div>
<p>It’s also worth noting that searching with <code class="language-plaintext highlighter-rouge">-S</code> has a subtle limitation that <code class="language-plaintext highlighter-rouge">-G</code> does not: <code class="language-plaintext highlighter-rouge">-S</code> only shows commits that either added or removed the search string. If the string you are searching for moved within the same file but was otherwise unchanged, <code class="language-plaintext highlighter-rouge">-S</code> won’t include that commit, whereas <code class="language-plaintext highlighter-rouge">-G</code> will.</p>
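<p>A small experiment makes the difference concrete. This sketch uses a throwaway repository with a made-up file and search string: the second commit moves a line within the file without changing it, so it should appear in the <code class="language-plaintext highlighter-rouge">-G</code> results but not the <code class="language-plaintext highlighter-rouge">-S</code> results:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"

printf 'legacy_call\nfirst\nsecond\n' > notes.txt
git add notes.txt
git commit -q -m "Add legacy_call"

# Move the line to the end of the file without changing it
printf 'first\nsecond\nlegacy_call\n' > notes.txt
git commit -q -am "Move legacy_call"

# -S compares occurrence counts, so the move is invisible to it
s_subjects=$(git log -S "legacy_call" --format=%s)
# -G matches the added and removed lines in each diff, so the move shows up
g_subjects=$(git log -G "legacy_call" --format=%s)

echo "With -S: $s_subjects"
echo "With -G: $g_subjects"
</code></pre></div></div>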
<h2 id="search-the-commit-messages-themselves">Search the commit messages themselves</h2>
<p>Sometimes searching for literal code is too narrow, or the information you’re after isn’t actually in the code but in the commit messages themselves. In which case you can use <code class="language-plaintext highlighter-rouge">--grep</code> to search commit messages instead:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ git log --grep "commit message search"
</code></pre></div></div>
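<p>Commit message searches are case-sensitive by default, which can hide relevant commits. As a rough sketch (the commit messages here are invented, and the commits are empty purely to keep the example short), adding <code class="language-plaintext highlighter-rouge">-i</code> makes the match case-insensitive:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"
git commit -q --allow-empty -m "Add ticket export"
git commit -q --allow-empty -m "Fix Lint warnings in ticket export"

# -i makes the --grep pattern match regardless of case
grep_subjects=$(git log -i --grep "lint" --format=%s)
echo "$grep_subjects"
</code></pre></div></div>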
<h2 id="in-summary">In summary</h2>
<p>Next time some code has you puzzled and you want to understand more about it, look beyond <code class="language-plaintext highlighter-rouge">git blame</code> and dig deeper into the history using the pickaxe and friends:</p>
<ul>
<li>Find the entire history of a snippet of code with <code class="language-plaintext highlighter-rouge">git log -S</code></li>
<li>Include <code class="language-plaintext highlighter-rouge">-p</code> to see the diff as well as the commit messages</li>
<li>Include <code class="language-plaintext highlighter-rouge">--reverse</code> to see the commit that introduced the code listed first</li>
<li>Scope search to specific folders or files by including a path</li>
<li>Search with a regular expression using <code class="language-plaintext highlighter-rouge">git log -G</code></li>
<li>Search commit messages using <code class="language-plaintext highlighter-rouge">git log --grep</code></li>
</ul>
<h2 id="epilogue">Epilogue</h2>
<p>Of course searching a codebase’s history will only be fruitful if the history itself is helpful. If the history hasn’t been constructed with atomic commits that have useful commit messages, searching it will likely be a frustrating and unhelpful experience.</p>
<p>To learn more about the how and why of putting together more useful histories, check out <a href="/2019/02/a-talk-about-revision-histories">A Branch in Time (a talk about revision histories)</a>.</p>Tekin SüleymanYou’re happily working your way through a codebase when you happen upon some code that makes you stop and think: What the…!? Maybe it’s a method that’s doing something surprising. Or perhaps it’s doing something completely unsurprising, it’s just doing it in a surprising way.Better Git diff output for Ruby, Python, Elixir, Go and more2020-10-16T00:00:00+00:002020-10-16T00:00:00+00:00https://tekin.co.uk/2020/10/better-git-diff-output-for-ruby-python-elixir-and-more<p>The regular Git users amongst you will be familiar with the diff output that breaks down into “hunks” like so:</p>
<div class="language-diff highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">@@ -24,7 +24,7 @@</span> class TicketPdf
ApplicationController.render(
"tickets/index.html.haml",
layout: "tickets",
<span class="gd">- assigns: { tickets: tickets }
</span><span class="gi">+ assigns: { tickets: tickets, event_name: event_name }
</span> )
end
</code></pre></div></div>
<p>The first line (starting <code class="language-plaintext highlighter-rouge">@@</code>) is known as the hunk header, and is there to help orientate the change. It gives us the line numbers for the change (the numbers between the <code class="language-plaintext highlighter-rouge">@@..@@</code>), but also a textual description for the enclosing context where the change happened, in this example <code class="language-plaintext highlighter-rouge">"class TicketPdf"</code>. Git tries to figure out this enclosing context, whether it’s a function, module or class definition. For C-like languages it’s pretty good at this. But for the Ruby example above it’s failed to show us the immediate context, which is actually a method called <code class="language-plaintext highlighter-rouge">tickets_as_html</code>. That’s because out of the box Git isn’t able to recognise the Ruby syntax for a method definition, which would be <code class="language-plaintext highlighter-rouge">def tickets_as_html</code>.</p>
<p>What we really want to see is:</p>
<div class="language-diff highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">@@ -24,7 +24,7 @@</span> def tickets_as_html
ApplicationController.render(
"tickets/index.html.haml",
layout: "tickets",
<span class="gd">- assigns: { tickets: tickets }
</span><span class="gi">+ assigns: { tickets: tickets, event_name: event_name }
</span> )
end
</code></pre></div></div>
<p>And it’s not just Ruby where Git struggles to figure out the correct enclosing context. Many other programming languages and file formats also get short-changed when it comes to the hunk header context.</p>
<p>Thankfully, it’s not only possible to configure a custom regex specific to your language to help Git better orient itself, there’s even a pre-defined set of <a href="https://github.com/git/git/blob/master/userdiff.c">patterns for many languages and formats right there in Git</a>. All we have to do is tell Git which patterns to use for our file extensions.</p>
<p>We can do this by defining a <a href="https://git-scm.com/docs/gitattributes">gitattributes file</a> inside our repo that maps the Ruby file extensions to the diff pattern for Ruby:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">*</span>.rb <span class="nv">diff</span><span class="o">=</span>ruby
<span class="k">*</span>.rake <span class="nv">diff</span><span class="o">=</span>ruby
</code></pre></div></div>
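<p>If you want to confirm that Git has picked up the mapping, <code class="language-plaintext highlighter-rouge">git check-attr</code> will report the diff driver for a given path. A minimal sketch, using a throwaway repository and a hypothetical file path (the file doesn’t need to exist for the check to work):</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
printf '*.rb diff=ruby\n*.rake diff=ruby\n' > .gitattributes

# Ask Git which value the "diff" attribute has for this path
attr_line=$(git check-attr diff -- app/models/user.rb)
echo "$attr_line"
</code></pre></div></div>
<p>If the output ends in <code class="language-plaintext highlighter-rouge">diff: ruby</code> the pattern is in effect; <code class="language-plaintext highlighter-rouge">diff: unspecified</code> means no mapping matched the path.</p>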
<p>Some open source projects define their own <code class="language-plaintext highlighter-rouge">.gitattributes</code> file. There’s <a href="https://github.com/rails/rails/blob/master/.gitattributes">one in Rails</a>. There’s even <a href="https://github.com/git/git/blob/master/.gitattributes">one in the Git source</a> that enables the diff patterns for Perl and Python.</p>
<h2 id="configure-a-global-gitattributes-file">Configure a global .gitattributes file</h2>
<p>Instead of adding a <code class="language-plaintext highlighter-rouge">.gitattributes</code> file to every repo we can configure a global <code class="language-plaintext highlighter-rouge">.gitattributes</code> file. Just create a <code class="language-plaintext highlighter-rouge">.gitattributes</code> file in your home directory, fill it with all the file formats you are interested in and point Git at it:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>git config <span class="nt">--global</span> core.attributesfile ~/.gitattributes
</code></pre></div></div>
<p>I’ve put together an <a href="https://gist.github.com/tekin/12500956bd56784728e490d8cef9cb81">example .gitattributes file</a> with some common file formats to get you started.</p>
<p>I have no idea why Git doesn’t have these file format patterns configured by default. Thanks to Tom whose <a href="https://twitter.com/tomstuart/status/1304401459069452290">exasperated tweet</a> brought this non-obvious feature to my attention.</p>
<h2 id="update-bonus-diff-pattern-for-rspec-users">Update: Bonus diff pattern for RSpec users</h2>
<p>Thanks to <a href="https://twitter.com/jryan727">@jryan727</a> for sharing a custom pattern for RSpec. Add this to your Git config:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[diff "rspec"]
xfuncname = "^[ \t]*((RSpec|describe|context|it|before|after|around|feature|scenario)[ \t].*)$"
</code></pre></div></div>
<p>And add <code class="language-plaintext highlighter-rouge">*_spec.rb diff=rspec</code> to your <code class="language-plaintext highlighter-rouge">.gitattributes</code> file for improved hunk headers for RSpec files.</p>Tekin SüleymanThe regular Git users amongst you will be familiar with the diff output that breaks down into “hunks” like so:Exclude linting & formatting commits when running Git blame2020-09-17T00:00:00+00:002020-09-17T00:00:00+00:00https://tekin.co.uk/2020/09/ignore-linting-and-formatting-commits-when-running-git-blame<p>Automatic code linters and formatters are a handy way to get consistent
formatting across a codebase and avoid those painful bikeshedding arguments over
exactly where the curly braces should go. But running one on an existing
codebase has always come at a cost: the mess it creates in a commit history as
the formatting commits insert themselves all over the <code class="language-plaintext highlighter-rouge">git blame</code> output,
diverting you from the commits that made the last meaningful change. In fact
this is often cited as the main reason folks won’t run a formatter on older
codebases where the history is an important source of documentation.</p>
<p>Well as of Git 2.23 there is a way to configure a list of commits for <code class="language-plaintext highlighter-rouge">git blame</code>
to ignore: the <a href="https://git-scm.com/docs/git-blame#Documentation/git-blame.txt---ignore-revs-fileltfilegt"><code class="language-plaintext highlighter-rouge">--ignore-revs-file</code></a>
option. Let’s see how this works…</p>
<h3 id="1-create-a-file-listing-all-the-formatting-commits">1. Create a file listing all the formatting commits</h3>
<p>First, create a file in your repository called something like <code class="language-plaintext highlighter-rouge">.git-blame-ignore-revs</code>
and add all the SHAs for the commits that you want to be ignored. If you’re not
sure which commits these are, you can use <code class="language-plaintext highlighter-rouge">git log</code>’s search functionality to
dig them out. For example <code class="language-plaintext highlighter-rouge">git log --grep "lint"</code> will list all the commits that
contain the word “lint” in the commit message. One gotcha to be aware of: you’ll
need to use the full commit SHAs, otherwise it won’t work.</p>
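<p>One way to avoid copying SHAs by hand is to have <code class="language-plaintext highlighter-rouge">git log</code> print them in full with the <code class="language-plaintext highlighter-rouge">%H</code> format placeholder and append them to the file. This is only a sketch: it uses a throwaway repository with invented, empty commits, and in a real project you should check that the search only matches genuine formatting commits before ignoring them:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"
git config user.name "Dev"
git commit -q --allow-empty -m "Add feature"
git commit -q --allow-empty -m "Run lint across the codebase"

# %H prints the full SHA, which is what the ignore file requires
git log --grep "lint" --format=%H >> .git-blame-ignore-revs
cat .git-blame-ignore-revs
</code></pre></div></div>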
<h3 id="2-configure-your-local-repo-to-automatically-use-this-ignore-file">2. Configure your local repo to automatically use this ignore file</h3>
<p>Once you have this file, you can either use it explicitly every time you run
<code class="language-plaintext highlighter-rouge">git blame</code>:</p>
<pre class="terminal"><code> <strong>$ git blame some-file.rb --ignore-revs-file .git-blame-ignore-revs</strong>
</code></pre>
<p>Or configure Git to automatically use the ignore list:</p>
<pre class="terminal"><code> <strong>$ git config blame.ignoreRevsFile .git-blame-ignore-revs</strong>
</code></pre>
<p>Note this sets the configuration for your local copy of the repository. Your
work colleagues will need to do the same if they want to take advantage of the
ignore list.</p>
<p>So there you have it. A way that we can blame away without getting distracted by
those pesky formatting commits.</p>Tekin SüleymanAutomatic code linters and formatters are a handy way to get consistent formatting across a codebase and avoid those painful bikeshedding arguments over exactly where the curly braces should go. But running one on an existing codebase has always come at a cost: the mess it creates in a commit history as the formatting commits insert themselves all over the git blame output, diverting you from the commits that made the last meaningful change. In fact this is often cited as the main reason folks won’t run a formatter on older codebases where the history is an important source of documentation.A Git command to jump from a commit SHA to the PR on GitHub2020-06-11T00:00:00+00:002020-06-11T00:00:00+00:00https://tekin.co.uk/2020/06/jump-from-a-git-commit-to-the-pr-in-one-command<p>A <a href="https://blog.mocoso.co.uk/talks/2015/01/12/telling-stories-through-your-commits/">carefully constructed commit history</a> can be a goldmine of useful information for helping us understand our codebase. Anyone who has ever wondered <em>“but why!?”</em> and got the answer after running <code class="language-plaintext highlighter-rouge">git blame</code> and getting a well-written commit message will know what I mean.</p>
<p>But sometimes a single commit message isn’t enough and we want to see the wider context for a change. In such cases I find it can be helpful to view the original pull request that introduced it. And whilst it’s possible to jump to a pull request from a commit on the GitHub website, doing so is slow and cumbersome. I wanted a quick way to jump straight to a PR from a commit SHA by running a single command. Something like <code class="language-plaintext highlighter-rouge">git pr COMMIT-SHA</code>.</p>
<p>If that sounds useful to you, read on!</p>
<p><em>Note: the following information assumes your project uses <a href="https://guides.github.com/introduction/flow/">GitHub flow</a> to merge in changes with <strong>explicit merge commits</strong>. If you are using an implicit merge strategy or squash merges, then you’re out of luck. The various commands below also assume the project is hosted on GitHub, but it shouldn’t be too difficult to adapt them to work with other sites such as GitLab.</em></p>
<h2 id="tldr">TL;DR</h2>
<p>For those that just want the meat and potatoes, running the following commands will furnish your Git configuration with a handful of useful aliases, including the aforementioned <code class="language-plaintext highlighter-rouge">git pr</code>, which given a commit SHA should open the PR in your browser. For those on macOS the commands to create the aliases are:</p>
<pre class="terminal"><code> git config --global alias.merge-commits '!funct() { git log --merges --reverse --oneline --ancestry-path $1..origin | grep "Merge pull request"; }; funct'
git config --global alias.pr-number '!funct() { git merge-commits $1 | head -n1 | sed -n "s/^.*Merge pull request #\\s*\\([0-9]*\\).*$/\\1/p"; }; funct'
git config --global alias.web-url '!funct() { git config remote.origin.url | sed -e"s/git@/https:\/\//" -e"s/\.git$//" | sed -E "s/(\/\/[^:]*):/\1\//"; }; funct'
git config --global alias.pr '!funct() { open "`git web-url`/pull/`git pr-number $1`" ;}; funct'
</code></pre>
<p>If you are on Linux you will need to replace <code class="language-plaintext highlighter-rouge">open</code> with <code class="language-plaintext highlighter-rouge">xdg-open</code> in the last command.</p>
<p>For those of you interested in the details, read on…</p>
<h2 id="step-1-finding-the-merge-commit">Step 1: Finding the merge commit</h2>
<p>First let’s list all the merge commits that are on the commit graph between origin (i.e. the tip of the main branch) and the commit in question, oldest to newest:</p>
<pre class="terminal"><code>  git log --merges --reverse --oneline --ancestry-path COMMIT-SHA..origin</code></pre>
<p>Let’s look at an example output for a commit in the <a href="https://github.com/rails/rails">Rails codebase</a>:</p>
<pre class="terminal"><code> <strong>$ git log --merges --reverse --oneline --ancestry-path a21dd9f2f82a4fb6b70efa816ec153950a74e355..origin</strong>
8d597b5435 Merge pull request #39240 from rails/dym-hack
7494f1be33 Merge pull request #39259 from kamipo/fix_type_cast_aggregation_on_association
9fa0f01e6b Merge pull request #39273 from LukasSkywalker/remove-indexer-from-rails-guides
0bed64793e Merge pull request #39274 from kamipo/aggregation_takes_attribute_types
89043b7f7f Merge pull request #39264 from kamipo/fix_type_cast_pluck
1c3e75bf09 Merge pull request #39268 from kamipo/fix_merging_multiple_left_joins
ac65e560db Merge pull request #39269 from kamipo/improve_performance_loaded_association_first
</code></pre>
<p>Assuming you are not squash-merging, the commit that appears first (the oldest) will more often than not be the merge commit for the pull request. I say “more often than not” because if the branch in question had another branch merged into it (i.e. someone merged the main branch back in to bring it up-to-date), then those merge commits will also appear, as in this example:</p>
<pre class="terminal"><code> <strong>$ git log --merges --reverse --oneline --ancestry-path 638cc381b11d421f30670ea3cf9aa780d710b7bf..origin</strong>
232372bfd1 Merge branch 'master' into collection-refactor
fc4ef77d47 Merge pull request #38594 from rails/collection-refactor
931f958695 Merge pull request #38810 from kamipo/restore_compatibility_for_lookup_store
e2cf0b1d78 Merge pull request #38812 from ecbrodie/ecbrodie-patch-validations-docs
83dd0d53d6 Merge pull request #38814 from eugeneius/test_runner_trailing_slash
f41730c95a Merge pull request #38827 from schmijos/patch-1
86eac9b2b4 Merge pull request #38834 from olleolleolle/simpler-workflow-rubocop
</code></pre>
<p>To get around this we can pipe the output through <code class="language-plaintext highlighter-rouge">grep</code> so we only see merge commits for pull requests:</p>
<pre class="terminal"><code>  git log --merges --reverse --oneline --ancestry-path COMMIT-SHA..origin | grep "Merge pull request"</code></pre>
<p>We’ll turn this command into a Git alias so we can run it succinctly as <code class="language-plaintext highlighter-rouge">git merge-commits COMMIT-SHA</code>:</p>
<pre class="terminal"><code> git config --global alias.merge-commits '!funct() { git log --merges --reverse --oneline --ancestry-path $1..origin | grep "Merge pull request"; }; funct'</code></pre>
<p><em>(Note: if you get an “ambiguous argument” error when you run this command in your repository, it’s probably because your local repo doesn’t have a symbolic reference set for <code class="language-plaintext highlighter-rouge">origin/HEAD</code>. This normally happens when the repo was created by yourself, rather than cloned from the remote. Running <code class="language-plaintext highlighter-rouge">git remote set-head origin -a</code> will set <code class="language-plaintext highlighter-rouge">origin/HEAD</code> and resolve this.)</em></p>
<h2 id="step-2-extracting-the-pull-request-number">Step 2: Extracting the pull request number</h2>
<p>The next step is to extract the PR number from the commit log we’ve just generated. First we pipe to <code class="language-plaintext highlighter-rouge">head</code> to get the first line and then to <code class="language-plaintext highlighter-rouge">sed</code> to extract the PR number:</p>
<pre class="terminal"><code>  git merge-commits COMMIT-SHA | head -n1 | sed -n "s/^.*Merge pull request #\s*\([0-9]*\).*$/\1/p"</code></pre>
<p>As before, let’s turn that into a Git alias. This one we’ll call <code class="language-plaintext highlighter-rouge">git pr-number</code>:</p>
<pre class="terminal"><code> git config --global alias.pr-number '!funct() { git merge-commits $1 | head -n1 | sed -n "s/^.*Merge pull request #\\s*\\([0-9]*\\).*$/\\1/p"; }; funct'</code></pre>
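<p>You can try the extraction on its own by feeding <code class="language-plaintext highlighter-rouge">sed</code> one of the log lines from earlier. Note this sketch uses a slightly different expression from the alias: <code class="language-plaintext highlighter-rouge">\s</code> isn’t part of POSIX basic regular expressions, so it is left out here and the pattern simply expects the digits to follow the <code class="language-plaintext highlighter-rouge">#</code> directly:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># A sample line as produced by git merge-commits
line="8d597b5435 Merge pull request #39240 from rails/dym-hack"

# Capture the run of digits after the "#" and print only that
pr_number=$(printf '%s\n' "$line" | sed -n 's/^.*Merge pull request #\([0-9][0-9]*\).*$/\1/p')
echo "$pr_number"
</code></pre></div></div>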
<h2 id="step-3-constructing-and-opening-the-url-for-the-pull-request">Step 3: Constructing and opening the URL for the pull request</h2>
<p>Now we have the PR number we are almost ready to construct the URL for the pull request. But first we need the web URL for the repository on GitHub. We can construct that from the remote URL configured in the repository, which will either look like <code class="language-plaintext highlighter-rouge">git@github.com:rails/rails.git</code> or <code class="language-plaintext highlighter-rouge">git@github.com:rails/rails</code> or maybe even <code class="language-plaintext highlighter-rouge">https://github.com/rails/rails.git</code>. Here’s a command that will return the web address for the repo:</p>
<pre class="terminal"><code>  git config remote.origin.url | sed -e"s/git@/https:\/\//" -e"s/\.git$//" | sed -E "s/(\/\/[^:]*):/\1\//"</code></pre>
<p>Once again we’ll make this a <code class="language-plaintext highlighter-rouge">git web-url</code> alias for convenience:</p>
<pre class="terminal"><code> git config --global alias.web-url '!funct() { git config remote.origin.url | sed -e"s/git@/https:\/\//" -e"s/\.git$//" | sed -E "s/(\/\/[^:]*):/\1\//"; }; funct'</code></pre>
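<p>To check that the substitutions cope with each of the remote URL formats mentioned above, you can run them against literal strings instead of your real remote. In this sketch <code class="language-plaintext highlighter-rouge">to_web_url</code> is a hypothetical helper wrapping the same <code class="language-plaintext highlighter-rouge">sed</code> expressions:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical helper: applies the web-url substitutions to its argument
to_web_url() {
  printf '%s\n' "$1" | sed -e "s/git@/https:\/\//" -e "s/\.git$//" | sed -E "s/(\/\/[^:]*):/\1\//"
}

# All three forms should print https://github.com/rails/rails
for remote in git@github.com:rails/rails.git git@github.com:rails/rails https://github.com/rails/rails.git; do
  to_web_url "$remote"
done
</code></pre></div></div>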
<p>Finally, armed with our <code class="language-plaintext highlighter-rouge">web-url</code> and <code class="language-plaintext highlighter-rouge">pr-number</code> aliases we are ready to combine them to construct the URL for the pull request and open it in a browser! Here is the final alias <code class="language-plaintext highlighter-rouge">git pr</code> for those of you on macOS:</p>
<pre class="terminal"><code> git config --global alias.pr '!funct() { open "`git web-url`/pull/`git pr-number $1`" ;}; funct'</code></pre>
<p>Those of you on Linux will need to use <code class="language-plaintext highlighter-rouge">xdg-open</code> instead of <code class="language-plaintext highlighter-rouge">open</code> to open a browser:</p>
<pre class="terminal"><code> git config --global alias.pr '!funct() { xdg-open "`git web-url`/pull/`git pr-number $1`" ;}; funct'</code></pre>
<p>And now we can jump from a commit SHA straight to the PR that introduced it with one simple command.</p>
<p><img src="/images/20200609/git-pr-demo.gif" alt="" class="normal" /></p>Tekin SüleymanA carefully constructed commit history can be a goldmine of useful information for helping us understand our codebase. Anyone who has ever wondered “but why!?” and got the answer after running git blame and getting a well-written commit message will know what I mean.Proof your thousand-line pull requests result in more bugs2020-05-14T00:00:00+00:002020-05-14T00:00:00+00:00https://tekin.co.uk/2020/05/proof-your-thousand-line-pull-requests-create-more-bugs<p>As developers we instinctively know that large pull requests are not great. The bigger the change, the harder it is to inspect, the longer it takes to review, and the greater the chance bugs will slip through. And let’s not forget merge conflicts, or the pain and suffering huge pull requests inflict on the reviewer.</p>
<p>In an ideal world we should aim to merge small changes, often.</p>
<p>But exactly how large is too large? Is there anything we can use as a guide to tell us when our change is getting too big? Literally, <strong><em>what is the maximum number of lines of code our change should be?</em></strong></p>
<h2 id="lets-look-at-the-evidence">Let’s look at the evidence…</h2>
<p>Turns out there have been a number of studies<sup id="code-inspection-studies-src"><a href="#code-inspection-studies">(1)</a></sup> looking at the effectiveness of code review, and one of the findings common across a lot of them is that after around 60 to 90 minutes of review, the rate at which defects are discovered (bugs, or aspects of the code that could be improved) starts to fall. i.e. after an hour or so we become less effective at spotting problems.</p>
<p>There are a number of reasons why this might be, but the most obvious is that the longer we stare at something, the more blind to it we become. After an hour of looking at the same thing our brain becomes saturated and less effective at spotting issues.</p>
<p>So if we can conclude that we should spend at most an hour to 90 minutes reviewing a change, the next question is: how much code can we effectively review in that time frame?</p>
<h2 id="lines-of-code">Lines of Code…</h2>
<p>A 2006 study<sup id="smartbear-study-src"><a href="#smartbear-study">(2)</a></sup> carried out by Smartbear at Cisco found that there was a link between the size of the change under review and the density of defects found. Below 200 lines of change, the density of defects discovered (defects per line of code) was highest, and generally the more lines under review, the fewer defects per line of code were discovered.</p>
<p>And when the defects discovered per hour was measured against the size of the change they found the rate of defect discovery starts to fall above 300 lines, with a significant tailing off above 500 LOC.</p>
<p>Based on this data and the finding that review efficiency falls after 90 minutes, they concluded that a reviewer will be <strong><em>most effective reviewing no more than 400 lines of code in one go</em></strong>. Above 400 lines the rate of defect detection starts to fall and the chance that bugs will slip through increases significantly.</p>
<p>So there you have it. If you want to minimise the number of bugs you ship, keep your pull requests below 400 lines.</p>
<h2 id="hold-on-one-second">Hold on one second…</h2>
<p>Of course, it’s not quite that simple. This number is based on data specific to the codebase and team in the study. Depending on the particular dynamics of your codebase and team, the actual number that applies to you could be quite different.</p>
<p>But this study and the others it drew from provide us with a good indication that keeping the size of our changes down will result in fewer bugs.</p>
<h2 id="how-to-keep-your-pull-requests-small">How to keep your pull requests small</h2>
<p>Sometimes the thing we’re working on is too big or complicated to realistically ship in a single 400 line change. So what can we do in these situations to keep our PRs small?</p>
<p>Here are a few suggestions for ways you can split larger pieces of work into smaller pull requests:</p>
<ul>
<li>Split refactoring work into separate PRs that lay the groundwork for any new behaviour. Any refactoring work should preserve existing behaviour, which means it can be reviewed and merged without impacting the end user.</li>
<li>Keep unrelated changes out of your feature branches. If you spot something unrelated to the work at hand and want to fix it, either note it down to deal with later, or if it needs dealing with now, make the change in a fresh branch and get it reviewed separately.</li>
<li>Can you split or slice the scope of the thing you are working on in such a way as to release it in smaller incremental steps?</li>
<li>For big, complex features that are likely to take some time to develop, use feature switches to enable you to progressively merge and release the changes in chunks whilst still keeping it hidden from end users until it’s ready.</li>
<li>If all else fails, make sure to create small, atomic commits that do one thing. That way a reviewer will have the option to tackle each commit individually as a way to reduce the scope of the change they are reviewing at one time.</li>
</ul>
<h2 id="references">References</h2>
<p><a id="code-inspection-studies" href="#code-inspection-studies-src">1</a>: <a href="https://dl.acm.org/doi/10.1145/337180.337343">Object-oriented inspection in the face of delocalisation</a>, 2000, Alastair Dunsmore, Marc Roper, Murray Ian Wood; <a href="https://dl.acm.org/doi/10.1145/581339.581349">Further investigations into the development and evaluation of reading techniques for object-oriented code inspection</a>, 2002, Alastair Dunsmore, Marc Roper, Murray Ian Wood; <a href="https://dl.acm.org/doi/10.1145/337180.337343">A Case Study of Code Inspection</a>, 1991, Frank W. Blakely, Mark E. Boles.</p>
<p><a id="smartbear-study" href="#smartbear-study-src">2</a>: Unfortunately the findings of the Smartbear study are behind an email capture form, but they can be downloaded <a href="https://smartbear.com/resources/ebooks/best-kept-secrets-of-code-review/">here</a>.</p>
<!-- Bugs are bad. The earlier we can discover bugs, the cheaper the impact. A bug that is discovered during code review is going to be less costly than one discovered during Q/A, which is less costly than one that isn't discovered until it's live in our system. The earlier we can discover a bug, the less impact it's going to have.
The same applies to problems in our software's design: it's usually simpler and quicker to fix issues with the structure of our code the earlier they are spotted.
One of the strategies software teams employ to catch these issues early is code review using pull requests.
-->Tekin SüleymanAs developers we instinctively know that large pull requests are not great. The bigger the change, the harder it is to inspect, the longer it takes to review, and the greater the chance bugs will slip through. And let’s not forget merge conflicts, or the pain and suffering huge pull requests inflict on the reviewer.Git tip: keep your personal business out of .gitignore files2020-03-13T00:00:00+00:002020-03-13T00:00:00+00:00https://tekin.co.uk/2020/03/maintain-a-global-git-ignore-file<p>Chances are most of the Git repositories you work with contain a <code class="language-plaintext highlighter-rouge">.gitignore</code>
file. This tells Git the files and directories that you want kept out of
the repository.</p>
<p>The <code class="language-plaintext highlighter-rouge">.gitignore</code> file is usually programming language or framework specific, and
is a great way to keep the repository clean and free of things like log files
and build artefacts.</p>
<p>What it’s less good for, however, is ignoring files that are specific to your
particular operating system, IDE, or environment. Not only is it a pain to have
to add your own system-specific exclusions to the <code class="language-plaintext highlighter-rouge">.gitignore</code> file of every
repository you work with, it also makes for a messy <code class="language-plaintext highlighter-rouge">.gitignore</code> file,
especially when there are many developers, each with their own particular set-up.</p>
<p>Thankfully there is a simple solution: maintain your own global ignore file
which will apply to all the repositories you work with on your computer.</p>
<h2 id="configuring-a-global-git-ignore-file">Configuring a global Git ignore file</h2>
<p>From your terminal or command line, run the following to configure Git with a
global ignore file called <code class="language-plaintext highlighter-rouge">.gitignore_global</code> in your home directory:</p>
<pre class="terminal"><code> $ git config --global core.excludesfile ~/.gitignore_global
</code></pre>
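<p>If you want to check that the setting took effect, you can ask Git to read it
back. This is a minimal sketch rather than a required step. It assumes the same
<code class="language-plaintext highlighter-rouge">~/.gitignore_global</code> path used above, and creates the file first in case it doesn’t exist yet:</p>

```shell
# Create the global ignore file if it does not exist yet
touch ~/.gitignore_global

# Point Git at it (the same command as above)
git config --global core.excludesfile ~/.gitignore_global

# Read the setting back; Git prints the path it has stored
git config --global --get core.excludesfile
```

<p>Your shell expands the <code class="language-plaintext highlighter-rouge">~</code> before Git sees it, so the stored value should be the full path to the file in your home directory.</p>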
<p>Now you can fill it with all the system-specific exclusions you like, without
polluting the ignore files of the repositories you work with.</p>
<h2 id="what-to-add-to-your-global-git-ignore-file">What to add to your global Git ignore file</h2>
<p>My global ignore file contains operating system specific things like macOS’
<code class="language-plaintext highlighter-rouge">.DS_Store</code> file, but also things specific to my IDE, for example the <code class="language-plaintext highlighter-rouge">.tags</code>
file generated by <a href="https://ctags.io">Ctags</a>.</p>
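<p>As a rough illustration, the two exclusions mentioned above could be added from
the terminal like this. It is only a sketch: it assumes your global ignore file
lives at <code class="language-plaintext highlighter-rouge">~/.gitignore_global</code>, and your own list will likely look different:</p>

```shell
# Append two example patterns to the global ignore file.
# Lines starting with "#" inside the file are comments that Git skips.
cat >> ~/.gitignore_global <<'EOF'
# macOS Finder metadata
.DS_Store
# Ctags index file
.tags
EOF
```

<p>Because <code class="language-plaintext highlighter-rouge">&gt;&gt;</code> appends, running this more than once will add the same lines again, so you may prefer to edit the file by hand once it exists.</p>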
<p>For a comprehensive library of operating system and editor specific exclusions
check out the community-maintained
<a href="https://github.com/github/gitignore/tree/master/Global">github/gitignore</a>
repository on GitHub.</p>
<h2 id="there-is-a-third-way">There is a third way…</h2>
<p>Sometimes you may have a file you want to exclude from a repository that doesn’t
belong in the repository’s ignore file, but also doesn’t belong in your
global ignore file. In those probably rare cases, you can list the exclusion in
a file in your local repository called <code class="language-plaintext highlighter-rouge">.git/info/exclude</code>.</p>
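<p>Here is one way that could look in practice. The sketch below works in a
throwaway repository so that nothing real is touched, and the file name
<code class="language-plaintext highlighter-rouge">scratch-notes.txt</code> is purely hypothetical:</p>

```shell
# Work in a throwaway repository so nothing real is touched
repo="$(mktemp -d)"
git init --quiet "$repo"
cd "$repo"

# A hypothetical file we want ignored in this clone only
touch scratch-notes.txt

# Add the exclusion to the repository-local exclude file
mkdir -p .git/info
echo "scratch-notes.txt" >> .git/info/exclude

# Ask Git which rule, if any, causes the file to be ignored
git check-ignore -v scratch-notes.txt
```

<p>The last command reports the file and line that the matching pattern came from, which is handy when you can’t remember which of the three places an exclusion lives in.</p>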
<!-- image from https://pixabay.com/photos/birds-decoration-figurines-276191/ -->Tekin SüleymanChances are most of the Git repositories you work with contain a .gitignore file. This tells Git the files and directories that you want kept out of the repository.