
How to use introspection to discover what is exhausting your ActiveRecord connection pool

This week I wrote about the reasons why you might need an ActiveRecord connection pool larger than the number of configured Puma threads.

Since then Ben Sheldon has pointed out that apps running embedded job workers (for example Sidekiq in embedded mode, GoodJob in async mode, or Sucker Punch) will also be creating extra threads and may therefore need a larger connection pool.

In this follow-up post I’m going to describe the technique I used to uncover the source of the connection pool contention for the app I’m working on, and how you can do the same if you’re seeing mysterious ActiveRecord::ConnectionTimeoutError exceptions and don’t know where they’re coming from.

Uncovering the source of ActiveRecord connection pool contention

These connection timeouts were a long-standing mystery in our app. Although they didn’t happen with great frequency, they were happening often enough to warrant some further digging rather than letting them become another broken window.

To figure out what was going on I employed some introspection on the connection pool. Whenever a thread asks for a connection from ActiveRecord, it is assigned one from the connection pool. The connection itself stores a reference to the assigned thread as its owner. We can inspect the assigned threads in the connection pool to learn a bit more about them:

ActiveRecord::Base.connection_pool.connections.map do |connection|
  connection.owner.present? ? connection.owner.inspect : "[UNUSED]"
end.join("\n")

You can try this in the rails console, although you’ll want to fire off a quick query first, otherwise there won’t be any connections assigned!
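For example (User here is just a stand-in for whatever model your app has):

# Run any query first so the console thread checks out a connection
User.first

# Then inspect which thread owns each connection in the pool
owners = ActiveRecord::Base.connection_pool.connections.map do |connection|
  connection.owner.present? ? connection.owner.inspect : "[UNUSED]"
end

puts owners.join("\n")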

On the actual Puma web server, if connections are only assigned to Puma threads the output will look something like this:

#<Thread:0x00007f0bbab3b740@puma srv tp 003 /app/.../gems/puma-6.2.2/lib/puma/thread_pool.rb:106 sleep>
#<Thread:0x00007f0bbab3b3d0@puma srv tp 004 /app/.../gems/puma-6.2.2/lib/puma/thread_pool.rb:106 run>
#<Thread:0x00007f0bbab3b8f8@puma srv tp 002 /app/.../gems/puma-6.2.2/lib/puma/thread_pool.rb:106 run>
#<Thread:0x00007f0bbab3bbc8@puma srv tp 001 /app/.../gems/puma-6.2.2/lib/puma/thread_pool.rb:106 sleep_forever>
[UNUSED]

The inspect output tells us a bit about each thread. The key part is the path, which points at the line of code where the thread was started: in this case, inside Puma’s thread pool.
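As an aside, you can get a complementary view with Thread.list, which returns every live thread in the process whether or not it currently holds a connection. The same file-and-line breadcrumb appears in each thread’s inspect output:

# List every live thread in the current process; each inspect string
# includes the file:line where the thread was spawned, plus its status
puts Thread.list.map(&:inspect)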

To narrow down the search for the mystery threads using up our connections, I wired this code into our bug tracker so that we’d log this debug output whenever an ActiveRecord::ConnectionTimeoutError was raised:

Bugsnag.configure do |config|
  config.add_on_error(proc do |event|
    if event.errors.first.error_class == "ActiveRecord::ConnectionTimeoutError"
      connection_pool_info = ActiveRecord::Base.connection_pool.connections.map do |connection|
        connection.owner.present? ? connection.owner.inspect : "[UNUSED]"
      end.join("\n")

      event.add_metadata(:app, :connection_pool_info, connection_pool_info)
    end
  end)
end
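If you’re not using Bugsnag, the same idea translates to whatever error reporting you have. As a rough sketch (this isn’t from our app), a small Rack middleware could capture the same debug output before re-raising the error:

# A hypothetical middleware: rescue the timeout, log the pool state, re-raise
class ConnectionPoolDebugMiddleware
  def initialize(app)
    @app = app
  end

  def call(env)
    @app.call(env)
  rescue ActiveRecord::ConnectionTimeoutError
    connection_pool_info = ActiveRecord::Base.connection_pool.connections.map do |connection|
      connection.owner.present? ? connection.owner.inspect : "[UNUSED]"
    end.join("\n")

    Rails.logger.error("Connection pool at time of timeout:\n#{connection_pool_info}")
    raise
  end
end

# config/application.rb (sketch):
#   config.middleware.use ConnectionPoolDebugMiddleware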

It didn’t take long for the culprit to materialise:

#<Thread:0x00007f32eca13ad0@puma srv tp 002 /app/.../gems/puma-6.2.2/lib/puma/thread_pool.rb:106 sleep_forever>
#<Thread:0x00007f32f23cd6e8 /app/.../gems/actionpack-7.0.5.1/lib/action_controller/metal/live.rb:341 sleep>
#<Thread:0x00007f32eca12a40@puma srv tp 003 /app/.../gems/puma-6.2.2/lib/puma/thread_pool.rb:106 sleep_forever>
#<Thread:0x00007f32f2936288 /app/.../gems/actionpack-7.0.5.1/lib/action_controller/metal/live.rb:341 sleep>
#<Thread:0x00007f32eca13e40@puma srv tp 001 /app/.../gems/puma-6.2.2/lib/puma/thread_pool.rb:106 run>

We can see that two of the connections are assigned to threads spawned by ActionController::Metal::Live.

This was the missing piece of the puzzle. It didn’t take much tracing of the code to discover that ActiveStorage proxy requests spin up extra threads to stream responses, and it was these threads that were putting the extra pressure on our connection pool and causing the sporadic timeout errors.
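Once you know which threads are competing for connections, it’s worth sanity-checking how much headroom your configured pool actually has. The pool object itself reports its size and live usage (ConnectionPool#stat has been available since Rails 5.1):

pool = ActiveRecord::Base.connection_pool

# The configured maximum, i.e. the `pool:` value from database.yml
pool.size

# Aggregate usage: how many connections exist, are busy, idle, or have
# threads waiting on them (example output; your values will vary)
pool.stat
# => { size: 5, connections: 5, busy: 3, dead: 0, idle: 2, waiting: 1, checkout_timeout: 5 }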

Introspection is good actually

I wanted to write this post partly to serve as a pointer to others that might be seeing these connection timeouts, but also to make a broader point about the blurred lines between our code and the libraries our code depends on. Because when our app is running, it all effectively becomes our code. Getting comfortable diving into the source for gems you use will serve you well, and will sometimes be the only way to get to the origin of a strange bug, or unexpected behaviour.

One of my favourite ways to do this is using bundle open. It gets you to the literal gem code that your app will be using right there in your preferred editor, ready for exploring. You can even make changes to the gem code for a spot of puts debugging (although do so with care and always remember to undo your changes when you’re done!)

Aside: If you’re a VSCode user, you may want to set the BUNDLER_EDITOR environment variable to "code -n" to stop bundle open replacing your current project window.
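In practice that looks something like this (puma is just an example gem):

bundle open puma

# VSCode users: open gems in a new window instead of replacing the current one
export BUNDLER_EDITOR="code -n"
bundle open puma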
