Dispense: the-n+1-queries-problem-and-rails

Matteo Vaccari > Applicazioni Web

The N+1 Queries Problem and Rails

The “N+1 queries” is a common performance problem that can happen when we deal with an entity that has an association to another. For instance, a Tweet belongs_to an User and has_many Comments. When we display a page with all Tweets we might use

@tweets = Tweet.all

And then in the template we might call

<% for tweet in @tweets %>
<div class="tweet">
  <p><%= tweet.text %></p>
  <p><%= tweet.user.name %></p>
</div>
<% end %>

You might guess that we are performing a first query to find the tweets, and then an additional query for finding the user of every tweet we found. We can see that from the Rails log:

Started GET "/" for 127.0.0.1 at 2013-01-18 13:15:43 +0100
Processing by TweetsController#index as HTML
  Tweet Load (0.2ms)  SELECT "tweets".* FROM "tweets" ORDER BY created_at desc
  User Load (0.2ms)  SELECT "users".* FROM "users" WHERE "users"."id" IN (10, 9)
  User Load (0.2ms)  SELECT "users".* FROM "users" WHERE "users"."id" = ?  [["id", 10]]
  CACHE (0.0ms)  SELECT "users".* FROM "users" WHERE "users"."id" = ?  [["id", 10]]
  CACHE (0.0ms)  SELECT "users".* FROM "users" WHERE "users"."id" = ?  [["id", 10]]
  CACHE (0.0ms)  SELECT "users".* FROM "users" WHERE "users"."id" = ?  [["id", 10]]
  CACHE (0.0ms)  SELECT "users".* FROM "users" WHERE "users"."id" = ?  [["id", 10]]
  User Load (0.1ms)  SELECT "users".* FROM "users" WHERE "users"."id" = ?  [["id", 9]]
  CACHE (0.0ms)  SELECT "users".* FROM "users" WHERE "users"."id" = ?  [["id", 9]]
  CACHE (0.0ms)  SELECT "users".* FROM "users" WHERE "users"."id" = ?  [["id", 9]]
  CACHE (0.0ms)  SELECT "users".* FROM "users" WHERE "users"."id" = ?  [["id", 9]]
  Rendered shared/_tweets.html.erb (6.4ms)
  Rendered tweets/index.html.erb within layouts/application (8.9ms)
Completed 200 OK in 65ms (Views: 11.3ms | ActiveRecord: 1.6ms)

This is obviously bad for performance. It's much more efficient to issue one (or two) big queries than many small queries. One idea would be to select from a join of tweets with users (one query). Another idea is to first select all the tweets, than inspect which user_ids they contain, and then issue a second query for all the users with id in the ids that we found earlier (two queries in all).

ActiveRecord (which is Rails object-relational mapping library) offers a very convenient way to solve the problem. Whenever we issue a query for an array of entities, and we know that we are going to ask those entities for an associated entity, we can use the "includes" clause. For instance

@tweets = Tweet.includes("user").all

After this change, we can observe what happens in the Rails log:

Started GET "/" for 127.0.0.1 at 2013-01-18 16:35:48 +0100
Processing by TweetsController#index as HTML
  Tweet Load (0.3ms)  SELECT "tweets".* FROM "tweets" ORDER BY created_at desc
  User Load (0.2ms)  SELECT "users".* FROM "users" WHERE "users"."id" IN (10, 11, 9)
  Rendered shared/_tweets.html.erb (4.4ms)
  Rendered tweets/index.html.erb within layouts/application (7.6ms)
Completed 200 OK in 15ms (Views: 10.8ms | ActiveRecord: 0.7ms)

Much better! We can see that ActiveRecord chose the "two queries" option.

The choice to use a join or two separate queries depends on the particular case; in general the join is less efficient but not always. In general it is easier to get reasonable performance out of two simple queries than a complex one. Usually it's best to let ActiveRecord choose. In rare cases ActiveRecord's choice might not be good; in those rare case we can code the SQL by hand.

Please note that THIS IS NOT A RAILS-SPECIFIC PROBLEM! Every object-relational technology has this problem. Rails happens to solve it in a very convenient way.