The “N+1 queries” is a common performance problem that can happen when we deal with an entity that has an association to another. For instance, a Tweet belongs_to
an User and has_many
Comments. When we display a page with all Tweets we might use
@tweets = Tweet.all
And then in the template we might call
<% for tweet in @tweets %>
<div class="tweet">
<p><%= tweet.text %></p>
<p><%= tweet.user.name %></p>
</div>
<% end %>
You might guess that we are performing a first query to find the tweets, and then an additional query for finding the user of every tweet we found. We can see that from the Rails log:
Started GET "/" for 127.0.0.1 at 2013-01-18 13:15:43 +0100
Processing by TweetsController#index as HTML
Tweet Load (0.2ms) SELECT "tweets".* FROM "tweets" ORDER BY created_at desc
User Load (0.2ms) SELECT "users".* FROM "users" WHERE "users"."id" IN (10, 9)
User Load (0.2ms) SELECT "users".* FROM "users" WHERE "users"."id" = ? [["id", 10]]
CACHE (0.0ms) SELECT "users".* FROM "users" WHERE "users"."id" = ? [["id", 10]]
CACHE (0.0ms) SELECT "users".* FROM "users" WHERE "users"."id" = ? [["id", 10]]
CACHE (0.0ms) SELECT "users".* FROM "users" WHERE "users"."id" = ? [["id", 10]]
CACHE (0.0ms) SELECT "users".* FROM "users" WHERE "users"."id" = ? [["id", 10]]
User Load (0.1ms) SELECT "users".* FROM "users" WHERE "users"."id" = ? [["id", 9]]
CACHE (0.0ms) SELECT "users".* FROM "users" WHERE "users"."id" = ? [["id", 9]]
CACHE (0.0ms) SELECT "users".* FROM "users" WHERE "users"."id" = ? [["id", 9]]
CACHE (0.0ms) SELECT "users".* FROM "users" WHERE "users"."id" = ? [["id", 9]]
Rendered shared/_tweets.html.erb (6.4ms)
Rendered tweets/index.html.erb within layouts/application (8.9ms)
Completed 200 OK in 65ms (Views: 11.3ms | ActiveRecord: 1.6ms)
This is obviously bad for performance. It's much more efficient to issue one (or two) big queries than many small queries. One idea would be to select from a join of tweets with users (one query). Another idea is to first select all the tweets, than inspect which user_ids
they contain, and then issue a second query for all the users with id in the ids that we found earlier (two queries in all).
ActiveRecord (which is Rails object-relational mapping library) offers a very convenient way to solve the problem. Whenever we issue a query for an array of entities, and we know that we are going to ask those entities for an associated entity, we can use the "includes" clause. For instance
@tweets = Tweet.includes("user").all
After this change, we can observe what happens in the Rails log:
Started GET "/" for 127.0.0.1 at 2013-01-18 16:35:48 +0100
Processing by TweetsController#index as HTML
Tweet Load (0.3ms) SELECT "tweets".* FROM "tweets" ORDER BY created_at desc
User Load (0.2ms) SELECT "users".* FROM "users" WHERE "users"."id" IN (10, 11, 9)
Rendered shared/_tweets.html.erb (4.4ms)
Rendered tweets/index.html.erb within layouts/application (7.6ms)
Completed 200 OK in 15ms (Views: 10.8ms | ActiveRecord: 0.7ms)
Much better! We can see that ActiveRecord chose the "two queries" option.
The choice to use a join or two separate queries depends on the particular case; in general the join is less efficient but not always. In general it is easier to get reasonable performance out of two simple queries than a complex one. Usually it's best to let ActiveRecord choose. In rare cases ActiveRecord's choice might not be good; in those rare case we can code the SQL by hand.
Please note that THIS IS NOT A RAILS-SPECIFIC PROBLEM! Every object-relational technology has this problem. Rails happens to solve it in a very convenient way.