Dispense: the-n+1-queries-problem

Matteo Vaccari > Applicazioni Web

The “N+1 queries” is a common performance problem that can happen when we deal with an entity that has a one-to-many relation to another. For instance, a Quote has-many Comments. When we display a page with all quotes we might use

def find_all_quotes(category_id)
  quotes = db_select
    "select * from quotes where category_id = #{category_id} limit 100"
  for quote in quotes
    quote[:comments] = db_select 
        "select * from comments where quote_id = #{quote.id}"
  end
end

You see that we are performing a first query to find the quotes, and then an additional query for every quote we found. This is obviously bad for performance. It's much more efficient to issue a single big query than many small queries.

For instance:

def find_all_quotes(category_id)
  quotes = db_select 
    "select * from quotes where category_id = #{category_id} order by quote_id limit 100"
  comments = db_select 
    "select * from comments where quote_id in " +
    "(select quote_id from quotes where category_id = #{category_id} limit 100)" 
  # loop through results and attach each comment to its quote
  # ... 
end

An alternative way would be to use a join like

select * from quotes q, comments c 
where c.quote_id = q.quote_id
  and category_id = #{category_id}

The choice to use a join or two separate queries depends on the particular case; in general the join is less efficient but not always.

In both cases some work must be done after the queries are done. The query results must be fixed so that the result of the find_all_quotes function is an array of quotes where each quote contains an array of comments. This is not a significant overhead, compared to doing N+1 queries.

Note: in these examples we assume that category_id was previously cleaned to protect us from SQL injection attacks.