Extreme Enthusiasm

Two design episodes in Rails

After the fact, it’s easy to find better solutions. I present two examples from my experience with an ecommerce Rails application.

How to avoid joins, part I

We needed to make both product categories and discounts appear or disappear at specific times. The canonical solution seemed, at the time, to declare


class Schedule < ActiveRecord::Base
  belongs_to :schedulable, :polymorphic => true

  def active?
    Time.now.between?(self.start, self.end)
  end  
end

class Discount < ActiveRecord::Base
  has_one :schedule, :as => :schedulable
  
  # ...
end

class Category < ActiveRecord::Base
  has_one :schedule, :as => :schedulable
  
  # ...
end

The schedules table contains the two columns “start” and “end”. The logic for a schedule is pretty simple: it’s “active?” if “now” is between “start” and “end”.

A lot of thought and discussion went into deciding when to save a Schedule object in the schedules table; we wanted to write


class Category < ...
  def active?
    self.schedule.active?
  end
end

but this presumes that there indeed is a schedule saved in the database for a_product, so it had to be


class Category < ...
  def active?
    self.schedule && self.schedule.active?
  end
end

and similar complication happened when adding or changing the schedule. Selecting all the active categories, got more complicated for we needed an extra join, and a left outer join at that. That impacted all queries on categories or discounts, in particular the free search query that really didn't need to get more complicated. We decided to add a post_create callback so that every new discount or category would always have an associated schedule. The decision to go with the "polymorphic has-one association" led to complification and increased coupling.

So what is a simpler, caveman's solution? Well, why not add "validity_start" and "validity_end" columns to both categories and discounts tables? Our Ruby code would become:


module Schedulable
  def active?
    Time.now.between?(self.validity_start, self.validity_end)
  end  
end

class Discount < ActiveRecord::Base
  include Schedulable
  
  # ...
end

class Category < ActiveRecord::Base
  include Schedulable
  
  # ...
end

So all it takes to make a model "schedulable" is to include the Schedulable module, and add two more columns to the model table.

Analysis of the "caveman" solution:

Duplication? In the definition of the two tables, perhaps a bit. But no duplication in Ruby code, and far less code to write.
Denormalized? Actually no. A "schedule" is not something relevant to the business; it doesn't need an identity. It is not a business entity so it's proper that we implement it as a collection of attributes rather than with its own table.
Good Rails design? I think it is. The decision to make Schedule a model was bad design, as a Schedule is not a business entity.

So the "caveman" solution actually is what a good data modeler would have chosen from the start. We were fascinated by how easy it was to use the "polymorphic association" to remove the duplication of the two extra columns, that we ended up complicating our life for no good reason.

Lesson learned: always consider what a caveman would do. He might be smarter than you!

Lesson learned: Rails is an effective way to put a web GUI in front of a database. Think like a data modeler. Think Entity-Relationship.

How to avoid joins, part II

Later in the same project, we needed to make the website respond in English or Italian. Rails 2 is well equipped for localizing the GUI out-of-the-box; but it will not deal with the problem to translate the properties of your model. In our case, we needed to translate the names and descriptions of products.

The canonical Rails solution at the time was to use the Globalize2 gem. It's actually a good gem, but in retrospect it was not a good fit for our problem.

Fact: we didn't need to support 1000 languages. Just 2. Maybe 3 or 4 in the next 5 years.

Fact: we didn't need to translate 100 attributes in 100 models. Just 3 properties in 2 models.

The Globalize2 gem adds a join to a "globalize_translations" table to every query. That didn't do any bit of good to the free search query, that was awfully complicated already! So while in theory Globalize2 is transparent, in practice you have to modify many queries to take it into account.

Once again, what would our friend the caveman do? You guessed it, add extra columns instead of a join. Replace columns "name" and "description" with "name_en", "name_it", "description_en", "description_it". You add a bit of drudgery to the schema definition, but your queries turn out to become simpler. Taking advantage of the fact that schema migrations in Rails are very easy, it would not be a big problem to add a new language by adding the few extra columns.

And all the Ruby code it would take to produce the globalized descriptions in the web pages would be to define


class Product < ...
  def globalized_name
    if self.attributes.include? "name_#{current_locale}"
      self.attributes["name_#{current_locale}"]
    else
      # fallback to Italian
      self.name_it
    end
  end
end

My solution is not transparent, but it's simple and easy to understand; while Globalize2 tries to be transparent but does not quite succeed.

Lesson learned: less magic please. If you can't achieve complete transparency, then go for being explicit *and* simple.

This entry was posted on 25 February 2011 at 17:06 and is filed under Agile. You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site.

Tweets that mention Extreme Enthusiasm » Blog Archive » Two design episodes in Rails -- Topsy.com Says:
February 25th, 2011 at 20:12

[…] This post was mentioned on Twitter by Luca Mearelli, Matteo Vaccari. Matteo Vaccari said: Two design episodes in Rails: http://matteo.vaccari.name/blog/archives/511 […]

Carlo Pescio Says:
March 1st, 2011 at 08:53

Matteo,
interesting read. I share your appreciation for simple code and simple queries. Still, I guess you’ll agree that in the end what we want is simple maintenance / extension.
Simple code works toward that goal, unless it’s too simple, in which case maintenance / extension requires a complete rewriting. In worst case, a complete rewriting of scattered code.

That said, as your customer :-)) I’d like to ask for a simple change (or two)

– change 1
my schedule isn’t really 9 to 5. is more like 7:30-9:00 + 19:30-21:00, that is multiple intervals.

– change 2
no no, wait, I changed my mind again. I want a different schedule monday and friday.

Would you still go caveman in this case? (not that I particularly like complex joins, so the real design challenge here would be to make this stuff flexible *and* easy :-))

matteo Says:
March 1st, 2011 at 09:19

Hi Carlo, it’s a pleasure.

You make an interesting point. The schedule I was talking about was not a daily schedule, it was an interval of dates, as in “I will show the new team shirt on the day of the official presentation”, not “I will allow you to buy shoes from 9 to 5”. Anyway I take your challenge.

If the requirement were to *remember* when a product *was/will be* scheduled, then the way to go would be a separate table.

If your requirement is a complex daily/weekly schedule, like in your example of 7:30-9:00 + 19:30-21:00, I would consider the use of an expression language, kind of like what Unix Cron does. That would not be easy to evaluate in SQL, though. You would need to move complex logic to SQL, kind of stored procedure, or else evaluate the the schedule logic in application code, and that makes paginating free search results more difficult.

In any case, the caveman’s solution code is not scattered, at least on the application code side; it’s well factored in a single Ruby module. The columns it needs, though, would be duplicated on all tables that should support “schedulable”.

What would you do?

Carlo Pescio Says:
March 1st, 2011 at 19:23

Matteo: I got it like “discount is active only on monday morning”, something that is rather frequent in ecommerce sites. Still, it’s a reasonable change in business requirements :-).

About what I would do: my approach to database design changed quite a bit over the years, and a comprehensive explanation of *why* I would do what follows would require more space than reasonable for a blog comment. Actually, I think it’s a good fit for an “idea paper”, discussing a simple but realistic design issue with the necessary depth of scrutiny. I’ll provide a short rationale anyway, because it’s probably not the mainstream approach.

Looking back at the original problem, I don’t truly like the idea of merging Discount and Schedule at the database level, despite code simplification. The fundamental issue here is lack of modularity / separation of concerns. We are coupling two different concepts:

– the concept of discount, which is a fundamental domain concept in sales.

– the concept of activating a discount on a specific date range (or in my case with different rules), which is a transient business/marketing policy, that may change at any time.

Without proper modularity, we get an entangled database structure; in some cases [not yours], we also get entangled code (I discussed a slightly related problem in an old post: http://www.carlopescio.com/2008/04/can-aop-inform-oop-toward-soa-too-part.html).

Unfortunately, modularity is not part of mainstream database culture, which is why we usually end up with tightly coupled database schemas, where nobody feels confident about removing a field (not to mention a table). However, we have several ways to decouple things, also at the database level.

So, what I would do? I can see these forces in the problem setting:

– we want to do selections on the database side (efficiency)

– we want a flexible scheduling policy (I would call this business agility :-). more properly, we want to change scheduling policy with ease. I don’t need an overblown solution.

– selection takes the scheduling policy *result* as part of its conditions.

To make a long story short, this forcefield is suggesting me to:

– use a database decoupling technique to hide the concrete scheduling policy to whoever is querying Discount, while still keeping everything on the database side (as far as possible).

In practice, I would simply use a database view. The view will tell me if a Discount is Active, so structurally I would just see a Discount Id and a boolean.

What is behind the structural side of the view (that is, in the SQL)? It depends!
– in your case, a very simple query (of course, I need a Schedule table, but this is hidden by the view)
– in my case, if the schedule language is simple (like ORing several time/date intervals), probably a slightly more complex query, and a different schedule table, but everything is still manageable in plain SQL.
– if I need a very powerful language, I can use whatever mechanism I choose (even a timer-driven evaluation of conditions) to populate a table, and make the view a mirror of that table.
– etc.

You can easily see that the structural side (the returned fields) of the view takes the role of a Strategy pattern interface, while the procedural/SQL side takes the role of the Strategy implementation. Since every decent database does a predicate pushing on views, efficiency won’t be compromised. On the efficiency side, one may also want to experiment with materialized views (not in this case, I would say, but may prove useful in similar cases).

Once you do that, you still have to live with a join (between Discount and the view). This is unfortunate, and in my opinion, is mostly a consequence of the lack of attention to modularity (and modularity-enforcing techniques) in the database community. For instance, Oracle 11g introduced the concept of virtual field (which would be perfect for this), with some ad-hoc constraints that do not apply to views.
Besides, too many people think of views, triggers, stored procedures etc like kludges, because they are never formally taught to use them as engineering tools, and they learn on the job to use them just to patch things around. But that would be another long story :-)

matteo Says:
March 2nd, 2011 at 19:18

Hi Carlo,

I dig the use of views to simplify the DB interface. In a way you are pushing for a different solution than mine, but I agree with you that giving more thought to DB design is important. Only you are a better data modeler than me!

The reason why many people, including me, tend to avoid stored procedures, triggers, etc as much as possible is that this scatters logic between application and DB, and places logic in the place that makes it much more difficult to version, to test, to deploy. Not to mention that it makes schemas much more difficult to port between vendors.

Uberto Says:
March 4th, 2011 at 11:12

I’m not a Rails expert, but why you cannot add a new Table/Entity Schedule with a optional join to Category and Discount?

I’m assuming in most of the cases when you load a category (CRUD) you don’t care much about if it’s active or not, and when you need only the active ones it would be just a simple SQL join, or you load them all and you can validate the schedule rule by code if more complex (like (“Mondays AND NOT Festivity AND BETWEEN(“9:00” AND “19:00”)) OR SPECIAL_OFFER_COUPON

Probably I’m missing something, but as much as I like caveman solutions, they are often hiding a real problem…

matteo Says:
March 4th, 2011 at 16:09

Uberto: of course you can add a new Schedules table, and that’s exactly what we did. My observation is that in retrospective it’s not worth it to make a separate table, when all I need is to add a “valid_from_date” and “valid_until_date” columns.

In fact, in all the queries that matter (i.e. those that show data to the user) you must take validity into account, and that includes filtering valid products in search results; and since results are to be paginated I don’t want to check validity in application code. That would lead to a very ugly can of worms. The overhead is not just in changing all queries to use a join (that could be taken care of with a view, as Carlo suggests), but also in checking if the schedule already exists just about everywhere you mention a schedule. (That could be taken care of with a Ruby module, in fact… I suspect that the polymorphic has_one was not a very good choice.)

houses for sale in janesville wi Says:
August 11th, 2012 at 05:59

The milk state is now open for property business. I thank God. Gosh, what was a realtor to do?

Two design episodes in Rails

How to avoid joins, part I

How to avoid joins, part II

8 Responses to “Two design episodes in Rails”

Leave a Reply

About

Pages

Archives

Categories