Archive for February, 2011

Two design episodes in Rails

Friday, February 25th, 2011

After the fact, it’s easy to find better solutions. I present two examples from my experience with an ecommerce Rails application.

How to avoid joins, part I

We needed to make both product categories and discounts appear or disappear at specific times. The canonical solution seemed, at the time, to declare

class Schedule < ActiveRecord::Base
  belongs_to :schedulable, :polymorphic => true

  def active?
    Time.now.between?(self.start, self.end)
  end  
end

class Discount < ActiveRecord::Base
  has_one :schedule, :as => :schedulable
  
  # ...
end

class Category < ActiveRecord::Base
  has_one :schedule, :as => :schedulable
  
  # ...
end

The schedules table contains the two columns “start” and “end”. The logic for a schedule is pretty simple: it’s “active?” if “now” is between “start” and “end”.

A lot of thought and discussion went into deciding when to save a Schedule object in the schedules table; we wanted to write

class Category < ...
  def active?
    self.schedule.active?
  end
end

but this presumes that there indeed is a schedule saved in the database for a_product, so it had to be

class Category < ...
  def active?
    self.schedule && self.schedule.active?
  end
end

and similar complication happened when adding or changing the schedule. Selecting all the active categories, got more complicated for we needed an extra join, and a left outer join at that. That impacted all queries on categories or discounts, in particular the free search query that really didn't need to get more complicated. We decided to add a post_create callback so that every new discount or category would always have an associated schedule. The decision to go with the "polymorphic has-one association" led to complification and increased coupling.

So what is a simpler, caveman's solution? Well, why not add "validity_start" and "validity_end" columns to both categories and discounts tables? Our Ruby code would become:

module Schedulable
  def active?
    Time.now.between?(self.validity_start, self.validity_end)
  end  
end

class Discount < ActiveRecord::Base
  include Schedulable
  
  # ...
end

class Category < ActiveRecord::Base
  include Schedulable
  
  # ...
end

So all it takes to make a model "schedulable" is to include the Schedulable module, and add two more columns to the model table.

Analysis of the "caveman" solution:

  • Duplication? In the definition of the two tables, perhaps a bit. But no duplication in Ruby code, and far less code to write.
  • Denormalized? Actually no. A "schedule" is not something relevant to the business; it doesn't need an identity. It is not a business entity so it's proper that we implement it as a collection of attributes rather than with its own table.
  • Good Rails design? I think it is. The decision to make Schedule a model was bad design, as a Schedule is not a business entity.

So the "caveman" solution actually is what a good data modeler would have chosen from the start. We were fascinated by how easy it was to use the "polymorphic association" to remove the duplication of the two extra columns, that we ended up complicating our life for no good reason.

Lesson learned: always consider what a caveman would do. He might be smarter than you!

Lesson learned: Rails is an effective way to put a web GUI in front of a database. Think like a data modeler. Think Entity-Relationship.

How to avoid joins, part II

Later in the same project, we needed to make the website respond in English or Italian. Rails 2 is well equipped for localizing the GUI out-of-the-box; but it will not deal with the problem to translate the properties of your model. In our case, we needed to translate the names and descriptions of products.

The canonical Rails solution at the time was to use the Globalize2 gem. It's actually a good gem, but in retrospect it was not a good fit for our problem.

Fact: we didn't need to support 1000 languages. Just 2. Maybe 3 or 4 in the next 5 years.

Fact: we didn't need to translate 100 attributes in 100 models. Just 3 properties in 2 models.

The Globalize2 gem adds a join to a "globalize_translations" table to every query. That didn't do any bit of good to the free search query, that was awfully complicated already! So while in theory Globalize2 is transparent, in practice you have to modify many queries to take it into account.

Once again, what would our friend the caveman do? You guessed it, add extra columns instead of a join. Replace columns "name" and "description" with "name_en", "name_it", "description_en", "description_it". You add a bit of drudgery to the schema definition, but your queries turn out to become simpler. Taking advantage of the fact that schema migrations in Rails are very easy, it would not be a big problem to add a new language by adding the few extra columns.

And all the Ruby code it would take to produce the globalized descriptions in the web pages would be to define

class Product < ...
  def globalized_name
    if self.attributes.include? "name_#{current_locale}"
      self.attributes["name_#{current_locale}"]
    else
      # fallback to Italian
      self.name_it
    end
  end
end    
  

My solution is not transparent, but it's simple and easy to understand; while Globalize2 tries to be transparent but does not quite succeed.

Lesson learned: less magic please. If you can't achieve complete transparency, then go for being explicit *and* simple.

What I’ve been up to? Nine lessons learned.

Monday, February 7th, 2011

What I’ve been up to?

It’s been some time. This last eight months or so I haven’t been doing much software development. Most of my work has been in support. I’ve done consulting work.

This autumn I was hired to help a team with an application that was constantly crashing. The application was closed source, so we could not easily change it or trace how it worked. The first thing we did was to start monitoring stats. I wrote a simple monitor that tracks the health of a Tomcat process. This gives a feeling for what’s happening inside the application, for what happens in normal times and in times when it’s stressed. We kept a monitor open with baretail open on these log files and watched what was happening.

Every two or three days, the count of busy threads in Tomcat started rising, from the normal level of 2-3 up to the maximum of 150. At that point there was nothing to do but restart the server.

Lesson 0: set up watch dogs. It’s easy to write a custom status page sniffer. Nagios keeps an eye over many variables (memory, CPU load, failing services and the like.) New Relic is a very good all-round solution when you don’t know where to start.

The usual suspect in such cases is the database, and watching the process monitor on SQL Server revealed that many connections were blocking each other. We called in one consultant, then another, and finally a third one who knew how to deal with this problem.

Lesson 1: a consultant may know how to deal with your problem, or he may be out of his depth. You better make sure that any consultant you call in know what they’re doing. Warning signs is when they try stuff at random (I read in a blog that changing parameter XYZ could be helpful!) or just give you canned responses (uh, better install the latest service pack…)

The last SQL Server consultant really knew his stuff, and was able to solve the SQL Server hangups problems by cleaning up and optimizing the SQL Server installation. It turned out that this database has been running with no maintenance for years.

Lesson 2: if you run a business that depends on database-backed applications, you better have a full-time DBA looking over maintenance.

After the SQL Server tune up the application ran considerably better. But there were more problems on the horizon. We still had a few occasions when the application did block. I didn’t know what to do to get insight on what happened in those cases. I wanted a thread dump of the Java Virtual Machine, but on Windows it turns out it’s not an easy thing to do.

Lesson 3: Windows works. You can’t really say it doesn’t. And this is the most charitable thing I can say about it. All the same, Unix is so much better than Windows in every which way. It has always been.

In the end I found a neat utility called psexec that helped us solve that problem. We set up a script that checked the number of busy threads on Tomcat every 30 seconds, and created a thread dump for us. This showed that our Tomcat threads were busy trying to deliver a mail message to a SMTP server. A quick investigation found that the application was somehow configured to use a temporary, slow SMTP server nobody remembered about. We changed the configuration to point to a fast server and this problem was fixed.

The last failure mode we encountered happened rarely; at times the memory consumption in Tomcat started growing from the usual 100-150MB up to its configured maximum of 500MB, and stay there. At that point, the application was slowing to a crawl, and there was nothing but, again, reboot it. Try to imagine our feelings of helplessness while this was happening!

Our next tool here was to run JMap to get a memory dump, and then analyzing it with the Eclipse Memory Analyzer. The culprit seemed to be some data objects that we knew the application was allocating to contain the result of queries. Whenever a user was requesting a report of, say, 50K records, the app allocated memory to hold all of the result in memory, and saved it… guess where…. in the user session! Then it showed the first 100 rows to the user. Since that query was slow, the user was staring at a blank screen for a minute or two, and then reload (F5!), which caused Tomcat to perform another query. Each of these 50K rows queries was allocating 50MB of ram in the JVM. Boom, boom, boom, once this started it brought Tomcat to its knees.

Lesson 4: Don’t store more than a few bytes in the user session. Being stateless is the key to scaling. Which brings us to the next…

Lesson 5: Pagination is best done in the database. Caching is best done in the database. Filtering is best done in the database. Heavy computations with the data are best done in the database. Brush up on your SQL skills! If you’re doing business software, your app’s performance and capabilities will be dominated by how well you use your database.

We could not change the application code, but luckily there was a configuration parameter that limited the maximum number of returned rows. Setting it to 1000 solved the problem.

Lesson 6: whenever you perform a query, you must be certain that the number of rows it will return is bounded.

And so our job with this application was finished. We went on tackling the next… One amazing thing that I learned in the process is that placing Apache in front of Tomcat is not always gueranteed to work well. We had some really rough time when we discovered we had this devastating Apache bug, and later we found a different bug in a different app with the same setup.

Lesson 7: try to use as few pieces (processes, computers, things) as possible. Every piece you have may be a cause of failure. Whomever says “… and by using XYZ, we get ZYX for free” is a fool! Nothing comes for free.

One of the collegues who valiantly supported me in this adventure was so much overwhelmed by the stressful situation that he resigned. I don’t know what to think of this; certainly this was not the sort of job that we were trained to deal with. This was Operations, not Software Development. And this was the life of the consultant. Diving into a problem not knowing the solution, but with the confidence that we will find the means to find the means of solving it. This is the opposite of “you tell me what I must do and I do it”, which is an unfortunately common misinterpretation of “customer collaboration”.

Lesson 8: Act as a consultant, not as a contractor (as @p_pugliese would say). Find your own ways of gathering information, making plans, trying solutions. Find your own ways of checking if you’re making progress.

In the end, my collegue will have to decide for himself if this is a crisis that leads to coming back to this job stronger than before, or a signal that his deep desire is to do something else.