Refactor to remove duplication

March 17th, 2013

The gist of the message I got from Extreme Programming in the early years of this century was that good code would just emerge if we followed the four rules of Simple Design, as advocated by Kent Beck:

The code is simple enough when it:

  1. Runs all the tests
  2. Has no duplicated logic
  3. States every intention important to the programmers
  4. Has the fewest possible classes and methods

in this order.

Various formulation of this list exist. Given that “Runs all the test” is granted (we don’t want broken code) and “Has the fewest possible classes and methods” is also granted (simplicity seems to imply “few things”), I often wondered how the other two rules can be used to generate good design.

J.B. Rainsberger says:

Put more simply, if you master removing duplication and fixing bad names, then I claim you master object-oriented design

That’s a bold claim! How can this be?

In TDD By Example, Kent Beck says that the third step of TDD is not just “refactor”, but refactor to remove duplication. The point then is “what is duplication?” Is that just duplicated lines of code?

Over time I started collecting interesting examples of duplication. It turns out that seeing duplication is a skill that has to be learned.

Obvious duplication

The most obvious form of duplication is duplicated lines of code. That is usually cured by extracting the common lines in a method. The extraction of a method is often followed by moving the method to a new class, or another existing class.

This is just one way the four rules push towards a better design. There are others.

Duplication in names

There is a beautiful simple exercise in chapter 1 of Essential Skills for Agile Development. It asks to remove duplication in code that looks like this one:

class Organization {
  String primaryContactFirstName;
  String primaryContactLastName;
  String primaryContactPhone;
  String secondaryContactFirstName;
  String secondaryContactLastName;
  String secondaryContactPhone;
  ...
}    

There is no “code” here, just data declarations. There is quite obvious duplication in the names though. The string “primaryContact” is repeated three times. The sequence “firstName, lastName, phone” is repeated two times. The point is to arrive at something like

class Organization {
  Contact primaryContact;
  Contact secondaryContact;
  ...
}    

Primitive obsession

Primitive Obsession is one of the classic smells from the Refactoring book. It happens when we use primitive types in place of our own types. You can see it in the Organization example above, where we use a String to represent a phone number, and again String to represent names of persons.

Primitive Obsession is one of the most common smells. Most developers are not even aware that it is a problem. It is also a smell that takes a long time to remove. And … it is a form of duplication. We are duplicating the choice of representation for a concept. We can see the cost of this choice when we decide that, say, we want to stop using Java native Date type and start using the Joda Time replacement; then it takes ages to convert all of our code to the new type.

But when we do this replacement, we are still falling in the same trap; if we change our mind again, it will again take a very long time to convert the code. What is the answer to this problem? Should we think long and hard before we start our project on which Date class we want to use, to minimize the chance of having to change it later? Would that work?

Of course not! What we really want is to be able to change our mind quickly, whenever we learn that our first choice was not adequate. Again, the key is in removing the duplication: make the choice in a single place; all of our code will use our own XDate class. Our XDate could use java.util.Date internally, or something else, but code outside XDate does not know and does not care.

(Aside. Using our own Date type has the beneficial effect that all the static method that are usually found in a number of DateUtils classes become instance methods in XDate. Another smell is removed! End Aside.)

Using third party APIs directly

The primitive obsession smell is just an example of a more general problem: when we use someone else’s API, we use it directly throughout our code. When we need, say, to download a data file from some service, what do we do? Do we start littering our code with calls to java.net.URL and java.net.HttpUrlConnection? Heavens, no! That would be committing ourselves to a particular way to solve the http connection problem, and duplicating it in all the places where we need to do it.

That would also be a violation of what Robert Martin calls the “Dependency Inversion Principle”: depend on abstractions, not on concretions. The java.net.URL is a concrete way to solve the “download a remote file” problem.

A better solution would be to hide the decision inside a class. Just define a RemoteFile class with a download method. This way we avoid duplication.

Alas, this is another smell that most developers don’t even know about.

Hiding design decisions

This brings us to a connection that was not obvious: removing duplication leads to the famous paper by David Parnas On the criteria to be used in decomposing systems into modules. Quote:

We propose instead that one begins with a list of difficult design decisions or design decisions which are likely to change. Each module is then designed to hide such a decision from the others. Since, in most cases, design decisions transcend time of execution, modules will not correspond to steps in the processing.

There you have it: a class is a way to codify and centralize a design decision. A design decision that is duplicated should be moved to its proper home in a single place. This is one of the things that an XPer must learn, in order to become effective at becoming faster than the customer :-)

Further reading: an essay by James Shore on why removing duplication is key to making continuous design work: it mitigates the risk that unforeseen circumstances will force us to change an earlier decision.

Another personal update

February 22nd, 2013

Almost one year ago I started again as an independent consultant, after five years in Sourcesense/XPeppers. How’s it going?

I have worked with great customers. They are all good places to work, both technically and from the point of view of the fundamental values: first of all, respect. (If you’re a good Java, C#, Php or Ruby programmer and you could be interested in working for one, just ask me.)

What value do I bring to these customers? What kind of work do I do?

When I do my best work, I look for the biggest problem and keep working on it, trying different solutions until it gets better (a simple recipe that I learned from Pascal.) It’s not always clear what the biggest problem is, but mostly I look for things that are out of sync with what an Agile team should do. Two chief things to look for:

  1. How often do we ship?
  2. Are the customers (business owners, CEO, other stakeholders) happy? (Which requires sometimes to have difficult conversations)

Actually, I don’t really invent or apply the solutions myself; I try to get the organization’s people to do that. This makes it far more likely that the solutions make sense and work :-)

I’ve been fairly busy, and this is why I didn’t write here for a long time. Right now I’m happy with the way it’s going.

TDD no mundo real

February 22nd, 2013

TL;DR Mauricio Aniche‘s TDD No Mundo Real is a very good introduction to TDD.

I learned about Mauricio Aniche’s book on the Growing Object-Oriented Software mailing list. At first I thought I would have to wait for a translation, as I don’t speak Portuguese. But I soon discovered that I can follow the meaning well enough; in a few spots I got help from Google translate. I imagine that any speaker of Latin languages can read this book; just try!

This book is a modern introduction to TDD. The thing I like the most about it is that it explains the fundamentals using very simple examples. This for me is high praise: explaining things in a simple way is not easy. And it’s always worth it to work on the fundamentals, even when you think you’re an expert.

Chapters 1 to 3 are introductory. Here Mauricio writes for someone who does not know anything about TDD; he explains how it works and why it’s much better than traditional development techniques, such as writing little “main” methods to test Java classes.

Chapter 4 is about “Simplicity and baby steps”. This is a fundamental point about TDD, that you work in small increments. One good point that Mauricio makes is that this concept is often misunderstood. Some programmers take “the simplest thing that could possibly work” to mean “I’ll do the simplest possible modification to the current code.” This is wrong! It really means “make the code as simple as possible with respect to the current tests.” Sometimes this means making big, sweeping refactorings in order to make the system simpler.

In chapter 5, Mauricio talks about TDD seen as a design technique. Most TDD practicioners agree that TDD is more about design than tests; yet not everyone can explain well how is it that TDD helps you to design. The book gives a few examples of how the feedback provided by tests gives you an indication that the production code should be changed.

“Quality of test code” is the subject of chapter 6. Here we learn how to keep our tests DRY by using Setup and Teardown.

In chapters 7 and 8 Mauricio shows how tests alert us that the production code has problems of cohesion or coupling. This is probably the part that even old hands at TDD will find most interesting.

Then chapter 9 explains some fundamental things about encapsulation, “Tell, don’t ask” and the Law of Demeter.

Chapter 10 is an explanation of integration tests.

In conclusion, this book is a concise and complete introduction to TDD. It’s very simple and addresses the kind of problems that most working programmers, expecially those working in business software, are likely to find. What are its shorcomings? I can think of two: it’s not available in English, and there’s no treatment of scenarios, acceptance tests, breaking down features in a list of test.

Let’s just hope that Mauricio will translate his book to English, so that it can be circulated more widely.

On having a boss. On being a boss.

October 1st, 2012

I was at Better Software in Florence last week, where I heard my friend Alberto Brandolini talk with his usual flair about how problems within organizations relate to problems in the IT departments of organizations. I agree with most of what Brando said, except for a couple of things.

You can hear it from just about every cool lean and agile speaker these days. It’s the idea that hierarchical control is evil. For these speakers, the company organizational chart is the symbol of everything that’s going wrong in the enterprise. Cool modern speakers about “agile management” propose alternative models based on non-hierarchical, ad hoc, informal communication structures. As far as I understand management, these alternative organization models are deeply flawed.
Read the rest of this entry »

Exercise: a Pesky Error Message

August 28th, 2012

We have some files containing sequences of integers, one number per line. We want to compute the sum of the numbers. The catch is that the numbers are written as roman numerals.

This code works, but there is a problem: every time we run the tests, a nasty error message appears on standard error. We’d like to be able to turn off the error message when we run the tests. The error message should still appear when the code is run in production.

Your job: make it so that the error message does not appear when the tests are run.

See here the starting code. Copy it in a file pesky.rb and run it with the commmand ruby pesky.rb.

class RomanNumeral
  def initialize(string)
    @value = string
  end

  def to_i
    case @value
    when "I"; 1
    when "II"; 2
    when "III"; 3
    when "IV"; 4
    else
      STDERR.puts("Invalid numeral '#{@value}'")
      0
    end
  end
end

class RomanSequence
  def initialize(input)
    @input = input
  end

  def sum
    result = 0
    @input.lines.each do |line|
      roman = RomanNumeral.new(line.chomp)
      result += roman.to_i
    end
    return result
  end  
end

require "minitest/autorun"
require "stringio"

class RomanSequenceTest < MiniTest::Unit::TestCase
  def test_adds_roman_numerals
    input = StringIO.new("I\nII\nIII\n")
    assert_equal 6, RomanSequence.new(input).sum
  end

  def test_skips_invalid_numerals
    input = StringIO.new("I\nY\nIII\n")
    assert_equal 4, RomanSequence.new(input).sum
  end  
end

PLEASE

Solve the exercise before continuing.

*            *
*

Read the rest of this entry »

About Kent Beck’s Stepping Stone strategy

August 15th, 2012

Most papers in computer science describe how their author learned what someone else already knew. — Peter Landin

And blog posts too. — Matteo

This is a follow-up to my previous post.

In his Responsive Design presentation, Kent Beck talks about the Stepping Stone strategy. He talks about two kinds of stepping stones: one is about building abstractions that might make your work easier.

The first kind of Stepping Stone

He talks about the abstractions that allow Google people to perform their magic, like BigTable which is built upon Google File System, which is built upon a Distributed Lock Manager. I can relate with this view of “design”. It’s powerful. I like this quote from Daniel Jackson

Part of what allowed thousands of engineers to build scalable systems at Google is that really smart engineers like Jeff Dean and Sanjay Ghemawat built simple but versatile abstractions like MapReduce, SSTable, protocol buffers, and the like. Part of what allowed Facebook engineering to scale up is the focus on similarly core abstractions like Thrift, Scribe, and Hive. And part of what allows designers to build products effectively at Quora is that Webnode and Livenode are fairly easy to understand and build on top of.

Keeping core abstractions simple and general reduces the need for custom solutions and increases the team’s familiarity and expertise with the common abstractions. The growing popularity and reliability of systems like Memcached, Redis, MongoDB, etc. have reduced the need to build custom storage and caching systems. Funneling the team’s focus onto a small number of core abstractions rather than fragmenting it over many ad-hoc solutions means that common libraries get more robust, monitoring gets more intelligent, performance characteristics get better understood, and tests get more comprehensive. All of this helps contribute to a simpler system with reduced operational burden.

The interesting thing about this strategy is that it’s the one that you *don’t* do when you TDD. Kent Beck is saying, in effect, “you can do plenty of work with TDD, but sometimes it’s beneficial to stop doing TDD, and try instead to guess that you will need some intermediate or related thing. Then you build that something, and the original problem that you had might be much easier to solve.”

Let’s have an example, shall we? The Sudoku Solver, for instance. What Stepping Stones could be useful? From Artificial Intelligence classes taken long ago, and from Peter Norvig’s excellent article on the subject, I know that two ways to solve a Sudoku are depth-first search and constraint propagation. If I wanted to try depth-first search, my Stepping Stone would be the depth-first search algorithm itself.

Aside. What is depth-first search? Depth-first search means that you model your problem as a search through the space of possible moves. Given any position you’re in, you try the first possible move, which brings you in a new position.

  1. If the new position is a winning position, you’re done.
  2. If the new position is not a winning one, you make the first possible move from here, and repeat.
  3. If there are no possible moves in the new position, then you back-track to the previous one, and try the second possible move.

This algorithm is very simple. It works for a huge variety of problems; all you need is that you can enumerate moves from any position. It does not always find a solution; for instance, on some problems it can enter an infinite loop. But this can’t happen with Sudoku, so depth-first is an excellent algorithm to try with Sudoku.

The depth-first algorithm is easy to test-drive. Once I have this algorithm, the original problem “solve Sudoku” is reduced to “from any given Sudoku board, enumerate possible moves”, which is much simpler.

Not only the problem is made simpler; it’s also easier to check that I implemented it correctly.

What about the second way to solve Sudoku, that is constraint propagation? This brings us to the next section.

The second kind of Stepping Stone

The other kind of Stepping Stone that Kent Beck mentions in the video is related to the old Lisp maxim

Q. How do you solve a difficult problem?
A. You implement a language in which the problem is trivial to solve.

This is the opposite of the Simplification strategy. Sometimes, the easiest way to solve a problem is to generalize it. Let’s make an example. The Http access log report problem from my last post would be easy to solve with SQL. Assuming that the access log lines are in a database table, I could do

select date,
  sum(if(status >= 200 and status < 300, 1, 0)) as 2xx-results,
  sum(if(status >= 300 and status < 400, 1, 0)) as 3xx-results,
  sum(if(status >= 400 and status < 500, 1, 0)) as 4xx-results,
  sum(if(status >= 500 and status < 600, 1, 0)) as 5xx-results,
from access_log
group by date

How’s that for simplicity? But why couldn’t I write a similar program in Ruby? I would start by imagining what the solution would look like:

Report.new(access_log).
  group_by(:date).
  select(:date).
  sum(:2xx_results) {|x| (200...300) === x.status ? 1 : 0 }.
  sum(:3xx_results) {|x| (300...400) === x.status ? 1 : 0 }.
  sum(:4xx_results) {|x| (400...500) === x.status ? 1 : 0 }.
  sum(:5xx_results) {|x| (500...600) === x.status ? 1 : 0 }  

Then I could implement the various needed features of Report one by one. That would give me a nice list of small, safe steps to implement.

It seems a paradox that solving a more general problem may be easier, but we can see from this example how this can be true. Another example is in my post Feeling like carrying bags of sand.

Getting back to the Sudoku solver, a possible Stepping Stone of the second variety could be to implement a simple constraint propagation system. If you’re interested, there is a nice discussion in Structure and Interpretation of Computer Programs.

Conclusion

Stepping Stone is an antidote to the tendency to carry YAGNI too far, to the point that we refuse to write anything that is even slightly more general than what is exactly required. We’re not violating YAGNI when it’s simpler to write a Domain-Specific Language than trying to solve a complicated problem directly.

XP warns against premature abstraction. Let’s leave aside for the moment the observation that generalization is not the same thing as abstraction. It is true that by creating abstractions you run the risk of creating things that turn out not to be needed. Kent Beck warns against this in the video. Yet, I believe Stepping Stones are a very important dimension to design, one that is not used when we do TDD and that should be considered whenever we have a difficult programming problem.

Further reading

The idea of “layers of abstractions” is (I believe) due to Edsger Dijkstra. See his paper The Structure of the “THE”-Multiprogramming System. It’s interesting to compare how his idea of powerful abstractions differs from the weaker concept of “layers” in business applications.

Kent Beck’s Simplification strategy

August 4th, 2012

I finally got round to watching Kent Beck’s video on Responsive Design. It’s a very interesting video, for me. I mean, it’s very interesting from the point of view of someone who has been doing test-driven development for years, and who for years has watched people work with TDD.

I’d like to comment on an interesting part of this video, the one where he talks about the Four Strategies, which are: Leap, Parallel, Stepping Stone and Simplification. Carlo Pescio says he is unsure about the difference between Stepping Stone and Simplification; but I think I grok what Kent means. I think that Simplification is a crucial strategy that is enabled by TDD; it would be too difficult and expensive to do without tests. On the other hand, if you do TDD, you need to understand Simplification, otherwise your tests will not sing.

So what is the Simplification strategy? Kent says (around 00:58)

I can imagine that I want to get over here, so what is the least I can do that would be progress? Suppose I want a big multi-threaded application, so I think, I can’t do a big multi-threaded application in one step. It’s too big a step, it’s not a safe step. So what am I going to do? Well, there are different possibilities, but the one I do all the time is to say, “Well, which of those am I going to eliminate? Is it going to be a multi-threaded application doing something trivial, or is it going to be a single-threaded application doing something more substantial?” […] I do this *further* than most people that I encounter. Someone says “I want a Sudoku solver and I do a 16 by 16 grid” and I say “Well, how about a 1 by 1 grid?”

I think that the Sudoku example nails it. Let’s see where the “one-cell Sudoku” leads. What would the test look like? I suppose it would be like

def test_solve_one_cell_sudoku
  sudoku = Sudoku.new(1)
  sudoku.solve
  assert_equal "1", sudoku[0,0]
end

Any number would solve the one-cell Sudoku, so I arbitrarily chose “1” as the solution. What would the next test look like? I would write a two-cell Sudoku. Mind you, not a 2-by-2 Sudoku; that would contain four cells! Just a non-square, two cells Sudoku:

def test_solve_two_cells_sudoku
  sudoku = Sudoku.new(1, 2) # one row, two columns
  sudoku.solve
  assert_equal "1", sudoku[0,0]
  assert_equal "2", sudoku[0,1]
end

Again, any two numbers would solve the two-cell Sudoku, as long as they are different. So I arbitrarily chose “1” and “2”.

Now, these arbitrary choices are starting to bug me. If it’s true that any other two distinct numbers would do, why do I over-constrain the test? By forcing “1” and “2” I’m placing an extra constraint that might make things more difficult later. I could make my tests less brittle by specifying exactly what I want:

def test_solve_one_cell_sudoku
  sudoku = Sudoku.new(1)
  sudoku.solve
  assert_is_digit sudoku[0,0]
end

def assert_is_digit(string)
  assert_match /\d/, string
end

and

def test_solve_two_cells_sudoku
  sudoku = Sudoku.new(1, 2) # one row, two columns
  sudoku.solve
  assert_is_digit sudoku[0,0]
  assert_is_digit sudoku[0,1]
  assert_not_equal sudoku[0,0], sudoku[0,1]
end

OK, now the tests are less brittle, but they are also much less clear! I like my tests to by very simple examples of what the production code does. I don’t like it when I have to think about what the tests mean

Big flash!

I now understand that the indeterminacy of the original tests is in the fact that the solver can choose from an alphabet of symbols that is larger that the Sudoku problem. If I constrain the first test to “choose” its symbol from the alphabet that contains only the “1” symbol, then the indeterminacy disappears.

def test_solve_one_cell_sudoku
  sudoku = Sudoku.new(["1"], 1)
  sudoku.solve
  assert_equal "1", sudoku[0,0]
end

Same for the second test:

def test_solve_two_cells_sudoku
  sudoku = Sudoku.new(["1", "2"], 1, 2)
  sudoku.solve
  assert_equal "1", sudoku[0,0]
  assert_equal "2", sudoku[0,1]
end

So the Simplification strategy led me to discover a concept, the alphabet, that I was previously ignoring. Continuing along this route, I will probably be led to “discover” the concept of constraint, that will be a useful Stepping Stone to solve the whole Sudoku; not to mention that constraints will be useful to solve Sudoku variants as well. On the other hand, if I try to solve the whole 9×9 Sudoku problem from the start, I will probably end up writing procedural crap, just as I previously did :-)

Let me give another example of Simplification: suppose that you have to write a batch command that produces a report out of web access log files. This is not a theoretical example; I had to do that more than once myself, and the team I’m currently coaching had to solve this exact problem. Suppose that the output we want looks like this:

date          2xx-results    3xx-results   4xx-results   5xx-results
29/Jul/2011          1223             23           456            12
01/Aug/2011          1212             24            11           123

As a TDD beginner I would have started with testing an empty report (that is easy to do, but does not teach you much) and then, for a second test, a one line report like this:

date          2xx-results    3xx-results   4xx-results   5xx-results
29/Jul/2011             1              0             0             0

The nasty thing is that this second test contains most of the complexity of the whole problem. It’s too big a step. If I try to solve it in one Leap, it will probably lead to procedural crap. It is better to use the Child Test pattern (from TDD By Example, p. 143). But I don’t have much guidance on how to choose my Child Test. What objects do I need? As a TDD beginner I would often come up with silly objects that would be just procedural crap disguised by objects.

Here is where I would do best to use a Strategy. The Stepping Stone strategy could help: for instance, I could invent a DSL for web access reports. If I did that, then writing my original report would be easy!

Or I could use the Simplification strategy: start with a one-column, one-row report. That would give me the guidance I need to find the next small, safe step. I prepared a kata for this problem; I will probably publish it on github some day.

Design tip: Name things after what they ARE, not after where they are used!

July 9th, 2012

This is another thing I often see people do. Consider this code fragment:

<div id="sidebar">
  <%= render partial: "sidebar" %>
</div>

This basically says that in a portion of a layout that is called “sidebar” we want to render a partial template that is also called “sidebar”. Hmmmm. Duplication here! Why are these two things called the same? Let’s look into the “sidebar” template; we discover that it contains code that produces a navigation menu. Aha! Why not call the partial “navigation_menu” then?

The mistake in the above code was that the partial template was named after where it is used, not after what it is. When you correct that problem, what you get is

<div id="sidebar">
  <%= render partial: "navigation_menu" %>
</div>

and the duplication problem also disappears.

Another example I’ve seen in production code:

class Money {
  public int getValueForAuthorizationSystem() {
    return value.scaleByPowerOfTen(2).intValue();
  }   
}

Class Money should not really care what the Authorization System wants… it should be concerned with returning the amount in cents!

class Money {
  public int inCents() {
    return value.scaleByPowerOfTen(2).intValue();
  }   
}

Why is it that programmers name things after where they are used and not after what the things really are? I think that one major cause is the desire to break up a monolithic thing, because this is what design should be about, but doing it poorly. Suppose I have a 100 lines long method.

def long_method
  # ... ... ...
end

I use extract method a number of times and I end up with something like this:

def long_method
  # ...
  step_one(foo, bar, baz)
  step_two(foo, bar, zork)
  step_three(agh, foo, bar, zork)
  step_four(agh, bar, baz, zork)
end

what happened here? We took a procedure and cut it in arbitrary pieces. The result is that we have a number of methods that we can’t give meaningful names to, other than “step_X” and that have an excessive number of arguments. This means that we cut the procedure poorly. Note that each of the methods can only be used in the place where it is called; it depends on the context and can’t be used anywhere else.

What to do then? What I would do is to get back to the 100-lines method and try to cut it another way. Perhaps we will arrive at something like

def formerly_long_method
  do_something(foo)
  do_something(bar)
  do_something(baz, do_something(zork))
end

where the same helper method has been used in more than one way.

In conclusion: names like foobarForX should always be challenged. They might be fixed by looking again at what “foobarForX” really is; or they might be an indication that the design is cut the wrong way. In that case you try the cut again and see if you can improve the situation. Look for things that are meaningful concepts in the problem domain!

Design tip: Watch how you cut!

July 9th, 2012

The number one design problem I see people do is writing code that is overspecific. By that I mean code that can only be used in one place and can never be reused. Programmers should always try to write code that is generic, reusable and independent from the context where it is used.

Design is (mainly) about breaking the solution in small, manageable pieces. However, how you do the cut is important; if you cut poorly, you end up with a worse mess than when you had a single monolithic piece.

This is an example I saw last week. It is Rails-specific but I think non-Rails readers will be able to follow the logic. The application had a general HTML layout that contained something like this:

<table>
  <tr>
    <!-- main application content -->
    <td valign="top"><%= yield %></td>

    <!-- right sidebar -->
    <%= yield :right_sidebar %>
  </tr>
</table>

Ignore for a second the use of a table for layout (that requires a separate post!) Can you see the design flaw?

Yes, the flaw is that all pages that want to add something to the right sidebar must remember to wrap it in <td valign="top">...</td>. Like this:

<% content_for :right_sidebar do %>
<td valign="top">
  BANNER GOES HERE
</td>
<% end %>

In practice the code that goes in the right sidebar is dependent on the context where it is used. The solution in this case is simple: keep the context in the calling code, not in the called code. Like this:

<table>
  <tr>
    <!-- main application content -->
    <td valign="top"><%= yield %></td>

    <!-- right sidebar -->
    <td valign="top">
      <%= yield :right_sidebar %>
    </td>
  </tr>
</table>

This has the problem that if the page does not want to put anything in the right banner, it will still have an empty td element. Fortunately, Rails has a clean solution for this:

<table>
  <tr>
    <!-- main application content -->
    <td valign="top"><%= yield %></td>

    <!-- right sidebar -->
    <% if content_for? :right_sidebar %>
    <td valign="top">
      <%= yield :right_sidebar %>
    </td>
    <% end %>
  </tr>
</table>

The code that uses the right sidebar is now simplified to

<% content_for :right_sidebar do %>
  BANNER GOES HERE
<% end %>

And this code will not have to change when the application layout is changed to something that does not use tables.

Watch how you cut!

Names objects after things, not actions!

June 22nd, 2012

The book Object Oriented Programming by Peter Coad and Jill Nicola is a very enjoyable introduction to object thinking. One of the gems it contains is the following:

The “-er-er” principle. Challenge any class name that ends in “-er.” If it has no parts, change the name of the class to what each object is managing. If it has parts, put as much work in the parts that the parts know enough to do themselves.

(For instance: if you have a PersonManager, delete it and move all of its method to class Person.)

Carlo Pescio wrote an excellent blog post on this. Since I discovered this, I found that choosing good names for objects makes it much easier to write good code. It’s much easier to decide where a responsibility belongs.

Now, the team I’m currently coaching with has a Rails project. As a consequence, I’m reading the latest things on object-oriented programming and Ruby and Rails. I read Objects On Rails and from there I discovered Sandi Metz and her chapter on interfaces in Ruby. Sandi really has a way with explaining objects; she has the knack of explaining deep principles in very easy terms; and she does it with examples. Three cheers for examples!

In this video from 2009 on SOLID Object-Oriented Design, Sandi shows how to take an everyday Ruby script and turn it into an OO thinking lesson. Please go and watch it; return when you have :-)

*              *
*

Did you notice the one wart in Sandi’s design? Yes, it’s the FtpDownloader class!! Can we get rid of the “-er” name? What would a better name be?

“Downloading” is an action. What are we doing the action to? We’re doing it to a file. So we could call the class File, but it’s not a great name. You don’t download just any file; you only download remote files. Bingo! Why don’t we call it RemoteFile? My design would have


RemoteFile.new("ftp://server/dir/file.txt").download

This gets rid of a bit of duplication in Sandi’s design. While she would have a configuration like

ftp_host: server
ftp_path: /dir
ftp_file: file.txt
downloader_class: FtpDownloader

I would just have

file_url: ftp://server/dir/file.txt

Instead of providing multiple FtpDownloader, HttpDownloader, SftpDownloader classes I would have a single class that decides how to download based on the protocol in the url.

class RemoteFile
  def initialize(url)
    @url = URI(url)
  end

  def download
    self.send("#{@url.scheme}_download")
  end
  
  def ftp_download; ... end
  def sftp_download;  ... end
  def http_download;  ... end
end

This makes it in my opinion more cohesive than having many single-method downloader classes. If you disagree, you could still use the trick that Sandi uses. But I would still not calling them FtpDownloader. Since they represent a protocol, I would maybe call them something like FtpProtocol. The name of a thing, not of an action.

   
class RemoteFile
  def initialize(url)
    @url = URI(url)
  end

  def download
    # if @url is "http://foo", we get HttpProtocol
    class_name = @url.scheme.capitalize + "Protocol"
    Kernel.const_get(class_name).new.get(@url)
  end
end

If I have an FtpProtocol object, I can ask it to get or put a file at a url. It becomes a magnet for all the functionality that belongs to the Ftp protocol. This would be more cohesive that having both FtpDownloader and FtpUploader :-)