Refactor to remove duplication
Sunday, March 17th, 2013The gist of the message I got from Extreme Programming in the early years of this century was that good code would just emerge if we followed the four rules of Simple Design, as advocated by Kent Beck:
The code is simple enough when it:
- Runs all the tests
- Has no duplicated logic
- States every intention important to the programmers
- Has the fewest possible classes and methods
in this order.
Various formulation of this list exist. Given that “Runs all the test” is granted (we don’t want broken code) and “Has the fewest possible classes and methods” is also granted (simplicity seems to imply “few things”), I often wondered how the other two rules can be used to generate good design.
J.B. Rainsberger says:
Put more simply, if you master removing duplication and fixing bad names, then I claim you master object-oriented design
That’s a bold claim! How can this be?
In TDD By Example, Kent Beck says that the third step of TDD is not just “refactor”, but refactor to remove duplication. The point then is “what is duplication?” Is that just duplicated lines of code?
Over time I started collecting interesting examples of duplication. It turns out that seeing duplication is a skill that has to be learned.
Obvious duplication
The most obvious form of duplication is duplicated lines of code. That is usually cured by extracting the common lines in a method. The extraction of a method is often followed by moving the method to a new class, or another existing class.
This is just one way the four rules push towards a better design. There are others.
Duplication in names
There is a beautiful simple exercise in chapter 1 of Essential Skills for Agile Development. It asks to remove duplication in code that looks like this one:
class Organization { String primaryContactFirstName; String primaryContactLastName; String primaryContactPhone; String secondaryContactFirstName; String secondaryContactLastName; String secondaryContactPhone; ... }
There is no “code” here, just data declarations. There is quite obvious duplication in the names though. The string “primaryContact” is repeated three times. The sequence “firstName, lastName, phone” is repeated two times. The point is to arrive at something like
class Organization { Contact primaryContact; Contact secondaryContact; ... }
Primitive obsession
Primitive Obsession is one of the classic smells from the Refactoring book. It happens when we use primitive types in place of our own types. You can see it in the Organization example above, where we use a String to represent a phone number, and again String to represent names of persons.
Primitive Obsession is one of the most common smells. Most developers are not even aware that it is a problem. It is also a smell that takes a long time to remove. And … it is a form of duplication. We are duplicating the choice of representation for a concept. We can see the cost of this choice when we decide that, say, we want to stop using Java native Date type and start using the Joda Time replacement; then it takes ages to convert all of our code to the new type.
But when we do this replacement, we are still falling in the same trap; if we change our mind again, it will again take a very long time to convert the code. What is the answer to this problem? Should we think long and hard before we start our project on which Date class we want to use, to minimize the chance of having to change it later? Would that work?
Of course not! What we really want is to be able to change our mind quickly, whenever we learn that our first choice was not adequate. Again, the key is in removing the duplication: make the choice in a single place; all of our code will use our own XDate class. Our XDate could use java.util.Date internally, or something else, but code outside XDate does not know and does not care.
(Aside. Using our own Date type has the beneficial effect that all the static method that are usually found in a number of DateUtils classes become instance methods in XDate. Another smell is removed! End Aside.)
Using third party APIs directly
The primitive obsession smell is just an example of a more general problem: when we use someone else’s API, we use it directly throughout our code. When we need, say, to download a data file from some service, what do we do? Do we start littering our code with calls to java.net.URL and java.net.HttpUrlConnection? Heavens, no! That would be committing ourselves to a particular way to solve the http connection problem, and duplicating it in all the places where we need to do it.
That would also be a violation of what Robert Martin calls the “Dependency Inversion Principle”: depend on abstractions, not on concretions. The java.net.URL is a concrete way to solve the “download a remote file” problem.
A better solution would be to hide the decision inside a class. Just define a RemoteFile class with a download method. This way we avoid duplication.
Alas, this is another smell that most developers don’t even know about.
Hiding design decisions
This brings us to a connection that was not obvious: removing duplication leads to the famous paper by David Parnas On the criteria to be used in decomposing systems into modules. Quote:
We propose instead that one begins with a list of difficult design decisions or design decisions which are likely to change. Each module is then designed to hide such a decision from the others. Since, in most cases, design decisions transcend time of execution, modules will not correspond to steps in the processing.
There you have it: a class is a way to codify and centralize a design decision. A design decision that is duplicated should be moved to its proper home in a single place. This is one of the things that an XPer must learn, in order to become effective at becoming faster than the customer :-)
Further reading: an essay by James Shore on why removing duplication is key to making continuous design work: it mitigates the risk that unforeseen circumstances will force us to change an earlier decision.