Design by Contract vs. Test-Driven Development

A great many years ago I was fascinated by Bertrand Meyer’s book “Object-Oriented Software Construction.” One of the many remarkable things in that book is the idea of “Design By Contract”, where you specify what a method does by means of a logical pre- and postcondition. Consider the square root function:

  pre: x ≥ 0
  post: abs(y*y - x) < epsilon

This is a very good specification:

  • It’s efficiently executable.
  • The intent is clear.
  • It gives no hint about how to implement it, i.e., it does not contain design ideas.
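
To make the “efficiently executable” point concrete, here is a minimal Ruby sketch of the same contract as runtime checks. The name checked_sqrt and the value of EPSILON are assumptions made for illustration; an absolute tolerance like this one only suits inputs of modest magnitude.

```ruby
EPSILON = 1e-9 # assumed tolerance, not taken from the text

# Hypothetical wrapper: Math.sqrt guarded by the contract above.
def checked_sqrt(x)
  # pre: x >= 0
  raise ArgumentError, "precondition violated: x must be >= 0" unless x >= 0

  y = Math.sqrt(x)

  # post: abs(y*y - x) < epsilon
  raise "postcondition violated" unless (y * y - x).abs < EPSILON

  y
end
```

Note that the checks say nothing about how the root is computed; any algorithm could sit between them.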

Now I’m reading the Scrumban book by Corey Ladas. One thing Corey says is that Test-Driven Development is good, but not as good as Design By Contract; in fact, he says, TDD might be a stepping stone to DBC.

I have never met anyone who does DBC. This by itself does not mean a lot, as my working experience is not that broad. I’d be quite interested in reading about other people’s experiences. Anyway, I suspect that there are fundamental reasons why it’s not widely practiced. My hypothesis is that the square root is an exceptionally good DBC example, but most functions are not as easily specified by contract.

Consider for example the function “specified” by the following TDD-style tests:

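  # Assumes Minitest's spec DSL; squeeze is the function under test, not yet written.
  require "minitest/autorun"

  describe "squeeze" do
    it "returns empty string for empty string" do
      assert_equal "", squeeze("")
    end

    it "replaces sequences of equal characters with one" do
      assert_equal "abca", squeeze("aaabbccccaa")
    end
  end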

I think it’s quite clear how we should implement this function, even though we are given just two examples of its behaviour. How would we specify it with DBC? I’m not sure what the best way is. I thought about this for a long time (believe me) and this is the best I came up with:

  pre: true // any string is valid input
  post:
    let s be the input string.
    let s' be the output string.
    (∀ i: s'[i-1] ≠ s'[i])
    ∧
    (∀ i,j: i < j ⇒
      (∃ k,l: k < l ∧ s'[i] = s[k] ∧ s'[j] = s[l]))

You read it like this:

  • The output string does not contain consecutive equal values (that was easy!)
  • Two distinct values in the output string must have been present in the input string and in the same order.

Now is this a correct specification? Can I *prove* that this specification is correct? Well, this is a specification, so it’s supposed to be self-evidently correct. But is it? I don’t think it is self-evident at all. I think it takes a bit of thought to understand it. It’s not easy to find a way to prove it correct. One way is to look for counterexamples, that is, find a pair (s, s') that satisfies the specification yet contradicts my intuitive notion of what “squeeze” should do. (In fact, I can think of at least two examples that prove that this specification has holes. Can you find them?) In other words, it turns out that the only way to convince myself that this specification is correct is by testing it on carefully chosen examples!
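
Testing the specification on examples can itself be mechanised. The sketch below is a direct, unoptimised Ruby transcription of the two clauses of the postcondition; the name satisfies_spec? is hypothetical. It is only a checker for (input, output) pairs, not an implementation of squeeze.

```ruby
# Hypothetical checker: true when (input, output) satisfies the postcondition.
def satisfies_spec?(input, output)
  s = input.chars
  t = output.chars

  # (∀ i: s'[i-1] ≠ s'[i])
  no_adjacent_repeats = (1...t.length).all? { |i| t[i - 1] != t[i] }

  # (∀ i,j: i < j ⇒ ∃ k,l: k < l ∧ s'[i] = s[k] ∧ s'[j] = s[l])
  order_preserved = (0...t.length).to_a.combination(2).all? do |i, j|
    (0...s.length).to_a.combination(2).any? do |k, l|
      t[i] == s[k] && t[j] == s[l]
    end
  end

  no_adjacent_repeats && order_preserved
end
```

With this checker, satisfies_spec?("aaabbccccaa", "abca") holds as expected, but so does satisfies_spec?("aab", "a"), where the intuitive answer would be "ab". That is one pair of the kind the paragraph above asks for: the spec never requires every run of the input to appear in the output.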


  * * *

Well it’s not true that I can’t find a clearer specification. Consider this other one:

  pre: true
  post: let squeeze behave like f where
    f([]) = []
    f([x]) = [x]
    f([x,x|rest]) = f([x|rest])
    f([x,y|rest]) = x + f([y|rest]) if x ≠ y    

The notation [x,y|rest] means “a string that begins with character x, then character y, then 0 or more other characters.” My observations follow.

This is a purely mathematical specification of function squeeze, as pure as the previous one. It’s arguably clearer than the previous specification. I think this one is correct. I could test it against a few cases just to make sure, but it seems OK to me.

This spec can be executed efficiently, while checking an input-output pair against the other spec requires quadratic time.

But there’s a problem; this spec is actually a program. Once I have this spec I can use *this* for an implementation and work no more. I already have my implementation.
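
To illustrate how little separates this spec from a program, here is a sketch that transcribes the four equations into Ruby almost line by line. The helper name f mirrors the spec; working on an array of characters is an assumption of this sketch. Very long inputs could exhaust the stack, since the recursion is as deep as the input is long.

```ruby
# Transcription of the four equations; chars is an array of characters.
def f(chars)
  return [] if chars.empty?            # f([]) = []
  return chars if chars.length == 1    # f([x]) = [x]

  x, y, *rest = chars
  if x == y
    f([x, *rest])                      # f([x,x|rest]) = f([x|rest])
  else
    [x] + f([y, *rest])                # f([x,y|rest]) = x + f([y|rest]) if x ≠ y
  end
end

def squeeze(s)
  f(s.chars).join
end
```

Both of the TDD-style examples given earlier pass against this definition.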

In conclusion, this is my objection to DBC: I suspect that the square root is an exception; most methods are too complex to specify with pure logic, or require us to write a functional program that solves the problem. In both cases we are left with the need to test our specification on specific examples.

This is why I think TDD is for most purposes more effective than DBC. In general, concrete examples are by orders of magnitude simpler to write and more self-evident than universal statements. Most universal statements will have to be tested against examples anyway, or we wouldn’t be confident in their correctness.

Long live examples!

16 Responses to “Design by Contract vs. Test-Driven Development”

  1. Simone Busoli Says:

    I’m not doing DBC but I think you didn’t get it completely right.

    DBC, as far as I understood it, is about specifying /quantitative/ characteristics of return values (in the case of post-conditions), not about qualitative or semantics. If you have to write the same body of your method in the post conditions then it is certainly wrong.

    Post conditions are about “if you respect pre-conditions, you’ll get a return value which respects the post-conditions”. This leaves a lot to be defined, but in your example a meaningful post-condition would simply entail that the return value is a non-empty string, for non-empty inputs.

    In the end DBC does not replace TDD, nor testing in general.

  2. Ilias Bartolini Says:

    Hi Matteo,
    I tried Eiffel on a small dummy project many years ago.
    I agree with all your statements: specification by example is often more readable and easy to modify or maintain.

    I think that one missing point is that TDD and DBC are *not* alternative practices.
    If I’m using a language that supports DBC I would still use TDD.

    Contracts are code executed at runtime and their constraints checked also in your production code. You still can use tests if you want to execute them, specify examples, drive your emergent design, etc… still using tests on top of DBC.

    One reason why DBC may be considered better (I don’t know if this was Corey’s intention) is that its “safety net” and “check constraints” can be stricter than the ones you get by writing tests.

    I have heard many people say that DBC is a more difficult way of programming and that developers are not used to it… but many people were saying the same thing about TDD many years ago, and we found it’s not true :)

    ciao,
    Ilias

  3. matteo Says:

    @Simone: yes, I suspect most programmers use DBC as a way to specify necessary constraints (for instance, this parameter must be >0) rather than providing a precise, complete definition of what the function should do. But if you leave it at that, the value of DBC is small. I think the value of DBC is not in the runtime assertions; it’s in the /thought/ that goes in defining the contracts!

    @Ilias: Yes, I suppose you can combine the two techniques, and I do think DBC has value. But I think that formal specification is hard, and we don’t see enough examples! Where are the DBC screencasts? Where can I read a “DBC Recipes” handbook?

  4. Jacopo Says:

    when I first learnt DBC, one question quickly came to my mind: where do we write contracts? specification or documentation was initial answer, but I later thought automatic tests were an even better idea!

    more on examples as a specification medium: “Specification By Examples”, by (omniscient) Fowler
    * http://martinfowler.com/bliki/SpecificationByExample.html

    ciao!
    -jacopo-

  5. Carlo Pescio Says:

    (this is partially tongue-in-cheek; I’m not a heavy user of DBC, anyway)

    a design method qualifies as a design method :-) when it’s actually helping/poking you as you shape your code. If it doesn’t, it’s not really a design method.

    In your case, I would say that DBC is pushing you to stop thinking of squeeze as a function (yeah, I know, it’s kinda hard :-)).
    Once you give up and redefine squeeze as a class, perhaps SqueezedString, you’ll quickly see that:

    – the invariant of the class is the “easy part” of your spec (no consecutive equal values)

    – a Concat( char ch ) method is trivial to implement: if the last char is the same as ch, do nothing; otherwise, append ch.
    A useful/robust postcondition for Concat is then equally trivial:
    – ch must indeed be at the end of the string
    – the first len’ characters must be unchanged [where len’ is the length before Concat]
    – len’ <= len <= len'+1
    (the invariant must still be respected, as usual).
    [Note that this is a contract and not an executable spec]

    – if you really want a constructor (or Append method) taking a string, the safest/easier way would then be to base it on Concat. In this case, there is a strong temptation to underspecify the post-condition, lulled by the sense of security of a trivial implementation (a simple loop over a well-specified method). For instance, you may just check that every char in the source string is also in the squeezed string, and ignore the order constraint.

    – of course, a faster implementation based on a string builder might be needed, but the contract can withstand that.

    [Slightly] more seriously, every specification technique has its own sweet spot. DBC is hard to apply to functions with "complex" internal logic which cannot be abstracted in a contract, so it may push you to split your function or even to turn it into a class. This is either a boon or a hindrance, depending on how you're looking at things :-) [and honestly, depending on the problem you're trying to solve]
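
As a rough illustration of the SqueezedString outline in this comment, here is a minimal Ruby sketch. The class and method names follow the comment loosely and are otherwise hypothetical; the postconditions and the invariant are written as plain runtime checks, not as a real DBC facility.

```ruby
# Hypothetical class following the comment's outline; names are illustrative.
class SqueezedString
  def initialize
    @chars = []
  end

  def to_s
    @chars.join
  end

  # Append ch unless it equals the current last character.
  def concat(ch)
    old = @chars.dup # snapshot of the state before the call (len' = old.length)
    @chars << ch unless @chars.last == ch

    # post: ch is at the end of the string
    raise "post: ch must be last" unless @chars.last == ch
    # post: the first len' characters are unchanged
    raise "post: prefix changed" unless @chars.first(old.length) == old
    # post: len' <= len <= len' + 1
    raise "post: bad length" unless @chars.length.between?(old.length, old.length + 1)

    check_invariant
    self
  end

  private

  # invariant: no two consecutive characters are equal
  def check_invariant
    ok = @chars.each_cons(2).none? { |a, b| a == b }
    raise "invariant violated" unless ok
  end
end
```

Feeding a string through concat one character at a time, for instance with "aaabbccccaa".each_char { |c| s.concat(c) }, is the simple loop over a well-specified method that the comment mentions.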

  6. matteo Says:

    @Jacopo: good link, I think Fowler agrees with my post!

    @Carlo: thanks for your comment, it’s insightful. I didn’t see that DBC could be pushing towards breaking the function in “smaller pieces”, so to speak, that can be specified separately.

  7. Mario Says:

    I’m not a DBC practitioner, at least not in the form proposed by Bertrand Meyer, but I do tend to use a lot of assertions in my code, and I find them extremely useful.

    What I think you’re missing here is that assertions, postconditions and class invariants don’t have to completely specify a method’s behaviour: they can be very useful even if they check some necessary but not sufficient conditions on the result. So you can just write the parts of a contract that are easy to write, and ignore the rest. This is analogous to what happens with tests: the mere fact that your squeeze function returns the correct value for the input “aaabbccccaa” doesn’t mean it will always return the correct value. In other words, passing all tests is a necessary but not sufficient condition for the correctness of a piece of code. But that doesn’t mean carefully chosen tests are not useful. So why should it be any different for contracts?

    And if you subscribe to this view, you’ll find that one nice thing about assertions/preconditions/postconditions/… is that writing them requires very little effort (while testing often involves writing a lot of code) and that they have a pretty high ROI. So why not use them alongside testing?

  8. matteo Says:

    Hi Mario,

    it’s true that PRE and POST assertions don’t need to specify the output completely. But then it’s difficult to decide when a method has been specified precisely enough. Do we stop refining our assertions when they get “too difficult” to refine? If I assert that for “squeeze” the output string length must be ≤ the input string length, would it give me much value? Many assertions that are very easy to write are not worth much.

    On the other hand, a test proves that the method works correctly in 1 concrete case. If I choose the case well, it will give me confidence that many other similar cases work correctly.

    You say that you use a lot of assertions in your code and find them extremely useful. Do you write the assertions before the code? Do you find that the usefulness comes from the insight that comes when you write them, or from the runtime checks?

  9. Mario Says:

    I totally disagree with your statement that assertions that are easy to write are not worth much. Actually, that’s the kind of statement that I would expect from someone who has never really tried to use assertions in practice. But let me try to explain in more detail why I think that, and give you some examples.

    First of all, some quick answers: I write my assertion while I write the code, sometimes afterwards, never before. (and I do the same with tests: I find that neither gives any useful non-trivial insight in the design phase. Though I guess you would disagree here). Most of the time, if some assertion/contract is too complex, I usually don’t bother to code it (but see what I say at the end of the post). And in the case of a function like “squeeze” for example, I can’t see any contract worth enforcing, so I would just rely on a few tests there.

    Also, many assertions that I write are not pre/postconditions, but are just embedded in the body of functions. Whenever the code I’m writing depends on some non-trivial condition, but I’m not entirely sure the surrounding code maintains it, I usually prefer to enforce it with an assertion.

    Before writing this post I looked at some code I wrote recently that I believe can make a pretty good example, since it’s the kind of relatively complex algorithmic code that’s next to impossible for me to get right without some very serious testing. It’s the solutions to some (about 10 of them) of the puzzles posted by Facebook on their web site (see http://www.facebook.com/careers/puzzles.php). I found that my code had about one assertion every 20 “real” lines of code. So it’s hardly comprehensive coverage. Many functions/methods didn’t have any. All of them (except one) were just simple one-liners that took me at most ten seconds to write. Did they make testing redundant, or did they uncover all the bugs in the code? Surely not. Not even close to that, actually. But in practice they turned out to be very useful, allowing me to catch many bugs early and with very little effort.

    In this particular case, would I have caught those bugs anyway eventually? I guess so, given how thoroughly I tested the code (but again, read the end of the post). Although I’m sure it would have taken both more time and more effort. But in practice not all code allows for such a complete testing in any reasonable amount of time (just to give you an example, think of a method that updates a large and complex data structure. How do you test that?). And in my experience, assertions/contracts do help find bugs that would otherwise escape testing.

    To answer your last question, I find them useful in many ways. As you mentioned, the most important part is the runtime checks. But there are also other factors:

    – Most of the time they pinpoint the offending lines of code. That’s something I do not get with testing alone. If a function fails a test, more often than not that doesn’t clearly show me where the error originated.

    – When used as preconditions, they are a useful form of documentation. They remind me what conditions the parameters must satisfy.

    – They sometimes also help me to understand better how my code works, by writing down some conditions that I expect to be true during program execution.

    I also would like to spend a few words to defend another technique that I believe you dismissed too quickly in your post. You said that the second version of the postcondition is actually executable code, so that it could be used as an implementation. But quite apart from the fact that even in your simple case that’s not true (I believe that implementation would cause a stack overflow if the input string were long enough), that’s again missing the point. Let me use again the Facebook puzzles example. To test my code, in all cases except one, I didn’t write any standard tests. I just wrote two versions of it: I first coded a very simple brute-force solution, that wouldn’t have been usable in practice because it was too slow. Then I wrote the optimized algorithm, that was sometimes nearly an order of magnitude more complex. And then I randomly generated thousands of test cases, ran them through both algorithms and compared the results. In the end, every piece of code I submitted passed all the Facebook tests the first time, so the technique was certainly effective in removing bugs (Facebook sources say that on average only about 10% of the submissions they receive manage to pass all their tests). And all with a reasonably small effort. Now, you can make up your own mind about whether this technique should be classified as a form of testing (comparison testing, as I believe it’s usually called in the literature) or as a thorough use of postconditions. But I’m pretty sure that if I had tried to debug my code using only “example testing” not only would I have had to do much more work (and very boring work at that), but I wouldn’t have got anywhere close to the same level of testing.

    In general, while I agree that “example testing” is in many cases necessary, and that you can get reasonably bug-free code with it alone if you work hard enough at it, I also believe you’ll get better results with significantly less effort if you combine it with assertions/contracts and comparison testing (and maybe other techniques).
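
The comparison technique described in this comment can be sketched in Ruby on the squeeze example from the post. This is an illustration under assumptions, not the commenter's actual code: reference_squeeze and compare_on_random_inputs are hypothetical names, and Ruby's built-in String#squeeze stands in for the "optimized" version.

```ruby
# Simple reference implementation: easy to trust by inspection.
def reference_squeeze(s)
  out = +""
  s.each_char { |c| out << c unless out.end_with?(c) }
  out
end

# Runs both versions on random inputs; returns the first input on which
# they disagree, or nil when no disagreement is found.
def compare_on_random_inputs(trials: 1_000, seed: 42)
  rng = Random.new(seed)
  alphabet = %w[a b c] # small alphabet so that runs of equal characters are common
  trials.times do
    input = Array.new(rng.rand(0..20)) { alphabet.sample(random: rng) }.join
    expected = reference_squeeze(input)
    actual = input.squeeze # version under test (Ruby's built-in)
    return input unless expected == actual
  end
  nil
end
```

A fixed seed keeps any failure reproducible; a returned input is a ready-made example for a regression test.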

  10. matteo Says:

    Hi Mario,

    I have nothing in general against the use of assertions; I just find them less effective than TDD. They are a different technique and I find that while assertions are very useful when coding in a non-TDD way, I don’t miss them when I do TDD.

    That’s because when you do TDD you get to write code that’s much shorter; you have methods that rarely are longer than 10 lines. You get to break your problem into much smaller pieces that you test and think about more thoroughly.

    Even doing DBC would get you to much smaller pieces, as the comment by Carlo Pescio suggests (though I don’t have real experience in DBC).

    I’m all in favor of doing plenty of testing; I think your approach of comparing two algorithms is a very good idea. But we are comparing different things: *testing* techniques versus *design* techniques.

    DBC and TDD are *design* techniques, not test techniques. It happens that they produce test code as a side-product, which is good; but the goal is to get to a good design, not (primarily) to ensure that the code is correct.

    (About your remark on the recursive implementation of squeeze: sure, you could rewrite it as a tail-recursive function that does not stack-overflow, but then you’d make it less readable. The goal in DBC is to have obviously correct specs. The version I give is obviously correct from a mathematical point of view.)

  11. Balthazar Says:

    I have been contemplating getting into DbC, and searching for discussions about whether it’s really good in practice. That’s how I stumbled across this page. One reason for this is that for my latest project I started sprinkling my code with assertions all over, and it surprised me how I immediately found bugs that would have taken so much longer to track down earlier. Therefore, I started thinking about whether I should take it a step further, to write formal specifications that would act both as documentation for my suppliers, and at the same time act as early bug catchers during development.

    I must say it seems people don’t really understand what DbC is, and your post is a great example. It’s super easy to write the contract you want. Basically, you write a function that checks whether all characters in the output string are unique. Then, you use the function as your post-condition, something like this:

    require
    input string is not null: inputString /= NULL; // Important check!!!

    ensure
    all characters are unique: allCharactersUnique( returnValue );

    invariant
    // various sanity checks that you never thought could fail here.

    This is written in a sort of imagined DbC language that is a bit like Eiffel and a bit like C… :) The invariant is defined at class level, the pre and post conditions at method level. The precondition here is very important. It is not true, as you suggest, that all input is valid. The point is exactly that both the client and supplier behaviour is checked, as a result of your formal contracts. Your intent is clear, correctness is better ensured (but not guaranteed), and you both get to check supplier AND client behaviour. I don’t know much about test-driven development (maybe I should check it out), but in your examples given, (A) you only get to check for a specific set of inputs, (B) you only check supplier behaviour, not client behaviour.

    The main hazard during developing may not be that your suppliers don’t work correctly, because when you write them your first time, you will probably put much thought into it. But later you may forget the details about what the class requires, your clients may use them incorrectly, or some other bug somewhere causes a function to be called with the wrong arguments, or you may make a change somewhere that you didn’t think could affect other parts of the program. This will produce incorrect results also for the kind of tests you suggest as “TDD-style tests”, but you will catch them later, and the sooner they are caught the easier they are to find. This saves much time. All these points are of course even more true in a big project with many developers. The best thing would be if, as soon as something was in an unexpected state, the debugger would stop at that line where the error happened. This may not be so easy to achieve with any development style.

    Apart from that, I believe no matter what design principle you choose, it must be tested with a broad range of data. Each supplier must, of course be thoroughly tested on its own. If that’s what TDD means, then I suppose I agree with a previous poster, both techniques should be used together.

  12. Balthazar Says:

    Oh, and one more thing: One possible reason why DbC is not widespread, is that there aren’t any good DbC-solutions for existing languages out there. DbC is part of the design process, and should preferably be an integrated part of the language. Those languages that do support it natively, are at least not very widespread. And one precondition check I forgot to put into my example which is very important: Check that your string really is a valid string. Depending on the language used, a memory-related or some other type of bug might cause the string to contain invalid characters. If the function is only intended for valid strings, it should be specified in the contract. Again, this memory-related bug might be caught by the type of tests you suggest, but it will happen at a much later time in the execution, and will be much harder to track down. Particularly if you are using a library or some supplier code that you didn’t write yourself.

  13. matteo Says:

    Hello Balthazar,

    indeed your post-condition is very easy to write, but it’s not a very strong postcondition. I could implement your specification with

    function squeeze(x)
    return “abc”;
    end

    and still pass your assertions. You should add

    ensure
    all input characters can be found in output: …
    all output characters are in the same order as in the input: ….

    And I would find it difficult to write those assertions in a way that is super-clear what they mean.
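
The objection can be made concrete with a small Ruby sketch. All names here (all_characters_unique?, bogus_squeeze, with_postcondition) are hypothetical, and the check is a literal reading of the allCharactersUnique postcondition from the earlier comment.

```ruby
# Literal reading of the "all characters are unique" postcondition.
def all_characters_unique?(s)
  s.chars.uniq.length == s.length
end

# A deliberately wrong "implementation" that ignores its input.
def bogus_squeeze(_input)
  "abc"
end

# Runs the given block on input and enforces the postcondition on its result.
def with_postcondition(input)
  result = yield(input)
  raise "postcondition violated" unless all_characters_unique?(result)
  result
end
```

Calling with_postcondition("zzz") { |s| bogus_squeeze(s) } raises nothing, so the constant function satisfies the contract. Read this literally, the check would also reject the intended output "abca", in which "a" occurs twice; a check for adjacent duplicates only may be closer to what was meant.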

    I mean, your assertion is certainly a useful check. It’s not a sufficient check; you would need to test your code on several examples. And your assertion does not drive you toward a solution. As Carlo says above (please read his comment carefully, as he has some useful and non-obvious things to say), if it does not help you find the shape of your code, then it’s not a design method.

    If all you do is check simple properties of your output then what you have is a tool for protecting you from errors, not a design tool. Valuable, yes, but I would still want something else for a design tool.

    Matteo

  14. Balthazar Says:

    Yes, I know. I was focusing on the precondition part because that’s where your more obvious lack of understanding of what DbC means was to be found.

    I could pass your own test, like this:
    function squeeze(x)
    if x equals “” then return “”
    if x equals “aaabbccccaa” then return “abca”
    end

    Also, it seemed you didn’t understand that the check could be implemented as functions or class methods. Regarding your two additional checks, I might write each of them too as functions. Then the function name, plus the description, which is part of the contract, would be very self-explanatory in my view. It seems to me that you are confusing Can’t Solve All The Problems In The World with unusable. DbC can’t solve all the problems in the world, and you may need to write some additional specs (like you would in your TDD examples) but, not having tried it out, I believe it can be a good help in structuring your code.

    Of course, I would need to test my code with several examples. DbC is not meant to be a substitute for testing, rather an aid to it. When you write “And your assertion does not drive you toward a solution. As Carlo says above (please read his comment carefully, as he has some useful and non-obvious things to say), if it does not help you find the shape of your code, then it’s not a design method”, I couldn’t disagree more!! It most certainly does help my designing of the program. By clarifying my contracts, I am laying out the structure of the program in terms of obligations and benefits between each different part of the program, so it could be a huge help in defining the overall structure of the program. How do the parts play TOGETHER, that’s what DbC seems to focus on, not how should each feature be IMPLEMENTED. You seem to be focusing on implementation, not on interplay/communication between different parts of the software. The principle here is that we see each class, method, function, whatever, as either a supplier or a client, and in most cases I guess both, and the contracts help define the relationships and rules between them. Implementation solutions and software design are not the same thing. And manually parsing a long list of test cases like the one you suggest in your post doesn’t seem like a good help in figuring out how the main structure of the software works, or how to use or extend a class. On the other hand, preconditions seem like a stronger part of the DbC design paradigm than postconditions. That’s maybe the reason why I focus on the pres and you focus on the posts.

    I have tried, instead of reading Carlo’s post (which I will read later), to get a bit more educated on TDD. So far, it seems extremely tedious and unusable for my case. In my case I am working on a piece of software that requires a lot of concurrency and lock-free data structures, because it is a realtime app. It also requires a VERY complex GUI, which is what I am currently working on. I have been thinking about how I should write the test cases for the GUI, and have problems figuring out how. I would have to write some sort of simulator that simulates each mouseclick, mousedragged or mousemoved action, plus combinations with modifier keys, which would be perhaps thousands of events during a user session, and don’t forget that there are million trillion gazillion possible combinations of those… I thought of writing an event recorder, which would be a very different approach from the TDD process anyway, but that wouldn’t help me much. As soon as I change some sizes and values in my code, the “test cases” would be invalidated, because graphical elements (in this case cells in a tree-like matrix) would get laid out differently on the screen. So I believe I would have to write some sort of test cases manually (HUGE JOB) that were tightly integrated with the implementation of my graphical layout system, and then it turns out writing and maintaining the test cases could get much harder and more time-consuming than writing the actual code. Sure I could end up with something useful, but if I have then overlooked some user interaction sequence, and invoke that sequence during live testing, the already written test cases would not help me catch the bug.

    That’s where I thought DbC might come in handy. DbC focuses not only on how the suppliers behave under a constrained set of data, but on how the pieces play TOGETHER. So, if I trigger a new bug in my program during live testing, the error will be caught at a much earlier time. Of course this could be easily combined with an event recorder which helps me reproduce the error. This focus on how the parts play together is what we all rely on, actually, when we start using a new library. Your test cases can’t help ensure that you use your libraries right, but assertions in (most of) these libraries help ensure this to some degree, at least. And this functionality is in fact a very basic application of DbC design. But, if you really are able to predict all possible user interactions, and really stick to the TDD mantras in a die-hard way, I agree that TDD seems to be a far more bulletproof way of ensuring bug-free code than DbC.

  15. Balthazar Says:

    Update: I have been reading a bit more about TDD, and thought about the big obstacles to employing DbC in a project. The biggest obstacle to DbC is that there is no native language support in the C family of languages. This makes the benefits diminish. In addition, there are the many difficulties associated with writing postconditions. Say you have a hash table: writing a postcondition for the set function would be easy, but writing a postcondition for the get function much harder. But then again, in many cases DbC could be great. Say you had a layout system which could enforce right margin alignment of text, like what you have in the comments section on this page. In that case, writing the postcondition as a function that checks that each word is correctly aligned wouldn’t be hard. A class invariant could check whether the distance between each word on a line is always equal, which would work equally well for right alignment and block alignment.

    Anyway, because of the drawbacks to DbC, and after reading many posters being happy with TDD, I have decided I’ll try to adopt TDD instead as a start. The two systems are clearly not mutually exclusive, but TDD is available right out-of-the-box in my development API’s/IDE. Since my application is largely GUI-centric, there are parts of it that can’t be developed that way, but looking at my codebase, that is currently just a rather thin view layer that would need to be tested by a human. It is already fairly modular and decoupled as TDD-ers recommend. Also, my application requires much lock free concurrency, and needs to host third-party plug-ins, so I believe there’s no way I can get around live testing. DbC would be a great add-on for this, in addition to TDD, but it’s just not available. (at least *real* DbC).

  16. matteo Says:

    Balthazar,

    > Yes, I know. I was focusing on the precondition part because that’s where your more obvious lack of understanding of what DbC means was to be found.
    >
    > I could pass your own test, like this:
    > function squeeze(x)
    > if x equals “” then return “”
    > if x equals “aaabbccccaa” then return “abca”
    > end

    The precondition is obvious and uninteresting for the point I want to make in this article.

    Consider that the simple TDD test proves that the implementation works correctly in at least one case. This is a crucial point; your weak postcondition does not even do that. It does not matter that I can easily “fool” the TDD tests by answering exactly the cases that are being tested. When I do TDD, I write the tests to support my coding. TDD tests are whitebox tests, not blackbox tests.
