Mathematics cannot prove the absence of bugs

Everyone is familiar with Edsger W. Dijkstra’s famous observation that “Program testing can be used to show the presence of bugs, but never to show their absence!” It seems to me that mathematics cannot prove the absence of bugs either.

Consider this simple line of code, which appears in some form in *every* business application:

database.server.address=10.1.2.3

This is a line of software. It’s an assignment of a value to a variable. And there is no mathematical way to prove that this line is correct. This line is correct only IF the given IP address is really the address of the database server that we require.

Not even TDD and unit testing can help to prove that it’s correct. What would a unit test look like?

assertEquals("10.1.2.3", 
    config.getProperty("database.server.address"));

This test is just repeating the contents of the configuration file. It will pass even if the address 10.1.2.3 is wrong.

So what is the *only* way to prove that this line of code is correct? You guessed it. You must run the application and see if it works. This test can be manual or automated, but still we need a *test* of the live system to make sure that that line of code is correct.
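
For illustration, such an automated check of the live system might look roughly like the sketch below in Java (assuming JUnit and a PostgreSQL JDBC driver on the classpath; the file name, port, and credentials are made up):

import java.nio.file.Files;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;
import org.junit.Test;

// A "live" check, not a unit test: it passes only if the configured address
// really points at a reachable database server.
public class DatabaseAddressSmokeTest {

    @Test
    public void configuredAddressPointsAtOurDatabase() throws Exception {
        Properties config = new Properties();
        config.load(Files.newInputStream(Paths.get("app.properties")));
        String address = config.getProperty("database.server.address");
        String url = "jdbc:postgresql://" + address + ":5432/appdb";
        // getConnection() throws if the host is wrong or unreachable, so the
        // check fails exactly when the configuration value is wrong.
        try (Connection connection = DriverManager.getConnection(url, "app", "secret")) {
            connection.createStatement().execute("select 1");
        }
    }
}

Even a passing check like this is evidence rather than proof, as the comments below point out: the address could change a second after the test, or point at a wrong server that happens to answer.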

8 Responses to “Mathematics cannot prove the absence of bugs”

  1. J. B. Rainsberger Says:

    Indeed. A computer program can only constrain which parts can fit together which way, but can never be constrained (perfectly) to “do what you appear to want”, nor even “do what you appear to have intended to do”. We can use sampling (tests) to increase confidence that the program does what we think we want. We can use design principles to reduce the likelihood of misunderstanding, which should increase our confidence further.

    We can’t even run the program to prove that it is correct, because for any useful program, we’d have to run it longer than the projected age of the solar system to have a *proof*. :)

  2. Giorgio Sironi Says:

    A colleague of mine does not run the application at all, but runs a pre-deploy step that tries to connect to the configured hosts with the correct driver, before switching from the old to the new version (a sketch of such a check is at the end of this comment). This originates from his Java experience, where he had dozens of queue names in the configuration that were modified very frequently.

    Returning to the mathematics issue, you can substitute almost anything into the post and reach the same conclusion: "load testing" or "static analysis" or "code reviews of pull requests on GitHub" cannot prove the absence of bugs either, because I can still put a wrong configuration file into the production environment. As J.B. said, each of these techniques covers just a sample of what we need.

    That said, there are contexts where mathematics, in the sense of formal verification, can prevent bugs that example-based testing can't. Whenever concurrency or distributed systems are involved, you should definitely not try to solve consensus by inventing your own algorithm: just use Paxos. Do not try to invent your own replication mechanism; use something from the literature, or Aphyr and Jepsen will haunt you and demonstrate that you lose data, as happened with Redis.

    So mathematics is one of the tools; TDD is another; configuration testing before deployment is yet another. What matters is that we respond to the risks we encounter and introduce into our projects the best risk-mitigation strategy, considering cost and value. To introduce a practice you must know it; yet everyone is preaching TDD, but I don't see anyone saying that reading papers is important…
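
    For what it's worth, a minimal sketch of that pre-deploy connectivity check could look like the following (this is my reconstruction, not his actual script; the file name, port, and timeout are placeholders):

    import java.net.InetSocketAddress;
    import java.net.Socket;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.Properties;

    // Pre-deploy step: fail before switching versions if a configured host
    // is unreachable, instead of discovering it in production.
    public class PreDeployCheck {
        public static void main(String[] args) throws Exception {
            Properties config = new Properties();
            config.load(Files.newInputStream(Paths.get("deploy.properties")));
            String address = config.getProperty("database.server.address"); // e.g. 10.1.2.3
            try (Socket socket = new Socket()) {
                // connect() throws if the host is wrong or unreachable, so a bad
                // value blocks the deploy rather than breaking the new version.
                socket.connect(new InetSocketAddress(address, 5432), 2000);
            }
            System.out.println(address + " is reachable, safe to switch versions");
            // The same check can be repeated for every queue name in the configuration.
        }
    }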

  3. matteo Says:

    Hello Giorgio,

    please read my blog post in perspective. I wrote a whole Ph.D. thesis about how to write software that’s mathematically proved correct! And yet maths is not all.

    Your objection about "static analysis", "load testing" etc. is a good one, except that nobody in those fields claims that their field is the answer to all problems in software… as good old Dijkstra seems to have implied about maths. The irony of it all is that correctness is not the most important, and surprisingly, not the most difficult, part of writing software! Fitness for purpose, cost of maintenance over the years, and speeding up delivery time are all problems that are FAR more difficult to solve than correctness, and vastly less popular in academia as subjects of study.

  4. Giorgio Sironi Says:

    I agree that all these non-functional requirements are actually the reasons projects fail, first of all maintenance cost. But why choose this example (a configuration file) to pick on math? There are so many fitting examples, such as complex business rules with special cases, user interfaces, or integration with 3rd parties.

  5. matteo Says:

    Giorgio: I present this example because it's so simple… complex business rules can arguably be attacked with maths, and you might also stand a good chance of conquering them if you use maths for understanding and simplifying. Yes, I am aware of what Udi Dahan says about maths (http://www.udidahan.com/2014/11/16/watch-out-for-superficial-invariants/), but I think that if you use maths as Dijkstra meant it to be used, then it's a powerful tool for insight. I'm not picking on maths… I'm picking on the attitude that all you need is maths and that tests are a second-class citizen.

  6. Uberto Says:

    I think that’s exactly the point, a program without bugs is a program that works as expected, not a program which is always correct.
    Even if you test manually the db and it works it still doesn’t mean much, it can be that ip was changed 1 second after your test, or you are actually connecting to a wrong db instance that happens to have same data (this is happened to me).

  7. Stefano Says:

    Hi Matteo,
    If I understood your post well, I totally concur with Uberto. He touches on two very valid points:

    1. A program without bugs is a program that works as expected, not a program which is always correct; therefore I much prefer to see TDD as a means to express requirements, not a testing framework.

    2. If we could, by testing manually, ensure that a piece of software is always correct, then we could do it with a program too (and eventually with math). As said, this is impossible; therefore manual testing does not add anything in terms of correctness.

  8. matteo Says:

    Hi Stefano, just to reiterate: math can prove program correctness for all inputs. Correctness does not guarantee fitness for purpose. What we really care about is fitness for purpose; reasoning on the program is a way to make fitness for purpose more likely. Program testing is another way to make fitness for purpose more likely. TDD is part reasoning, part testing.

    Uberto is right that not even manual testing can assure you that you are connected to the right database. I guess that all this reasoning leads to the same conclusion, that is, that the absence of bugs is a matter of probability, never a boolean property that can be proved mathematically.

    Regarding your point 2: manual testing can check things that automated tests cannot: a human tester can look for different things every time. A mechanical test does the opposite: it's always checking for the same things. Both styles are useful.
