Let’s talk about ranges of integers. Quick: how many numbers are in the range 16 ≤ x ≤ 37, that is in the set {16, 17, …, 37}?
If you answered 21 then you got it wrong: the correct answer is 22. When you express a range inclusive of both ends, the formula to compute the number of elements is
size of [lower, upper] = upper - lower + 1
The “+1” in the above formula is the source of many “off-by-one” or fenceposts errors. Can we get rid of the “+1”? Yes, there is a way. Use asymmetric bounds, including the lower bound and excluding the upper bound. The formula becomes
size of [lower, upper) = upper - lower
I’m using square and round brackets as a shorthand notation for ranges with inclusive and exclusive bounds:
[a, b] = { x | a ≤ x ≤ b }
[a, b) = { x | a ≤ x < b }
Another reason in favor of asymmetric bounds: how do you express an empty range? With symmetric inclusive ranges, you can’t: the smallest range you can express contains one element. For instance [3, 3] contains only the number 3. With asymmetric bounds, you can express an empty range with, for instance, [3, 3), which is empty.
Asymmetric bounds work very well in conjunction with counting from zero. Java and C arrays are indexed from 0, so that the index range is [0, N). The upper bound is now equal to the number of elements! Expressing it with symmetric bounds gives [0, N-1], which is ugly because of the “-1”.
If you need to iterate on a C or Java array, the standard pattern is
int[] array = new int[N];
for (int i=0; i<N; i++) { ... }
You see how the body is executed when i==0
, through i==N-1
, but not when i==N
, since we are using i<N
as a boundary condition. How many times is the loop is executed? Exactly N
! It’s now easy to see that the iteration is performed the correct number of times.
Asymmetric bounds can be concatenated easily. Suppose you have a function that paginates results on a web page. You are on the page which contains results 30, 31, … 39. If you express the range with two parameters from and to, the page url would look like
http://example.com/results?from=30&to=40
And the links to the next and previous page would be
<a href="/results?from=20&to=30">previous page</a>
<a href="/results?from=40&to=50">next page</a>
where the bounds for the next or previous page are easily computed by adding or subtracting the page size from the current bounds. Compare with
http://example.com/results?from=30&to=39
<a href="/results?from=20&to=29">previous page</a>
<a href="/results?from=40&to=49">next page</a>
it’s more difficult to check that the bounds are correct here: you have to think where the “-1” must be applied.
So, if you define “+” as the operation of concatenation of ranges, you get this nice little law:
[a, b) + [b, c) = [a, c)
It’s this sort of properties that make it easy to combine software objects together. This helps towards the goal of programming with lego-like bricks.
In conclusion: express ranges with asymmetric bounds! You will avoid off-by-one errors, get rid of “+1″s from your code, and make your programs more modular.
References
The book C Traps and Pitfalls by Andrew Koenig has a good explanation of asymmetric bounds.