A summary of Edsger Dijkstra's "Why numbering should start at zero"
This is cross-posted from my reading-notes blog, because I thought it would be nice to see one of the giants of computer science discussing clearly and simply a topic so fundamental to how we work with arrays.
Addendum: one thing I don’t note in the post is Dijkstra’s observation that some prominent (old) programming languages, such as FORTRAN, ALGOL-60, and Pascal, do not follow the conventions he recommends!
How should we denote an arbitrary sequence of consecutive natural numbers (where the natural numbers are the non-negative integers)? Edsger Dijkstra considers the four exhaustive possibilities, using the example 2, 3, …, 12:
a) 2 <= i < 13
b) 1 < i <= 12
c) 2 <= i <= 12
d) 1 < i < 13
He notes that only in a) and b) is the difference between the bounds equal to the length of the subsequence. This is important because it lets us calculate the length of the sequence easily and intuitively from the notation, and so gives us a reason to exclude c) and d).
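To make this concrete, here is a small Python check; the sketch is my own illustration, not anything from Dijkstra’s note. For each convention it collects the integers satisfying the bounds for the example 2, 3, …, 12 and compares the resulting length with the difference of the bounds:

    # Check which conventions make (upper bound - lower bound) equal the length.
    candidates = range(-5, 20)  # any range containing the example 2..12 will do

    conventions = [
        ("a) 2 <= i < 13",  2, 13, lambda i: 2 <= i < 13),
        ("b) 1 < i <= 12",  1, 12, lambda i: 1 < i <= 12),
        ("c) 2 <= i <= 12", 2, 12, lambda i: 2 <= i <= 12),
        ("d) 1 < i < 13",   1, 13, lambda i: 1 < i < 13),
    ]

    for label, lower, upper, holds in conventions:
        seq = [i for i in candidates if holds(i)]
        print(f"{label}: length = {len(seq)}, upper - lower = {upper - lower}")
    # Only a) and b) report a bound difference (11) equal to the length (11).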
Then he notes that because there is a smallest natural number (0), b) cannot denote the empty sequence (a sequence with 0 elements) without using non-natural numbers, which is a reason for preferring a) to b):
a) 0 <= i < 0
b) -1 < i <= -1
(We could try something like 1 < i <= 0 to denote an empty sequence using notation b), but that would violate the requirement that nondecreasing sequences be denoted using left-to-right nondecreasing numbers.)
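As it happens, Python’s built-in range(lo, hi) follows convention a), i.e. lo <= i < hi, which makes the empty-sequence point easy to demonstrate (again, the example is mine, not Dijkstra’s):

    # Convention a): range(lo, hi) means lo <= i < hi.
    print(list(range(0, 0)))                          # [] -- empty, both bounds natural

    # Convention b) (lo < i <= hi): the empty sequence starting at 0
    # forces both bounds to -1, outside the natural numbers.
    print([i for i in range(-3, 3) if -1 < i <= -1])  # [] -- only via the bound -1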
This brings us to the question of how to denote the index of the starting element of a sequence of length N: 0 or 1? If we use the convention we’ve already decided upon, we get the following when we start with 0 and 1, respectively:
0 <= i < N
1 <= i < N + 1
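Continuing the Python illustration (the name xs is my own), here is what the two starting conventions look like for a sequence of length N = 3; note that starting at 1 drags in the bound N + 1, which exceeds the number of elements:

    xs = ["p", "q", "r"]          # a sequence of length N = 3
    N = len(xs)

    zero_start = range(0, N)      # 0 <= i < N
    one_start  = range(1, N + 1)  # 1 <= i < N + 1 -- needs the bound N + 1 > N

    print(list(zero_start))       # [0, 1, 2]
    print(list(one_start))        # [1, 2, 3]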
Echoing the prior discussion of denoting sequences, choosing 1 requires using a number larger than the number of elements in the sequence; choosing 0 does not. Hence, as Dijkstra observes, we can define an element’s index to denote the number of elements preceding it.
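A final Python sketch of that observation (my illustration, with a made-up sequence): slicing off everything before index i yields exactly i elements, so the index really is the count of preceding elements.

    xs = ["a", "b", "c", "d"]
    for i, x in enumerate(xs):
        # xs[:i] is everything before xs[i]; with 0-based indexing its
        # length is exactly i, the element's index.
        assert len(xs[:i]) == i
        print(f"index {i}: {i} element(s) precede {x!r}")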