Eric explains why, from the C# perspective, the .NET Framework doesn’t “really” support arrays with lower bounds other than zero. He trots out history as the reason why C# doesn’t support them, saying that
Understanding how zero-based indexing works is the secret handshake of the programming world. […] We’re not going to try to change our brain wiring just because some young whippersnapper is having trouble remembering that the first index is zero. […] Or, to put it another way, developers have a huge investment in hardwired things like this, and changing them will not make your customer happy.
Of course, this argument only applies to programmers schooled solely in C-derived languages. For VB programmers, the opposite is true: prior to VB .NET 2002, VB had no “secret handshake” for programmers to learn. We allowed programmers to declare arrays that had bounds from 1 to 20, or from 1001 to 2000 or whatever. Even more puzzling, Eric points out that the CLR does support such arrays. So if there’s no historical problem and there’s no technical problem, why doesn’t VB have non-zero lower bound arrays?
Good question.
You see, when the CLR designers sat down to come up with their array design, they were caught between two competing camps. On one side of the aisle was VB, which had arrays that could have any lower bound you liked, and on the other side of the aisle were the C-derived languages, which didn’t. One solution to this dilemma would have been to simply allow arrays to have non-zero lower bounds, and then just let the C-derived languages not allow their developers to use a lower bound other than zero. However, there was a major problem with this scheme. Many (though certainly not all) C-derived language programmers tend to be, well, a bit obsessed with performance. And the simple fact that an array could have a non-zero lower bound meant that the JIT optimizer would not be able to perform certain kinds of code optimization that C-derived language programmers are used to. This loss of performance (which in some cases could be non-trivial) was simply unacceptable to the C-derived languages.
To finesse this issue, the CLR designers came up with a compromise: there would be two kinds of arrays in the CLR. One kind, which I’ll call “arrays,” was just like normal VB arrays: they could have non-zero lower bounds. The other kind, which I’ll call “vectors,” was a restricted type of array: they could only be 1-dimensional, and their lower bound was fixed to be zero. This compromise allowed VB to have its arrays, and also allowed the C-derived languages to optimize the most common array case. Everyone was happy, right?
Well, not exactly. You see, the problem is that in the compromise scheme that the CLR devised there are actually two kinds of 1-dimensional arrays: vectors and 1-dimensional arrays. The only difference between the two is that the former has a fixed lower bound of zero and the latter doesn’t. But because vectors are so highly optimized, they don’t store their lower bound in the array instance; it’s just assumed. What this means is that vectors and 1-dimensional arrays are not assignment compatible: you can’t take a 1-dimensional array and convert it to a vector. (Weirdly enough, I think the CLR actually allows you to convert a vector to a 1-dimensional array and handles the fact that the two have different layouts, but I may be wrong on that point.)
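To make the layout difference concrete, here is a rough sketch in Python. It is only a toy model, not the CLR’s actual representation: the class names `Vector` and `GeneralArray` and the function `sum_vector` are made up for illustration. It shows why a type with no stored lower bound can’t stand in for one that has it, and where the extra work on each element access comes from.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Vector:
    """Hypothetical model of a vector: 1-dimensional, lower bound assumed to be 0."""
    items: List[int]  # note: no lower-bound field is stored at all

    def get(self, index: int) -> int:
        if not 0 <= index < len(self.items):
            raise IndexError(index)
        return self.items[index]  # the index is already the storage offset


@dataclass
class GeneralArray:
    """Hypothetical model of a 1-dimensional array that stores its lower bound."""
    lower_bound: int
    items: List[int]

    def get(self, index: int) -> int:
        offset = index - self.lower_bound  # extra subtraction on every access
        if not 0 <= offset < len(self.items):
            raise IndexError(index)
        return self.items[offset]


def sum_vector(v: Vector) -> int:
    """Stands in for a library API written against vectors only."""
    if not isinstance(v, Vector):
        raise TypeError("expected a Vector")
    return sum(v.get(i) for i in range(len(v.items)))
```

With `v = Vector([10, 20, 30])` and `a = GeneralArray(1, [10, 20, 30])`, both `v.get(0)` and `a.get(1)` return the first element, but only `v` can be handed to `sum_vector`; passing `a` raises a `TypeError`, which is the toy equivalent of the two kinds of 1-dimensional array not being assignment compatible.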
This becomes a major problem because many APIs in the base class library are written using C#, which uses vectors. If VB only used non-zero lower bound arrays, then you wouldn’t be able to pass a 1-dimensional array from VB into many of the base class library APIs! This was clearly unacceptable. But what could we do about it?
We did consider exposing this array mess to the user through the type system. In this way of thinking, VB would have two kinds of arrays: those whose lower bound was fixed at zero and those that could have any lower bound. However, we discarded this idea for two reasons:
- Having two different types of arrays that were not assignment compatible seemed incomprehensible from a user’s perspective. It seemed likely that nobody would ever keep them straight and remember which was supposed to be used when. We felt it would be a disaster.
- As Eric points out, even if C-derived language programmers can’t declare a non-zero lower bound array, they’d still have to deal with one that was passed to them by a VB programmer. Given the amount of thought many C-derived language programmers give to VB, this was likely to cause serious problems – even in our own base class libraries – when people attempted to use non-zero lower bound arrays in APIs that weren’t equipped to deal with them.
In the end, we decided that we had to give in and just have zero-lower-bound arrays. It was a loss for many VB programmers, but the alternatives, given the situation, were worse…
(Actually, if you can believe it, this is a very simplified version of the long saga of arrays on the CLR. Suffice it to say, this was one of the most complicated aspects of getting .NET out of the door.)