Eric writes about why, from the C# perspective, the .NET Framework doesn’t “really” support arrays with lower bounds other than zero. He trots out history as the reason why C# doesn’t support them, saying that:
Understanding how zero-based indexing works is the secret handshake of the programming world. […] We’re not going to try to change our brain wiring just because some young whippersnapper is having trouble remembering that the first index is zero. […] Or, to put it another way, developers have a huge investment in hardwired things like this, and changing them will not make your customer happy.
Of course, this argument only applies to programmers schooled solely in C-derived languages. For VB programmers, the opposite is true: prior to VB .NET 2002, VB had no “secret handshake” for programmers to learn. We allowed programmers to declare arrays that had bounds from 1 to 20, or from 1001 to 2000 or whatever. Even more puzzling, Eric points out that the CLR does support such arrays. So if there’s no historical problem and there’s no technical problem, why doesn’t VB have non-zero lower bound arrays?
You see, when the CLR designers sat down to come up with their array design, they were caught between two competing camps. On one side of the aisle was VB, which had arrays that could have any lower bound you liked, and on the other side of the aisle were the C-derived languages, which didn’t. One solution to this dilemma would have been to simply allow arrays to have non-zero lower bounds, and then just let the C-derived languages not allow their developers to use a lower bound other than zero. However, there was a major problem with this scheme. Many (though certainly not all) C-derived language programmers tend to be, well, a bit obsessed about performance. And the simple fact that an array could have a non-zero lower bound meant that the JIT optimizer would not be able to perform certain kinds of code optimization that C-derived language programmers are used to. This loss of performance (which in some cases could be non-trivial) was simply unacceptable to the C-derived languages.
To finesse this issue, the CLR designers came up with a compromise: there would be two kinds of arrays in the CLR. One kind, which I’ll call “arrays,” were just like normal VB arrays – they could have non-zero lower bounds. The other kind, which I’ll call “vectors,” were a restricted type of array: they could only be 1-dimensional, and their lower bound was fixed to be zero. This compromise allowed VB to have its arrays, and also allowed the C-derived languages to optimize the most common array case. Everyone was happy, right?
Well, not exactly. You see, the problem is that in the compromise scheme that the CLR devised there are actually two kinds of 1-dimensional arrays: vectors and 1-dimensional arrays. The only difference between the two is that the former has a fixed lower bound of zero and the latter doesn’t. But because vectors are so highly optimized, they don’t store their lower bound in the array instance – it’s just assumed. What this means is that vectors and 1-dimensional arrays are not assignment compatible. This means that you can’t take a 1-dimensional array and convert it to a vector. (Weirdly enough, I think the CLR actually allows you to convert a vector to a 1-dimensional array and they handle the fact that the two have different layouts, but I may be wrong on that point.)
This becomes a major problem because many APIs in the base class library are written using C#, which uses vectors. If VB only used non-zero lower bound arrays, then you wouldn’t be able to pass a 1-dimensional array from VB into many of the base class library APIs! This was clearly unacceptable. But what could we do about it?
We did consider exposing this array mess to the user through the type system. In this way of thinking, VB would have two kinds of arrays: those whose lower bound was always zero and those whose lower bound could be anything. However, we discarded this idea for two reasons:
- Having two different types of arrays that were not assignment compatible seemed incomprehensible from a user’s perspective. It seemed likely that nobody would ever keep them straight and remember which was supposed to be used when. We felt it would be a disaster.
- As Eric points out, even if C-derived language programmers can’t declare a non-zero lower bound array, they’d still have to deal with one that was passed to them by a VB programmer. Given the amount of thought many C-derived language programmers give to VB, this was likely to cause serious problems – even in our own base class libraries – when people attempted to use non-zero lower bound arrays in APIs that weren’t equipped to deal with them.
In the end, we decided that we had to give in and just have zero-lower bound arrays. It was a loss for many VB programmers, but the alternatives, given the situation, were worse…
(Actually, if you can believe it, this is a very simplified version of the long saga of arrays on the CLR. Suffice it to say, this was one of the most complicated aspects of getting .NET out of the door.)
If we can have compatibility libraries for stuff like Len, why not have a compatibility lib that did the conversion from a VB array to a vector, when the compiler finds an array that does not have a 0 based LBound??
I do miss my 1 based arrays…:-(
Considering the alternatives, you guys made the right choice. If only we could drop the ( ) for [ ] now…make code just a little clearer 🙂 I find once you learn the reason C arrays are 0 based, it always makes sense and never causes problems.
maybe i’m dense, but I still don’t see why you can’t have the compiler convert non-0 based arrays to 0 based. It’s just the matter of subtracting the base from the index every time the array is addressed. It could be done at compile time in VB.
Miles/Anand: The problem is that conversion doesn’t remove the need to have two separate types. The issue comes down to things like overriding methods defined in C# or implementing interfaces defined in C#: the CLR requires that you supply exactly matching types, which means that we have to surface the distinction in the language.
If it doesn’t make sense, I wouldn’t feel bad. You wouldn’t believe the number of times I had to go over this issue until it finally made sense. Maybe I’ll revisit this in a little while and see if there’s a simpler way to explain it…
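The translation Miles and Anand have in mind, and the reason it doesn't make the second type go away, can be modeled in a few lines. This is a Python sketch rather than CLR or VB code, and `OffsetArray` and `takes_vector` are made-up names for illustration. The wrapper stores a lower bound and subtracts it on every access, which is roughly the extra work a non-zero-based array implies. It is still a different type from the plain list it wraps, so anything that demands exactly a plain list (standing in for a vector here) will not accept it.

```python
class OffsetArray:
    """Hypothetical model of a 1-dimensional array with a stored lower bound."""

    def __init__(self, lower, upper):
        if upper < lower - 1:
            raise ValueError("upper bound must be at least lower - 1")
        self.lower = lower
        self.upper = upper
        # Backing storage is always zero-based, like a vector.
        self._items = [None] * (upper - lower + 1)

    def _offset(self, index):
        # The per-access cost: check both bounds, then subtract the base.
        if not self.lower <= index <= self.upper:
            raise IndexError(f"index {index} outside {self.lower}..{self.upper}")
        return index - self.lower

    def __getitem__(self, index):
        return self._items[self._offset(index)]

    def __setitem__(self, index, value):
        self._items[self._offset(index)] = value

    def __len__(self):
        return len(self._items)


def takes_vector(values):
    # Stand-in for an API that requires exactly the zero-based type.
    if type(values) is not list:
        raise TypeError("expected a plain zero-based list")
    return len(values)


years = OffsetArray(1001, 2000)
years[1001] = "first"
years[2000] = "last"
```

Calling `takes_vector(years)` raises `TypeError` in this model even though the data underneath is an ordinary zero-based list. The index arithmetic is the easy part; the mismatch in types is what would have to be surfaced in the language.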
Problem: Because vectors are so highly optimized, they don’t store their lower bound in the array instance – it’s just assumed.
Solution: Store the lower bound. What’s four bytes, compared to all this confusion? (Tell the whiners to shaddup and buy another gig. <g>)
Hi. I am trying to pass a C-style array to a COM object written in VB. Since the COM object expects a 1-based array I get a "lower bound must be 1" error whenever I execute the COM object. Does anybody know how to work back and forth between the 2 array bases? Thanks. And thanks for the great explanation of the problem.
This is just one of the cases where the MSDN Magazine guys basically screwed VB developers. Instead of thinking rationally and coming up with a solution that would favor both C# and VB.NET, they decided to sacrifice VB for their new Java-like language. If the CLR can’t handle something as simple as this, how can it be a "Universal Framework" for programming? How can other programming languages, with many more limitations, integrate with the CLR? I assume not even Microsoft knows the answer to this question. I guess at the end of every problem like this they said "well, let’s see how Java does it?"
I don’t have a problem with the 0-based array syntax. What I have a problem with is the weird way VB.NET uses the size parameter.
In pretty much any other language, allocating an array of size (6) means you get array elements 0,1,2,3,4,5. Fine and dandy.
But in VB.Net, for some inscrutable reason, using
dim arrStr(6) as string
gives you elements 0,1,2,3,4,5, AND 6. Does this type of behavior exist in any other language?
I don’t know about other languages. The reason is that the number in a(x) is not the size parameter, it’s the upper bound. As in "Dim x(0 to 5)". 0 to 5 is six elements, not 5. I realize that, given the lack of non-zero lower bound arrays, this is an anachronism, but there it is…
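The arithmetic behind that answer is short enough to write out. In the Python sketch below, `element_count` is a hypothetical helper, not anything in VB. It treats the number in the declaration as an inclusive upper bound, so the count is the upper bound minus the lower bound, plus one.

```python
def element_count(upper, lower=0):
    """Number of elements in an array declared with inclusive bounds lower..upper."""
    return upper - lower + 1


# Dim arrStr(6) As String: 6 is the upper bound, so the indexes run 0 through 6.
indexes = list(range(0, 6 + 1))
count = element_count(6)
```

With a lower bound of 1, as in the old "1 to 20" style of declaration, the same formula gives 20 elements, which is presumably why reading the number as an upper bound felt natural before the lower bound was pinned to zero.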
The die is long since cast, but I think that this is still a fundamental design error in .NET. 0-based indexing makes sense in the world of low-level systems programming in C where you’re more interested in working on buffers than collections of items, but it’s never made sense to me for high-level application development. I would suggest that the performance/optimization argument is a canard. Fortran has long supported arrays that are not zero-based, and nobody has ever accused Fortran of being slow; for many years, it was the gold standard for high-throughput mathematical computation.
(I learned about the differences between .NET arrays and vectors the hard way once upon a time, when I forcibly created a one-dimensional array and then found that I couldn’t use it in contexts that expected a vector. Grrr….)