Category Archives: Visual Basic

Non-zero lower bounded arrays (the other side of the coin)

Eric writes why, from the C# perspective, the .NET Framework doesn’t “really” support arrays with lower bounds other than zero. He trots out history as the reason why C# doesn’t support them, saying that

Understanding how zero-based indexing works is the secret handshake of the programming world. […] We’re not going to try to change our brain wiring just because some young whippersnapper is having trouble remembering that the first index is zero. […] Or, to put it another way, developers have a huge investment in hardwired things like this, and changing them will not make your customer happy.

Of course, this argument only applies to programmers schooled solely in C-derived languages. For VB programmers, the opposite is true: prior to VB .NET 2002, VB had no “secret handshake” for programmers to learn. We allowed programmers to declare arrays that had bounds from 1 to 20, or from 1001 to 2000 or whatever. Even more puzzling, Eric points out that the CLR does support such arrays. So if there’s no historical problem and there’s no technical problem, why doesn’t VB have non-zero lower bound arrays?

Good question.

You see, when the CLR designers sat down to come up with their array design, they were caught between two competing camps. On one side of the aisle was VB, which had arrays that could have any lower bound you liked, and on the other side of the aisle were the C-derived languages, which didn’t. One solution to this dilemma would have been to simply allow arrays to have non-zero lower bounds, and then just let the C-derived languages not allow their developers to use a lower bound other than zero. However, there was a major problem with this scheme. Many (though certainly not all) C-derived language programmers tend to be, well, a bit obsessed with performance. And the simple fact that an array could have a non-zero lower bound meant that the JIT optimizer would not be able to perform certain kinds of code optimization that C-derived language programmers are used to. This loss of performance (which in some cases could be non-trivial) was simply unacceptable to the C-derived languages.

To finesse this issue, the CLR designers came up with a compromise: there would be two kinds of arrays in the CLR. One kind, which I’ll call “arrays,” were just like normal VB arrays – they could have non-zero lower bounds. The other kind, which I’ll call “vectors,” were a restricted type of array: they could only be 1-dimensional, and their lower bound was fixed to be zero. This compromise allowed VB to have its arrays, and also allowed the C-derived languages to optimize the most common array case. Everyone was happy, right?

Well, not exactly. You see, the problem is that in the compromise scheme that the CLR devised there are actually two kinds of 1-dimensional arrays: vectors and 1-dimensional arrays. The only difference between the two is that the former has a fixed lower bound of zero and the latter doesn’t. But because vectors are so highly optimized, they don’t store their lower bound in the array instance – it’s just assumed. What this means is that vectors and 1-dimensional arrays are not assignment compatible. This means that you can’t take a 1-dimensional array and convert it to a vector. (Weirdly enough, I think the CLR actually allows you to convert a vector to a 1-dimensional array and they handle the fact that the two have different layouts, but I may be wrong on that point.)
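You can actually see this split from VB itself through the System.Array factory methods. A quick sketch (the variable names are just illustrative):

' Create a 1-dimensional array with a lower bound of 1 - a "true" array in CLR terms.
Dim bounded As Array = Array.CreateInstance(GetType(Integer), New Integer() {10}, New Integer() {1})

' Create an ordinary zero-based 1-dimensional array - a vector.
Dim vector As Array = Array.CreateInstance(GetType(Integer), 10)

' The vector converts to Integer() without complaint...
Dim v() As Integer = DirectCast(vector, Integer())

' ...but the non-zero lower bound array is a different type entirely,
' so this cast throws an InvalidCastException at run time.
Dim b() As Integer = DirectCast(bounded, Integer())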

This becomes a major problem because many APIs in the base class library are written using C#, which uses vectors. If VB only used non-zero lower bound arrays, then you wouldn’t be able to pass a 1-dimensional array from VB into many of the base class library APIs! This was clearly unacceptable. But what could we do about it?

We did consider exposing this array mess to the user through the type system. In this way of thinking, VB would have two kinds of arrays: those whose lower bound was fixed at zero and those whose lower bound wasn’t. However, we discarded this idea for two reasons:

  1. Having two different types of arrays that were not assignment compatible seemed incomprehensible from a user’s perspective. It seemed likely that nobody would ever keep them straight and remember which was supposed to be used when. We felt it would be a disaster.
  2. As Eric points out, even if C-derived language programmers can’t declare a non-zero lower bound array, they’d still have to deal with one that was passed to them by a VB programmer. Given the amount of thought many C-derived language programmers give to VB, this was likely to cause serious problems – even in our own base class libraries – when people attempted to use non-zero lower bound arrays in APIs that weren’t equipped to deal with them.

In the end, we decided that we had to give in and just have zero-lower bound arrays. It was a loss for many VB programmers, but the alternatives, given the situation, were worse…

(Actually, if you can believe it, this is a very simplified version of the long saga of arrays on the CLR. Suffice it to say, this was one of the most complicated aspects of getting .NET out of the door.)

Names changed to protect the innocent

Robert lets the cat out of the bag that the name of the next version of the Visual Basic product is going to just be “Visual Basic 2005 xxx Edition” and that the official name of the language is going to be “Visual Basic.” No “.NET” in sight. Thus, my adage that

…the one thing you can count on at Microsoft is that there will be absolutely no consistency or constancy to names over time.

is proven once again.

Not being involved in the name change process in the slightest, I don’t know what the marketing logic behind the name change was, but I suspect that as we move further in time away from the COM/.NET schism there is not as much of a need to emphasize the unique .NET nature of VS, VB, C#, etc. It does mean that I’m probably going to have to change the title of the second edition of my book (assuming it does well enough to merit a second edition)…

Oh, and as far as I know, we’re keeping sequential numbers to refer to the language itself. So the language in the Visual Basic .NET 2002 product was Visual Basic .NET 7.0. The language in the Visual Basic .NET 2003 product was Visual Basic .NET 7.1. The language in the Visual Basic 2005 product is going to be Visual Basic 8.0. Confused yet?

The truth about choosing keywords

One comment that comes up over and over is “Why did you choose <fill in some keyword or syntax>? It looks <ugly, stupid, non-VB-like, inconsistent>!” So, how do we come up with new syntax?

We put all the options up on a dart board and throw darts at them blindfolded.

Seriously.

OK, not seriously. But sometimes it seems that way. You see, I’d like to pretend here that we have some formal intellectual process that allows us to arrive at the ideal syntax 100% of the time with complete accuracy. But the sad fact of the matter is that syntax, for better or for worse, is largely a matter of personal preference. Show the same syntax to 10 different people, and you’re going to get 10 different opinions. In the end, it almost comes down to just picking a damn syntax and sticking with it no matter what.

This isn’t to say that there aren’t considerations that come into play when thinking about syntax. Reusing characters already used in another context is not something that should be done lightly. The availability of keys on the majority of keyboards in the world is another consideration. (Our test lead took to joking for a while that we should use the euro character for something, just to give the Europeans a leg up on the Americans for once.) And the number of characters in a keyword is a real consideration. I personally have some regrets about the length of some of the keywords that we chose for inheritance, for example.

But in the end, it comes down to aesthetics and that is totally a personal thing. The bitshift operators are an excellent example of this: when we first discussed them in the VS 2002 timeframe, we ended up having to table the whole feature because none of us could agree on whether we should choose <<, Shl, BitShl, ShiftTheValueLeftTheFollowingNumberOfBits, etc. We would have ended up cutting the feature because of time anyway, but it’s an instructive example.

The problem with something like choosing a bitshift operator is that everyone’s cultural context comes into play. For people familiar with C++, choosing << and >> seems natural because that’s what they’re used to. For people familiar with assembly, Shl and Shr might seem more natural. But given that VB has never had them, what to choose? Even looking at the issue of “word vs character” doesn’t help here, because even though we do have a tradition of using words for operators (“Mod”), we also have a tradition of using obscure characters for operators, too (“^”). In the end, my view was that the fact that << and >> visually encoded some indication of their function made them slightly preferable over Shl and Shr, which seemed a bit cryptic to my eyes.
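To make the candidates concrete, here’s what the character-based spelling looks like in use:

Dim x As Integer = 1 << 4     ' shift left: x is 16
Dim y As Integer = 256 >> 3   ' shift right: y is 32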

My view eventually prevailed, by the way, because everybody else who disagreed with me left the team in the interim. Victory by attrition. But that just emphasizes the personal nature of the choice. In the end, I think it’s more of an artistic decision than a technical decision – less “how strong does this beam need to be to hold this ceiling up?” and more “what color should we paint the bedroom?”

Which doesn’t mean that we don’t agonize endlessly over what the right syntax is for something. The generics syntax is an excellent example of this, and deserves an entry of its own in the near future.

Background compilation, Intermission

I’m still planning on finishing my exegesis on background compilation, but I thought I would take a moment to address something that came up in the comments on Part 1 and which we get from time to time: why can’t I turn off background compilation?

Unfortunately, we usually get this question because a customer has got a project where the background compilation is causing the IDE to become unacceptably slow. There are, of course, theoretical maximums of what any program can handle (and users love pushing those maximums), but there have been problems with background compilation in the past. After shipping VS 2002, we discovered that there were some bugs in the way that we did our dependency calculations that resulted in too much decompilation occurring. This caused problems on some very large projects and we fixed it in VS 2003. If you’ve got a large project in VS 2003 that’s still got problems with responsiveness in the IDE and would be willing to give it to us to test against Whidbey, drop me a line and I’ll connect you with someone. We definitely want to fix any problems!

The reason, though, why you can’t turn off background compilation is that the IDE completely depends on it. The information provided by background compilation drives Intellisense, it enables the WinForms designer (as well as all other designers), it fills the dropdowns at the top of the window, it drives the object browser, it enables go to definition, etc, etc, etc. Essentially, without background compilation, the IDE becomes Notepad.exe with syntax coloring. (Coloring, interestingly enough, does not require background compilation because it’s entirely based off the lexical grammar of the language and requires no further knowledge of the code.) So that’s why we don’t let you turn it off.

But I’ll emphasize again: we believe there’s no reason that large projects shouldn’t perform well. And we work hard on ensuring that, so let us know if there are still problems!

The cat’s out of the bag…

Oh, and yeah, my book is finally available! I’ve actually got a real, printed copy of it sitting here on my desk and Amazon claims that they’re shipping it within 24 hours! (It may not be on the shelves of your local Barnes and Noble for a little while longer, though.)

Overall, I have to admit that I find this kind of frightening, somewhat akin to what it must be like taking your child off to their first day at school. Here’s something that’s been a part of you for quite some time, and now it’s out there at the mercy of the big, bad world. Will it make friends? Will it get beaten up and come home crying? I guess now only time is going to tell….

So, everyone, go out and buy a copy and tell me (and everyone else on Amazon) how wonderful it is! Here’s a handy link to help you get started:

Hope you all like it….

P.S. – There’s already someone selling a used copy of the book on Amazon. I’ve had my own copy something like 48 hours… how does that happen?

Why do hobbyists matter?

After Kathleen worried about losing the hobbyist programmer on .NET, Rory came back with the question “Should the hobbyist programmer matter to Microsoft?” His thesis, in a nutshell, was:

I say that we don’t worry about the hobbyists – don’t dissuade them from coding in .NET, but don’t cater to them either.

I understand where he’s coming from, but I think that the terminology is confusing the issue. When we talk about “hobbyist programmer,” it evokes images of guys tinkering in their garages or in their basements on the weekend. And, yeah, maybe if it was just the equivalent of a bunch of guys (or gals) building model trains or making furniture or rebuilding old cars, it wouldn’t matter so much. But the reality is that the hobbyist programmer doesn’t just program on the weekends – they’re also programming during the week at their “real” jobs.

Before I started working on VB, I worked on Access. And I cannot count the number of times that customer testimonials started along the lines of “I was fooling around with Access one day and managed to write this small app to help manage my group. Once my department found out about it, they started using it to manage the department. Now my whole company uses it!” One of the key aspects of Access’s success was this kind of “viral adoption” where some tinkerer used it to solve some local problem that ended up solving a company-wide problem. The same holds for VB – lots of VB applications in corporations started life as someone’s side project. As I put it in a recent presentation, “Throwaway applications have a way of becoming mission critical applications.” And where do those throwaway applications come from? Hobbyist programmers.

With the spread of computing into more and more industries, the people who don’t consider themselves programmers become more and more important because they’re the beachhead for “real programming” to make its way in. For example, the throwaway applications that hobbyists write ultimately help drive demand for professional programmers to come in and “professionalize” the applications so that they scale correctly for the corporation. Also, as hobbyist applications make companies more open to the benefits of technology, they open the door to commercial software that can augment or replace the homegrown applications and maybe do a better job. And, of course, hobbyist programmers usually need lots of help, which drives demand for websites, magazines, books, consultants, etc.

So, in much the same way that small businesses serve a vital function in keeping the economy going so that large corporations can thrive, hobbyists play a vital role in sustaining the ecosystem that supports the professional programmers. Even if the professional programmers don’t always appreciate that…

TryCast (aka. the ‘as’ operator)

About 8 months ago, I promised to explain why the C# ‘as’ operator was worth adding to VB and then never got back to it. While trying to figure out what comments I missed, I came across the promise again and figured I’d take the opportunity to rectify the situation.

In Whidbey, we’re going to introduce YACO (Yet Another Conversion Operator) called TryCast that’s equivalent to the C# ‘as’ operator. Why? It turns out that it’s useful in some specific, but not uncommon, situations. Let’s say that you want to write a method that accepts an object and, if it implements a particular interface, do something with it. For example:

Sub Print(ByVal o As Object)
    Dim PrintableObject As IPrintable

    If TypeOf o Is IPrintable Then
        PrintableObject = DirectCast(o, IPrintable)
        PrintableObject.Print()
    End If
    ...
End Sub

Ok, great, this works fine. The problem is, though, that we’re doing redundant work here. You see, when the CLR executes the If statement, it does a type check for the TypeOf expression. If o does implement IPrintable, it then goes ahead and casts o to IPrintable, which does another type check to ensure that the value really does implement IPrintable. So you’re doing two type checks, and type checks can be expensive. (When I say “expensive” here, I don’t mean expensive like “$760,000 for a Ferrari” expensive, I mean more like “$3.50 for a latte twice a day adds up over 365 days” expensive.)

What TryCast does is it allows you to combine the two type checks into one. TryCast, as its name suggests, will try the cast and, if it succeeds, return the value cast to that type. Otherwise, it returns the value Nothing. So you can rewrite the code above as:

Sub Print(ByVal o As Object)
    Dim PrintableObject As IPrintable = TryCast(o, IPrintable)

    If PrintableObject IsNot Nothing Then
        PrintableObject.Print()
    End If
    ...
End Sub

Voila, two type checks become one! Even if we hadn’t gotten any feature requests for this, we were planning to do it anyway because we found that several of our language helpers that deal with object values could be sped up ~5% just by eliminating the redundant type checks. So it’s not nothing. (So to speak.)

One question that might come to mind is: why not just optimize the first code into the second code under the covers instead of introducing a new operator? We did consider this at the compiler level, and the CLR could also choose to do it at the JIT level. However, while we could optimize this particular case, it’s easy to construct cases that would be difficult for the compiler or JIT to optimize. Which would mean that you could make a minor tweak to the structure of your code and suddenly lose the optimization without knowing it. In this case we figured explicitness was better…
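If you want to see the difference between the conversion operators at a glance, here’s a small sketch:

Dim o As Object = 42

' DirectCast throws an InvalidCastException if the conversion fails...
Dim s1 As String = DirectCast(o, String)   ' throws at run time

' ...while TryCast just hands back Nothing instead.
Dim s2 As String = TryCast(o, String)      ' s2 is Nothing

One consequence of this design is that TryCast only works on reference types – a value type can never be Nothing, so something like TryCast(o, Integer) is a compile-time error.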

Update 4/3/04: Made some corrections based on comments below.

IsNot and the fall of civilization

Back when I said that Whidbey is going to be introducing an IsNot operator, Karl raised some objections to the operator in the comments, to which I said:

Karl, it seems like your question isn’t so much why we’re adding “IsNot” but why we have “Is” in the first place. It’s a good question, I’ll try to address it in an entry soon.

…and then I never came back to it. Now that Daniel has picked up the thread again, let me return to the subject and elaborate just a little bit.

As I said in the comments, the question that Karl and Daniel raise is not so much “why IsNot?” but instead “why Is?” The fundamental reason for adding IsNot was that it’s irritating to have a comparison operator and not its inverse. For example, it’s entirely feasible to have a language with just Not, =, < and <=, but no language bothers to be “pure” that way because it’s a PITA to have to write “If Not x = 5 Then” or “If Not x <= 5 Then” and so on. Given the existence of the “Is” operator, having its opposite seems like, as Martha would say, “a good thing.” Charges of language pollution seem, on the face of it, to be a stretch.

OK, so let’s leave aside the question of IsNot for a moment. Why bother to have two comparison operators, “Is” and “=“? There’s a historical reason and a modern reason. The historical reason is that, prior to VS 2002, VB allowed classes to define a parameterless default property that represented the “value” of the class. This feature allowed, say, a TextBox object to declare that its Text property was its “value” property and then the user could write “TextBox1 = “foo”” and it would assign the string value “foo” to TextBox1.Text. However, since a class could be both an object and a value, you needed to have two forms of comparison and assignment to distinguish between the two. Assignment was handled by splitting it into “Let” vs “Set” assignment: “Let TextBox1 = “foo”” assigned a value to the default property, while “Set TextBox1 = New TextBox()” assigned a value to TextBox1 itself. Comparison was handled by splitting it into, yes, you guessed it, “=” and “Is”. The code “TextBox1 = “foo”” would do a value comparison between the string value “foo” and the default property TextBox1.Text. The code “TextBox1 Is TextBox2”, on the other hand, would do an object comparison between two TextBox objects.

When we made the leap to .NET, though, we ran up against a bit of a wall. Other languages such as C# and C++ didn’t support the concept of parameterless default properties, two types of assignment, two types of comparison, etc. This wasn’t a big deal for fields, but it was a huge issue for properties – whereas a VB property could define a Get, Let and Set, a C# property could define only a Get and a Set. We spent a lot of time trying to invent ways to map between the two schemes and finally came to the conclusion there was no good mapping. (Even today this is one of the friction points between COM and .NET that can make interop a pain.) So we gave in and dropped parameterless default properties from the language. This meant that we could also drop Let vs. Set assignment from the language. And, if we’d chosen to, we could have dropped “Is”. But we didn’t…. Why is that? Two reasons, one minor, one major.

The minor reason is that we could produce more optimal comparisons between values typed as Object (if someone’s really interested in this, I can explain, but it’s kind of obscure). The major reason was that we were anticipating a feature that didn’t make it into VS 2002 or VS 2003 but will make it into Whidbey: operator overloading. Take the situation where you’ve got a class that overloads the equality operator:

Class ComplexNumber
    ...

    Public Shared Operator = (ByVal c1 As ComplexNumber, ByVal c2 As ComplexNumber) As Boolean
        ...
    End Operator
End Class

Now, when you say “c1 = c2“, you’re always going to get value equality between two ComplexNumber instances, not reference equality. But let’s say that you really need to know whether c1 and c2 are the same instance, not just the same value. How do you do it? In VB, it’s simple. You say “If c1 Is c2 Then…“. What do you have to do in C#? You have to make sure you cast both values to object and then do the comparison: “if ((object)c1 == (object)c2) {…}“. Forget the cast, and you’ve accidentally slipped back into value comparison.
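In VB terms, the two comparisons look like this (assuming ComplexNumber has a parameterless constructor, which the snippet above elides):

Dim c1 As New ComplexNumber()
Dim c2 As ComplexNumber = c1

' Calls the overloaded = operator: value comparison.
If c1 = c2 Then ...

' Always a reference comparison, overload or no overload: True here
' because c1 and c2 refer to the same instance.
If c1 Is c2 Then ...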

So that’s why we kept “Is“ and “=“. And, yes, I think this is one of the places where our syntax is clearer than C#’s. Although everyone’s free to disagree…

Background compilation, part 1

Roy points to Philip’s complaint that VB still exhibits problems with multi-language solutions that have been around since the VS 2002 beta. Philip’s completely correct, and why this bug still hasn’t been fixed even though we’ve known about it since before VS 2002 shipped bears some explanation. Specifically, the problem is with a mistake we made when designing our background compilation system a very long time ago. Since I’ve been asked more than a few times about how background compilation works, this is an excellent chance to delve into that subject. So let me talk about background compilation for a while and then we’ll get back to Philip’s bug.

“Background compilation” is the feature in VB that gives you a complete set of errors as you type. People who move back and forth between VB and C# notice this, but VB-only developers may not realize that other languages such as C# don’t always give you 100% accurate Intellisense and don’t always give you all of the errors that exist in your code. This is because their Intellisense engines are separate, scaled-down compilers that don’t do full compilation in the background. VB, on the other hand, compiles your entire project from start to finish as Visual Studio sits idle, allowing us to immediately populate the task list with completely accurate errors and allowing us to give you completely accurate Intellisense. As Martha would say, it’s a good thing.

However, doing background compilation is a tricky prospect. The problem is that just as soon as you’ve finished compiling the project in the background, the user is likely to do something annoying like edit their code. Once they do that, the application you just finished compiling is now incorrect – it doesn’t reflect the current state of the user’s code anymore. So, the question is: how do you handle that? The brute force way would be to throw away the entire result of the compilation and start over again. However, since Intellisense depends on compilation being mostly complete, this is impractical – given a reasonably large project, you may never get the chance to provide Intellisense because by the time you’re almost done recompiling the whole project, the user has had the chance to type in another line of code, thus invalidating all the work you just did. You’ll never catch up.

To deal with this, we implement a concept we call “partial decompilation.” When a user makes an edit, instead of throwing the entire compilation state away, we figure out the smallest amount of stuff we can throw away and then keep everything else. Since most edits don’t actually affect the project as a whole, this means we can usually throw out minimal information and get back to being fully compiled pretty quickly. Here’s how we do it: each file in the project is considered to be in one of the following states at any one time:

  • NoState: We’ve done nothing with the file.
  • Declared: We’ve built symbols for the declarations in the file, but we haven’t bound references to other types yet.
  • Bound: We’ve bound all references to types.
  • Compiled: We’ve emitted IL for all the properties and methods in the file.

When a project is compiled, all the files in the project are brought up to each successive state. (In other words, we have to have gotten all files to Declared before we can bring any file up to Bound, because we need to have symbols for all the declarations in hand before we can bind type references.) When all the files have reached Compiled, then the project is fully compiled.

Now let’s say that a user walks up to a project that’s reached Compiled state and makes an edit to a file. The first thing that we have to do is classify the kind of edit that the user made. (Keep in mind that “an edit” can actually be an extremely complex one if the user chose to cut and paste one block of code over another block of code.) Edits can generally be broken down into two classifications:

  • Method-level edits, i.e. edits that occur within a method or a property accessor. These are the most common and also the easiest to deal with because a method-level edit can never affect anything outside of the method itself.
  • Declaration-level edits, i.e. edits that occur in the declaration of a type or type member (method, property, field, etc). These are less common and can affect anyone who references them or might reference them anywhere in the project.

When an edit comes through, it’s first classified. If it’s a method-level edit, then the file that the edit took place in is decompiled to Bound. This involves the relatively small work of throwing away all the IL for the properties and methods defined in the file. Then we can just recompile all the methods and we’re back to being fully compiled. Not a lot of work. Say, though, that the edit is a declaration-level edit. Now, we have to do some more work.

Earlier, when we were bringing files up to Bound state, we kept track of all the inter-file dependencies caused by the binding process. So if a file a.vb contained a reference to a class in b.vb, we recorded a dependency from a.vb to b.vb. When we go to decompile a file that’s had a declaration edit, we call into the dependency manager to determine what files depend on the edited file. We then decompile the edited file all the way down to NoState, because we have to rebuild symbols for the file. Then we go through and decompile all the files that depend on the edited file down to Declared, because those files now have to rebind all their name references in case something changed (for example, maybe the class the file depended on got removed). This is a bit more work, but in most cases the number of files being decompiled is limited and we’re still doing a lot less work than doing a full recompile.
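In pseudocode, the decision procedure looks something like this (none of these names are the real internal APIs – this is just an illustration of the process described above):

Sub OnEdit(ByVal editedFile As SourceFile, ByVal edit As Edit)
    If edit.IsMethodLevel Then
        ' Throw away only this file's IL; symbols and bindings survive.
        Decompile(editedFile, FileState.Bound)
    Else
        ' A declaration edit: this file's symbols must be rebuilt from scratch...
        Decompile(editedFile, FileState.NoState)

        ' ...and every file that bound against it has to rebind its references.
        For Each dependent As SourceFile In DependencyManager.GetDependents(editedFile)
            Decompile(dependent, FileState.Declared)
        Next
    End If

    ' The background thread then brings all files back up to Compiled.
End Sub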

This is kind of the high-level overview of how it works – there are lots of little details that I’ve glossed over, and the process is quite a bit more complex than this, but you get the idea. I’m going to stop here for the moment and pick up the thread again in a few days, because there’s a few more pieces of the puzzle that we have to put into place before we get to explaining the bug.