Monthly Archives: March 2004

Background compilation, part 2

When I last left off with background compilation, we were talking about how compilation and decompilation worked within a project. But what about between projects? This is where things start to get a little more interesting and problematic.

The simplest case is two VB projects that have a reference between them. Because they’re both source code projects, we can handle compilation and decompilation by using the same scheme that we use inside of projects – in essence, all the VB projects in a solution look like one big project that happens to produce multiple assemblies. (This is why Mike has his problem with the compiler complaining about two types with the same name in two projects that don’t reference each other, although we’re planning to fix that particular error for Whidbey.) So we just track dependencies between files regardless of which project they are in and everything works fine. There are a few added complications because now we need to track a project’s compilation level, but they aren’t really worth mentioning.
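The cross-project scheme can be sketched in a few lines (hypothetical names, not the actual compiler's data structures): all files across all VB projects go into one dependency graph, and editing a file decompiles everything that transitively depends on it, regardless of project boundaries.

```python
# Hypothetical sketch: one dependency graph spanning every VB project in
# the solution, so decompilation propagates across project boundaries.
from collections import defaultdict

class DependencyGraph:
    def __init__(self):
        # file -> set of files that depend on it
        self.dependents = defaultdict(set)

    def add_dependency(self, file, depends_on):
        self.dependents[depends_on].add(file)

    def files_to_decompile(self, edited_file):
        # Everything that transitively depends on the edited file.
        seen, stack = set(), [edited_file]
        while stack:
            f = stack.pop()
            for dep in self.dependents[f]:
                if dep not in seen:
                    seen.add(dep)
                    stack.append(dep)
        return seen

g = DependencyGraph()
g.add_dependency("VB2/Util.vb", "VB1/Foo.vb")  # VB2 file uses a type in VB1
g.add_dependency("VB3/App.vb", "VB2/Util.vb")  # dependency crosses projects
print(g.files_to_decompile("VB1/Foo.vb"))
```

Editing `VB1/Foo.vb` knocks down the dependent file in VB2 and, transitively, the one in VB3, even though they live in three different projects.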

Now let’s talk about referenced assemblies such as mscorlib, System, System.Windows.Forms, etc. When you reference a compiled assembly, we have to build a symbol table for it just like we have to for a source project. The way we handle this is to treat each assembly as its own “metadata project” with a single file in it (i.e. the assembly). Metadata projects go through the same compilation steps as source projects: NoState, Declared, Bound and Compiled. However, the compilation process for a metadata assembly can be simplified considerably, for the following reasons:

  • Since a metadata assembly is already compiled, the Compiled state has no meaning.
  • Metadata assemblies rarely, if ever, change.
  • Metadata assemblies only ever reference other metadata assemblies (since VS doesn’t support circular builds, at least not in the IDE).

Since the Compiled state makes no sense for metadata assemblies, we ignore it. And since metadata assemblies don’t really ever change (or change very infrequently) and only depend on other metadata assemblies, we can skip much of the decompilation work that I described in the first part of this discussion. In fact, we can make it very simple: when any metadata project changes, we just decompile every metadata project in the solution to NoState and decompile every source project down to Declared. Essentially, we throw away all metadata information and start over. Since metadata projects don’t really change often, usually only when you add or remove a reference, this is a reasonable simplification.
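The "throw it all away and start over" rule is simple enough to sketch directly (hypothetical names and state values, purely illustrative):

```python
# Hypothetical sketch of the simplified rule described above: when any
# metadata project changes, all metadata projects go back to NoState and
# all source projects are knocked down to Declared.
NOSTATE, DECLARED, BOUND, COMPILED = range(4)

class Project:
    def __init__(self, name, is_metadata):
        self.name = name
        self.is_metadata = is_metadata
        self.state = BOUND  # assume everything is fully bound

def on_metadata_change(projects):
    for p in projects:
        p.state = NOSTATE if p.is_metadata else DECLARED

projects = [Project("mscorlib", True), Project("System", True),
            Project("MyApp", False)]
on_metadata_change(projects)
print([(p.name, p.state) for p in projects])
```

Crude, but cheap: since references change rarely, paying the full re-import cost on each change looks like a fine trade-off. Right up until the next paragraph.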

Right? Wrong.

Unfortunately, much of the logic of the previous few paragraphs is faulty because it neglects something fairly significant: multi-language solutions. Let’s say that you have a solution with a VB project and a C# project, and the VB project references the C# project. How does that C# project look to the VB project? It’s not a source project because the VB compiler only understands VB code. Yes, that’s right, class, it looks like a metadata project. And C# projects completely violate 2 of the 3 bullet points I listed above: they can change frequently (i.e. with every rebuild) and they can have references into VB source projects.

Here’s where the train wreck happens. Let’s say you’ve got three projects: VB1, VB2 and CS1. VB1 has a reference to VB2 and CS1. CS1 has a reference to VB2. When we go to load in the metadata for CS1, the compiler finds a reference to type Foo in VB2. But because we assume that metadata projects can’t refer to source projects, we only look for Foo in the other metadata projects and fail to find it. So we mark Foo as a bad type. Now VB1 tries to call some method in CS1 that returns a Foo. As we compile the call, we notice that the type Foo is bad, so we generate an error that says something along the lines of “We can’t find ‘Foo’ in ‘VB2’. Add a reference to ‘VB2.'” And, of course, the user ends up scratching his head because VB1 already has a reference to VB2. (Even better, it’s possible to get error messages along the lines of “Can’t convert type ‘Foo’ to type ‘Foo’.” if you try and use a Foo you got from CS1 and a Foo you got from VB2 together.)
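The “‘Foo’ is not ‘Foo’” error falls out naturally if type identity is symbol identity. A toy illustration (not the compiler’s actual data structures):

```python
# Toy illustration: if the compiler compares types by symbol identity,
# two separately-loaded symbols for the same type won't match, producing
# errors like "Can't convert type 'Foo' to type 'Foo'".
class TypeSymbol:
    def __init__(self, name, origin):
        self.name = name
        self.origin = origin  # which symbol table loaded this type

# The real Foo, built from VB2's source.
foo_from_source = TypeSymbol("Foo", "VB2 (source project)")
# A second, disconnected Foo, built while importing CS1's metadata,
# because the lookup never searched source projects.
foo_from_metadata = TypeSymbol("Foo", "CS1's view of VB2 (metadata)")

def can_convert(frm, to):
    # Identity comparison, as type systems typically do for named types.
    return frm is to

print(can_convert(foo_from_metadata, foo_from_source))  # False
```

Both symbols print as “Foo”, so the error message looks absurd to the user, but to the compiler they are two unrelated types.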

OK, you say, that’s bad. But why don’t you just allow CS1 to look up Foo in VB2’s symbol table? Then this would all work. And it would. Until the first time you actually edited VB2 and caused the project to decompile. If you’ll remember, we assumed that metadata projects don’t need to participate in decompilation. So now VB2 decompiles and CS1’s symbol table is left with a bogus pointer off into hyperspace because it didn’t know how to handle the decompilation properly. (We could just decompile the world at this point, but the performance of that would be horrendous.)

The real fix is to make metadata projects work like source projects – make them track inter-file dependencies and decompile properly. The problem is that this fix requires a fundamental reworking of our project system, which is a major, major undertaking because we made so many faulty assumptions about metadata projects. Rewriting a basic piece of the compiler like this requires an extended period of time for stabilization and a massive amount of testing to ensure we got everything right. Since we found this very late in the VS 2002 cycle, it was too late to make a change of this magnitude without causing a significant slip to the entire VS/.NET/ASP.NET product agglomeration. And the VS 2003 cycle was way too short to do this kind of work. This will be fixed, I can promise everyone, for VS 2005.

All that said, I think it’s worth acknowledging that this was a screwup on our part, plain and simple. There are reasons why multi-language solution testing came online so late, but none of them really justifies the pain and annoyance that this has caused (and still causes) for customers. It doesn’t make it better, but we do apologize for not fixing this problem in time. As always, we strive to do better.

I will add that there is a workaround for the problem. If you create a reference between VB projects using a file reference (i.e. using the “.NET” tab and browsing to the actual DLL) instead of a project reference (i.e. using the “Projects” tab), then you’ll force the compiler to see all the references as metadata projects and you won’t get weird errors. The downside is that you’ll have duplicate symbol tables for VB projects that you reference that are also in your solution. It’s an imperfect solution, but it’s all there is for the moment.

So ends the lesson on background compilation… All of this material will be on the test.

Non-zero lower bounded arrays (the other side of the coin)

Eric writes why, from the C# perspective, the .NET Framework doesn’t “really” support arrays with lower bounds other than zero. He trots out history as the reason why C# doesn’t support them, saying that

Understanding how zero-based indexing works is the secret handshake of the programming world. […] We’re not going to try to change our brain wiring just because some young whippersnapper is having trouble remembering that the first index is zero. […] Or, to put it another way, developers have a huge investment in hardwired things like this, and changing them will not make your customer happy.

Of course, this argument only applies to programmers schooled solely in C-derived languages. For VB programmers, the opposite is true: prior to VB .NET 2002, VB had no “secret handshake” for programmers to learn. We allowed programmers to declare arrays that had bounds from 1 to 20, or from 1001 to 2000 or whatever. Even more puzzling, Eric points out that the CLR does support such arrays. So if there’s no historical problem and there’s no technical problem, why doesn’t VB have non-zero lower bound arrays?

Good question.

You see, when the CLR designers sat down to come up with their array design, they were caught between two competing camps. On one side of the aisle was VB, which had arrays that could have any lower bound you liked, and on the other side of the aisle was the C-derived languages, which didn’t. One solution to this dilemma would have been to simply allow arrays to have non-zero lower bounds, and then just let the C-derived languages not allow their developers to use a lower bound other than zero. However, there was a major problem with this scheme. Many (though certainly not all) C-derived language programmers tend to be, well, a bit obsessed about performance. And the simple fact that an array could have a non-zero lower bound meant that the JIT optimizer would not be able to perform certain kinds of code optimization that C-derived language programmers are used to. This loss of performance (which in some cases could be non-trivial) was simply unacceptable to the C-derived languages.

To finesse this issue, the CLR designers came up with a compromise: there would be two kinds of arrays in the CLR. One kind, which I’ll call “arrays,” were just like normal VB arrays – they could have non-zero lower bounds. The other kind, which I’ll call “vectors,” were a restricted type of array: they could only be 1-dimensional, and their lower bound was fixed to be zero. This compromise allowed VB to have its arrays, and also allowed the C-derived languages to optimize the most common array case. Everyone was happy, right?

Well, not exactly. You see, the problem is that in the compromise scheme that the CLR devised there are actually two kinds of 1-dimensional arrays: vectors and 1-dimensional arrays. The only difference between the two is that the former has a fixed lower bound of zero and the second doesn’t. But because vectors are so highly optimized, they don’t store their lower bound in the array instance – it’s just assumed. What this means is that vectors and 1-dimensional arrays are not assignment compatible. This means that you can’t take a 1-dimensional array and convert it to a vector. (Weirdly enough, I think the CLR actually allows you to convert a vector to a 1-dimensional array and they handle the fact that the two have different layouts, but I may be wrong on that point.)
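The layout difference is easy to see in a small simulation (plain Python with a hypothetical class name, not the CLR’s actual representation): a general array has to carry its lower bound and subtract it on every access, which is exactly the per-access overhead a zero-based vector avoids.

```python
class GeneralArray:
    """Simulates a CLR-style array with an arbitrary lower bound.
    Every access pays for a bounds adjustment; a vector's lower bound
    is fixed at zero, so it can skip the subtraction entirely."""
    def __init__(self, lower, length):
        self.lower = lower              # stored in the instance
        self._data = [None] * length

    def __getitem__(self, index):
        return self._data[index - self.lower]

    def __setitem__(self, index, value):
        self._data[index - self.lower] = value

# Like the classic VB declaration "Dim a(1 To 20)":
a = GeneralArray(1, 20)
a[1] = "first"
a[20] = "last"
print(a[1], a[20])
```

Since a vector never stores the lower bound at all, an object laid out like `GeneralArray` can’t simply be reinterpreted as one – hence the assignment incompatibility.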

This becomes a major problem because many APIs in the base class library are written using C#, which uses vectors. If VB had used only general arrays (the kind that can have non-zero lower bounds), then you wouldn’t be able to pass a 1-dimensional array from VB into many of the base class library APIs! This was clearly unacceptable. But what could we do about it?

We did consider exposing this array mess to the user through the type system. In this way of thinking, VB would have two kinds of arrays: those that could have lower-bounds of zero and those that couldn’t. However, we discarded this idea for two reasons:

  1. Having two different types of arrays that were not assignment compatible seemed incomprehensible from a user’s perspective. It seemed likely that nobody would ever keep them straight and remember which was supposed to be used when. We felt it would be a disaster.
  2. As Eric points out, even if C-derived language programmers can’t declare a non-zero lower bound array, they’d still have to deal with one that was passed to them by a VB programmer. Given the amount of thought many C-derived language programmers give to VB, this was likely to cause serious problems – even in our own base class libraries – when people attempted to use non-zero lower bound arrays in APIs that weren’t equipped to deal with them.

In the end, we decided that we had to give in and just have zero-lower bound arrays. It was a loss for many VB programmers, but the alternatives, given the situation, were worse…

(Actually, if you can believe it, this is a very simplified version of the long saga of arrays on the CLR. Suffice it to say, this was one of the most complicated aspects of getting .NET out of the door.)

Names changed to protect the innocent

Robert lets the cat out of the bag that the name of the next version of the Visual Basic product is going to just be “Visual Basic 2005 xxx Edition” and that the official name of the language is going to be “Visual Basic.” No “.NET” in sight. Thus, my adage that

…the one thing you can count on at Microsoft is that there will be absolutely no consistency or constancy to names over time.

is proven once again.

Not being involved in the name change process in the slightest, I don’t know what the marketing logic behind the name change was, but I suspect that as we move further in time away from the COM/.NET schism there is not as much of a need to emphasize the unique .NET nature of VS, VB, C#, etc. It does mean that I’m probably going to have to change the title of the second edition of my book (assuming it does well enough to merit a second edition)…

Oh, and as far as I know, we’re keeping sequential numbers to refer to the language itself. So the language in the Visual Basic .NET 2002 product was Visual Basic .NET 7.0. The language in the Visual Basic .NET 2003 product was Visual Basic .NET 7.1. The language in the Visual Basic 2005 product is going to be Visual Basic 8.0. Confused yet?

A sad farewell…

I just wanted to bid a public farewell to Cameron, who’s leaving the team to head off to Japan to learn Japanese and, oh, yeah, hang out with his girlfriend who just happens to live there. I’m very sad to see him go – Cameron’s been a major part of the team and it’s been a lot of fun working with him. His technical expertise and skills will be sorely missed and, well, if things don’t work out in Japan, he’s always got an invitation to come back. (Although we’re all hoping that things do work out in Japan.) I have a lot of respect for someone who’s willing to take the plunge and go after something that they really want to do, even if it means giving up something that they really like doing and feel secure in.

Best wishes to Cameron, good luck, and looking forward to further blog entries!

The truth about choosing keywords

One comment that comes up over and over is “Why did you choose <fill in some keyword or syntax>? It looks <ugly, stupid, non-VB-like, inconsistent>!” So, how do we come up with new syntax?

We put all the options up on a dart board and throw darts at them blindfolded.

Seriously.

OK, not seriously. But sometimes it seems that way. You see, I’d like to pretend here that we have some formal intellectual process that allows us to arrive at the ideal syntax 100% of the time with complete accuracy. But the sad fact of the matter is that syntax, for better or for worse, is largely a matter of personal preference. Show the same syntax to 10 different people, and you’re going to get 10 different opinions. In the end, it almost comes down to just picking a damn syntax and sticking with it no matter what.

This isn’t to say that there aren’t considerations that come into play when thinking about syntax. Reusing characters already used in another context is not something that should be done lightly. The availability of keys on the majority of keyboards in the world is another consideration. (Our test lead took to joking for a while that we should use the euro character for something, just to give the Europeans a leg up on the Americans for once.) And the number of characters in a keyword is a real consideration. I personally have some regrets about the length of some of the keywords that we chose for inheritance, for example.

But in the end, it comes down to aesthetics and that is totally a personal thing. The bitshift operators are an excellent example of this: when we first discussed them in the VS 2002 timeframe, we ended up having to table the whole feature because none of us could agree on whether we should choose <<, Shl, BitShl, ShiftTheValueLeftTheFollowingNumberOfBits, etc. We would have ended up cutting the feature because of time anyway, but it’s an instructive example.

The problem with something like choosing a bitshift operator is that everyone’s cultural context comes into play. For people familiar with C++, choosing << and >> seems natural because that’s what they’re used to. For people familiar with assembly, Shl and Shr might seem more natural. But given that VB has never had them, what to choose? Even looking at the issue of “word vs character” doesn’t help here, because even though we do have a tradition of using words for operators (“Mod”), we also have a tradition of using obscure characters for operators, too (“^”). In the end, my view was that the fact that << and >> visually encoded some indication of their function made them slightly preferable over Shl and Shr, which seemed a bit cryptic to my eyes.
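For the record, the semantics being argued over are the same whichever spelling wins; in Python, which borrowed C’s spelling, the “visual encoding” argument looks like this:

```python
# A left shift by n multiplies by 2**n; a right shift divides by 2**n,
# discarding the remainder. The << and >> glyphs hint at the direction
# the bits move, which is the whole aesthetic argument.
x = 6            # binary 110
print(x << 2)    # 24: binary 11000 - bits moved left
print(x >> 1)    # 3:  binary 11    - bits moved right
assert x << 2 == x * 2**2
```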

My view eventually prevailed, by the way, because everybody else who disagreed with me left the team in the interim. Victory by attrition. But that just emphasizes the personal nature of the choice. In the end, I think it’s more of an artistic decision than a technical decision – less “how strong does this beam need to be to hold this ceiling up?” and more “what color should we paint the bedroom?”

Which doesn’t mean that we don’t agonize endlessly over what the right syntax is for something. The generics syntax is an excellent example of this, and deserves an entry of its own in the near future.

God, I am such a moron sometimes…

So after updating the software, I thought I’d go through and just remove some of the bogus Trackback entries that .Text appears to be dropping in my comments lately. So I pull up a query of comments and start deleting entries. Then I start getting an error about “the row has already been deleted.” Huh? Oh, yeah, my query included a join into the table of blog entries, so when I deleted the comment, I deleted the entry it was commenting on! Which cascaded into the comments table, etc.
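A hedged reconstruction of the failure mode (the blog ran on .Text/SQL Server; this uses SQLite and made-up table names, but the cascading-delete behavior is the same):

```python
# Deleting a parent row with ON DELETE CASCADE silently wipes out its
# children - exactly the surprise described above.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # SQLite needs this enabled
db.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, title TEXT)")
db.execute("""CREATE TABLE comments (
    id INTEGER PRIMARY KEY,
    entry_id INTEGER REFERENCES entries(id) ON DELETE CASCADE,
    body TEXT)""")
db.execute("INSERT INTO entries VALUES (1, 'Background compilation, part 2')")
db.execute("INSERT INTO comments VALUES (10, 1, 'bogus trackback')")

# Meant to delete a comment; deleted the entry it pointed at instead...
db.execute("DELETE FROM entries WHERE id = 1")
print(db.execute("SELECT COUNT(*) FROM comments").fetchone()[0])  # 0
```

Moral: when cleaning up rows by hand, query the child table alone, and delete by the child’s own primary key.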

The worst part? This is the second time I’ve done something stupid like this.

Thankfully, I have regular backups and everything appears to be restored. But this is definitely one of those “learn it the hard way” kind of things…

Background compilation, Intermission

I’m still planning on finishing my exegesis on background compilation, but I thought I would take a moment to address something that came up in the comments on Part 1 and which we get from time to time: why can’t I turn off background compilation?

Unfortunately, we usually get this question because a customer has got a project where the background compilation is causing the IDE to become unacceptably slow. There are, of course, theoretical maximums of what any program can handle (and users love pushing those maximums), but there have been problems with background compilation in the past. After shipping VS 2002, we discovered that there were some bugs in the way that we did our dependency calculations that resulted in too much decompilation occurring. This caused problems on some very large projects and we fixed it in VS 2003. If you’ve got a large project in VS 2003 that’s still got problems with responsiveness in the IDE and would be willing to give it to us to test against Whidbey, drop me a line and I’ll connect you with someone. We definitely want to fix any problems!

The reason, though, why you can’t turn off background compilation is that the IDE completely depends on it. The information provided by background compilation drives Intellisense, it enables the WinForms designer (as well as all other designers), it fills the dropdowns at the top of the window, it drives the object browser, it enables go to definition, etc, etc, etc. Essentially, without background compilation, the IDE becomes Notepad.exe with syntax coloring. (Coloring, interestingly enough, does not require background compilation because it’s entirely based off the lexical grammar of the language and requires no further knowledge of the code.) So that’s why we don’t let you turn it off.

But I’ll emphasize again: we believe there’s no reason that large projects shouldn’t perform well. And we work hard on ensuring that, so let us know if there are still problems!

The cat’s out of the bag…

Oh, and yeah, my book is finally available! I’ve actually got a real, printed copy of it sitting here on my desk and Amazon claims that they’re shipping it within 24 hours! (It may not be on the shelves of your local Barnes and Noble for a little while longer, though.)

Overall, I have to admit that I find this kind of frightening, somewhat akin to what it must be like taking your child off to their first day at school. Here’s something that’s been a part of you for quite some time, and now it’s out there at the mercy of the big, bad world. Will it make friends? Will it get beaten up and come home crying? I guess now only time is going to tell….

So, everyone, go out and buy a copy and tell me (and everyone else on Amazon) how wonderful it is! Here’s a handy link to help you get started:

Hope you all like it….

P.S. – There’s already someone selling a used copy of the book on Amazon. I’ve had my own copy something like 48 hours… how does that happen?

Why do hobbyists matter?

After Kathleen worried about losing the hobbyist programmer on .NET, Rory came back with the question “Should the hobbyist programmer matter to Microsoft?” His thesis, in a nutshell, was:

I say that we don’t worry about the hobbyists – don’t dissuade them from coding in .NET, but don’t cater to them either.

I understand where he’s coming from, but I think that the terminology is confusing the issue. When we talk about “hobbyist programmer,” it evokes images of guys tinkering in their garages or in their basements on the weekend. And, yeah, maybe if it was just the equivalent of a bunch of guys (or gals) building model trains or making furniture or rebuilding old cars, it wouldn’t matter so much. But the reality is that the hobbyist programmer doesn’t just program on the weekends – they’re also programming during the week at their “real” jobs.

Before I started working on VB, I worked on Access. And I cannot count the number of times that customer testimonials started along the lines of “I was fooling around with Access one day and managed to write this small app to help manage my group. Once my department found out about it, they started using it to manage the department. Now my whole company uses it!” One of the key aspects of Access’s success was this kind of “viral adoption” where some tinkerer used it to solve some local problem that ended up solving a company-wide problem. The same holds for VB – lots of VB applications in corporations started life as someone’s side project. As I put it in a recent presentation, “Throwaway applications have a way of becoming mission critical applications.” And where do those throwaway applications come from? Hobbyist programmers.

With the spread of computing into more and more industries, the people who don’t consider themselves programmers become more and more important because they’re the beachhead for “real programming” to make its way in. For example, the throwaway applications that hobbyists write ultimately help drive demand for professional programmers to come in and “professionalize” the applications so that they scale correctly for the corporation. Also, as hobbyist applications make companies more open to the benefits of technology, they open the door to commercial software that can augment or replace the homegrown applications and maybe do a better job. And, of course, hobbyist programmers usually need lots of help, which drives demand for websites, magazines, books, consultants, etc.

So, in much the same way that small businesses serve a vital function in keeping the economy going so that large corporations can thrive, hobbyists play a vital role in sustaining the ecosystem that supports the professional programmers. Even if the professional programmers don’t always appreciate that…