Background compilation, part 2

When I last left off with background compilation, we were talking about how compilation and decompilation worked within a project. But what about between projects? This is where things start to get a little more interesting and problematic.

The simplest case is two VB projects that have a reference between them. Because they’re both source code projects, we can handle compilation and decompilation by using the same scheme that we use inside of projects – in essence, all the VB projects in a solution look like one big project that happens to produce multiple assemblies. (This is why Mike has his problem with the compiler complaining about two types with the same name in two projects that don’t reference each other, although we’re planning to fix that particular error for Whidbey.) So we just track dependencies between files regardless of which project they are in and everything works fine. There are a few added complications because now we need to track a project’s compilation level, but they aren’t really worth mentioning.
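To make the "one big project" idea a little more concrete, here is a minimal sketch in Python. It is not the compiler's actual data structure; the `DependencyGraph` class, its method names, and the file names are all made up for illustration. The only point is that a dependency edge does not care which project a file lives in.

```python
from collections import defaultdict


class DependencyGraph:
    """Hypothetical solution-wide graph; nodes are (project, file) pairs."""

    def __init__(self):
        # dependents[x] holds the files that use something declared in x.
        self.dependents = defaultdict(set)

    def add_dependency(self, user, provider):
        """Record that `user` depends on a declaration in `provider`."""
        self.dependents[provider].add(user)

    def affected_by(self, changed):
        """Return every file that transitively depends on `changed`."""
        seen = set()
        stack = [changed]
        while stack:
            current = stack.pop()
            for user in self.dependents[current]:
                if user not in seen:
                    seen.add(user)
                    stack.append(user)
        return seen


graph = DependencyGraph()
# An edge that crosses a project boundary looks like any other edge.
graph.add_dependency(("VB1", "Form1.vb"), ("VB2", "Foo.vb"))
graph.add_dependency(("VB1", "Main.vb"), ("VB1", "Form1.vb"))
affected = graph.affected_by(("VB2", "Foo.vb"))
```

Editing `Foo.vb` in VB2 marks both VB1 files for decompilation, exactly as if all three files sat in the same project.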

Now let’s talk about referenced assemblies such as mscorlib, System, System.Windows.Forms, etc. When you reference a compiled assembly, we have to build a symbol table for it just like we have to for a source project. The way we handle this is to treat each assembly as its own “metadata project” with a single file in it (i.e. the assembly). Metadata projects go through the same compilation steps as source projects: NoState, Declared, Bound and Compiled. However, the compilation process for a metadata assembly can be much simplified, for the following reasons:

  • Since a metadata assembly is already compiled, the Compiled state has no meaning.
  • Metadata assemblies rarely, if ever, change.
  • Metadata assemblies only ever reference other metadata assemblies (since VS doesn’t support circular builds, at least not in the IDE).

Since the Compiled state makes no sense for metadata assemblies, we ignore it. And since metadata assemblies don’t really ever change (or change very infrequently) and only depend on other metadata assemblies, we can skip much of the decompilation work that I described in the first part of this discussion. In fact, we can make it very simple: when any metadata project changes, we just decompile every metadata project in the solution to NoState and decompile every source project down to Declared. Essentially, we throw away all metadata information and start over. Since metadata projects don’t really change often, usually only when you add or remove a reference, this is a reasonable simplification.
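The blanket rule is small enough to sketch in a few lines of Python. The state names come from the discussion above; everything else (the `Project` class, `decompile_to`, `on_metadata_changed`) is a hypothetical stand-in, not the real project system.

```python
from enum import IntEnum


class State(IntEnum):
    NO_STATE = 0
    DECLARED = 1
    BOUND = 2
    COMPILED = 3


class Project:
    """Hypothetical project record holding only what the rule needs."""

    def __init__(self, name, is_metadata, state):
        self.name = name
        self.is_metadata = is_metadata
        self.state = state

    def decompile_to(self, target):
        # Decompilation only ever lowers a project's state.
        self.state = min(self.state, target)


def on_metadata_changed(projects):
    """Apply the simplified rule: throw away all metadata information."""
    for project in projects:
        if project.is_metadata:
            project.decompile_to(State.NO_STATE)
        else:
            project.decompile_to(State.DECLARED)


solution = [
    Project("mscorlib", True, State.BOUND),
    Project("VB1", False, State.COMPILED),
    Project("VB2", False, State.NO_STATE),
]
on_metadata_changed(solution)
```

Note that there is no dependency tracking anywhere in this sketch. That is the whole simplification.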

Right? Wrong.

Unfortunately, much of the logic of the previous few paragraphs is faulty because it neglects something fairly significant: multi-language solutions. Let’s say that you have a solution with a VB project and a C# project, and the VB project references the C# project. How does that C# project look to the VB project? It’s not a source project because the VB compiler only understands VB code. Yes, that’s right, class, it looks like a metadata project. And C# projects completely violate 2 of the 3 bullet points I listed above: they can change frequently (i.e. with every rebuild) and they can have references into VB source projects.

Here’s where the train wreck happens. Let’s say you’ve got three projects: VB1, VB2 and CS1. VB1 has a reference to VB2 and CS1. CS1 has a reference to VB2. When we go to load in the metadata for CS1, the compiler finds a reference to type Foo in VB2. But because we assume that metadata projects can’t refer to source projects, we only look for Foo in the other metadata projects and fail to find it. So we mark Foo as a bad type. Now VB1 tries to call some method in CS1 that returns a Foo. As we compile the call, we notice that the type Foo is bad, so we generate an error that says something along the lines of “We can’t find ‘Foo’ in ‘VB2’. Add a reference to ‘VB2’.” And, of course, the user ends up scratching his head because VB1 already has a reference to VB2. (Even better, it’s possible to get error messages along the lines of “Can’t convert type ‘Foo’ to type ‘Foo’.” if you try to use a Foo you got from CS1 and a Foo you got from VB2 together.)
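Here is a rough Python sketch of that failed lookup. The project and type names (VB1, VB2, CS1, Foo) are from the scenario above; the `Project` class, the `BAD_TYPE` marker, the `Widget` type, and the `include_source` flag are assumptions made purely for illustration.

```python
BAD_TYPE = "<bad type>"


class Project:
    """Hypothetical project: a name, a kind, and the types it declares."""

    def __init__(self, name, is_metadata, types=()):
        self.name = name
        self.is_metadata = is_metadata
        self.types = set(types)


def resolve_metadata_reference(type_name, owner_name, solution,
                               include_source=False):
    """Resolve a type that a metadata project says lives in `owner_name`."""
    for project in solution:
        if project.name != owner_name:
            continue
        if not project.is_metadata and not include_source:
            # The faulty assumption: metadata never refers to source
            # projects, so source projects are never searched.
            continue
        if type_name in project.types:
            return (project.name, type_name)
    return BAD_TYPE


solution = [
    Project("VB1", is_metadata=False),
    Project("VB2", is_metadata=False, types={"Foo"}),
    Project("CS1", is_metadata=True, types={"Widget"}),
]

# CS1's metadata refers to Foo in VB2, but VB2 is a source project.
as_shipped = resolve_metadata_reference("Foo", "VB2", solution)
naive_fix = resolve_metadata_reference("Foo", "VB2", solution,
                                       include_source=True)
```

With the assumption in place, `as_shipped` comes back as the bad-type marker even though VB2 is sitting right there in the solution. Letting the lookup see source projects finds Foo, which looks like the obvious fix.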

OK, you say, that’s bad. But why don’t you just allow CS1 to look up Foo in VB2’s symbol table? Then this would all work. And it would. Until the first time you actually edited VB2 and caused the project to decompile. If you’ll remember, we assumed that metadata projects don’t need to participate in decompilation. So now VB2 decompiles and CS1’s symbol table is left with a bogus pointer off into hyperspace because it didn’t know how to handle the decompilation properly. (We could just decompile the world at this point, but the performance of that would be horrendous.)
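The stale pointer is easy to reproduce in miniature. In this Python sketch the `valid` flag stands in for freed memory, since Python won't actually let a reference dangle; `Symbol`, `SourceProject`, `MetadataProject`, and their methods are all hypothetical names.

```python
class Symbol:
    def __init__(self, name):
        self.name = name
        self.valid = True  # stands in for "this memory is still live"


class SourceProject:
    """Hypothetical source project that rebuilds its symbols on demand."""

    def __init__(self, name, type_names):
        self.name = name
        self.type_names = list(type_names)
        self.symbols = {}
        self.bind()

    def bind(self):
        # Build a fresh symbol table from the source.
        self.symbols = {n: Symbol(n) for n in self.type_names}

    def decompile(self):
        # Throw the old symbols away; anyone still holding one is stale.
        for symbol in self.symbols.values():
            symbol.valid = False
        self.symbols = {}


class MetadataProject:
    """Hypothetical metadata project that never hears about decompilation."""

    def __init__(self, name):
        self.name = name
        self.imported = {}

    def import_type(self, source, type_name):
        # Keep a direct reference into the source project's symbol table.
        self.imported[type_name] = source.symbols[type_name]


vb2 = SourceProject("VB2", ["Foo"])
cs1 = MetadataProject("CS1")
cs1.import_type(vb2, "Foo")

vb2.decompile()  # the user edits VB2
vb2.bind()       # background compilation catches up again

stale = cs1.imported["Foo"]
```

After the edit, `stale` is no longer the same object as VB2's current Foo and its `valid` flag is off. Nothing ever told CS1 to let go of it.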

The real fix is to make metadata projects work like source projects – make them track inter-file dependencies and decompile properly. The problem is that this fix requires a fundamental reworking of our project system, which is a major, major undertaking because we made so many faulty assumptions about metadata projects. Rewriting a basic piece of the compiler like this requires an extended period of time for stabilization and a massive amount of testing to ensure we got everything right. Since we found this very late in the VS 2002 cycle, it was too late to make a change of this magnitude without causing a significant slip to the entire VS/.NET/ASP.NET product agglomeration. And the VS 2003 cycle was way too short to do this kind of work. This will be fixed, I can promise everyone, for VS 2005.

All that said, I think it’s worth acknowledging that this was a screwup on our part, plain and simple. There are reasons why multi-language solution testing came online so late, but none of them really justify the pain and annoyance that this has caused (and still causes) for customers. It doesn’t really make it better, but we really apologize for not fixing this problem in time. As always, we strive to do better.

I will add that there is a workaround for the problem. If you create a reference between VB projects using a file reference (i.e. using the “.NET” tab and browsing to the actual DLL) instead of a project reference (i.e. using the “Projects” tab), then you’ll force the compiler to see all the references as metadata projects and you won’t get weird errors. The downside is that you’ll have duplicate symbol tables for VB projects that you reference that are also in your solution. It’s an imperfect solution, but it’s all there is for the moment.

So ends the lesson on background compilation… All of this material will be on the test.

4 thoughts on “Background compilation, part 2”

  1. Matt

    So… how did the conversation go when this problem was discovered? 🙂

    I’m guessing some very colorful words were used. Must have been one heck of a meeting in War that day.
