One of the side projects that I'm working on in addition to this weblog is a managed
scanner and parser for the Visual Basic .NET language. I started on the project because
I'd really like to write some project analysis tools that I could use to determine
things about how people use the language, but as it's gone on I'm hoping to also release
it into the community as a sample. (One of the wrinkles is that I'm "writing" the
managed parser by looking at the existing unmanaged parser, so the code I'm writing
isn't entirely of my own invention, although most of it is.) I think
that the more that people can work with a language in an automated way, and the more
tools people can write for a language, the better it is for people who use that language.
Anyway, we'll just have to see. I still have to finish it first...
After I finished the scanner (scanners are easy), I started parsing expressions and
have been working my way up the parse tree hierarchy. I've recently reached type members
and, in particular, methods. An interesting thing is the way that VB's line orientation
interacts with MustOverride methods when parsing. In C#, parsing methods is pretty
simple because after a method header you can see either a semicolon or an open curly
brace. If it's the former, then you've got an abstract method declaration; if it's
the latter, then you've got a concrete method declaration. Whether or not you specified
"abstract" as a modifier on the declaration is really something for declaration semantics
to sort out later on. In VB, though, the following fragment is ambiguous if you don't
look at the modifiers:
<modifiers> Sub Foo()
Dim Bar As Integer
In other words, once you've gotten through the method header, the next line starts
with "Dim," which could either mean that Foo was a MustOverride method and Bar is
a field, or that Foo was a concrete method and Bar is a local. To figure out which
is which, you absolutely have to look at the modifiers to see if "MustOverride"
is there. This is the kind of thing that drives formal grammar writers nuts, because
the choice between the two readings depends on a token you saw back in the modifier
list rather than on the next token, which means you have to hork up your grammar
productions to make it all come out right. It isn't such a big deal, though, if you
hand-code your parser, which is what
we do for VB. (Hand-coded versus table-driven parsers seems to be one of those
religious arguments that language people get into. Let me just say I don't take a
formal position on the question.)
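To make the MustOverride problem concrete, here is a minimal Python sketch of how a hand-coded parser might make that decision. It is not the actual VB parser; it assumes a hypothetical, heavily simplified input in which each logical VB line has already been split into a list of string tokens, and every name in it (parse_csharp_method_tail, parse_vb_method, parse_members, Field, Method) is illustrative. The C# helper decides from the single token after the header, while the VB routine has to consult the modifier list it already consumed before it knows whether the following "Dim" line is a local or a field.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union

# Hypothetical simplified input: each logical VB line is a list of tokens.
Line = List[str]


@dataclass
class Field:
    tokens: Line


@dataclass
class Method:
    name: str
    modifiers: List[str]
    body: Optional[List[Line]]  # None means no body (MustOverride)


Member = Union[Field, Method]


def parse_csharp_method_tail(next_token: str) -> bool:
    """C#: the token after the method header decides. True means a body follows."""
    if next_token == ";":
        return False  # abstract declaration, no body
    if next_token == "{":
        return True  # concrete declaration, body follows
    raise SyntaxError(f"expected ';' or '{{', got {next_token!r}")


def parse_vb_method(lines: List[Line], i: int) -> Tuple[Method, int]:
    """VB: the line after the header is ambiguous, so the modifiers decide."""
    header = lines[i]
    kind_at = next((k for k, t in enumerate(header) if t in ("Sub", "Function")), None)
    if kind_at is None:
        raise SyntaxError(f"not a method header: {header}")
    modifiers, kind, name = header[:kind_at], header[kind_at], header[kind_at + 1]
    if "MustOverride" in modifiers:
        # No body: the next line belongs to the enclosing type, not to this method.
        return Method(name, modifiers, None), i + 1
    # Concrete method: everything up to "End Sub" / "End Function" is the body.
    body: List[Line] = []
    i += 1
    while i < len(lines) and lines[i] != ["End", kind]:
        body.append(lines[i])
        i += 1
    if i == len(lines):
        raise SyntaxError(f"missing End {kind}")
    return Method(name, modifiers, body), i + 1


def parse_members(lines: List[Line]) -> List[Member]:
    """Parse type members: a line starting with Dim at this level is a field."""
    members: List[Member] = []
    i = 0
    while i < len(lines):
        if lines[i][0] == "Dim":
            members.append(Field(lines[i]))
            i += 1
        else:
            method, i = parse_vb_method(lines, i)
            members.append(method)
    return members


if __name__ == "__main__":
    abstract_src = [["Public", "MustOverride", "Sub", "Foo", "(", ")"],
                    ["Dim", "Bar", "As", "Integer"]]
    concrete_src = [["Public", "Sub", "Foo", "(", ")"],
                    ["Dim", "Bar", "As", "Integer"],
                    ["End", "Sub"]]
    print(parse_members(abstract_src))  # Foo has no body; Bar is a Field
    print(parse_members(concrete_src))  # Bar is a statement in Foo's body
```

The same second line ends up as a field in one case and as a body statement in the other, and the only thing that changed was a modifier on the previous line.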
Interestingly, there are at least a few other places in the language (especially
object creation expressions versus array creation expressions) where things get
complicated like this. But overall, it's been pretty smooth sailing. I'll let everyone
know as the project progresses.
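The object-versus-array creation case has a similar flavor. In VB .NET, New Integer(9) {} is an array creation, while New Foo(9) calls a constructor, so the parser can't tell whether the parenthesized list holds array bounds or constructor arguments until it sees whether a braced array initializer follows. Here is a rough Python sketch of that check, again assuming a hypothetical token list, a single-token type name, and no jagged or multi-dimensional arrays; parse_new, ObjectCreation, and ArrayCreation are illustrative names only.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class ObjectCreation:
    type_name: str
    arguments: List[str]  # raw tokens between the parentheses


@dataclass
class ArrayCreation:
    type_name: str
    bounds: List[str]  # raw tokens between the parentheses
    initializer: List[str]  # raw tokens between the braces


def _group(tokens: List[str], pos: int, open_tok: str, close_tok: str) -> Tuple[List[str], int]:
    """Return the tokens inside a balanced open/close pair starting at pos, and the index after it."""
    depth = 0
    for j in range(pos, len(tokens)):
        if tokens[j] == open_tok:
            depth += 1
        elif tokens[j] == close_tok:
            depth -= 1
            if depth == 0:
                return tokens[pos + 1:j], j + 1
    raise SyntaxError(f"unbalanced {open_tok!r}")


def parse_new(tokens: List[str]) -> Tuple[Union[ObjectCreation, ArrayCreation], int]:
    """Parse 'New T [(...)] [{...}]' and return the node plus the index after it."""
    if not tokens or tokens[0] != "New":
        raise SyntaxError("expected 'New'")
    type_name, pos = tokens[1], 2
    inside: List[str] = []
    if pos < len(tokens) and tokens[pos] == "(":
        inside, pos = _group(tokens, pos, "(", ")")
    # Only now can we tell what the parenthesized list meant.
    if pos < len(tokens) and tokens[pos] == "{":
        init, pos = _group(tokens, pos, "{", "}")
        return ArrayCreation(type_name, inside, init), pos
    return ObjectCreation(type_name, inside), pos
```

For example, parse_new(["New", "Integer", "(", "9", ")", "{", "}"]) yields an ArrayCreation with bounds ["9"], while parse_new(["New", "Foo", "(", "9", ")"]) yields an ObjectCreation with the same tokens treated as arguments.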