One of the things that I discovered when I started working on SQL Server is that T-SQL, like VB prior to .NET, has no language specification. This continues to mystify me—how language teams get away without having a language specification for so long. I should probably back up for a moment and explain what I mean when I say “language specification.” I mean a document that:
- Is public.
- Is kept up to date.
- Describes the entire language (syntax, semantics, type system, etc.).
- Is produced by the team that produces the language itself.
An initial spec that was allowed to lapse doesn’t count, for obvious reasons. Books written by people outside of the team/company don’t count because there’s no way someone who doesn’t live and breath the language every single day can possibly hope to shoot for completeness. Documents that just describe syntax or sort-of describe the language (SQL Server’s Books Online falls into this category) don’t count because often what you need is a holistic description of the language, and for that you need, well, the whole language described. And documents that aren’t public don’t count because what good is a document that no one reads? (Not that many people read language specifications anyway, unless the author’s name happens to be Hejlsberg or Gosling.)
I think the reason that language specifications often don’t get written is that they are a huge amount of work and an incredible pain in the ass. Writing the Visual Basic Language Specification was no mean feat and consumed a whole lot of time that I could have productively spent elsewhere. But I do think that every programming language needs one, for a variety of reasons:
- Languages are not algorithms. Although many programs of some sort or another could use a good specifications written down, the reality is that many don’t need one because the code is the specification. For example, I can imagine that there might be some specification written down about how the Excel recalc engine works, but it’s probably not that useful because, well, you could just go look at that code itself if you want to know how the recalc engine works. Most pieces of code can be more easily understood by tracing through the code than by reading a description of them. Languages, however, are a special case—they are an emergent property of the compiler, and although there are various algorithms that contribute to a language (say, the binding rules), much of the most important parts of the language arise from the interplay of the various algorithms. Thus, just having the code often is not really enough to understand what a language is or does.
- Specifications keep you honest. And honesty is important when it comes to programming languages. You can cheat like hell when it comes to user interfaces, and most of the time you can get away with it—extraneous menu items or options or notifications or other junk may clutter the UI up a bit but by and large you can deal with it, and you can always “clean up” a UI without too much trouble. You can cheat less with libraries and APIs, but even there you can always come up with a new version of the interface or library and gradually move people. With a language, however, it’s nearly impossible to fix something once it’s in the language. SQL Server’s own deprecation process takes three major releases, which means that a mistake you make in the language will take well over a decade to get rectified, if at all. Having a central language specification helps the team monitor what’s really going in the language and helps keep the team honest about what they’re doing.
- Having to explain things forces you to actually try and understand what your language does. I was continually amazed when writing the Visual Basic language specification how superficially I actually understood some features until I tried to write them down and explain them. Features that had been extensively discussed in design meetings and were already prototyped, even. In this way, it’s somewhat analogous to coding—how often I think I understand the solution to a problem until I sit down to write the code and realize how foolish I have been to think I really understood the problem!
- You need some kind of institutional memory. It’s all well and good to rely on the one guy who knows everything about everything in the language, but what happens when they retire/quit/move on/get hit by a bus? Now all you’re left with is a mass of language rules and often no idea why things were done one particular way or another. This is not an uncommon problem in general with programming, but it’s even more acute with language design. When I was a young, naïve language designer, there were a number of times when I looked at existing languages and the choices they made and thought to myself, “Boy, is that a dumb design decision.” Then, a couple of years down the road, having ignored some of the wisdom of those who came before me and having run into a brick wall at full speed, I would think, “Oh, I see, that’s why they did that…”
Anyway, I’m not planning on just complaining about this and not doing anything about it. Stay tuned…
You should also follow me on Twitter here.