Why every language needs a language specification…

One of the things that I discovered when I started working on SQL Server is that T-SQL, like VB prior to .NET, has no language specification. This continues to mystify me—how language teams get away without having a language specification for so long. I should probably back up for a moment and explain what I mean when I say “language specification.” I mean a document that:

  • Is public.
  • Is kept up to date.
  • Describes the entire language (syntax, semantics, type system, etc.).
  • Is produced by the team that produces the language itself.

An initial spec that was allowed to lapse doesn’t count, for obvious reasons. Books written by people outside of the team/company don’t count because there’s no way someone who doesn’t live and breath the language every single day can possibly hope to shoot for completeness. Documents that just describe syntax or sort-of describe the language (SQL Server’s Books Online falls into this category) don’t count because often what you need is a holistic description of the language, and for that you need, well, the whole language described. And documents that aren’t public don’t count because what good is a document that no one reads? (Not that many people read language specifications anyway, unless the author’s name happens to be Hejlsberg or Gosling.)

I think the reason that language specifications often don’t get written is that they are a huge amount of work and an incredible pain in the ass. Writing the Visual Basic Language Specification was no mean feat and consumed a whole lot of time that I could have productively spent elsewhere. But I do think that every programming language needs one, for a variety of reasons:

  1. Languages are not algorithms. Although many programs of some sort or another could use a good specifications written down, the reality is that many don’t need one because the code is the specification. For example, I can imagine that there might be some specification written down about how the Excel recalc engine works, but it’s probably not that useful because, well, you could just go look at that code itself if you want to know how the recalc engine works. Most pieces of code can be more easily understood by tracing through the code than by reading a description of them. Languages, however, are a special case—they are an emergent property of the compiler, and although there are various algorithms that contribute to a language (say, the binding rules), much of the most important parts of the language arise from the interplay of the various algorithms. Thus, just having the code often is not really enough to understand what a language is or does.
  2. Specifications keep you honest. And honesty is important when it comes to programming languages. You can cheat like hell when it comes to user interfaces, and most of the time you can get away with it—extraneous menu items or options or notifications or other junk may clutter the UI up a bit but by and large you can deal with it, and you can always “clean up” a UI without too much trouble. You can cheat less with libraries and APIs, but even there you can always come up with a new version of the interface or library and gradually move people. With a language, however, it’s nearly impossible to fix something once it’s in the language. SQL Server’s own deprecation process takes three major releases, which means that a mistake you make in the language will take well over a decade to get rectified, if at all. Having a central language specification helps the team monitor what’s really going in the language and helps keep the team honest about what they’re doing.
  3. Having to explain things forces you to actually try and understand what your language does. I was continually amazed when writing the Visual Basic language specification how superficially I actually understood some features until I tried to write them down and explain them. Features that had been extensively discussed in design meetings and were already prototyped, even. In this way, it’s somewhat analogous to coding—how often I think I understand the solution to a problem until I sit down to write the code and realize how foolish I have been to think I really understood the problem!
  4. You need some kind of institutional memory. It’s all well and good to rely on the one guy who knows everything about everything in the language, but what happens when they retire/quit/move on/get hit by a bus? Now all you’re left with is a mass of language rules and often no idea why things were done one particular way or another. This is not an uncommon problem in general with programming, but it’s even more acute with language design. When I was a young, naïve language designer, there were a number of times when I looked at existing languages and the choices they made and thought to myself, “Boy, is that a dumb design decision.” Then, a couple of years down the road, having ignored some of the wisdom of those who came before me and having run into a brick wall at full speed, I would think, “Oh, I see, that’s why they did that…”

Anyway, I’m not planning on just complaining about this and not doing anything about it. Stay tuned…

You should also follow me on Twitter here.

T-SQL Tuesday #8: Learning and Teaching

T-SQL Tuesday

Since I’m joining the T-SQL community, I thought I’d try my hand at a “T-SQL Tuesday” that I could actually have an opinion about. This week’s question (hosted by Robert Davis, a.k.a. @SQLSoldier on Twitter) is “How do you learn? How do you teach? What are you learning or teaching?” and is very relevant for me because, of course, I just joined the T-SQL team a short while ago and am doing a whole lot of learning at the moment.


How I learn

I was going to say “by doing,” but I don’t think that’s accurate enough because there are lots of kinds of “doing.” I’m reminded of something they said when I was learning to ballroom dance for my wedding reception. They said that when learning anything new, people tend to go through four distinct stages: “unconscious incompetence” (i.e. you don’t know how bad you are), “conscious incompetence” (i.e. you know exactly how bad you are), “conscious competence” (i.e. you’re good but you have to pay attention), and “unconscious competence” (i.e. you’re good and it seems effortless). So when I’m starting something new, I’m doing a lot things but most of what I’m doing is learning just how little I actually know. That’s helpful and necessary, but it’s not exactly what I call “real” learning. The real learning seems to come between the second and third stages-when I’ve discovered just how bad I am and am now working on figuring out how to be less bad. When I get to the fourth stage, the learning starts to taper down and that’s when I really get to enjoy the state of knowing (which I think is also called the state of “flow”) and I get to have a lot of fun.

The interesting implication of this is that when I’m entering a new area, my first attempts are necessarily going to not be that great because I don’t know what I don’t know yet. So the initial doing isn’t really very helpful in learning the area, nor is it likely to look much like what I’m going to end up with if I keep on learning. But it’s only when I’ve got something and I know, at least at some level, how bad it is that I can start learning the area. Ironically, when the true learning starts it mostly looks like anal-retentiveness and neat-freakishness-going over and over and over something I’ve done, trying to make it better and suck less. In other words, to start really learning something I have to take something I’ve already done and go back and start pulling at the loose threads, seeing how it unravels and then figuring out how to reweave it properly. That’s when I really get to figure out how the things are supposed to work.

(I’ll note here that this is the number one mistake that I’ve seen most new programmers make. They’re like the verse from The Rubaiyat of Omar Khayyam:

The Moving Finger writes; and, having writ,
Moves on: nor all thy Piety nor Wit
Shall lure it back to cancel half a Line,
Nor all thy Tears wash out a Word of it.

They write their code once and then abandon it, never returning, always moving on to the next thing. Thus, they never actually get the chance to learn how to do things properly and always stay in that state of unconscious incompetence.)

Ironically, the situation I’m stepping into in SQL Server is perfect for “real” learning because I get to largely shortcut through the first stage of unconscious incompetence. That is to say, there’s already this large, mature artifact (i.e. the SQL Server codebase), so I don’t need to go through the trouble of creating something imperfect-someone’s already done that for me. I can spend just a few short weeks realizing just how little I actually know about anything and then jump straight to pulling threads and seeing what starts coming apart. Metaphorically, of course. I’m not gunning to have SQL Server fall apart on me or anything.

I actually think this can be more fun than starting something brand new and blazing the path, which isn’t the way the world sees it, oftentimes.


How I teach

I think I’m actually going to touch on this in more detail soon, but the short answer is: “by writing.” I’m pretty consciously incompetent when it comes to standing up in front of people and teaching them things, but I’ve been practicing writing for a whole lot longer and am better at it than speaking. And writing is just another form of what I was talking about in the previous section-first, I sit down and try to write down an explanation of whatever it is I’m trying to say. Then I realize how pathetically inadequate it is (most of the time) at saying what I want it to say. Or how little I understand what I’m trying to talk about. So I start pulling on the threads again and seeing what I can unravel and rework. And I find myself learning more not only about the process and practice of writing, but also more about whatever it is I’m trying to explain.

I think writing can be a wonderful way to teach people things, but I think it only really works-even technical writing!-if you follow the dictum of “writing what you know (and love).” In the end, I guess any teaching medium works if the teacher is interested enough in the subject, knowledgeable enough about it, and has a real passion for teaching (as opposed to a passion for having people listen to them, which is something entirely different).


Well, that’s about it. Hope this was interesting!

“Fixing” T-SQL

In a comment to my previous post, Rich asked

Does this mean you’re the person to fix T-SQL programmability?

I honestly don’t know the answer to that question because, coming from the outside, I’m not sure about everything that’s wrong with T-SQL. I’d love to hear more from anyone who’s got an opinion (and any pointers to complaints around the web would be welcome as well). You can also feel free to use my contact form to talk to me directly. I can’t promise I can do anything at this point, but I’m looking to learn.

What I can do is offer two thoughts about why I chose to move over to the T-SQL team:

  1. Even though I’ve been interfacing with SQL Server for nearly 20 years in one capacity or another, I only know a surface amount about it-basically, some standard SQL and that’s about it. A lot of the reason for that is because T-SQL always seemed, well, a little arcane from the outside. I sort of got the basics of querying, but beyond that I never felt I really had the time to figure things because they looked. complicated. Some of that, I’m sure, is just the usual “people look strange when you’re a stranger,” but I have always wondered what might be done to make SQL Server and T-SQL a little more approachable.
  2. Along those lines, one recurring situation that I’ve found with SQL Server (both personally and observing others) is that the perceived impenetrability of T-SQL causes people to waste the wonderful opportunity to leverage the power of the server and instead use up precious time sucking data down to the client just to do a bit of processing that could easily have been done as a part of the query. The question comes to mind: what could we do to enable people to more easily capitalize on the server resources that are available to them, and how can T-SQL play a part in that?

So that’s what my hopes are, we’ll just have to see how it plays out.

Another transition…

After spending a year and a half working on “M”, I’ve decided to make another change in what I’m doing and and move over to the SQL Server Programmability team. That’s the team responsible for things like the T-SQL language and runtime in SQL Server. Working on “M” was a lot of fun and the team was great, but after spending a good, long while down in the bowels of a GLR parser, I decided that that was enough and that it was time to do something else. Working on SQL Server programmability is, in some ways, a combination of all my previous jobs-a bit of data from Access, a bit of runtime from OLE Automation, and a bit of programming language from Visual Basic and “M”. It’s also an interesting challenge-a product that’s both well established and confronting a lot of new challenges. I think it’s going to be quite a bit of fun!

It does mean saying goodbye to “M”, and that was sad (although, really, they’re still in the same division and not that far away), but that’s the way it goes. I’ll be looking forward to their next CTP, which is where people will see a lot of the hard work that’s been going on and the overall direction that the language is headed. There’s a lot of cool stuff coming, and I think people will find it very interesting!

Changing jobs also means that I’m back to drinking from the firehose, learning the ins and outs of the guts of the SQL Server engine, as well as T-SQL. Interesting stuff. Any good T-SQL/SQL Server blogs anyone can recommend?