Category Archives: Language Design

The Avalanche Theory of Programming Language Evolution

One of the other things that listening to Bjarne Stroustrup reminded me of is an idea that I’ve had kicking around in my head for quite some time about the way programming languages evolve. I can’t exactly quote Bjarne here, but I think he said something to the effect of, “One of the reasons I give this talk is because people still think of C++ as it existed back in 1986, not as it exists today.” Which reminded me of something I read a long time ago about avalanches and the way that they work.

The interesting thing about avalanches is that, should you ever be unlucky enough to be caught in one, they go through two distinct phases. In the first phase, while the avalanche is making its way down the slope, it behaves almost as if it were a liquid. Everything is moving, and it’s theoretically possible (if you don’t get bashed in the head by a rock or tree or something, and you can tell which way is “up”) to “swim” to the top of the avalanche and sort of float/surf on top of it. And it turns out that it’s pretty important to do this if you happen to be caught in an avalanche. Because once the avalanche hits the bottom of the slope and stops moving, it suddenly transitions to another state: a solid. At this point, if you aren’t at least partially free of the avalanche (i.e. on top of it), you’re totally screwed because you are now basically encased in a block of concrete. Not only will you be totally unable to move, but you’re quickly going to suffocate because there’s no way to get fresh air. Unless someone is close by with a locator and a shovel, you’re basically dead.

Programming languages, as far as I can tell, seem to evolve in a way similar to avalanches. Once a programming language “breaks free”–and, to be honest, most never do–it starts accelerating down the slope. At this point, the design of the language is very fluid as new developers pour into the space. And by “design of the language,” I don’t just mean the language spec but everything that makes up the practical design of a language, from the minute (coding standards, delimiter formatting, identifier casing) to the large-scale (best practices for APIs, componentization, etc.). Everything is shifting and changing and growing and it’s very exciting!

And then… boom! It all stops and freezes into place, just like that. Usually, this happens around maybe the second or third major release of the language, long enough for the initial kinks to get worked out and for things to cool and take shape. And the interesting thing about it is that it’s not so much that the language designers are done, as it is that from that point on the language design effectively becomes fixed in the minds of the public. There are a number of reasons, but I think that it has to do with reaching a critical mass of things like books, blog posts, articles, samples, course materials, etc. and a critical mass of developers who were trained in a particular version of a language. Once enough of that stuff is out there and enough people have adopted a particular usage of the language, they effectively become the “standard” for the language.

And from that point on, I think, the language designer is essentially like those poor souls trapped at the bottom of the avalanche–they’re still alive and kicking (well, for a while, at least) but they increasingly find that they can’t really move anything. They can pump out new features and new ideas, but smaller and smaller slices of the developers in those languages are even aware of those features, much less willing to retrain (and rethink) to take advantage of them.

It makes me seriously wonder about all the work we did in VS 2008 to add things like LINQ to VB and C#. I suspect the .NET development avalanche largely came to rest around the time of VS 2005 (or maybe even VS 2003), and while I think a lot of high-end developers like and use LINQ, I don’t know that it ever penetrated the huge unwashed masses of .NET developers. I’m not saying we shouldn’t have done it–I think at the very least, it helps push the language design conversation forward and influences the next generation of programming languages as they start their descent down the slopes–but just that I wonder.

And I also have to say that I’m very much a participant in this process. I originally learned C++ all by myself back in the ancient year of 1989 by buying myself Bjarne’s first C++ book and a copy of Zortech C++ for my IBM 386 PC. For a long, long time, C++ to me was effectively the C++ that I learned way back when C++ compilers were just front-ends for C compilers. Even with lots of exposure to more modern programming concepts while working on VB and so on, it’s taken me a long time to break the old habits and stretch within the C++ language. And, I have to admit, it’s really not a bad language once you do. But I suspect I’m part of a somewhat small portion of the C++ market.

Anyway, all it really means is that I expectantly scan the metaphorical slopes, waiting for the large “BANG!” that will herald the descent of a new avalanche and the chance to try and surf it once again…

You should also follow me on Twitter here.

The Secret to Understanding C++ (and why we teach C++ the wrong way)

A little over a year ago, I asked “How on earth do normal people learn C++?” which reflected some of my frustration as I re-engaged with the language and tried to make sense of what “modern” C++ had become. Over time, I’ve found that as I’ve become more and more familiar with the language again, things have begun to make more sense. Then a couple of days ago I went to a talk by Bjarne Stroustrup (whose name, apparently, I have no hope of ever pronouncing correctly) and the secret of understanding C++ suddenly crystallized in my mind.

I have to say, I found the talk quite interesting, which was a huge accomplishment because: a) I usually don’t like sitting listening to talks under the best of circumstances, and b) he was basically covering the “whys and wherefores” of C++, which is something I’m already fairly familiar with. However, in listening to the designer of C++ talk about his language, I was struck by a realization: the secret to understanding C++ is to think like the machine, not like the programmer.

You see, the fundamental mistake that most teachers make when teaching C++ to human beings is to teach C++ like they teach other programming languages. But with the exception of C, most modern programming languages are designed around hiding as many details of how things actually happen on the machine as possible. They’re designed to allow humans to explain to the computer what they want to do in a way that’s as close to the way humans think (or, at least, how engineers think) as possible. And since C++ superficially looks like some of those languages, teachers just apply the same methodology. Hence, when you learn about classes, teachers tend to spend most of their time on the conceptual level, talking about inheritance and encapsulation and such.

But the way Bjarne talks about C++, it’s clear that everything that C++ does is designed while thinking hard about the question: how will this translate to the machine level? This may be a completely obvious point for a language whose main purpose in life is to be a systems programming language, but I don’t think I’d ever really grokked how deeply that idea is baked into C++. And once I really looked at things that way, things made a lot more sense. Instead of teaching classes at just the conceptual level, you really need to teach classes at the implementation level for C++ to make sense.

For example, I’ve never been in a programming class that discussed how C++ classes are actually implemented at runtime using vtables and such. Instead, I had to learn all that on my own by implementing a programming language on the Common Language Runtime. The CLR hides a lot of the nitty-gritty of implementing inheritance from the C# and VB programmer, but the language implementer has to understand it at a fairly deep level to make sure they handle cross-language interop correctly. As such, I find myself continually falling back on my CLR experience when looking at C++ features and thinking, “How is this supposed to work?” I can’t imagine how people who haven’t had to confront these kinds of implementation-level details figure it out.
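To make the vtable idea concrete, here is a minimal sketch of what a virtual call looks like when the machinery is written out by hand. It assumes the common implementation strategy (a hidden per-object pointer to a per-class table of function pointers); the C++ standard does not require this layout, and every name below (`Shape`, `RawShape`, `ShapeVTable`, and so on) is hypothetical and only for illustration.

```cpp
#include <cassert>

// What the programmer writes: an ordinary polymorphic hierarchy.
struct Shape {
    virtual ~Shape() = default;
    virtual int sides() const { return 0; }
};

struct Triangle : Shape {
    int sides() const override { return 3; }
};

int sides_via_virtual(const Shape& s) { return s.sides(); }

// Roughly what a typical compiler arranges behind the scenes,
// spelled out by hand (an assumption about the implementation).
struct RawShape;

struct ShapeVTable {
    int (*sides)(const RawShape*);  // one slot per virtual function
};

struct RawShape {
    const ShapeVTable* vptr;  // the hidden pointer stored in every object
};

// Each "class" supplies its own implementation of the slot.
int raw_shape_sides(const RawShape*) { return 0; }
int raw_triangle_sides(const RawShape*) { return 3; }

// One table per class, shared by all objects of that class.
const ShapeVTable shape_vtable{&raw_shape_sides};
const ShapeVTable triangle_vtable{&raw_triangle_sides};

// A "virtual call": load the table pointer, load the slot, call through it.
int call_sides(const RawShape& s) { return s.vptr->sides(&s); }

// The real virtual call and the hand-rolled one should agree.
void check_dispatch() {
    Triangle t;
    RawShape raw{&triangle_vtable};
    assert(sides_via_virtual(t) == call_sides(raw));
}
```

Seen this way, "overriding" a function is just storing a different function pointer in the derived class's table, which is why the call costs an extra indirection compared with a plain function call.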

It makes me wonder if a proper C++ programming course would actually work in the opposite direction of how most classes (that I’ve seen) do it. Instead of starting at the conceptual level, start at the machine level. Here is a machine: CPU, registers, memory. Here’s how the basic C++ expressions map to them. Here’s how basic C++ structures map to them. Here’s how you use those to build C++ classes and inheritance. And so on. By the time you got to move semantics vs. copy semantics, people might actually understand what you’re talking about.
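As an illustration of how the machine-level view helps with move vs. copy semantics, here is a small sketch of a class that owns a block of heap memory. In memory the object is nothing more than a pointer and a size, so a copy has to duplicate the block while a move only has to hand over the pointer. The `Buffer` class is hypothetical, and assignment is deleted purely to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <utility>

// Hypothetical owning buffer: in memory, just a pointer plus a size.
class Buffer {
public:
    explicit Buffer(std::size_t n)
        : data_(new unsigned char[n]()), size_(n) {}

    ~Buffer() { delete[] data_; }  // deleting a null pointer is a no-op

    // Copy: allocate a second block and copy every byte into it.
    Buffer(const Buffer& other)
        : data_(new unsigned char[other.size_]), size_(other.size_) {
        std::memcpy(data_, other.data_, size_);
    }

    // Move: copy two words (pointer and size), then empty the source
    // so that its destructor does not free the block we now own.
    Buffer(Buffer&& other) noexcept
        : data_(other.data_), size_(other.size_) {
        other.data_ = nullptr;
        other.size_ = 0;
    }

    // Assignment is left out to keep the sketch small.
    Buffer& operator=(const Buffer&) = delete;
    Buffer& operator=(Buffer&&) = delete;

    const unsigned char* data() const { return data_; }
    std::size_t size() const { return size_; }

private:
    unsigned char* data_;
    std::size_t size_;
};

// After a copy the two objects point at different blocks;
// after a move the new object points at the original block.
void check_copy_and_move() {
    Buffer a(16);
    const unsigned char* original = a.data();
    Buffer copied(a);
    assert(copied.data() != original);
    Buffer moved(std::move(a));
    assert(moved.data() == original);
}
```

Once the object is pictured as "a pointer and a size", the difference between the two constructors stops being an abstract rule and becomes a question of who is responsible for freeing which block.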


How on earth do normal people learn C++?

I’m asking this seriously. How?

One dirty little secret around Microsoft is how little “real” C++ code is written around here. I think a lot of it has to do with the fact that when C++ was first coming into its own there were a number of high profile projects that enthusiastically adopted C++ but ran into a lot of problems. Some of those problems had to do with the immaturity of the tools (produced by Microsoft), some of them had to do with a lack of experience with C++ (since, of course, it was relatively new at the time), and some of them had to do with the “when you have a hammer…” effect. The end result is that although a lot of projects I’ve worked on have had “.cpp” file extensions on their source code, in many cases you could rename the extension to “.c” and it would compile with very few changes.

Lately, though, I’ve been having to do a lot more modern-ish C++ programming, and while I have an amazed respect for the sheer amount of power available to the C++ programmer nowadays, I am also baffled how anyone can actually understand what the hell they are doing half the time. Don’t get me wrong, I feel like I get by pretty well in C++, but then I spent a decade designing an OOP language, building a compiler for it, and debugging the whole thing. There are lots of times when I’ve been able to survive in C++ only because I can fall back on a mental type system model that was built up through a lot of blood, sweat, and tears. I wonder how people who haven’t had that particular experience actually grok a lot (sometimes, any) of what C++ does.

I’m guessing people just throw a lot of code at the wall until something sticks, or maybe just copy and paste from Stack Overflow. I don’t know. But, seriously, sometimes I think you should have a license before you should be allowed to attempt C++.

Either that, or I’m just being dense. Sadly, that’s always a possibility.

The Use/Build Fallacy

Working in the language space, especially in language design, you frequently encounter people who fall victim to what I call the “Use/Build Fallacy.” It goes something like this:

Because I know how to use something, I know how to build it as well.

This fallacy is best illustrated by a story I heard from a friend who’s a teacher (another profession that frequently has to deal with this). She was teaching middle school when teacher conferences rolled around. Talking to the father of one of her students, she explained to him that his daughter was having a lot of trouble in English class and that, based on her observations of how hard the daughter was working, she was pretty sure that the daughter had some sort of language learning disability. She therefore strongly recommended that he take his daughter to an expert to get tested, and that she be tutored by someone trained to deal with the specific kind of learning disability. The father was unimpressed, mainly because he didn’t like the idea that all this would cost him money. “Can’t you just help her more in class?” he asked. My friend explained that she was helping her all she could, but she wasn’t an expert in diagnosing learning disabilities and his daughter really needed to see someone who had the appropriate training.

After a bit of back-and-forth, the father finally got exasperated and said, “Fine, I’ll just tutor her myself! I mean, how hard could it be? I went to school!” My friend then shot back, “Look, you’re a general contractor, right? What would you think if I came to you and said, ‘I don’t need you to build my house. I’ve lived in a house before, so how hard could it be to build one myself?’” This, finally, stumped him. I’m not sure whether he actually got his daughter the help she needed, but the story stuck with me because my friend’s response is the perfect distillation of the Use/Build Fallacy.

Note that I’m not saying that just because you’re not an expert on something you can’t have an opinion. I may not know how to build a house, but that doesn’t mean I have nothing to say to the contractors if I decide to do some renovations on my house. Not falling prey to the fallacy, though, means that I always keep a healthy respect for the expert in a field—as long as they truly seem to know what they’re talking about. (I hear this from friends who are architects all the time—they get hired by someone to build or renovate a house for them, and then their client spends all their time endlessly arguing with everything they do. Why bother to hire an expert if you think you already know how to do it yourself?)

I try to remember this myself every time I encounter some aspect of some programming language that I don’t like. Right now, I’m neck-deep in C++ code and it’s tempting to spend all my time kvetching about how horrible a job Bjarne has done over the years. And then I try to remember, even as someone who’s actually built a language, that this stuff is hard. A language of any complexity has a huge number of moving parts, all of which interact with each other in an unpredictable manner. Historical choices can come back to bite you in all sorts of unexpected ways. Oftentimes all you have are a bunch of imperfect choices, and you have to simply pick the least bad of them all. And then you get to sit there and listen to everyone on the sidelines complain about how horrible a job you’ve done and how they could do it so much better than you because, hey, they’ve used a programming language before.

So I try to temper my complaints with a little humility, and remember how different building is from using.

Why every language needs a language specification…

One of the things that I discovered when I started working on SQL Server is that T-SQL, like VB prior to .NET, has no language specification. This continues to mystify me—how language teams get away without having a language specification for so long. I should probably back up for a moment and explain what I mean when I say “language specification.” I mean a document that:

  • Is public.
  • Is kept up to date.
  • Describes the entire language (syntax, semantics, type system, etc.).
  • Is produced by the team that produces the language itself.

An initial spec that was allowed to lapse doesn’t count, for obvious reasons. Books written by people outside of the team/company don’t count because there’s no way someone who doesn’t live and breathe the language every single day can possibly hope to shoot for completeness. Documents that just describe syntax or sort-of describe the language (SQL Server’s Books Online falls into this category) don’t count because often what you need is a holistic description of the language, and for that you need, well, the whole language described. And documents that aren’t public don’t count because what good is a document that no one reads? (Not that many people read language specifications anyway, unless the author’s name happens to be Hejlsberg or Gosling.)

I think the reason that language specifications often don’t get written is that they are a huge amount of work and an incredible pain in the ass. Writing the Visual Basic Language Specification was no mean feat and consumed a whole lot of time that I could have productively spent elsewhere. But I do think that every programming language needs one, for a variety of reasons:

  1. Languages are not algorithms. Although many programs of some sort or another could use a good specification written down, the reality is that many don’t need one because the code is the specification. For example, I can imagine that there might be some specification written down about how the Excel recalc engine works, but it’s probably not that useful because, well, you could just go look at the code itself if you want to know how the recalc engine works. Most pieces of code can be more easily understood by tracing through the code than by reading a description of them. Languages, however, are a special case: they are an emergent property of the compiler, and although there are various algorithms that contribute to a language (say, the binding rules), many of the most important parts of the language arise from the interplay of the various algorithms. Thus, just having the code often is not really enough to understand what a language is or does.
  2. Specifications keep you honest. And honesty is important when it comes to programming languages. You can cheat like hell when it comes to user interfaces, and most of the time you can get away with it: extraneous menu items or options or notifications or other junk may clutter the UI up a bit but by and large you can deal with it, and you can always “clean up” a UI without too much trouble. You can cheat less with libraries and APIs, but even there you can always come up with a new version of the interface or library and gradually move people. With a language, however, it’s nearly impossible to fix something once it’s in the language. SQL Server’s own deprecation process takes three major releases, which means that a mistake you make in the language will take well over a decade to get rectified, if at all. Having a central language specification helps the team monitor what’s really going into the language and helps keep the team honest about what they’re doing.
  3. Having to explain things forces you to actually try and understand what your language does. I was continually amazed when writing the Visual Basic language specification how superficially I actually understood some features until I tried to write them down and explain them. Features that had been extensively discussed in design meetings and were already prototyped, even. In this way, it’s somewhat analogous to coding—how often I think I understand the solution to a problem until I sit down to write the code and realize how foolish I have been to think I really understood the problem!
  4. You need some kind of institutional memory. It’s all well and good to rely on the one guy who knows everything about everything in the language, but what happens when they retire/quit/move on/get hit by a bus? Now all you’re left with is a mass of language rules and often no idea why things were done one particular way or another. This is not an uncommon problem in general with programming, but it’s even more acute with language design. When I was a young, naïve language designer, there were a number of times when I looked at existing languages and the choices they made and thought to myself, “Boy, is that a dumb design decision.” Then, a couple of years down the road, having ignored some of the wisdom of those who came before me and having run into a brick wall at full speed, I would think, “Oh, I see, that’s why they did that…”
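The first point in the list above, that a language’s behavior emerges from the interplay of its rules, can be seen in a tiny C++ sketch. C++ is used here only as a convenient illustration, and the `describe` functions are hypothetical. Each rule involved (array-to-pointer decay, pointer-to-`bool` conversion, ranking standard conversions ahead of user-defined ones) is simple when read alone, but the combined result surprises most people.

```cpp
#include <cassert>
#include <string>

// Two overloads, each perfectly reasonable on its own.
const char* describe(bool) { return "bool"; }
const char* describe(const std::string&) { return "string"; }

// Which overload does a string literal pick?
// The literal decays to const char*, and pointer-to-bool is a standard
// conversion, which outranks the user-defined conversion to std::string.
const char* describe_literal() { return describe("hello"); }

// An explicit std::string argument picks the overload most people expect.
const char* describe_string() { return describe(std::string("hello")); }

void check_overloads() {
    assert(std::string(describe_literal()) == "bool");
    assert(std::string(describe_string()) == "string");
}
```

No single piece of compiler code “decides” that a string literal is a `bool`; the outcome falls out of several rules interacting, which is exactly the kind of behavior a written specification has to pin down.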

Anyway, I’m not planning on just complaining about this and not doing anything about it. Stay tuned…
