Category Archives: Microsoft

Was .NET a black hole project?

Eric Lippert astutely pointed out a hole that I consciously left in my discussion of black hole projects – namely, that they sometimes succeed. I left that part out because I figured it probably merited a little discussion of its own and I didn’t want to complicate the whole “black hole” narrative.

Eric’s completely correct that you could argue that .NET was a black hole project, albeit one that succeeded. It managed to hit most, if not all, of the bullet points I listed, including the last one (I started work on what would become VB.NET in mid-1998, after all, and some people had been working on it for several months by that point). I distinctly remember the first meeting I had with Brian Harry where he outlined a proposal for what would become the metadata engine in .NET. I thought to myself “this guy is completely crazy” because of his grandiose goals. A few months later, I remember sitting in a co-worker’s office talking about the emerging .NET strategy. “Do you think this thing will actually fly?” she asked me. “I have no idea,” I replied. “I give it about a 50/50 chance. I guess we’ll see.”

However, I disagree with Eric a bit that it’s extremely hard to distinguish between a black hole project that’s going to implode and a black hole project that’s going to succeed. I think there are some traits that distinguish the projects that have succeeded. I’m less certain about them because 4 out of 5 black hole projects have failed, so I have less data to extrapolate from, but here’s my take on it…

The number one, no question about it, don’t leave home without it, trait of successful black hole projects is:

  • Despite having grandiose goals, they are fundamentally a response to a serious and concrete competitive threat.

If .NET had started just as some abstract project to change the way that people program, then I’d be willing to wager that it would have gone down in flames at some point. What kept the teams focused and moving forward was the fact that, let’s be honest here, Java was breathing down Microsoft’s neck. Without that serious competitive pressure, odds are pretty decent that the .NET project would have eventually collapsed under its own weight because we wouldn’t have had the kinds of checks and balances that competition imposed on us. But the reality of needing to compete with Java helped keep our eyes on the prize, as it were, and counterbalanced the tendency of black hole projects to spin out of control.

Some other traits of successful projects are:

  • They tend to be led by people who, while they may be fantastically idealistic, are ultimately realists when push comes to shove.
  • It also seems to be vitally important that the project leadership be very, very technical. This allows them to stay in touch with what is actually happening in the project rather than having to rely on status reports (which are usually skewed in the optimistic direction) and allows them to ask the hard questions.
  • Hidden under the absurdly grandiose goals must be some set of real life problems that real life customers need solved.

I think what I’m starting to get at is that the line that divides success from failure is ultimately how in touch with reality the project team is. The Catch-22 is that to make large leaps, you often have to unmoor yourself a bit from reality so that you can see beyond what is into what could be. But once you’ve unmoored yourself from reality, it’s easy to go all the way and completely lose touch with reality. That’s the point at which things start to spiral out of control. The projects that succeed, in turn, manage to keep one foot in reality at all times. This keeps them from going absolutely insane. And, I think, it’s not as hard as it seems to get a sense of whether a particular project has one foot in reality or not. Usually, all you have to do is talk to a cross-section of the people working on it. They’ll know pretty well what the state of the project is.

I should close by saying my intention wasn’t to knock ambitious projects. I think they’re totally necessary to help our industry make those big leaps forward. It’s more of a question of how people go about doing ambitious projects that I sometimes have a problem with. Projects that think big but try as best they can to stay practical are really the best kind of projects to work on. I can’t say that every moment of working on .NET has been fun, but overall it’s been a wonderful experience and one that I’m very glad I got to be a part of…

Black hole projects

After hearing about a product named Netdoc from Scoble, Steve Maine takes the opportunity to reminisce about a similarly code-named project at Microsoft (that has nothing to do with the new product). He says:

The name “Netdocs” reminds me of my experience as an intern at MS in 2000. There was this mythical project codenamed “Netdocs”, and it was a black hole into which entire teams disappeared. I had several intern friends who got transferred to the Netdocs team and were never heard from again. Everyone knew that Netdocs was huge and that there were a ton of people working on it, but nobody had any idea what the project actually did.

I left Office just about the time that Netdocs really started going, but I do know a few people who invested quite a few years of their lives into it. I can’t say that I know much more than Steve about it, but it did get me thinking about other “black hole projects” at Microsoft. There was one I was very close to earlier in my career that I managed not to get myself sucked into and several others that I just watched from afar. None I can really talk about since they never saw the light of day, but it did get me thinking about the peculiar traits of a black hole project. They seem to be:

  • They must have absurdly grandiose goals. Something like “fundamentally reimagine the way that people work with computers.” Nobody, including the people who originate the goals, has a clear idea what the goals actually mean.
  • They must involve throwing out some large existing codebase and rewriting everything from scratch, “the right way, this time.”
  • They must have completely unrealistic deadlines. Usually this is because they believe that they can rewrite the original codebase in much, much less time than it took to write that codebase in the first place.
  • They must have completely unrealistic beliefs about compatibility. Usually this takes the form of believing you can rewrite a huge codebase and preserve all of the little quirks and such without a massive amount of extra effort.
  • They are always “six months” from a major deadline that never seems to arrive. Or, if it does arrive, another milestone is added on to the end of the project to compensate.
  • They must consume huge amounts of resources, sucking the lifeblood out of one or more established products that make significant amounts of money or have significant market share.
  • They must take over any group that does anything that relates to their absurdly broad goals, especially if that group is small, focused, has modest goals and actually has a hope of shipping in a reasonable timeframe.
  • They must be prominently featured as demos at several company meetings, to the point where people groan “Oh, god, not another demo of this thing. When is it ever going to ship?”
  • They usually are prominently talked up by BillG publicly years before shipping/dying a quiet death.
  • They usually involve “componentizing” some monolithic application or system. This means that not only are you rewriting a huge amount of code, you’re also splitting it up across one or more teams that have to all seamlessly work together.
  • As a result of the previous point, they also usually involve absolutely massive integration problems as different teams try madly to get their components working with each other.
  • They usually involve rewriting the application or system on top of brand-new technology that has not been proven at a large scale yet. As such, they get to flush out all the scalability problems with the new technology.
  • They are usually led by one or more Captain Ahabs, madly pursuing the white whale with absolute conviction, while the deckhands stand around saying “Gee, that whale looks awfully big. I’m not sure we can really take him down.”
  • Finally, 90% of the time, they must fail and die a flaming death, possibly taking down or damaging other products with it. If they do ship, they must have taken at least 4-5 years to ship and be at least 2 years overdue.

I’m kind of frightened at how easy it was to come up with this list – it all just kind of poured out. Looking back over 12.5 years at Microsoft, I’m also kind of frightened at how many projects this describes. Including some projects that are ongoing at the moment…


Drafting Edit and Continue

As I think the entire blog universe knows by now, C# is going to be adding Edit and Continue in their 2005 version. As I’ve said many times before, I think Edit and Continue is a great feature, so I’m happy to see another language on board with it. Of course, there’s a selfish part of me that would have liked to see the C# team continue to ignore such a great feature and leave it completely to us, but what’s good for the goose…

What’s interesting about this, though, is that it points up one of the practical reasons why having three .NET languages at Microsoft is a Good Thing(tm). As evidenced by the fact that the C# team didn’t originally plan to do Edit and Continue, if there had only ever been C# at Microsoft, Edit and Continue would probably never have happened. The CLR team only added the underlying support for Edit and Continue because VB insisted on it and then sat down with them and worked out how it was going to happen. Similarly, I don’t know if generics would have ever happened if there had only ever been VB at Microsoft, even though I think it’s going to be an excellent feature for our customers. Lacking the C++ history, it’s something that we probably never would have pushed the CLR team to implement the way the C# team did.

It occurs to me that this is an extension of Rocky’s excellent rant on programming languages, just flipped around. His point is that programmers who limit themselves to knowing just one language limit their ability to think about programming. I think the same goes for Microsoft.

What I find really interesting is the way that the language teams end up drafting one another as we go forward. Drafting is the phenomenon, most commonly seen in cycling, whereby a cyclist can maintain a higher speed with less energy when following in the slipstream of the cyclist ahead of him. (A better explanation can be found here.) A lot of cycling strategy involves members of the team riding ahead of other team members (say, Lance Armstrong) so that the following rider can conserve energy for the final push at the end of the race. Interestingly, drafting can help the lead cyclist as well, although I lack sufficient physics knowledge to really explain that.

Anyway, the point of this whole digression is that a very similar thing happens inside of Microsoft as well. When the C# team decided to implement Edit and Continue, it was easier for them to do so than it had been for the VB team because the VB team had already worked out a lot of the details (and a lot of the kinks in the system) beforehand. Similarly, the C# team started implementing generics before VB did, so they ended up working out a lot of details for us ahead of time. In the end, each team takes advantage of the work the other team is doing to advance their own language more rapidly. It’s a very virtuous cycle.

Of course, a lot of the people who don’t like Edit and Continue are the same people who don’t think Microsoft should bother to have more than one language (read: C# is the “one true language”). So instead of drafting, they probably think of it as pollution. Ah well, to each their own…

Yes, coding *is* a zero-sum game, Robert…

Scoble questions whether greater community engagement really takes away from time that developers have to spend on other things like fixing bugs. I would have to agree with John Cavnar-Johnson on this one: from personal experience, blogging and other community engagement like the newsgroups does suck time away from other activities that I could be pursuing like designing new features, writing new code or fixing bugs. I don’t think Scoble is totally off base with his “all work and no play makes Jack a very dull boy” thesis, but I think that model only really applies to the creative end of development, which is only one part of the overall work it takes to get a product out the door. A huge chunk of the product development cycle is, indeed, one-step-after-another work where you just have to get the damn work done. And if you’re busy blogging, you’re not doing that.

That being said, I am reminded of the famous Churchill quote, “Democracy is the worst form of government except for all the other forms that have been tried.” As in, yeah, it sucks that community engagement takes time away from other things, but what’s the better alternative? Robert has it right that community engagement provides valuable insight to both us and the community, and so even if it’s a zero-sum game in terms of developer time, I still think everyone comes out ahead.

Which, I suppose, was Robert’s original point. OK, I guess it’s time for me to go back to doing something useful…

The digital divide (Or, “right clicking is hard”)…

In my previous entry on the relative uselessness of the status bar, I got a bit of flack in the comments from people who find the status bar extremely useful. In fact, I’m one of those people – I regularly use the status bar of IE to figure out the URL of a link that I’m hovering over. And I use the handy status bar functions (like Count, Sum, etc) in Excel all the time. But the point is that you and I are not typical. I don’t think it’s bad at all to put stuff in the status bar for advanced users – it’s just bad (as people are wont to do) to put stuff in the status bar that the average user really might want to know. There’s a big difference between the advanced users and average users sometimes.

Which reminds me of another funny story from my Access days. A favorite place for developers to stick important things in applications is right click menus. “Hmmm, we don’t want to clutter the menus up with this, why don’t we put it on a right click menu?” During the Access 2.0 cycle, one of our PMs (who shall remain nameless) started having this weird problem. Every once in a while, she would be using Access 2.0 and her forms would just stop responding. They wouldn’t hang, per se, they would just not accept any more mouse clicks or keyboard presses. The poor developer who owned the forms engine, Peter, couldn’t figure out what was wrong. He looked extensively at the code to see if he could suss out what the problem might be, but no luck. I believe he even tried instrumenting the PM’s version of Access to see if he could isolate the problem. Finally, one day he was sitting in her office watching her work when it happened, and he just happened to figure out the problem. You see, the PM had a very slight hand coordination problem – when she went to right click the mouse, she would occasionally have an involuntary movement of the adjacent fingers that would cause the left mouse button to be pressed at the same time. And when a left click and a right click message came in together at just the right time, the form would freeze up. (Access wasn’t the only Microsoft product to have this problem.) Peter tracked down the problem and fixed it.

The amusing postscript to this story is that another developer on Access, Cameron (not Beccario), heard about this and thought it would be funny to play a trick on the poor PM. So he wrote a little Access macro that would put up some funny message box every time she pressed both mouse buttons together. He installed it on her computer over the weekend and figured she’d find it sometime later that week. When he got into his office late Monday morning, he had a couple of irate voice mail messages from the PM saying “what the hell did you do to my computer?” Apparently, this finger twitch was not that uncommon…

Anyway, the moral of the story, such as it is, is that things us advanced users take for granted can often pose problems for the regular users….

Status bar == almost entirely superfluous

Raymond has a short musing about why it’s not a good idea to just punt on difficult questions and ask the user. Makes me think of the good old days on Access – I seem to remember one usability person saying that some non-trivial percentage of users always reacted to dialog boxes by immediately hitting Return, thus choosing the default option. As a programmer writing UI at the time, I found that fact extremely frightening given the number of dialog boxes I had authored.

But what this really makes me think of is a usability test they did on Access one day to see how effective text placed in the status bar was. The test went like this: the user was given some task to do in Access. Unbeknownst to them, we’d stuck a message in the status bar that read “If you notice this message and tell the usability lead, we will give you $15.” Want to guess how many people got the $15? Zero. After that, we were careful not to put any important information down in the status bar, because it was 100% likely that no one would ever see it.

Instead, the status bar is just a nice little waste of screen real estate where we can put cute, innocuous little messages like “Done” (Internet Explorer at the moment) or “Ready” (Visual Studio at the moment).

MVP Summit

It’s been a bit quiet around here because after having a couple of weeks of relative calmness, things have gotten busy again. One of the things that I’ve been busy with is putting together the “Visual Basic 2005 Language Enhancements” talk for the MVP summit that starts tomorrow. The program manager for the language, Amanda Silver, is on vacation next week, so I’m filling in for her. Comparing last year’s presentation with this year’s, it’s amazing how much we’ve gotten done in the last year! Just unbelievable.

I’m also going to try and make it to the dinner on Monday night, but I can’t stay long – I’ve been taking Spanish classes for a few years and my classes are Monday and Wednesday. Se habla español aquí, pero no se habla español bien…

A requiem for easter eggs…

Tom rebuts Jeremy Mazner’s lament for the disappearance of easter eggs. Ultimately, I think most easter eggs are the equivalent of stories about college exploits: they’re interesting only to the people who were involved and deathly boring to everyone else. Sure, there is the occasional clever or humorous easter egg, but most serve no purpose to anybody except as a little ego trip.

I say this knowing full well that I wrote several easter eggs for Access before the prohibition on easter eggs went in. I even wrote one that I thought was somewhat clever: a Magic Eight Ball easter egg. The problem was, I left the team and within several versions the Magic Eight Ball had turned into the Crashing Magic Eight Ball.

I don’t think losing easter eggs is a great loss, personally… (Although those Excel guys were always pretty damn impressive.)

The Ten Rules of Performance

An Updated Introduction: Seven Years Later (February 9, 2004)

I wrote this document a long time ago to capture the conclusions I had reached from working on application performance for a few years. At the time that I wrote it, the way in which Microsoft dealt with performance in its products was starting to undergo a large-scale transformation. Performance analysis up to that time had largely been an ad-hoc effort and was hampered by a lack of well-known “best practices” and, in some cases, the tools necessary to get the job done. As the transition started from the 16-bit world to the 32-bit world, this kind of approach was clearly insufficient to deal with the increasing popularity of our products and the increasing demands that were being placed on them. The past seven years have seen major changes in the way that performance is integrated into the development process at Microsoft, and many of the “rules” that I outline below have become internalized into the daily processes of most major groups in the company. Even though a lot of what follows is no longer fresh thinking, I still get requests for the document internally, which leads me to believe that there’s still value in saying things that many people already know. So I’ve decided to provide it publicly here, for your edification and in the hope that someone might find it useful. (Historical note: The “major Microsoft application” that I refer to below was not VB.)

A Short Introduction (May 28, 1997)

In the fall of 1995, I was part of a team working on an upgrade to a major Microsoft application when we noticed that we had a little problem — we were slower than the previous version. Not just a little slower, but shockingly slower. This fact had largely been ignored by the development team (including myself), since we assumed it would improve once we were finished coding up our new features. However, after a while the management team started to get a little worried. E-mails were sent around asking whether the developers could spend some time focusing on performance. Thinking “How hard could this be?” I replied, saying that I’d be happy to own performance. This should be easy, I thought naively, just tweak a few loops, rewrite some brain-dead code, eliminate some unnecessary work and we’ll be back in the black. A full two years (and a lot of man-years worth of work on many people’s part) later, we finally achieved the performance improvements that I was sure would only take me alone a few months to achieve.

Unsurprisingly, I had approached the problem of performance with a lot of assumptions about what I would find and what I would need to do. Most of those assumptions turned out to be dead wrong or, at best, wildly inaccurate. It took many painful months of work to learn how to “really” approach the problem of performance, and that was just the beginning — once I figured out what to do, I discovered that there was a huge amount of work ahead! From that painful experience, I have come up with a set of “Ten Rules of Performance,” intended to help others avoid the errors that I made. So, without further ado…

The Rules:

Rule #1: Don’t assume you know anything.

In the immortal words of Pogo, “We have met the enemy, and he is us.” Your biggest enemy in dealing with performance is by far all the little assumptions about your application you carry around inside your head. Because you designed the code, because you’ve worked on the operating system for years, because you did well in your college CS classes, you’re tempted to believe that you understand how your application works. Well, you don’t. You understand how it’s supposed to work. Unfortunately, performance work deals with how things actually work, which in many cases is completely different. Bugs, design shortcuts and unforeseen cases can all cause computer systems to behave (and execute code) in unexpected, surprising ways. If you want to get anywhere with performance, you must continuously test and re-test all assumptions you have: about the system, about your components, about your code. If you’re content to assume you know what’s going on and never bother to prove you know what’s going on, start getting used to saying the following phrase: “I don’t know what’s wrong… It’s supposed to be fast!”

Rule #2: Never take your eyes off the ball

For most developers, performance exists as an abstract problem at the very beginning of the development cycle and a concrete problem at the very end of the cycle. In between, they’ve got better things to be doing. As a result, typically developers write code in a way that they assume will be fast (breaking rule #1) and then end up scrambling like crazy when the beta feedback comes back that the product is too slow. Of course, by the time the product is in beta there’s no real time to go back and redesign things without slipping, so the best that can be done is usually some simple band-aiding of the problem and praying that everyone is buying faster machines this Christmas.

If you’re serious about performance, you must start thinking about it when you begin designing your code and can only stop thinking about it when the final golden bits have been sent to manufacturing. In between, you must never, ever stop testing, analyzing and working on the performance of your code. Slowness is insidious — it will sneak into your product while you’re not looking. The price of speed is eternal vigilance.

Rule #3: Be afraid of the dark

Part of the reason why development teams find it so easy to ignore performance problems in favor of working on new features (or on bugs) is that rarely, if ever, is there anything to make them sit up and pay attention. Developers notice when the schedule shows them falling behind, and they start to panic when their bug list begins to grow too long. But most teams never have any kind of performance benchmark that can show developers how slow (or fast) things actually are. Instead, most teams thrash around in the dark, randomly addressing performance in an extremely ad hoc way and failing to motivate their developers to do anything about the problems that exist.

One of the most critical elements of a successful performance strategy is a set of reproducible real-world benchmarks run over a long period of time. If the benchmarks are not reproducible or real-world, they are liable to be dismissed by everyone as insignificant. And they must be run over a long period of time (and against previous versions) to give a real level of comparison. Most importantly, they must be run on a typical user’s machine. Usually, coming up with such numbers will be an eye opening experience for you and others on your team. “What do you mean that my feature has slowed down 146% since the previous version?!?” It’s a great motivator and will tell you what you really need to be working on.
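A reproducible benchmark doesn’t need heavy machinery. Here’s a minimal sketch in Python of the idea; the `load_document` workload and the baseline figure are hypothetical stand-ins for whatever real-world scenario and recorded history your product actually has:

```python
import json
import statistics
import time

def load_document():
    # Hypothetical workload: build and parse a moderately large JSON blob,
    # standing in for a real end-to-end user scenario.
    blob = json.dumps({"rows": [{"id": i, "value": i * 2} for i in range(10_000)]})
    return json.loads(blob)

def benchmark(task, runs=5):
    """Time a task several times and report the median, which is more
    stable across runs than any single measurement."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        task()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)

median_seconds = benchmark(load_document)
# Comparing against a recorded baseline is what turns a number into a trend.
baseline_seconds = 0.05  # hypothetical figure from the previous version
regression = (median_seconds - baseline_seconds) / baseline_seconds
print(f"median: {median_seconds:.4f}s ({regression:+.0%} vs. baseline)")
```

The important properties are exactly the ones above: the same scenario every run, a statistic that damps out noise, and a baseline kept over time so a slow creep shows up as a trend rather than a surprise.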

Rule #4: Assume things will always get worse

The typical state of affairs in a development team is that the developers are always behind the eight ball. There’s another milestone coming up that you have to get those twenty features done for, and then once that milestone is done there’s another one right around the corner. What gets lost in this rush is the incentive for you to take some time as you go along to make sure that non-critical performance problems are fixed. At the end of milestone 1, your benchmarks may say that your feature is 15% slower but you’ve got a lot of work to do and, hey, it’s only milestone 1! At the end of milestone 2, the benchmarks now tell you your feature is 30% slower, but you’re pushing for an alpha release and you just don’t have time to worry about it. At the end of milestone 3, you’re code complete, pushing for beta and the benchmarks say that your feature is now 90% slower and program management is beginning to freak out. Under pressure, you finally profile the feature and discover the design problems that you started out with back in milestone 1. Only now with beta just weeks away and then a push to RTM, there’s no way you can go back and redesign things from the ground up! Avoid this mistake — always assume that however bad things are now, they’re only going to get worse in the future, so you’d better deal with them now. The longer you wait, the worse it’s going to be for you. It’s true more often than you think.

Rule #5: Problems have to be seen to be believed (or: profile, profile, profile)

Here’s the typical project’s approach to performance: Performance problems are identified. Development goes off, thinks about their design and says “We’ve got it! The problem must be X. If we just do Y, everything will be fixed!” Development goes off and does Y. Surprisingly, the performance problems persist. Development goes off, thinks about their design and says “We’ve got it! The problem must be A. If we just do B, everything will be fixed!” Development goes off and does B. Surprisingly, the performance problems persist. Development goes off… well, you get the idea. It’s amazing how many iterations of this some development groups will go through before they actually admit that they don’t know exactly what’s going on and bother to profile their code to find out. If you can’t point to a profile that shows what’s going on, you can’t say you know what’s wrong.

Every developer needs a good profiler. Even if you don’t deal with performance regularly, I say: Learn it, love it, live it. It’s an invaluable tool in a developer’s toolbox, right up there with a good compiler and debugger. Even if your code is running with acceptable speed, regularly profiling your code can reveal surprising information about its actual behavior.
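The “profile, profile, profile” loop is cheap to try with a stock toolchain. Here’s a minimal sketch using Python’s built-in cProfile; the deliberately naive `build_report` function is a hypothetical stand-in for the code under suspicion:

```python
import cProfile
import io
import pstats

def build_report(n=200):
    # Deliberately naive: repeated string concatenation in a loop,
    # the kind of hotspot a profile makes obvious.
    report = ""
    for i in range(n):
        report += f"line {i}\n"
    return report

# Record everything that runs between enable() and disable().
profiler = cProfile.Profile()
profiler.enable()
build_report()
profiler.disable()

# Print the top functions sorted by cumulative time; in real code you look
# for the entries you *didn't* expect to see near the top.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```

The point isn’t this particular tool; it’s that a profile gives you measured fact where you previously had only a theory about where the time goes.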

Rule #6: 90% of performance problems are designed in, not coded in

This is a hard rule to swallow because a lot of developers assume that performance problems have more to do with code issues (badly designed loops, etc) than with the overall application design. The sad fact of the matter is that in all but the luckiest groups, most of the big performance problems you’re going to confront are not the nice and tidy kind of issues where someone is doing something really dumb. Instead, it’s going to be extremely difficult to pinpoint situations where several pieces of code are interacting in ways that end up being slow. To solve the problems usually requires a redesign of the way large chunks of your code are structured (very bad) or a redesign of the way several components interact (even worse). And given that most pieces of an application are interrelated these days, a small change in the design of one piece of code may cascade into changes in several other pieces of code. Either way it’s not going to be simple or easy. That’s why you need to diagnose problems as soon as you can and get at them before you’ve piled a lot of code on top of your designs.

Also, don’t fall in love with your code. Most programmers take a justifiable pride in the code that they write and even more pride in the overall design of the code. However, this means that many times when you point out to them that their intellectually beautiful design causes horrendous performance problems and that several changes are going to be needed, they tend not to take it very well. “How can we mar the elegance and undeniable usability of this design?” they ask, horrified, adding that perhaps you should look elsewhere for your performance gains. Don’t fall into this trap. A design that is beautiful but slow is like a Ferrari with the engine of a Yugo — sure, it looks great, but you certainly can’t take it very far. Truly elegant designs are beautiful and fast.

Rule #7: Performance is an iterative process

At this point, you’re probably starting to come to the realization that the rules outlined so far tend to contradict one another. You can’t gauge the performance of a design until you’ve coded it and can profile it. However, if you’ve got a problem, it’s most likely going to be your design, not your code. So, basically, there’s no way to tell how good a design is going to be until it’s too late to do anything about! Not exactly, though. If you take the standard linear model of development (design, code, test, ship), you’re right: it’s impossible to heed all the rules. However, if you look at the development process as being iterative (design, code, test, re-design, re-code, re-test, re-design, re-code, re-test, …, ship), then it becomes possible. You will probably have to go through and test several designs before you reach the “right” one. Look at one of the most performance obsessed companies in the software business: Id Software (producers of the games Doom and Quake). In the development of their 3D display engines (which are super performance critical) they often will go through several entirely different designs per week, rewriting their engine as often as necessary to achieve the result they want. Fortunately, we’re not all that performance sensitive, but if you expect to design your code once and get it right the first time, expect to be more wrong than right.

Rule #8: You’re either part of the solution or part of the problem

This is a simple rule: don’t be lazy. Because we all tend to be very busy people, the reflexive way we deal with difficult problems is to push them off to someone else. If your application is slow, you blame one of your external components and say “it’s their problem.” If you’re one of the external components, you blame the application for using you in a way you didn’t expect and say “it’s their problem.” Or you blame the operating system and say “it’s their problem.” Or you blame the user for doing something stupid and say “it’s their problem.” The trouble with this way of dealing with things is that soon the performance issue (which must be solved) is bouncing around like a pinball until it’s lucky enough to land on someone who says “I don’t care whose problem this is, we’ve got to solve it” and then does. In the end, it doesn’t matter whose fault it is, just that the problem gets fixed. You may be entirely correct that some boneheaded developer on another team caused your performance regression, but if it’s your feature, it’s up to you to find a solution. If you think this is unfair, get over it. Our customers don’t blame a particular developer for a performance problem; they blame Microsoft.

Also, don’t live with a mystery. At one point in working on the boot performance of my application, I had done all the right things (profiled, re-designed, re-profiled, etc.), but I started getting strange results. My profiles showed that I’d sped boot up by 30%, but the benchmarks we were running showed it had slowed down by 30%. My first instinct was to dismiss the benchmarks as being wrong, but they had been so reliable in the past (see rule #3) that I couldn’t do that. So I was left with a mystery. My second instinct was to ignore this mystery, report that I’d sped the feature up by 30% and move on. Fortunately, program management was also reading the benchmark results, so I couldn’t slime out of it that easily. So I was forced to spend a few weeks beating my head against a wall trying to figure out what was going on. In the process, I discovered rule #9 below, which explained the mystery. Case closed. I’ve seen many, many developers (including myself on plenty of other occasions) fall into the trap of leaving mysteries unsolved. If you’ve got a mystery, some nagging detail that isn’t quite right, some performance slowdown that you can’t quite explain, don’t be lazy and don’t stop until you’ve solved the mystery. Otherwise you may miss the key to your entire performance puzzle.

Rule #9: It’s the memory, stupid.

As I mentioned above, I reached a point in working on speeding up application boot where my profiles showed that I was 30% faster, but the benchmarks indicated I was 30% slower. After much hair-pulling, I discovered that the profiling method I had chosen effectively filtered out the time the system spent doing things like faulting memory pages in and flushing dirty pages out to disk. Given that 1) we faulted a lot of code in from disk on boot, and 2) we allocated a lot of dynamic memory on boot, I was effectively filtering a huge percentage of the boot time out of the profiles! A flip of a switch and suddenly my profiles were in line with the benchmarks, indicating I had a lot of work to do. This taught me a key to understanding performance, namely that memory pages used generally matter much more than CPU cycles used. Intellectually, this makes sense: while CPU performance has been increasing rapidly every year, the time it takes to access memory chips hasn’t kept up. Even worse, the time it takes to access the disk lags further behind still. So if you have really tight boot code that nonetheless causes a megabyte of code to be faulted in from disk, you’re going to be gated almost entirely by the speed of the disk, not the CPU. And if you end up using so much memory that the operating system is forced to start paging memory out (and then later forced to page it back in), you’re in real trouble.
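The point that memory access, not raw computation, is often what gates you can be sketched even at small scale. Here’s a toy illustration (in Python, which isn’t the language this was originally written about): the two loops below do identical arithmetic over identical data, but one walks memory in order while the other jumps around at random. In a compiled language the gap is dramatic because sequential access cooperates with caches and prefetching; in an interpreted language the effect is muted by interpreter overhead, so treat this strictly as a sketch of the idea, not a rigorous benchmark.

```python
import random
import time

N = 1_000_000
data = list(range(N))

# Sequential walk: memory is touched in order, which caches,
# prefetchers, and the paging system all handle gracefully.
start = time.perf_counter()
seq_sum = sum(data)
seq_time = time.perf_counter() - start

# Random walk: the same values are read and the same sum is computed,
# but each access may land on a "cold" region of memory.
indices = list(range(N))
random.shuffle(indices)
start = time.perf_counter()
rand_sum = sum(data[i] for i in indices)
rand_time = time.perf_counter() - start

# Identical arithmetic, very different memory behavior.
assert seq_sum == rand_sum
print(f"sequential: {seq_time:.3f}s  random-order: {rand_time:.3f}s")
```

The absolute numbers will vary wildly by machine; the takeaway is only that “cycles spent computing” and “time spent waiting on memory” are separate budgets, and a profiler configured to ignore the latter (as mine was) can be wildly misleading.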

Rule #10: Don’t do anything unless you absolutely have to

This final rule addresses the most common design error developers make: doing work they don’t absolutely have to. Often, developers will initialize structures or allocate resources up front because it simplifies the overall design of the code. And, to a certain degree, this is a good idea if it would be painful to do the initialization (or other work) further down the line. But oftentimes this practice leads to a huge amount of up-front initialization so that the code is ready to handle all kinds of situations that may or may not occur. If you’re not 100% absolutely sure that a piece of code is going to need to be executed, then don’t execute it! Conversely, when delaying initialization code, be aware of where that work is going to go. If you move an expensive initialization routine out of one critical feature and into another one, you may not have bought yourself much. It’s a bit of a shell game, so be aware of what you’re doing.
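One common way to follow this rule is lazy initialization: pay for setup on first use instead of at startup. Here’s a minimal sketch in Python (the class, its fields, and the “font loading” step are all hypothetical examples, not anything from a real product): code paths that never touch the feature never pay for it.

```python
class ReportRenderer:
    """Hypothetical example of deferring an expensive setup step to first use."""

    def __init__(self):
        # Don't load anything yet; many sessions never render a report.
        self._fonts = None

    @property
    def fonts(self):
        # Initialize on first access, exactly once, rather than at startup.
        if self._fonts is None:
            self._fonts = self._load_fonts()
        return self._fonts

    def _load_fonts(self):
        # Stand-in for real work: disk I/O, parsing, allocation, etc.
        return {"body": "Times", "heading": "Arial"}


renderer = ReportRenderer()
assert renderer._fonts is None          # constructing cost us nothing
assert renderer.fonts["body"] == "Times"  # first use triggers the load
```

Note how this is exactly the shell game the rule warns about: the cost hasn’t vanished, it has moved from startup to the first report. That’s a win only if rendering a report is rarer, or less time-critical, than booting.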

Also, keep in mind that memory is as important as code speed, so don’t accumulate state unless you absolutely have to. One of the big reasons memory is such a problem is the programmer mindset that CPU cycles should be preserved at all costs. If you calculate a result at one point in your program and might need it later elsewhere, the automatic reaction is to stash that result away “in case I need it later.” In some expensive cases, this is a good idea. But often the result is that ten different developers each think, “All I need is x bytes to store this value. That’s much cheaper than the cycles it took me to calculate it,” and soon your application has slowed to a crawl as memory swaps in and out like crazy. Now not only are you wasting tons of cycles going out to disk to fetch a page holding a value that would have been much cheaper to recalculate, you’re also spending more of your time managing all the state you’ve accumulated. It sounds counterintuitive, but it’s true: recalculate everything on the fly; save only the really expensive, important stuff.
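To make the “recalculate, don’t cache” advice concrete, here’s a contrived Python sketch (the `Document` class and line-counting are hypothetical, purely for illustration). The cached version carries a field that must be kept in sync on every edit; the recomputing version carries no extra state and has nothing to invalidate, at the cost of a linear scan that is trivially cheap compared to a page fault.

```python
def line_count(text: str) -> int:
    # Cheap to recompute: a single linear scan over data we already hold.
    return text.count("\n") + 1


class Document:
    """Hypothetical document that deliberately does NOT cache its line count.

    The tempting alternative is a `self._line_count` field updated on every
    edit -- extra memory per document, plus invalidation bugs waiting to
    happen, to save a scan that costs almost nothing.
    """

    def __init__(self, text: str):
        self.text = text


doc = Document("one\ntwo\nthree")
assert line_count(doc.text) == 3
doc.text += "\nfour"
assert line_count(doc.text) == 4  # nothing to invalidate, nothing to get stale
```

The judgment call is in that last clause of the rule: a line count is obviously cheap to redo, while a result that took seconds of I/O or computation to produce genuinely earns its bytes.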

© 1997-2004 Paul Vick

Ship gifts

As I prepare to do my final packing for Africa (weight limit: 33 lbs for 4 weeks, ugh), I’m reminded of the quaint tradition of ship gifts at Microsoft and wonder again why ship gifts tend to be so bad. Ship gifts, in case it isn’t obvious, are gifts the company gives a development team when a product finally makes it out the door.

I’m reminded of this because the duffel bag I’m packing is pretty much the only ship gift I’ve gotten in nearly 12 years at Microsoft that I think is actually worth a damn. As a gift for shipping Access ’97, everyone got these really nice High Sierra duffel bags with a tasteful “Access ’97” sewn into the bag. The duffel is really great: besides being rugged, it also expands from normal size to a bigger size in case you buy stuff on your trip or need a little extra space. I’ve taken the bag all kinds of places, and it was my primary bag on my last really long trip, my month-long honeymoon to Spain four years ago.

In comparison, the eight or nine other ship gifts have either been entirely forgotten or were so cheap and/or useless that they’ve been tossed out. I know, I know, the old adage about not looking a gift horse in the mouth applies here, and it’s not like Microsoft doesn’t pay me a living wage. So what am I complaining about? Mostly the waste. I mean, for shipping Access ’95, we were given a pretty decent letterman’s jacket that was marred by, excuse my French, a butt-ugly Access logo that I think was supposed to be purple but ended up looking more pink. Some sewing-minded coworkers managed to get the logo patch off without destroying the jacket, but who’s got the time (or the equipment, for that matter)? Instead, it’s been taking up space in my closet for the past eight years because I can’t bring myself to throw it out or give it away.

In a lot of ways, ship gifts are just a part of the love affair that corporate America as a whole has with cheap, useless and/or ugly swag. And I do think most of the time it’s just a huge waste, fueled by everyone’s love of getting something, even a crappy something, for nothing. (And based on my own behavior at trade shows, this is the pot calling the kettle black.) I think we’d all be a lot better off either getting high quality useful stuff that we’re not going to toss or leave in the hall closet for years or just not getting anything at all. It’s not like we need most of this stuff.

But, then, that’s just my $0.02…