An Updated Introduction: Seven Years Later (February 9, 2004)
I wrote this document a long time ago to capture the conclusions I had reached from working on application performance for a few years. At the time that I wrote it, the way in which Microsoft dealt with performance in its products was starting to undergo a large-scale transformation. Performance analysis up to that time had largely been an ad-hoc effort and was hampered by a lack of well-known “best practices” and, in some cases, the tools necessary to get the job done. As the transition started from the 16-bit world to the 32-bit world, this kind of approach was clearly insufficient to deal with the increasing popularity of our products and the increasing demands that were being placed on them. The past seven years have seen major changes in the way that performance is integrated into the development process at Microsoft, and many of the “rules” that I outline below have become internalized into the daily processes of most major groups in the company. Even though a lot of what follows is no longer fresh thinking, I still get requests for the document internally, which leads me to believe that there’s still value in saying things that many people already know. So I’ve decided to provide it publicly here, for your edification and in the hope that someone might find it useful. (Historical note: The “major Microsoft application” that I refer to below was not VB.)
A Short Introduction (May 28, 1997)
In the fall of 1995, I was part of a team working on an upgrade to a major Microsoft application when we noticed that we had a little problem: we were slower than the previous version. Not just a little slower, but shockingly slower. This fact had largely been ignored by the development team (including myself), since we assumed it would improve once we were finished coding up our new features. However, after a while the management team started to get a little worried. E-mails were sent around asking whether the developers could spend some time focusing on performance. Thinking “How hard could this be?” I replied, saying that I’d be happy to own performance. This should be easy, I thought naively, just tweak a few loops, rewrite some brain-dead code, eliminate some unnecessary work and we’ll be back in the black. A full two years (and a lot of man-years’ worth of work on many people’s part) later, we finally achieved the performance improvements that I was sure would only take me alone a few months to achieve.
Unsurprisingly, I had approached the problem of performance with a lot of assumptions about what I would find and what I would need to do. Most of those assumptions turned out to be dead wrong or, at best, wildly inaccurate. It took many painful months of work to learn how to “really” approach the problem of performance, and that was just the beginning — once I figured out what to do, I discovered that there was a huge amount of work ahead! From that painful experience, I have come up with a set of “Ten Rules of Performance,” intended to help others avoid the errors that I made. So, without further ado…
Rule #1: Don’t assume you know anything.
In the immortal words of Pogo, “We have met the enemy, and he is us.” Your biggest enemy in dealing with performance is by far all the little assumptions about your application you carry around inside your head. Because you designed the code, because you’ve worked on the operating system for years, because you did well in your college CS classes, you’re tempted to believe that you understand how your application works. Well, you don’t. You understand how it’s supposed to work. Unfortunately, performance work deals with how things actually work, which in many cases is completely different. Bugs, design shortcuts and unforeseen cases can all cause computer systems to behave (and execute code) in unexpected, surprising ways. If you want to get anywhere with performance, you must continuously test and re-test all assumptions you have: about the system, about your components, about your code. If you’re content to assume you know what’s going on and never bother to prove you know what’s going on, start getting used to saying the following phrase: “I don’t know what’s wrong… It’s supposed to be fast!”
Rule #2: Never take your eyes off the ball
For most developers, performance exists as an abstract problem at the very beginning of the development cycle and a concrete problem at the very end of the cycle. In between, they’ve got better things to be doing. As a result, typically developers write code in a way that they assume will be fast (breaking rule #1) and then end up scrambling like crazy when the beta feedback comes back that the product is too slow. Of course, by the time the product is in beta there’s no real time to go back and redesign things without slipping, so the best that can be done is usually some simple band-aiding of the problem and praying that everyone is buying faster machines this Christmas.
If you’re serious about performance, you must start thinking about it when you begin designing your code and can only stop thinking about it when the final golden bits have been sent to manufacturing. In between, you must never, ever stop testing, analyzing and working on the performance of your code. Slowness is insidious — it will sneak into your product while you’re not looking. The price of speed is eternal vigilance.
Rule #3: Be afraid of the dark
Part of the reason why development teams find it so easy to ignore performance problems in favor of working on new features (or on bugs) is that rarely, if ever, is there anything to make them sit up and pay attention. Developers notice when the schedule shows them falling behind, and they start to panic when their bug list begins to grow too long. But most teams never have any kind of performance benchmark that can show developers how slow (or fast) things actually are. Instead, most teams thrash around in the dark, randomly addressing performance in an extremely ad hoc way and failing to motivate their developers to do anything about the problems that exist.
One of the most critical elements of a successful performance strategy is a set of reproducible real-world benchmarks run over a long period of time. If the benchmarks are not reproducible or real-world, they are liable to be dismissed by everyone as insignificant. And they must be run over a long period of time (and against previous versions) to give a real level of comparison. Most importantly, they must be run on a typical user’s machine. Usually, coming up with such numbers will be an eye-opening experience for you and others on your team. “What do you mean that my feature has slowed down 146% since the previous version?!?” It’s a great motivator and will tell you what you really need to be working on.
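As a rough illustration of what such a benchmark report might compute, here is a minimal Python sketch. Everything in it (the function names `run_benchmark`, `percent_slower` and `report`, and the 5% threshold) is a hypothetical choice made for this example, not the harness we actually used.

```python
import statistics
import time


def run_benchmark(scenario, repeats=5):
    """Return the median wall-clock time in seconds of calling scenario()."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        scenario()
        samples.append(time.perf_counter() - start)
    # The median is less sensitive to one-off hiccups than the mean.
    return statistics.median(samples)


def percent_slower(baseline_seconds, current_seconds):
    """Positive means slower than the baseline, negative means faster."""
    return (current_seconds - baseline_seconds) / baseline_seconds * 100.0


def report(name, baseline_seconds, current_seconds, threshold_percent=5.0):
    """Format one line of a benchmark report, flagging regressions."""
    change = percent_slower(baseline_seconds, current_seconds)
    flag = "REGRESSION" if change > threshold_percent else "ok"
    return f"{name}: {change:+.1f}% vs. previous version [{flag}]"
```

The timings for the previous version would have to be recorded on the same typical user’s machine for the comparison to mean anything; a baseline measured on a developer’s workstation would defeat the purpose.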
Rule #4: Assume things will always get worse
The typical state of affairs in a development team is that the developers are always behind the eight ball. There’s another milestone coming up that you have to get those twenty features done for, and then once that milestone is done there’s another one right around the corner. What gets lost in this rush is the incentive for you to take some time as you go along to make sure that non-critical performance problems are fixed. At the end of milestone 1, your benchmarks may say that your feature is 15% slower but you’ve got a lot of work to do and, hey, it’s only milestone 1! At the end of milestone 2, the benchmarks now tell you your feature is 30% slower, but you’re pushing for an alpha release and you just don’t have time to worry about it. At the end of milestone 3, you’re code complete, pushing for beta and the benchmarks say that your feature is now 90% slower and program management is beginning to freak out. Under pressure, you finally profile the feature and discover the design problems that you started out with back in milestone 1. Only now with beta just weeks away and then a push to RTM, there’s no way you can go back and redesign things from the ground up! Avoid this mistake — always assume that however bad things are now, they’re only going to get worse in the future, so you’d better deal with them now. The longer you wait, the worse it’s going to be for you. It’s true more often than you think.
Rule #5: Problems have to be seen to be believed (or: profile, profile, profile)
Here’s the typical project’s approach to performance: Performance problems are identified. Development goes off, thinks about their design and says “We’ve got it! The problem must be X. If we just do Y, everything will be fixed!” Development goes off and does Y. Surprisingly, the performance problems persist. Development goes off, thinks about their design and says “We’ve got it! The problem must be A. If we just do B, everything will be fixed!” Development goes off and does B. Surprisingly, the performance problems persist. Development goes off… well, you get the idea. It’s amazing how many iterations of this some development groups will go through before they actually admit that they don’t know exactly what’s going on and bother to profile their code to find out. If you can’t point to a profile that shows what’s going on, you can’t say you know what’s wrong.
Every developer needs a good profiler. Even if you don’t deal with performance regularly, I say: Learn it, love it, live it. It’s an invaluable tool in a developer’s toolbox, right up there with a good compiler and debugger. Even if your code is running with acceptable speed, regularly profiling your code can reveal surprising information about its actual behavior.
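To make “point to a profile” concrete, here is a small sketch using Python’s standard `cProfile` and `pstats` modules. The workload `slow_concat` and the helper `profile_call` are made up for illustration; the point is only that the report, not your intuition, says where the time went.

```python
import cProfile
import io
import pstats


def slow_concat(n):
    """Hypothetical workload: builds a string one piece at a time."""
    result = ""
    for i in range(n):
        result += str(i)
    return result


def profile_call(func, *args, top=5):
    """Run func(*args) under cProfile; return its result and a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time so the most expensive call paths come first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()
```

Calling `profile_call(slow_concat, 100000)` and printing the second element of the returned pair shows the top entries by cumulative time.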
Rule #6: 90% of performance problems are designed in, not coded in
This is a hard rule to swallow because a lot of developers assume that performance problems have more to do with code issues (badly designed loops, etc.) than with the overall application design. The sad fact of the matter is that in all but the luckiest groups, most of the big performance problems you’re going to confront are not the nice and tidy kind of issues where someone is doing something really dumb. Instead, they are going to be situations that are extremely difficult to pinpoint, where several pieces of code are interacting in ways that end up being slow. Solving these problems usually requires a redesign of the way large chunks of your code are structured (very bad) or a redesign of the way several components interact (even worse). And given that most pieces of an application are interrelated these days, a small change in the design of one piece of code may cascade into changes in several other pieces of code. Either way it’s not going to be simple or easy. That’s why you need to diagnose problems as soon as you can and get at them before you’ve piled a lot of code on top of your designs.
Also, don’t fall in love with your code. Most programmers take a justifiable pride in the code that they write and even more pride in the overall design of the code. However, this means that many times when you point out to them that their intellectually beautiful design causes horrendous performance problems and that several changes are going to be needed, they tend not to take it very well. “How can we mar the elegance and undeniable usability of this design?” they ask, horrified, adding that perhaps you should look elsewhere for your performance gains. Don’t fall into this trap. A design that is beautiful but slow is like a Ferrari with the engine of a Yugo — sure, it looks great, but you certainly can’t take it very far. Truly elegant designs are beautiful and fast.
Rule #7: Performance is an iterative process
At this point, you’re probably starting to come to the realization that the rules outlined so far tend to contradict one another. You can’t gauge the performance of a design until you’ve coded it and can profile it. However, if you’ve got a problem, it’s most likely going to be your design, not your code. So, basically, there’s no way to tell how good a design is going to be until it’s too late to do anything about it! Not exactly, though. If you take the standard linear model of development (design, code, test, ship), you’re right: it’s impossible to heed all the rules. However, if you look at the development process as being iterative (design, code, test, re-design, re-code, re-test, re-design, re-code, re-test, …, ship), then it becomes possible. You will probably have to go through and test several designs before you reach the “right” one. Look at one of the most performance-obsessed companies in the software business: Id Software (producers of the games Doom and Quake). In the development of their 3D display engines (which are extremely performance-critical) they often will go through several entirely different designs per week, rewriting their engine as often as necessary to achieve the result they want. Fortunately, we’re not all that performance sensitive, but if you expect to design your code once and get it right the first time, expect to be more wrong than right.
Rule #8: You’re either part of the solution or part of the problem
This is a simple rule: don’t be lazy. Because we all tend to be very busy people, the reflexive way we deal with difficult problems is to push them off to someone else. If your application is slow, you blame one of your external components and say “it’s their problem.” If you’re one of the external components, you blame the application for using you in a way you didn’t expect and say “it’s their problem.” Or you blame the operating system and say “it’s their problem.” Or you blame the user for doing something stupid and say “it’s their problem.” The problem with this way of dealing with these things is that soon the performance issue (which must be solved) is bouncing around like a pinball until it’s lucky enough to land on someone who’s going to say “I don’t care whose problem this is, we’ve got to solve it” and then does. In the end, it doesn’t matter whose fault it is, just that the problem gets fixed. You may be entirely correct that some boneheaded developer on another team caused your performance regression, but if it’s your feature it’s up to you to find a solution. If you think this is unfair, get over it. Our customers don’t blame a particular developer for a performance problem, they blame Microsoft.
Also, don’t live with a mystery. At one point in working on the boot performance of my application, I had done all the right things (profiled, re-designed, re-profiled, etc) but I started getting strange results. My profiles showed that I’d sped boot up by 30%, but the benchmarks we were running showed it had slowed down by 30%. My first instinct was to dismiss the benchmarks as being wrong, but they had been so reliable in the past (see rule #3) that I couldn’t do that. So I was left with a mystery. My second instinct was to ignore this mystery, report that I’d sped the feature up 30% and move on. Fortunately, program management was also reading the benchmark results, so I couldn’t slime out of it that easily. So I was forced to spend a few weeks beating my head against a wall trying to figure out what was going on. In the process, I discovered rule #9 below which explained the mystery. Case closed. I’ve seen many, many developers (including myself on plenty of other occasions) fall into the trap of leaving mysteries unsolved. If you’ve got a mystery, some nagging detail that isn’t quite right, some performance slowdown that you can’t quite explain, don’t be lazy and don’t stop until you’ve solved the mystery. Otherwise you may miss the key to your entire performance puzzle.
Rule #9: It’s the memory, stupid.
As I mentioned above, I reached a point in working on speeding up application boot where my profiles showed that I was 30% faster, but the benchmarks indicated I was 30% slower. After much hair-pulling, I discovered that the profiling method that I had chosen effectively filtered out the time the system spent doing things like faulting memory pages in and flushing dirty pages out to disk. Given that: 1) We faulted a lot of code in from disk to boot, and 2) We allocated a lot of dynamic memory on boot, I was effectively filtering a huge percentage of the boot time out of the profiles! A flip of a switch and suddenly my profiles were in line with the benchmarks, indicating I had a lot of work to do. This taught me a key to understanding performance, namely that memory pages used are generally much more important than CPU cycles used. Intellectually, this makes sense: while CPU performance has been rapidly increasing every year, the amount of time it takes to access memory chips hasn’t been keeping up. And even worse, the amount of time it takes to access the disk lags even further behind. So if you have really tight boot code that nonetheless causes 1 megabyte of code to be faulted in from the disk, you’re going to be almost entirely gated by the speed of the disk controller, not the CPU. And if you end up using so much memory that the operating system is forced to start paging memory out (and then later forced to start paging it back in), you’re in real trouble.
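The gap between my profiles and the benchmarks can be imitated in a few lines of Python. This is only an analogy, not the profiler or the switch I actually used: `time.process_time` counts CPU time for the process and excludes time spent sleeping, while `time.perf_counter` measures elapsed wall-clock time. The `time.sleep` call is a hypothetical stand-in for time spent blocked on the disk or the pager.

```python
import time


def measure(func):
    """Return (wall_seconds, cpu_seconds) for one call to func()."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu


def waiting_workload():
    """Stand-in for code that spends its time waiting, not computing."""
    time.sleep(0.2)
```

A measurement that looks only at the CPU figure would call `waiting_workload` nearly free, while the user waits the full wall-clock time. If the two numbers disagree badly, the difference is where to look.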
Rule #10: Don’t do anything unless you absolutely have to
This final rule addresses the most common design error that developers make: doing work that they don’t absolutely have to. Often, developers will initialize structures or allocate resources up front because it simplifies the overall design of the code. And, to a certain degree, this is a good idea if it would be painful to do initialization (or other work) further down the line. But often this practice leads to doing a huge amount of initialization so that the code is ready to handle all kinds of situations that may or may not occur. If you’re not 100% absolutely sure that a piece of code is going to need to be executed, then don’t execute it! Conversely, when delaying initialization code, be aware of where that work is going to be going. If you move an expensive initialization routine out of one critical feature and into another one, you may not have bought yourself much. It’s a bit of a shell game, so be aware of what you’re doing.
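One common way to delay initialization is to do it on first use. The sketch below uses Python’s `functools.cached_property` (Python 3.8 or later); the `SpellChecker` class, its tiny word list and the `load_count` counter are all hypothetical and exist only to show the pattern.

```python
from functools import cached_property


class SpellChecker:
    """Hypothetical feature with an expensive setup step."""

    def __init__(self):
        # Deliberately cheap: nothing is loaded at construction time.
        self.load_count = 0

    @cached_property
    def dictionary(self):
        # Runs on first access only; the result is stored on the instance.
        self.load_count += 1
        return self._load_dictionary()

    def _load_dictionary(self):
        # Stand-in for reading a large word list from disk.
        return {"performance", "memory", "profile"}

    def is_known(self, word):
        return word.lower() in self.dictionary
```

Note where the cost went: the first call to `is_known` now pays for the load. If that first call sits on another critical path, the work has only been moved, not removed.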
Also, keep in mind that memory is as important as code speed, so don’t accumulate state unless you absolutely have to. One of the big reasons why memory is such a problem is the mindset of programmers that CPU cycles should be preserved at all costs. If they calculate a result at one point in a program and might need it later on elsewhere, programmers automatically stash that result away “in case I need it later.” And in some expensive cases, this is a good idea. However, often the result is that ten different developers think, “All I need is x bytes to store this value. That’s much cheaper than the cycles it took me to calculate it.” And then soon your application has slowed to a crawl as the memory swaps in and out like crazy. Now not only are you wasting tons of cycles going out to the disk to fetch a page holding a value that would have been much cheaper to recalculate, you’re also spending more of your time managing all the state you’ve accumulated. It sounds counterintuitive, but it’s true: recalculate everything on the fly; save only the really expensive, important stuff.
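Here is one way that advice might look in Python, as a sketch under assumptions rather than a prescription. The `Rectangle` class and the `expensive_layout` function are invented for this example, and the cache size of 128 is arbitrary. The cheap value is recomputed every time; only the costly one is kept, and its cache is bounded with `functools.lru_cache` so the saved state cannot grow without limit.

```python
from functools import lru_cache


class Rectangle:
    """Cheap derived values are recomputed rather than stored."""

    def __init__(self, width, height):
        self.width = width
        self.height = height

    @property
    def area(self):
        # A multiply is cheaper than carrying (and invalidating) extra state.
        return self.width * self.height


@lru_cache(maxsize=128)  # bounded: old entries are evicted, not accumulated
def expensive_layout(text):
    """Stand-in for a genuinely costly computation worth keeping."""
    return tuple(sorted(set(text.split())))
```

Recomputing `area` also means there is no stale copy to fix up when `width` changes, which is part of the cost of managing accumulated state.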
© 1997-2004 Paul Vick