An Updated Introduction: Seven Years Later (February 9, 2004)
I wrote this document a long time ago to capture the conclusions I had reached from working on application performance for a few years. At the time that I wrote it, the way in which Microsoft dealt with performance in its products was starting to undergo a large-scale transformation. Performance analysis up to that time had largely been an ad hoc effort and was hampered by a lack of well-known “best practices” and, in some cases, the tools necessary to get the job done. As the transition started from the 16-bit world to the 32-bit world, this kind of approach was clearly insufficient to deal with the increasing popularity of our products and the increasing demands that were being placed on them. The past seven years have seen major changes in the way that performance is integrated into the development process at Microsoft, and many of the “rules” that I outline below have become internalized into the daily processes of most major groups in the company. Even though a lot of what follows is no longer fresh thinking, I still get requests for the document internally, which leads me to believe that there’s still value in saying things that many people already know. So I’ve decided to provide it publicly here, for your edification and in the hope that someone might find it useful. (Historical note: The “major Microsoft application” that I refer to below was not VB.)
A Short Introduction (May 28, 1997)
In the fall of 1995, I was part of a team working on an upgrade to a major Microsoft application when we noticed that we had a little problem: we were slower than the previous version. Not just a little slower, but shockingly slower. This fact had largely been ignored by the development team (including myself), since we assumed it would improve once we were finished coding up our new features. However, after a while the management team started to get a little worried. E-mails were sent around asking whether the developers could spend some time focusing on performance. Thinking “How hard could this be?” I replied, saying that I’d be happy to own performance. This should be easy, I thought naively: just tweak a few loops, rewrite some brain-dead code, eliminate some unnecessary work and we’ll be back in the black. A full two years later (and a lot of man-years of work on many people’s part), we finally achieved the performance improvements that I had been sure I could achieve alone in a few months.
Unsurprisingly, I had approached the problem of performance with a lot of assumptions about what I would find and what I would need to do. Most of those assumptions turned out to be dead wrong or, at best, wildly inaccurate. It took many painful months of work to learn how to “really” approach the problem of performance, and that was just the beginning — once I figured out what to do, I discovered that there was a huge amount of work ahead! From that painful experience, I have come up with a set of “Ten Rules of Performance,” intended to help others avoid the errors that I made. So, without further ado…
The Rules:
Rule #1: Don’t assume you know anything.
In the immortal words of Pogo, “We have met the enemy, and he is us.” Your biggest enemy in dealing with performance is by far all the little assumptions about your application you carry around inside your head. Because you designed the code, because you’ve worked on the operating system for years, because you did well in your college CS classes, you’re tempted to believe that you understand how your application works. Well, you don’t. You understand how it’s supposed to work. Unfortunately, performance work deals with how things actually work, which in many cases is completely different. Bugs, design shortcuts and unforeseen cases can all cause computer systems to behave (and execute code) in unexpected, surprising ways. If you want to get anywhere with performance, you must continuously test and re-test all assumptions you have: about the system, about your components, about your code. If you’re content to assume you know what’s going on and never bother to prove you know what’s going on, start getting used to saying the following phrase: “I don’t know what’s wrong… It’s supposed to be fast!”
Rule #2: Never take your eyes off the ball
For most developers, performance exists as an abstract problem at the very beginning of the development cycle and a concrete problem at the very end of the cycle. In between, they’ve got better things to be doing. As a result, typically developers write code in a way that they assume will be fast (breaking rule #1) and then end up scrambling like crazy when the beta feedback comes back that the product is too slow. Of course, by the time the product is in beta there’s no real time to go back and redesign things without slipping, so the best that can be done is usually some simple band-aiding of the problem and praying that everyone is buying faster machines this Christmas.
If you’re serious about performance, you must start thinking about it when you begin designing your code and can only stop thinking about it when the final golden bits have been sent to manufacturing. In between, you must never, ever stop testing, analyzing and working on the performance of your code. Slowness is insidious — it will sneak into your product while you’re not looking. The price of speed is eternal vigilance.
Rule #3: Be afraid of the dark
Part of the reason why development teams find it so easy to ignore performance problems in favor of working on new features (or on bugs) is that rarely, if ever, is there anything to make them sit up and pay attention. Developers notice when the schedule shows them falling behind, and they start to panic when their bug list begins to grow too long. But most teams never have any kind of performance benchmark that can show developers how slow (or fast) things actually are. Instead, most teams thrash around in the dark, randomly addressing performance in an extremely ad hoc way and failing to motivate their developers to do anything about the problems that exist.
One of the most critical elements of a successful performance strategy is a set of reproducible real-world benchmarks run over a long period of time. If the benchmarks are not reproducible or real-world, they are liable to be dismissed by everyone as insignificant. And they must be run over a long period of time (and against previous versions) to give a real level of comparison. Most importantly, they must be run on a typical user’s machine. Usually, coming up with such numbers will be an eye opening experience for you and others on your team. “What do you mean that my feature has slowed down 146% since the previous version?!?” It’s a great motivator and will tell you what you really need to be working on.
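A benchmark doesn't have to be elaborate to be useful. As a minimal sketch, assuming Python and a hypothetical `open_document` scenario standing in for a real user task, the harness below times the scenario several times, keeps the median, and reports the percentage change against a timing recorded from the previous version (the baseline value here is made up for illustration):

```python
import statistics
import time
from typing import Callable


def measure(scenario: Callable[[], None], runs: int = 5) -> float:
    """Run a scenario several times and return the median wall-clock time in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        scenario()
        samples.append(time.perf_counter() - start)
    # The median is less sensitive than the mean to a single noisy run.
    return statistics.median(samples)


def percent_change(baseline: float, current: float) -> float:
    """Positive result means the current build is slower than the baseline."""
    return (current - baseline) / baseline * 100.0


def open_document() -> None:
    # Hypothetical stand-in for a real user scenario such as opening a file.
    sum(i * i for i in range(200_000))


if __name__ == "__main__":
    # Hypothetical timing recorded from the previous version, in seconds.
    baseline_seconds = 0.010
    current_seconds = measure(open_document)
    change = percent_change(baseline_seconds, current_seconds)
    print(f"open_document: {current_seconds:.4f}s ({change:+.0f}% vs. previous version)")
```

Taking the median of several runs is a design choice aimed at reproducibility, so the numbers are harder to dismiss; in practice the baseline would come from stored results of earlier builds, measured on the same typical-user machine.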
Rule #4: Assume things will always get worse
The typical state of affairs in a development team is that the developers are always behind the eight ball. There’s another milestone coming up that you have to get those twenty features done for, and then once that milestone is done there’s another one right around the corner. What gets lost in this rush is the incentive for you to take some time as you go along to make sure that non-critical performance problems are fixed. At the end of milestone 1, your benchmarks may say that your feature is 15% slower but you’ve got a lot of work to do and, hey, it’s only milestone 1! At the end of milestone 2, the benchmarks now tell you your feature is 30% slower, but you’re pushing for an alpha release and you just don’t have time to worry about it. At the end of milestone 3, you’re code complete, pushing for beta and the benchmarks say that your feature is now 90% slower and program management is beginning to freak out. Under pressure, you finally profile the feature and discover the design problems that you started out with back in milestone 1. Only now with beta just weeks away and then a push to RTM, there’s no way you can go back and redesign things from the ground up! Avoid this mistake — always assume that however bad things are now, they’re only going to get worse in the future, so you’d better deal with them now. The longer you wait, the worse it’s going to be for you. It’s true more often than you think.
Rule #5: Problems have to be seen to be believed (or: profile, profile, profile)
Here’s the typical project’s approach to performance: Performance problems are identified. Development goes off, thinks about their design and says “We’ve got it! The problem must be X. If we just do Y, everything will be fixed!” Development goes off and does Y. Surprisingly, the performance problems persist. Development goes off, thinks about their design and says “We’ve got it! The problem must be A. If we just do B, everything will be fixed!” Development goes off and does B. Surprisingly, the performance problems persist. Development goes off… well, you get the idea. It’s amazing how many iterations of this some development groups will go through before they actually admit that they don’t know exactly what’s going on and bother to profile their code to find out. If you can’t point to a profile that shows what’s going on, you can’t say you know what’s wrong.
Every developer needs a good profiler. Even if you don’t deal with performance regularly, I say: Learn it, love it, live it. It’s an invaluable tool in a developer’s toolbox, right up there with a good compiler and debugger. Even if your code is running with acceptable speed, regularly profiling your code can reveal surprising information about its actual behavior.
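For readers working in Python, a minimal sketch of profiling with the standard library's `cProfile` and `pstats` modules looks like this; `boot` and `parse_records` are hypothetical stand-ins for a real code path:

```python
import cProfile
import io
import pstats


def parse_records(n: int) -> list[str]:
    # Hypothetical workload: build and split many small strings.
    return [f"{i},{i * 2},{i * 3}".split(",")[1] for i in range(n)]


def boot() -> None:
    # Hypothetical "boot" path that calls the workload.
    parse_records(50_000)


def profile(func) -> str:
    """Profile func() and return a text report of the 10 most expensive entries."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Sort by cumulative time: time spent in a function plus everything it calls.
    stats.sort_stats("cumulative").print_stats(10)
    return out.getvalue()


if __name__ == "__main__":
    print(profile(boot))
```

Sorting by `"cumulative"` shows which top-level calls the time accumulates under; sorting by `"tottime"` instead shows time spent inside each function's own body, excluding its callees.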
Rule #6: 90% of performance problems are designed in, not coded in
This is a hard rule to swallow because a lot of developers assume that performance problems have more to do with code issues (badly designed loops, etc.) than with the overall application design. The sad fact of the matter is that in all but the luckiest groups, most of the big performance problems you’re going to confront are not the nice and tidy kind of issues where someone is doing something really dumb. Instead, they’re going to be hard-to-pinpoint situations where several pieces of code interact in ways that end up being slow. Solving these problems usually requires a redesign of the way large chunks of your code are structured (very bad) or a redesign of the way several components interact (even worse). And given that most pieces of an application are interrelated these days, a small change in the design of one piece of code may cascade into changes in several other pieces of code. Either way it’s not going to be simple or easy. That’s why you need to diagnose problems as soon as you can and get at them before you’ve piled a lot of code on top of your designs.
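To make the point concrete, here is a hypothetical sketch in Python (the `SpellChecker` component and its methods are invented for illustration, not taken from the article). Each call into the component stands for an expensive cross-component round trip. No single line of `check_chatty` is badly written; the cost comes from the shape of the interface, so fixing it means changing how the two pieces interact rather than tuning a loop:

```python
class SpellChecker:
    """Hypothetical component; each method call models one expensive round trip."""

    def __init__(self, dictionary: set[str]) -> None:
        self._dictionary = dictionary
        self.round_trips = 0

    def is_known(self, word: str) -> bool:
        # Per-word interface: one round trip for every single word.
        self.round_trips += 1
        return word in self._dictionary

    def unknown_words(self, words: list[str]) -> list[str]:
        # Batched interface: one round trip for the whole request.
        self.round_trips += 1
        return [w for w in words if w not in self._dictionary]


def check_chatty(checker: SpellChecker, words: list[str]) -> list[str]:
    # Looks harmless in isolation, but makes one call per word.
    return [w for w in words if not checker.is_known(w)]


def check_batched(checker: SpellChecker, words: list[str]) -> list[str]:
    return checker.unknown_words(words)
```

With 3,000 words, `check_chatty` makes 3,000 round trips while `check_batched` makes one. The fix required adding a new method to the component itself, which is exactly the kind of cross-component design change that is cheap early and expensive late.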
Also, don’t fall in love with your code. Most programmers take a justifiable pride in the code that they write and even more pride in the overall design of the code. However, this means that many times when you point out to them that their intellectually beautiful design causes horrendous performance problems and that several changes are going to be needed, they tend not to take it very well. “How can we mar the elegance and undeniable usability of this design?” they ask, horrified, adding that perhaps you should look elsewhere for your performance gains. Don’t fall into this trap. A design that is beautiful but slow is like a Ferrari with the engine of a Yugo — sure, it looks great, but you certainly can’t take it very far. Truly elegant designs are beautiful and fast.
Rule #7: Performance is an iterative process
At this point, you’re probably starting to come to the realization that the rules outlined so far tend to contradict one another. You can’t gauge the performance of a design until you’ve coded it and can profile it. However, if you’ve got a problem, it’s most likely going to be your design, not your code. So, basically, there’s no way to tell how good a design is going to be until it’s too late to do anything about it! Not exactly, though. If you take the standard linear model of development (design, code, test, ship), you’re right: it’s impossible to heed all the rules. However, if you look at the development process as being iterative (design, code, test, re-design, re-code, re-test, re-design, re-code, re-test, …, ship), then it becomes possible. You will probably have to go through and test several designs before you reach the “right” one. Look at one of the most performance-obsessed companies in the software business: id Software (producers of the games Doom and Quake). In the development of their 3D display engines (which are super performance-critical) they often will go through several entirely different designs per week, rewriting their engine as often as necessary to achieve the result they want. Fortunately, we’re not all that performance-sensitive, but if you expect to design your code once and get it right the first time, expect to be more wrong than right.
Rule #8: You’re either part of the solution or part of the problem
This is a simple rule: don’t be lazy. Because we all tend to be very busy people, the reflexive way we deal with difficult problems is to push them off to someone else. If your application is slow, you blame one of your external components and say “it’s their problem.” If you’re one of the external components, you blame the application for using you in a way you didn’t expect and say “it’s their problem.” Or you blame the operating system and say “it’s their problem.” Or you blame the user for doing something stupid and say “it’s their problem.” The trouble with handling things this way is that soon the performance issue (which must be solved) is bouncing around like a pinball until it’s lucky enough to land on someone who’s going to say “I don’t care whose problem this is, we’ve got to solve it” and then does. In the end, it doesn’t matter whose fault it is, just that the problem gets fixed. You may be entirely correct that some boneheaded developer on another team caused your performance regression, but if it’s your feature it’s up to you to find a solution. If you think this is unfair, get over it. Our customers don’t blame a particular developer for a performance problem, they blame Microsoft.
Also, don’t live with a mystery. At one point in working on the boot performance of my application, I had done all the right things (profiled, re-designed, re-profiled, etc.) but I started getting strange results. My profiles showed that I’d sped boot up by 30%, but the benchmarks we were running showed it had slowed down by 30%. My first instinct was to dismiss the benchmarks as being wrong, but they had been so reliable in the past (see rule #3) that I couldn’t do that. So I was left with a mystery. My second instinct was to ignore this mystery, report that I’d sped the feature up 30% and move on. Fortunately, program management was also reading the benchmark results, so I couldn’t slime out of it that easily. So I was forced to spend a few weeks beating my head against a wall trying to figure out what was going on. In the process, I discovered rule #9 below, which explained the mystery. Case closed. I’ve seen many, many developers (including myself on plenty of other occasions) fall into the trap of leaving mysteries unsolved. If you’ve got a mystery, some nagging detail that isn’t quite right, some performance slowdown that you can’t quite explain, don’t be lazy and don’t stop until you’ve solved the mystery. Otherwise you may miss the key to your entire performance puzzle.
Rule #9: It’s the memory, stupid.
As I mentioned above, I reached a point in working on speeding up application boot where my profiles showed that I was 30% faster, but the benchmarks indicated I was 30% slower. After much hair-pulling, I discovered that the profiling method that I had chosen effectively filtered out the time the system spent doing things like faulting memory pages in and flushing dirty pages out to disk. Given that 1) we faulted a lot of code in from disk during boot, and 2) we allocated a lot of dynamic memory during boot, I was effectively filtering a huge percentage of the boot time out of the profiles! A flip of a switch and suddenly my profiles were in line with the benchmarks, indicating I had a lot of work to do. This taught me a key to understanding performance, namely that memory pages used are generally much more important than CPU cycles used. Intellectually, this makes sense: while CPU performance has been rapidly increasing every year, the amount of time it takes to access memory chips hasn’t been keeping up. Even worse, the amount of time it takes to access the disk lags even further behind. So if you have really tight boot code that nonetheless causes 1 megabyte of code to be faulted in from the disk, you’re going to be almost entirely gated by the speed of the disk controller, not the CPU. And if you end up using so much memory that the operating system is forced to start paging memory out (and then later forced to start paging it back in), you’re in real trouble.
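The article doesn't name the profiler or the switch involved. As a minimal sketch of the same blind spot, assuming Python, compare wall-clock time (`time.perf_counter`) with CPU time (`time.process_time`) for a function that mostly waits. Here `time.sleep` is a hypothetical stand-in for blocking on page faults or disk reads; a tool that counts only CPU time would report `wait_for_disk` as nearly free, even though the user waits the full duration:

```python
import time


def timed(func) -> tuple[float, float]:
    """Return (wall_seconds, cpu_seconds) for one call of func()."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func()
    return time.perf_counter() - wall_start, time.process_time() - cpu_start


def wait_for_disk() -> None:
    # Stand-in for blocking on a page fault or disk read: the thread sleeps,
    # so wall-clock time passes while almost no CPU time is consumed.
    time.sleep(0.2)


if __name__ == "__main__":
    wall, cpu = timed(wait_for_disk)
    print(f"wall {wall:.3f}s, cpu {cpu:.3f}s, unaccounted {wall - cpu:.3f}s")
```

When the two numbers diverge sharply, the missing time is being spent waiting on something (disk, paging, locks), and that is where to look next.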
Rule #10: Don’t do anything unless you absolutely have to
This final rule addresses the most common design error that developers make: doing work that they don’t absolutely have to. Often, developers will initialize structures or allocate resources up front because it simplifies the overall design of the code. And, to a certain degree, this is a good idea if it would be painful to do initialization (or other work) further down the line. But oftentimes this practice leads to doing a huge amount of initialization so that the code is ready to handle all kinds of situations that may or may not occur. If you’re not 100% absolutely sure that a piece of code is going to need to be executed, then don’t execute it! Conversely, when delaying initialization code, be aware of where that work is going to go. If you move an expensive initialization routine out of one critical feature and into another one, you may not have bought yourself much. It’s a bit of a shell game, so be aware of what you’re doing.
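One common way to defer work until it is actually needed is lazy initialization. A minimal sketch in Python, using the standard library's `functools.cached_property`; the `Editor` class and its spelling tables are hypothetical:

```python
from functools import cached_property


class Editor:
    """Hypothetical editor whose spell-check tables only some users ever need."""

    def __init__(self) -> None:
        self.loads = 0  # counts how many times the expensive setup actually ran

    @cached_property
    def spelling_tables(self) -> dict[str, int]:
        # Expensive setup deferred until first access, then cached on the instance.
        self.loads += 1
        return {f"word{i}": i for i in range(100_000)}

    def check_spelling(self, word: str) -> bool:
        return word in self.spelling_tables
```

The cost hasn't disappeared: whoever calls `check_spelling` first pays for building the tables. If that first caller sits on another critical path, the work has only moved, which is the shell game described above.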
Also, keep in mind that memory is as important as code speed, so don’t accumulate state unless you absolutely have to. One of the big reasons why memory is such a problem is the mindset among programmers that CPU cycles should be preserved at all costs. If programmers calculate a result at one point in a program and might need it later on elsewhere, they automatically stash that result away “in case I need it later.” And in some expensive cases, this is a good idea. However, oftentimes the result is that ten different developers each think, “All I need is x bytes to store this value. That’s much cheaper than the cycles it took me to calculate it.” And then soon your application has slowed to a crawl as the memory swaps in and out like crazy. Now not only are you wasting tons of cycles going out to the disk to fetch a page holding a value that would have been much cheaper to recalculate, you’re also spending more of your time managing all the state you’ve accumulated. It sounds counterintuitive, but it’s true: recalculate everything on the fly; save only the really expensive, important stuff.
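A minimal sketch of the “recalculate cheap things, save only expensive things” guideline, assuming Python: the cheap `area` value is derived on every access rather than stored, while a hypothetical expensive `layout_line` function is cached with `functools.lru_cache`. Giving the cache a fixed `maxsize` is an added design choice, not something the article prescribes, so that the saved state cannot grow without limit:

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass
class Rect:
    width: float
    height: float

    @property
    def area(self) -> float:
        # Cheap to derive, so recompute on every access instead of storing it.
        return self.width * self.height


@lru_cache(maxsize=128)
def layout_line(text: str, width: int) -> tuple[str, ...]:
    # Hypothetical "really expensive" result: cached, but with a fixed upper
    # bound so the cache cannot keep accumulating state.
    words, lines, current = text.split(), [], ""
    for word in words:
        candidate = f"{current} {word}".strip()
        if len(candidate) <= width or not current:
            current = candidate
        else:
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return tuple(lines)
```

Because `area` is never stored, changing `width` can't leave a stale copy behind, and the bounded cache evicts the least recently used layouts instead of holding every result ever computed.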
© 1997-2004 Paul Vick
Great stuff Paul…
The very last point you make, on the last paragraph, is the most important one, imho. Many of the other points are beginner mistakes. Hopefully people will learn from experienced developers before they make them.
But the last point, about calculating being cheaper than data caching, is very interesting. I have noticed in my own performance tuning efforts, since the 80s, that over time, calculating has become cheaper and cheaper relative to caching, precisely for the reasons you note: processor speed keeps getting faster relative to the cost of memory cache misses. In fact, if you don’t hit the disk or perform a lengthy loop, it almost always pays off to choose compactness over complexity of access. The more compact your memory footprint, the faster the whole system becomes.
Unfortunately, ease of garbage collection leads people to waste memory with impunity, without considering the impact on memory cache misses.
I disagree with most of these points, since they force the developer to think about performance all the time. I agree that many problems are design problems and that profiling is crucial, but even more important are my own two rules of optimization.
Rule #1: Don’t!
Rule #2: Not yet!
Or in other words:
Focus on solving the problem, not on making the code unreadable to solve some phantom performance problem. It’s better to have a slow program that works than a fast one that doesn’t.
Jet, I completely agree that you need to make the code work first and then worry about whether it’s fast or not. It’s kind of implicit in the rules – it’s not really possible (or useful) to profile code until it actually works. Even then, it’s a balancing act – while you can easily waste time worrying about performance issues that may not turn out to occur, you can also waste time by not doing at least some thinking up front about performance. It’s not like everyone should start with bubble sort and then work their way up to using quicksort once problems become apparent…
I just keep missing the point. When will a developer turn himself into another geek instead of aggressively making empty recommendations, personal assaults, contradictory statements, idiosyncratic behaviour, etc. etc. etc.? It just gives everyone a nice ego massage who’s within the same league or within the same (mostly load American) company protectionism, without virtually, or should I say, in abstract terms, literally and absolutely nothing about what the matter really is about, besides some dull bureaucratic, money wasting, printing no man’s place. Sorry, I’m losing every standard milestone of this universal dullness and empty-mindedness. Wish you luck.
Hear the silent masses of specialists, who usually shut up when they recognize a common or shared interest and refuse to explain anything of the subject matter. What a waste.
Paul, thanks for making the effort to post this. Your valuable insights will not be wasted on me. — Cheers
Paul,
That’s one of the best quick top-N lists of perf I’ve ever read.
There’s one thing that should be added to the list however: disk seek time.
You state: "After much hair-pulling, I discovered that the profiling method that I had chosen effectively filtered out the time the system spent doing things like faulting memory pages in and flushing dirty pages out to disk." Well, the same thing goes for disk access: the OS will typically schedule some other process during the time when your task is waiting for disk, and so many profilers won’t record that time.
Over the last 10-20 years, the speed of data transfer has increased greatly, but seek time has not decreased much at all. A database product I was working on made heavy use of B-Trees, where every element was about 4K. To load the database and display just the most basic information a normal user would see, the program would have to read in 50-100 B-Tree entries, which were scattered all over the database file. The biggest perf boost I made was to increase the size of the leaf elements to 32K. Now we read in only 10-20 entries, and that part of the boot time got 5x better. The difference between reading 4K and 32K was marginal, but the difference between 10 seeks and 50 seeks was tremendous.
To give another example: for one test of search speed, we had to read in about 1,000 records and scan for specific text. I changed our search algorithm so that it first sorted those records according to their order ON DISK, and *then* did the search. Immediately, the search became 5x faster.
And to loop back to my original point: in these disk tests, the profiler did not highlight disk access as being a hotspot. It did show a fair amount of time being spent in an OS process that had to do with I/O, but there was no indication which process had initiated the I/O that the system was waiting on.
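A minimal sketch of the sort-by-disk-order idea in Python, assuming fixed-size records and a hypothetical in-memory index that maps each record key to its byte offset in the file (neither detail comes from the comment). Reading in ascending offset order turns scattered seeks into a single forward pass; the search can then run over the results in whatever order it likes:

```python
import os
import tempfile


def read_in_disk_order(path: str, offsets: dict[str, int], size: int) -> dict[str, bytes]:
    """Read each record at offsets[key], visiting records in ascending file-offset order."""
    results: dict[str, bytes] = {}
    with open(path, "rb") as f:
        # Sort by on-disk position so the reads sweep forward through the file.
        for key, offset in sorted(offsets.items(), key=lambda item: item[1]):
            f.seek(offset)
            results[key] = f.read(size)
    return results


if __name__ == "__main__":
    # Build a small demo file of four 8-byte records: b"rec0....", b"rec1....", etc.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        for i in range(4):
            tmp.write(f"rec{i}".encode().ljust(8, b"."))
    try:
        # Hypothetical index: keys in the order the search wants them,
        # mapped to where each record happens to live on disk.
        index = {"c": 24, "a": 0, "d": 8, "b": 16}
        records = read_in_disk_order(tmp.name, index, 8)
        print(records["c"])  # b'rec3....'
    finally:
        os.remove(tmp.name)
```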
Jorg, definitely true! I would rank disk seek time right up there with memory usage. In fact, often times the two go together!
So True, I agree especially with #9 and #8.
In my last application I was getting major performance issues, and after redesigning and re-coding numerous times, I discovered that it was because I was using too much memory. I had to make a plan on how to improve this, and after about 30 minutes on the machine, changing code here and there, my performance shot up dramatically.
I still was not as fast as I would have liked and finally found that it had to do with a component my application used. Naturally, I blamed the component and the developers and just let the app go on until management, QA and my sense of pride told me something had to be done. I got the code for this component and after about 4 days I had found the bottleneck and the application was better for it. Now everyone who uses this component is happy with the performance gain their applications automatically got.
Basically what I learnt during the course of this project is that if it affects you, then it is your problem and you must do whatever is necessary to solve it.
This was an excellent article BTW
Excellent post Paul. I have referred so many people to this post when I talk to them about perf testing. It’s an eye opener for some people.
An excellent read.