Shortly before our office move, we kicked off an effort to take a hard look at our startup time, both to understand everything we do during startup and to figure out how to improve it. zpao (Paul O’Shannessy), ddahl (David Dahl), and I have been working towards a few goals:
- Document how to reproducibly get a cold and a warm startup on Windows (XP/Vista/7), Mac OS X, and Linux
- Create tools to capture both JS execution and file IO during startup
- Add instrumentation to Firefox to identify “big blocks” of startup for timing
- Create tools to visualize the captured data in a way that’s easy to analyze
One thing that’s fairly obvious when playing with startup is that “warm” startup is significantly faster than “cold” startup; that is, when you’ve launched Firefox before, the OS has cached much of the data it read from disk and doesn’t have to hit the disk again. This points directly to IO being a major component of our startup time, which is why IO is part of the capture above. This is a pretty big problem even on desktop systems; on my fairly beefy Windows 7 box, a cold startup takes upwards of 12 seconds (!), and warm startup is also fairly slow if the system is under load.
We’ve fixed some bugs in our DTrace JavaScript provider along the way (bug 403345), so DTrace will actually give correct (and sane) data now. Also, I’ve been doing a lot of work with Microsoft’s xperf (part of the Windows Performance Toolkit), which can capture much the same data. (In theory we should be able to create JS providers for xperf as well, but that’s out of scope for this particular project.)
One example of the type of data we’re capturing and the tools we’re building is available here: this is just a quick IO capture with xperf, with the data dumped into a Timeline widget from the SIMILE project. (The time scales are a bit off: the raw data is in microseconds, but SIMILE only handles milliseconds, so every time shown needs to be divided by 1000. This becomes a problem once the timeline shows more than 60 seconds, which is actually just 60 ms! It’s something that we’ll fix.)
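To make the unit fix concrete, here is a minimal Python sketch of rescaling captured timestamps from microseconds to milliseconds before handing them to a timeline. The event layout and the key names (`file`, `start_us`, `end_us`) are assumptions made up for illustration; they are not the actual xperf output or the SIMILE input format.

```python
US_PER_MS = 1000  # microseconds per millisecond


def us_to_ms(timestamp_us):
    """Convert a microsecond timestamp to (fractional) milliseconds."""
    return timestamp_us / US_PER_MS


def rescale_events(events):
    """Return new event dicts with start/end converted from us to ms.

    Each input event is a dict with the hypothetical keys
    'file', 'start_us', and 'end_us'.
    """
    return [
        {
            "file": event["file"],
            "start_ms": us_to_ms(event["start_us"]),
            "end_ms": us_to_ms(event["end_us"]),
        }
        for event in events
    ]
```

With this, a raw value of 60,000 comes out as 60 ms rather than being read as 60,000 ms (60 seconds).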
Another example is the result of a startup trace. zpao is still working on the visualization and data capture, but you can see an early version here. The “Exclusive function elapsed times” view will provide the most accurate data, basically telling you “how long did we spend in a given function, ignoring all descendants”. In this view, the “null” filename dominates, which generally indicates native code. Within that, calls to “getService” dominate, so much of the time is spent inside getService itself, presumably initializing whatever the requested service is.
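As a rough illustration of the difference between inclusive and exclusive times, the following Python sketch computes both from a list of function enter/exit events. The `(kind, name, timestamp)` tuple format is hypothetical and is not what DTrace or our tools actually emit, and the sketch assumes the events are in timestamp order and properly nested.

```python
from collections import defaultdict


def function_times(events):
    """Compute inclusive and exclusive elapsed time per function.

    events: iterable of (kind, name, timestamp) tuples, where kind is
    "enter" or "exit". This format is an assumption for illustration.
    Returns (inclusive, exclusive) dicts mapping name -> total time.
    Note: for recursive calls, inclusive time counts nested calls again.
    """
    inclusive = defaultdict(int)
    exclusive = defaultdict(int)
    stack = []  # each frame: [name, enter_time, time_spent_in_children]

    for kind, name, ts in events:
        if kind == "enter":
            stack.append([name, ts, 0])
        elif kind == "exit":
            if not stack or stack[-1][0] != name:
                raise ValueError(f"unbalanced exit for {name!r} at {ts}")
            _, start, child_time = stack.pop()
            elapsed = ts - start
            inclusive[name] += elapsed
            # Exclusive time ignores everything spent in descendants.
            exclusive[name] += elapsed - child_time
            if stack:
                # Charge this call's full duration to the parent's children.
                stack[-1][2] += elapsed
        else:
            raise ValueError(f"unknown event kind: {kind!r}")

    return dict(inclusive), dict(exclusive)


trace = [
    ("enter", "main", 0),
    ("enter", "getService", 10),
    ("exit", "getService", 40),
    ("exit", "main", 50),
]
inclusive, exclusive = function_times(trace)
```

For this toy trace, `main` has an inclusive time of 50 but an exclusive time of only 20, while `getService` has 30 for both, which is the kind of pattern that makes the exclusive view useful for finding where the time really goes.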
In the future, we hope to have hierarchy correctly represented in the inclusive view, and to add IO operations as part of that hierarchy. Also, these tools aren’t really limited to analyzing startup; they will hopefully form the basis of a set of JavaScript performance analysis tools that we can apply to any browser operation.
Besides IO and JS, Taras Glek found in earlier examinations of startup that loading CSS/XBL/etc. was taking a significant amount of time. We’re working on instrumenting those parts of the code as well, so that we can capture them along with the raw JS/IO/etc. portions.
Is there any other data that we should be capturing? Let us know, and we’ll see if we can figure out how to add it in. I’ll keep posting updated data as we have it, and will probably create a web page to collect it all — at that point it’ll be open season on any issues that can be identified.
Please direct comments and followups to the dev.apps.firefox post.