Archive Page 2
From a transcript of the SXSW browser panel:
Chris [Wilson]: The thing that I'm really passionate about is web standards. Silverlight is a part of the web just like PDF and plaintext are part of the web.
Uh, it's what now? (Read the rest of the panel transcript, it's pretty good.)
Some of you may have noticed that the Firefox 3 nightly builds have felt a lot snappier since a few weeks ago. There's an interesting story in that, one that I finally have time to write up. We've had a number of bugs on the Mac where people were complaining of bad performance compared to Firefox 2, usually involving a test where a page was scrolled by a small step 100 or so times, and the time from start to finish was recorded. In many of these tests, Fx3 was coming in at 50% to 500%+ slower. This was odd, because in theory the graphics layer (which is what scrolling is mostly exercising) in Firefox 3 should be faster, given that it's talking almost directly to Quartz.
I noticed a few odd things -- if I disabled drawing of the native theme (used for the scrollbar), and if I tested just a tall page with a solid colored background and a scrollbar, I saw the same performance issues. There's a problem here, one that I initially blamed on the native theme drawing, because if I disabled native theme rendering, benchmark performance shot up, but there were still some inconsistencies. I collected some numbers: for a given test, the minimum time that it can report is the number of iterations multiplied by the setTimeout delay used. For example, with 200 iterations and a 50ms setTimeout, there's no way that test can finish faster than 10,000ms. The difference between that time and the time the test actually reports is the time spent doing interesting things. By running my test for different timing values, I ended up with this graph:
The horizontal axis is the setTimeout value, and the vertical axis is in milliseconds. That plateau there? That's not supposed to be there. Looking more closely at that number, it's roughly 6500ms. With 200 iterations, that's 6500/200 = ~32.5ms per iteration. So, per second, that gives us 1000/32.5 ~= 30.77. We're capping at 30.77 frames per second here. That's way too round of a number. Even more suspicious, if I disabled native theme rendering, the test hit a plateau at around 3300ms, which ends up being 60fps.
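The arithmetic above is easy to check by hand, but here is a minimal sketch of it in Python. The function names are made up for illustration; the only inputs are the numbers already quoted in the post (200 iterations, a 50ms timeout, and the two plateau values).

```python
def timer_floor_ms(iterations, timeout_ms):
    """Fastest possible run: every iteration has to wait out one full timeout."""
    return iterations * timeout_ms


def capped_fps(plateau_ms, iterations):
    """Frame rate implied by a plateau in the overhead time."""
    per_iteration_ms = plateau_ms / iterations
    return 1000.0 / per_iteration_ms


print(timer_floor_ms(200, 50))           # 10000
print(round(capped_fps(6500, 200), 2))   # 30.77 (native theme on)
print(round(capped_fps(3300, 200), 2))   # 60.61 (native theme off)
```

The two plateaus land almost exactly on 30 and 60 frames per second, which is what makes them look like a refresh-rate cap rather than real work.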
It's at this point that I start to get an idea of what's going on. I was aware of Beam Sync on OSX, but assumed that each app had to opt-in to it, given that it didn't seem to affect Firefox 2. Quartz Debug lets you disable Beam Sync on a global basis; I did that, and the benchmark numbers dropped -- the line above kept nicely following the blue line down, and I was able to peg the fps needle in Quartz Debug over to the right. So, we're being throttled by the OS which is forcing us to wait for the next frame interval before allowing us to draw again. This is a pretty serious problem, because at this point I thought that the only way to disable this was on a system-wide basis, which wouldn't be acceptable. Firefox 2 didn't suffer from this, though, so I did some more digging.
Eventually I came to this tech note from Apple. The reason why Firefox 2 wasn't affected was that Fx2 is not a Cocoa app; it is a Carbon app, and as such was exempt from coalesced updates. The key thing showed up in the "last resort" section of that tech note: how to disable coalesced updates for an individual app! This seems to be available only on 10.4.4 or later, but that was fine; OS minor updates are free. I verified that adding the plist entry fixed the problem for me locally, and checked this in to become part of the build. See if you can spot when this change hit our performance testing infrastructure:
Not bad for two lines of XML.
While figuring all this out, I noticed that Safari/WebKit didn't seem to be affected by this framerate cap -- the fps meter when Safari was running the same benchmark happily went up beyond 60fps. After I found the plist entry, I checked Safari's plist and was surprised to discover that they didn't have this disabling in there. Doing some more searching, I found this code in WebKit. Apparently, there is a way to do this programmatically, along with some other interesting things like enabling window update display throttling (though it's unclear what that means!) -- but only if you're Apple.
All these WK* methods are undocumented, and they appear in binary blobs shipped along with the WebKit source (see the WebKitLibraries directory). There are now over 100 private "OS-secrets-only-WebKit-knows" in the library, many of which are referred to in a mostly comment-free header file. Reading the WebKit code is pretty interesting; there are all sorts of potentially useful Cocoa internals bits you can pick up, more easily on the Objective C side (e.g. search for "AppKitSecretsIKnow" in the code), but also in other areas as a pile of these WK* methods used in quite a few places. Would any other apps like to take advantage of some of that functionality? I'm pretty sure the answer there is yes, but they can't. It's not even clear under what license libWebKitSystemInterface is provided, so that other apps can know if they can link to it.
Despite my frustrations with Linux, this type of hiding isn't really possible in a real open source environment. In the end, I really do hope that Linux can rise to the technical challenge and compete in desktop performance and features, but it's not there yet. However, I'm glad that there was a workaround for this issue for us on OSX, because the performance benefits are huge -- Firefox 3 on the Mac (everywhere, really) is going to be a kick-ass release!
Edit: Slashdot seems to have picked up on this, and in typical style, has completely misunderstood the post. To be clear, I do not think that Apple is in any way trying to purposely "cripple" non-Apple software. I also do not think that undocumented APIs give Safari any kind of "significant performance advantage" (as Firefox 3 should show!). However, as I said, the undocumented functionality could be useful for Firefox and other apps to implement things in a simpler (and potentially more efficient) manner. I don't think this is malicious; it's just an unfortunate cutting of corners that is way too easy for a company that's not fully open to do.
I've recently rediscovered Greasemonkey, and have used it to make two small fixes that have been annoying me for a while. A few people have asked, so I thought I'd share the user scripts. The first does some postprocessing of the tinderbox waterfall page, shrinking the Talos columns by making them use more of the large amounts of vertical space. Here's the script, and a picture of what the columns look like before the script is applied, and after.
The second script fixes something that's been annoying me for a while about the Mozilla development infrastructure --- all of our servers have an identical favicon, which means it's impossible to differentiate between different sites (or, most commonly, to find the tinderbox tab) when you have many tabs open. I blogged about this a while back, before the crash of my blogging machine; I received some good feedback, but my requests to get the icons changed on the actual sites didn't go anywhere. So, this script forces favicons for a couple of key mozilla sites, including tinderbox, bugzilla, wiki.m.o, and developer.m.o. The icons come from the famfamfam Silk set and a bit of hackery on my end using Pixelmator. Here's an image of what the result looks like:
So it turns out, now that Alice got the awesome nochrome tinderboxes up and running, that the Mac, on raw HTML layout, is about as fast as Windows. However, on win32 and linux, the perf hit of drawing the full browser UI (the nochrome boxes just render the HTML in a plain window with no widgetry) is on the order of 5%-10%. On the Mac, it's around 300%. Clearly there is something very wrong going on there.
After some sniffing around, using a very reduced testcase (that I was working with for another performance issue) I came across the Mac native theme code. Disabling native theme entirely (commenting out all the calls) gave back all the performance. So I start digging into it. I collect a bunch of really weird data --- such as, if I replace a call to HIThemeDrawTrack with a loop calling HIThemeDrawTrack 5, 10, 50 times... the overall time spent in the benchmark doesn't change from 1 time. At 100 times though, it changes drastically. The delta between benchmark time and raw time spent in NSChildView's drawRect should remain mostly constant with the above changes, one would think. But it doesn't. Going from no native theme rendering to turning it on increases the "unknown" time significantly.
I'm still spending time with Shark (which is, I would argue, Apple's best developer tool -- maybe one of the best developer tools, ever), but the latest is completely baffling and what prompted this post. I finally got around to doing a shark System Trace, which traces at the user/kernel boundary, recording syscalls, preemptions, etc. In the syscall log, I noticed a bunch of calls to geteuid, getuid, etc., all being repeated. Looking at the call stack...
... you guessed it. HIThemeDrawTrack makes syscalls. For every single call, it calls geteuid, geteuid, getuid, geteuid, stat64, stat64. That's 4 geteuid/getuids and 2 stats. For every single call to the theme function. This is ludicrous. The stack traces indicate that this is because this function seems to pull out user preferences on each call, and then has to check if the preferences should change because the euid may have changed (?!?)... the getuid call is coming from _CFUserName. Because, you know, drawing a UI element needs to know the user name. It's probably also allocating the CFString to hold it.
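To put that per-call cost in perspective, here is a rough sketch in Python that tallies the syscall sequence listed above and scales it by a number of draw calls. The sequence is the one from the trace; the function name and the count of 200 draw calls are assumptions for illustration, not measured values.

```python
from collections import Counter

# Syscall sequence reported above for a single HIThemeDrawTrack call.
SYSCALLS_PER_CALL = ["geteuid", "geteuid", "getuid", "geteuid", "stat64", "stat64"]


def syscalls_for(draw_calls):
    """Total syscalls by name for a given number of theme draw calls."""
    per_call = Counter(SYSCALLS_PER_CALL)
    return {name: count * draw_calls for name, count in per_call.items()}


print(syscalls_for(1))    # {'geteuid': 3, 'getuid': 1, 'stat64': 2}
print(syscalls_for(200))  # hypothetical: one scrollbar draw per step of a 200-step scroll test
```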
I wish this was a joke, but it's not. I'm now looking into how to draw this crap faster; the NSCell API may be a better way to go -- we'll see if it has this same damage.
I just finished Uncharted: Drake's Fortune for the PS3. Overall, a very well done game; felt a lot like the Tomb Raider series, but with more shooting and fewer puzzles. I tend to prefer the opposite mix, but with the difficulty set to Easy, the shooting bits weren't annoying. The story is pretty good, and the voice acting is the best that I've seen in a game in a long time. The characters sound and act believable, which is impressive. Great graphics, no framerate issues, and I never had to load from a saved game -- the checkpoints are well-spaced and if you die, restarting from the checkpoint is instant. If you're looking for a game to spend an hour with now and then (there are 21 "chapters" that flow into each other, each one taking probably an hour or so to complete), it is definitely worth a look. Not a whole lot to say, really.
You're probably not going to see me do too many of these types of posts, mainly because I hardly ever finish games, but I figured I'd start blogging about the games that I've played and do a mini review of sorts.