Over the past few weeks, I’ve spent some time getting nanojit working on ARM. There were two pieces of this work: the first was adding support for emulated floating point, for use on devices that do not have a floating point unit. This work is portable to any other platform without hardware floating point; it simply translates all floating point instructions within nanojit into appropriate function calls. The other piece was adding support to the nanojit ARM backend for the VFP (vector floating point) unit that’s present in most recent ARM cores, and emitting native VFP code. The current speedup gains are in many cases quite similar to what we see on x86, though there is still much more ARM-specific work to be done to generate the most efficient code possible.
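The soft-float piece can be pictured as a small rewrite pass over the compiler's instruction stream: every floating-point instruction is replaced by a call to a helper routine that does the same work in software, so the backend never has to emit an FP opcode. Here's a minimal sketch of that idea in Python — the instruction tuples and helper table are hypothetical illustrations, not nanojit's actual IR:

```python
# Hypothetical software helpers, one per floating-point operation.
# On a real softfloat target these would be library routines that
# implement IEEE 754 arithmetic with integer instructions.
SOFTFLOAT_HELPERS = {
    "fadd": lambda a, b: a + b,
    "fsub": lambda a, b: a - b,
    "fmul": lambda a, b: a * b,
    "fdiv": lambda a, b: a / b,
}

def lower_softfloat(instructions):
    """Rewrite float instructions into plain call instructions.

    instructions is a list of (op, arg1, arg2) tuples; any float op
    becomes ("call", helper, args), which every backend can emit.
    """
    out = []
    for op, a, b in instructions:
        if op in SOFTFLOAT_HELPERS:
            out.append(("call", SOFTFLOAT_HELPERS[op], (a, b)))
        else:
            out.append((op, a, b))  # non-float ops pass through untouched
    return out

# A tiny "trace" containing one float multiply:
lowered = lower_softfloat([("fmul", 3.0, 4.0)])
op, helper, args = lowered[0]
assert op == "call" and helper(*args) == 12.0
```

Because the rewrite happens inside nanojit, before any machine code is chosen, the same pass works unchanged on any backend that lacks hardware floating point.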
Let’s look at the current state of the speedups. Here are a few microbenchmarks from the SunSpider suite, testing a few core JS operations. All the numbers are the speedup factor over current SpiderMonkey with tracing disabled (i.e., “5” means “5x as fast as no-tracing SpiderMonkey”).
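To be precise about what that factor means, it's the ratio of running times, so a higher number is better — a sketch of the arithmetic:

```python
def speedup(no_tracing_ms, tracing_ms):
    """Speedup factor of the traced run over the no-tracing baseline.

    A factor of 5 means the traced run finishes in one fifth the time.
    """
    return no_tracing_ms / tracing_ms

# e.g. a benchmark that took 500ms without tracing and 100ms with it:
assert speedup(500.0, 100.0) == 5.0
```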
Next up are the individual results of the SunSpider benchmarks.
The large speedups are on things that TraceMonkey can currently handle well, where most, if not all, of the benchmark is successfully traced. The tail of tests that don’t show any performance improvement is largely due to missing TraceMonkey features, leading to a trace abort — the point at which the tracing infrastructure has to fall back to the interpreter because it hit an operation that it doesn’t know how to express. One notable exception is the crypto-md5 test — the trace succeeds, but it’s so large that the cost of running the CSE (common subexpression elimination) optimization pass dwarfs any performance gains from executing on trace. Hackers are on the case!
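To see why a very large trace makes CSE expensive, note that the pass has to check every instruction against everything seen before it, so its cost grows with trace length even when lookups are cheap. A toy version of the idea — not nanojit's actual implementation, and the tuple-based IR here is purely illustrative:

```python
def cse(instructions):
    """Common-subexpression elimination over a straight-line trace.

    instructions: list of (op, operands...) tuples, where an integer
    operand is the index of an earlier instruction's result.
    Returns the deduplicated list and a remap of old -> new indices.
    """
    seen = {}    # (op, operands) -> index of the canonical copy
    out = []
    remap = {}
    for i, (op, *args) in enumerate(instructions):
        # Point operands at the canonical copies found so far.
        args = tuple(remap.get(a, a) for a in args)
        key = (op, args)
        if key in seen:
            remap[i] = seen[key]          # reuse the earlier result
        else:
            seen[key] = remap[i] = len(out)
            out.append((op, *args))
    return out, remap

# (a+b) computed twice collapses to a single add:
trace = [("load", "a"), ("load", "b"), ("add", 0, 1), ("add", 0, 1)]
deduped, _ = cse(trace)
assert deduped == [("load", "a"), ("load", "b"), ("add", 0, 1)]
```

Even with a hash table making each lookup fast, the pass still visits every instruction of the trace, and a crypto-md5-sized trace has a great many of them.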
It’s important to note that, much like on x86, these are still the early days of the performance wins that are possible. Core improvements in tracing will benefit all three currently supported nanojit backends — x86, x86-64, and ARM (anyone interested in writing a SPARC and/or PPC backend?) — and there’s still lots of work being done on nanojit itself. The result of all this work will be a richer web experience on mobile and embedded devices, letting those users take advantage of modern web applications that do much of their work in the browser instead of on the server side. Mobile users should be able to try out the JIT in the next alpha release of Fennec by enabling a config setting, just as users of our desktop Firefox nightly builds can do today.
This work was largely done on a BeagleBoard, which, as I mentioned earlier, is a great little device for any ARM work, or as a speedy little computer for multimedia/car PC/whatever else purposes. Chris Blizzard just convinced me to write a separate blog post about the BeagleBoard — including all the bits and pieces I needed to get things working, so that he can replicate my setup — so I’ll talk about that separately soon!