Vladimir Vukićević — Words
 



The recent posts about TraceMonkey and JavaScript performance have all focused on x86, because, unsurprisingly, the majority of the web's desktop users are on x86 platforms.  However, mobile and handheld platforms will quickly become consumers of the full web, and core performance gains there will often yield much more significant user-perceptible improvements.  For example, on a desktop, a 5x speedup from 500ms to 100ms for a particular action results in "hey, that feels snappier".  On lower power devices, if the same testcase originally takes 5s, speeding it up to 1s turns an action that wasn't usable into something that is.

Over the past few weeks, I've spent some time getting nanojit working on ARM.  There were two pieces of this work: the first was adding support for emulated floating point, for use on devices that do not have a floating point unit.  This work is portable to any other platform without hardware floating point; it simply translates all floating point instructions within nanojit into appropriate function calls.  The other piece was adding support to the nanojit ARM backend for the VFP (vector floating point) unit that's present in most recent ARM cores, and emitting native VFP code.  The current speedup gains are in many cases quite similar to what we see on x86, though there is still much more ARM-specific work to be done to generate the most efficient code possible.
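To make the emulated floating point idea a bit more concrete, here is a minimal Python sketch of that kind of translation over a toy instruction list. The real work lives inside nanojit's C++ code; the opcode names (`fadd`, `fmul`, ...) and helper names (`softfloat_add`, ...) below are hypothetical and are not nanojit's actual instructions or functions.

```python
# Toy lowering pass: rewrite floating point instructions into calls to
# software helper routines. All opcode and helper names are hypothetical.
from typing import List, NamedTuple, Tuple


class Ins(NamedTuple):
    op: str
    args: Tuple[str, ...]


# Assumed mapping from a floating point opcode to the helper that emulates it.
SOFT_FLOAT_HELPERS = {
    "fadd": "softfloat_add",
    "fsub": "softfloat_sub",
    "fmul": "softfloat_mul",
    "fdiv": "softfloat_div",
}


def lower_float_ops(trace: List[Ins]) -> List[Ins]:
    """Replace each floating point instruction with a call to its helper."""
    lowered = []
    for ins in trace:
        helper = SOFT_FLOAT_HELPERS.get(ins.op)
        if helper is None:
            # Not a floating point instruction: pass it through untouched.
            lowered.append(ins)
        else:
            # Emit a call whose first argument names the helper routine.
            lowered.append(Ins("call", (helper,) + ins.args))
    return lowered


trace = [Ins("ld", ("a",)), Ins("fadd", ("a", "b")), Ins("add", ("i", "1"))]
lowered = lower_float_ops(trace)
for ins in lowered:
    print(ins.op, *ins.args)
```

Because a pass like this only touches floating point instructions, the rest of the pipeline does not need to know whether the target has an FPU, which is what makes the approach portable.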

Let's look at the current speedup state.  Here are a few microbenchmarks from the SunSpider suite, testing a few core JS operations.  All the numbers are the speedup factor over current SpiderMonkey with tracing disabled (i.e., "5" means "5x as fast as no-tracing SpiderMonkey").
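For clarity on how to read these numbers, the speedup factor is just the untraced time divided by the traced time. A small Python sketch follows; the function name is made up for illustration, and the timings are the illustrative figures from the start of this post, not measured results.

```python
def speedup_factor(baseline_ms: float, traced_ms: float) -> float:
    """Speedup of the traced run over the non-tracing baseline."""
    if traced_ms <= 0:
        raise ValueError("traced_ms must be positive")
    return baseline_ms / traced_ms


# Illustrative timings only (the desktop and low-power examples above).
print(speedup_factor(500, 100))    # 5.0
print(speedup_factor(5000, 1000))  # 5.0
```

A value of 1 means no change, and anything below 1 means tracing made that test slower.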

Next up are the individual results of the SunSpider benchmarks.

The large speedups are on things that TraceMonkey currently handles well, where most, if not all, of the benchmark is successfully traced.  The tail of tests that don't show any performance improvement is largely due to missing TraceMonkey features, which lead to a trace abort: the point at which the tracing infrastructure needs to go back to the interpreter because of an operation that it doesn't know how to express.  One notable exception is the crypto-md5 test: the trace succeeds, but it's so large that the cost of executing the CSE (common subexpression elimination) optimization pass dwarfs any performance gains that happen on trace.  Hackers are on the case!
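The trace abort idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how the TraceMonkey recorder is actually written: the operation names and the `TRACEABLE_OPS` set are hypothetical, and the real recorder handles far more cases.

```python
from typing import List, Optional

# Hypothetical set of operations this toy recorder knows how to express.
TRACEABLE_OPS = {"load", "store", "add", "mul", "cmp"}


def record_trace(ops: List[str]) -> Optional[List[str]]:
    """Return the recorded trace, or None if recording had to abort."""
    recorded = []
    for op in ops:
        if op not in TRACEABLE_OPS:
            # Trace abort: give up on this trace entirely.
            return None
        recorded.append(op)
    return recorded


def run_loop(ops: List[str]) -> str:
    """Report which path a loop body would take in this toy model."""
    trace = record_trace(ops)
    return "traced" if trace is not None else "interpreted"


print(run_loop(["load", "add", "store"]))     # traced
print(run_loop(["load", "regexp", "store"]))  # interpreted
```

In this model a single unsupported operation is enough to send the whole loop back to the interpreter, which is why tests that hit a missing feature show no speedup at all.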

It's important to note that, much like on x86, these are still the early days of the performance wins that are possible.  Core improvements in tracing will benefit x86, x86-64, and ARM alike, the three currently supported nanojit backends (anyone interested in doing a Sparc and/or PPC backend?), and there's still lots of work being done on nanojit itself.  The result of all this work will be a richer web experience on mobile and embedded devices, allowing those users to take advantage of modern web applications that do much of their work in the browser instead of on the server.  Mobile users should be able to try out the JIT in the next alpha release of Fennec by enabling a config setting, as users of our desktop Firefox nightly builds can do today.

This work was largely done on a BeagleBoard, which, as I mentioned earlier, is a great little device for any ARM work, or as a speedy little computer for multimedia/car PC/whatever else purposes.  Chris Blizzard just convinced me to do a separate blog post about the beagle, including all the bits and pieces that I needed to get things to work so that he can replicate my setup, so I'll talk about that separately soon!


9 Comments to “TraceMonkey: Coming To A Pocket Near You”  

  1. David

    What does the X scale represent on the graphs?

  2. vladimir

    Speedups over Mozilla’s current JS engine with tracing disabled; that is, a “5” means “5x faster than the current JS engine without tracing.”

  3. DAvid

    Wow, that’s a lot faster. Nice work. What were the total scores, in seconds, for both in SunSpider?

  4. Thales

    Vladimir,

    Is x86-64 supported by nanojit or not? In the Mozillazine forums people say that it is not, and that it will not be out soon…

  5. RyanVM

    Nanojit is supported for x86-64. Firefox doesn’t officially support x86-64 builds, however.

  6. suihkulokki

    Quite impressive :) Does TraceMonkey detect the presence of VFP at runtime, or does one need to compile different versions of Fennec for VFP/softfloat systems?

  7. Tobu

    There are now official x86_64 trunk builds, which is enough for me.

    I’ve compared Dromaeo results, it seems x86_64 doesn’t really use tracemonkey:

    http://dromaeo.com/?id=42043,42046

    (Clean profile, cautiously doing a restart for the change to javascript.options.jit.content to take effect)

  8. Mark

    This is great, but when was Javascript performance ever the bottleneck? How many people raytrace, crunch fractals or do 3d in Javascript on their mobile devices? Sure, this helps, but it’s avoiding the main problem and perception of slowness in browsers for the vast majority of users: rendering speed. It’s not how fast the javascript is that’s important, the majority of time is taken up reflowing, redrawing and composing the page after the Javascript has modified the DOM.
    And the much promised speedups in Cairo don’t seem to be materializing.

  1. Christopher Blizzard » Blog Archive » mobile / arm tracemonkey first numbers

