Vladimir Vukićević — Words
 




Android Progress: March 31 Edition

Wow, it’s great to see how excited people are about getting Firefox running on their Android phones! We’ve made a bunch of progress in the past few weeks, and we’ve really ramped up development in the past few days, including bringing new folks onto the project.

We’ve done a bunch of stuff recently; Michael Wu has done some great work fixing issues with the soft keyboard support and cleaning up rotation behavior. He’s also done some more fun things like hooking up the accelerometer to the MozOrientation event. We’ve also experimented with a few different rendering approaches, moving from software rendering using private APIs, to using OpenGL, and then moving back to software, but without private APIs this time. (Turns out, for what we’re currently doing, non-GL rendering is crazyfast, though that will change once we get our hardware accelerated rendering system going on Android.)

There are still some bugs to fix before we’re comfortable letting people download nightly builds — for example, current and older development builds can lock up your phone, requiring killing the process from a debug terminal or rebooting. Those are the bugs that we’re spending all of our time on; we don’t want to have any kind of builds available until we think that our testers will have an acceptable experience (and locking up your phone isn’t acceptable!).

Having nightly builds available and getting some feedback is an important step before we can even consider releasing an alpha version. I’m also working (right now, in fact, while writing this post — waiting for a build to finish!) on getting Weave ready to be used with our first nightlies, because having your Firefox history and bookmarks synced onto your phone is pretty fantastic.

Here’s a quick video I took yesterday of using Fennec on a Nexus One:

Last but not least, we’re getting very close to having all the patches and changes we have in flight (some in various repos, some just sitting on our computers) checked in to mozilla-central, so the Android work will be on the same platform that’s been seeing some fantastic improvements recently. Remember that about three months ago we had nothing working or even building on Android; now it’s getting ready to land on the trunk!

Three Fennecs, All in a Row

After wrestling with OpenGL on Android for a bit, I was finally able to get Gecko on Android rendering using OpenGL. This was needed both to simplify the build process, removing the need to have private Android headers and libraries available, and to remove an expensive CPU RGB->BGR byte swap. Michael Wu’s also done a pile of work, including the all-important keyboard hookup so that you can, you know, type in some URLs or search terms. (Handy in a web browser.) Here’s a little family portrait of Fennec running on a Nexus One, a Motorola Droid, and a display attached to an NVIDIA Tegra 250 devkit.

We’re still working on getting the basic blocks in place to where it’s “dogfoodable”, that is, usable by the developers.  The good news is that while the builds are already pretty fast, we’re seeing that we have a lot of headroom for performance… especially for visual things like rendering and panning.  Most of the work we’ve done so far has just been quick work to unblock getting the basic port running; I’m looking forward to being able to dig deeper into a bunch of these issues!

Things I Learned Today: Android OpenGL Edition

One of the issues with the Gecko port to Android is that, early on, I used some internals to tie in to the Android graphics system from native code.  This worked fine, but it complicated the build: you needed to pull in a bunch of headers and some libraries from the actual Android source to be able to complete a build.

The solution for this was pretty easy: move to OpenGL for rendering. However, there are some interesting quirks here. I’m targeting Android 2.x only: specifically the Motorola Droid, HTC Nexus One, and an NVIDIA Tegra 250 devkit I have here. For this initial step, all I need is to draw a textured quad. We’ve got full OpenGL compositing, rendering, fancy video decoding and all that stuff coming later, but for now we’re just hooking into our software rasterizer, uploading the result as a texture, and drawing a textured quad. Easy, right? Here are some random issues I ran into while doing this over the past day or two.

First Attempt: OpenGL ES 1.1

Well, there are two wrinkles. First, Cairo’s software rasterizer uses a 32-bit ARGB pixel format and layout. In little-endian per-byte terms, that’s B G R A. OpenGL ES supports R G B A. There is an EXT_bgra extension that adds support for GL_BGRA as another byte format, and this extension is one that’s potentially available on GL ES. The second wrinkle is that this quad is display-sized, so the texture is display-sized; it’s not going to have power-of-two dimensions. While OpenGL ES 2.0 supports non-power-of-two textures in the base specification (with some limitations, which are not relevant for my use case), ES 1.1 does not, and I figured that, given that all I was drawing was a textured quad, I may as well use ES 1.1.
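To make the byte-order point concrete, here is a small sketch in JavaScript (chosen only because it is the language used for code elsewhere on this page; the port itself is native code). It packs one pixel as a 32-bit ARGB word and reads back the bytes in little-endian memory order. The helper names packARGB and littleEndianBytes are made up for this illustration.

```javascript
// Pack one pixel as a 32-bit ARGB word (alpha in the top byte), the layout
// the post attributes to Cairo's software rasterizer.
function packARGB(a, r, g, b) {
  return ((a << 24) | (r << 16) | (g << 8) | b) >>> 0;
}

// Return the four bytes of that word as they sit in little-endian memory.
function littleEndianBytes(word) {
  const view = new DataView(new ArrayBuffer(4));
  view.setUint32(0, word, true); // true = write little-endian
  return [view.getUint8(0), view.getUint8(1), view.getUint8(2), view.getUint8(3)];
}

const pixel = packARGB(0xff, 0x11, 0x22, 0x33); // A=ff R=11 G=22 B=33
const bytes = littleEndianBytes(pixel);
// bytes is [0x33, 0x22, 0x11, 0xff]: B, G, R, A in memory order
```

An uploader that expects R, G, B, A bytes would therefore see red and blue exchanged unless the data is swapped or a BGRA format is available.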

Unfortunately, the three devices I mentioned above support different combinations of these. The NVIDIA device supports both EXT_bgra and ARB_texture_non_power_of_two. This is perfect; no workarounds are needed here, though for some reason it doesn’t like TexSubImage2D with BGRA data, but that’s not a big deal. The Droid (OMAP3, PowerVR SGX) supports EXT_texture_format_BGRA8888 (note: different name, similar functionality), so that’s good, but it doesn’t support non power of two textures with ES 1.1. The Nexus One, on the other hand, supports neither BGRA nor NPOT textures.
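A capability check along these lines could be sketched as follows. This is only an illustration of the logic: it assumes the extension list arrives as the usual space-separated string with GL_-prefixed names, and the three sample strings are invented, not dumps from the actual devices.

```javascript
// Split a space-separated extension string into a set for exact-name lookups
// (a substring search could match a longer, unrelated extension name).
function parseExtensions(extensionString) {
  return new Set(extensionString.split(/\s+/).filter(Boolean));
}

// Report which of the two features discussed above are advertised.
function textureCapabilities(extensionString) {
  const ext = parseExtensions(extensionString);
  return {
    bgra: ext.has("GL_EXT_bgra") || ext.has("GL_EXT_texture_format_BGRA8888"),
    npot: ext.has("GL_ARB_texture_non_power_of_two"),
  };
}

// Hypothetical extension strings, made up for illustration only.
const bothSupported = textureCapabilities("GL_EXT_bgra GL_ARB_texture_non_power_of_two");
const bgraOnly = textureCapabilities("GL_EXT_texture_format_BGRA8888");
const neither = textureCapabilities("GL_OES_draw_texture");
```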

I was about to start using OES_draw_texture as well, because that seemed like a potentially faster way to get what I want to happen — but the lack of BGRA support on the Nexus One made me turn to ES2, where I can do the RGBA->BGRA swizzle in the fragment shader.
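The swizzle itself is tiny. Below is a sketch of what such a fragment shader might look like, held in a JavaScript string, together with a CPU reference for the same swap. The names uTexture and vTexCoord are placeholders, and this is not the shader from the actual port.

```javascript
// GLSL ES fragment shader source: sample the texture and swap the red and
// blue channels with a swizzle. uTexture and vTexCoord are placeholder names.
const fragmentShaderSource = [
  "precision mediump float;",
  "uniform sampler2D uTexture;",
  "varying vec2 vTexCoord;",
  "void main() {",
  "  gl_FragColor = texture2D(uTexture, vTexCoord).bgra;",
  "}",
].join("\n");

// CPU reference for the same swap: exchanges bytes 0 and 2 of every
// 4-byte pixel in place and returns the same array.
function swapRedBlue(pixels) {
  for (let i = 0; i + 3 < pixels.length; i += 4) {
    const tmp = pixels[i];
    pixels[i] = pixels[i + 2];
    pixels[i + 2] = tmp;
  }
  return pixels;
}
```

Doing the swap in the shader means the B, G, R, A bytes can be uploaded untouched as if they were RGBA, with no per-pixel CPU pass.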

Undefined Symbols in GLESv2 Import Library

More fun! The Android r3 NDK includes GLESv2 support, yay! The bad news is that libGLESv2.so includes an external reference to _ZN7android33egl_get_image_for_current_contextEPv (android::egl_get_image_for_current_context), which means you’ll get linker errors (or at least undefined symbol errors) if you try to link anything that’s not a shared library. Conveniently, that’s what you need to produce with the NDK anyway, but if you have some helper command line tools along the way, they’ll fail. The solution is to add -Wl,--allow-shlib-undefined to your binary compile/link step.

After that, this was fairly straightforward, though the SDK only grudgingly allows you to specify the necessary EGL tokens for GLES2; the code samples in the NDK all just provide explicit integer values for them inside the code.

Choosing an EGLConfig

This applies to both OpenGL ES 1 and OpenGL ES 2 on Android. When creating an EGLSurface for a SurfaceView (take a look at how GLSurfaceView does it for the details), you have to get an EGLConfig that has an exact match for the number of red/green/blue/alpha bits as your surface. There’s a format parameter to surfaceChanged that’s supposed to tell you the format of the surface. However, it seems to always show up as -1, which according to PixelFormat.java, is “OPAQUE”. That’s not very helpful. Reading GLSurfaceView, it can also show up as -2, which is TRANSPARENT. So you have to assume that if you have an OPAQUE surface format, it’ll be 5650 (5 red, 6 green, 5 blue, and 0 alpha bits), and if you have a TRANSPARENT format it’ll be 8888. This is pretty silly, as there are PixelFormat constants for handy things like RGBA_8888, RGB_565, RGB_888, RGBA_5551, etc. Why doesn’t SurfaceView send the actual format down?

The devices that I have seem pretty consistent at least with 565 for OPAQUE, so it works OK, but it’s not pretty, and will likely blow up spectacularly if anyone introduces, say, a large-display Android device that uses 24bpp color.
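The guess described above boils down to a two-entry lookup. Here it is sketched in JavaScript purely to show the logic (real code would live on the Java or native side); guessChannelBits is a hypothetical helper, and the two constants carry the values quoted above from PixelFormat.java.

```javascript
// Values of PixelFormat.OPAQUE and PixelFormat.TRANSPARENT as quoted in the post.
const PIXEL_FORMAT_OPAQUE = -1;
const PIXEL_FORMAT_TRANSPARENT = -2;

// Guess the per-channel bit counts to request from EGL, given the format
// value passed to surfaceChanged. Anything else is rejected loudly rather
// than guessed at.
function guessChannelBits(surfaceFormat) {
  switch (surfaceFormat) {
    case PIXEL_FORMAT_OPAQUE:
      return { red: 5, green: 6, blue: 5, alpha: 0 };
    case PIXEL_FORMAT_TRANSPARENT:
      return { red: 8, green: 8, blue: 8, alpha: 8 };
    default:
      throw new Error("Unhandled surface format: " + surfaceFormat);
  }
}
```

Throwing on unknown values at least makes the failure obvious if a device ever reports a real format such as RGB_888.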

Another config issue is that some GPUs have some odd requirements for getting the most performance; for example, as discovered via searching, the PowerVR SGX in the Droid really wants 24-bit depth, as it’s faster than 0 and 32. The Tegra, on the other hand, doesn’t have 24-bit depth at all, only 0 or 16 (and I don’t think it cares one way or the other). Not sure whether the GPU in the Nexus One cares or has a preference. So, you have to search for a 24-bit depth config first, use it if it’s found, and then try 0 if not found. I suppose an alternate approach might be to search for 16-bit depth, but that might give you 32-bit if that happens to be supported somewhere.
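The search order can be sketched like this. Configs are modelled as plain objects and the two lists are invented for illustration; chooseConfig is a hypothetical helper that matches depth exactly, which is simpler than what a real EGL config query does.

```javascript
// Pick a config whose colour bits match exactly, trying each preferred
// depth size in order (24 first, then 0). Returns null if nothing matches.
function chooseConfig(configs, wanted, depthPreference = [24, 0]) {
  const matching = configs.filter(c =>
    c.red === wanted.red && c.green === wanted.green &&
    c.blue === wanted.blue && c.alpha === wanted.alpha);
  for (const depth of depthPreference) {
    const found = matching.find(c => c.depth === depth);
    if (found) return found;
  }
  return null;
}

// Hypothetical config lists, invented for illustration.
const rgb565 = { red: 5, green: 6, blue: 5, alpha: 0 };
const withDepth24 = [
  { id: 1, ...rgb565, depth: 0 },
  { id: 2, ...rgb565, depth: 16 },
  { id: 3, ...rgb565, depth: 24 },
];
const withoutDepth24 = [
  { id: 4, ...rgb565, depth: 16 },
  { id: 5, ...rgb565, depth: 0 },
];
const pickedWith24 = chooseConfig(withDepth24, rgb565);       // the depth-24 entry
const pickedWithout24 = chooseConfig(withoutDepth24, rgb565); // falls back to depth 0
```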

At the end of all of this though, I have an app that uses OpenGL ES 2 on three different Android devices (with three different GPUs).

A while ago, Rob Arnold wrote a simple python caching proxy server for use with our Talos tests — the idea was that you’d run your test once against the proxy server in “record” mode, and then after that you can use the server for consistent local playback.

I was giving some WebGL demos recently, and needed a way to have all the content from the web-hosted demos locally. As anyone who’s tried to create a local cache of any “Web 2.0” app knows, it’s painful, given all the server requests, XMLHttpRequests, etc. that go on. However, with the proxy server, this was actually ridiculously easy.

You can grab the proxy server here — it still lives in Mozilla CVS — proxyserver.py.  It works fine on Win32, OS X, and Linux.  On Win32, the python that’s part of mozilla-build works well.  Run it like this:

python proxyserver.py

and then in Firefox’s proxy settings (or the system proxy settings), set your HTTP proxy to localhost:8000.  You can change the port via a command line option.  Then, visit all the pages/sites that you want cached (don’t forget to shift-reload or clear Firefox’s cache beforehand to ensure that Firefox actually goes out to the network!).  After you’ve got everything going, restart the proxy server in local-only mode:

python proxyserver.py -l

… and make sure that your demos work. You can also run without -l during the live demos, especially if you will have a network connection (even a slow one), to give you the option of going out to the network if necessary. Also, if you want to copy the proxy cache to another machine, just copy proxy_cache.db that gets created in the same directory as proxyserver.py.

The proxy server currently supports HEAD and GET requests.  It doesn’t support POST, so if you have something that depends on POST, you’re out of luck.  It wouldn’t be too hard to add though; patches accepted if someone wants to tackle that.
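The record-then-replay idea can be sketched in a few lines. To be clear, this is not how proxyserver.py is written; it is a toy in-memory model in JavaScript, with a made-up RecordingCache class, a synchronous fetchFromNetwork callback, and status codes (501, 504) picked arbitrarily for the unsupported and cache-miss cases.

```javascript
// Toy model of a record/replay cache. Responses are stored by method + URL;
// in local-only mode nothing ever goes out to the network.
class RecordingCache {
  constructor(fetchFromNetwork, { localOnly = false } = {}) {
    this.fetchFromNetwork = fetchFromNetwork; // (method, url) => response object
    this.localOnly = localOnly;
    this.entries = new Map();
  }

  handle(method, url) {
    // Only GET and HEAD are handled, matching the limitation described above.
    if (method !== "GET" && method !== "HEAD") {
      return { status: 501, body: "" };
    }
    const key = method + " " + url;
    if (this.entries.has(key)) return this.entries.get(key);
    if (this.localOnly) return { status: 504, body: "" }; // cache miss, no network
    const response = this.fetchFromNetwork(method, url);
    this.entries.set(key, response); // record for later playback
    return response;
  }
}
```

Visiting every page once with localOnly off fills the map; flipping localOnly on afterwards gives repeatable playback, which is the same workflow as running with and without -l.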

With the proxy server, I was able to give a bunch of demos that made heavy use of XHR, including some that loaded video, without having to rely on a network or spend time downloading and fixing up URLs.  It really made demo prep much easier.

One common thread running through the many different and interesting WebGL projects out there is that they all need to do vector and matrix math, do it quickly, and do it in JavaScript.  To date, developers have either rolled their own, or they’ve used Sylvester, a fairly featureful vector and matrix JavaScript library.

One of the problems with Sylvester is that while it’s fully featured (arbitrary NxN matrices and vectors can be created and manipulated), it suffers in performance because of it.  Since this is such a crucial part of a successful WebGL program, I’ve put together a small package that I’m calling mjs.

mjs is designed around speed and simplicity.  For example, it doesn’t attempt to stuff vectors and matrices into JavaScript objects.  Because the language offers no operator overloading, there’s very little benefit in treating these types as discrete objects, and lots of performance and memory usage downsides.  Instead, it provides a set of functions for performing operations on vectors and matrices, which can be any array-like object.  For any function that returns a vector or matrix, an existing array can be passed in to take the result, or the function can create a new one.  Array reuse ends up being important because of the potential for expensive garbage collection churn eating away at performance.

Here’s a sample of the API:

var r = M4x4.rotate(Math.PI/2, V3.$(0, 1, 0),  M4x4.I);

Note that V3.$ and M4x4.$ are shorthand for creating a new V3 or M4x4 (I wanted to use V3() and M4x4(), but that didn’t work out too well since functions have a length property).  However, because all they return are just new array-like objects, you could also write:

var r = M4x4.rotate(Math.PI/2, [0, 1, 0], M4x4.I);

If the WebGL types are available, those will be used for newly created vectors/matrices.  They are a significant performance boost especially for repeated operations; but for specifying one-off vectors such as the above, literal array syntax is fine.

The rotate function internally makes a rotation matrix, and then multiplies it by the given matrix.  So the above could also be written as:

var rotation = M4x4.makeRotate(Math.PI/2, [0, 1, 0]);
var r = M4x4.mul(M4x4.I, rotation);

(The last line being redundant given that we’re multiplying by the identity matrix.)
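For readers who want to see what such a pair of functions does under the hood, here is a standalone sketch. It is not the mjs source: the function names are different on purpose, and it assumes a column-major 4x4 layout (the OpenGL convention) and the standard axis-angle rotation formula.

```javascript
// Build a column-major 4x4 rotation matrix for `angle` radians about `axis`.
function makeRotation(angle, axis) {
  const len = Math.hypot(axis[0], axis[1], axis[2]);
  const x = axis[0] / len, y = axis[1] / len, z = axis[2] / len;
  const c = Math.cos(angle), s = Math.sin(angle), t = 1 - c;
  return [
    t * x * x + c,     t * x * y + s * z, t * x * z - s * y, 0, // column 0
    t * x * y - s * z, t * y * y + c,     t * y * z + s * x, 0, // column 1
    t * x * z + s * y, t * y * z - s * x, t * z * z + c,     0, // column 2
    0,                 0,                 0,                 1, // column 3
  ];
}

// Matrix product a * b for column-major 4x4 matrices.
function multiply(a, b) {
  const r = new Array(16);
  for (let col = 0; col < 4; col++) {
    for (let row = 0; row < 4; row++) {
      let sum = 0;
      for (let k = 0; k < 4; k++) sum += a[k * 4 + row] * b[col * 4 + k];
      r[col * 4 + row] = sum;
    }
  }
  return r;
}

// "Make a rotation matrix, then multiply it by the given matrix."
function rotate(angle, axis, m) {
  return multiply(m, makeRotation(angle, axis));
}

// Apply a matrix to a 3D point (w is taken to be 1).
function transformPoint(m, p) {
  return [
    m[0] * p[0] + m[4] * p[1] + m[8] * p[2] + m[12],
    m[1] * p[0] + m[5] * p[1] + m[9] * p[2] + m[13],
    m[2] * p[0] + m[6] * p[1] + m[10] * p[2] + m[14],
  ];
}

const IDENTITY = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1];
const quarterTurn = rotate(Math.PI / 2, [0, 1, 0], IDENTITY);
const moved = transformPoint(quarterTurn, [1, 0, 0]); // close to [0, 0, -1]
```

Because cos(π/2) is not exactly zero in floating point, results like this should be compared with a small tolerance rather than for exact equality.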

All methods that return a vector or matrix take an optional final argument, that of an existing object to reuse.  For example:

var m0 = M4x4.$();
r = M4x4.mul(someMatrixA, someMatrixB, m0);
// r == m0, so the assignment isn't necessary, but it's handy for chaining
// .... do something with r ...
r = M4x4.mul(someMatrixB, someMatrixC, m0);
// r == m0 still
// ... do something else with new results ...

Both multiplications above write into m0, so no additional temporary objects are allocated.
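The optional-destination pattern is easy to reproduce in your own helpers. The sketch below is not taken from mjs; add3 and VectorArray are hypothetical names, and Float32Array stands in for whichever typed array the browser provides.

```javascript
// Use a typed array when one is available, otherwise fall back to plain arrays.
const VectorArray = typeof Float32Array !== "undefined" ? Float32Array : Array;

// Add two 3-component vectors. Inputs can be any array-like object; the
// result goes into `r` if given, otherwise into a newly allocated vector.
function add3(a, b, r) {
  if (r === undefined) r = new VectorArray(3);
  r[0] = a[0] + b[0];
  r[1] = a[1] + b[1];
  r[2] = a[2] + b[2];
  return r;
}

const scratch = new VectorArray(3);
const first = add3([1, 2, 3], [10, 20, 30], scratch); // writes into scratch
const second = add3(first, [1, 1, 1], scratch);       // reuses scratch again
const fresh = add3([1, 2, 3], [4, 5, 6]);             // allocates a new vector
```

Note that first and second are the same object as scratch, so the earlier result is overwritten by the later call. For a component-wise operation like this it is safe to pass an input as the destination; for something like a matrix multiply, a real implementation has to take care that the inputs are fully read before the destination is written.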

As mentioned before, one of the goals of mjs is performance. Matrix multiplication is one of the most common tasks, so here are some numbers comparing mjs, Sylvester, and native C++ code. This was run on a Core i7 desktop using a local build of SpiderMonkey, which included one patch that’s about to go into the tree that fixes the no-reuse tracing case. (Without it, the no-reuse tracing case is much slower because it’s never actually jitted.) The test is simple: it multiplies two matrices together in a loop 1,000,000 times.

Test                               Time
mjs, JIT, matrix reuse             140ms
mjs, JIT, no reuse                 533ms
Sylvester, JIT, no reuse           5,280ms
mjs, no JIT, matrix reuse          25,833ms
mjs, no JIT, no reuse              26,681ms
Sylvester, no JIT, no reuse        41,996ms
Native C++, SSE2, matrix reuse     71ms
Native C++, SSE2, no reuse         142ms

(I also have numbers for MSVC without the SSE2 compile flag, but the numbers vary greatly depending on whether the values eventually go to infinity or not; if the values end up trending towards 0, the non-SSE2 code tends to win at around 52ms vs. 71ms; if the values trend to infinity, the non-SSE2 code takes around 11,000ms!)

Those numbers are pretty encouraging: having JIT-compiled JavaScript be only about 2x as slow as native code for something like this (140ms vs. 71ms with matrix reuse) is pretty nice to see. Granted, this is only a very isolated test, and I’m sure there are some tricks to optimizing the native code case (it’s currently just a fully unrolled set of multiplies and adds). The “no JIT” case is less nice, but I’m sure that our Jaegermonkey folks will be all over this testcase (right, guys?). In any case, ideally most WebGL rendering loops will be fully traced in Firefox, so it would be less of an issue.
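If you want to try a comparison like this yourself, a minimal harness might look like the following. It is a sketch, not the benchmark used for the numbers above: multiply4x4 is a simple looped stand-in rather than mjs or unrolled code, the input matrices are fixed, and timings from it will differ from machine to machine and engine to engine.

```javascript
// Column-major 4x4 multiply with an optional destination. The destination
// must not be the same object as either input.
function multiply4x4(a, b, r) {
  if (r === undefined) r = new Float32Array(16);
  for (let col = 0; col < 4; col++) {
    for (let row = 0; row < 4; row++) {
      let sum = 0;
      for (let k = 0; k < 4; k++) sum += a[k * 4 + row] * b[col * 4 + k];
      r[col * 4 + row] = sum;
    }
  }
  return r;
}

// Time `iterations` multiplies, either reusing one destination matrix or
// letting every call allocate a fresh one.
function timeMultiplies(iterations, reuse) {
  const a = new Float32Array([1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1]);
  const b = new Float32Array([2, 0, 0, 0, 0, 3, 0, 0, 0, 0, 4, 0, 5, 6, 7, 1]);
  const dest = reuse ? new Float32Array(16) : undefined;
  let result;
  const start = Date.now();
  for (let i = 0; i < iterations; i++) result = multiply4x4(a, b, dest);
  return { elapsedMs: Date.now() - start, result: result };
}

const withReuse = timeMultiplies(100000, true);
const withoutReuse = timeMultiplies(100000, false);
```

Comparing withReuse.elapsedMs against withoutReuse.elapsedMs gives a rough feel for how much allocation and garbage collection cost in a given engine.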

mjs is still very much a work in progress; it’s missing a test suite and a whole bunch of features. You can find it hosted at Google Code, at webgl-mjs. (Side note: I couldn’t just call the project mjs because a project called mjs was abandoned on SourceForge 5 years ago, and Google Code complained.) There’s also some documentation, viewable online here.

Bugs and contributions welcome!