Vladimir Vukićević — Words
 



While starting to convert much of our IO to use nio byte buffers, with the eventual goal of pushing them further up into the application, I decided to investigate their performance in more detail. I’d seen some blog posts claiming that performance wasn’t great, in particular a very old blog post from 2004. That post included a simple benchmark, which I grabbed, converted to use Int buffers, dropped the count to 10,000,000 int values, and ran. The source is available as niotest.java. The results weren’t encouraging:

  Java Version      1.6        1.7
  array[] put      26 ms      31 ms
  absolute put    129 ms     130 ms
  relative put    130 ms     132 ms
  array[] get      20 ms      19 ms
  absolute get    116 ms     119 ms
  relative get    130 ms     137 ms

Not only was there no really visible perf change between Java 1.6 and 1.7, using the nio buffers was 4x-6x slower than regular Java arrays! I wrote a quick equivalent benchmark in JavaScript, using Typed Arrays, and originally saw numbers in the 11ms range. (Note: the original benchmark numbers above were in the 300ms range for nio buffers, before a laptop suspend/unsuspend. I incredulously tweeted the 30x difference, and then went about cleaning up the benchmarks. I can’t reproduce either result now; the Java version got faster, and the JavaScript version got slower.) The JS benchmark (source code buftest.html) gives about 65ms for writing and 40ms for reading. That still seemed faster, and I set about writing this blog post.

As part of that, I decided to clean up the benchmark code and put everything together in a nice package. The source for the new benchmark is ArrayBenchmark.java. Like the original, it works on arrays/buffers of 10,000,000 integers, first writing each element (with just its index) and then reading each element in the get operation. The additional “copy into” benchmarks time how long it takes to copy all the ints into an existing int[] array. Here are the results:
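The shape of such a benchmark can be sketched roughly as follows. This is not the contents of ArrayBenchmark.java: the class name `NioBenchSketch` and all of its helper methods are made up for illustration, and a single timed pass like the one in `main` says little about steady-state JIT performance.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;

// Hypothetical sketch, not the original ArrayBenchmark.java.
class NioBenchSketch {
    static IntBuffer heapInts(int n) {
        return IntBuffer.allocate(n);
    }

    static IntBuffer directInts(int n) {
        // 4 bytes per int; native byte order avoids byte swapping on access.
        return ByteBuffer.allocateDirect(n * 4)
                .order(ByteOrder.nativeOrder())
                .asIntBuffer();
    }

    // Write each index into the array, then read everything back.
    static long arrayPutGet(int[] a) {
        for (int i = 0; i < a.length; i++) a[i] = i;
        long sum = 0;
        for (int i = 0; i < a.length; i++) sum += a[i];
        return sum;
    }

    // Absolute put/get: the index is passed explicitly on every call.
    static long absolutePutGet(IntBuffer b) {
        int n = b.capacity();
        for (int i = 0; i < n; i++) b.put(i, i);
        long sum = 0;
        for (int i = 0; i < n; i++) sum += b.get(i);
        return sum;
    }

    // Relative put/get: the buffer advances its own position.
    static long relativePutGet(IntBuffer b) {
        b.clear();
        while (b.hasRemaining()) b.put(b.position());
        b.flip();
        long sum = 0;
        while (b.hasRemaining()) sum += b.get();
        return sum;
    }

    // "Copy into": bulk-copy the buffer contents into an existing int[].
    static int[] copyInto(IntBuffer b, int[] dst) {
        b.clear();
        b.get(dst);
        return dst;
    }

    public static void main(String[] args) {
        int n = 10_000_000;
        int[] array = new int[n];
        IntBuffer direct = directInts(n);

        long t0 = System.nanoTime();
        long s1 = arrayPutGet(array);
        long t1 = System.nanoTime();
        long s2 = absolutePutGet(direct);
        long t2 = System.nanoTime();
        long s3 = relativePutGet(direct);
        long t3 = System.nanoTime();
        copyInto(direct, array);
        long t4 = System.nanoTime();

        System.out.printf("int[] put+get:           %.3f ms (sum %d)%n", (t1 - t0) / 1e6, s1);
        System.out.printf("direct absolute put+get: %.3f ms (sum %d)%n", (t2 - t1) / 1e6, s2);
        System.out.printf("direct relative put+get: %.3f ms (sum %d)%n", (t3 - t2) / 1e6, s3);
        System.out.printf("direct copy into int[]:  %.3f ms%n", (t4 - t3) / 1e6);
    }
}
```

Returning and printing the sums keeps the read loops from being trivially dead code; swapping `directInts` for `heapInts` gives the heap-buffer variant.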

                                  Java 1.6       Java 1.7
===== native java int[] array
                           put:  27.971961 ms   42.464894 ms
                           get:  32.949032 ms   14.826696 ms
                     copy into:  20.069191 ms   15.853778 ms
===== nio heap buffers
                           put: 839.730766 ms   57.876372 ms
                put (relative): 844.618171 ms   80.951102 ms
                           get: 742.287840 ms   80.578592 ms
                get (relative): 759.317101 ms   79.563458 ms
                     copy into: 769.494235 ms   91.685437 ms
===== nio direct buffers
                           put: 161.480338 ms   31.951206 ms
                put (relative): 170.194344 ms   47.541457 ms
                           get: 179.621322 ms   18.913808 ms
                get (relative): 164.425689 ms   29.387186 ms
                     copy into:  21.940450 ms   16.936357 ms
===== custom buffers
                           put: 151.095845 ms   48.125012 ms
                 put unchecked: 148.538301 ms   51.241096 ms
                           get: 146.243837 ms   36.636723 ms
                 get unchecked: 138.765277 ms   31.641897 ms
                     copy into:  41.643050 ms   20.206091 ms
        copy into (copyMemory): N/A             16.845686 ms

These numbers show a significant improvement in Java 1.7! Direct buffers are about as fast as regular arrays, which is what I had hoped to see originally. The “custom buffers” section is a hand-rolled integer buffer class that uses Unsafe.getInt/putInt without much of the additional nio buffer machinery or abstractions, to see how much that machinery was contributing to overhead. It’s noticeable in Java 1.6, but in Java 1.7 the original nio buffers win handily, even against “unchecked” versions of get/put that don’t do any bounds checking. I also added heap (non-direct) buffers, to see if there was any truth to a claim I read that mixing direct and non-direct buffers causes an overall slowdown, because there would then be two implementations of the abstract parent class and the VM couldn’t optimize the virtual calls. That doesn’t seem to be the case any more; the JIT doesn’t care.
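One way to probe the mixing claim is to push both buffer kinds through a single shared call site and time the direct buffer before and after the heap buffer has been seen. The sketch below is only an illustration of that idea, not code from ArrayBenchmark.java; `MixedBufferSketch` and its methods are hypothetical names, and the timings it prints are too crude to settle the question on their own.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;

// Hypothetical sketch for probing the "mixed implementations" claim.
class MixedBufferSketch {
    static IntBuffer fill(IntBuffer b) {
        for (int i = 0; i < b.capacity(); i++) b.put(i, i);
        return b;
    }

    static IntBuffer heap(int n) {
        return fill(IntBuffer.allocate(n));
    }

    static IntBuffer direct(int n) {
        return fill(ByteBuffer.allocateDirect(n * 4)
                .order(ByteOrder.nativeOrder())
                .asIntBuffer());
    }

    // Single shared call site: b.get(i) dispatches on whichever concrete
    // IntBuffer subclass is passed in.
    static long sum(IntBuffer b) {
        long s = 0;
        for (int i = 0; i < b.capacity(); i++) s += b.get(i);
        return s;
    }

    static double timeMs(IntBuffer b, int reps) {
        long sink = 0;
        long start = System.nanoTime();
        for (int r = 0; r < reps; r++) sink += sum(b);
        double ms = (System.nanoTime() - start) / 1e6;
        if (sink < 0) System.out.println(sink); // never true here; keeps the result live
        return ms;
    }

    public static void main(String[] args) {
        int n = 1_000_000, reps = 50;
        IntBuffer d = direct(n), h = heap(n);
        System.out.printf("direct only:       %.2f ms%n", timeMs(d, reps));
        System.out.printf("heap:              %.2f ms%n", timeMs(h, reps));
        System.out.printf("direct after heap: %.2f ms%n", timeMs(d, reps));
    }
}
```

If the first and third lines come out about the same, the second receiver type did not hurt the direct-buffer path at that call site, at least on the VM being tested.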

But I am now very confused about why the original benchmark code and the new code give such different results. The normal int[] put now takes 42ms, slower than the 31ms in the original benchmark, and slower still than the 27ms that the same benchmark gets in Java 1.6. The other numbers are all much better, though. Compare, for example, direct buffer absolute “get” performance: 119ms in the first benchmark, 19ms in the second. This is a 6x speed difference. The same compiler and JVM are used for both. I even added a ‘mixed’ set to the new benchmark that does the operations in the exact same order as the first one (interleaving operations on arrays and int buffers), and it didn’t matter.

The new benchmark numbers are really encouraging, and mean that we’re probably going to push the nio buffers into many places, simplifying our interaction with IO, OpenGL, algorithms implemented in JNI, etc., as well as letting us move the bulk of our large data out of the Java heap. However, I’d like to understand why the two benchmarks give such vastly different performance results. I’ve stared at the source for a while, and I’m virtually certain that they’re doing the same operations on identically-sized arrays. Can someone explain the overall slowness of the first benchmark? Why did the numbers hardly change at all between Java 1.6 and 1.7? Why are the 1.6 numbers in the second benchmark slower than the 1.6 numbers in the first?


2 Comments to “Looking at Java NIO Buffer performance”  

  1. Osvaldo Doederlein

    The original benchmark runs way too fast, so I guess it’s just too short to let the JIT do its job. So I only spent real time with the newer bench. First off, have you tested with HotSpot Server or Client? That is always essential information in any Java benchmarking report :) as are any other relevant VM switches (32-bit vs. 64-bit too). I suppose it’s Client, 32-bit, no other switches. Maybe you could test with JDK 7u2-b07 and 6u29-b08; these have updated builds of HotSpot.

    The most important factor for this benchmark is intrinsic optimization of DirectBuffer methods such as get*() and put*(). HotSpot Server fully optimizes this, meaning that it generates inline code for these operations instead of making JNI invocations to native libraries (bad not only because of JNI overhead, but because the invoked code cannot be inlined into the caller). HotSpot Client traditionally sucked on java.nio because it lacked intrinsification of these methods, but I think they are gradually fixing this. With these latest EA builds, you will find that JDK 6 and 7 have basically the same performance. Comparing Client to Server, the Server VM still has a significant lead in several tests, but the gap is closing.

  2. vladimir

    Actually it is Server, 64-bit; I thought I mentioned that, but it got lost in an edit :-) Specifically:

    Java(TM) SE Runtime Environment (build 1.7.0-b147)
    Java HotSpot(TM) 64-Bit Server VM (build 21.0-b17, mixed mode)

    The even odder thing is that if I change the second, currently longer, benchmark to do nothing but the exact same timings as the first (just completely deleting the other tests in the source file), it’s still much faster. I’ve been looking at the disassembly and nothing jumps out at me. The only thing I can think of is that the codegen and memory alloc just happen to get lucky and generate better-aligned code/ops, but that’s too big of a perf difference even for that. I can force the interpreter and both benchmarks get significantly slower, as expected; hotspot is definitely kicking in. (I can also see that if I trace method compilation etc.)

    I’ll try the early access version of 7u2, though. Were you able to reproduce the speed difference between the two with the same VM?
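
The two points raised in this thread, reporting which VM ran the test and giving the JIT enough time to compile the hot loop, can be folded into the benchmark itself. The following is a minimal sketch under that assumption; `WarmupSketch` and its methods are hypothetical names, not part of either benchmark discussed above. It prints the VM identification from standard system properties and times the same put/get pass over several consecutive rounds, so that a slow first round followed by faster ones is visible in the output. Running it with HotSpot's `-XX:+PrintCompilation` switch should additionally show when the loop methods get compiled.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;

// Hypothetical sketch: report the VM and time repeated rounds.
class WarmupSketch {
    // VM identification built from standard system properties.
    static String vmInfo() {
        return System.getProperty("java.vm.name")
                + " (build " + System.getProperty("java.vm.version") + ")"
                + ", java " + System.getProperty("java.version")
                + ", arch " + System.getProperty("os.arch");
    }

    // One absolute put pass followed by one absolute get pass.
    static long putGet(IntBuffer b) {
        int n = b.capacity();
        for (int i = 0; i < n; i++) b.put(i, i);
        long sum = 0;
        for (int i = 0; i < n; i++) sum += b.get(i);
        return sum;
    }

    // Time `rounds` consecutive passes; element r is the time of round r in ms.
    static double[] timeRounds(IntBuffer b, int rounds) {
        double[] ms = new double[rounds];
        long sink = 0;
        for (int r = 0; r < rounds; r++) {
            long start = System.nanoTime();
            sink += putGet(b);
            ms[r] = (System.nanoTime() - start) / 1e6;
        }
        if (sink < 0) System.out.println(sink); // never true here; keeps the result live
        return ms;
    }

    public static void main(String[] args) {
        System.out.println(vmInfo());
        IntBuffer direct = ByteBuffer.allocateDirect(10_000_000 * 4)
                .order(ByteOrder.nativeOrder())
                .asIntBuffer();
        double[] ms = timeRounds(direct, 10);
        for (int r = 0; r < ms.length; r++) {
            System.out.printf("round %2d: %.3f ms%n", r, ms[r]);
        }
    }
}
```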