While starting to convert much of our IO to use nio byte buffers, with the eventual goal of pushing them further up into the application, I decided to look at their performance in more detail. I’d seen some blog posts claiming that performance wasn’t great, in particular a very old one from 2004. That post included a simple benchmark, which I grabbed, converted to use int buffers, reduced to 10,000,000 int values, and ran. The source is available as niotest.java. The results weren’t encouraging:
| Java version | 1.6 | 1.7 |
| --- | --- | --- |
| array put | 26 ms | 31 ms |
| absolute put | 129 ms | 130 ms |
| relative put | 130 ms | 132 ms |
| array get | 20 ms | 19 ms |
| absolute get | 116 ms | 119 ms |
| relative get | 130 ms | 137 ms |
As part of this investigation, I decided to clean up the benchmark code and put everything together in a nice package. The source for the new benchmark is ArrayBenchmark.java. Like the original, it works on arrays/buffers of 10,000,000 integers, first writing each element (setting it to its own index) in the put operation and then reading each element back in the get operation. The additional “copy into” benchmarks time how long it takes to copy all the ints into an existing int array. Here are the results:
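To make the operations concrete, here is a minimal sketch of what absolute put/get, relative put/get, and “copy into” look like on a direct IntBuffer. It is not the code from ArrayBenchmark.java; the class and method names (BufferOpsSketch, putAbsolute, and so on) are made up for illustration, and the timing in main is deliberately naive.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;

// Hypothetical illustration class; not taken from ArrayBenchmark.java.
class BufferOpsSketch {
    /** Allocates a direct int buffer in native byte order (4 bytes per int). */
    static IntBuffer newDirect(int count) {
        return ByteBuffer.allocateDirect(count * 4)
                .order(ByteOrder.nativeOrder())
                .asIntBuffer();
    }

    /** Absolute put: every write names its index explicitly. */
    static void putAbsolute(IntBuffer buf, int count) {
        for (int i = 0; i < count; i++) buf.put(i, i);
    }

    /** Relative put: every write advances the buffer's position. */
    static void putRelative(IntBuffer buf, int count) {
        buf.clear(); // position = 0, limit = capacity
        for (int i = 0; i < count; i++) buf.put(i);
    }

    /** Absolute get, summed so the reads cannot be discarded. */
    static long sumAbsolute(IntBuffer buf, int count) {
        long sum = 0;
        for (int i = 0; i < count; i++) sum += buf.get(i);
        return sum;
    }

    /** Relative get, reading from position 0 onward. */
    static long sumRelative(IntBuffer buf, int count) {
        buf.rewind(); // position = 0, limit unchanged
        long sum = 0;
        for (int i = 0; i < count; i++) sum += buf.get();
        return sum;
    }

    /** "Copy into": bulk transfer into an existing int array. */
    static void copyInto(IntBuffer buf, int[] dest) {
        buf.rewind();
        buf.get(dest, 0, dest.length);
    }

    public static void main(String[] args) {
        int count = 10_000_000; // same element count as in the post
        IntBuffer buf = newDirect(count);
        int[] dest = new int[count];

        long t0 = System.nanoTime();
        putAbsolute(buf, count);
        long t1 = System.nanoTime();
        long sum = sumAbsolute(buf, count);
        long t2 = System.nanoTime();
        copyInto(buf, dest);
        long t3 = System.nanoTime();

        System.out.printf("put: %.3f ms%n", (t1 - t0) / 1e6);
        System.out.printf("get: %.3f ms (checksum %d)%n", (t2 - t1) / 1e6, sum);
        System.out.printf("copy into: %.3f ms%n", (t3 - t2) / 1e6);
    }
}
```

A single timed pass like this includes JIT compilation time, so the printed numbers are only a rough indication and will not match the tables in this post.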
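| Buffer type | Operation | Java 1.6 | Java 1.7 |
| --- | --- | --- | --- |
| native java int array | put | 27.971961 ms | 42.464894 ms |
| native java int array | get | 32.949032 ms | 14.826696 ms |
| native java int array | copy into | 20.069191 ms | 15.853778 ms |
| nio heap buffers | put | 839.730766 ms | 57.876372 ms |
| nio heap buffers | put (relative) | 844.618171 ms | 80.951102 ms |
| nio heap buffers | get | 742.287840 ms | 80.578592 ms |
| nio heap buffers | get (relative) | 759.317101 ms | 79.563458 ms |
| nio heap buffers | copy into | 769.494235 ms | 91.685437 ms |
| nio direct buffers | put | 161.480338 ms | 31.951206 ms |
| nio direct buffers | put (relative) | 170.194344 ms | 47.541457 ms |
| nio direct buffers | get | 179.621322 ms | 18.913808 ms |
| nio direct buffers | get (relative) | 164.425689 ms | 29.387186 ms |
| nio direct buffers | copy into | 21.940450 ms | 16.936357 ms |
| custom buffers | put | 151.095845 ms | 48.125012 ms |
| custom buffers | put unchecked | 148.538301 ms | 51.241096 ms |
| custom buffers | get | 146.243837 ms | 36.636723 ms |
| custom buffers | get unchecked | 138.765277 ms | 31.641897 ms |
| custom buffers | copy into | 41.643050 ms | 20.206091 ms |
| custom buffers | copy into (copyMemory) | N/A | 16.845686 ms |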
These numbers show a significant improvement in Java 1.7! Direct buffers are roughly as fast as regular arrays, which is what I had hoped to see originally. The “custom buffers” section is a hand-rolled integer buffer class that uses Unsafe.getInt/putInt without much of the additional nio buffer machinery or abstractions, to see how much that machinery was contributing to overhead. The difference is noticeable in Java 1.6, but in Java 1.7 the original nio direct buffers win handily, even against the “unchecked” versions of get/put that don’t do any bounds checking. I also added heap (non-direct) buffers, to see if there was any truth to a claim I had read that mixing direct and non-direct buffers causes an overall slowdown, because there would then be two implementations of the abstract parent class and the VM couldn’t optimize the virtual calls. That doesn’t seem to be the case any more: the JIT doesn’t care.
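The mixing claim is easy to poke at with a small experiment. The sketch below is not part of ArrayBenchmark.java; MixedBufferSketch and its methods are hypothetical names. It pushes a heap buffer and a direct buffer through the same loop, so the same put/get call sites see two different concrete IntBuffer subclasses.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;

// Hypothetical illustration class; not taken from ArrayBenchmark.java.
class MixedBufferSketch {
    /** One shared loop: the put/get call sites see whichever subclass is passed in. */
    static long fillAndSum(IntBuffer buf) {
        int n = buf.capacity();
        for (int i = 0; i < n; i++) buf.put(i, i);
        long sum = 0;
        for (int i = 0; i < n; i++) sum += buf.get(i);
        return sum;
    }

    /** Heap buffer: backed by an int[] on the Java heap. */
    static IntBuffer newHeap(int count) {
        return IntBuffer.allocate(count);
    }

    /** Direct buffer: backed by memory outside the Java heap. */
    static IntBuffer newDirect(int count) {
        return ByteBuffer.allocateDirect(count * 4)
                .order(ByteOrder.nativeOrder())
                .asIntBuffer();
    }

    public static void main(String[] args) {
        int count = 10_000_000;
        IntBuffer heap = newHeap(count);
        IntBuffer direct = newDirect(count);

        // Alternate the two buffer kinds through the same method.
        for (int round = 0; round < 5; round++) {
            long t0 = System.nanoTime();
            long heapSum = fillAndSum(heap);
            long t1 = System.nanoTime();
            long directSum = fillAndSum(direct);
            long t2 = System.nanoTime();
            System.out.printf("round %d: heap %.3f ms, direct %.3f ms (checksums %d, %d)%n",
                    round, (t1 - t0) / 1e6, (t2 - t1) / 1e6, heapSum, directSum);
        }
    }
}
```

If mixing really did defeat the optimizer, the direct-buffer time here would be expected to come out clearly worse than in a run that never touches a heap buffer; that comparison is left to the reader, since it depends on the JVM in use.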
But I am now very confused about why the original benchmark code and the new code give such different results. The plain int array put takes 42 ms on Java 1.7, slower than the 31 ms in the original benchmark, and slower than the 27 ms that the same new benchmark gets on Java 1.6. The other numbers are all much better, though. Compare, for example, direct buffer absolute “get” performance: 119 ms in the first benchmark, 19 ms in the second. This is a 6x speed difference. The same compiler and JVM are used for both. I even added a “mixed” set to the new benchmark that does the operations in exactly the same order as the first one (interleaving operations on arrays and int buffers), and it didn’t matter.
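For anyone who wants to reproduce the comparison, one option is to interleave the array and buffer operations and record each round separately, so that a slow first pass is visible on its own instead of being folded into a single number. This is only a sketch under that assumption; it is not the “mixed” set from ArrayBenchmark.java, and InterleavedTimingSketch and timeInterleaved are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;

// Hypothetical illustration class; not the "mixed" set from the post.
class InterleavedTimingSketch {
    static void arrayPut(int[] a) {
        for (int i = 0; i < a.length; i++) a[i] = i;
    }

    static void bufferPut(IntBuffer b) {
        int n = b.capacity();
        for (int i = 0; i < n; i++) b.put(i, i);
    }

    /**
     * Runs array put and direct buffer put back to back in every round.
     * Returns times[round][0] for the array and times[round][1] for the
     * buffer, both in nanoseconds.
     */
    static long[][] timeInterleaved(int count, int rounds) {
        int[] array = new int[count];
        IntBuffer buf = ByteBuffer.allocateDirect(count * 4)
                .order(ByteOrder.nativeOrder())
                .asIntBuffer();
        long[][] times = new long[rounds][2];
        for (int r = 0; r < rounds; r++) {
            long t0 = System.nanoTime();
            arrayPut(array);
            long t1 = System.nanoTime();
            bufferPut(buf);
            long t2 = System.nanoTime();
            times[r][0] = t1 - t0;
            times[r][1] = t2 - t1;
        }
        // Read one element back from each so the writes are actually used.
        if (count > 0 && (array[count - 1] != count - 1 || buf.get(count - 1) != count - 1)) {
            throw new IllegalStateException("unexpected contents after put");
        }
        return times;
    }

    public static void main(String[] args) {
        long[][] times = timeInterleaved(10_000_000, 10);
        for (int r = 0; r < times.length; r++) {
            System.out.printf("round %d: array put %.3f ms, direct put %.3f ms%n",
                    r, times[r][0] / 1e6, times[r][1] / 1e6);
        }
    }
}
```

Running both the old and the new benchmark loops inside a harness like this, in the same JVM invocation, would at least show whether the gap persists across rounds or only appears in the first one.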
The new benchmark numbers are really encouraging, and mean that we’re probably going to push the nio buffers into many places, simplifying our interaction with IO, OpenGL, algorithms implemented in JNI, etc., as well as letting us move the bulk of our large data out of the Java heap. However, I’d like to understand why the two benchmarks give such vastly different performance results. I’ve stared at the source for a while, and I’m virtually certain that they’re doing the same operations on identically sized arrays. Can someone explain the overall slowness of the first benchmark? Why did its numbers hardly change at all between Java 1.6 and 1.7? Why are the 1.6 numbers in the second benchmark slower than the 1.6 numbers in the first?