Iteration Performance
There are many views on how to iterate with high performance. The traditional way of iterating in Java has been a for-loop starting at zero and then counting up to some pre-defined number:private static final int ITERATIONS = 10_000; @Benchmark public int forUp() { int sum = 0; for (int i = 0; i < ITERATIONS; i++) { sum += i; } return sum; }Sometimes, we come across a for-loop that starts with a predetermined non-negative value and then it counts down instead. This is fairly common within the JDK itself, for example in the class
String
. Here is an example of solving the previous problem by counting down instead of up.@Benchmark public int forDown() { int sum = 0; for (int i = ITERATIONS; i-- > 0;) { sum += i; } return sum; }
I think the rationale here is that checking how values relate to zero is potentially more efficient than testing how values relate to any other arbitrary value. In fact, all CPUs that I know about have machine code instructions that can check how a given value relates to zero. Another idea is that the count down idiom given above appears to only inspect the loop variable one time (it simultaneously checks the value and then decreases it) as opposed to the regular example at the top. I suspect that this has little or no influence on today’s efficient JIT compiler who will be able to optimize the first iteration just as good as the second. It might have an impact when the code runs in interpretation mode but this is not examined in this article.
Another way of doing the same thing using an
IntStream
looks like this:@Benchmark public int stream() { return IntStream.range(0, ITERATIONS) .sum(); }
If more performance is needed for large iterations, it is relatively easy to make the stream parallel by just adding a
.parallel()
operator to the stream. This is not examined in this article.Performance under Graal VM
Running these tests under GraalVM (rc-11, with the new C2 compiler that ships with GraallVM) on my laptop (MacBook Pro mid 2015, 2.2 GHz Intel Core i7) gives the following:Benchmark Mode Cnt Score Error Units ForBenchmark.forDown thrpt 5 311419.166 ± 4201.724 ops/s ForBenchmark.forUp thrpt 5 309598.916 ± 12998.579 ops/s ForBenchmark.stream thrpt 5 312360.089 ± 8291.792 ops/s
It might come as a surprise for some that the stream solution is the fastest one, albeit by a margin that is well within error margins.
In a previous article, I presented some code metric advantages with streams and Declarative programming compared to traditional Imperative code. I have not tested performance for cold code sections (i.e. before the JIT kicks in).
Clever Math
From math, we recall that the sum of consecutive numbers starting at zero is N*(N+1)/2 where N is the highest number in the series. Running this benchmark:@Benchmark public int math() { return ITERATIONS * (ITERATIONS + 1) / 2; }
gives us a performance increase of over 1,000 times over the previous implementations:
Benchmark Mode Cnt Score Error Units ForBenchmark.math thrpt 5 395561077.984 ± 11138012.141 ops/s
The more iterations, the more gain. Cleverness sometimes trumps brute force.
Ultra-fast Data Streams
With Speedment HyperStream, it is possible to get similar performance with data from databases. Read more here on HyperStream.Conclusions
On some commonly used hardware/JVMs, it does not matter if we iterate upwards or downwards in our for-loops. More modern JVMs are able to optimize stream iterations so they have equivalent or even better performance than for-loops.Stream code is generally more readable compared to for-loops in my opinion and so, I believe streams are likely to be the de facto iteration contrivance in some future.
Database content can be streamed with high performance using Speedment HyperStream.
Wouldn't it be better to avoid compiler optimisations to get more accurate results and use the techniques here (http://tutorials.jenkov.com/java-performance/jmh.html) ?
ReplyDeleteThis is how I used it in practice https://github.com/pgilad/java-micro-benchmark-example/blob/master/src/main/java/org/sample/ArraySlicingBenchmark.java
Hi. Not sure I understand your comment. I am using JMH and I do return the result of the computation to avoid dead code elimination. So, I believe the results are accurate and representative for the different variants.
DeleteIn your example, you use blackhole to consume the result which is another way of eliminating dead code elimination.