Minborg

Minborg
Minborg

Monday, September 2, 2019

Java Performance: For-looping vs. Streaming

Is counting upwards or downwards in a for-loop the most efficient way of iterating? Sometimes the answer is neither. Read this post and understand the impact of different iteration varieties.

Iteration Performance

There are many views on how to iterate with high performance. The traditional way of iterating in Java has been a for-loop starting at zero and then counting up to some pre-defined number:

private static final int ITERATIONS = 10_000;

@Benchmark
public int forUp() {
    int sum = 0;
    for (int i = 0; i < ITERATIONS; i++) {
        sum += i;
    }
    return sum;
}
Sometimes, we come across a for-loop that starts with a predetermined non-negative value and then it counts down instead. This is fairly common within the JDK itself, for example in the class String. Here is an example of solving the previous problem by counting down instead of up.

@Benchmark
public int forDown() {
    int sum = 0;
    for (int i = ITERATIONS; i-- > 0;) {
        sum += i;
    }
    return sum;
}

I think the rationale here is that checking how values relate to zero is potentially more efficient than testing how values relate to any other arbitrary value. In fact, all CPUs that I know about have machine code instructions that can check how a given value relates to zero. Another idea is that the count down idiom given above appears to only inspect the loop variable one time (it simultaneously checks the value and then decreases it) as opposed to the regular example at the top. I suspect that this has little or no influence on today’s efficient JIT compiler who will be able to optimize the first iteration just as good as the second. It might have an impact when the code runs in interpretation mode but this is not examined in this article.

Another way of doing the same thing using an IntStream looks like this:

@Benchmark
public int stream() {
    return IntStream.range(0, ITERATIONS)
        .sum();
}

If more performance is needed for large iterations, it is relatively easy to make the stream parallel by just adding a .parallel() operator to the stream. This is not examined in this article.

Performance under Graal VM

Running these tests under GraalVM (rc-11, with the new C2 compiler that ships with GraallVM) on my laptop (MacBook Pro mid 2015, 2.2 GHz Intel Core i7) gives the following:

Benchmark              Mode  Cnt       Score       Error  Units
ForBenchmark.forDown  thrpt    5  311419.166 ±  4201.724  ops/s
ForBenchmark.forUp    thrpt    5  309598.916 ± 12998.579  ops/s
ForBenchmark.stream   thrpt    5  312360.089 ±  8291.792  ops/s

It might come as a surprise for some that the stream solution is the fastest one, albeit by a margin that is well within error margins.

In a previous article, I presented some code metric advantages with streams and Declarative programming compared to traditional Imperative code. I have not tested performance for cold code sections (i.e. before the JIT kicks in).

Clever Math

From math, we recall that the sum of consecutive numbers starting at zero is N*(N+1)/2 where N is the highest number in the series. Running this benchmark:

@Benchmark
public int math() {
    return ITERATIONS * (ITERATIONS + 1) / 2;
}

gives us a performance increase of over 1,000 times over the previous implementations:

Benchmark           Mode  Cnt          Score          Error  Units
ForBenchmark.math  thrpt    5  395561077.984 ± 11138012.141  ops/s

The more iterations, the more gain. Cleverness sometimes trumps brute force.

Ultra-fast Data Streams

With Speedment HyperStream, it is possible to get similar performance with data from databases. Read more here on HyperStream.

Conclusions

On some commonly used hardware/JVMs, it does not matter if we iterate upwards or downwards in our for-loops. More modern JVMs are able to optimize stream iterations so they have equivalent or even better performance than for-loops.

Stream code is generally more readable compared to for-loops in my opinion and so, I believe streams are likely to be the de facto iteration contrivance in some future.

Database content can be streamed with high performance using Speedment HyperStream.

2 comments:

  1. Wouldn't it be better to avoid compiler optimisations to get more accurate results and use the techniques here (http://tutorials.jenkov.com/java-performance/jmh.html) ?
    This is how I used it in practice https://github.com/pgilad/java-micro-benchmark-example/blob/master/src/main/java/org/sample/ArraySlicingBenchmark.java

    ReplyDelete
    Replies
    1. Hi. Not sure I understand your comment. I am using JMH and I do return the result of the computation to avoid dead code elimination. So, I believe the results are accurate and representative for the different variants.

      In your example, you use blackhole to consume the result which is another way of eliminating dead code elimination.

      Delete

Note: Only a member of this blog may post a comment.