Monday, November 12, 2018

Shortest Code and Lowest Latency

Who can write the shortest Java code with the lowest latency, and what tools are used?

At Oracle Code One two weeks ago, I promoted a code challenge during my talk. Contestants were given a specific problem, and the winner would be the one with the lowest score, defined as latency multiplied by the number of code lines used (i.e., low latency combined with as few lines as possible is good). I also shared the contest on social media to get as many developers as possible involved.

The input data was to be taken from the table “film” in the open-source Sakila database.

More specifically, the objective was to develop a Java application that computes the sum, min, max, and average rental duration for five films out of the existing 1,000 films, using a general solution. The five films should be those around the median film length, starting from the 498th and ending with the 502nd film (inclusive) in ascending film length.

Contestants were free to use any Java library and any solution available such as Hibernate, JDBC or other ORM tools.
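
To make the task concrete, here is a minimal plain-Java sketch of the computation on an in-memory list. The Film class, the statsAround helper and the generated sample data are hypothetical stand-ins for illustration only; they are not taken from Sakila or from any contest entry.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.IntSummaryStatistics;
import java.util.List;

class MedianWindowSketch {

    /** Hypothetical stand-in for a row of the Sakila "film" table. */
    static final class Film {
        final int length;          // film length
        final int rentalDuration;  // rental duration

        Film(int length, int rentalDuration) {
            this.length = length;
            this.rentalDuration = rentalDuration;
        }
    }

    /** Sorts by length, skips 'offset' films and summarizes the next 'count' films. */
    static IntSummaryStatistics statsAround(List<Film> films, long offset, long count) {
        return films.stream()
            .sorted(Comparator.comparingInt((Film f) -> f.length))
            .skip(offset)
            .limit(count)
            .mapToInt(f -> f.rentalDuration)
            .summaryStatistics();
    }

    public static void main(String[] args) {
        // Generated sample data (an assumption): 1,000 films with distinct lengths.
        List<Film> films = new ArrayList<>();
        for (int i = 0; i < 1_000; i++) {
            films.add(new Film(1_000 - i, 3 + (i % 5)));
        }
        // One pass yields count, sum, min, average and max for the five films.
        System.out.println(statsAround(films, 498, 5));
    }
}
```

The offset and count are parameters rather than constants, which is one way to read the "general solution" requirement.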

A Solution Based on SQL/JDBC

One way of solving the problem would be to use SQL/JDBC directly. Here is an example of SQL code that solves the computational part of the problem:

SELECT 
  sum(rental_duration),
  min(rental_duration),
  max(rental_duration),
  avg(rental_duration)
FROM 
  (select rental_duration from sakila.film 
  order by length 
  LIMIT 5 OFFSET 498) as A

When I run it on my laptop, I get a latency of about 790 microseconds on the server side (standard MySQL 5.7.16). To use this solution, we also need to add Java code that issues the SQL statement and reads the values back from JDBC into Java variables. This makes the code even larger and slower to execute, as shown here:

// Requires java.sql.Connection, DriverManager, ResultSet and Statement
try (Connection con = DriverManager
         .getConnection("jdbc:mysql://somehost/sakila?"
             + "user=sakila-user&password=sakila-password")) {

    // Statement and ResultSet are both closed automatically
    try (Statement statement = con.createStatement();
         ResultSet resultSet = statement.executeQuery(
             "SELECT " +
             "  sum(rental_duration)," +
             "  min(rental_duration)," +
             "  max(rental_duration)," +
             "  avg(rental_duration) " + // trailing space separates this from FROM
             "FROM " +
             "  (select rental_duration from sakila.film " +
             "  order by length " +
             "  limit 5 offset 498) as A")) {

        if (resultSet.next()) {
            int sum = resultSet.getInt(1);
            int min = resultSet.getInt(2);
            int max = resultSet.getInt(3);
            double avg = resultSet.getDouble(4);
            // Handle the result
        } else {
            // Handle error
        }
    }
}

To give this alternative a fair chance, I reused the connection between calls in the benchmark rather than re-creating it each time (recreation is shown above but was not used in benchmarks).
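
The benchmark harness itself is not shown in this post, so the following is only a sketch of what reusing a connection between calls could look like. The class ReusedJdbcQuery and its Stats holder are hypothetical names, and preparing the statement once is an assumption on top of the connection reuse described above.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

class ReusedJdbcQuery implements AutoCloseable {

    // Same query as above; every fragment ends with a space so the pieces join cleanly.
    static final String SQL =
        "SELECT sum(rental_duration), min(rental_duration), "
      + "max(rental_duration), avg(rental_duration) "
      + "FROM (SELECT rental_duration FROM sakila.film "
      + "ORDER BY length LIMIT 5 OFFSET 498) AS A";

    /** Hypothetical holder for the four aggregates. */
    static final class Stats {
        final int sum, min, max;
        final double avg;

        Stats(int sum, int min, int max, double avg) {
            this.sum = sum;
            this.min = min;
            this.max = max;
            this.avg = avg;
        }
    }

    private final PreparedStatement statement;

    /** The caller opens the connection once and keeps it open between calls. */
    ReusedJdbcQuery(Connection connection) throws SQLException {
        this.statement = connection.prepareStatement(SQL);
    }

    /** Executes the query on the already-open connection; only this part is repeated. */
    Stats run() throws SQLException {
        try (ResultSet rs = statement.executeQuery()) {
            if (!rs.next()) {
                throw new SQLException("Aggregate query returned no row");
            }
            return new Stats(rs.getInt(1), rs.getInt(2), rs.getInt(3), rs.getDouble(4));
        }
    }

    @Override
    public void close() throws SQLException {
        statement.close();
    }
}
```

With this shape, a timing loop would call run() repeatedly, so connection setup stays out of the measured latency.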

Result: ~1,000 us and ~25 code lines

The Winning Contribution

However, the SQL example above did not stand a chance against the winning contribution. The winner was Sergejus Sosunovas (@SergejusS) from Switzerland, who currently works on developing an optimization and management system. He used Speedment in-JVM-memory acceleration and states: "It took less than an hour to start and build a solution." Here is the winning code:

IntSummaryStatistics result = app.getOrThrow(FilmManager.class).stream()
     .sorted(Film.LENGTH)
     .skip(498)
     .limit(5)
     .mapToInt(GeneratedFilm::getRentalDuration)
     .summaryStatistics();

This was much faster than SQL/JDBC and completed in as little as 6 microseconds.

Result: 6 us and 6 code lines

A Solution with Only Five Lines

One of the contestants, Corrado Lombard from Italy, deserves an honorable mention since he was able to solve the problem in only five lines. Unfortunately, there was a slight error in his original contribution, but when fixed, the solution looked like this:

IntSummaryStatistics result = app.getOrThrow(FilmManager.class).stream()
    .sorted(Film.LENGTH)
    .skip(498)
    .limit(5)
    .collect(summarizingInt(Film::getRentalDuration));

This fixed solution had about the same performance as the winning solution.
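
The line saved comes from replacing mapToInt(...) followed by summaryStatistics() with a single collect(...) call. The following stand-alone sketch shows the two forms side by side on plain integers. The class name SummaryVariants and the sample values are made up for illustration; note that the short form assumes a static import of Collectors.summarizingInt.

```java
import static java.util.stream.Collectors.summarizingInt;

import java.util.Arrays;
import java.util.IntSummaryStatistics;
import java.util.List;

class SummaryVariants {

    /** Two stream operations: convert to an IntStream, then summarize. */
    static IntSummaryStatistics viaMapToInt(List<Integer> durations) {
        return durations.stream()
            .mapToInt(Integer::intValue)
            .summaryStatistics();
    }

    /** One stream operation: summarize with a collector. */
    static IntSummaryStatistics viaCollector(List<Integer> durations) {
        return durations.stream()
            .collect(summarizingInt(Integer::intValue));
    }

    public static void main(String[] args) {
        List<Integer> durations = Arrays.asList(4, 3, 7, 6, 5); // made-up sample values
        System.out.println(viaMapToInt(durations));
        System.out.println(viaCollector(durations));
    }
}
```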

Optimized Speedment Solution

As a matter of fact, latency can be improved even further than in the winning contribution. Applying a ToIntFunction that performs in-place deserialization (in situ) of the int value directly from RAM improves performance even more. In-place deserialization means that we do not have to deserialize the entire entity, only extract the parts of it that are needed. This saves time, especially for entities with many columns. Here is what an optimized solution could look like:

IntSummaryStatistics result = app.getOrThrow(FilmManager.class).stream()
     .sorted(Film.LENGTH)
     .skip(498)
     .limit(5)
     .mapToInt(Film.RENTAL_DURATION.asInt()) // Use in-place-deserialization
     .summaryStatistics();

This was even faster and completed in just 3 microseconds.

Result: 3 us and 6 code lines
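
The idea behind in-place deserialization can be illustrated with a small stand-alone sketch. Everything in it is an assumption made for illustration: the fixed-width row layout, the names InPlaceReadSketch, readFilm and readRentalDuration, and the use of a direct ByteBuffer. It does not show how Speedment stores or reads data internally.

```java
import java.nio.ByteBuffer;
import java.util.IntSummaryStatistics;
import java.util.stream.IntStream;

class InPlaceReadSketch {

    // Hypothetical fixed-width row layout: filmId | length | rentalDuration (three ints).
    static final int ROW_BYTES = 3 * Integer.BYTES;
    static final int LENGTH_OFFSET = Integer.BYTES;
    static final int RENTAL_DURATION_OFFSET = 2 * Integer.BYTES;

    static final class Film {
        final int filmId, length, rentalDuration;

        Film(int filmId, int length, int rentalDuration) {
            this.filmId = filmId;
            this.length = length;
            this.rentalDuration = rentalDuration;
        }
    }

    /** Full deserialization: reads every column and allocates an entity. */
    static Film readFilm(ByteBuffer rows, int row) {
        int base = row * ROW_BYTES;
        return new Film(
            rows.getInt(base),
            rows.getInt(base + LENGTH_OFFSET),
            rows.getInt(base + RENTAL_DURATION_OFFSET));
    }

    /** In-place read: fetches only the needed column, no entity is created. */
    static int readRentalDuration(ByteBuffer rows, int row) {
        return rows.getInt(row * ROW_BYTES + RENTAL_DURATION_OFFSET);
    }

    /** Summarizes rental durations for 'count' rows starting at 'firstRow'. */
    static IntSummaryStatistics summarize(ByteBuffer rows, int firstRow, int count) {
        return IntStream.range(firstRow, firstRow + count)
            .map(row -> readRentalDuration(rows, row))
            .summaryStatistics();
    }

    /** Builds made-up sample rows in off-heap memory. */
    static ByteBuffer sampleRows(int n) {
        ByteBuffer rows = ByteBuffer.allocateDirect(n * ROW_BYTES);
        for (int i = 0; i < n; i++) {
            rows.putInt(i + 1).putInt(60 + i).putInt(3 + (i % 5));
        }
        return rows;
    }
}
```

In this toy layout the in-place read touches one int per row instead of three and allocates nothing, which hints at why the saving grows with the number of columns.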

GraalVM and Optimized Speedment Solution

GraalVM contains the Graal JIT compiler, which can take the place of HotSpot's C2 compiler and is known to improve stream performance for many workloads. In particular, inlining and escape analysis seem to be much more effective under Graal than under standard OpenJDK.

I was curious to see how the optimized solution above could benefit from GraalVM. With no code change, I ran it under GraalVM (1.0.0-rc9), and latency was now down to just 1 microsecond! This means that we could perform 1,000,000 such queries per second per thread on a laptop, and presumably many more on a server-grade computer.

Result: 1 us and 6 code lines

Overview

When we compare SQL/JDBC latency against the best Speedment solution, the speedup with Speedment was about a factor of 1,000. This is comparable to the difference between walking to work and taking the world's fastest manned jet plane (SR-71 "Blackbird"). A huge improvement if you want to streamline your code.
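
For reference, the contest score (latency multiplied by lines of code) and the speedup can be recomputed from the rounded figures on the "Result" lines above. The class and method names in this sketch are made up for illustration, and the SQL/JDBC inputs are the approximate values quoted earlier.

```java
class ContestScores {

    /** Contest score: latency in microseconds times lines of code (lower is better). */
    static long score(long latencyMicros, long codeLines) {
        return latencyMicros * codeLines;
    }

    /** Queries per second per thread that a given latency allows. */
    static long queriesPerSecond(long latencyMicros) {
        return 1_000_000L / latencyMicros;
    }

    /** How many times faster the fast solution is than the slow one. */
    static double speedup(long slowMicros, long fastMicros) {
        return (double) slowMicros / fastMicros;
    }

    public static void main(String[] args) {
        System.out.println("SQL/JDBC:            " + score(1_000, 25)); // 25000
        System.out.println("Winning solution:    " + score(6, 6));      // 36
        System.out.println("Optimized Speedment: " + score(3, 6));      // 18
        System.out.println("Optimized + GraalVM: " + score(1, 6));      // 6
        System.out.println("Speedup vs SQL/JDBC: " + speedup(1_000, 1)); // 1000.0
        System.out.println("Queries/s at 1 us:   " + queriesPerSecond(1)); // 1000000
    }
}
```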

To be able to plot the solutions in a diagram, I have removed the comparatively slow SQL/JDBC solution and show only how the different Speedment solutions and runtimes measure up:


Try it Out

The competition is over. However, feel free to challenge the solutions above or try them out for yourself. Full details of the competition and rules can be found here. If you can beat the fastest solution, let me know in the comments below.

Download Sakila database.

Download Speedment

Benchmark Notes

The benchmark results presented above were obtained on my MacBook Pro (Mid 2015, 2.2 GHz Intel Core i7, 16 GB 1600 MHz DDR3) using Java 8, JMH (1.21) and Speedment (3.1.8).

Conclusion

It is possible to reduce latencies by orders of magnitude and reduce code size at the same time using in-JVM-memory technology and Java streams.

GraalVM can improve your stream performance significantly under many conditions.

2 comments:

  1. Well, this just shows that connection pooling is a good thing. I believe it is implicitly done in Speedment, so it's not fair to compare it to a brute-force JDBC solution. Comparisons against the commonly used Spring JDBC template + HikariCP would be more interesting AND fair.

    Replies
    1. I have actually reused the same connection for all benchmark calls. So, connection pooling would not provide any additional performance increase. I wrote some text about this just before the test results for SQL/JDBC.

      You are right that connection pooling is implicit in Speedment for database access. For in-JVM-acceleration with Speedment, connections are bypassed altogether.
