Minborg's Java Pot: September 2018

Tuesday, September 25, 2018

Java: GraalVM Database Stream Performance

GraalVM is the new kid on the JVM block. It is an open-source Virtual Machine that is able to run many programming languages, such as Java, Rust and JavaScript, at the same time. GraalVM has also a new internal code optimizer pipeline that can improve performance significantly compared to other JVMs under some conditions. Learn how to reap the benefits of GraalVM and execute your code faster with no code modification.

What is GraalVM?

Previous JVMs, such as Oracle JVM and OpenJDK JVM (both called “HotSpot”), has been around for a long time. They have evolved considerably over time and, over the course of the decades, we have seen performance rocketed compared to the Java 1.0 JVM. Significant JVM improvements include just-in-time compiling (JIT), C2 compiler, escape analysis etc. that all contributed to this positive development. But as with all technology, they will start to plateau at some point.

GraalVM is a fresh start whereby a new internal architecture has been developed from the ground up. In particular, the JIT compiler, called Gaal, has been reworked. Unsurprisingly, the JIT compiler itself is written in Java, just like all the other GraalVM components. As it turns out, Graal is sometimes able to optimize your code better than some existing JVMs. In particular, some Stream types appear to benefit from running under Graal.

Database Stream Performance

There are a number of ways to write Java streams. The most obvious way is to use one of the built-in Java functions Stream::of or Collection::stream methods. These methods, however, requires that the elements in the Stream are present a-priori in the shape of Java objects. This means that the compiler cannot optimize them away under most conditions.

I have therefore instead chosen to use the stream based ORM tool Speedment. This tool works with a technology that pulls in database content into an in-JVM-memory snapshot and creates Java streams directly from RAM. Thus, database tables are stored off-heap, thereby potentially avoiding the creation of Java Objects. Because Graal has an improved performance optimization pipeline, it is likely that it can better optimize away temporarily intermediary stream objects. In theory, Speedment and Graal would, therefore, be a perfect fit. I was therefore very eager to test how the already extreme performance of Speedement would be affected when running under GraalVM rather than running under HotSpot.

The following Speedment database streams were used to test performance. Read more on these streams and how they work in one of my previous article that you can find here.

private static final Predicate RATING_EQUALS_PG_13 =
    Film.RATING.equal(GeneratedFilm.Rating.PG13);

private static final Comparator LENGTH_DESCENDING = 
    Film.LENGTH.reversed();

@Benchmark
public long filterAndCount() {
    return films.stream()
        .filter(RATING_EQUALS_PG_13)
        .count();
}

@Benchmark
public IntSummaryStatistics Complex() {
    return films.stream()
        .sorted(LENGTH_DESCENDING)
        .skip(745)
        .limit(5)
        .mapToInt(Film.RENTAL_DURATION.asInt())
        .summaryStatistics();
}

The following JMH output was obtained for runs under GraalVM and HotSpot respectively:

Graal:
Benchmark              Mode  Cnt         Score        Error  Units
Bench.Complex         thrpt    5   8453285.715 ± 383634.200  ops/s
Bench.filterAndCount  thrpt    5  29755350.558 ± 674240.743  ops/s

HotSpot:
Benchmark              Mode  Cnt         Score        Error  Units
Bench.Complex         thrpt    5   5334041.755 ± 176368.317  ops/s
Bench.filterAndCount  thrpt    5  20809826.960 ± 963757.357  ops/s

Being able to produce and consume over 30 million database streams per second with GraalVM/Speedment on a laptop with 4 CPU cores is quite astonishing. Imagine the performance on a server grade node with 24 or 32 CPU cores.

Here is how it looks in a chart (higher is better):

Ordinary Stream Performance

Initial tests show varying relative performance figures for built-in Java streams like Stream.of(“A”, “B”, “C”) or List::stream with various operations applied, for the different JVMs. I expect also these stream types to gain performance across the board once GraalVM has matured. Perhaps I will cover this in a future article.

Setup

The following JMH setup was used for GraalVM and HotSpot:

# Detecting actual CPU count: 8 detected
# JMH version: 1.21
# VM version: JDK 1.8.0_172, GraalVM 1.0.0-rc6, 25.71-b01-internal-jvmci-0.48
# *** WARNING: JMH support for this VM is experimental. Be extra careful with the produced data.
# VM invoker: /Applications/graalvm-ce-1.0.0-rc6/Contents/Home/jre/bin/java
# VM options: -Dfile.encoding=UTF-8
# Warmup: 5 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 8 threads, will synchronize iterations
# Benchmark mode: Throughput, ops/time


# Detecting actual CPU count: 8 detected
# JMH version: 1.21
# VM version: JDK 1.8.0_171, Java HotSpot(TM) 64-Bit Server VM, 25.171-b11
# VM invoker: /Library/Java/JavaVirtualMachines/jdk1.8.0_171.jdk/Contents/Home/jre/bin/java
# VM options: -Dfile.encoding=UTF-8
# Warmup: 5 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 8 threads, will synchronize iterations
# Benchmark mode: Throughput, ops/time

The tests above were performed on a MacBook Pro (Retina, 15-inch, Mid 2015), 2.2 GHz Intel Core i7, 16 GB 1600 MHz DDR3 with 4 CPU cores and 8 threads. As can be seen in the logs, we should be careful to draw conclusions using JMH figures for Graal as the JMH support is experimental at this time.

Give it a Spin

Use the Speedment initializer to create a Speedment project template here.

Download the latest version of GraalVM here.

The source code for the benchmarks is available here.

Feel free to reproduce the performance tests on another hardware platform and report the outcome in the comments below.

Conclusions

GraalVM seams to be a promising technology that can improve performance for certain Java stream types.

GraalVM in combination with Speedment’s in-JVM-memory acceleration can enable significant stream performance for data analytic applications.

Monday, September 24, 2018

Debugging Java Streams with IntelliJ

Streams are very powerful and can capture the gist of your intended functionality in just a few lines. But, just as smooth as they are when it all works, just as agonizing it can be when they don’t behave as expected. Learn how to use IntelliJ to debug your Java Streams and gain insight into the intermediate operations of a Stream.

In this article, I will use the Sakila sample database and Speedment Stream ORM in my examples.

The Principle

Let’s start with a simple Stream that we can use to establish the fundamentals of the Stream debugger in IntelliJ:

List<String> strings = Stream.of("C", "A", "B")
    .sorted()
    .collect(toList());

The code above first creates an initial Stream consisting of the String elements "C", "A", "B". Then, an intermediary operation sorted() is applied to the first Stream, thereby (at least in Java 8-10) creating a new Stream where the elements in the initial stream are sorted according to their natural order. I.e. the second stream will contain the elements "A", "B", "C". Lastly, these elements are collected into a List.

The code above is equivalent to:

Stream<String> s0 = Stream.of("C", "B", "A"); // "C", "A", "B"
Stream<String> s1 = s0.sorted();              // "A", "B", "C"
List<String> strings = s1.collect(toList());  // [“A”, “B”, “C”]

This is basically how the Stream debugger operates. It breaks up a stream pipeline into smaller segments and progressively invokes the different intermediate operators while retaining the elements for each step analyzed:

Stream.of("C", "B", "A")
  .peek(saveStep(0))
  .sorted()
  .peek(saveStep(1))
  .collect(toList()); // The final result is saved to step 2

NB: This is not exactly how it works technically, but it provides a good overall outline.

Visually, it looks like this in IntelliJ’s debugger:

This gives a clear and concise view of what is going on internally in the Stream pipeline between each intermediate operation and also shows the final result of the Stream.

Invocation

The stream debugger is invoked by first setting a breakpoint where a Stream is defined:

Then, start a debug session:

When the breakpoint is hit, the Stream debugger can be invoked by pressing its designated (and arguably somewhat concealed) button as indicated by the red circle below:

This will pull up the stream debugger as shown previously in the article.

Database Streams

I will use the stream ORM Speedment that allows databases to be queried using standard Java Streams and thus, these streams can also be debugged with IntelliJ. A Speedment project can be set up using the Speedment initializer.

The Java application itself can be set up like this:

Speedment app = new SakilaApplicationBuilder()
    .withPassword("sakila-password") // Replace with your own password
    .build();

FilmManager films = app.getOrThrow(FilmManager.class);

Now, we can stream the database table “film”. For example like this:

List<Film> map = films.stream()
    .filter(Film.LENGTH.equal(60))
    .sorted(Film.RATING.reversed())
    .collect(toList());

This will filter out all Film objects with a length equal to 60 minutes, then sort those Film objects according to the Film.RATING (descending) and then collect these elements into a List.

When we invoke the Stream debugger, we will see the following:

As can be seen, there are 1,000 films in the initial stream. After the filter operator, just 8 films remain which are subsequently sorted and then collected to a List.

Compute Statistics

Suppose we want to compute the min, max and average length of all films rated PG-13. This can be done like this:

IntSummaryStatistics stat = films.stream()
    .filter(Film.RATING.equal("PG-13"))
    .mapToInt(Film.LENGTH.asInt())
    .summaryStatistics();

And looks like this in the Stream debugger:

As can be seen, it is possible to interact with the Stream debugger and click on elements whereby their path in the stream pipeline is highlighted. It is also possible to scroll among the elements for individual steps.

Speedment normally optimizes away intermediary operations in a database Stream and merges these steps into the SQL query. However, when the Stream debugger is used, no such optimization takes place and we are able to see all steps in the stream pipeline.

Conclusions

The Stream debugger is a hidden gem that can be of significant help when working with Streams.

I think the IntelliJ team has come up with a really good feature.

Download Speedment here. Download IntelliJ here.

Thursday, September 20, 2018

Composition in Java will be Simplified with New JEP Draft

Favor Composition over Inheritance

The mantra "Favor Composition over Inheritance" has, with good reasons, been repeated many times in the literature. However, there is little or no language support in Java to simplify the composition of objects. However, with a new JEP draft named "Concise Method Bodies", the situation might improve slightly.

Brian Goetz is responsible for the JEP draft which likely will be handled under project "Amber". The complete draft can be found here.

Concise Method Bodies

The JEP, when implemented, allows for something called Concise Method Bodies (CMB) whereby, loosely speaking, a method body can be a lambda or a method reference. Here is one example:

Old Style:

int length(String s) {
  return s.length();
}

New CMB:

int length(String s) -> s.length();     //  -> is "single expression form"

or alternately simply:

int length(String s) = String::length;  //  = is "method reference form"

This will reduce boilerplate coding while improving code readability.

Composition

Consider the existing Java class Collections.UnmodifiableList which delegates an inner List class and prevents it from being modified (code shortened and reordered for readability):

  static class UnmodifiableList<E> extends UnmodifiableCollection<E>
                                  implements List<E> {

        final List<? extends E> list;

        UnmodifiableList(List<? extends E> list) {
            super(list);
            this.list = list;
        }

        public boolean equals(Object o) {return o == this || list.equals(o);}
        public int hashCode()           {return list.hashCode();}

        public E get(int index) {return list.get(index);}
        public int indexOf(Object o)            {return list.indexOf(o);}
        public int lastIndexOf(Object o)        {return list.lastIndexOf(o);}
        public E set(int index, E element) {
            throw new UnsupportedOperationException();
        }

With CMB, it can be implemented like this:

 static class UnmodifiableList<E> extends UnmodifiableCollection<E>
                                  implements List<E> {

        final List<? extends E> list;

        UnmodifiableList(List<? extends E> list) {
            super(list);
            this.list = list;
        }

        public boolean equals(Object o) = list::equals;
        public int hashCode()           = list::hashCode;
        public E get(int index)         = list::get;
        public int indexOf(Object o)    = list::indexOf;
        public int lastIndexOf(Object o)= list::lastIndexOf;
        public E set(int index, E element) {
            throw new UnsupportedOperationException();
        }

I think this feature would make sense. It is especially useful when delegating methods with one or several parameters.

Monday, September 17, 2018

Ultra-Low Latency Querying with Java Streams and In-JVM-Memory

Fundamental rules of nature, such as the speed of light and general information theory, set significant limits on the maximum performance we can obtain from traditional system architectures. Learn how you, as a Java developer, can improve performance by orders of magnitude using in-JVM-technology and Java Streams.

If, for example, the application server and the database server are located 100 m apart (about 330 feet), then the round trip delay imposed by the speed of light is slightly north of 600 ns. More importantly, due to TCP/IP protocol handling, a single packet round-trip delay on a 10 GBit/s connection can hardly be optimized down to less than 25 us (=25,000 ns) despite resorting to black belt tricks such as custom kernel builds, busy polling and CPU affinity.

In this article, I will show how we can create Java Streams directly from RAM using in-JVM-memory technology. We will use the Stream-based Java ORM named Speedment that can perform data analytics using standard java.util.stream.Stream objects and how some of these streams can be created and completed in under 200 ns which, surprisingly, is only about two times the latency of a CPU accessing 64-bit main memory.

200 ns is more than 125 times faster than the theoretical minimum latency from a remote database (100 m) whose internal processing delay is zero and where a single TCP packet can convey both the query and the response. In real time scenarios, databases’ internal processing delay is never zero and both queries and results are often sent in several TCP packages. So, the speedup factor could be 1,000 times or much more in many cases.

The Database

In the examples below, we are using data from the Sakila database content for MySQL. Sakila is an example database that models a movie rental store. It has tables called Film, Actor, Category and so on and it can be downloaded for free here. It should be noted that this is a small database but, as it turns out, many of the Speedment stream operations are O(1) or O(log(N) in terms of complexity, thereby ensuring the same speed regardless how big or small the data sets are.

Step 1: Create the project

First, we need to configure our pom.xml-file to use the latest Speedment dependencies and Maven plugin. The fastest way to do this is to generate a pom.xml-file using the Speedment Initializer that you can find here. First, choose the database type “MySQL” and make sure the “In-memory Acceleration” is enabled and then press “download”, and you will get an entire project folder with a Main.java-file generated automatically for you.

Next, unpack the project folder zip file, open a command line, go to the unpacked folder (where the pom.xml file is) and enter the following command:

mvn speedment:tool

Next, connect to the database and get started:

Step 2: Generate Code

When the schema data has been loaded from the database, the complete Java domain model can be generated by pressing the “Generate” button.

Step 3: Write the Application Code

In order to work with Speedment, you first need to create a Speedment instance. This can be done by using a builder that was automatically generated together with the domain model in step 2. Open the Main.java file and replace the code in the main() method with this snippet:

Speedment app = new SakilaApplicationBuilder()
    // Replace this with your own password
    .withPassword("sakila-password")
    // Enable in-JVM-memory acceleration
    // By just commenting away this line, we can disable acceleration
    .withBundle(InMemoryBundle.class)
    .build();

    // Load data from database into a snapshot view if
    // we have installed In-JVM-Acceleration
    app.get(DataStoreComponent.class)
        .ifPresent(DataStoreComponent::load);

As a demonstration of basic functionality, we will first write an application that just prints out all films:

// Obtains a FilmManager that allows us to
// work with the "film" table
FilmManager films = app.getOrThrow(FilmManager.class);

// Create a stream of films and print
// each and every film
films.stream()
    .forEach(System.out::println);

The code above will produce the following output (shortened for brevity):

FilmImpl { filmId = 1, title = ACADEMY DINOSAUR, …, length = 86, ... }
FilmImpl { filmId = 2, title = ACE GOLDFINGER, ..., length = 48, ...}
FilmImpl { filmId = 3, title = ADAPTATION HOLES, ..., length = 50, ...}
...

Step 3: Using Filters

Speedment streams support all stream operations including filters. Suppose we want to filter out only those films that are longer than 60 minutes and count how many occurrences we have. This can be accomplished like this:

films.stream()
  .filter(Film.LENGTH.greaterThan(60))
  .count();

System.out.format("There are %,d films longer than 60 minutes.", count);

This will produce the following output:

There are 896 films longer than 60 minutes

Any number of filters can be applied to a stream and the predicate supplied to a filter() method can be composed using and() / or() operators.

Step 4: Setting up JMH

So far, we have not seen any performance figures. We are going to use JMH for benchmarking in this article. JMH is a Java harness for building, running, and analyzing benchmarks written in Java and other languages targeting the JVM.

There are two stream types we are going to use for performance measurements:

A fairly simple stream where we count the films that has a rating equal to PG-13 called “Filter And Count”

A more complex stream where we sort all the films in LENGTH order (descending), then we skip the first 745 films and then process the following 5 films whereby we extract the rental duration from those five films and finally we compute statistics on these integers (i.e. min, max, and average values). This type is called “Complex”.

The following code extract shows the benchmarks we are about to run:

private static final Predicate RATING_EQUALS_PG_13 = 
    Film.RATING.equal(Rating.PG13);

private static final Comparator LENGTH_DESCENDING =
    Film.LENGTH.reversed();

@Benchmark
public long filterAndCount() {
    return films.stream()
       .filter(RATING_EQUALS_PG_13)
       .count();
}

@Benchmark
public IntSummaryStatistics complex() {
    return films.stream()
        .sorted(LENGTH_DESCENDING)
        .skip(745)
        .limit(5)
        .mapToInt(Film.RENTAL_DURATION.asInt())
        .summaryStatistics();
}

The following setup was used for single threaded latency measurements:

# JMH version: 1.21
# VM version: JDK 10, Java HotSpot(TM) 64-Bit Server VM, 10+46
# VM invoker: /Library/Java/JavaVirtualMachines/jdk-10.jdk/Contents/Home/bin/java
# VM options: -javaagent:/Applications/IntelliJ IDEA CE.app/Contents/lib/idea_rt.jar=63173:/Applications/IntelliJ IDEA CE.app/Contents/bin -Dfile.encoding=UTF-8
# Warmup: 5 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: com.example.Bench.complex

Streams using SQL with a MySQL Database

Running these queries against a standard MySQL database (version 5.7.16) on my laptop (MacBook Pro, mid-2015, 2.2 GHz Intel Core i7, 16 GB RAM) will produced the following output shown below:

SINGLE-THREADED LATENCY (Lower is better)
Benchmark             Mode  Cnt  Score    Error  Units
Bench.complex         avgt    5  0.003 ±  0.001   s/op
Bench.filterAndCount  avgt    5  0.001 ±  0.001   s/op

MULTI-THREADED THROUGHPUT (Higher is better)
Benchmark              Mode  Cnt     Score     Error  Units
Bench.complex         thrpt    5  1714.980 ± 343.655  ops/s
Bench.filterAndCount  thrpt    5  3154.984 ± 318.881  ops/s

Streams Using In-JVM-Memory Acceleration with a MySQL Database

Enabling in-JVM-memory acceleration and again running the same benchmarks on my laptop produced the following result:

SINGLE-THREADED LATENCY (Lower is better)
Benchmark             Mode  Cnt   Score    Error  Units
Bench.complex         avgt    5  ≈ 10⁻⁶            s/op
Bench.filterAndCount  avgt    5  ≈ 10⁻⁷            s/op

MULTI-THREADED THROUGHPUT (Higher is better)
Benchmark              Mode  Cnt         Score         Error  Units
Bench.complex         thrpt    5   4793915.881 ±  374680.158  ops/s
Bench.filterAndCount  thrpt    5  16958800.191 ± 1023015.568  ops/s

Being able to produce and consume almost 17 million streams per second on an old laptop is pretty astonishing. A modern server-grade computer with many CPU-cores will easily be able to produce and consume more than 25 million streams per second.

The JMH time resolution for latency was not sufficient to measure accurate enough. By running a throughput test with one thread and inverting the result, the average Filter And Count latency was estimated to 1/5,564,678 = 180 ns. This more accurate latency estimate gives an estimated performance boost factor of around 5,000 rather than 10,000.

Conclusions

Enabling in-JVM-memory acceleration can improve performance substantially. In the benchmarks above:

Single thread latency was reduced by a factor of:
Complex: ~ 3,000
Filter And Count: ~5,000

Multi-thread throughput was increased by a factor of:
Complex: 2,700
Filter and Count: 5,300

As an illustration, this means that a compound JVM operation with one million subqueries will have its aggregated data latency reduced from 1 h to 1 second.

Notes

For SQL performance, streams were (automatically) rendered to SQL queries. Here is how the rendered Filter And Count SQL query looked like:

SELECT COUNT(*) FROM (
    SELECT 
       `film_id`,`title`,`description`,
       `release_year`, `language_id`,`original_language_id`,
       `rental_duration`,`rental_rate`, `length`,
       `replacement_cost`,`rating`,`special_features`,
       `last_update` 
    FROM
       `sakila`.`film` 
   WHERE 
       (`rating`  = ? COLLATE utf8_bin)
) AS A
, values:[PG-13]

There was an index defined for the rating column.

As can be seen, all counting was done on the database side and the stream did not pull in any unnecessary Film objects from the database into the JMH application.

Source Code

The source code for the benchmarks can be seen here.

Summary

In this article, you have learned how to significantly reduce latencies in your data analytics Java applications and at the same time improve throughput using Speedment Free.

The speedup factors are several orders of magnitude.

Tuesday, September 11, 2018

Query Databases Using Java Streams

Query Databases using Java Streams

In this article, you will learn how you can write pure Java applications, that are able to work with data from an existing database, without writing a single line of SQL (or similar languages like HQL) and without spending hours putting everything together. After your application is ready, you will learn how to accelerate latency performance with a factor of more than 1,000 using in-JVM-acceleration by adding just two lines of code.

Throughout this article, we will use Speedment which is a Java stream ORM that can generate code directly from a database schema and that can automatically render Java Streams directly to SQL allowing you to write code in pure Java.

You will also discover that data access performance can increase significantly by means of an in-JVM-memory technology where Streams are run directly from RAM.

Example Database

We will use an example database from MySQL named Sakila. It has tables called Film, Actor, Category and so on and can be downloaded for free here.

Step 1: Connect to Your Database

We will start to configure the pom.xml file by using the Speedment Initializer that you can find here. Press “download”, and you will get project folder with a Main.java file generated automatically.

Next, unpack the project folder zip file, open a command line, go to the unpacked folder (where the pom.xml file located)

Then, enter the following command:

mvn speedment:tool

This will launch the Speedment tool and prompt you for a license key. Select “Start Free” and you will get a license automatically and for free. Now you can connect to the database and get started:

Step 2: Generate Code

Once the schema data has been loaded from the database, the complete Java domain model can be generated by pressing the “Generate” button.

This will only take a second or two.

Step 3: Write the Application Code

Together with the domain model in step 2, a builder for the Speedment instance was automatically generated. Open the Main.java file and replace the code in the main() method with this snippet:

SakilaApplication app = new SakilaApplicationBuilder()
    .withPassword("sakila-password") // Replace with your own password
    .build();

Next, we will write an application that will print out all films. Admittedly, it’s a small application but we will improve it over the course of this article.

// Obtains a FilmManager that allows us to
// work with the "film" table
FilmManager films = app.getOrThrow(FilmManager.class);

// Create a stream of all films and print
// each and every film
films.stream()
    .forEach(System.out::println);

Isn’t that simple?

When run, the Java stream will be automatically rendered to SQL under the hood. In order to actually see the SQL code rendered, modify our application builder and enable logging using the STREAM log type:

SakilaApplication app = new SakilaApplicationBuilder()
    .withPassword("sakila-password")
    .withLogging(ApplicationBuilder.LogType.STREAM)
    .build();

This is how the SQL code looks like when you run the application:

SELECT 
    `film_id`,`title`,`description`,`release_year`, 
    `language_id`,`original_language_id`,`rental_duration`,`rental_rate`,
    `length`,`replacement_cost`,`rating`,`special_features`,`last_update`
 FROM
     `sakila`.`film`, 
values:[]

The SQL code rendered might differ depending on the database type you have selected (e.g. MySQL, MariaDB, PostgreSQL, Oracle, MS SQL Server, DB2, AS400 etc.). These variations are automatic.

The code above will produce the following output (shortened for brevity):

FilmImpl { filmId = 1, title = ACADEMY DINOSAUR, …, length = 86, ... }
FilmImpl { filmId = 2, title = ACE GOLDFINGER, ..., length = 48, ...}
FilmImpl { filmId = 3, title = ADAPTATION HOLES, ..., length = 50, ...}
...

Step 4: Using Filters

Speedment streams support all Stream operations including filters. Suppose we want to filter out only those films that are longer than 60 minutes. This can be accomplished by adding this line of code to our application:

films.stream()
    .filter(Film.LENGTH.greaterThan(60)) 
    .forEach(System.out::println);

Rendered SQL:

SELECT 
    `film_id`,`title`,`description`,`release_year`,
    `language_id`,`original_language_id`,`rental_duration`,`rental_rate`,
     `length`,`replacement_cost`,`rating`,`special_features`,
    `last_update` 
FROM 
    `sakila`.`film` 
WHERE 
    (`length` > ?),
 values:[60]

Generated output:

FilmImpl { filmId = 1, title = ACADEMY DINOSAUR, ..., length = 86, ... }
FilmImpl { filmId = 4, title = AFFAIR PREJUDICE, ..., length = 117, ...}
FilmImpl { filmId = 5, title = AFRICAN EGG, ... length = 130, ...}

Filters can be combined to create more complex expressions as depicted hereunder:

films.stream()
    .filter(
        Film.LENGTH.greaterThan(60).or(Film.LENGTH.lessThan(30))
    )
    .forEach(System.out::println);

This will return all films that are either shorter than 30 minutes or longer than one hour. Check your log files and you will see that also this Stream is rendered to SQL.

Step 5: Define the Order of the Elements

By default, the order in which elements appear in a stream is undefined. To define a specific order, you apply a sorted() operation to a stream like this:

films.stream()
    .filter(Film.LENGTH.greaterThan(60))
    .sorted(Film.TITLE)
    .forEach(System.out::println);

Rendered SQL:

SELECT 
    `film_id`,`title`,`description`,`release_year`,
    `language_id`,`original_language_id`,`rental_duration`,`rental_rate`,
    `length`,`replacement_cost`,`rating`,`special_features`,
    `last_update` 
FROM 
    `sakila`.`film` 
WHERE 
    (`length` > ?) 
ORDER BY 
    `length` ASC,
values:[60]

Generated output:

FilmImpl { filmId = 77, title = BIRDS PERDITION,..., length = 61,...}
FilmImpl { filmId = 106, title = BULWORTH COMMANDMENTS,..., length = 61,}
FilmImpl { filmId = 114, title = CAMELOT VACATION,..., length = 61,..}
...

You can also compose multiple sorters to define the primary order, the secondary order and so on.

films.stream()
    .filter(Film.LENGTH.greaterThan(60))
    .sorted(Film.LENGTH.thenComparing(Film.TITLE.reversed()))
    .forEach(System.out::println);

This will sort the film elements by LENGTH order (ascending) and then by TITLE order (descending). You can compose any number of fields.

NB: If you are composing two or more fields in ascending order, you should use the field’s method.comparator(). I.e. sorted(Film.LENGTH.thenComparing(Film.TITLE.comparator())) rather than just sorted(Film.LENGTH.thenComparing(Film.TITLE))

Step 6: Page and Avoid Large Object Chunks

Often one wants to page results to avoid working with unnecessary large object chunks. Assuming we want to see 50 elements per page, we could write the following generic method:

private static final int PAGE_SIZE = 50;

public static <T> Stream<T> page(
    Manager<T> manager,
    Predicate<? super T> predicate,
    Comparator<? super T> comparator,
    int pageNo
) {
    return manager.stream()
        .filter(predicate)
        .sorted(comparator)
        .skip(pageNo * PAGE_SIZE)
        .limit(PAGE_SIZE);
}

This utility method can page ANY table using ANY filter and sort it in ANY order.

For example, calling:

page(films, Film.LENGTH.greaterThan(60), Film.TITLE, 3)

will return a stream of films that are longer than 60 minutes and that are sorted by title showing the third page (i.e. skipping 150 films and showing the following 50 films).

Rendered SQL:

SELECT 
    `film_id`,`title`,`description`,`release_year`,
    `language_id`,`original_language_id`,`rental_duration`,`rental_rate`,
    `length`,`replacement_cost`,`rating`,`special_features`,
    `last_update` 
FROM 
    `sakila`.`film` 
WHERE
    (`length` > ?) 
ORDER BY
     `title` ASC 
LIMIT ? OFFSET ?,
values:[60, 50, 150]

Generated output:

FilmImpl { filmId = 165, title = COLDBLOODED DARLING, ... length = 70,...}
FilmImpl { filmId = 166, title = COLOR PHILADELPHIA, ..., length = 149... }
FilmImpl { filmId = 167, title = COMA HEAD, ... length = 109,...}
...

Again, if we had used another database type, the SQL code would differ slightly.

Step 7: In-JVM-memory Acceleration

Since you used the standard configuration in the Initializer, In-JVM-memory acceleration was enabled in your pom.xml file. To activate acceleration in your application, you just modify your initialization code like this:

SakilaApplication app = new SakilaApplicationBuilder()
    .withPassword("sakila-password")
    .withBundle(InMemoryBundle.class)
    .build();
        
    // Load data from the database into an in-memory snapshot
    app.getOrThrow(DataStoreComponent.class).load();

Now, instead of rendering SQL-queries, table streams will be served directly from RAM. Filtering, sorting and skipping will also be accelerated by in-memory indexes. Both in-memory tables and indexes are stored off-heap so they will not contribute to Garbage Collection complexity.

On my laptop (Mac Book Pro, 15-inch, Mid 2015, 16 GB, i7 2.2 GHz) the query latency was reduced by a factor over 1,000 for streams where I counted films that matched a filter and on sorted streams compared to running against a standard installation of a MySQL database (Version 5.7.16) running on my local machine.

Summary

In this article, you have learned how easy it is to query existing databases using pure Java streams. You have also seen how you can accelerate access to your data using in-JVM-memory stream technology. Both the Sakila database and Speedment is free to download and use, try it out for yourself.