This is my Java blog with various tips and tricks that are targeted for medium and advanced Java users.
I work as a Java Core Library Developer at Oracle. The views on this blog are my own and are not necessarily the ones of Oracle, Inc.
Happy Reading! /Per Minborg
Who’s Been Naughty, Who’s Been Nice? Santa Gives You Java 11 Advice!
Ever wondered how Santa can deliver holiday gifts to all kids around the world? There are 2 billion kids, each with an individual wishlist, and he does it in 24 hours. This means 43 microseconds per kid on average, and he needs to check whether every child has been naughty or nice.
You do not need to wonder anymore. I will reveal the secret. He is using Java 11 and a modern stream ORM with superfast execution.
Even though Santa’s backing database is old and slow, he can analyze data in microseconds by using standard Java streams and in-JVM-memory technology. Santa’s database contains two tables: Child, which holds every child in the world, and HolidayGift, which specifies all the items available for production in Santa’s workshop. A child can only have one wish, such are the harsh rules.
Viewing the Database as Streams
Speedment is a modern stream-based ORM which is able to view relational database tables as standard Java streams. As we all know, only nice children get gifts, so it is important to distinguish between those who have been naughty and those who have been nice. This is easily accomplished with the following code:
var niceChildren = children.stream()
    .filter(Child.NICE.isTrue())
    .sorted(Child.COUNTRY.comparator())
    .collect(Collectors.toList());
This stream will yield a long list containing only the kids that have been nice. To enable Santa to optimize his delivery route, the list is sorted by country of residence.
Joining Child and HolidayGift
This list seems incomplete though. How does Santa keep track of which gift goes to whom? Now the HolidayGift table will come in handy. Since some children provided Santa with their wish list, we can now join the two tables together to make a complete list containing all the nice children and their corresponding gift. It is important to include the children without any wish (they will get a random gift), therefore we make a left join.
var join = joinComponent
    .from(ChildManager.IDENTIFIER)
    .where(Child.NICE.isTrue())
    .leftJoinOn(HolidayGift.GIFT_ID).equal(Child.GIFT_ID)
    .build(Tuples::of);
Speedment is using a builder pattern to create a Join<T> object which can then be reused over and over again to create streams with elements of type T. In this case, it is used to join Child and HolidayGift. The join only includes children that are nice and matches rows which contain the same value in the gift_id fields.
As can be seen, Santa can easily deliver all the packages with parallel sleighs, carried by reindeer.
This will render the stream to an efficient SQL query but unfortunately, it is not quick enough to make it in time.
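To make the left-join semantics concrete, here is a minimal sketch in plain Java collections. It does not use the Speedment API; the classes Kid, Gift and Row are hypothetical stand-ins for the generated entities, and the join is done with an ordinary HashMap lookup.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class LeftJoinSketch {

    // Hypothetical stand-ins for the Child and HolidayGift entities
    static final class Kid {
        final String name;
        final boolean nice;
        final Integer giftId; // null when the child made no wish

        Kid(String name, boolean nice, Integer giftId) {
            this.name = name;
            this.nice = nice;
            this.giftId = giftId;
        }
    }

    static final class Gift {
        final int giftId;
        final String title;

        Gift(int giftId, String title) {
            this.giftId = giftId;
            this.title = title;
        }
    }

    static final class Row {
        final Kid kid;
        final Gift gift; // null when there is no matching gift

        Row(Kid kid, Gift gift) {
            this.kid = kid;
            this.gift = gift;
        }
    }

    static List<Row> leftJoinNiceKids(List<Kid> kids, List<Gift> gifts) {
        Map<Integer, Gift> giftById = new HashMap<>();
        for (Gift gift : gifts) {
            giftById.put(gift.giftId, gift);
        }
        List<Row> rows = new ArrayList<>();
        for (Kid kid : kids) {
            if (kid.nice) {
                // A left join keeps the kid even when no gift matches
                Gift gift = kid.giftId == null ? null : giftById.get(kid.giftId);
                rows.add(new Row(kid, gift));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Kid> kids = List.of(new Kid("Ada", true, 1), new Kid("Bob", true, null));
        List<Gift> gifts = List.of(new Gift(1, "Sled"));
        for (Row row : leftJoinNiceKids(kids, gifts)) {
            System.out.println(row.kid.name + " -> " + (row.gift == null ? "random gift" : row.gift.title));
        }
    }
}
```

Nice children without a wish still show up in the result, only with an empty gift slot, which is what distinguishes the left join from an inner join.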
Using In-JVM-Memory Acceleration
Now to the fun part. Santa is activating the in-JVM-memory acceleration component in Speedment, called DataStore. This is done in the following way:
var santasWorkshop = new ApplicationBuilder()
    .withPassword("north-pole")
    // Activate DataStore
    .withBundle(DataStoreBundle.class)
    .build();
// Load a snapshot of the database into off-heap memory
santasWorkshop.get(DataStoreComponent.class)
    .ifPresent(DataStoreComponent::load);
This startup configuration is the only needed adjustment to the application. All stream constructs above remain the same. When the application is started, a snapshot of the database is pulled into the JVM and is stored off-heap. Because the data is stored off-heap, it will not influence garbage collection and the amount of data is only limited by available RAM. Nothing prevents Santa from loading terabytes of data since he is using a cloud service and can easily expand his RAM. Now the application will run orders of magnitude faster and Santa will be able to deliver all packages in time.
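For readers who have not worked with off-heap memory before, the following sketch shows the general idea using only the JDK. It is not how DataStore is implemented (that is not shown in this article); it merely assumes a column of int values copied into a direct ByteBuffer, which lives outside the garbage-collected heap.

```java
import java.nio.ByteBuffer;

final class OffHeapSketch {

    // Copies the values into a direct (off-heap) buffer
    static ByteBuffer loadSnapshot(int[] values) {
        ByteBuffer buffer = ByteBuffer.allocateDirect(values.length * Integer.BYTES);
        for (int value : values) {
            buffer.putInt(value);
        }
        buffer.flip(); // switch from writing to reading
        return buffer;
    }

    // Reads the ints back with absolute gets; no Integer objects are created
    static long sum(ByteBuffer snapshot) {
        long total = 0;
        for (int pos = 0; pos < snapshot.limit(); pos += Integer.BYTES) {
            total += snapshot.getInt(pos);
        }
        return total;
    }

    public static void main(String[] args) {
        ByteBuffer snapshot = loadSnapshot(new int[] {1, 2, 3});
        System.out.println("direct=" + snapshot.isDirect() + ", sum=" + sum(snapshot));
    }
}
```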
Run Your Own Projects with In-JVM-Memory Acceleration
If you want to try for yourself how fast a database application can be, there is an Initializer that can be found here. Just tick your desired database type (Oracle, MySQL, MariaDB, PostgreSQL, Microsoft SQL Server, DB2 or AS400) and you will get a POM and an application template automatically generated for you.
If you need more help setting up your project, check out the Speedment GitHub page or explore the user guide.
Authors
Thank you, Julia Gustafsson and Carina Dreifeldt for co-writing this article.
Explore how to create off-heap aggregations with a minimum of garbage collection impact and maximum memory utilization.
Creating large aggregations using Java Map, List and Object normally creates a lot of heap memory overhead. This also means that the garbage collector will have to clean up these objects once the aggregation goes out of scope.
Read this short article and discover how we can use Speedment Stream ORM to create off-heap aggregations that can utilize memory more efficiently and with little or no GC impact.
Person
Let’s say we have a large number of Person objects that take the following shape:
public class Person {
    private final int age;
    private final short height;
    private final short weight;
    private final String gender;
    private final double salary;
    …
    // Constructor and getters hidden for brevity
}
For the sake of argument, we also have access to a method called persons() that will create a new Stream with all these Person objects.
Salary per Age
We want to create the average salary for each age bucket. To represent the results of aggregations we will be using a data class called AgeSalary which associates a certain age with an average salary.
public class AgeSalary {
    private int age;
    private double avgSalary;
    …
    // Getters and setters hidden for brevity
}
Age grouping for salaries normally entails less than 100 buckets being used and so this example is just to show the principle. The more buckets, the more sense it makes to aggregate off-heap.
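For comparison, this is roughly what the conventional on-heap aggregation looks like with the standard Collectors. The Employee class is a hypothetical, trimmed-down stand-in for Person, and the result is an ordinary Map whose boxed keys and values all live on the heap.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class OnHeapAverageSketch {

    // Hypothetical, trimmed-down stand-in for the Person class
    static final class Employee {
        private final int age;
        private final double salary;

        Employee(int age, double salary) {
            this.age = age;
            this.salary = salary;
        }

        int getAge() {
            return age;
        }

        double getSalary() {
            return salary;
        }
    }

    // One Integer key and one Double value per age bucket, all on the heap
    static Map<Integer, Double> averageSalaryByAge(List<Employee> people) {
        return people.stream()
            .collect(Collectors.groupingBy(
                Employee::getAge,
                TreeMap::new,
                Collectors.averagingDouble(Employee::getSalary)));
    }

    public static void main(String[] args) {
        List<Employee> people = List.of(
            new Employee(30, 100.0), new Employee(30, 200.0), new Employee(40, 50.0));
        System.out.println(averageSalaryByAge(people));
    }
}
```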
Solution
Using Speedment Stream ORM, we can derive an off-heap aggregation solution with these three steps:
Create an Aggregator
var aggregator = Aggregator.builderOfType(Person.class, AgeSalary::new)
    .on(Person::age).key(AgeSalary::setAge)
    .on(Person::salary).average(AgeSalary::setAvgSalary)
    .build();
The aggregator can be reused over and over again.
Compute an Aggregation
var aggregation = persons().collect(aggregator.createCollector());
Using the aggregator, we create a standard Java stream Collector that has its internal state completely off-heap.
Since the Aggregation holds data that is stored off-heap, it may benefit from explicit closing rather than just being cleaned up potentially much later. Closing the Aggregation can be done by calling the close() method, possibly by taking advantage of the AutoCloseable trait, or as in the example above by using streamAndClose() which returns a stream that will close the Aggregation after stream termination.
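The closing pattern can be sketched with plain JDK types. NativeResult below is a hypothetical resource that pretends to own off-heap memory, and streamAndRelease() is an illustrative name, not a Speedment method. Note one assumption that differs from the description above: a standard Java stream only runs its onClose handlers when close() is called, so the sketch wraps the stream in try-with-resources.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.stream.Stream;

final class CloseSketch {

    // Hypothetical resource standing in for something that owns off-heap memory
    static final class NativeResult implements AutoCloseable {
        private final AtomicBoolean closed = new AtomicBoolean();
        private final int[] values;

        NativeResult(int... values) {
            this.values = values.clone();
        }

        boolean isClosed() {
            return closed.get();
        }

        // Returns a stream that releases this resource when the stream is closed
        Stream<Integer> streamAndRelease() {
            return Arrays.stream(values).boxed().onClose(this::close);
        }

        @Override
        public void close() {
            closed.set(true); // a real implementation would free memory here
        }
    }

    static int sumAndRelease(NativeResult result) {
        // try-with-resources closes the stream, which in turn closes the resource
        try (Stream<Integer> stream = result.streamAndRelease()) {
            return stream.mapToInt(Integer::intValue).sum();
        }
    }

    public static void main(String[] args) {
        NativeResult result = new NativeResult(1, 2, 3);
        System.out.println(sumAndRelease(result) + ", closed=" + result.isClosed());
    }
}
```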
Everything in a One-Liner
The code above can be condensed to what is effectively a one-liner:
Ever wondered how you could turn joined database tables into a Java Stream? Read this short article and find out how it is done using the Speedment Stream ORM. We will start with a Java 8 example and then look into the improvements with Java 11.
Java 8 and JOINs
Speedment allows dynamically JOINed database tables to be consumed as standard Java Streams. We begin by looking at a solution for Java 8 using the Sakila example database:
In the new Java version 11 there is Local-Variable-Type-Inference (aka var declaration) which makes it even easier to write joins with Speedment. We do not have to explicitly state the type of the join variable:
The from() method takes the first table we want to use (Language). The innerJoinOn() method takes a specific column of the second table we want to join. Then, the equal() method takes a column from the first table that we want to use as our join condition. So, in this example, we will get matched Language and Film entities where the column Film.LANGUAGE_ID equals Language.LANGUAGE_ID.
Finally, build() will construct our Join object that can, in turn, be used to create Java Streams. The Join object can be re-used over and over again.
JOIN Types and Conditions
We can use innerJoinOn(), leftJoinOn(), rightJoinOn() and crossJoin(), and tables can be joined using the conditions equal(), notEqual(), lessThan(), lessOrEqual(), greaterThan() and greaterOrEqual().
What's Next?
Download open-source Java 11 here. (See also the article here)
Download Speedment here.
Read all about the JOIN functionality in the Speedment User's Guide.
Who can write the shortest Java code with the lowest latency, and what tools are used?
At Oracle Code One two weeks ago, I promoted a code challenge during my speech. The contestants were given a specific problem and the winner would be the one with the lowest possible latency multiplied by the number of code lines used (i.e. having low latency and at the same time using as few lines as possible is good). I also shared the contest on social media to get as many developers as possible involved.
The input data was to be taken from the table “film” in the open-source Sakila database.
More specifically, the objective was to develop a Java application that computes the sum, min, max, and average rental duration for five films out of the existing 1,000 films using a general solution. The five films should be the films around the median film length, starting from the 498th and ending with the 502nd film (inclusive) in ascending film length.
Contestants were free to use any Java library and any solution available such as Hibernate, JDBC or other ORM tools.
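To get a feel for the problem before looking at the contributions, here is a sketch that solves it over an in-memory List with standard Java streams only. The Movie class and the synthetic data in sampleMovies() are assumptions made up for illustration; they are not the Sakila data, and no database is involved.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.IntSummaryStatistics;
import java.util.List;

final class MedianFilmsSketch {

    // Hypothetical stand-in for a row of the "film" table
    static final class Movie {
        private final int length;
        private final int rentalDuration;

        Movie(int length, int rentalDuration) {
            this.length = length;
            this.rentalDuration = rentalDuration;
        }

        int getLength() {
            return length;
        }

        int getRentalDuration() {
            return rentalDuration;
        }
    }

    // Sort by length, skip to the films around the median, summarize five of them
    static IntSummaryStatistics rentalStatsAroundMedian(List<Movie> movies) {
        return movies.stream()
            .sorted(Comparator.comparingInt(Movie::getLength))
            .skip(498)
            .limit(5)
            .mapToInt(Movie::getRentalDuration)
            .summaryStatistics();
    }

    // Synthetic data: 1,000 movies with distinct lengths and durations of 3 to 7 days
    static List<Movie> sampleMovies() {
        List<Movie> movies = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            movies.add(new Movie(1000 - i, i % 5 + 3));
        }
        return movies;
    }

    public static void main(String[] args) {
        System.out.println(rentalStatsAroundMedian(sampleMovies()));
    }
}
```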
A Solution Based on SQL/JDBC
One way of solving the problem would be using SQL/JDBC directly as shown hereunder. Here is an example of SQL code that solves the computational part of the problem:
SELECT
    sum(rental_duration),
    min(rental_duration),
    max(rental_duration),
    avg(rental_duration)
FROM
    (select rental_duration from sakila.film
    order by length
    LIMIT 5 OFFSET 498) as A
If I run it on my laptop, I get a latency of circa 790 microseconds on the server side (standard MySQL 5.7.16). To use this solution we also need to add Java code to issue the SQL statement and to read the values back from JDBC into our Java variables. This means that the code will be even larger and will take a longer time to execute as shown hereunder:
try (Connection con = DriverManager
    .getConnection("jdbc:mysql://somehost/sakila?"
        + "user=sakila-user&password=sakila-password")) {
    try (Statement statement = con.createStatement()) {
        ResultSet resultSet = statement
            .executeQuery(
                "SELECT " +
                " sum(rental_duration)," +
                " min(rental_duration)," +
                " max(rental_duration)," +
                " avg(rental_duration) " +
                "FROM " +
                " (select rental_duration from sakila.film " +
                " order by length " +
                "limit 5 offset 498) as A");
        if (resultSet.next()) {
            int sum = resultSet.getInt(1);
            int min = resultSet.getInt(2);
            int max = resultSet.getInt(3);
            double avg = resultSet.getDouble(4);
            // Handle the result
        } else {
            // Handle error
        }
    }
}
To give this alternative a fair chance, I reused the connection between calls in the benchmark rather than re-creating it each time (recreation is shown above but was not used in benchmarks).
Result: ~1,000 us and ~25 code lines
The Winning Contribution
However, the SQL example above stood no chance against the winning contribution. The winner was Sergejus Sosunovas (@SergejusS) from Switzerland, who currently works on developing an optimization and management system. He used Speedment in-JVM-memory acceleration and states:
"It took less than an hour to start and build a solution." Here is the winning code:
IntSummaryStatistics result = app.getOrThrow(FilmManager.class).stream()
    .sorted(Film.LENGTH)
    .skip(498)
    .limit(5)
    .mapToInt(GeneratedFilm::getRentalDuration)
    .summaryStatistics();
This was much faster than SQL/JDBC and completed in as little as 6 microseconds.
Result: 6 us and 6 code lines
A Solution with Only Five Lines
One of the contestants, Corrado Lombard from Italy, is worth an honorary mention since he was able to solve the problem in only five lines. Unfortunately, there was a slight error in his original contribution, but when fixed, the solution looked like this:
IntSummaryStatistics result = app.getOrThrow(FilmManager.class).stream()
    .sorted(Film.LENGTH)
    .skip(498)
    .limit(5)
    .collect(summarizingInt(Film::getRentalDuration));
This fixed solution had about the same performance as the winning solution.
Optimized Speedment Solution
As a matter of fact, there is a way of improving latency even further than the winning contribution. Applying a ToIntFunction that is able to do in-place deserialization (in situ) of the int value directly from RAM improves performance even more. In-place deserialization means that we do not have to deserialize the entire entity, but just extract the parts of it that are needed. This saves time, especially for entities with many columns. Here is what an optimized solution could look like:
IntSummaryStatistics result = app.getOrThrow(FilmManager.class).stream()
    .sorted(Film.LENGTH)
    .skip(498)
    .limit(5)
    .mapToInt(Film.RENTAL_DURATION.asInt()) // Use in-place-deserialization
    .summaryStatistics();
This was even faster and completed in just 3 microseconds.
Result: 3 us and 6 code lines
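The principle of in-place deserialization can be illustrated with a direct ByteBuffer. The row layout below (three int columns per row) is purely an assumption for this sketch and says nothing about how Speedment actually lays out its data; the point is only that a single column is read at a known offset without materializing an entity object.

```java
import java.nio.ByteBuffer;
import java.util.IntSummaryStatistics;
import java.util.stream.IntStream;

final class InPlaceReadSketch {

    // Assumed fixed-width row layout: [int id][int length][int rentalDuration]
    static final int ROW_BYTES = 3 * Integer.BYTES;
    static final int RENTAL_DURATION_OFFSET = 2 * Integer.BYTES;

    // Packs the rows into an off-heap buffer
    static ByteBuffer store(int[][] rows) {
        ByteBuffer buffer = ByteBuffer.allocateDirect(rows.length * ROW_BYTES);
        for (int[] row : rows) {
            buffer.putInt(row[0]).putInt(row[1]).putInt(row[2]);
        }
        buffer.flip();
        return buffer;
    }

    // Reads one column in situ; the other columns of the row are never touched
    static int rentalDurationAt(ByteBuffer rows, int rowIndex) {
        return rows.getInt(rowIndex * ROW_BYTES + RENTAL_DURATION_OFFSET);
    }

    static IntSummaryStatistics rentalStats(ByteBuffer rows, int fromRow, int count) {
        return IntStream.range(fromRow, fromRow + count)
            .map(i -> rentalDurationAt(rows, i))
            .summaryStatistics();
    }

    public static void main(String[] args) {
        ByteBuffer rows = store(new int[][] {{1, 90, 3}, {2, 100, 5}, {3, 110, 7}});
        System.out.println(rentalStats(rows, 0, 3));
    }
}
```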
GraalVM and Optimized Speedment Solution
GraalVM contains a JIT compiler (Graal) that can take the place of HotSpot's C2 compiler and that is known to improve stream performance for many workloads. In particular, it seems like the benefits from inlining and escape analysis are much better under Graal than under the normal OpenJDK.
I was curious to see how the optimized solution above could benefit from GraalVM. With no code change, I ran it under GraalVM (1.0.0-rc9) and now latency was down to just 1 microsecond! This means that we could perform 1,000,000 such queries per second per thread on a laptop and presumably much more on a server grade computer.
Result: 1 us and 6 code lines
Overview
When we compare SQL/JDBC latency against the best Speedment solution, the speedup factor with Speedment was about 1,000 times. This is the same difference as comparing walking to work with taking the world's fastest manned jet plane (SR-71 "Blackbird"). A huge improvement if you want to streamline your code.
To be able to plot the solutions in a diagram, I have removed the comparatively slow SQL/JDBC solution, and only showed how the different Speedment solutions and runtimes measure up:
Try it Out
The competition is over. However, feel free to challenge the solutions above or try them out for yourself. Full details of the competition and rules can be found here. If you can beat the fastest solution, let me know in the comments below.
The benchmark results presented above were obtained when running on my MacBook Pro Mid 2015, 2.2 GHz Intel Core i7, 16 GB 1600 MHz DDR3, Java 8, JMH (1.21) and Speedment (3.1.8)
Conclusion
It is possible to reduce latencies by orders of magnitude and reduce code size at the same time using in-JVM-memory technology and Java streams.
GraalVM can improve your stream performance significantly under many conditions.
Java streams with just one element sometimes create unnecessary overhead in your applications. Learn how to use SingletonStream objects and gain over tenfold performance for some of these kinds of streams and learn how, at the same time, you can simplify your code.
Background
The Stream library in Java 8 is one of the most powerful additions to the Java language ever. Once you start to understand its versatility and resulting code readability, your Java code-style will change forever. Instead of bloating your code with all the nitty-gritty details of for, if and switch statements and numerous intermediate variables, you can use a Stream that just contains a description of what to do, and not really how it is done.
Some years ago, we had to make an API decision for a Java project: Which return type should we select for the two fast local in-memory data cache methods with:
a unique search key which returns either a value or no value
a non-unique search key which returns any number of values (zero to infinity).
This was the initial idea:
Optional<T> searchUnique(K key); // For unique keys
Stream<T> search(K key); // For non-unique keys
But, we would rather have the two methods look exactly the same and both return a Stream<T>. The API would then look much cleaner because a unique cache would then look exactly the same as a non-unique cache.
However, the unique search had to be very efficient and able to create millions of result objects each second without creating too much overhead.
The Solution
By implementing a SingletonStream that only takes a single element (and therefore can be highly optimized compared to a normal Stream with any number of elements), we were able to let both methods return a Stream while retaining performance. The method searchUnique(K key) would return an empty stream (Stream.empty()) if the key was not found, and it would return a SingletonStream with the value associated with the key if the key existed. We would get:
Stream<T> searchUnique(K key); // For unique keys
Stream<T> search(K key); // For non-unique keys
Great! We can eat the cookie and still have it!
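A minimal sketch of such a unified cache API, using only standard streams, might look as follows. CacheSketch and its put methods are hypothetical names for illustration, and Stream.of(value) is used where the real solution would return a SingletonStream.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Stream;

final class CacheSketch<K, V> {

    private final Map<K, V> unique = new HashMap<>();
    private final Map<K, List<V>> nonUnique = new HashMap<>();

    void putUnique(K key, V value) {
        unique.put(key, value);
    }

    void add(K key, V value) {
        nonUnique.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    // For unique keys: a stream with zero or one element
    Stream<V> searchUnique(K key) {
        V value = unique.get(key);
        return value == null ? Stream.empty() : Stream.of(value);
    }

    // For non-unique keys: a stream with any number of elements
    Stream<V> search(K key) {
        return nonUnique.getOrDefault(key, Collections.emptyList()).stream();
    }

    public static void main(String[] args) {
        CacheSketch<String, Integer> cache = new CacheSketch<>();
        cache.putUnique("a", 1);
        cache.add("b", 2);
        cache.add("b", 3);
        System.out.println(cache.searchUnique("a").count() + " " + cache.search("b").count());
    }
}
```

Callers can now treat both lookups identically, for example by chaining filter() and map() on either result.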
The Implementation
The SingletonStream is a part of the Speedment Stream ORM and can be viewed here on GitHub. Feel free to use Speedment and any of its components in your own projects using the Speedment initializer.
The SingletonStream is a good candidate for stack allocation using the JVM's Escape Analysis (read more on Escape Analysis in my previous posts here and here). The implementation comes in two shapes. If we set the STRICT value to true, we will get a completely lazy Stream, but the drawback is that we will lose the Singleton Property once we call some Intermediate Operations like filter(), map() etc. If we, on the other hand, set the STRICT value to false, the SingletonStream will perform many of the Intermediate Operations eagerly and it will be able to return a new SingletonStream, thereby retaining the Singleton Property. This will give better performance in many cases.
The solution devised here for reference streams could also easily be modified to the primitive incarnations of singleton streams. So, it would be almost trivial to write a SingletonIntStream, a SingletonLongStream and a SingletonDoubleStream. Here is a SingletonLongStream.
It should be noted that the class could be further developed so that it supports lazy evaluation while still being highly performant. This is future work.
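The eager (non-strict) behaviour can be sketched with a tiny holder class. EagerSingleSketch is a hypothetical illustration and not the actual SingletonStream, which implements the full Stream interface. Here map() is applied immediately and keeps the single-element property, while filter() tests the element right away and falls back to an ordinary Stream.

```java
import java.util.Objects;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Stream;

final class EagerSingleSketch<T> {

    private final T element;

    private EagerSingleSketch(T element) {
        this.element = Objects.requireNonNull(element);
    }

    static <T> EagerSingleSketch<T> of(T element) {
        return new EagerSingleSketch<>(element);
    }

    // Eager map: the mapper runs now and the result is still a single element
    <R> EagerSingleSketch<R> map(Function<? super T, ? extends R> mapper) {
        return new EagerSingleSketch<>(mapper.apply(element));
    }

    // Eager filter: the predicate runs now, yielding one element or none
    Stream<T> filter(Predicate<? super T> predicate) {
        return predicate.test(element) ? Stream.of(element) : Stream.empty();
    }

    // No pipeline needs to be built to know the answer
    long count() {
        return 1;
    }

    public static void main(String[] args) {
        System.out.println(EagerSingleSketch.of("A").map(String::length).count());
        System.out.println(EagerSingleSketch.of("A").filter(String::isEmpty).count());
    }
}
```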
Performance
There are many ways one could test the performance of the SingletonStream and compare it with a standard Stream implementation with one element.
Here is one way of doing it using JMH. The first pair of tests (count) just counts the number of elements in the stream, and the second pair (forEach) does something with one element of a stream.
@Benchmark
public long singletonStreamCount() {
    return SingletonStream.of("A").count();
}

@Benchmark
public long streamCount() {
    return Stream.of("A").count();
}

@Benchmark
public void singletonStreamForEach() {
    SingletonStream.of("A")
        .limit(1)
        .forEach(blackHole());
}

@Benchmark
public void streamForEach() {
    Stream.of("A")
        .limit(1)
        .forEach(blackHole());
}

private static <T> Consumer<T> blackHole() {
    return t -> {};
}
This will produce the following result when run on my MacBook Pro laptop:
That's a speedup factor over 10 for the "count" operation. For the "forEach" operation, it looks like the JVM was able to completely optimize away the complete code path for the SingletonStream.
Test It
Download Speedment using the Speedment initializer.
Writing JUnit tests can be a tedious and boring process. Learn how you can improve your tests classes using permutations in combination with TestFactory methods and DynamicTest objects with a minimum of coding effort.
In this article, I will use the Java stream ORM Speedment because it includes a ready-made Permutation class and thereby helps me save development time. Speedment otherwise allows database tables to be connected to standard Java streams. Speedment is an open-source tool and is also available in a free version for commercial databases.
As can be seen, this test creates a Stream with the elements “CCC”, “A”, “BB” and “BB” and then applies a filter that will remove the “A” element (because its length is not greater than 1). After that, the elements are sorted, so that we have the elements “BB”, “BB” and “CCC” in the stream. Then, a distinct operation is applied, removing all duplicates in the stream, leaving the elements “BB” and “CCC” before the final terminating operator is invoked whereby these remaining elements are collected to a List.
After some consideration, it can be understood that the order in which the intermediate operations filter(), sorted() and distinct() are applied is irrelevant. Thus, regardless of the order of operator application, we expect the same result.
But how can we write a JUnit5 test that proves that the order is irrelevant for all permutations without writing individual test cases for all six permutations manually?
Using a TestFactory
Instead of writing individual tests, we can use a TestFactory to produce any number of DynamicTest objects. Here is a short example demonstrating the concept:
This will produce two, arguably meaningless, tests named “A” and “B”. Note how we conveniently can return a Stream of DynamicTest objects without first having to collect them into a Collection such as a List.
Using Permutations
The Permutation class can be used to create all possible combinations of items of any type T. Here is a simple example with the type String:
Permutation.of("A", "B", "C")
    .map(
        is -> is.collect(toList())
    )
    .forEach(System.out::println);
Because Permutation creates a Stream of a Stream of type T, we have added an intermediary map operation where we collect the inner Stream to a List. The code above will produce the following output:
[A, B, C]
[A, C, B]
[B, A, C]
[B, C, A]
[C, A, B]
[C, B, A]
It is easy to prove that this is all the ways one can combine “A”, “B” and “C” whereby each element shall occur exactly one time.
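If you are curious how such permutations can be generated, here is a small recursive sketch in plain Java. It is a hypothetical helper and not Speedment's Permutation class: it collects everything eagerly into lists instead of producing a lazy Stream of Streams.

```java
import java.util.ArrayList;
import java.util.List;

final class PermutationSketch {

    // Returns every ordering of the given items, each item used exactly once
    static <T> List<List<T>> permutations(List<T> items) {
        List<List<T>> result = new ArrayList<>();
        List<T> prefix = new ArrayList<>();
        List<T> remaining = new ArrayList<>(items);
        permute(prefix, remaining, result);
        return result;
    }

    private static <T> void permute(List<T> prefix, List<T> remaining, List<List<T>> out) {
        if (remaining.isEmpty()) {
            out.add(new ArrayList<>(prefix)); // a complete permutation
            return;
        }
        for (int i = 0; i < remaining.size(); i++) {
            List<T> rest = new ArrayList<>(remaining);
            T head = rest.remove(i);          // pick the i-th item as the next element
            prefix.add(head);
            permute(prefix, rest, out);
            prefix.remove(prefix.size() - 1); // backtrack
        }
    }

    public static void main(String[] args) {
        permutations(List.of("A", "B", "C")).forEach(System.out::println);
    }
}
```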
Creating the Operators
In this article, I have opted to create Java objects for the intermediate operations instead of using lambdas because I want to override the toString() method and use that for method identification. Under other circumstances, it would have sufficed to use lambdas or method references directly:
UnaryOperator<Stream<String>> FILTER_OP = new UnaryOperator<Stream<String>>() {
    @Override
    public Stream<String> apply(Stream<String> s) {
        return s.filter(string -> string.length() > 1);
    }
    @Override
    public String toString() {
        return "filter";
    }
};

UnaryOperator<Stream<String>> DISTINCT_OP = new UnaryOperator<Stream<String>>() {
    @Override
    public Stream<String> apply(Stream<String> s) {
        return s.distinct();
    }
    @Override
    public String toString() {
        return "distinct";
    }
};

UnaryOperator<Stream<String>> SORTED_OP = new UnaryOperator<Stream<String>>() {
    @Override
    public Stream<String> apply(Stream<String> s) {
        return s.sorted();
    }
    @Override
    public String toString() {
        return "sorted";
    }
};
Testing the Permutations
We can now easily test the workings of Permutations on our Operators:
As can be seen, these are all permutations of the intermediate operations we want to test.
Stitching it up
By combining the learnings above, we can create our TestFactory that will test all permutations of the intermediate operations applied to the initial stream:
Note how we are using the Stream::reduce method to progressively apply the intermediate operations on the initial Stream.of("CCC", "A", "BB", "BB"). The combiner lambda (a, b) -> a is just a dummy, only to be used for combining parallel streams (which are not used here).
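The reduce step can be tried out without JUnit. The sketch below is a standalone illustration, not the TestFactory itself: it uses plain lambdas as operators, lists the six orderings by hand (a simplification for three operators), and applies each ordering to a fresh initial stream, since a Stream can only be consumed once.

```java
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class ReduceOperatorsSketch {

    static final UnaryOperator<Stream<String>> FILTER = s -> s.filter(str -> str.length() > 1);
    static final UnaryOperator<Stream<String>> DISTINCT = s -> s.distinct();
    static final UnaryOperator<Stream<String>> SORTED = s -> s.sorted();

    // Applies the operators one after another, starting from a fresh initial stream
    static List<String> applyInOrder(List<UnaryOperator<Stream<String>>> operators) {
        Stream<String> initial = Stream.of("CCC", "A", "BB", "BB");
        return operators.stream()
            .reduce(initial, (s, op) -> op.apply(s), (a, b) -> a) // combiner unused when sequential
            .collect(Collectors.toList());
    }

    // All six orderings of the three operators, written out by hand
    static List<List<UnaryOperator<Stream<String>>>> allOrders() {
        return List.of(
            List.of(FILTER, DISTINCT, SORTED),
            List.of(FILTER, SORTED, DISTINCT),
            List.of(DISTINCT, FILTER, SORTED),
            List.of(DISTINCT, SORTED, FILTER),
            List.of(SORTED, FILTER, DISTINCT),
            List.of(SORTED, DISTINCT, FILTER));
    }

    public static void main(String[] args) {
        for (List<UnaryOperator<Stream<String>>> order : allOrders()) {
            System.out.println(applyInOrder(order));
        }
    }
}
```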
Blow up Warning
A final warning on the inherent mathematical complexity of permutations is in order. The number of permutations of n elements is n!, so the complexity is O(n!). For example, adding just one element to an existing set of eight elements increases the number of permutations from 40,320 to 362,880.
This is a double-edged sword. We get a lot of tests almost for free but we have to pay the price of executing each of the tests on each build.
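The growth is easy to check with a few lines of Java. This is just a helper sketch for the arithmetic; the upper bound of 20 is there because 21! no longer fits in a long.

```java
final class FactorialSketch {

    // Number of permutations of n distinct elements
    static long factorial(int n) {
        if (n < 0 || n > 20) {
            throw new IllegalArgumentException("n must be in [0, 20] to fit in a long");
        }
        long result = 1;
        for (int i = 2; i <= n; i++) {
            result *= i;
        }
        return result;
    }

    public static void main(String[] args) {
        for (int n = 3; n <= 10; n++) {
            System.out.println(n + " elements -> " + factorial(n) + " permutations");
        }
    }
}
```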
GraalVM is the new kid on the JVM block. It is an open-source Virtual Machine that is able to run many programming languages, such as Java, Rust and JavaScript, at the same time. GraalVM also has a new internal code optimizer pipeline that can improve performance significantly compared to other JVMs under some conditions. Learn how to reap the benefits of GraalVM and execute your code faster with no code modification.
What is GraalVM?
Previous JVMs, such as Oracle JVM and OpenJDK JVM (both called “HotSpot”), have been around for a long time. They have evolved considerably over time and, over the course of the decades, we have seen performance rocket compared to the Java 1.0 JVM. Significant JVM improvements include just-in-time (JIT) compilation, the C2 compiler, escape analysis etc., which have all contributed to this positive development. But as with all technology, they will start to plateau at some point.
GraalVM is a fresh start whereby a new internal architecture has been developed from the ground up. In particular, the JIT compiler, called Graal, has been reworked. Unsurprisingly, the JIT compiler itself is written in Java, just like all the other GraalVM components. As it turns out, Graal is sometimes able to optimize your code better than some existing JVMs. In particular, some Stream types appear to benefit from running under Graal.
Database Stream Performance
There are a number of ways to write Java streams. The most obvious way is to use one of the built-in Java methods Stream::of or Collection::stream. These methods, however, require that the elements in the Stream are present a priori in the shape of Java objects. This means that the compiler cannot optimize them away under most conditions.
I have therefore instead chosen to use the stream-based ORM tool Speedment. This tool works with a technology that pulls database content into an in-JVM-memory snapshot and creates Java streams directly from RAM. Thus, database tables are stored off-heap, thereby potentially avoiding the creation of Java Objects. Because Graal has an improved performance optimization pipeline, it is likely that it can better optimize away temporary intermediate stream objects. In theory, Speedment and Graal would, therefore, be a perfect fit. I was therefore very eager to test how the already extreme performance of Speedment would be affected when running under GraalVM rather than running under HotSpot.
The following Speedment database streams were used to test performance. Read more on these streams and how they work in one of my previous articles that you can find here.
private static final Predicate<Film> RATING_EQUALS_PG_13 =
    Film.RATING.equal(GeneratedFilm.Rating.PG13);

private static final Comparator<Film> LENGTH_DESCENDING =
    Film.LENGTH.reversed();

@Benchmark
public long filterAndCount() {
    return films.stream()
        .filter(RATING_EQUALS_PG_13)
        .count();
}

@Benchmark
public IntSummaryStatistics Complex() {
    return films.stream()
        .sorted(LENGTH_DESCENDING)
        .skip(745)
        .limit(5)
        .mapToInt(Film.RENTAL_DURATION.asInt())
        .summaryStatistics();
}
The following JMH output was obtained for runs under GraalVM and HotSpot respectively:
Being able to produce and consume over 30 million database streams per second with GraalVM/Speedment on a laptop with 4 CPU cores is quite astonishing. Imagine the performance on a server grade node with 24 or 32 CPU cores.
Here is how it looks in a chart (higher is better):
Ordinary Stream Performance
Initial tests show varying relative performance figures for built-in Java streams like Stream.of("A", "B", "C") or List::stream with various operations applied, for the different JVMs. I expect these stream types to gain performance across the board as well once GraalVM has matured. Perhaps I will cover this in a future article.
Setup
The following JMH setup was used for GraalVM and HotSpot:
# Detecting actual CPU count: 8 detected
# JMH version: 1.21
# VM version: JDK 1.8.0_172, GraalVM 1.0.0-rc6, 25.71-b01-internal-jvmci-0.48
# *** WARNING: JMH support for this VM is experimental. Be extra careful with the produced data.
# VM invoker: /Applications/graalvm-ce-1.0.0-rc6/Contents/Home/jre/bin/java
# VM options: -Dfile.encoding=UTF-8
# Warmup: 5 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 8 threads, will synchronize iterations
# Benchmark mode: Throughput, ops/time
# Detecting actual CPU count: 8 detected
# JMH version: 1.21
# VM version: JDK 1.8.0_171, Java HotSpot(TM) 64-Bit Server VM, 25.171-b11
# VM invoker: /Library/Java/JavaVirtualMachines/jdk1.8.0_171.jdk/Contents/Home/jre/bin/java
# VM options: -Dfile.encoding=UTF-8
# Warmup: 5 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 8 threads, will synchronize iterations
# Benchmark mode: Throughput, ops/time
The tests above were performed on a MacBook Pro (Retina, 15-inch, Mid 2015), 2.2 GHz Intel Core i7, 16 GB 1600 MHz DDR3 with 4 CPU cores and 8 threads. As can be seen in the logs, we should be careful to draw conclusions using JMH figures for Graal as the JMH support is experimental at this time.
Give it a Spin
Use the Speedment initializer to create a Speedment project template here.
Streams are very powerful and can capture the gist of your intended functionality in just a few lines. But, just as smooth as they are when it all works, just as agonizing it can be when they don’t behave as expected. Learn how to use IntelliJ to debug your Java Streams and gain insight into the intermediate operations of a Stream.
The code above first creates an initial Stream consisting of the String elements "C", "A", "B". Then, an intermediary operation sorted() is applied to the first Stream, thereby (at least in Java 8-10) creating a new Stream where the elements in the initial stream are sorted according to their natural order. I.e. the second stream will contain the elements "A", "B", "C". Lastly, these elements are collected into a List.
This is basically how the Stream debugger operates. It breaks up a stream pipeline into smaller segments and progressively invokes the different intermediate operators while retaining the elements for each step analyzed:
Stream.of("C", "B", "A")
    .peek(saveStep(0))
    .sorted()
    .peek(saveStep(1))
    .collect(toList()); // The final result is saved to step 2
NB: This is not exactly how it works technically, but it provides a good overall outline.
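The outline above can be turned into a runnable sketch. The saveStep() helper below is my own hypothetical implementation that records the elements seen at each step in a Map; as noted, the real Stream debugger works differently under the hood.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class StreamStepsSketch {

    // Elements observed at each step of the pipeline, keyed by step number
    private final Map<Integer, List<String>> steps = new TreeMap<>();

    // Returns a consumer that records every element passing the given step
    Consumer<String> saveStep(int step) {
        return element -> steps.computeIfAbsent(step, k -> new ArrayList<>()).add(element);
    }

    Map<Integer, List<String>> run() {
        List<String> result = Stream.of("C", "B", "A")
            .peek(saveStep(0))
            .sorted()
            .peek(saveStep(1))
            .collect(Collectors.toList());
        steps.put(2, result); // the final result is saved to step 2
        return steps;
    }

    public static void main(String[] args) {
        new StreamStepsSketch().run()
            .forEach((step, elements) -> System.out.println("step " + step + ": " + elements));
    }
}
```

Step 0 holds the elements in their original order, while steps 1 and 2 hold them after sorting.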
Visually, it looks like this in IntelliJ’s debugger:
This gives a clear and concise view of what is going on internally in the Stream pipeline between each intermediate operation and also shows the final result of the Stream.
Invocation
The stream debugger is invoked by first setting a breakpoint where a Stream is defined:
Then, start a debug session:
When the breakpoint is hit, the Stream debugger can be invoked by pressing its designated (and arguably somewhat concealed) button as indicated by the red circle below:
This will pull up the stream debugger as shown previously in the article.
Database Streams
I will use the stream ORM Speedment that allows databases to be queried using standard Java Streams and thus, these streams can also be debugged with IntelliJ. A Speedment project can be set up using the Speedment initializer.
The Java application itself can be set up like this:
Speedment app = new SakilaApplicationBuilder()
    .withPassword("sakila-password") // Replace with your own password
    .build();
FilmManager films = app.getOrThrow(FilmManager.class);
Now, we can stream the database table “film”. For example like this:
This will retain only the Film objects with a length equal to 60 minutes, then sort those Film objects according to Film.RATING (descending), and then collect these elements into a List.
When we invoke the Stream debugger, we will see the following:
As can be seen, there are 1,000 films in the initial stream. After the filter operator, just 8 films remain which are subsequently sorted and then collected to a List.
Compute Statistics
Suppose we want to compute the min, max and average length of all films rated PG-13. This can be done like this:
IntSummaryStatistics stat = films.stream()
.filter(Film.RATING.equal("PG-13"))
.mapToInt(Film.LENGTH.asInt())
.summaryStatistics();
And looks like this in the Stream debugger:
As can be seen, it is possible to interact with the Stream debugger and click on elements whereby their path in the stream pipeline is highlighted. It is also possible to scroll among the elements for individual steps.
Speedment normally optimizes away intermediary operations in a database Stream and merges these steps into the SQL query. However, when the Stream debugger is used, no such optimization takes place and we are able to see all steps in the stream pipeline.
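If you want to experiment with the statistics pipeline in the debugger without a database, a plain-Java analogue can be used. This is a sketch under assumptions: the Film class below is a hand-written stand-in for the generated entity (only two fields) and ordinary lambdas replace the Speedment field predicates.

```java
import java.util.IntSummaryStatistics;
import java.util.List;

final class RatingStatistics {

    // Stand-in for the generated Film entity; only the fields used here
    static final class Film {
        final String rating;
        final int length;

        Film(String rating, int length) {
            this.rating = rating;
            this.length = length;
        }
    }

    // Computes min, max and average length for films with the given rating
    static IntSummaryStatistics lengthStatistics(List<Film> films, String rating) {
        return films.stream()
            .filter(film -> rating.equals(film.rating))
            .mapToInt(film -> film.length)
            .summaryStatistics();
    }

    public static void main(String[] args) {
        List<Film> films = List.of(
            new Film("PG-13", 60),
            new Film("PG-13", 120),
            new Film("G", 30));
        System.out.println(lengthStatistics(films, "PG-13"));
    }
}
```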
Conclusions
The Stream debugger is a hidden gem that can be of significant help when working with Streams.
I think the IntelliJ team has come up with a really good feature.
The mantra "Favor Composition over Inheritance" has, with good reason, been repeated many times in the literature. Yet there is little or no language support in Java to simplify the composition of objects. With a new JEP draft named "Concise Method Bodies", the situation might improve slightly.
Brian Goetz is responsible for the JEP draft which likely will be handled under project "Amber". The complete draft can be found here.
Concise Method Bodies
The JEP, when implemented, allows for something called Concise Method Bodies (CMB) whereby, loosely speaking, a method body can be a lambda or a method reference. Here is one example:
Old Style:
int length(String s) {
return s.length();
}
New CMB:
int length(String s) -> s.length(); // -> is "single expression form"
or alternately simply:
int length(String s) = String::length; // = is "method reference form"
This will reduce boilerplate coding while improving code readability.
Composition
Consider the existing Java class Collections.UnmodifiableList which delegates to an inner List instance and prevents it from being modified (code shortened and reordered for readability):
static class UnmodifiableList<E> extends UnmodifiableCollection<E>
implements List<E> {
final List<? extends E> list;
UnmodifiableList(List<? extends E> list) {
super(list);
this.list = list;
}
public boolean equals(Object o) {return o == this || list.equals(o);}
public int hashCode() {return list.hashCode();}
public E get(int index) {return list.get(index);}
public int indexOf(Object o) {return list.indexOf(o);}
public int lastIndexOf(Object o) {return list.lastIndexOf(o);}
public E set(int index, E element) {
throw new UnsupportedOperationException();
}
With CMB, it can be implemented like this:
static class UnmodifiableList<E> extends UnmodifiableCollection<E>
implements List<E> {
final List<? extends E> list;
UnmodifiableList(List<? extends E> list) {
super(list);
this.list = list;
}
public boolean equals(Object o) = list::equals;
public int hashCode() = list::hashCode;
public E get(int index) = list::get;
public int indexOf(Object o) = list::indexOf;
public int lastIndexOf(Object o) = list::lastIndexOf;
public E set(int index, E element) {
throw new UnsupportedOperationException();
}
I think this feature would make sense. It is especially useful when delegating methods with one or several parameters.
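For comparison, here is what delegation looks like in Java as it stands today. This is a minimal sketch and not the JDK's own implementation: ReadOnlyView is a hypothetical class that builds on AbstractList, whose default set() already throws UnsupportedOperationException. Every forwarding method needs a full body, which is the boilerplate that concise method bodies aim to reduce.

```java
import java.util.AbstractList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical read-only wrapper that forwards calls to an inner list
final class ReadOnlyView<E> extends AbstractList<E> {

    private final List<? extends E> list;

    ReadOnlyView(List<? extends E> list) {
        this.list = Objects.requireNonNull(list);
    }

    // Each delegating method needs a full body in today's Java
    @Override public E get(int index) { return list.get(index); }
    @Override public int size() { return list.size(); }
    @Override public int indexOf(Object o) { return list.indexOf(o); }
    @Override public int lastIndexOf(Object o) { return list.lastIndexOf(o); }

    // set(), add() and remove() are inherited from AbstractList and
    // throw UnsupportedOperationException

    public static void main(String[] args) {
        List<String> view = new ReadOnlyView<>(Arrays.asList("a", "b", "a"));
        System.out.println(view + " lastIndexOf(a) = " + view.lastIndexOf("a"));
    }
}
```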
Ultra-Low Latency Querying with Java Streams and In-JVM-Memory
Fundamental rules of nature, such as the speed of light and general information theory, set significant limits on the maximum performance we can obtain from traditional system architectures. Learn how you, as a Java developer, can improve performance by orders of magnitude using in-JVM-technology and Java Streams.
If, for example, the application server and the database server are located 100 m apart (about 330 feet), then the round trip delay imposed by the speed of light is slightly north of 600 ns. More importantly, due to TCP/IP protocol handling, a single packet round-trip delay on a 10 Gbit/s connection can hardly be optimized down to less than 25 µs (= 25,000 ns) despite resorting to black belt tricks such as custom kernel builds, busy polling and CPU affinity.
In this article, I will show how we can create Java Streams directly from RAM using in-JVM-memory technology. We will use the Stream-based Java ORM named Speedment, which can perform data analytics using standard java.util.stream.Stream objects, and see how some of these streams can be created and completed in under 200 ns which, surprisingly, is only about two times the latency of a CPU accessing 64-bit main memory.
200 ns is more than 125 times faster than the theoretical minimum latency from a remote database (100 m) whose internal processing delay is zero and where a single TCP packet can convey both the query and the response. In real-world scenarios, databases’ internal processing delay is never zero and both queries and results are often sent in several TCP packets. So, the speedup factor could be 1,000 times or much more in many cases.
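The figures above are easy to check with a few lines of Java. The sketch below only redoes the arithmetic from the text. It assumes the speed of light in vacuum (a signal in a cable or fiber is slower, so the real delay is higher), and the class and method names are made up for this illustration.

```java
final class LatencyArithmetic {

    // Speed of light in vacuum, in meters per second
    static final double SPEED_OF_LIGHT_M_PER_S = 299_792_458.0;

    // Time for light to travel to a point distanceMeters away and back, in ns
    static double roundTripNanos(double distanceMeters) {
        return 2 * distanceMeters / SPEED_OF_LIGHT_M_PER_S * 1e9;
    }

    // How many times faster the fast path is compared to the slow path
    static double speedup(double slowNanos, double fastNanos) {
        return slowNanos / fastNanos;
    }

    public static void main(String[] args) {
        System.out.printf("Round trip over 100 m: %.0f ns%n", roundTripNanos(100));
        System.out.printf("25 us versus 200 ns: factor %.0f%n", speedup(25_000, 200));
    }
}
```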
The Database
In the examples below, we are using data from the Sakila database content for MySQL. Sakila is an example database that models a movie rental store. It has tables called Film, Actor, Category and so on and it can be downloaded for free here. It should be noted that this is a small database but, as it turns out, many of the Speedment stream operations are O(1) or O(log(N)) in terms of complexity, thereby ensuring the same speed regardless of how big or small the data sets are.
Step 1: Create the project
First, we need to configure our pom.xml-file to use the latest Speedment dependencies and Maven plugin. The fastest way to do this is to generate a pom.xml-file using the Speedment Initializer that you can find here. First, choose the database type “MySQL” and make sure the “In-memory Acceleration” is enabled and then press “download”, and you will get an entire project folder with a Main.java-file generated automatically for you.
Next, unpack the project folder zip file, open a command line, go to the unpacked folder (where the pom.xml file is) and enter the following command:
mvn speedment:tool
Next, connect to the database and get started:
Step 2: Generate Code
When the schema data has been loaded from the database, the complete Java domain model can be generated by pressing the “Generate” button.
Step 3: Write the Application Code
In order to work with Speedment, you first need to create a Speedment instance. This can be done by using a builder that was automatically generated together with the domain model in step 2. Open the Main.java file and replace the code in the main() method with this snippet:
Speedment app = new SakilaApplicationBuilder()
// Replace this with your own password
.withPassword("sakila-password")
// Enable in-JVM-memory acceleration
// By just commenting out this line, we can disable acceleration
.withBundle(InMemoryBundle.class)
.build();
// Load data from database into a snapshot view if
// we have installed In-JVM-Acceleration
app.get(DataStoreComponent.class)
.ifPresent(DataStoreComponent::load);
As a demonstration of basic functionality, we will first write an application that just prints out all films:
// Obtains a FilmManager that allows us to
// work with the "film" table
FilmManager films = app.getOrThrow(FilmManager.class);
// Create a stream of films and print
// each and every film
films.stream()
.forEach(System.out::println);
The code above will produce the following output (shortened for brevity):
Speedment streams support all stream operations including filters. Suppose we want to filter out only those films that are longer than 60 minutes and count how many occurrences we have. This can be accomplished like this:
long count = films.stream()
.filter(Film.LENGTH.greaterThan(60))
.count();
System.out.format("There are %,d films longer than 60 minutes.", count);
This will produce the following output:
There are 896 films longer than 60 minutes.
Any number of filters can be applied to a stream and the predicate supplied to a filter() method can be composed using and() / or() operators.
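The composition of predicates works the same way as for any java.util.function.Predicate. The sketch below shows and() and or() on a plain list. The Film class and the two predicate constants are stand-ins written for this example and are not the generated Speedment types.

```java
import java.util.List;
import java.util.function.Predicate;

final class PredicateComposition {

    // Stand-in for the generated Film entity; only the fields used here
    static final class Film {
        final String title;
        final int length;
        final String rating;

        Film(String title, int length, String rating) {
            this.title = title;
            this.length = length;
            this.rating = rating;
        }
    }

    static final Predicate<Film> LONGER_THAN_60 = film -> film.length > 60;
    static final Predicate<Film> RATED_PG13 = film -> "PG-13".equals(film.rating);

    // Counts the films that satisfy the given (possibly composed) predicate
    static long count(List<Film> films, Predicate<Film> predicate) {
        return films.stream().filter(predicate).count();
    }

    public static void main(String[] args) {
        List<Film> films = List.of(
            new Film("A", 50, "PG-13"),
            new Film("B", 90, "PG-13"),
            new Film("C", 120, "G"));
        System.out.println(count(films, LONGER_THAN_60.and(RATED_PG13)));
        System.out.println(count(films, LONGER_THAN_60.or(RATED_PG13)));
    }
}
```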
Step 4: Setting up JMH
So far, we have not seen any performance figures. We are going to use JMH for benchmarking in this article. JMH is a Java harness for building, running, and analyzing benchmarks written in Java and other languages targeting the JVM.
There are two stream types we are going to use for performance measurements:
A fairly simple stream where we count the films that have a rating equal to PG-13 called “Filter And Count”
A more complex stream where we sort all the films in LENGTH order (descending), then we skip the first 745 films and then process the following 5 films whereby we extract the rental duration from those five films and finally we compute statistics on these integers (i.e. min, max, and average values). This type is called “Complex”.
The following code extract shows the benchmarks we are about to run:
private static final Predicate<Film> RATING_EQUALS_PG_13 =
Film.RATING.equal(Rating.PG13);
private static final Comparator<Film> LENGTH_DESCENDING =
Film.LENGTH.reversed();
@Benchmark
public long filterAndCount() {
return films.stream()
.filter(RATING_EQUALS_PG_13)
.count();
}
@Benchmark
public IntSummaryStatistics complex() {
return films.stream()
.sorted(LENGTH_DESCENDING)
.skip(745)
.limit(5)
.mapToInt(Film.RENTAL_DURATION.asInt())
.summaryStatistics();
}
The following setup was used for single threaded latency measurements:
# JMH version: 1.21
# VM version: JDK 10, Java HotSpot(TM) 64-Bit Server VM, 10+46
# VM invoker: /Library/Java/JavaVirtualMachines/jdk-10.jdk/Contents/Home/bin/java
# VM options: -javaagent:/Applications/IntelliJ IDEA CE.app/Contents/lib/idea_rt.jar=63173:/Applications/IntelliJ IDEA CE.app/Contents/bin -Dfile.encoding=UTF-8
# Warmup: 5 iterations, 10 s each
# Measurement: 5 iterations, 10 s each
# Timeout: 10 min per iteration
# Threads: 1 thread, will synchronize iterations
# Benchmark mode: Average time, time/op
# Benchmark: com.example.Bench.complex
Streams using SQL with a MySQL Database
Running these queries against a standard MySQL database (version 5.7.16) on my laptop (MacBook Pro, mid-2015, 2.2 GHz Intel Core i7, 16 GB RAM) produced the output shown below:
Being able to produce and consume almost 17 million streams per second on an old laptop is pretty astonishing. A modern server-grade computer with many CPU-cores will easily be able to produce and consume more than 25 million streams per second.
The JMH time resolution was not sufficient to measure this latency accurately. By running a throughput test with one thread and inverting the result, the average Filter And Count latency was estimated at 1/5,564,678 s ≈ 180 ns. This more accurate latency estimate gives an estimated performance boost factor of around 5,000 rather than 10,000.
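The inversion used for the estimate is simple enough to spell out. The following sketch only reproduces that arithmetic. The class and method names are made up here, and the calculation assumes a single thread so that the average latency is the inverse of the throughput.

```java
final class ThroughputToLatency {

    // With one thread, average latency is the inverse of the throughput
    static double averageLatencyNanos(double operationsPerSecond) {
        return 1_000_000_000.0 / operationsPerSecond;
    }

    public static void main(String[] args) {
        System.out.printf("%.1f ns%n", averageLatencyNanos(5_564_678));
    }
}
```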
Conclusions
Enabling in-JVM-memory acceleration can improve performance substantially. In the benchmarks above:
Single thread latency was reduced by a factor of:
Complex: ~ 3,000
Filter And Count: ~5,000
Multi-thread throughput was increased by a factor of:
Complex: 2,700
Filter and Count: 5,300
As an illustration, this means that a compound JVM operation with one million subqueries will have its aggregated data latency reduced from 1 h to 1 second.
Notes
For SQL performance, streams were (automatically) rendered to SQL queries. Here is what the rendered Filter And Count SQL query looked like:
SELECT COUNT(*) FROM (
SELECT
`film_id`,`title`,`description`,
`release_year`, `language_id`,`original_language_id`,
`rental_duration`,`rental_rate`, `length`,
`replacement_cost`,`rating`,`special_features`,
`last_update`
FROM
`sakila`.`film`
WHERE
(`rating` = ? COLLATE utf8_bin)
) AS A
, values:[PG-13]
There was an index defined for the rating column.
As can be seen, all counting was done on the database side and the stream did not pull in any unnecessary Film objects from the database into the JMH application.
Source Code
The source code for the benchmarks can be seen here.
Summary
In this article, you have learned how to significantly reduce latencies in your data analytics Java applications and at the same time improve throughput using Speedment Free.
The speedup factors are several orders of magnitude.
In this article, you will learn how you can write pure Java applications that are able to work with data from an existing database, without writing a single line of SQL (or similar languages like HQL) and without spending hours putting everything together. After your application is ready, you will learn how to improve latency by a factor of more than 1,000 using in-JVM-acceleration by adding just two lines of code.
Throughout this article, we will use Speedment which is a Java stream ORM that can generate code directly from a database schema and that can automatically render Java Streams directly to SQL allowing you to write code in pure Java.
You will also discover that data access performance can increase significantly by means of an in-JVM-memory technology where Streams are run directly from RAM.
Example Database
We will use an example database from MySQL named Sakila. It has tables called Film, Actor, Category and so on and can be downloaded for free here.
Step 1: Connect to Your Database
We will start by configuring the pom.xml file using the Speedment Initializer that you can find here. Press “download”, and you will get a project folder with a Main.java file generated automatically.
Next, unpack the project folder zip file, open a command line and go to the unpacked folder (where the pom.xml file is located).
Then, enter the following command:
mvn speedment:tool
This will launch the Speedment tool and prompt you for a license key. Select “Start Free” and you will get a license automatically and for free. Now you can connect to the database and get started:
Step 2: Generate Code
Once the schema data has been loaded from the database, the complete Java domain model can be generated by pressing the “Generate” button.
This will only take a second or two.
Step 3: Write the Application Code
Together with the domain model in step 2, a builder for the Speedment instance was automatically generated. Open the Main.java file and replace the code in the main() method with this snippet:
SakilaApplication app = new SakilaApplicationBuilder()
.withPassword("sakila-password") // Replace with your own password
.build();
Next, we will write an application that will print out all films. Admittedly, it’s a small application but we will improve it over the course of this article.
// Obtains a FilmManager that allows us to
// work with the "film" table
FilmManager films = app.getOrThrow(FilmManager.class);
// Create a stream of all films and print
// each and every film
films.stream()
.forEach(System.out::println);
Isn’t that simple?
When run, the Java stream will be automatically rendered to SQL under the hood. In order to actually see the SQL code rendered, modify our application builder and enable logging using the STREAM log type:
SakilaApplication app = new SakilaApplicationBuilder()
.withPassword("sakila-password")
.withLogging(ApplicationBuilder.LogType.STREAM)
.build();
This is what the SQL code looks like when you run the application:
SELECT
`film_id`,`title`,`description`,`release_year`,
`language_id`,`original_language_id`,`rental_duration`,`rental_rate`,
`length`,`replacement_cost`,`rating`,`special_features`,`last_update`
FROM
`sakila`.`film`,
values:[]
The SQL code rendered might differ depending on the database type you have selected (e.g. MySQL, MariaDB, PostgreSQL, Oracle, MS SQL Server, DB2, AS400 etc.). These variations are automatic.
The code above will produce the following output (shortened for brevity):
Speedment streams support all Stream operations including filters. Suppose we want to filter out only those films that are longer than 60 minutes. This can be accomplished by adding this line of code to our application:
This will return all films that are either shorter than 30 minutes or longer than one hour. Check your log files and you will see that this Stream is also rendered to SQL.
Step 5: Define the Order of the Elements
By default, the order in which elements appear in a stream is undefined. To define a specific order, you apply a sorted() operation to a stream like this:
SELECT
`film_id`,`title`,`description`,`release_year`,
`language_id`,`original_language_id`,`rental_duration`,`rental_rate`,
`length`,`replacement_cost`,`rating`,`special_features`,
`last_update`
FROM
`sakila`.`film`
WHERE
(`length` > ?)
ORDER BY
`length` ASC,
values:[60]
This will sort the film elements by LENGTH order (ascending) and then by TITLE order (descending). You can compose any number of fields.
NB: If you are composing two or more fields in ascending order, you should use the field’s .comparator() method, i.e. sorted(Film.LENGTH.thenComparing(Film.TITLE.comparator())) rather than just sorted(Film.LENGTH.thenComparing(Film.TITLE)).
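The same kind of two-level ordering can be tried with the standard java.util.Comparator API. This sketch sorts by length (ascending) and then by title (descending) on a plain list. The Film class is a stand-in for the generated entity and no Speedment fields are involved, so nothing here is rendered to SQL.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

final class SortOrderSketch {

    // Stand-in for the generated Film entity; only the fields used here
    static final class Film {
        final String title;
        final int length;

        Film(String title, int length) {
            this.title = title;
            this.length = length;
        }
    }

    // LENGTH ascending, then TITLE descending
    static final Comparator<Film> LENGTH_THEN_TITLE_DESC =
        Comparator.comparingInt((Film film) -> film.length)
            .thenComparing(Comparator.comparing((Film film) -> film.title).reversed());

    // Returns the titles in the order defined by the comparator above
    static List<String> sortedTitles(List<Film> films) {
        return films.stream()
            .sorted(LENGTH_THEN_TITLE_DESC)
            .map(film -> film.title)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Film> films = List.of(
            new Film("ALPHA", 60),
            new Film("BETA", 60),
            new Film("GAMMA", 50));
        System.out.println(sortedTitles(films));
    }
}
```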
Step 6: Page and Avoid Large Object Chunks
Often one wants to page results to avoid working with unnecessarily large object chunks. Assuming we want to see 50 elements per page, we could write the following generic method:
private static final int PAGE_SIZE = 50;
public static <T> Stream<T> page(
Manager<T> manager,
Predicate<? super T> predicate,
Comparator<? super T> comparator,
int pageNo
) {
return manager.stream()
.filter(predicate)
.sorted(comparator)
.skip((long) pageNo * PAGE_SIZE) // long arithmetic avoids int overflow for large page numbers
.limit(PAGE_SIZE);
}
This utility method can page ANY table using ANY filter and sort it in ANY order.
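For example, calling page() with the film manager, a predicate for films longer than 60 minutes, a comparator on title and page number 3 will return a stream of films that are longer than 60 minutes, sorted by title, showing page number 3 counting from zero (i.e. skipping 150 films and showing the following 50 films).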
Rendered SQL:
SELECT
`film_id`,`title`,`description`,`release_year`,
`language_id`,`original_language_id`,`rental_duration`,`rental_rate`,
`length`,`replacement_cost`,`rating`,`special_features`,
`last_update`
FROM
`sakila`.`film`
WHERE
(`length` > ?)
ORDER BY
`title` ASC
LIMIT ? OFFSET ?,
values:[60, 50, 150]
Generated output:
FilmImpl { filmId = 165, title = COLDBLOODED DARLING, ... length = 70,...}
FilmImpl { filmId = 166, title = COLOR PHILADELPHIA, ..., length = 149... }
FilmImpl { filmId = 167, title = COMA HEAD, ... length = 109,...}
...
Again, if we had used another database type, the SQL code would differ slightly.
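The paging logic from this step can also be tried without a database. The sketch below applies the same filter, sorted, skip and limit sequence to an ordinary List. ListPaging is a hypothetical class for illustration, it takes a List instead of a Speedment Manager, and page numbers are assumed to start at zero.

```java
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class ListPaging {

    static final int PAGE_SIZE = 50;

    // Returns page number pageNo (zero-based) of the filtered, sorted source
    static <T> List<T> page(
        List<T> source,
        Predicate<? super T> predicate,
        Comparator<? super T> comparator,
        int pageNo
    ) {
        return source.stream()
            .filter(predicate)
            .sorted(comparator)
            .skip((long) pageNo * PAGE_SIZE)
            .limit(PAGE_SIZE)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> numbers = IntStream.rangeClosed(1, 500)
            .boxed()
            .collect(Collectors.toList());
        // Even numbers in natural order, skipping the first 150 matches
        List<Integer> result = page(numbers, n -> n % 2 == 0, Comparator.naturalOrder(), 3);
        System.out.println(result.size() + " elements, first = " + result.get(0));
    }
}
```

Unlike the Speedment version, this sketch filters and sorts everything in the JVM before skipping, so it only illustrates the semantics and not the SQL rendering.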
Step 7: In-JVM-memory Acceleration
Since you used the standard configuration in the Initializer, In-JVM-memory acceleration was enabled in your pom.xml file. To activate acceleration in your application, you just modify your initialization code like this:
SakilaApplication app = new SakilaApplicationBuilder()
.withPassword("sakila-password")
.withBundle(InMemoryBundle.class)
.build();
// Load data from the database into an in-memory snapshot
app.getOrThrow(DataStoreComponent.class).load();
Now, instead of rendering SQL-queries, table streams will be served directly from RAM. Filtering, sorting and skipping will also be accelerated by in-memory indexes. Both in-memory tables and indexes are stored off-heap so they will not contribute to Garbage Collection complexity.
On my laptop (MacBook Pro, 15-inch, Mid 2015, 16 GB, i7 2.2 GHz) the query latency was reduced by a factor of over 1,000, both for streams where I counted films that matched a filter and for sorted streams, compared to running against a standard installation of a MySQL database (Version 5.7.16) running on my local machine.
Summary
In this article, you have learned how easy it is to query existing databases using pure Java streams. You have also seen how you can accelerate access to your data using in-JVM-memory stream technology. Both the Sakila database and Speedment are free to download and use, so try it out for yourself.
Here is a look at how you can write a full stack database web application without using SQL, HQL, PHP, ASP, HTML, CSS or Javascript and instead relying purely on Java using Vaadin’s UI layer and Speedment Stream ORM.
Ever wanted to quickly create a web application connected to your existing database or build a professional application with short time-to-market requirements? The Java Stream API has unleashed the possibility to write database queries in pure Java.
In this article, we will demonstrate how fast and easy this can be done by leveraging two Java frameworks: Vaadin and Speedment. Because they both use Java Streams, it is easy to connect them together. This means we will end up with a short, concise and type-safe application.
For this mini-project, we will use the MySQL sample database named "Employees" which provides approximately 160 MB of data spread over six separate tables and comprising 4 million records.
The full application code is available at GitHub and you can clone this repository if you want to run the application in your own environment. You will also need trial licenses from both Vaadin and Speedment to use the features used in this article. These are available for free.
The intended end result is a web application where it is possible to analyze gender balance and salary distribution among different departments. The result is displayed graphically, using pure standard Vaadin Charts Java components as depicted in the video below:
Setting Up the Data Model
We are using Speedment Stream ORM to access the database. It is easy to set up any project using the Speedment initializer. Speedment can generate Java classes directly from the database’s schema data. After generation, we can create our Speedment instance like this:
Speedment speedment = new EmployeesApplicationBuilder()
.withUsername("...") // Username needs to match the database
.withPassword("...") // Password needs to match the database
.build();
Create a Dropdown for Departments
In our web application, we want to have a drop-down list of all departments. It is easy to retrieve the departments from the database as can be seen in this method:
Now we are going to create a join relation between Departments and Employees. In the database, there is a many-to-many relation table named DeptEmp that connects these tables together.
First, we create a custom tuple class that will hold our three entries from the joined tables:
public final class DeptEmplEmployeesSalaries {
private final DeptEmp deptEmp;
private final Employees employees;
private final Salaries salaries;
public DeptEmplEmployeesSalaries(
DeptEmp deptEmp,
Employees employees,
Salaries salaries
) {
this.deptEmp = requireNonNull(deptEmp);
this.employees = requireNonNull(employees);
this.salaries = requireNonNull(salaries);
}
public DeptEmp deptEmp() { return deptEmp; }
public Employees employees() { return employees; }
public Salaries salaries() { return salaries; }
public static TupleGetter0<DeptEmplEmployeesSalaries, DeptEmp> deptEmpGetter() {
return DeptEmplEmployeesSalaries::deptEmp;
}
public static TupleGetter1<DeptEmplEmployeesSalaries, Employees> employeesGetter() {
return DeptEmplEmployeesSalaries::employees;
}
public static TupleGetter2<DeptEmplEmployeesSalaries, Salaries> salariesGetter() {
return DeptEmplEmployeesSalaries::salaries;
}
}
The DeptEmplEmployeesSalaries is simply an immutable holder of the three entities, except it has three additional “getter” methods that can be applied to extract the individual entities. Note that they return TupleGetter, which allows joins and aggregations to use optimized versions compared to just using an anonymous lambda or method reference.
Now that we have the custom tuple, we can easily define our Join relation:
private Join<DeptEmplEmployeesSalaries> joinDeptEmpSal(Departments dept) {
// The JoinComponent is needed when creating joins
JoinComponent jc = speedment.getOrThrow(JoinComponent.class);
return jc.from(DeptEmpManager.IDENTIFIER)
// Only include data from the selected department
.where(DeptEmp.DEPT_NO.equal(dept.getDeptNo()))
// Join in Employees with Employees.EMP_NO equal DeptEmp.EMP_NO
.innerJoinOn(Employees.EMP_NO).equal(DeptEmp.EMP_NO)
// Join Salaries with Salaries.EMP_NO equal Employees.EMP_NO
.innerJoinOn(Salaries.EMP_NO).equal(Employees.EMP_NO)
// Filter out historic salary data
.where(Salaries.TO_DATE.greaterOrEqual(currentDate))
.build(DeptEmplEmployeesSalaries::new);
}
When we are building our Join expression, we start off by first using the DeptEmp table (as we recall, this is the many-to-many relation table between Departments and Employees). For this table, we apply a where() statement so that we keep only those many-to-many relations that belong to the department we want to appear in the join.
Next, we join in the Employees table and specify a join relation where the newly joined table’s column Employees.EMP_NO equals DeptEmp.EMP_NO.
After that, we join in the Salaries table and specify another join relation where Salaries.EMP_NO equals Employees.EMP_NO. For this particular join relation, we also apply a where() statement so that we keep only salaries that are current (and not historic, past salaries for an employee).
Finally, we call the build() method and supply the constructor of our DeptEmplEmployeesSalaries class that holds the three entities DeptEmp, Employees, and Salaries.
Counting the Number of Employees for a Department
Armed with the join method above, it is very easy to count the number of Employees for a certain department in the Join stream. This is how we can go about it:
public long countEmployees(Departments department) {
return joinDeptEmpSal(department)
.stream()
.count();
}
Calculating a Salary Distribution Aggregation
By using the built-in Speedment Aggregator, we can express aggregations quite easily. The Aggregator can consume regular Java Collections, Java Streams from a single table as well as Join Streams without constructing intermediary Java objects on the heap. This is because it stores all its data structures completely off-heap.
We first start with creating a “result object” in the form of a simple POJO that is going to be used as a bridge between the completed off-heap aggregation and the Java heap world:
Now that we have the POJO, we are able to build a method that returns an Aggregation like this:
public Aggregation<GenderIntervalFrequency> freqAggregation(Departments dept) {
Aggregator<DeptEmplEmployeesSalaries, ?, GenderIntervalFrequency> aggregator =
// Provide a constructor for the "result object"
Aggregator.builder(GenderIntervalFrequency::new)
// Create a key on Gender
.firstOn(DeptEmplEmployeesSalaries.employeesGetter())
.andThen(Employees.GENDER)
.key(GenderIntervalFrequency::setGender)
// Create a key on salary divided by 1,000 as an integer
.firstOn(DeptEmplEmployeesSalaries.salariesGetter())
.andThen(Salaries.SALARY.divide(SALARY_BUCKET_SIZE).asInt())
.key(GenderIntervalFrequency::setInterval)
// For each unique set of keys, count the number of entities
.count(GenderIntervalFrequency::setFrequency)
.build();
return joinDeptEmpSal(dept)
.stream()
.parallel()
.collect(aggregator.createCollector());
}
This requires a bit of explanation. When we invoke the Aggregator.builder() method, we provide a constructor of the “result object” that we are using as a bridge between the off-heap and the on-heap world.
After we have a builder, we can start defining our aggregation and usually the clearest way is to start off with the keys (i.e. groups) that we are going to use in the aggregation. When we are aggregating results for a Join operation, we first need to specify which entity we want to extract our key from. In this case, we want to use the employee’s gender so we invoke .firstOn(DeptEmplEmployeesSalaries.employeesGetter()) which will extract the Employees entity from the tuple. Then we apply .andThen(Employees.GENDER) which, in turn, will extract the gender property from the Employees entity. The key() method takes a method reference for a method that is going to be called once we want to actually read the result of the aggregation.
The second key is specified in much the same way, only here we apply the .firstOn(DeptEmplEmployeesSalaries.salariesGetter()) method to extract the Salaries entity instead of the Employees entity. When we then apply the .andThen() method we are using an expression to convert the salary so it is divided by 1,000 and seen as an integer. This will create separate income brackets for every thousand dollars in salary.
The count() operator simply says that we want to count the occurrence of each key pair. So, if there are two males that have an income in the 57 bracket (i.e. a salary between 57,000 and 57,999) the count operation will count those two for those keys.
Finally, in the line starting with return, the actual computation of the aggregation will take place whereby the application will aggregate all the thousands of salaries in parallel and return an Aggregation for all the income data in the database. An Aggregation can be thought of as a kind of List with all the keys and values, only that the data is stored off-heap.
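To make the grouping logic concrete, here is a plain-Java analogue that groups by gender and by salary bracket and counts the entries in each group. It is a sketch only: it uses the standard Collectors.groupingBy() instead of the Speedment Aggregator, it keeps its result on the heap, and the Row class is a hypothetical stand-in for the joined tuple.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class SalaryBuckets {

    static final int SALARY_BUCKET_SIZE = 1_000;

    // Stand-in for one joined row: an employee's gender and current salary
    static final class Row {
        final String gender;
        final int salary;

        Row(String gender, int salary) {
            this.gender = gender;
            this.salary = salary;
        }
    }

    // First key: gender. Second key: salary divided by the bucket size.
    // Value: the number of rows that share both keys.
    static Map<String, Map<Integer, Long>> frequencies(List<Row> rows) {
        return rows.stream().collect(
            Collectors.groupingBy(row -> row.gender,
                Collectors.groupingBy(row -> row.salary / SALARY_BUCKET_SIZE,
                    Collectors.counting())));
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
            new Row("M", 57_200),
            new Row("M", 57_900),
            new Row("F", 57_000));
        System.out.println(frequencies(rows));
    }
}
```

In this sketch, two male employees with salaries of 57,200 and 57,900 both end up under the keys "M" and 57 with a count of 2, which matches the example in the text.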
Adding In-JVM-Memory Acceleration
By just adding two lines to our application, we can get a high-performance application with in-JVM-memory acceleration.
Speedment speedment = new EmployeesApplicationBuilder()
.withUsername("...") // Username needs to match the database
.withPassword("...") // Password needs to match the database
.withBundle(InMemoryBundle.class) // Add in-JVM-acceleration
.build();
// Load a snapshot of the database into off-heap JVM-memory
speedment.get(DataStoreComponent.class)
.ifPresent(DataStoreComponent::load);
The InMemoryBundle allows the entire database to be pulled into the JVM using off-heap memory and then allows Streams and Joins to be executed directly from RAM instead of using the database. This will improve performance and will make the Java application work more deterministically. Having data off-heap also means that the data will not affect Java garbage collection, allowing huge JVMs to be used with no GC impact.
Thanks to the In-memory acceleration, even the biggest department with over 60,000 salaries will be computed in less than 100 ms on my laptop. This will ensure that our UI stays responsive.
Building the UI in Java
Now that the data model is finished, we move on to the visual aspects of the application. As mentioned earlier, this is done using Vaadin, a framework which allows implementation of HTML5 web user interfaces using Java. The Vaadin framework is built on the notion of components, which could be a layout, a button or anything in between. The components are modeled as objects which can be customized and styled in an abundance of ways.
The image above describes the structure of the GUI we intend to build for our DataModel. It consists of nine components, of which five read information from the database and present it to the user while the rest are static. Without further ado, let’s start configuring the UI.
A sketch showing the hierarchy of the components included in our GUI.
The Vaadin UI Layer
To integrate Vaadin in the application, we downloaded a starter pack from Vaadin to set up a simple project base. This will automatically generate a UI class which is the base of any Vaadin application.
@Theme("mytheme")
public class EmployeeUI extends UI {
@Override // Called by the server when the application starts
protected void init(VaadinRequest vaadinRequest) { }
// Standard Vaadin servlet which was not modified
@WebServlet(urlPatterns = "/*", name = "MyUIServlet", asyncSupported = true)
@VaadinServletConfiguration(ui = EmployeeUI.class, productionMode = false)
public static class MyUIServlet extends VaadinServlet { }
}
The overridden init() is called from the server when the application is started, hence this is where we soon will state what actions are to be performed when the application is running. EmployeeUI also contains MyUIServlet, which is a standard servlet class used for deployment. No modification was needed for the sake of this application.
Creation of Components
As mentioned above, all of our components will be declared in init(). This is not suggested as a best practice but works well for an application with a small scope. However, we want to update most of the components collectively from a separate method when a new department is selected, so those will be declared as instance variables along the way.
Application Title
We start off simple by creating a Label for the title. Since its value will not change, it can be locally declared.
Label appTitle = new Label("Employee Application");
appTitle.setStyleName("h2");
In addition to a value, we give it a style name. Style names allow full control of the appearance of the component. In this case, we use the built-in Vaadin Valo Theme and select a header styling simply by setting the parameter to “h2”. This style name can also be used to target the component with custom CSS (for example .h2 { font-family: 'Times New Roman'; }).
Text Fields
To view the number of employees and the average salary for the selected department, we use the TextField component. TextField is mainly used for user text input, but by setting it to read-only, we prohibit any user interaction. Notice how two style names can be used by separating them with a blank space.
noOfEmployees = new TextField("Number of employees"); // Instance variable
noOfEmployees.setReadOnly(true);
// Multiple style names are separated with a blank space
noOfEmployees.setStyleName("huge borderless");
This code is duplicated for the averageSalary TextField although with a different caption and variable name.
Charts
Charts can easily be created with the Vaadin Charts add-on and, just like any other component, a chart is a Java object with corresponding properties. For this application, we used a COLUMN chart to view gender balance and an AREASPLINE chart for the salary distribution.
/* Column chart to view balance between female and male employees at a certain department */
genderChart = new Chart(ChartType.COLUMN);
Configuration genderChartConfig = genderChart.getConfiguration();
genderChartConfig.setTitle("Gender Balance");
// 0 is only used as an init value, chart is populated with data in updateUI()
maleCount = new ListSeries("Male", 0);
femaleCount = new ListSeries("Female", 0);
genderChartConfig.setSeries(maleCount, femaleCount);
XAxis x1 = new XAxis();
x1.setCategories("Gender");
genderChartConfig.addxAxis(x1);
YAxis y1 = new YAxis();
y1.setTitle("Number of employees");
genderChartConfig.addyAxis(y1);
Most of the properties associated with a chart are controlled by its configuration, which is retrieved with getConfiguration(). This is then used to add a chart title, two data series, and the axis properties. For the genderChart, a simple ListSeries was used to hold the data because of its simple nature. For the salaryChart below, however, a DataSeries was chosen since it handles larger and more complicated data sets.
The declaration of the salaryChart is very similar to that of the genderChart. Likewise, the configuration is retrieved and used to add a title and axes.
salaryChart = new Chart(ChartType.AREASPLINE);
Since both charts display data for males and females, we decided to use a shared legend that we fix in the upper right corner of the salaryChart.
Lastly, we add two empty DataSeries which will be populated with data at a later stage.
// Instance variables to allow update from updateUI()
maleSalaryData = new DataSeries("Male");
femaleSalaryData = new DataSeries("Female");
salaryChartConfig.setSeries(maleSalaryData, femaleSalaryData);
Department Selector
The final piece is the department selector which controls the rest of the application.
/* Native Select component to enable selection of Department */
NativeSelect<Departments> selectDepartment = new NativeSelect<>("Select department");
selectDepartment.setItems(DataModel.departments());
selectDepartment.setItemCaptionGenerator(Departments::getDeptName);
selectDepartment.setEmptySelectionAllowed(false);
We implement it as a NativeSelect<T> component that calls departments(), which was previously defined in DataModel, to retrieve a Stream of Departments from the database. Next, we specify what property of Department to display in the dropdown list (default is toString()).
Since we do not allow empty selections, we set the defaultDept to the first element of the Department Stream. Note that the defaultDept is stored as a variable for later use.
/* Default department to use when starting application */
final Departments defaultDept = DataModel.departments().findFirst().orElseThrow(NoSuchElementException::new);
selectDepartment.setSelectedItem(defaultDept);
Adding the Components to the UI
So far we have only declared the components without adding them to the actual canvas. To be displayed in the application they all need to be added to the UI. This is usually done by attaching them to a Layout. Layouts are used to create a structured hierarchy and can be nested into one another.
HorizontalLayout contents = new HorizontalLayout();
contents.setSizeFull();
VerticalLayout menu = new VerticalLayout();
menu.setWidth(350, Unit.PIXELS);
VerticalLayout body = new VerticalLayout();
body.setSizeFull();
As revealed in the code above, three layouts were used for this purpose, one horizontal and two vertical. Once the layouts are defined we can add the components.
menu.addComponents(appTitle, selectDepartment, noOfEmployees, averageSalary);
body.addComponents(genderChart, salaryChart);
contents.addComponent(menu);
// Body fills the area to the right of the menu
contents.addComponentsAndExpand(body);
// Adds contents to the UI
setContent(contents);
Components appear in the UI in the order they are added. For a VerticalLayout such as the menu, this means from top to bottom. Notice how the HorizontalLayout contents holds the two VerticalLayouts, placing them next to each other. This is necessary because the UI itself can hold only one component, namely contents, which holds all components as one unit.
Reflecting the DataModel in the UI
Now that all visuals are in place, it is time to let them reflect the database content. This means we need to add values to the components by retrieving information from the DataModel. Bridging between our data model and EmployeeUI will be done by handling events from selectDepartment. This is accomplished by adding a selection listener to selectDepartment in init() that passes the newly selected department to updateUI().
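The wiring can be sketched without the framework. In the minimal example below, SelectSketch and EmployeeUiSketch are hypothetical stand-ins (they are not Vaadin classes), departments are plain Strings, and the selection is modeled as an Optional on the assumption that a select component may report an empty selection. The point is only to show that the listener does nothing more than forward the selected item to updateUI().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Consumer;

/** Hypothetical stand-in for a single-select component; not a Vaadin class. */
final class SelectSketch<T> {
    private final List<Consumer<Optional<T>>> listeners = new ArrayList<>();
    private T selected;

    /** Registers a callback that receives the new selection. */
    void addSelectionListener(Consumer<Optional<T>> listener) {
        listeners.add(listener);
    }

    /** Changes the selection and notifies every registered listener. */
    void select(T item) {
        selected = item;
        for (Consumer<Optional<T>> listener : listeners) {
            listener.accept(Optional.ofNullable(selected));
        }
    }
}

/** Hypothetical stand-in for EmployeeUI, reduced to the event wiring. */
final class EmployeeUiSketch {
    final SelectSketch<String> selectDepartment = new SelectSketch<>();
    String lastShown = "none";

    /** Mirrors init(): the listener only forwards the selection to updateUI(). */
    void init() {
        selectDepartment.addSelectionListener(selection ->
            updateUI(selection.orElseThrow(IllegalStateException::new)));
    }

    /** Mirrors updateUI(): here it just records which department is shown. */
    void updateUI(String dept) {
        lastShown = dept;
    }
}
```

Keeping the listener this thin means all refresh logic lives in one method, which can also be called directly for the default department at startup.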
Since updateUI() was not yet defined, that is our next task.
private void updateUI(Departments dept) { }
Here is a quick reminder of what we want updateUI() to accomplish: When a new department is selected we want to calculate and display the total number of employees, the number of males and females, the total average salary and the salary distribution for males and females for that department.
Conveniently enough, we designed our DataModel with this in mind, making it easy to collect the information from the database.
The sum of the males and females gives the total number of employees. averageSalary() returns a Double which is cast to an int. Both values are formatted as a String before being passed to the text fields.
We can also use the Map counts to populate the first graph by retrieving the separate counts for male and female.
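The arithmetic on the counts map can be sketched in plain Java. The example below is a minimal sketch under stated assumptions: DepartmentSummarySketch and its nested Gender enum are hypothetical stand-ins for the data model, the map is assumed to hold one Long per gender, and the thousands-separator format is our own choice rather than something the application prescribes.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper showing how the counts map feeds the text fields and the chart. */
final class DepartmentSummarySketch {
    /** Stand-in for the Gender enum of the data model. */
    enum Gender { M, F }

    /** Total head count: the sum of all per-gender counts. */
    static long totalEmployees(Map<Gender, Long> counts) {
        return counts.values().stream().mapToLong(Long::longValue).sum();
    }

    /** Count for one gender, or 0 if the department has no entry for that gender. */
    static long countFor(Map<Gender, Long> counts, Gender gender) {
        return counts.getOrDefault(gender, 0L);
    }

    /** Formats the head count as a String with thousands separators. */
    static String formatCount(long count) {
        return String.format(Locale.US, "%,d", count);
    }

    /** Truncates the average to an int, then formats it as a String. */
    static String formatSalary(Double average) {
        return String.format(Locale.US, "%,d", average.intValue());
    }
}
```

Using getOrDefault() guards against a department that lacks an entry for one gender, in which case the corresponding chart column simply shows zero.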
final List<DataSeriesItem> maleSalaries = new ArrayList<>();
final List<DataSeriesItem> femaleSalaries = new ArrayList<>();
DataModel.freqAggregation(dept)
    .streamAndClose()
    .forEach(agg -> {
        (agg.getGender() == Gender.F ? femaleSalaries : maleSalaries)
            .add(new DataSeriesItem(agg.getInterval() * 1_000, agg.getFrequency()));
    });
Our DataModel provides an Aggregation which we can think of as a list containing tuples of a gender, a salary and a corresponding salary frequency (how many persons share that salary). By streaming over the Aggregation we can separate male and female data into two Lists containing DataSeriesItems. A DataSeriesItem is in this case used like a point with an x- and y-value.
Before adding the data to the chart, we sort it in ascending order of the x-values; otherwise, the graph will look very chaotic. Now our two sorted List<DataSeriesItem> will fit perfectly with the DataSeries of salaryChart.
Since we are changing the whole data set rather than just a single point, we set the data for our DataSeries to the Lists of x- and y-values we just created. Unlike a change in a ListSeries, this will not trigger an update of the chart, meaning we have to force a manual update with drawChart().
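The split-then-sort step can be illustrated with a self-contained sketch. Row and Point below are hypothetical stand-ins for an aggregation row and a DataSeriesItem, and SalarySeriesSketch is a name we made up for the example. Only the logic mirrors the text: pick the rows of one gender, scale the interval by 1,000 as in the code above, and order the points by ascending x.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of splitting aggregation rows by gender and sorting them by x. */
final class SalarySeriesSketch {
    enum Gender { M, F }

    /** Stand-in for one aggregation row: gender, salary interval, frequency. */
    static final class Row {
        final Gender gender;
        final int interval;
        final long frequency;

        Row(Gender gender, int interval, long frequency) {
            this.gender = gender;
            this.interval = interval;
            this.frequency = frequency;
        }
    }

    /** Stand-in for DataSeriesItem: a plain (x, y) point. */
    static final class Point {
        final double x;
        final double y;

        Point(double x, double y) {
            this.x = x;
            this.y = y;
        }
    }

    /** Collects the rows of one gender as points, ordered by ascending x. */
    static List<Point> sortedPoints(List<Row> rows, Gender gender) {
        List<Point> points = new ArrayList<>();
        for (Row row : rows) {
            if (row.gender == gender) {
                points.add(new Point(row.interval * 1_000, row.frequency));
            }
        }
        points.sort(Comparator.comparingDouble((Point p) -> p.x));
        return points;
    }
}
```

Sorting after the split keeps each series independent, so an interval that is missing for one gender does not affect the order of the other series.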
Lastly, we need to fill the components with default values when the application starts. This can now be done by calling updateUI(defaultDept) at the end of init().
Styling in Java
Vaadin offers complete freedom when it comes to adding a personal feel to components. Since this is a pure Java application only the styling options available in their Java framework were used, although CSS styling will naturally give total control of the visuals.
A comparison before and after applying the ChartTheme.
To give our charts a personal touch we created a class ChartTheme which extends Theme. In the constructor, we defined what properties we would like to change, namely the color of the data series, background, legend, and text.
public class ChartTheme extends Theme {
    public ChartTheme() {
        Color[] colors = new Color[2];
        colors[0] = new SolidColor("#5abf95"); // Light green
        colors[1] = new SolidColor("#fce390"); // Yellow
        setColors(colors);
        getChart().setBackgroundColor(new SolidColor("#3C474C"));
        getLegend().setBackgroundColor(new SolidColor("#ffffff"));
        Style textStyle = new Style();
        textStyle.setColor(new SolidColor("#ffffff")); // White text
        setTitle(textStyle);
    }
}
The theme was then applied to all charts by adding this row to init():
ChartOptions.get().setTheme(new ChartTheme());
Conclusion
We have used Speedment to interface with the database and Vaadin to interface with the end user. The only code needed in between is a few Java Streams constructs that declaratively describe the application logic, which keeps both time to market and maintenance cost low.