Making Pivot Tables with Java Streams from Databases
Raw data from database rows and tables gives human readers little insight. We are much more likely to see patterns if some kind of aggregation is performed on the data before it is presented to us. A pivot table is a specific form of aggregation where we can apply operations such as sorting, averaging, or summing, and often also group column values.
In this article, I will show how you can compute pivot tables of data from a database in pure Java without writing a single line of SQL. You can easily reuse and modify the examples in this article to fit your own specific needs.
In the examples below, I have used open-source Speedment, which is a Java Stream ORM, and the open-source Sakila film database content for MySQL. Speedment works for any major relational database type such as MySQL, PostgreSQL, Oracle, MariaDB, Microsoft SQL Server, DB2, AS400 and more.
Pivoting
I will construct a Map of Actor objects and, for each Actor, a corresponding Map that counts, per film rating, the films that particular Actor has appeared in. Here is an example of how a pivot entry for a specific Actor might look when expressed verbally:
“John Doe participated in 9 films that were rated ‘PG-13’ and 4 films that were rated ‘R’”.
We are going to compute pivot values for all actors in the database. The Sakila database has three tables of interest for this particular application:
1) “film” containing all the films and how the films are rated (e.g. “PG-13”, “R”, etc.).
2) “actor” containing (made up) actors (e.g. “MICHAEL BOLGER”, “LAURA BRODY”, etc.).
3) “film_actor” which links films and actors together in a many-to-many relation.
The first part of the solution involves joining these three tables together. Joins are created using Speedment’s JoinComponent, which can be obtained like this:
    // Visit https://github.com/speedment/speedment
    // to see how a Speedment app is created. It is easy!
    Speedment app = …;
    JoinComponent joinComponent = app.getOrThrow(JoinComponent.class);
Once we have the JoinComponent, we can start defining the Join relations that we need to compute our pivot table:
    Join<Tuple3<FilmActor, Film, Actor>> join = joinComponent
        .from(FilmActorManager.IDENTIFIER)
        .innerJoinOn(Film.FILM_ID).equal(FilmActor.FILM_ID)
        .innerJoinOn(Actor.ACTOR_ID).equal(FilmActor.ACTOR_ID)
        .build(Tuples::of);

The build() method takes a method reference, Tuples::of, that will resolve to a constructor that takes three entities of type FilmActor, Film and Actor and that will create a compound immutable Tuple3 comprising those specific entities. Tuples are built into Speedment.
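To make it easier to picture what the join hands over to the Stream, here is a minimal in-memory sketch in plain Java, with no database and no Speedment involved. The records Film, Actor, FilmActor and Triple are hypothetical stand-ins that I made up for illustration (they are not the generated Speedment entities or its Tuple3), and the sample rows are invented. The sketch only shows the shape of an inner join through a link table, not how Speedment executes it.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class InMemoryJoinSketch {

    // Hypothetical stand-ins for the Sakila entities (not Speedment classes).
    record Film(int filmId, String title, String rating) {}
    record Actor(int actorId, String firstName, String lastName) {}
    record FilmActor(int actorId, int filmId) {}

    // Hypothetical stand-in for a three-element tuple.
    record Triple<A, B, C>(A first, B second, C third) {}

    // Inner join of the link table with films and actors.
    static List<Triple<FilmActor, Film, Actor>> join(
            List<FilmActor> links, List<Film> films, List<Actor> actors) {
        Map<Integer, Film> filmById = films.stream()
            .collect(Collectors.toMap(Film::filmId, Function.identity()));
        Map<Integer, Actor> actorById = actors.stream()
            .collect(Collectors.toMap(Actor::actorId, Function.identity()));
        return links.stream()
            // Inner join semantics: drop link rows without a matching film or actor.
            .filter(fa -> filmById.containsKey(fa.filmId())
                && actorById.containsKey(fa.actorId()))
            .map(fa -> new Triple<>(
                fa, filmById.get(fa.filmId()), actorById.get(fa.actorId())))
            .collect(Collectors.toList());
    }

    // Invented sample data: the last link row points at a missing actor.
    static List<Triple<FilmActor, Film, Actor>> sample() {
        List<Film> films = List.of(
            new Film(1, "FILM ONE", "PG"),
            new Film(2, "FILM TWO", "R"));
        List<Actor> actors = List.of(
            new Actor(1, "JOHN", "DOE"),
            new Actor(2, "JANE", "ROE"));
        List<FilmActor> links = List.of(
            new FilmActor(1, 1),
            new FilmActor(1, 2),
            new FilmActor(2, 2),
            new FilmActor(3, 2));
        return join(links, films, actors);
    }

    public static void main(String[] args) {
        sample().forEach(t -> System.out.println(
            t.third().firstName() + " " + t.third().lastName()
                + " appeared in " + t.second().title()
                + " (" + t.second().rating() + ")"));
    }
}
```

Each element of the result carries the link row, the film and the actor side by side, which is the same layout the Tuple3 has in the Speedment join.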
Armed with our Join object we now can create our pivot Map using a standard Java Stream obtained from the Join object:
    Map<Actor, Map<String, Long>> pivot = join.stream()
        .collect(
            groupingBy(
                // Applies Actor as a first classifier
                Tuple3::get2,
                groupingBy(
                    // Applies rating as second level classifier
                    tu -> tu.get1().getRating().get(),
                    counting() // Counts the elements
                )
            )
        );
Now that the pivot Map has been computed, we can print its content like this:
    // pivot keys: Actor, values: Map<String, Long>
    pivot.forEach((k, v) -> {
        System.out.format(
            "%22s %5s %n",
            k.getFirstName() + " " + k.getLastName(),
            v
        );
    });

This will produce the following output:
    MICHAEL BOLGER {PG-13=9, R=3, NC-17=6, PG=4, G=8}
    LAURA BRODY {PG-13=8, R=3, NC-17=6, PG=6, G=3}
    CAMERON ZELLWEGER {PG-13=8, R=2, NC-17=3, PG=15, G=5}
    ...
Mission completed! In the code above, the method Tuple3::get2 will retrieve the third element from the tuple (an Actor), whereas the method tu.get1() will retrieve the second element from the tuple (a Film).
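If you want to try the two-level grouping without a database, the following self-contained sketch applies the same nested groupingBy and counting collectors to a few in-memory rows. The Row record and the sample data are made up for this illustration. Note that groupingBy and counting come from java.util.stream.Collectors and are statically imported here.

```java
import java.util.List;
import java.util.Map;

import static java.util.stream.Collectors.counting;
import static java.util.stream.Collectors.groupingBy;

class PivotSketch {

    // Hypothetical flattened join row: one actor appearing in one rated film.
    record Row(String actor, String rating) {}

    // First level groups by actor, second level by rating, then counts rows.
    static Map<String, Map<String, Long>> pivot(List<Row> rows) {
        return rows.stream()
            .collect(
                groupingBy(
                    Row::actor,
                    groupingBy(
                        Row::rating,
                        counting()
                    )
                )
            );
    }

    // Invented sample rows.
    static List<Row> sampleRows() {
        return List.of(
            new Row("JOHN DOE", "PG-13"),
            new Row("JOHN DOE", "PG-13"),
            new Row("JOHN DOE", "R"),
            new Row("JANE ROE", "G"));
    }

    public static void main(String[] args) {
        pivot(sampleRows()).forEach((actor, counts) ->
            System.out.format("%22s %s%n", actor, counts));
    }
}
```

The maps returned by groupingBy make no ordering guarantee, so the order of the printed lines and of the ratings within a line may vary.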
Speedment will render SQL code automatically from Java and convert the result to a Java Stream. If we enable Stream logging, we can see exactly how the SQL was rendered:
    SELECT
        A.`actor_id`,A.`film_id`,A.`last_update`,
        B.`film_id`,B.`title`,B.`description`,
        B.`release_year`,B.`language_id`,B.`original_language_id`,
        B.`rental_duration`,B.`rental_rate`,B.`length`,
        B.`replacement_cost`,B.`rating`,B.`special_features`,
        B.`last_update`,
        C.`actor_id`,C.`first_name`,
        C.`last_name`,C.`last_update`
    FROM `sakila`.`film_actor` AS A
    INNER JOIN `sakila`.`film` AS B ON (B.`film_id` = A.`film_id`)
    INNER JOIN `sakila`.`actor` AS C ON (C.`actor_id` = A.`actor_id`)
Joins with Custom Tuples
As we noticed in the example above, we have no actual use for the FilmActor object in the Stream, since it is only used to link Film and Actor entities together during the Join phase. Also, the generic Tuple3 had general get0(), get1() and get2() methods that did not say anything about what they contained.
All this can be fixed by defining our own custom “tuple” called ActorRating like this:
    private static class ActorRating {

        private final Actor actor;
        private final String rating;

        public ActorRating(FilmActor fa, Film film, Actor actor) {
            // fa is not used. See below why
            this.actor = actor;
            this.rating = film.getRating().get();
        }

        public Actor actor() {
            return actor;
        }

        public String rating() {
            return rating;
        }
    }
When Join objects are built using the build() method, we can provide a custom constructor that we want to apply to the incoming entities from the database. This is a feature that we are going to use, as depicted below:
    Join<ActorRating> join = joinComponent
        .from(FilmActorManager.IDENTIFIER)
        .innerJoinOn(Film.FILM_ID).equal(FilmActor.FILM_ID)
        .innerJoinOn(Actor.ACTOR_ID).equal(FilmActor.ACTOR_ID)
        .build(ActorRating::new); // Use a custom constructor

    Map<Actor, Map<String, Long>> pivot = join.stream()
        .collect(
            groupingBy(
                ActorRating::actor,
                groupingBy(
                    ActorRating::rating,
                    counting()
                )
            )
        );

In this example, we provided a class with a constructor (the method reference ActorRating::new gets resolved to new ActorRating(fa, film, actor)) that just discards the linking FilmActor object altogether. The class also provided better names for its properties, which made the code more readable.
The solution with the custom ActorRating class will produce exactly the same output as the first example, but it looks much nicer when used. I think writing a custom tuple is worth the extra effort over using generic Tuples in most cases.
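If it is not obvious how a constructor reference can stand in for a three-argument function, the sketch below shows the mechanism in plain Java. The TriFunction interface is a hypothetical one I define here (the JDK only ships Function and BiFunction), and it is not claimed to be the type that Speedment's build() actually accepts. The entity records are simplified stand-ins as well, with the rating as a plain String.

```java
class ConstructorRefSketch {

    // Hypothetical three-argument function type, defined only for this sketch.
    @FunctionalInterface
    interface TriFunction<A, B, C, R> {
        R apply(A a, B b, C c);
    }

    // Simplified stand-ins for the entities (not the Speedment classes).
    record FilmActor(int actorId, int filmId) {}
    record Film(int filmId, String rating) {}
    record Actor(int actorId, String name) {}

    static final class ActorRating {

        private final Actor actor;
        private final String rating;

        ActorRating(FilmActor fa, Film film, Actor actor) {
            // fa only links the other two entities, so it is dropped here.
            this.actor = actor;
            this.rating = film.rating();
        }

        Actor actor() {
            return actor;
        }

        String rating() {
            return rating;
        }
    }

    static ActorRating build() {
        // The constructor reference matches the (FilmActor, Film, Actor) signature.
        // It is equivalent to: (fa, film, actor) -> new ActorRating(fa, film, actor)
        TriFunction<FilmActor, Film, Actor, ActorRating> constructor = ActorRating::new;
        return constructor.apply(
            new FilmActor(1, 10),
            new Film(10, "PG-13"),
            new Actor(1, "JOHN DOE"));
    }

    public static void main(String[] args) {
        ActorRating ar = build();
        System.out.println(ar.actor().name() + " / " + ar.rating());
    }
}
```

The argument order of the constructor has to follow the order of the joined tables, which is why the unused FilmActor parameter is still declared.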
Using Parallel Pivoting
One cool thing with Speedment is that it supports the Stream method parallel() out-of-the-box. So, if you have a server with many CPUs, you can take advantage of all those CPU cores when running database queries and joins. This is how parallel pivoting would look:
    Map<Actor, Map<String, Long>> pivot = join.stream()
        .parallel() // Make our Stream parallel
        .collect(
            groupingBy(
                ActorRating::actor,
                groupingBy(
                    ActorRating::rating,
                    counting()
                )
            )
        );

We only have to add a single line of code to get parallel aggregation. The default parallel split strategy kicks in when we reach 1024 elements. Thus, parallel pivoting will only take place on tables or joins larger than this. It should be noted that the Sakila database only contains 1000 films, so we would have to run the code on a bigger database to actually benefit from parallelism.
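A quick way to convince yourself that going parallel does not change the pivot result is to run the same collector over a sequential and a parallel stream and compare the maps. The sketch below does this on synthetic in-memory rows that I generate for the purpose. It uses plain JDK streams, so it says nothing about Speedment's own split strategy or about actual speedups.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

import static java.util.stream.Collectors.counting;
import static java.util.stream.Collectors.groupingBy;

class ParallelPivotSketch {

    // Hypothetical flattened join row, identified by actor id and rating.
    record Row(int actorId, String rating) {}

    static final String[] RATINGS = {"G", "PG", "PG-13", "R", "NC-17"};

    // Synthetic rows: 200 actors, ratings cycling per block of 200 rows.
    static List<Row> rows(int n) {
        return IntStream.range(0, n)
            .mapToObj(i -> new Row(i % 200, RATINGS[(i / 200) % RATINGS.length]))
            .collect(Collectors.toList());
    }

    // Same nested collector for both modes; only the stream kind differs.
    static Map<Integer, Map<String, Long>> pivot(List<Row> rows, boolean parallel) {
        Stream<Row> stream = parallel ? rows.parallelStream() : rows.stream();
        return stream.collect(
            groupingBy(
                Row::actorId,
                groupingBy(
                    Row::rating,
                    counting()
                )
            )
        );
    }

    public static void main(String[] args) {
        List<Row> rows = rows(10_000);
        Map<Integer, Map<String, Long>> sequential = pivot(rows, false);
        Map<Integer, Map<String, Long>> parallel = pivot(rows, true);
        System.out.println("Same result: " + sequential.equals(parallel));
    }
}
```

The collectors merge the partial maps produced by each worker thread, so no extra synchronization is needed in our own code.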
Take it for a Spin!
In this article, we have shown how you can compute pivot data from a database in Java without writing a single line of SQL code. Visit Speedment open-source on GitHub to learn more. Read more about other features in the User's Guide.