Tuesday, November 12, 2019

Become a Master of Java Streams - Part 5: Turn Joined Database Tables into a Stream

Is it possible to turn joined database tables into a Java Stream? The answer is yes. Since we got this question so many times, we decided to throw in another hands-on-lab article explaining how to perform more advanced Stream Joins. So here you are, the fifth article out of six, complemented by a GitHub repository containing instructions and exercises to each unit.

Part 1: Creating Streams
Part 2: Intermediate Operations
Part 3: Terminal Operations
Part 4: Database Streams
Part 5: Turn Joined Database Tables into Streams
Part 6: Creating a Database Application Using Streams

Stream JOINs

In the last article, we pointed out the great resemblance between Streams and SQL constructs. Although, the SQL operation JOIN lacks a natural mapping in the general case. Therefore, Speedment leverages its own JoinComponent to join up to 10 tables (using INNER JOIN, RIGHT JOIN, LEFT JOIN or CROSS JOIN) in a type-safe way. Before we introduce the JoinComponent in more depth, we will elaborate on the similarities between individual tables and joins.

We previously used a Speedment Manager as a handle to a database table. This process is visualized below:

A Manager acts as a handle to a database table and can act as a stream source. In this case, every row corresponds to an instance of Film.

Now that we wish to retrieve data from multiple tables, the Manager on its own is not sufficient. An SQL JOIN-query outputs a virtual table that combines data from multiple tables in different ways (e.g. depending on the join-type and WHERE-clauses). In Speedment, that virtual table is represented as a Join<T> object holding tuples of type T.

Join Component

To retrieve a Join-object we need the previously mentioned JoinComponent which uses a builder pattern. The resulting Join-objects are reusable and acts as handles to "virtual join tables", as described by this image:



The JoinComponent creates a Join-object which acts as a handle to a virtual table (the result of the join) and can act as a stream source. In this case, every row corresponds to an instance of Tuple2.
Now that we have introduced the notion of the JoinComponent we can start demonstrating how it is used.


Many-to-one

We start by looking at a Many-to-One relationship where multiple rows from a first table can match the same single row in a second table. For example, a single language may be used in many films. We can combine the two tables Film and Language using the JoinCompontent:
Join<Tuple2<Film, Language>> join = joinComponent
    .from(FilmManager.IDENTIFIER)
    .innerJoinOn(Language.LANGUAGE_ID).equal(Film.LANGUAGE_ID)
    .build(Tuples::of);

Basically, we start with the Film table and perform an INNER JOIN with the Language table on rows that have matching language_id:s.

We can then use the Join-object to stream over the resulting Tuples and print them all out for display. As always with Streams, no specific order of the elements is guaranteed even if the same join-element is reused.
 
join.stream()
    .forEach(System.out::println);

Tuple2Impl {FilmImpl { filmId = 1, title = ACADEMY DINOSAUR, ... }, LanguageImpl { 
languageId = 1, name = English, ... }}

Tuple2Impl {FilmImpl { filmId = 2, title = ACE GOLDFINGER, ... }, LanguageImpl {
languageId = 1, name = English, ... }}

Tuple2Impl {FilmImpl { filmId = 3, title = ADAPTATION HOLES, ... }, LanguageImpl {
languageId = 1, name = English, ... }}
…

Many-to-Many

A Many-to-Many relationship is defined as a relationship between two tables where many multiple rows from a first table can match multiple rows in a second table. Often a third table is used to form these relations. For example, an actor may participate in several films and a film usually have several actors.

The relation between films and actors in Sakila is described by the FilmActor table which references films and actors using foreign keys. Hence, if we would like to relate each Film entry to the actors who starred in that movie we need to join all three tables:
 
Join<Tuple3<FilmActor, Film, Actor>> join = joinComponent
    .from(FilmActorManager.IDENTIFIER)
    .innerJoinOn(Film.FILM_ID).equal(FilmActor.FILM_ID)
    .innerJoinOn(Actor.ACTOR_ID).equal(FilmActor.ACTOR_ID)
    .build(Tuples::of);

We start with the table describing the relation between the Film and Actor and perform and INNER JOIN with both Film and Actor on matching FILM_ID:s and ACTOR_ID:s respectively.


Collect Join Stream to Map

Our Join-object can now be used to create a Map that correlates a Film with a List of the starring Actor:s. Since the elements of our stream are Tuples we need to point to the desired entries. This is done using zero-indexed getters (get0() referencing FilmActor and so on).
 
Map<Film, List<Actor>> actorsInFilms = join.stream()
    .collect(
        groupingBy(Tuple3::get1,           
            mapping(Tuple3::get2, toList())   
        )
    );
Lastly we print the entries to display the name of the films and actors.
 
actorsInFilms.forEach((f, al) ->
    System.out.format("%s : %s%n",
        f.getTitle(),
        al.stream()
            .sorted(Actor.LAST_NAME)
            .map(a -> a.getFirstName() + " " + a.getLastName())
            .collect(joining(", ")
        )
     )
);
WONDERLAND CHRISTMAS : HARRISON BALE, CHRIS BRIDGES, HUMPHREY GARLAND, WOODY JOLIE, CUBA OLIVIER
BUBBLE GROSSE : VIVIEN BASINGER, ROCK DUKAKIS, MENA HOPPER
OPUS ICE : DARYL CRAWFORD, JULIA FAWCETT, HUMPHREY GARLAND, SEAN WILLIAMS
…

Filtering Tables

If we know initially that we are only interested in a subset of the Film entries, it is more efficient to get rid of these instances as we define the Join-object. This is done using the .where()-operator which is the equivalent to a filter() on a stream (and maps to the SQL keyword WHERE). As a filter it takes a Predicate that evaluates to true or false and should be expressed using Speedment Fields for optimization. Here we want to find the language of the films with titles beginning with an “A”:
 
Join<Tuple2<Film, Language>> join = joinComponent
    .from(FilmManager.IDENTIFIER)
        .where(Film.TITLE.startsWith(“A”))
    .innerJoinOn(Language.LANGUAGE_ID).equal(Film.LANGUAGE_ID)
    .build(Tuples::of);

If further filtering is needed, it is possible to stack any number of .where()-operations as they are combined with the SQL keyword AND under the hood.

Specialized Constructors

Sofar we have had to deal with the fairly abstract getters of the tuples (get0, get1 and so on). Although, upon building our Join-object we can provide any constructor to specialized objects. In the examples shown above, we have been interested in the title of the films and the name of the actors. That allows us to define our own object TitleActorName as such:
 
final class TitleActorName {
    
    private final String title;
    private final String actorName;

    TitleActorName(Film film, Actor actor) {
       this.title = film.getTitle();
       this.actorName = actor.getFirstName() + actor.getLastName();
    }
    public String title() {
        return title;
    }
    public String actorName() {
        return actorName;
    }
    @Override
    public String toString() {
        return "TitleLanguageName{" + "title=" + title + ", actorName=" + actorName + '}';
    }
}
We then provide the constructor of our own object to the Join builder and discard the linking FilmActor instance since it’s not used:

 
Join<TitleActorName> join = joinComponent
    .from(FilmActorManager.IDENTIFIER)
    .innerJoinOn(Film.FILM_ID).equal(FilmActor.FILM_ID)
    .innerJoinOn(Actor.ACTOR_ID).equal(FilmActor.ACTOR_ID)
    .build((fa, f, a) -> new TitleActorName(f, a));

This greatly improves the readability of any operations involving the resulting Join-object.

 
Map<String, List<String>> actorsInFilms = join.stream()
    .collect(
        groupingBy(TitleActorName::title,
            mapping(TitleActorName::actorName, toList())
        )
    );
    actorsInFilms.forEach((f, al) ->
        System.out.format("%s : %s%n", f, al)
    );

Simplifying Types

When a large number of tables are joined, the Java type can be tedious to write (e.g. Tuple5<...>). If you use a more recent version of Java, you can simply omit the type for the local variable like this:

var join = joinComponent
    .from(FilmManager.IDENTIFIER)
        .where(Film.TITLE.startsWith(“A”))
    .innerJoinOn(Language.LANGUAGE_ID).equal(Film.LANGUAGE_ID)
    .build(Tuples::of);
In this case, Java will automatically infer the type to Join<Tuple2<Film, Language>>

If you are using an older Java version, you can inline the join-declaration and the stream operator like this:

joinComponent
    .from(FilmManager.IDENTIFIER)
        .where(Film.TITLE.startsWith(“A”))
    .innerJoinOn(Language.LANGUAGE_ID).equal(Film.LANGUAGE_ID)
    .build(Tuples::of)
    .stream()
    .forEach(System.out::println);

Exercises

This week’s exercises will require combined knowledge from all the previous units and therefore acts as a great follow-up on the previous modules. There is still a connection to an instance of the Sakila database in the cloud so no setup of Speedment is needed. As usual, the exercises can be located in this GitHub repo. The content of this article is sufficient to solve the fifth unit which is called MyUnit5Extra. The corresponding Unit5Extra interface contains JavaDocs which describe the intended implementation of the methods in MyUnit5Extra.
 
public interface Unit5Extra {
/**
 * Creates and returns a new Map with Actors as keys and
 * a List of Films in which they appear as values.
 * <p>
 * The result might look like this:
 *
 * ActorImpl { actorId = 126, firstName = FRANCES, lastName = TOMEI, ... }=[FilmImpl { filmId = 21, title = AMERICAN CIRCUS, ...}, ...]
 * …
 *
 * @param joinComponent for data input
 * @return a new Map with Actors as keys and
 *         a List of Films in which they appear as values
 */
Map<Actor, List<Film>> filmographies(JoinComponent joinComponent);

The provided tests (e.g. Unit5ExtraTest) will act as an automatic grading tool, letting you know if your solution was correct or not.


Next Article

By now we hopefully managed to demonstrate how neat the Stream API is for database queries. The next article will move beyond the realm of movie rentals and allow you to write standalone database applications in pure Java for any data source. Happy coding!

Authors

Per Minborg
Julia Gustafsson

Resources

GitHub Opensource Project Speedment
Speedment Stream ORM Initializer
GitHub Repository "hol-streams"
Article Part 1: Creating Streams
Article Part 2: Intermediate Operations
Article Part 3: Terminal Operations

3 comments:

  1. Hello, The link for the part 4 at the beginning of the article is wrong and point the part3.

    ReplyDelete
    Replies
    1. Hi and thanks for the heads up on the link . I have corrected the link now. Thanks again!

      Delete
  2. Part 4 - https://dzone.com/articles/java-streams-database-streams

    ReplyDelete

Note: Only a member of this blog may post a comment.