Wednesday, October 30, 2019

Become a Master of Java Streams - Part 4: Database Streams

SQL has always been a declarative language whereas Java for a long time has been imperative. Java streams have changed the game. Code your way through this hands-on-lab article and learn how Java streams can be used to perform declarative queries to an RDBMS database, without writing a single line of SQL code. You will discover, there is a remarkable similarity between the verbs of Java streams and SQL commands.

This article is the fourth out of five, complemented by a GitHub repository containing instructions and exercises to each unit.
Part 1: Creating Streams
Part 2: Intermediate Operations
Part 3: Terminal Operations
Part 4: Database Streams
Part 5: Creating a Database Application Using Streams

Database Streams 

When you familiarized yourself with the operations of Streams, you may have noticed a resemblance to the SQL constructs. Some of them have a more or less a direct mapping to Stream operations, such as LIMIT and COUNT. This resemblance is utilized by the open-source project Speedment to provide type-safe access to any relational database using pure Java.



This table shows how Speedment maps between SQL and Java Streams.

We are contributors to the Speedment open-source project and we will describe how Speedment allows us to use a database as the stream source and feed the pipeline with rows from any of the database tables.



As depicted in the visualization above, Speedment will establish a connection to the database and can then pass data to the application. There is no need to write any code for the database entries since Speedment analyses the underlying database and automatically generates all the required entity classes for the domain model. It saves a lot of time when you don’t have to write and maintain entity classes by hand for each table you want to use.

Sakila Database 

For the sake of this article, as well as the exercises, we use the MySQL example database Sakila as our data source. The Sakila database models an old-fashioned movie rentals business and therefore contains tables such as Film and Actor. An instance of the database is deployed in the cloud and is open for public access.

Speedment Manager 

In Speedment, the handle to a database table is a called a Manager. The managers are part of the automatically generated code.












A Manager acts as a handle to a database table and can act as a stream source. In this case, every row corresponds to an instance of Film. 


A Manager in Speedment is instantiated by calling:

FilmManager films = speedment.getOrThrow(FilmManager.class);

Note: speedment is an instance that can be obtained from an ApplicationBuilder (more on this topic in the next article).

If the FilmManager::stream is called, the result is a Stream to which we are free to apply any intermediate or terminal operations. For starters, we collect all rows in a list.
 
List<Film> allFilms = films.stream().collect(toList());
FilmImpl { filmId = 1, title = ACADEMY DINOSAUR, …
FilmImpl { filmId = 2, title = ACE GOLDFINGER, …
FilmImpl { filmId = 3, title = ADAPTATION HOLES, …
…

Filtering and Counting

Let’s look at a simple example that outputs the number of films having the rating “PG-13”. Just like a regular Stream, we can filter out the films with the correct rating, and then count these entries.
 
long pg13FilmCount = films.stream()
   .filter(Film.RATING.equal("PG-13"))
   .count();

pg13FilmCount: 195

One important property that follows with Speedment’s custom implementation of Streams is that the streams are able to optimize their own pipeline by introspection. It may look like the Stream will iterate over all rows of a table, but this is not the case. Instead, Speedment is able to translate the pipeline to an optimized SQL query that is passed on to the database. This means only relevant database entries are pulled into the Stream. Thus, in the example above, the stream will be automatically rendered to SQL similar to “SELECT … FROM film WHERE rating = ‘PG-13’ ”

This introspection requires that any use of anonymous lambdas (which do not contain any metadata that relates to the targeted column) are replaced with Predicates from Speedment Fields. In this case Film.RATING.equal(“PG-13”) returns a Predicate that will be tested on each Film and return true if and only if that Film has a Rating that is PG-13.

Although, this does not prevent us from expressing the predicate as:
 
    .filter(f -> f.getRating().equals(“PG-13”))

but this would force Speedment to fetch all the rows in the table and then apply the predicate, hence it is not recommended.

Finding the Longest Film

Here is an example that finds the longest film in the database using the max-operator with the Field Film.LENGTH:
 
Optional<Film> longestFilm = films.stream()
   .max(Film.LENGTH);

longestFilm: 
Optional[FilmImpl {filmId = 141, title = CHICAGO NORTH, length = 185, ...}]

Finding Three Short Films

Locating three short films (we defined short as <= 50 minutes) can be done by filtering away any films that are 50 minutes or shorter and picking the three first results. The predicate in the example looks at the value of the column “length” and determines if it is less than or equal to 50.

List<Film> threeShortFilms = films.stream()
 .filter(Film.LENGTH.lessOrEqual(50))
 .limit(3)
 .collect(toList());
threeShortFilms: [
    FilmImpl { filmId = 2, length = 48,..}, 
    FilmImpl { filmId = 3, length = 50, … }, 
    FilmImpl { filmId = 15, length = 46, ...}]

Pagination with Sorting

If we were to display all the films on a website or in an application, we would probably prefer to paginate the items, rather than loading (possibly) thousands of entries at once. This can be accomplished by combining the operation skip() and limit(). In the example below, we collect the content of the second page, assuming every “page” holds 25 entries. Recall that Streams do not guarantee a certain order of the elements, which means that we need to define an order with the sorted-operator for this to work as intended.
 
List<Film> filmsSortedByLengthPage2 = films.stream()
 .sorted(Film.LENGTH)
 .skip(25 * 1)
 .limit(25)
 .collect(toList());
filmsSortedByLengthPage2: 
[FilmImpl { filmId = 430, length = 49, …}, …]

Note: Finding the content of the n:th page is done by skipping (25 * (n-1)).
Note2: This stream will be automatically rendered to something like “SELECT ... FROM film ORDER BY length ASC LIMIT ? OFFSET ?, values:[25, 25]”

Films Starting with “A” Sorted by Length

We can easily locate any films starting with the capital letter “A” and sort them according to their length (with the shortest film first) like this:
 
List<Film> filmsTitleStartsWithA = films.stream()
 .filter(Film.TITLE.startsWith("A"))
 .sorted(Film.LENGTH)
 .collect(Collectors.toList());
filmsTitleStartsWithA: [
  FilmImpl { filmId=15, title=ALIEN CENTER, …, rating=NC-17, length = 46,
  FilmImpl { filmId=2, title=ACE GOLDFINGER, …, rating=G, length = 48,
… ]

Computing Frequency Tables of Film Length

We can also utilize the groupingBy-operator to sort the films in buckets depending on their lengths and count the total number of films in each bucket. This will create a so-called frequency table of film length.
 
Map<Short, Long> frequencyTableOfLength = films.stream()
 .collect(Collectors.groupingBy(
  Film.LENGTH.asShort(),
  counting()
 ));
frequencyTableOfLength: {46=5, 47=7, 48=11, 49=5, … }

Exercises

For this week’s exercises, you do not need to worry about connecting a database of your own. Instead, we have already provided a connection to an instance of the Sakila database in the cloud. As usual, the exercises can be located in this GitHub repo. The content of this article is sufficient to solve the fourth unit which is called MyUnit4Database. The corresponding Unit4Database Interface contains JavaDocs which describe the intended implementation of the methods in MyUnit4Database.

 
public interface Unit4Database {

   /**
    * Returns the total number of films in the database.
    *
    * @param films manager of film entities
    * @return the total number of films in the database
    */
   long countAllFilms(FilmManager films);
The provided tests (e.g. Unit4MyDatabaseTests) will act as an automatic grading tool, letting you know if your solution was correct or not.

Next Article

So far, we have only scraped the surface of database streams. The next article will allow you to write standalone database applications in pure Java. Happy coding!

Authors

Per Minborg
Julia Gustafsson

Resources

GitHub Opensource Project Speedment
Speedment Stream ORM Initializer
GitHub Repository "hol-streams"
Article Part 1: Creating Streams
Article Part 2: Intermediate Operations
Article Part 3: Terminal Operations

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.