Minborg

Minborg
Minborg

Friday, November 15, 2019

Become a Master of Java Streams - Part 6: Creating a New Database Application Using Streams

Have you ever wanted to develop an "express" version of your database application? In this Hands-On Lab article, you will learn a truly easy and straightforward method. The entire Java domain model will be automatically generated for you. You just connect to your existing database and then start developing using Java streams. You will be able to create, for example, a new web application for your existing database in minutes.

This article is the last article in the series on How to Become a Master of Java Streams.

Part 1: Creating Streams
Part 2: Intermediate Operations
Part 3: Terminal Operations
Part 4: Database Streams
Part 5: Turn Joined Database Tables Into a Stream
Part 6: Creating a Database Application Using Streams

So far, you got to experience Speedment in the articles and through the exercises. For brevity, we did not include any descriptions on how to start from scratch but rather wanted you to get a glimpse of what using Java Streams with databases could look like. In this article, we’ll show you how to leverage Speedment for applications running against any of your databases. Setup only takes a few minutes but will save you vasts amounts of time due to the expressiveness of Streams and the provided type-safety.

Getting Started 

To help you configure your project, Speedment provides a project Initializer. Once you fill out the details of your project, it provides you with a zip-file containing a pom.xml with the needed dependencies and a Main.java starter.



The Speedment Initializer can be used to configure a Speedment project. 

Once you have clicked “download”, unzip the file and open the project in your IDE as a Maven project. In IntelliJ, the easiest way to do that is to choose File -> Open and then select the pom.xml-file in the unzipped project folder.

If you rather want to use Speedment in an existing project, configure your project via the Initializer to make sure you get all needed dependencies. Then simply merge the provided pom.xml with your current one and reimport Maven.

As you may recall from the previous articles, Speedment relies on an automatically generated Java domain model. Hence, before we can write our application, we need to generate the required classes. This is done using the Speedment Tool which is started by running mvn speedment:tool in the terminal or by running the same target via the IDE:s built-in Maven menu.

Firstly, you will be asked to register for a free license and connect to your database. A free license can be used for all open-source databases (unlimited use) and commercial databases (up to 500 MB and doesn’t require any billing information).
A free license can be used with all open-source databases (unlimited) and commercial databases (up to 500 MB and does not require billing information.) 

Once you complete the registration, you will be asked to provide credentials for your database (make sure you selected the correct DB-type in the initializer). Either use a local database of your own or run some tests with the Sakila database we used in the exercises.

Sakila Database Credentials 
Type: MariaDB
Host: 35.203.190.83
Port: 3306
Database name: sakila
User: sakila
Password: sakila
Fill out the database credentials to connect to your data source. (Note: Speedment never stores your database password). 

 A click on the “Connect”-button will launch the Speedment Tool. It presents the database structure to the left-hand side and settings for the selected table or column on the right-hand side. In this case, the default settings are sufficient meaning we can go ahead and press “Generate” (If your application doesn’t require all the tables and/or columns you can disable these before generating).

The Speedment Tool visualizes the data structure and allows customizations of the generated code.

Next, Speedment will analyze the database metadata and generate the entire Java domain model. Once this process is completed you are ready to write your application. If you check out the Main.java-file, you will find a project starter containing something like this:

public class Main {
    
    public static void main(final String... args) {

        Speedment app = new MyApplicationBuilder()
            .withUsername("your-dbms-username")
            .withPassword("your-dbms-password")
            .build();

        app.stop();

    }


}

From here, you are ready to build your application using the examples we have provided in the previous articles. Thereby, we can close the circle by fetching a Manager for the Film table (a handle to the content of the film table) by typing:

FilmManager films = app.getOrThrow(FilmManager.class);

Using the Manager we can now query our connected database as we have shown: 



List<Film> filmsTitleStartsWithA = films.stream()
  .filter(Film.TITLE.startsWith("A"))
  .sorted(Film.LENGTH)
  .collect(Collectors.toList());
 
filmsTitleStartsWithA: [
   FilmImpl { filmId=15, title=ALIEN CENTER, …, rating=NC-17, length = 46,
   FilmImpl { filmId=2, title=ACE GOLDFINGER, …, rating=G, length = 48,
… ]

Exercises 

This week there is no associated GitHub repo for you to play with. Instead, we encourage you to integrate Speedment in a new or an existing database application to try out your newly acquired skills.


Extra Exercise

When you are ready with your project, we encourage you to try out HyperStream, especially if you have a large database and want to increase the reading performance.

HyperStream goes beyond Stream and adds in-JVM-memory capabilities which boost application speed by orders of magnitude. You only need to add a few lines of code in your existing pom.xml and your Main.java file:

    .withBundle(InMemoryBundle.class) // add to the app builder

    ...

    // Load data from database into materialized view
    app.getOrThrow(DataStoreComponent.class) .load();

Read more in the user-guide. The Stream API remains the same but performance is vastly increased.

Conclusion

During the past six weeks, we have demonstrated the usefulness of the Java Stream API and how it can be leveraged for writing type-safe database applications in pure Java. If you wish to learn more about Speedment, check out the user guide which also contains a more thorough guide on Java Streams.

Lastly - thank you for taking interest in our article series, it has been truly great to see that many of you have been following along with the provided exercises. Happy coding!

Authors

Per Minborg
Julia Gustafsson

Resources

Further reading about Speedment Stream JOINs 
Speedment Manual 
Speedment Initializer 
Speedment on GitHub

Tuesday, November 12, 2019

Become a Master of Java Streams - Part 5: Turn Joined Database Tables into a Stream

Is it possible to turn joined database tables into a Java Stream? The answer is yes. Since we got this question so many times, we decided to throw in another hands-on-lab article explaining how to perform more advanced Stream Joins. So here you are, the fifth article out of six, complemented by a GitHub repository containing instructions and exercises to each unit.

Part 1: Creating Streams
Part 2: Intermediate Operations
Part 3: Terminal Operations
Part 4: Database Streams
Part 5: Turn Joined Database Tables into Streams
Part 6: Creating a Database Application Using Streams

Stream JOINs

In the last article, we pointed out the great resemblance between Streams and SQL constructs. Although, the SQL operation JOIN lacks a natural mapping in the general case. Therefore, Speedment leverages its own JoinComponent to join up to 10 tables (using INNER JOIN, RIGHT JOIN, LEFT JOIN or CROSS JOIN) in a type-safe way. Before we introduce the JoinComponent in more depth, we will elaborate on the similarities between individual tables and joins.

We previously used a Speedment Manager as a handle to a database table. This process is visualized below:

A Manager acts as a handle to a database table and can act as a stream source. In this case, every row corresponds to an instance of Film.

Now that we wish to retrieve data from multiple tables, the Manager on its own is not sufficient. An SQL JOIN-query outputs a virtual table that combines data from multiple tables in different ways (e.g. depending on the join-type and WHERE-clauses). In Speedment, that virtual table is represented as a Join<T> object holding tuples of type T.

Join Component

To retrieve a Join-object we need the previously mentioned JoinComponent which uses a builder pattern. The resulting Join-objects are reusable and acts as handles to "virtual join tables", as described by this image:



The JoinComponent creates a Join-object which acts as a handle to a virtual table (the result of the join) and can act as a stream source. In this case, every row corresponds to an instance of Tuple2.
Now that we have introduced the notion of the JoinComponent we can start demonstrating how it is used.


Many-to-one

We start by looking at a Many-to-One relationship where multiple rows from a first table can match the same single row in a second table. For example, a single language may be used in many films. We can combine the two tables Film and Language using the JoinCompontent:
Join<Tuple2<Film, Language>> join = joinComponent
    .from(FilmManager.IDENTIFIER)
    .innerJoinOn(Language.LANGUAGE_ID).equal(Film.LANGUAGE_ID)
    .build(Tuples::of);

Basically, we start with the Film table and perform an INNER JOIN with the Language table on rows that have matching language_id:s.

We can then use the Join-object to stream over the resulting Tuples and print them all out for display. As always with Streams, no specific order of the elements is guaranteed even if the same join-element is reused.
 
join.stream()
    .forEach(System.out::println);

Tuple2Impl {FilmImpl { filmId = 1, title = ACADEMY DINOSAUR, ... }, LanguageImpl { 
languageId = 1, name = English, ... }}

Tuple2Impl {FilmImpl { filmId = 2, title = ACE GOLDFINGER, ... }, LanguageImpl {
languageId = 1, name = English, ... }}

Tuple2Impl {FilmImpl { filmId = 3, title = ADAPTATION HOLES, ... }, LanguageImpl {
languageId = 1, name = English, ... }}
…

Many-to-Many

A Many-to-Many relationship is defined as a relationship between two tables where many multiple rows from a first table can match multiple rows in a second table. Often a third table is used to form these relations. For example, an actor may participate in several films and a film usually have several actors.

The relation between films and actors in Sakila is described by the FilmActor table which references films and actors using foreign keys. Hence, if we would like to relate each Film entry to the actors who starred in that movie we need to join all three tables:
 
Join<Tuple3<FilmActor, Film, Actor>> join = joinComponent
    .from(FilmActorManager.IDENTIFIER)
    .innerJoinOn(Film.FILM_ID).equal(FilmActor.FILM_ID)
    .innerJoinOn(Actor.ACTOR_ID).equal(FilmActor.ACTOR_ID)
    .build(Tuples::of);

We start with the table describing the relation between the Film and Actor and perform and INNER JOIN with both Film and Actor on matching FILM_ID:s and ACTOR_ID:s respectively.


Collect Join Stream to Map

Our Join-object can now be used to create a Map that correlates a Film with a List of the starring Actor:s. Since the elements of our stream are Tuples we need to point to the desired entries. This is done using zero-indexed getters (get0() referencing FilmActor and so on).
 
Map<Film, List<Actor>> actorsInFilms = join.stream()
    .collect(
        groupingBy(Tuple3::get1,           
            mapping(Tuple3::get2, toList())   
        )
    );
Lastly we print the entries to display the name of the films and actors.
 
actorsInFilms.forEach((f, al) ->
    System.out.format("%s : %s%n",
        f.getTitle(),
        al.stream()
            .sorted(Actor.LAST_NAME)
            .map(a -> a.getFirstName() + " " + a.getLastName())
            .collect(joining(", ")
        )
     )
);
WONDERLAND CHRISTMAS : HARRISON BALE, CHRIS BRIDGES, HUMPHREY GARLAND, WOODY JOLIE, CUBA OLIVIER
BUBBLE GROSSE : VIVIEN BASINGER, ROCK DUKAKIS, MENA HOPPER
OPUS ICE : DARYL CRAWFORD, JULIA FAWCETT, HUMPHREY GARLAND, SEAN WILLIAMS
…

Filtering Tables

If we know initially that we are only interested in a subset of the Film entries, it is more efficient to get rid of these instances as we define the Join-object. This is done using the .where()-operator which is the equivalent to a filter() on a stream (and maps to the SQL keyword WHERE). As a filter it takes a Predicate that evaluates to true or false and should be expressed using Speedment Fields for optimization. Here we want to find the language of the films with titles beginning with an “A”:
 
Join<Tuple2<Film, Language>> join = joinComponent
    .from(FilmManager.IDENTIFIER)
        .where(Film.TITLE.startsWith(“A”))
    .innerJoinOn(Language.LANGUAGE_ID).equal(Film.LANGUAGE_ID)
    .build(Tuples::of);

If further filtering is needed, it is possible to stack any number of .where()-operations as they are combined with the SQL keyword AND under the hood.

Specialized Constructors

Sofar we have had to deal with the fairly abstract getters of the tuples (get0, get1 and so on). Although, upon building our Join-object we can provide any constructor to specialized objects. In the examples shown above, we have been interested in the title of the films and the name of the actors. That allows us to define our own object TitleActorName as such:
 
final class TitleActorName {
    
    private final String title;
    private final String actorName;

    TitleActorName(Film film, Actor actor) {
       this.title = film.getTitle();
       this.actorName = actor.getFirstName() + actor.getLastName();
    }
    public String title() {
        return title;
    }
    public String actorName() {
        return actorName;
    }
    @Override
    public String toString() {
        return "TitleLanguageName{" + "title=" + title + ", actorName=" + actorName + '}';
    }
}
We then provide the constructor of our own object to the Join builder and discard the linking FilmActor instance since it’s not used:

 
Join<TitleActorName> join = joinComponent
    .from(FilmActorManager.IDENTIFIER)
    .innerJoinOn(Film.FILM_ID).equal(FilmActor.FILM_ID)
    .innerJoinOn(Actor.ACTOR_ID).equal(FilmActor.ACTOR_ID)
    .build((fa, f, a) -> new TitleActorName(f, a));

This greatly improves the readability of any operations involving the resulting Join-object.

 
Map<String, List<String>> actorsInFilms = join.stream()
    .collect(
        groupingBy(TitleActorName::title,
            mapping(TitleActorName::actorName, toList())
        )
    );
    actorsInFilms.forEach((f, al) ->
        System.out.format("%s : %s%n", f, al)
    );

Simplifying Types

When a large number of tables are joined, the Java type can be tedious to write (e.g. Tuple5<...>). If you use a more recent version of Java, you can simply omit the type for the local variable like this:

var join = joinComponent
    .from(FilmManager.IDENTIFIER)
        .where(Film.TITLE.startsWith(“A”))
    .innerJoinOn(Language.LANGUAGE_ID).equal(Film.LANGUAGE_ID)
    .build(Tuples::of);
In this case, Java will automatically infer the type to Join<Tuple2<Film, Language>>

If you are using an older Java version, you can inline the join-declaration and the stream operator like this:

joinComponent
    .from(FilmManager.IDENTIFIER)
        .where(Film.TITLE.startsWith(“A”))
    .innerJoinOn(Language.LANGUAGE_ID).equal(Film.LANGUAGE_ID)
    .build(Tuples::of)
    .stream()
    .forEach(System.out::println);

Exercises

This week’s exercises will require combined knowledge from all the previous units and therefore acts as a great follow-up on the previous modules. There is still a connection to an instance of the Sakila database in the cloud so no setup of Speedment is needed. As usual, the exercises can be located in this GitHub repo. The content of this article is sufficient to solve the fifth unit which is called MyUnit5Extra. The corresponding Unit5Extra interface contains JavaDocs which describe the intended implementation of the methods in MyUnit5Extra.
 
public interface Unit5Extra {
/**
 * Creates and returns a new Map with Actors as keys and
 * a List of Films in which they appear as values.
 * <p>
 * The result might look like this:
 *
 * ActorImpl { actorId = 126, firstName = FRANCES, lastName = TOMEI, ... }=[FilmImpl { filmId = 21, title = AMERICAN CIRCUS, ...}, ...]
 * …
 *
 * @param joinComponent for data input
 * @return a new Map with Actors as keys and
 *         a List of Films in which they appear as values
 */
Map<Actor, List<Film>> filmographies(JoinComponent joinComponent);

The provided tests (e.g. Unit5ExtraTest) will act as an automatic grading tool, letting you know if your solution was correct or not.


Next Article

By now we hopefully managed to demonstrate how neat the Stream API is for database queries. The next article will move beyond the realm of movie rentals and allow you to write standalone database applications in pure Java for any data source. Happy coding!

Authors

Per Minborg
Julia Gustafsson

Resources

GitHub Opensource Project Speedment
Speedment Stream ORM Initializer
GitHub Repository "hol-streams"
Article Part 1: Creating Streams
Article Part 2: Intermediate Operations
Article Part 3: Terminal Operations

Wednesday, October 30, 2019

Become a Master of Java Streams - Part 4: Database Streams

SQL has always been a declarative language whereas Java for a long time has been imperative. Java streams have changed the game. Code your way through this hands-on-lab article and learn how Java streams can be used to perform declarative queries to an RDBMS database, without writing a single line of SQL code. You will discover, there is a remarkable similarity between the verbs of Java streams and SQL commands.

This article is the fourth out of five, complemented by a GitHub repository containing instructions and exercises to each unit.
Part 1: Creating Streams
Part 2: Intermediate Operations
Part 3: Terminal Operations
Part 4: Database Streams
Part 5: Creating a Database Application Using Streams

Database Streams 

When you familiarized yourself with the operations of Streams, you may have noticed a resemblance to the SQL constructs. Some of them have a more or less a direct mapping to Stream operations, such as LIMIT and COUNT. This resemblance is utilized by the open-source project Speedment to provide type-safe access to any relational database using pure Java.



This table shows how Speedment maps between SQL and Java Streams.

We are contributors to the Speedment open-source project and we will describe how Speedment allows us to use a database as the stream source and feed the pipeline with rows from any of the database tables.



As depicted in the visualization above, Speedment will establish a connection to the database and can then pass data to the application. There is no need to write any code for the database entries since Speedment analyses the underlying database and automatically generates all the required entity classes for the domain model. It saves a lot of time when you don’t have to write and maintain entity classes by hand for each table you want to use.

Sakila Database 

For the sake of this article, as well as the exercises, we use the MySQL example database Sakila as our data source. The Sakila database models an old-fashioned movie rentals business and therefore contains tables such as Film and Actor. An instance of the database is deployed in the cloud and is open for public access.

Speedment Manager 

In Speedment, the handle to a database table is a called a Manager. The managers are part of the automatically generated code.












A Manager acts as a handle to a database table and can act as a stream source. In this case, every row corresponds to an instance of Film. 


A Manager in Speedment is instantiated by calling:

FilmManager films = speedment.getOrThrow(FilmManager.class);

Note: speedment is an instance that can be obtained from an ApplicationBuilder (more on this topic in the next article).

If the FilmManager::stream is called, the result is a Stream to which we are free to apply any intermediate or terminal operations. For starters, we collect all rows in a list.
 
List<Film> allFilms = films.stream().collect(toList());
FilmImpl { filmId = 1, title = ACADEMY DINOSAUR, …
FilmImpl { filmId = 2, title = ACE GOLDFINGER, …
FilmImpl { filmId = 3, title = ADAPTATION HOLES, …
…

Filtering and Counting

Let’s look at a simple example that outputs the number of films having the rating “PG-13”. Just like a regular Stream, we can filter out the films with the correct rating, and then count these entries.
 
long pg13FilmCount = films.stream()
   .filter(Film.RATING.equal("PG-13"))
   .count();

pg13FilmCount: 195

One important property that follows with Speedment’s custom implementation of Streams is that the streams are able to optimize their own pipeline by introspection. It may look like the Stream will iterate over all rows of a table, but this is not the case. Instead, Speedment is able to translate the pipeline to an optimized SQL query that is passed on to the database. This means only relevant database entries are pulled into the Stream. Thus, in the example above, the stream will be automatically rendered to SQL similar to “SELECT … FROM film WHERE rating = ‘PG-13’ ”

This introspection requires that any use of anonymous lambdas (which do not contain any metadata that relates to the targeted column) are replaced with Predicates from Speedment Fields. In this case Film.RATING.equal(“PG-13”) returns a Predicate that will be tested on each Film and return true if and only if that Film has a Rating that is PG-13.

Although, this does not prevent us from expressing the predicate as:
 
    .filter(f -> f.getRating().equals(“PG-13”))

but this would force Speedment to fetch all the rows in the table and then apply the predicate, hence it is not recommended.

Finding the Longest Film

Here is an example that finds the longest film in the database using the max-operator with the Field Film.LENGTH:
 
Optional<Film> longestFilm = films.stream()
   .max(Film.LENGTH);

longestFilm: 
Optional[FilmImpl {filmId = 141, title = CHICAGO NORTH, length = 185, ...}]

Finding Three Short Films

Locating three short films (we defined short as <= 50 minutes) can be done by filtering away any films that are 50 minutes or shorter and picking the three first results. The predicate in the example looks at the value of the column “length” and determines if it is less than or equal to 50.

List<Film> threeShortFilms = films.stream()
 .filter(Film.LENGTH.lessOrEqual(50))
 .limit(3)
 .collect(toList());
threeShortFilms: [
    FilmImpl { filmId = 2, length = 48,..}, 
    FilmImpl { filmId = 3, length = 50, … }, 
    FilmImpl { filmId = 15, length = 46, ...}]

Pagination with Sorting

If we were to display all the films on a website or in an application, we would probably prefer to paginate the items, rather than loading (possibly) thousands of entries at once. This can be accomplished by combining the operation skip() and limit(). In the example below, we collect the content of the second page, assuming every “page” holds 25 entries. Recall that Streams do not guarantee a certain order of the elements, which means that we need to define an order with the sorted-operator for this to work as intended.
 
List<Film> filmsSortedByLengthPage2 = films.stream()
 .sorted(Film.LENGTH)
 .skip(25 * 1)
 .limit(25)
 .collect(toList());
filmsSortedByLengthPage2: 
[FilmImpl { filmId = 430, length = 49, …}, …]

Note: Finding the content of the n:th page is done by skipping (25 * (n-1)).
Note2: This stream will be automatically rendered to something like “SELECT ... FROM film ORDER BY length ASC LIMIT ? OFFSET ?, values:[25, 25]”

Films Starting with “A” Sorted by Length

We can easily locate any films starting with the capital letter “A” and sort them according to their length (with the shortest film first) like this:
 
List<Film> filmsTitleStartsWithA = films.stream()
 .filter(Film.TITLE.startsWith("A"))
 .sorted(Film.LENGTH)
 .collect(Collectors.toList());
filmsTitleStartsWithA: [
  FilmImpl { filmId=15, title=ALIEN CENTER, …, rating=NC-17, length = 46,
  FilmImpl { filmId=2, title=ACE GOLDFINGER, …, rating=G, length = 48,
… ]

Computing Frequency Tables of Film Length

We can also utilize the groupingBy-operator to sort the films in buckets depending on their lengths and count the total number of films in each bucket. This will create a so-called frequency table of film length.
 
Map<Short, Long> frequencyTableOfLength = films.stream()
 .collect(Collectors.groupingBy(
  Film.LENGTH.asShort(),
  counting()
 ));
frequencyTableOfLength: {46=5, 47=7, 48=11, 49=5, … }

Exercises

For this week’s exercises, you do not need to worry about connecting a database of your own. Instead, we have already provided a connection to an instance of the Sakila database in the cloud. As usual, the exercises can be located in this GitHub repo. The content of this article is sufficient to solve the fourth unit which is called MyUnit4Database. The corresponding Unit4Database Interface contains JavaDocs which describe the intended implementation of the methods in MyUnit4Database.

 
public interface Unit4Database {

   /**
    * Returns the total number of films in the database.
    *
    * @param films manager of film entities
    * @return the total number of films in the database
    */
   long countAllFilms(FilmManager films);
The provided tests (e.g. Unit4MyDatabaseTests) will act as an automatic grading tool, letting you know if your solution was correct or not.

Next Article

So far, we have only scraped the surface of database streams. The next article will allow you to write standalone database applications in pure Java. Happy coding!

Authors

Per Minborg
Julia Gustafsson

Resources

GitHub Opensource Project Speedment
Speedment Stream ORM Initializer
GitHub Repository "hol-streams"
Article Part 1: Creating Streams
Article Part 2: Intermediate Operations
Article Part 3: Terminal Operations

Monday, October 21, 2019

Become a Master of Java Streams - Part 3: Terminal Operations

Bill Gates once said: “I choose a lazy person to do a difficult job because a lazy person will find an easy way to do it.” Nothing can be more true when it comes to streams. In this article, you will learn how a Stream avoids unnecessary work by not performing any computations on the source elements before a terminal operation is invoked and how only a minimum amount of elements are ever produced by the source.

This article is the third out of five, complemented by a GitHub repository containing instructions and exercises to each unit.

Part 1: Creating Streams
Part 2: Intermediate Operations
Part 3: Terminal Operations
Part 4: Database Streams
Part 5: Creating a Database Application Using Streams

Terminal Operations 

Now that we are familiar with the initiation and construction of a Stream pipeline we need a way to handle the output. Terminal operations allow this by producing a result from the remaining elements (such as count()) or a side-effect (such as forEach(Consumer)).

A Stream will not perform any computations on the elements of the source before the terminal operation is initiated. This means that source elements are consumed only as needed - a smart way to avoid unnecessary work. This also means that once the terminal operation is applied, the Stream is consumed and no further operations can be added.


Let’s look at what terminal operations we can apply to the end of a Stream pipeline:

ForEach and ForEachOrdered

A possible use case of a stream could be to update a property of some, or all, elements or why not just print them out for debugging purposes. In either way, we are not interested in collecting or counting the output, but rather by generating a side-effect without returning value.

This is the purpose of forEach() or forEachOrdered(). They both take a Consumer and terminates the Stream without returning anything. The difference between these operations simply being that forEachOrdered() promises to invoke the provided Consumer in the order the elements appear in the Stream whereas forEach() only promises to invoke the Consumer but in any order. The latter variant is useful for parallel Streams.

In the simple case below, we print out every element of the Stream in one single line.

Stream.of(
   "Monkey", "Lion", "Giraffe", "Lemur", “Lion”
)
   .forEachOrdered(System.out::print);
This will produce the following output:

MonkeyLionGiraffeLemurLion


Collecting Elements

A common usage of Streams is to build a “bucket” of the elements or more specifically, to build data structures containing a specific collection of elements. This can be accomplished by calling the terminal operation collect() at the end of the Stream thus asking it to collect the elements into a given data structure. We can provide something called a Collector to the collect() operation and there are a number of different predefined types that can be used depending on the problem at hand. Here are some very useful options:

Collect to Set

We can collect all elements into a Set simply by collecting the elements of the Stream with the collector toSet().

Set<String> collectToSet = Stream.of(
   "Monkey", "Lion", "Giraffe", "Lemur", "Lion"
) 
   .collect(Collectors.toSet());

toSet: [Monkey, Lion, Giraffe, Lemur]


Collect to List

Similarly, the elements can be collected into a List using toList() collector.


List<String> collectToList = Stream.of(
   "Monkey", "Lion", "Giraffe", "Lemur", "Lion"
)
   .collect(Collectors.toList());

collectToList: [Monkey, Lion, Giraffe, Lemur, Lion]


Collect to General Collections

In a more general case, it is possible to collect the elements of the Stream into any Collection by just providing a constructor to the desired Collection type. Example of constructors are LinkedList::new, LinkedHashSet::new and PriorityQueue::new

LinkedList<String> collectToCollection = Stream.of(
   "Monkey", "Lion", "Giraffe", "Lemur", "Lion"
)
   .collect(Collectors.toCollection(LinkedList::new));

collectToCollection: [Monkey, Lion, Giraffe, Lemur, Lion]

Collect to Array
Since an Array is a fixed size container rather than a flexible Collection, there are good reasons to have a special terminal operation, toArray(), to create and store the elements in an Array. Note that just calling toArray() will result in an Array of Objects since the method has no way to create a typed array by itself. Below we show how a constructor of a String array can be used to give a typed array String[].

String[] toArray = Stream.of(
   "Monkey", "Lion", "Giraffe", "Lemur", "Lion"
)
   .toArray(String[]::new);

toArray: [Monkey, Lion, Giraffe, Lemur, Lion]


Collect to Map

We might want to extract information from the elements and provide the result as a Map. To do that, we use the collector toMap() which takes two Functions corresponding to a key-mapper and a value-mapper.

The example shows how different animals can be related to the number of distinct characters in their names. We use the intermediate operation distinct() to assure that we only add unique keys in the Map (If the keys are not distinct, we have to provide a variant of the toMap() collector where a resolver must be provided that is used to merge results from keys that are equal).

Map<String, Integer> toMap = Stream.of(
    "Monkey", "Lion", "Giraffe", "Lemur", "Lion"
)
   .distinct()
   .collect(Collectors.toMap(
       Function.identity(),   //Function<String, K> keyMapper
       s -> (int) s.chars().distinct().count()// Function<String, V> valueMapper
   ));

toMap: {Monkey=6, Lion=4, Lemur=5, Giraffe=6}   (*)

(*) Note that the key order is undefined.

Collect GroupingBy

Sticking to the bucket analogy, we can actually handle more than one bucket simultaneously. There is a very useful Collector named groupingBy() which divides the elements in different groups depending on some property whereby the property is extracted by something called a “classifier”. The output of such an operation is a Map. Below we demonstrate how the animals are grouped based on the first letter of their name.


Map<Character, List<String>> groupingByList =  Stream.of(
    "Monkey", "Lion", "Giraffe", "Lemur", "Lion"
)
   .collect(Collectors.groupingBy(
       s -> s.charAt(0) // Function<String, K> classifier
   ));

groupingByList: {G=[Giraffe], L=[Lion, Lemur, Lion], M=[Monkey]}


Collect GroupingBy Using Downstream Collector

In the previous example, a "downstream collector" toList() was applied for the values in the Map by default, collecting the elements of each bucket into a List. There is an overloaded version of groupingBy() that allows the use of a custom “downstream collector” to get better control over the resulting Map. Below is an example of how the special downstream collector counting() is applied to count, rather than collecting, the elements of each bucket.


Map<Character, Long> groupingByCounting =  Stream.of(
    "Monkey", "Lion", "Giraffe", "Lemur", "Lion"
)
   .collect(Collectors.groupingBy(
       s -> s.charAt(0), // Function<String, K> classifier
       counting()        // Downstream collector
   ));

groupingByCounting: {G=1, L=3, M=1}

Here is an illustration of the process:


Any collector can be used as a downstream collector. In particular, it is worth noting that a collector groupingBy() can take a downstream collector that is also a groupingBy() collector, allowing secondary grouping of the result of the first grouping-operation. In our animal case, we could perhaps create a Map<Character, Map<Character, Long>> where the first map contains keys with the first character and the secondary maps contain the second character as keys and number of occurrences as values.

Occurrence of Elements

The intermediate operation filter() is a great way to eliminate elements that do not match a given predicate. Although, in some cases, we just want to know if there is at least one element that fulfills the predicate. If so, it is more convenient and efficient to use anyMatch(). Here we look for the occurrence of the number 2:

boolean containsTwo = IntStream.of(1, 2, 3).anyMatch(i -> i == 2);

containsTwo: true


Operations for Calculation

Several terminal operations output the result of a calculation. The simplest calculation we can perform being count() which can be applied to any Stream. It can, for example, be used to count the number of animals:


long nrOfAnimals = Stream.of(
    "Monkey", "Lion", "Giraffe", "Lemur"
)
 .count(); 

nrOfAnimals: 4

Although, some terminal operations are only available for the special Stream implementations that we mentioned in the first article; IntStream, LongStream and DoubleStream. Having access to a Stream of such type we can simply sum all the elements like this:

int sum = IntStream.of(1, 2, 3).sum();

sum: 6

Or why not compute the average value of the integers with .average():

OptionalDouble average = IntStream.of(1, 2, 3).average();

average: OptionalDouble[2.0]

Or retrieve the maximal value with .max().
 
int max = IntStream.of(1, 2, 3).max().orElse(0);

max: 3

Like average(), the result of the max() operator is an Optional, hence by stating .orElse(0) we automatically retrieve the value if its present or fall back to 0 as our default. The same solution can be applied to the average-example if we rather deal with a primitive return type.

In case we are interested in all of these statistics, it is quite cumbersome to create several identical streams and apply different terminal operations for each one. Luckily, there is a handy operation called summaryStatistics() which allows several common statistical properties to be combined in a SummaryStatistics object.

IntSummaryStatistics statistics = IntStream.of(1, 2, 3).summaryStatistics();

statistics: IntSummaryStatistics{count=3, sum=6, min=1, average=2.000000, max=3}

Exercises 

Hopefully, you are familiar with the format of the provided exercises at this point. If you just discovered the series or just felt a bit lazy lately (maybe you’ve had your reasons too) we encourage you to clone the GitHub repo and start using the follow-up material. The content of this article is sufficient to solve the third unit which is called MyUnit3Terminal. The corresponding Unit3Terminal Interface contains JavaDocs which describe the intended implementation of the methods in MyUnit3Terminal.

public interface Unit3Terminal { 
 /**
 * Adds each element in the provided Stream
 * to the provided Set.
 * * An input stream of ["A", "B", "C"] and an
 * empty input Set will modify the input Set
 * to contain : ["A", "B", "C"]
 *
 * @param stream with input elements
 * @param set to add elements to
 */

void addToSet(Stream stream, Set set);

The provided tests (e.g. Unit3MyTerminalTest) will act as an automatic grading tool, letting you know if your solution was correct or not.

Next Article 

The next article will show how all the knowledge we have accumulated so far can be applied to database queries.
Hint: Bye-bye SQL, Hello Streams… Until then - happy coding!

Authors 

Per Minborg
Julia Gustafsson

Tuesday, October 15, 2019

Become a Master of Java Streams - Part 2: Intermediate Operations

Just like a magic wand, an Intermediate operation transforms a Stream into another Stream. These operations can be combined in endless ways to perform anything from simple to highly complex tasks in a readable and efficient manner.

This article is the second out of five, complemented by a GitHub repository containing instructions and exercises to each unit.

Intermediate Operations

Intermediate operations act as a declarative (functional) description of how elements of the Stream should be transformed.Together, they form a pipeline through which the elements will flow. What comes out at the end of the line, naturally depends on how the pipeline is designed.

As opposed to a mechanical pipeline, an intermediate operation in a Stream pipeline may(*) render a new Stream that may depend on elements from previous stages. In the case of a map-operation (which we will introduce shortly) the new Stream might even contain elements of a different type.


(*) Strictly speaking, an intermediate operation is not mandated to create a new Stream. Instead, it can update its internal state or, if the intermediate operation did not change anything (such as .skip(0)) return the existing Stream from the previous stage.


To get a glimpse of what a pipeline can look like, recall the example used in the previous article :
List<String> list = Stream.of("Monkey", "Lion", "Giraffe","Lemur")
    .filter(s -> s.startsWith("L"))
    .map(String::toUpperCase)
    .sorted()
    .collect(toList());

System.out.println(list); 
 [LEMUR, LION]

We will now go on to explain the meaning of these and other operations in more detail.


Filter

Based on our experience, filter() is one of the most useful operations of the Stream API. It enables you to narrow down a Stream to elements that fit certain criteria. Such criteria must be expressed as a Predicate (a function resulting in a boolean value) e.g. a lambda. The intention of the code below is to find the Strings that start with the letter “L” and discard the others.
 Stream<String> startsWithT = Stream.of(
   "Monkey", "Lion", "Giraffe", "Lemur"
)

    .filter(s -> s.startsWith("L")); 

startsWithT: [Lion, Lemur]


Limit

There are some very simple, but yet powerful, operations that provide a way to select or discard elements based on their position in the Stream. The first of these operations is limit(n) which basically does what it says - it creates a new stream that only contains the first n elements of the stream it is applied on. The example below illustrates how a Stream of four animals is shortened to only “Monkey” and “Lion”.
Stream<String> firstTwo = Stream.of(
   "Monkey", "Lion", "Giraffe", "Lemur"
)
   .limit(2);

firstTwo: [Monkey, Lion]


Skip

Similarly, if we are only interested in some of the elements down the line, we can use the .skip(n)-operation. If we apply skip(2) to our Stream of animals, we are left with the tailing two elements “Giraffe” and “Lemur”.
Stream<String> firstTwo = Stream.of(
   "Monkey", "Lion", "Giraffe", "Lemur"
)
   .skip(2);

lastTwo: [Giraffe, Lemur]


Distinct

There are also situations where we only need one occurrence of each element of the Stream. Rather than having to filter out any duplicates manually, a designated operation exists for this purpose - distinct(). It will check for equality using Object::equals and returns a new Stream with only unique elements. This is akin to a Set.
Stream<String> uniqueAnimals = Stream.of(
   "Monkey", "Lion", "Giraffe", "Lemur", "Lion"
)
   .distinct();
uniqueAnimals: [“Monkey”, “Lion”, “Giraffe”, “Lemur”]


Sorted

Sometimes the order of the elements is important, in which case we want control over how things are ordered. The simplest way to do this is with the sorted-operation which will arrange the elements in the natural order. In the case of the Strings below, that means alphabetical order.
Stream<String> alphabeticOrder = Stream.of(
    "Monkey", "Lion", "Giraffe", "Lemur"
)
   .sorted();
alphabeticOrder: [Giraffe, Lemur, Lion, Monkey]


Sorted with comparator

Just having the option to sort in natural order can be a bit limiting sometimes. Luckily, it is possible to apply a custom Comparator to inspect a certain property of the element. We could for example order the Strings after their lengths accordingly:
Stream<String> lengthOrder = Stream.of(
    "Monkey", "Lion", "Giraffe", "Lemur"
)
   .sorted(Comparator.comparing(String::length));
lengthOrder: [Lion, Lemur, Monkey, Giraffe]


Map

One of the most versatile operations we can apply to a Stream is map(). It allows elements of a Stream to be transformed into something else by mapping them to another value or type. This means the result of this operation can be a Stream of any type R. The example below performs a simple mapping from String to String, replacing any capital letters with their lower case equivalent.
Stream<String> lowerCase = Stream.of(
    "Monkey", "Lion", "Giraffe", "Lemur"
)
   .map(String::toLowerCase);
lowerCase: [monkey, lion, giraffe, lemur]


Map to Integer, Double or Long

There are also three special implementations of the map-operation which are limited to mapping elements to the primitive types int, double and long.
.mapToInt();
.mapToDouble();
.mapToLong();

Hence, the result of these operations always corresponds to an IntStream, DoubleStream or LongStream. Below, we demonstrate how .mapToInt() can be used to map our animals to the length of their names:
IntStream lengths = Stream.of(
    "Monkey", "Lion", "Giraffe", "Lemur"
)
   .mapToInt(String::length);
lengths: [6, 4, 7, 5] 
Note: String::length is the equivalent of the lambda s -> s.length(). We prefer the former notation since it makes the code more concise and readable.


FlatMap

The last operation that we will cover in this article might be more tricky to understand even though it can be quite powerful. It is related to the map() operation but instead of taking a Function that goes from a type T to a return type R, it takes a Function that goes from a type T and returns a Stream of R. These “internal” streams are then flattened out to the resulting streams resulting in a concatenation of all the elements of the internal streams.
Stream<Character> chars = Stream.of(
    "Monkey", "Lion", "Giraffe", "Lemur"
)
    .flatMap(s -> s.chars().mapToObj(i -> (char) i));
chars: [M, o, n, k, e, y, L, i, o, n, G, i, r, a, f, f, e, L, e, m, u, r]


Exercises

If you haven’t already cloned the associated GitHub repo we encourage you to do so now. The content of this article is sufficient to solve the second unit which is called MyUnit2Intermediate. The corresponding Unit2Intermediate Interface contains JavaDocs which describes the intended implementation of the methods in MyUnit2MyIntermediate.
public interface Unit2Intermediate {
   /**
    * Return a Stream that contains words that are
    * longer than three characters. Shorter words
    * (i.e. words of length 0, 1, 2 and 3)
    * shall be filtered away from the stream.
    * <p>
    *  A Stream of
    *      ["The", "quick", "quick", "brown", "fox",
    *      "jumps", "over", "the", "lazy", "dog"]
    *  would produce a Stream of the elements
    *      ["quick", "quick", "brown", "jumps",
    *      "over", "lazy"]
    */

   Stream<String> wordsLongerThanThreeChars(Stream<String> stream);
The provided tests (e.g. Unit2MyIntermediateTest) will act as an automatic grading tool, letting you know if your solution was correct or not.


Next Article

In the next article, we proceed to terminal operations and explore how we can collect, count or group the resulting elements of our pipeline. Until then - happy coding!

Authors

Per Minborg and Julia Gustafsson

Resources

Become a Master of Java Streams - Part 1: Creating Streams
GitHub Repository "hol-streams"
Speedment Stream ORM Initializer

Friday, October 4, 2019

Become a Master of Java Streams - Part 1: Creating Streams

Declarative code (e.g. functional composition with Streams) provides superior code metrics in many cases. Code your way through this hands-on-lab article series and mature into a better Java programmer by becoming a Master of Java Streams.

The whole idea with Streams is to represent a pipeline through which data will flow and the pipeline’s functions operate on the data. This way, functional-style operations on Streams of elements can be expressed. This article is the first out of five where you will learn firsthand how to become a Master of Streams. We start with basic stream examples and progress with more complex tasks until you know how to connect standard Java Streams to databases in the Cloud.

Once you have completed all five articles, you will be able to drastically reduce your codebase and know how to write pure Java code for the entire applications in a blink.

Here is a summary of the upcoming articles:


Since we are firm believers in the concept of ”Learning by doing”, the series is complemented by a GitHub repository that contains Stream exercises split into 5 Units - each corresponding to the topic of an article. Instructions on how to use the source code are provided in the README-file.

What are Java Streams?

The Java Stream interface was first introduced in Java 8 and, together with lambdas, acts as a milestone in the development of Java since it contributes greatly to facilitating a declarative (functional) programming style. If you want to learn more about the advantages of declarative coding we refer you to this article.

A Java Stream can be visualized as a pipeline through which data will flow (see the image below). The pipeline’s functions will operate on the data by e.g. filtering, mapping and sorting the items. Lastly, a terminal operation can be performed to collect the items in a preferred data structure such as a List, an Array or a Map. An important thing to notice is that a Stream can only be consumed once.


A Stream Pipeline contains three main parts; the stream source, the intermediate operation(s) (zero to many) and a terminal operation.

Let’s have a look at an example to get a glimpse of what we will be teaching throughout this series. We encourage you to look at the code below and try to figure out what the print-statement will result in before reading the next paragraph.

List<String> list = Stream.of("Monkey", "Lion", "Giraffe","Lemur")
    .filter(s -&gt; s.startsWith("L"))
    .map(String::toUpperCase)
    .sorted()
    .collect(toList());
System.out.println(list);

Since the Stream API is descriptive and most often intuitive to use, you will probably have a pretty good understanding of the meaning of these operations regardless if you have encountered them before or not. We start off with a Stream of a List containing four Strings, each representing an African animal. The operations then filter out the elements that start with the letter “L”, converts the remaining elements to uppercase letters, sorts them in natural order (which in this case means alphabetical order) and lastly collects them into a List. Hence, resulting in the output [“LEMUR”, “LION”].

It is important to understand that Streams are “lazy” in the sense that elements are “requested” by the terminal operation (in this case the .collect() statement). If the terminal operation only needs one element (like, for example, the terminal operation .findFirst()), then at most one element is ever going to reach the terminal operation and the reminding elements (if any) will never be produced by the source. This also means that just creating a Stream is often a cheap operation whereas consuming it might be expensive depending on the stream pipeline and the number of potential elements in the stream.

In this case, the Stream Source was a List although many other types can act as a data source. We will spend the rest of this article describing some of the most useful source alternatives.


Stream Sources

Streams are mainly suited for handling collections of objects and can operate on elements of any type T. Although, there exist three special Stream implementations; IntStream, LongStream, and DoubleStream which are restricted to handle the corresponding primitive types.

An empty Stream of any of these types can be generated by calling Stream.empty() in the following manner:

Stream<T>     Stream.empty()
IntStream     IntStream.empty()
LongStream    LongStream.empty()
DoubleStream  DoubleStream.empty()

Empty Streams are indeed handy in some cases, but the majority of the time we are interested in filling our Stream with elements. This can be accomplished in a large number of ways. We will start by looking at the special case of an IntStream since it provides a variety of useful methods.


Useful IntStreams

A basic case is generating a Stream over a small number of items. This can be accomplished by listing the integers using IntStream.of(). The code below yields a simple stream of elements 1, 2 and 3.
IntStream oneTwoThree = IntStream.of(1, 2, 3);
Listing all elements manually can be tedious if the number of items grows large. In the case where we are interested in values in a certain range, the command .rangeClosed() is more effective. The operation is inclusive, meaning that the following code will produce a stream of all elements from 1 to 9.
IntStream positiveSingleDigits = IntStream.rangeClosed(1, 9);

An even more powerful command is .iterate() which enables greater flexibility in terms of what numbers to include. Below, we show an example of how it can be used to produce a Stream of all numbers that are powers of two.
IntStream powersOfTwo = IntStream.iterate(1, i -> i * 2);
There are also several perhaps more unexpected ways of producing a Stream. The method chars() can be used to Stream over the characters in a String, in this case, the elements “A”, “B” and “C”.
IntStream chars = "ABC".chars();
There is also a simple way to generate a Stream of random integers.
IntStream randomInts = new Random().ints();

Stream an Array

Streaming existing data collections is another option. We can stream the elements of an existing Array or choose to list items manually using Stream.of() as previously shown and repeated below.
String[] array = {"Monkey", "Lion", "Giraffe", "Lemur"};
Stream<String> stream2 = Stream.of(array);
Stream<String> stream = Stream.of("Monkey", "Lion", "Giraffe", "Lemur");

Stream from a Collection

It is also very simple to stream any Collection. The examples below demonstrate how a List or Set can be streamed with the simple command .stream().
List<String> list = Arrays.asList("Monkey", "Lion", "Giraffe", "Lemur");
Stream<String> streamFromList = list.stream();
Set<String> set = new HashSet<>(list);
Stream<String> streamFromSet = set.stream();

Stream from a Text File

Sometimes it can also be useful to stream the contents of a text-file. The following command will provide a Stream<String> that holds every line from the referenced file as a separate element.

Stream<String> lines = Files.lines(Paths.get("file.txt"));


Exercise

Now that we have familiarized you with some of the ways of creating a Stream, we encourage you to clone this GitHub repo and start practicing. The content of the article will be enough to solve the first Unit which is called Create. The Unit1Create interface contains JavaDocs which describes the intended implementation of the methods in Unit1MyCreate.
public interface Unit1Create {
 /**
  * Creates a new Stream of String objects that contains
  * the elements "A", "B" and "C" in order.
  *
  * @return a new Stream of String objects that contains
  *   the elements "A", "B" and "C" in order
  */
  Stream<String> newStreamOfAToC();
The provided tests (e.g. Unit1MyCreateTest) will act as an automatic grading tool, letting you know if you solution was correct or not.


If you have not done so yet, go ahead and solve the work items in the Unit1MyCreate class. “Gotta catch ‘em all”.

In the next article, we will continue to describe several intermediate operations that can be applied to these Streams and that will convert them into other Streams. See you soon!

Authors 

Per Minborg
Julia Gustafsson

Monday, September 2, 2019

Java Performance: For-looping vs. Streaming

Is counting upwards or downwards in a for-loop the most efficient way of iterating? Sometimes the answer is neither. Read this post and understand the impact of different iteration varieties.

Iteration Performance

There are many views on how to iterate with high performance. The traditional way of iterating in Java has been a for-loop starting at zero and then counting up to some pre-defined number:

private static final int ITERATIONS = 10_000;

@Benchmark
public int forUp() {
    int sum = 0;
    for (int i = 0; i < ITERATIONS; i++) {
        sum += i;
    }
    return sum;
}
Sometimes, we come across a for-loop that starts with a predetermined non-negative value and then it counts down instead. This is fairly common within the JDK itself, for example in the class String. Here is an example of solving the previous problem by counting down instead of up.

@Benchmark
public int forDown() {
    int sum = 0;
    for (int i = ITERATIONS; i-- > 0;) {
        sum += i;
    }
    return sum;
}

I think the rationale here is that checking how values relate to zero is potentially more efficient than testing how values relate to any other arbitrary value. In fact, all CPUs that I know about have machine code instructions that can check how a given value relates to zero. Another idea is that the count down idiom given above appears to only inspect the loop variable one time (it simultaneously checks the value and then decreases it) as opposed to the regular example at the top. I suspect that this has little or no influence on today’s efficient JIT compiler who will be able to optimize the first iteration just as good as the second. It might have an impact when the code runs in interpretation mode but this is not examined in this article.

Another way of doing the same thing using an IntStream looks like this:

@Benchmark
public int stream() {
    return IntStream.range(0, ITERATIONS)
        .sum();
}

If more performance is needed for large iterations, it is relatively easy to make the stream parallel by just adding a .parallel() operator to the stream. This is not examined in this article.

Performance under Graal VM

Running these tests under GraalVM (rc-11, with the new C2 compiler that ships with GraallVM) on my laptop (MacBook Pro mid 2015, 2.2 GHz Intel Core i7) gives the following:

Benchmark              Mode  Cnt       Score       Error  Units
ForBenchmark.forDown  thrpt    5  311419.166 ±  4201.724  ops/s
ForBenchmark.forUp    thrpt    5  309598.916 ± 12998.579  ops/s
ForBenchmark.stream   thrpt    5  312360.089 ±  8291.792  ops/s

It might come as a surprise for some that the stream solution is the fastest one, albeit by a margin that is well within error margins.

In a previous article, I presented some code metric advantages with streams and Declarative programming compared to traditional Imperative code. I have not tested performance for cold code sections (i.e. before the JIT kicks in).

Clever Math

From math, we recall that the sum of consecutive numbers starting at zero is N*(N+1)/2 where N is the highest number in the series. Running this benchmark:

@Benchmark
public int math() {
    return ITERATIONS * (ITERATIONS + 1) / 2;
}

gives us a performance increase of over 1,000 times over the previous implementations:

Benchmark           Mode  Cnt          Score          Error  Units
ForBenchmark.math  thrpt    5  395561077.984 ± 11138012.141  ops/s

The more iterations, the more gain. Cleverness sometimes trumps brute force.

Ultra-fast Data Streams

With Speedment HyperStream, it is possible to get similar performance with data from databases. Read more here on HyperStream.

Conclusions

On some commonly used hardware/JVMs, it does not matter if we iterate upwards or downwards in our for-loops. More modern JVMs are able to optimize stream iterations so they have equivalent or even better performance than for-loops.

Stream code is generally more readable compared to for-loops in my opinion and so, I believe streams are likely to be the de facto iteration contrivance in some future.

Database content can be streamed with high performance using Speedment HyperStream.