Minborg

Tuesday, May 21, 2019

Java: How to Slash Build Times Using the Cloud

Building larger Java projects on a laptop with Maven can be frustrating and slow. Learn how you can slash build times by building in the cloud instead.

Setup

As a founder of the open-source Speedment Stream ORM, I usually build the project several times per day on my now somewhat old laptop (MacBook Pro, mid 2015). The Speedment project consists of over 60 modules, and the build process is managed by Maven. The project lives here on GitHub.

I wanted to find out if I could save time by building the project in the cloud instead. In this short article, I will share my results. I have compared my laptop with Oracle Cloud, running the same build process.

I am using the following setup:


                 Laptop                   Oracle Cloud
Java JDK         OracleJDK 1.8.0_191      OracleJDK 1.8.0_201
Maven Version    3.6.0                    3.5.4
CPU Cores        4                        4
CPU Type         2.2 GHz Intel Core i7    2.0 GHz Intel Xeon Platinum 8167M
RAM              30G                      16G

I should mention that we also have continuous integration servers that run in the cloud using Jenkins.

Laptop

Pers-MBP:speedment pemi$ time mvn clean install

...
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  07:46 min
[INFO] Finished at: 2019-04-09T15:34:25+02:00
[INFO] ------------------------------------------------------------------------

real 7m48.065s
user 12m33.850s
sys 0m50.476s

Oracle Cloud

[opc@instance-20190409-xxxx speedment]$ time mvn clean install

...
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 03:41 min
[INFO] Finished at: 2019-04-09T13:30:20Z
[INFO] ------------------------------------------------------------------------

real 3m42.602s
user 10m22.353s
sys 0m32.967s

Parallel Builds

Running parallel builds reduces build time:

Pers-MBP:speedment pemi$ time mvn -T 4 clean install

real 4m47.629s
user 14m24.607s
sys 0m56.834s


[opc@instance-20190409-xxxx speedment]$ time mvn -T 4 clean install

real 3m21.731s
user 11m15.436s
sys 0m34.000s

Summary

The following graph shows a comparison for sequential Speedment Maven builds on my laptop vs. Oracle Cloud (lower is better):



The next graph shows a comparison for parallel builds (lower is better):



The conclusion is that sequential build time was reduced by over 50% when I used the cloud solution and the parallel build time was reduced by 30%.

If I do two complete re-builds per day, this means I will save about 2 hours per month (roughly 4 minutes saved per sequential build, twice a day, over some 20 working days). More importantly, I will get feedback faster so I can stay “in the development flow”.

As a final word, it should be noted that there are other, complementary ways of reducing build times, including selecting appropriate Maven and JVM parameters, building only changed modules, and running the build under GraalVM.
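For example, building only changed modules can be done with standard Maven flags. Here is a sketch (the module name is hypothetical; substitute the module you actually changed):

mvn clean install -pl some-changed-module -am

The -pl flag restricts the build to the listed module(s), and -am (“also make”) additionally builds the modules they depend on, so the result stays consistent.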

Resources

Speedment Open Source: https://github.com/speedment/speedment
Oracle Cloud: https://cloud.oracle.com/home

Monday, May 13, 2019

Java: How to Become More Productive with Hazelcast in Less Than 5 Minutes

What if you want to use a Hazelcast In-Memory Data Grid (IMDG) to speed up your database applications, but you have hundreds of tables to handle? Manually coding all Java POJOs and serialization support would entail weeks of work, and once done, maintaining that domain model by hand would soon turn into a nightmare. Read this article and learn how to save time and do it in 5 minutes.

Now there is a graceful way to manage these sorts of requirements. The Hazelcast Auto DB Integration Tool connects to an existing database and generates all these boilerplate classes automatically. We get true POJOs, serialization support, configuration, MapStore/MapLoad, ingest and more without having to write a single line of manual code. As a bonus, we get Java Stream support for Hazelcast distributed maps.

Using the Tool

Let us try an example. As in many of my articles, I will be using the Sakila open-source example database. It can be downloaded as a file or as a Docker instance. Sakila contains 16 tables and a total of 90 columns in those tables. It also includes seven views with additional columns.

To start, we use the Hazelcast Auto DB Integration Initializer and a trial license key.


Fill in the values as shown above and press “Download” and your project is saved to your computer. Then, follow the instructions on the next page explaining how to unzip, start the tool and get the trial license.

Next, we connect to the database:



The tool now analyses the schema metadata and then visualizes the database schema in another window:



Just press the “Generate” button and the complete Hazelcast domain model will be generated automatically within 2 or 3 seconds.



Now, we are almost ready to write our Hazelcast IMDG application. First, however, we need to create a Hazelcast IMDG to store the actual data in.

Architecture

This is what the architecture looks like: the Application talks to the Hazelcast IMDG which, in turn, gets its data from the underlying Database:





The code generated by the tool need only be present in the Application and not in the Hazelcast IMDG.

Creating a Hazelcast IMDG

Creating a Hazelcast IMDG is easy. Add the following dependency to your pom.xml file:

<dependency>
     <groupId>com.hazelcast</groupId>
     <artifactId>hazelcast</artifactId>
     <version>3.11</version>
</dependency>

Then, copy the following class to your project:

import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

public class Server {

    public static void main(String... args) throws InterruptedException {
        // Start a Hazelcast node that forms, or joins, a cluster
        final HazelcastInstance instance = Hazelcast.newHazelcastInstance();
        while (true) {
            Thread.sleep(1000);
        }
    }

}

Run this main method three times to create three Hazelcast nodes in a cluster. More recent versions of IDEA require “Allow parallel run” to be enabled in the Run/Debug Configurations. If you only run it once, that is ok too; the example below will still work even though we would have just one node in our cluster.

Running the main method three times will produce something like this:

Members {size:3, ver:3} [
 Member [172.16.9.72]:5701 - d80bfa53-61d3-4581-afd5-8df36aec5bc0
 Member [172.16.9.72]:5702 - ee312d87-abe6-4ba8-9525-c4c83d6d99b7
 Member [172.16.9.72]:5703 - 71105c36-1de8-48d8-80eb-7941cc6948b4 this
]

Nice! Our three-node cluster is up and running!

Data Ingest

Before we can run any business logic, we need to ingest data from our database into the newly created Hazelcast IMDG. Luckily, the tool does this for us too. Locate the generated class named SakilaIngest and run it with the database password as the first command line parameter or modify the code so it knows about the password. This is what the generated class looks like.

public final class SakilaIngest {

    public static void main(final String... argv) {
        if (argv.length == 0) {
            System.out.println("Usage: " + SakilaIngest.class.getSimpleName() + " database_password");
        } else {
            try (Speedment app = new SakilaApplicationBuilder()
                .withPassword(argv[0]) // Get the password from the first command line parameter
                .withBundle(HazelcastBundle.class)
                .build()) {

                IngestUtil.ingest(app).join();
            }
        }
    }
}

When run, the following output is shown (shortened for brevity):

...
Completed          599 row(s) ingest of data for Hazelcast Map sakila.sakila.customer_list
Completed            2 row(s) ingest of data for Hazelcast Map sakila.sakila.sales_by_store
Completed       16,049 row(s) ingest of data for Hazelcast Map sakila.sakila.payment
Completed       16,044 row(s) ingest of data for Hazelcast Map sakila.sakila.rental
Completed          200 row(s) ingest of data for Hazelcast Map sakila.sakila.actor_info

We now have all data from the database in the Hazelcast IMDG. Nice!

Hello World

Now that our grid is live and we have ingested data, we have access to populated Hazelcast maps. Here is a program that prints all films of length greater than one hour to the console using the Map interface:

public static void main(final String... argv) {
        try (Speedment app = new SakilaApplicationBuilder()
            .withPassword("your-db-password-goes-here")
            .withBundle(HazelcastBundle.class)
            .build()) {

            HazelcastInstance hazelcast = app.getOrThrow(HazelcastInstanceComponent.class).get();

            IMap<Integer, Film> filmMap = hazelcast.getMap("sakila.sakila.film");
            filmMap.forEach((k, v) -> {
                if (v.getLength().orElse(0) > 60) {
                    System.out.println(v);
                }
            });

        }
    }

The film length is an optional variable (i.e., nullable in the database) so it gets automatically mapped to an OptionalLong. It is possible to set this behavior to “legacy POJO” that returns null if that is desirable in the project at hand.

There is also an additional feature with the tool: We get Java Stream support! So, we could write the same functionality like this:

public static void main(final String... argv) {
    try (Speedment app = new SakilaApplicationBuilder()
        .withPassword("your-db-password-goes-here")
        .withBundle(HazelcastBundle.class)
        .build()) {

        FilmManager films = app.getOrThrow(FilmManager.class);
            
        films.stream()
            .filter(Film.LENGTH.greaterThan(60))
            .forEach(System.out::println);

    }

Under the Hood

The tool generates POJOs that implement Hazelcast’s “Portable” serialization support. This means that data in the grid is accessible from applications written in many languages like Java, Go, C#, JavaScript, etc.

The tool generates the following Hazelcast classes:

POJO

One for each table/view that implements the Portable interface.

Serialization Factory

One for each schema. This is needed to efficiently create Portable POJOs when de-serializing data from the IMDG in the client.

MapStore/MapLoad

One for each table/view. These classes can be used by the IMDG to load data directly from a database.

Class Definition

One for each table/view. These classes are used for configuration.

Index utility method

One per project. This can be used to improve the indexing of the IMDG based on the database indexing.

Config support

One per project. Creates automatic configuration of serialization factories, class definitions, and some performance settings.

Ingest support

One per project. Template for ingesting data from the database into the Hazelcast IMDG.
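To make the first item above more concrete, here is a minimal, hand-written sketch of what a Portable POJO can look like against the Hazelcast 3.x API. This is an illustration only; the generated classes are more elaborate, and the table, field names and id constants here are hypothetical:

import java.io.IOException;
import com.hazelcast.nio.serialization.Portable;
import com.hazelcast.nio.serialization.PortableReader;
import com.hazelcast.nio.serialization.PortableWriter;

public final class Language implements Portable {

    // Hypothetical ids; the tool assigns its own factory and class ids
    public static final int FACTORY_ID = 1;
    public static final int CLASS_ID = 1;

    private int languageId;
    private String name;

    @Override
    public int getFactoryId() {
        return FACTORY_ID;
    }

    @Override
    public int getClassId() {
        return CLASS_ID;
    }

    @Override
    public void writePortable(PortableWriter writer) throws IOException {
        writer.writeInt("languageId", languageId);
        writer.writeUTF("name", name); // writeUTF is the String writer in Hazelcast 3.x
    }

    @Override
    public void readPortable(PortableReader reader) throws IOException {
        languageId = reader.readInt("languageId");
        name = reader.readUTF("name");
    }
}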

The tool also contains other features such as support for Hazelcast Cloud and Java Stream support.

A particularly appealing property is that the domain model (e.g., POJOs and serializers) does not need to be on the classpath of the servers. It only needs to be on the classpath on the client side. This dramatically simplifies the setup and management of the grid. For example, if you need more nodes, just add a new generic grid node and it will join the cluster and start participating directly.

Hazelcast Cloud

Connections to Hazelcast Cloud instances can easily be configured using the application builder as shown in this example:

Speedment hazelcastApp = new SakilaApplicationBuilder()
            .withPassword("<db-password>")
            .withBundle(HazelcastBundle.class)
            .withComponent(HazelcastCloudConfig.class, 
                () -> HazelcastCloudConfig.create(
                            "<name of cluster>",
                            "<cluster password>",
                            "<discovery token>"
                )
            )
            .build();

Savings

I estimate that the tool saved me several hours (if not days) of boilerplate coding just for the smaller example Sakila database. In an enterprise-grade project with hundreds of tables, the tool would save a massive amount of time, both in terms of development and maintenance.

Now that you have learned how to create code for your first exemplary project and have set up all the necessary tools, I am convinced that you could generate code for any Hazelcast database project in under 5 minutes.

Resources

Sakila: https://dev.mysql.com/doc/index-other.html or https://hub.docker.com/r/restsql/mysql-sakila
Initializer: https://www.speedment.com/hazelcast-initializer/
Manual: https://speedment.github.io/speedment-doc/hazelcast.html

Monday, April 15, 2019

Java Stream: Part 2, Is a Count Always a Count?

In my previous article on the subject, we learned that JDK 8’s stream()::count takes longer to execute the more elements there are in the Stream. For more recent JDKs, such as Java 11, that is no longer the case for simple stream pipelines. Learn how things have improved within the JDK itself and what to do if you have more complex stream pipelines.

Java 8

In my previous article, we could conclude that the operation list.stream().count() is O(N) under Java 8, i.e. the execution time depends on the number of elements in the original list. Read the article here.

Java 9 and Upwards

As rightfully pointed out by Nikolai Parlog (@nipafx) and Brian Goetz (@BrianGoetz) on Twitter, the implementation of Stream::count was improved beginning with Java 9. Here is a comparison of the underlying Stream::count code between Java 8 and later Java versions:

Java 8 (from the ReferencePipeline class)

return mapToLong(e -> 1L).sum();

Java 9 and later (from the ReduceOps class)

if (StreamOpFlag.SIZED.isKnown(flags)) {
    return spliterator.getExactSizeIfKnown();
}
...

It appears Stream::count in Java 9 and later is O(1) for Spliterators of known size rather than being O(N). Let’s verify that hypothesis.

Benchmarks

The big-O property can be observed by running the following JMH benchmarks under Java 8 and Java 11:

@State(Scope.Benchmark)
public class CountBenchmark {

    private List<Integer> list;

    @Param({"1", "1000", "1000000"})
    private int size;

    @Setup
    public void setup() {
        list = IntStream.range(0, size)
            .boxed()
            .collect(toList());
    }

    @Benchmark
    public long listSize() {
        return list.size();
    }

    @Benchmark
    public long listStreamCount() {
        return list.stream().count();
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
            .include(CountBenchmark.class.getSimpleName())
            .mode(Mode.Throughput)
            .threads(Threads.MAX)
            .forks(1)
            .warmupIterations(5)
            .measurementIterations(5)
            .build();

        new Runner(opt).run();

    }

}

This will produce the following outputs on my laptop (MacBook Pro mid 2015, 2.2 GHz Intel Core i7):

JDK 8 (from my previous article)

Benchmark                        (size)   Mode  Cnt          Score           Error  Units
CountBenchmark.listSize               1  thrpt    5  966658591.905 ± 175787129.100  ops/s
CountBenchmark.listSize            1000  thrpt    5  862173760.015 ± 293958267.033  ops/s
CountBenchmark.listSize         1000000  thrpt    5  879607621.737 ± 107212069.065  ops/s
CountBenchmark.listStreamCount        1  thrpt    5   39570790.720 ±   3590270.059  ops/s
CountBenchmark.listStreamCount     1000  thrpt    5   30383397.354 ±  10194137.917  ops/s
CountBenchmark.listStreamCount  1000000  thrpt    5        398.959 ±       170.737  ops/s

JDK 11

Benchmark                                  (size)   Mode  Cnt          Score           Error  Units
CountBenchmark.listSize                         1  thrpt    5  898916944.365 ± 235047181.830  ops/s
CountBenchmark.listSize                      1000  thrpt    5  865080967.750 ± 203793349.257  ops/s
CountBenchmark.listSize                   1000000  thrpt    5  935820818.641 ±  95756219.869  ops/s
CountBenchmark.listStreamCount                  1  thrpt    5   95660206.302 ±  27337762.894  ops/s
CountBenchmark.listStreamCount               1000  thrpt    5   78899026.467 ±  26299885.209  ops/s
CountBenchmark.listStreamCount            1000000  thrpt    5   83223688.534 ±  16119403.504  ops/s

As can be seen, in Java 11, the list.stream().count() operation is now O(1) and not O(N).

Brian Goetz pointed out that some developers who were using Stream::peek method calls under Java 8 discovered that these calls were no longer invoked if the Stream::count terminal operation was run under Java 9 and onwards. This generated some negative feedback to the JDK developers. Personally, I think it was the right decision by the JDK developers and that this instead presented a great opportunity for Stream::peek users to get their code right.
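Here is a minimal, hypothetical example (not from the original benchmarks) that illustrates the behavioral difference:

import java.util.Arrays;
import java.util.List;

public class PeekCountDemo {

    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(1, 2, 3);
        long count = list.stream()
            .peek(i -> System.out.println("peeked " + i)) // side effect
            .count();
        // Java 8 prints "peeked ..." three times; under Java 9+ the
        // traversal may be elided, so nothing may be printed at all.
        System.out.println("count = " + count);
    }
}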

More Complex Stream Pipelines

In this section, we will take a look at more complex stream pipelines.

JDK 11

Tagir Valeev concluded that pipelines like stream().skip(1).count() are not O(1) for List::stream. This can be observed by running the following benchmark:

@Benchmark
public long listStreamSkipCount() {
    return list.stream().skip(1).count();
}

Running this benchmark under JDK 11, together with the listStreamCount benchmark above, produced:

CountBenchmark.listStreamCount                  1  thrpt    5  105546649.075 ±  10529832.319  ops/s
CountBenchmark.listStreamCount               1000  thrpt    5   81370237.291 ±  15566491.838  ops/s
CountBenchmark.listStreamCount            1000000  thrpt    5   75929699.395 ±  14784433.428  ops/s
CountBenchmark.listStreamSkipCount              1  thrpt    5   35809816.451 ±  12055461.025  ops/s
CountBenchmark.listStreamSkipCount           1000  thrpt    5    3098848.946 ±    339437.339  ops/s
CountBenchmark.listStreamSkipCount        1000000  thrpt    5       3646.513 ±       254.442  ops/s

Thus, list.stream().skip(1).count() is still O(N).
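If all you need is a quick feel for the difference, the following sketch (assuming Java 9+) contrasts a SIZED source with the same source after skip(); on my reading of the JDK sources, skip() reports an unknown size downstream, which forces a full traversal:

import java.util.stream.LongStream;

public class SkipCountDemo {

    public static void main(String[] args) {
        // SIZED source: count() can return the known size without traversal
        long a = LongStream.range(0, 1_000_000_000L).count(); // effectively instant

        // skip() discards the known size, so count() traverses all elements
        long b = LongStream.range(0, 1_000_000_000L).skip(1).count(); // O(N)

        System.out.println(a + " and " + b);
    }
}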

Speedment

Some stream implementations are actually aware of their sources and can take appropriate shortcuts and merge stream operations into the stream source itself. This can improve performance massively, especially for large streams with more complex stream pipelines like stream().skip(1).count().

The Speedment ORM tool allows databases to be viewed as Stream objects, and these streams can optimize away many stream operations, like Stream::count, Stream::skip and Stream::limit, as demonstrated in the benchmark below. I have used the open-source Sakila example database as data input. The Sakila database is all about rental films, artists etc.

@Benchmark
public long rentalsSkipCount() {
    return rentals.stream().skip(1).count();
}

@Benchmark
public long filmsSkipCount() {
    return films.stream().skip(1).count();
}

When run, the following output will be produced:

SpeedmentCountBenchmark.filmsSkipCount        N/A  thrpt    5   68052838.621 ±    739171.008  ops/s
SpeedmentCountBenchmark.rentalsSkipCount      N/A  thrpt    5   68224985.736 ±   2683811.510  ops/s

The “rental” table contains over 10,000 rows whereas the “film” table contains only 1,000 rows. Nevertheless, their stream().skip(1).count() operations complete in almost the same time. Even if a table contained a trillion rows, the elements would still be counted in the same elapsed time. Thus, the stream().skip(1).count() implementation has a complexity that is O(1) and not O(N).

Note: The benchmarks above were run with “DataStore” in-JVM-memory acceleration. If run with no acceleration, directly against a database, the response time would depend on the underlying database’s ability to execute a nested “SELECT count(*) …” statement.

Summary

Stream::count was significantly improved in Java 9.

There are stream implementations, such as Speedment, that are able to compute Stream::count in O(1) time even for more complex stream pipelines like stream().skip(...).count() or even stream().filter(...).skip(...).count().

Resources

Speedment Stream ORM Initializer: https://www.speedment.com/initializer/

Sakila: https://dev.mysql.com/doc/index-other.html or https://hub.docker.com/r/restsql/mysql-sakila

Monday, April 8, 2019

Java Stream: Is a Count Always a Count?

It might appear obvious that counting the elements in a Stream takes longer the more elements there are in the Stream. But in fact, Stream::count can sometimes be done in a single operation, no matter how many elements you have. Read this article and learn how.

Count Complexity

The Stream::count terminal operation counts the number of elements in a Stream. The complexity of the operation is often O(N), meaning that the number of sub-operations is proportional to the number of elements in the Stream.

In contrast, the List::size method has a complexity of O(1), which means that regardless of the number of elements in the List, the size() method will return in constant time. This can be observed by running the following JMH benchmarks:

@State(Scope.Benchmark)
public class CountBenchmark {

    private List<Integer> list;

    @Param({"1", "1000", "1000000"})
    private int size;

    @Setup
    public void setup() {
        list = IntStream.range(0, size)
            .boxed()
            .collect(toList());
    }

    @Benchmark
    public long listSize() {
        return list.size();
    }

    @Benchmark
    public long listStreamCount() {
        return list.stream().count();
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
            .include(CountBenchmark.class.getSimpleName())
            .mode(Mode.Throughput)
            .threads(Threads.MAX)
            .forks(1)
            .warmupIterations(5)
            .measurementIterations(5)
            .build();

        new Runner(opt).run();

    }

}

This produced the following output on my laptop (MacBook Pro mid 2015, 2.2 GHz Intel Core i7):

Benchmark                        (size)   Mode  Cnt          Score           Error  Units
CountBenchmark.listSize               1  thrpt    5  966658591.905 ± 175787129.100  ops/s
CountBenchmark.listSize            1000  thrpt    5  862173760.015 ± 293958267.033  ops/s
CountBenchmark.listSize         1000000  thrpt    5  879607621.737 ± 107212069.065  ops/s
CountBenchmark.listStreamCount        1  thrpt    5   39570790.720 ±   3590270.059  ops/s
CountBenchmark.listStreamCount     1000  thrpt    5   30383397.354 ±  10194137.917  ops/s
CountBenchmark.listStreamCount  1000000  thrpt    5        398.959 ±       170.737  ops/s


As can be seen, the throughput of List::size is largely independent of the number of elements in the List, whereas the throughput of Stream::count drops off rapidly as the number of elements grows. But is this really the case for all Stream implementations?

Source Aware Streams

Some stream implementations are actually aware of their sources and can take appropriate shortcuts and merge stream operations into the stream source itself. This can improve performance massively, especially for large streams. The Speedment ORM tool allows databases to be viewed as Stream objects, and these streams can optimize away many stream operations, like Stream::count, as demonstrated in the benchmark below. I have used the open-source Sakila example database as data input. The Sakila database is all about rental films, artists etc.

@State(Scope.Benchmark)
public class SpeedmentCountBenchmark {

    private Speedment app;
    private RentalManager rentals;
    private FilmManager films;

    @Setup
    public void setup() {
        app =  new SakilaApplicationBuilder()
            .withBundle(DataStoreBundle.class)
            .withLogging(ApplicationBuilder.LogType.STREAM)
            .withPassword(ExampleUtil.DEFAULT_PASSWORD)
            .build();

        app.get(DataStoreComponent.class).ifPresent(DataStoreComponent::load);

        rentals = app.getOrThrow(RentalManager.class);
        films = app.getOrThrow(FilmManager.class);

    }

    @TearDown
    public void tearDown() {
        app.close();
    }


    @Benchmark
    public long rentalsCount() {
        return rentals.stream().count();
    }


    @Benchmark
    public long filmsCount() {
        return films.stream().count();
    }


    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
            .include(SpeedmentCountBenchmark.class.getSimpleName())
            .mode(Mode.Throughput)
            .threads(Threads.MAX)
            .forks(1)
            .warmupIterations(5)
            .measurementIterations(5)
            .build();

        new Runner(opt).run();

    }

}

When run, the following output will be produced:

Benchmark                              Mode  Cnt         Score          Error  Units
SpeedmentCountBenchmark.filmsCount    thrpt    5  71037544.648 ± 75915974.254  ops/s
SpeedmentCountBenchmark.rentalsCount  thrpt    5  69750012.675 ± 37961414.355  ops/s


The “rental” table contains over 10,000 rows whereas the “film” table contains only 1,000 rows. Nevertheless, their Stream::count operations complete in almost the same time. Even if a table contained a trillion rows, the elements would still be counted in the same elapsed time. Thus, the Stream::count implementation has a complexity that is O(1) and not O(N).

Note: The benchmarks above were run with Speedment's “DataStore” in-JVM-memory acceleration. If run with no acceleration, directly against a database, the response time would depend on the underlying database’s ability to execute a “SELECT count(*) FROM film” query.

Summary

It is possible to create Stream implementations that count their elements in a single operation rather than counting each and every element in the stream. This can improve performance significantly, especially for streams with many elements.

Resources

Speedment Stream ORM Initializer: https://www.speedment.com/initializer/
Sakila: https://dev.mysql.com/doc/index-other.html or https://hub.docker.com/r/restsql/mysql-sakila

Wednesday, March 27, 2019

Java 12: Mapping with Switch Expressions

In this article, we will be looking at the new Java 12 feature “Switch Expressions” and how it can be used in conjunction with the Stream::map operation and some other Stream operations. Learn how you can make your code better with Streams and Switch Expressions.

Switch Expressions

Java 12 comes with “preview” support for “Switch Expressions”. Switch Expressions allow switch statements to return values directly, as shown hereunder:

public String newSwitch(int day) {
    return switch (day) {
        case 2, 3, 4, 5, 6 -> "weekday";
        case 7, 1 -> "weekend";
        default -> "invalid";
    } + " category";
}

Invoking this method with 1 will return “weekend category”.

This is great and makes our code shorter and more concise. We do not have to bother with fall-through concerns, blocks, mutable temporary variables or missed cases/defaults that might plague the good ole’ switch. Just look at the corresponding old switch example below and you will see what I mean:

public String oldSwitch(int day) {
    final String attr;
    switch (day) {
        case 2: case 3: case 4: case 5: case 6: {
            attr = "weekday";
            break;
        }
        case 7: case 1: {
            attr = "weekend";
            break;
        }
        default: {
            attr = "invalid";
        }
    }
    return attr + " category";
}

Switch Expressions is a Preview Feature

In order to get Switch Expressions to work under Java 12, we must pass “--enable-preview” as a command line argument both when we compile and when we run our application. This proved to be a bit tricky, but hopefully it will get easier with the release of new IDE versions and/or if Java incorporates this feature as a fully supported one. IntelliJ users need to use version 2019.1 or later.
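For reference, compiling and running from the command line looks roughly like this (assuming a class named Main):

javac --release 12 --enable-preview Main.java
java --enable-preview Main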

Switch Expressions in Stream::map

Switch Expressions are very easy to use in Stream::map operators, especially when compared with the old switch syntax. I have used the Speedment Stream ORM and the Sakila example database in the examples below. The Sakila database is all about films, actors and so forth.

Here is a stream that decodes a film language id (a short) to a full language name (a String) using map() in combination with a Switch Expression:

public static void main(String... argv) {

    try (Speedment app = new SakilaApplicationBuilder()
        .withPassword("enter-your-db-password-here")
        .build()) {

        FilmManager films = app.getOrThrow(FilmManager.class);

        List<String> languages = films.stream()
            .map(f -> "the " + switch (f.getLanguageId()) {
                case 1 -> "English";
                case 2 -> "French";
                case 3 -> "German";
                default -> "Unknown";
            } + " language")
            .collect(toList());

        System.out.println(languages);
    }
}

This will create a stream of all the 1,000 films in the database, then map each film to a corresponding language name and collect all those names into a List. Running this example will produce the following output (shortened for brevity):

[the English language, the English language, … ]

Had we used the old switch syntax, we would have gotten something like this:

        ...
        List<String> languages = films.stream()
            .map(f -> {
                final String language;
                switch (f.getLanguageId()) {
                    case 1: {
                        language = "English";
                        break;
                    }
                    case 2: {
                        language = "French";
                        break;
                    }
                    case 3: {
                        language = "German";
                        break;
                    }
                    default: {
                       language = "Unknown";
                    }
                }
                return "the " + language + " language";
            })
            .collect(toList());
        ...
Or, perhaps something like this:
        ...
        List<String> languages = films.stream()
            .map(f -> {
                switch (f.getLanguageId()) {
                    case 1: return "the English language";
                    case 2: return "the French language";
                    case 3: return "the German language";
                    default: return "the Unknown language";
                }
            })
            .collect(toList());
         ...
The latter example is shorter but duplicates logic.

Switch Expressions in Stream::mapToInt

In this example, we will compute summary statistics about scores that we assign based on a film’s rating. The more restricted the rating, the higher the score, according to our own invented scale:

IntSummaryStatistics statistics = films.stream()
    .mapToInt(f -> switch (f.getRating().orElse("Unrated")) {
        case "G", "PG" ->  0;
        case "PG-13"   ->  1;
        case "R"       ->  2;
        case "NC-17"   ->  5;
        case "Unrated" -> 10;
        default -> 0;
    })
    .summaryStatistics();

 System.out.println(statistics);

This will produce the following output:

IntSummaryStatistics{count=1000, sum=1663, min=0, average=1.663000, max=5}

In this case, the difference between Switch Expressions and the old switch is not that big. Using the old switch, we could have written:

IntSummaryStatistics statistics = films.stream()
    .mapToInt(f -> { 
        switch (f.getRating().orElse("Unrated")) {
            case "G": case "PG": return 0;
            case "PG-13":   return 1;
            case "R":       return 2;
            case "NC-17":   return 5;
            case "Unrated": return 10;
            default: return 0;
        }
    })
   .summaryStatistics();

Switch Expressions in Stream::collect

This last example shows the use of a Switch Expression in a groupingBy Collector. In this case, we would like to count how many films can be seen by a person of a certain minimum age. Here, we are using a Map with the minimum age as keys and counted films as values.

Map<Integer, Long> ageMap = films.stream()
     .collect(
         groupingBy( f -> switch (f.getRating().orElse("Unrated")) {
                 case "G", "PG" -> 0;
                 case "PG-13"   -> 13;
                 case "R"       -> 17;
                 case "NC-17"   -> 18;
                 case "Unrated" -> 21;
                 default -> 0;
             },
             TreeMap::new,
             Collectors.counting()
          )
      );

System.out.println(ageMap);

This will produce the following output:

{0=372, 13=223, 17=195, 18=210}

By providing the (optional) groupingBy Map supplier TreeMap::new, we get our ages in sorted order. Why PG-13 can be seen from 13 years of age while NC-17 cannot be seen from 17, but instead from 18 years of age, is mysterious but outside the scope of this article.

Summary

I am looking forward to the Switch Expressions feature being officially incorporated into Java. Switch Expressions can sometimes replace lambdas and method references for many types of stream operations.

Resources

Sakila: https://dev.mysql.com/doc/index-other.html or https://hub.docker.com/r/restsql/mysql-sakila
JDK 12 Download: https://jdk.java.net/12/
Speedment Stream ORM Initializer: https://www.speedment.com/initializer/

Wednesday, December 19, 2018

Who’s Been Naughty, Who’s Been Nice? Santa Gives You Java 11 Advice!


Ever wondered how Santa can deliver holiday gifts to all kids around the world? There are 2 billion kids, each with an individual wishlist, and he does it in 24 hours. This means 43 microseconds per kid on average, and he needs to check whether every child has been naughty or nice.

You do not need to wonder anymore. I will reveal the secret. He is using Java 11 and a modern stream ORM with superfast execution.

Even though Santa’s backing database is old and slow, he can analyze data in microseconds by using standard Java streams and in-JVM-memory technology. Santa’s database contains two tables: Child, which holds every child in the world, and HolidayGift, which specifies all the items available for production in Santa’s workshop. A child can only have one wish, such are the hash rules.

Viewing the Database as Streams

Speedment is a modern stream-based ORM which is able to view relational database tables as standard Java streams. As we all know, only nice children get gifts, so it is important to distinguish between those who’ve been naughty and those who’ve been nice. This is easily accomplished with the following code:

var niceChildren = children.stream()
        .filter(Child.NICE.isTrue())
        .sorted(Child.COUNTRY.comparator()) 
        .collect(Collectors.toList());

This stream will yield a long list containing only the kids that have been nice. To enable Santa to optimize his delivery route, the list is sorted by country of residence.

Joining Child and HolidayGift

This list seems incomplete though. How does Santa keep track of which gift goes to whom? Now the HolidayGift table comes in handy. Since some children provided Santa with their wish lists, we can now join the two tables together to make a complete list containing all the nice children and their corresponding gifts. It is important to include the children without any wish (they will get a random gift); therefore, we make a left join.

var join = joinComponent
    .from(ChildManager.IDENTIFIER)
        .where(Child.NICE.isTrue())
    .leftJoinOn(HolidayGift.GIFT_ID).equal(Child.GIFT_ID)
    .build(Tuples::of);

Speedment uses a builder pattern to create a Join<T> object which can then be reused over and over again to create streams with elements of type T. In this case, it is used to join Child and HolidayGift. The join only includes children that are nice and matches rows that contain the same value in the gift_id fields.

This is how Santa delivers all the packages:

join.stream()
    .parallel() 
    .forEach(SleighUtil::deliver);

As can be seen, Santa can easily deliver all the packages with parallel sleighs, carried by reindeer.

This will render the stream as an efficient SQL query but, unfortunately, it is not quick enough to make it in time.

Using In-JVM-Memory Acceleration

Now to the fun part. Santa activates the in-JVM-memory acceleration component in Speedment, called DataStore. This is done in the following way:

var santasWorkshop = new ApplicationBuilder()
    .withPassword("north-pole")
    // Activate DataStore
    .withBundle(DataStoreBundle.class)
    .build();

// Load a snapshot of the database into off-heap memory
santasWorkshop.get(DataStoreComponent.class)
    .ifPresent(DataStoreComponent::load);

This startup configuration is the only adjustment needed to the application. All stream constructs above remain the same. When the application is started, a snapshot of the database is pulled into the JVM and stored off-heap. Because the data is stored off-heap, it will not influence garbage collection, and the amount of data is limited only by available RAM. Nothing prevents Santa from loading terabytes of data, since he is using a cloud service and can easily expand his RAM. Now the application will run orders of magnitude faster, and Santa will be able to deliver all the packages in time.

Run Your Own Projects with In-JVM-Memory Acceleration

If you want to try for yourself how fast a database application can be, there is an Initializer that can be found here. Just tick your desired database type (Oracle, MySQL, MariaDB, PostgreSQL, Microsoft SQL Server, DB2 or AS400) and a POM and an application template will be automatically generated for you.

If you need more help setting up your project, check out the Speedment GitHub page or explore the user guide.

Authors

Thank you, Julia Gustafsson and Carina Dreifeldt, for co-writing this article.

Tuesday, December 18, 2018

Java: Aggregate Data Off-Heap

Explore how to create off-heap aggregations with a minimum of garbage collection impact and maximum memory utilization.

Creating large aggregations using Java Map, List and Object normally creates a lot of heap memory overhead. This also means that the garbage collector will have to clean up these objects once the aggregation goes out of scope.

Read this short article and discover how we can use Speedment Stream ORM to create off-heap aggregations that can utilize memory more efficiently and with little or no GC impact.

Person

Let’s say we have a large number of Person objects that take the following shape:

public class Person {
    private final int age;
    private final short height;
    private final short weight;        
    private final String gender;
    private final double salary;
    …
    // Getters and setters hidden for brevity
}

For the sake of argument, we also have access to a method called persons() that will create a new Stream with all these Person objects.

Salary per Age

We want to compute the average salary for each age bucket. To represent the results of the aggregation, we will be using a data class called AgeSalary, which associates a certain age with an average salary.

public class AgeSalary {
     private int age;
     private double avgSalary;
     … 
    // Getters and setters hidden for brevity
}

Age grouping for salaries normally entails fewer than 100 buckets being used, so this example just shows the principle. The more buckets there are, the more sense it makes to aggregate off-heap.
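For contrast, here is what the equivalent on-heap aggregation looks like with the standard Java collectors; this is the kind of solution whose allocation overhead we want to avoid (it assumes the persons() method above and the Person accessors age() and salary() used below):

import java.util.Map;
import java.util.stream.Collectors;

// On-heap equivalent: boxed keys and accumulator objects live on the heap
Map<Integer, Double> avgSalaryPerAge = persons()
    .collect(Collectors.groupingBy(
        Person::age,
        Collectors.averagingDouble(Person::salary)));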

Solution

Using Speedment Stream ORM, we can derive an off-heap aggregation solution with these three steps:

Create an Aggregator

var aggregator = Aggregator.builderOfType(Person.class, AgeSalary::new)
    .on(Person::age).key(AgeSalary::setAge)
    .on(Person::salary).average(AgeSalary::setAvgSalary)
    .build();

The aggregator can be reused over and over again.

Compute an Aggregation

var aggregation = persons().collect(aggregator.createCollector());

Using the aggregator, we create a standard Java stream Collector that has its internal state completely off-heap.

Use the Aggregation Result

aggregation.streamAndClose()
    .forEach(System.out::println);

Since the Aggregation holds data that is stored off-heap, it may benefit from explicit closing rather than just being cleaned up potentially much later. Closing the Aggregation can be done by calling the close() method, possibly by taking advantage of the AutoCloseable trait, or, as in the example above, by using streamAndClose(), which returns a stream that will close the Aggregation after stream termination.
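As a minimal sketch of the AutoCloseable route (assuming the Aggregation also exposes a plain stream() accessor alongside streamAndClose()):

// Relies on the Aggregation implementing AutoCloseable, as described above
try (var aggregation = persons().collect(aggregator.createCollector())) {
    aggregation.stream()
        .forEach(System.out::println);
}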

Everything in a One-Liner

The code above can be condensed into what is effectively a one-liner:

persons().collect(Aggregator.builderOfType(Person.class, AgeSalary::new)
    .on(Person::age).key(AgeSalary::setAge)
    .on(Person::salary).average(AgeSalary::setAvgSalary)
    .build()
    .createCollector()
).streamAndClose()
    .forEach(System.out::println);

There is also support for parallel aggregations. Just add the stream operation Stream::parallel and the aggregation is done using the ForkJoin pool.
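For example, with the same hypothetical persons() source as above:

var aggregation = persons()
    .parallel() // the aggregation work is now spread over the ForkJoin pool
    .collect(aggregator.createCollector());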

Resources

Download Speedment here

Read more about off-heap aggregations here