Monday, September 11, 2017

The Need for Speed, Access Existing Data 1,000x Faster

Learn how you can speed up your analytics database applications by a factor of 1,000 by using standard Java 8 streams and Speedment’s In-JVM-Memory accelerator.

Web and mobile applications are sometimes slow because the backing database is slow and/or the connection to the database imposes latencies. Modern UIs and interactive applications require fast back-ends with ideally no observable latency or else users will move on to other services or will just get tired and stop using the service altogether.

In this article we will learn how analytic database applications can be speed up by orders of magnitude using standard Java 8 streams and Speedment's in-JVM-memory acceleration technology. At the end, we will run a JMH test suit with representative benchmarks that indicate a speedup factor exceeding 1,000 times.

Viewing the Database as Streams

Speedment is a modern stream based ORM meaning that the tables are viewed as standard Java 8 streams. In this article we will use the “Sakila” database that is an open-source example database available directly from Oracle here. The Sakila example database contains films, actors, etcetera. This is how a Java 8 stream from the database might look like:

    List<Film> secondPage = films.stream()
        .filter(Film.RATING.equal("PG-13"))
        .sorted(Film.TITLE.comparator())
        .skip(50)
        .limit(50)
        .collect(Collectors.toList());

This stream will filter out only those films that has a rating equal to “PG-13” and will then sort the remaining films by film title. After that, the first 50 films are skipped and then the next 50 films are collected to a list. Thus, we get the second page of all PG-13 films sorted in title order.

Usually, we also need to know how many films there are in total that has a rating of “PG-13” in order to show a correctly scaled scrollbar in our application. This can be done like this:

     long count = films.stream()
        .filter(Film.RATING.equal("PG-13"))
        .count();

Using a Database

Speedment will automatically render the Streams to SQL. This way, we can remain in a pure typesafe Java environment without having to write SQL code. By enabling logging, we can see that the first paging stream will be rendered to the following SQL query (assuming we are using MySQL):
 
SELECT
    `film_id`,`title`,`description``release_year`,
    `language_id`,`original_language_id`,`rental_duration`,`rental_rate`,
    `length`,`replacement_cost`,`rating`,`special_features`,`last_update` 
FROM 
    `sakila`.`film` 
WHERE 
    (`sakila`.`film`.`rating`  = ? COLLATE utf8_bin) 
ORDER BY 
    `sakila`.`film`.`title` ASC 
LIMIT ? OFFSET ?

values:[PG-13, 50, 50]
The second count stream will be rendered to:
 
SELECT COUNT(*) FROM (
    SELECT
        `film_id`,`title`,`description``release_year`,
        `language_id`,`original_language_id`,`rental_duration`,`rental_rate`,
        `length`,`replacement_cost`,`rating`,`special_features`,`last_update`
     FROM
             `sakila`.`film` 
        WHERE 
            (`sakila`.`film`.`rating`  = ? COLLATE utf8_bin)
) AS A

values:[PG-13]

Thus, the stream operations are rendered to efficient SQL. When running one thousand of these queries in parallel on a laptop class of computer with the MySQL standard server configuration, they complete in 700 ms and 175 ms aggregated latency respectively. If you are thinking about how the second SQL statement could be efficient, the fact is that the database will be able to basically eliminate the inner select.

Using In-JVM-Memory Acceleration

Now to the fun part. Let us activate the in-JVM-memory acceleration component in Speedment, called DataStore, in our application. This is done in this way:

     SakilaApplication app = new SakilaApplicationBuilder()
        .withPassword("sakila-password")
        // Activate DataStore
        .withBundle(DataStoreBundle.class)
        .build();

        // Load a snapshot of the database into off heap memory
        app.get(DataStoreComponent.class)
            .ifPresent(DataStoreComponent::load);

When the application is started, a snapshot of the database is pulled into the JVM and is stored off-heap. Because data is stored off-heap, data will not influence garbage collection and the amount of data is only limited by the available RAM. Nothing prevents us from loading terabytes of data if we have that amount of RAM available.

If we now run the same application again, we get 22 ms and 1 ms aggregated latency. This means that the latency is reduced by a factor of 30 and 170 respectively. A significant improvement it has to be said. But, it is getting better still.

Using In-JVM-Memory Acceleration and Json

REST and JSON are commonly used to serve clients that request data these days. Speedment has a special collector that can collect JSON data using something called in-place de-serialization whereby only the fields that are needed by the collector are deserialized from off-heap memory. We can depend on the Json plugin by first adding a dependency in our pom file:

     <dependency>
        <groupId>com.speedment.enterprise.plugins</groupId>
        <artifactId>json-stream</artifactId>
        <version>${speedment.enterprise.version}</version>
    </dependency>

Then, we install the plugin in the ApplicationBuilder as shown hereunder:


     SakilaApplication app = new SakilaApplicationBuilder()
        .withPassword("sakila-password")
        .withBundle(DataStoreBundle.class)
        // Install the Json Plugin
        .withBundle(JsonBundle.class)
        .build();

If we only want the Film fields “title”, “rating” and “length” in the json output, we can create a Json encoder like this:


    final JsonComponent json = app.getOrThrow(JsonComponent.class);

    final JsonEncoder<Film> filmEncoder = json.<Film>emptyEncoder()
        .put(Film.TITLE)
        .put(Film.RATING)
        .put(Film.LENGTH)
        .build();

This decoder is immutable and can be reused over and over again in our application:


     String json = films.stream()
        .filter(Film.RATING.equal("PG-13"))
        .sorted(Film.TITLE.comparator())
        .skip(50 * pageNo)
        .limit(50)
        .collect(JsonCollectors.toList(filmEncoder));

This gives us an additional speed up factor of 2 compared to dealing with entire entities. The JsonComponent can do much more than simply collecting things to a list. For example, it can  create aggregations, also using in-place de-serialization. 



Run Your Own Projects with In-JVM-Memory Acceleration

It is easy to try in-JVM-Memory acceleration by yourself. There is a free Initializer that can be found here. Just tick in your desired database type and you will get a POM and an application template automatically generated for you. You also need a license key to run. Just click “Request a Free Trial License Key” on the same page to get one. If you need more help setting up your project, check out the Speedment GitHub page or explore the manual.

How Fast is it for Real?

Speedment supports a number of database types including Oracle, MySQL, MariaDB, PostgreSQL, Microsoft SQL Server, DB2 and AS400. Speedment can also work with Avro files that is used by Hadoop. In this example, we will run MySQL.

Testing performance in Java application is notoriously difficult. Using the JMH framework, I have written a number of typical applications and have run each test hundreds of thousand times and compared the result for pure MySQL and MySQL with Speedment’s in-JVM accelerator. The performance figures below are given as operations/second (higher is better).


Benchmark
Pure MySQL
MySQL with Speedment in-JVM
Speed up factor
Count all
5,324
43,615,967
8,000
Count with filter
5,107
2,465,928
400
Filtering
449
597,702
1,300
Sorting
109
171,304
1,500
Paging
1,547
1,443,015
900
Iterate all
108
5,556
50
Aggregation
117
167,728
1,400
Aggregation filter
453
608,763
1,300

As can be seen, MySQL with Speedment In-JVM accelerator outperforms Pure MySQL by a factor of 1,000 or more in most cases. The smallest speed up factor observed was 50 times which is still very good.

Test Environment

MySQL, 5.7.16 standard installation, MySQL JDBC Driver 5.1.42, Oracle Java 1.8.0_131, Speedment Enterprise 1.1.10, macOS Sierra 10.12.6, Macbook Pro 2.2 GHz i7 (mid 2015), 16 GB RAM.

Benchmark Code

Below are some examples of how the benchmark code looks like. The complete benchmark application can be found on GitHub here. I encourage you to clone it an run it to see the speedup factor on your own target machine.
    @Benchmark
    public String paging() {
        return films.stream()
            .filter(Film.RATING.equal("PG-13"))
            .skip(50)
            .limit(50)
            .collect(filmCollector);
    }


    @Benchmark
    public String aggregationWithFilter() {
        return films.stream()
            .filter(Film.RATING.equal("PG-13"))
            .collect(sumLengthCollector);
    }

How Much RAM is Needed?

Speedment can often store data in RAM more efficiently than the database itself. The Sakila database in the benchmark takes 6.6 MB on disk but Speedment is only using 3 MB of memory. Considering that Speedment indexes all columns by default whereas the database only indexes a few columns, the Speedment is remarkably memory efficient.

How Long Time Does it Take to Load Data?

The Sakila database was loaded and indexed by Speedment in less than 1 second. Speedment can refresh data from the database in the background and will keep track of which streams are running against which database snapshot version (MVCC).

How Much Faster Will My Own Applications Run?

How much latency reduction one would get in a particular project is anyone's guess. Is is x10, x50, x100 or even more? Take the opportunity and find out how much speed you can gain in your own projects!

Take it for a Spin

Learn more about Speedment on GitHub and kick-start your own project using the Speedment Initializer and remember to tick “enable in-memory acceleration” and use the initializer to get a free evaluation license key too. Explore the manual section for the Speedment in-JVM accelerator here or use my twitter handle @PMinborg