Minborg's Java Pot: July 2019

Wednesday, July 31, 2019

Java: ChronicleMap Part 3, Fast Microservices

Standard Java Maps needs to be initialized upon startup. Learn how to leverage ChronicleMaps that is initializable from a file and reduce microservice startup times significantly and how to share Maps between JVMs.

The built-in Map implementations, such as HashMap and ConcurrentHashMap are fast but they must be initialized with mappings before they can be used for looking up values. Also, they are limited in size by practical means such as heap and RAM size. Lastly, they are local to the JVM it runs in.

The initialization process can slow down critical startup for microservices, especially when reading mappings from a remote REST interface or a remote database. In this article, you will learn how you can start your microservice applications in seconds instead of minutes by using memory-mapped ChronicleMap instances and how Maps can be shared between JVMs in this third article in an article series about CronicleMap.

Read more about the fundamentals of CronicleMap in the first article.

Read more about file mapped CronicleMap objects in the second article.

Creating a Shared Map

As described in the second article in the series, we can easily create a file mapped Map like this:

private static Map<Long, Point> createFileMapped() {
    try {
        return ChronicleMap
            .of(Long.class, Point.class)
            .averageValueSize(8)
            .valueMarshaller(PointSerializer.getInstance())
            .entries(10_000_000)
            .createPersistedTo(new File("my-map"));

    } catch (IOException ioe) {
        throw new RuntimeException(ioe);
    }
}

Created Map objects can now be accessed by any JVM that has access to the “my-map” file. Updates to the maps will be shared among the participating JVMs via the shared file.

Initializing the Map

As also shown in the second article, we could create and initialize a Map like this:

final Map<Long, Point> m3 = LongStream.range(0, 10_000_000)
    .boxed()
        .collect(
            toMap(
                Function.identity(),
                FillMaps::pointFrom,
                (u, v) -> {
                    throw new IllegalStateException();
                },
                FillMaps::createFileMapped
            )
        );

When running on my laptop (MacBook Pro mid 2015, 16 GB, 2.2 GHz Intel Core i7), it takes about 10 seconds to create and fill the Map with 10 million entries.

If the Map contents were retrieved externally (as opposed to being created locally by the pointFrom() method), it would likely take much longer time to fill the Map. For example, if we get 50 Mbit/s REST throughput and each JSON Point representation consumes 25 bytes, then it would take some 60 seconds to fill the Map.

Starting a new JVM

Now that there is a pre-existing mapped file, we can start directly off this file as shown in this snippet:

return ChronicleMap
    .of(Long.class, Point.class)
    .averageValueSize(8)
    .valueMarshaller(PointSerializer.getInstance())
    .entries(10_000_000)
    .createOrRecoverPersistedTo(new File("my-map"));

This will create a Map directly from the existing “my-map” file.

Running this on my laptop will yield a start time of 5 seconds. This could be compared to the 60 second REST example, yielding a 90% startup time reduction.

Running Several JVMs on the Same Node

We could elect to run several JVMs on the same physical server node. By doing so, we benefit from the OS’es ability to make mappings of the file available for each JVM by exposing shared memory. This constitutes an efficient and low latency means of communication between the JVMs. The fact that there is a common pool of mapped memory makes the memory management much more efficient compared to a situation where each and every JVM/OS would have to maintain its own separate mappings.

Summary

ChronicleMaps can be shared between participating JVM via shared files
Startup times can be reduced significantly using shared files
If JVMs are running on the same physical machine, performance and efficiency is further improved
Shared files via ChronicleMap provides a low latency means of communication between JVMs

Tuesday, July 30, 2019

Java: ChronicleMap Part 2, Super RAM Maps

The standard Java Maps, such as the ubiquitous HashMap, are ultimately limited by the available RAM. Read this article and learn how you can create Java Maps with virtually unlimited sizes even exceeding the target machine’s RAM size.

The built-in Map implementations, such as HashMap and ConcurrentHashMap work fine as long as they are relatively small. In all cases, they are limited by the available heap and therefore eventually the available RAM size. ChronicleMap can store its contents in files, thereby circumventing this limitation, opening up for terabyte-sized mappings as shown in this second article in an article series about CronicleMap.

Read more about the fundamentals of CronicleMap in my previous first article.

File Mapping

Mapping of a file is made by invoking the createPersistedTo() method on a ChronicleMap builder as shown in the method below:

private static Map<Long, Point> createFileMapped() {
   try {
        return ChronicleMap
            .of(Long.class, Point.class)
            .averageValueSize(8)
            .valueMarshaller(PointSerializer.getInstance())
            .entries(10_000_000)
            .createPersistedTo(new File("my-map"));

    } catch (IOException ioe) {
        throw new RuntimeException(ioe);
    }
}

This will create a Map that will layout its content in a memory-mapped file named “my-map” rather than in direct memory. The following example shows how we can create 10 million Point objects and store them all in a file mapped map:

final Map<Long, Point> m3 = LongStream.range(0, 10_000_000)
    .boxed()
    .collect(
        toMap(
            Function.identity(),
            FillMaps::pointFrom,
            (u, v) -> {
                throw new IllegalStateException();
           },
           FillMaps::createFileMapped
       )
   );

The following command shows the newly created file:

Pers-MacBook-Pro:target pemi$ ls -lart my-map 
-rw-r--r--  1 pemi  staff  330305536 Jul 10 16:56 my-map

As can be seen, the file is about 33 MB and thus, each entry occupies 33 bytes on average.

Persistence

When the JVM terminates, the mapped file is still there, making it easy to pick up a previously created map including its content. This works much like a rudimentary superfast database. Here is how we can start off from an existing file:

return ChronicleMap
    .of(Long.class, Point.class)
    .averageValueSize(8)
    .valueMarshaller(PointSerializer.getInstance())
    .entries(10_000_000)
    .createOrRecoverPersistedTo(new File("my-map"));

The Map will be available directly, including its previous content.

Java Map Exceeding RAM Limit

One interesting aspect of memory-mapped files is that they can exceed both the heap and RAM limits. The file mapping logic will make sure that the parts being currently used are loaded into RAM on demand. The mapping logic will also retain recent portions of accessed mapped memory in physical memory to improve performance. This occurs behind-the-scenes and need not be managed by the application itself.

My desktop computer is an older MacBook Pro with only 16GB of memory (Yes, I know that sucks). Nevertheless, I can allocate a Map with 1 billion entries potentially occupying 33 * 1,000,000,000 = 33 GB memory (We remember from above that each entry occupied 33 bytes on average). The code looks like this:

return ChronicleMap
    .of(Long.class, Point.class)
    .averageValueSize(8)
    .valueMarshaller(PointSerializer.getInstance())
    .entries(1_000_000_000)
    .createPersistedTo(new File("huge-map"));

Even though I try to create a Java Map with 2x my RAM size, the code runs flawlessly and I get this file:

Pers-MacBook-Pro:target pemi$ ls -lart | grep huge-map 
-rw-r--r--   1 pemi  staff  34573651968 Jul 10 18:52 huge-map

Needless to say, you should make sure that the file you are mapping to is located on a file system with high random access performance. For example, a filesystem located on a local SSD.

Summary

ChronicleMap can be mapped to an external file
The mapped file is retained when the JVM exits
New applications can pick up an existing mapped file
ChronicleMap can hold more data than there is RAM
Mapped files are best placed on file systems with high random access performance

Friday, July 26, 2019

Java: ChronicleMap Part 1, Go Off-Heap

Filling up a HashMap with millions of objects will quickly lead to problems such as inefficient memory usage, low performance and garbage collection problems. Learn how to use off-heap CronicleMap that can contain billions of objects with little or no heap impact.

The built-in Map implementations, such as HashMap and ConcurrentHashMap are excellent tools when we want to work with small to medium-sized data sets. However, as the amount of data grows, these Map implementations are deteriorating and start to exhibit a number of unpleasant drawbacks as shown in this first article in an article series about open-sourceed CronicleMap.

Heap Allocation

In the examples below, we will use Point objects. Point is a POJO with a public default constructor and getters and setters for X and Y properties (int). The following snippet adds a million Point objects to a HashMap:

final Map<Long, Point> m = LongStream.range(0, 1_000_000)
    .boxed()
    .collect(
        toMap(
            Function.identity(),
            FillMaps::pointFrom,
            (u,v) -> { throw new IllegalStateException(); },
             HashMap::new
        )
    );

    // Conveniency method that creates a Point from
    // a long by applying modulo prime number operations
    private static Point pointFrom(long seed) {
        final Point point = new Point();
        point.setX((int) seed % 4517);
        point.setY((int) seed % 5011);
        return point;
    }

We can easily see the number of objects allocated on the heap and how much heap memory these objects consume:

Pers-MacBook-Pro:chronicle-test pemi$ jmap -histo 34366 | head
 num     #instances         #bytes  class name (module)
-------------------------------------------------------
   1:       1002429       32077728  java.util.HashMap$Node (java.base@10)
   2:       1000128       24003072  java.lang.Long (java.base@10)
   3:       1000000       24000000  com.speedment.chronicle.test.map.Point
   4:           454        8434256  [Ljava.util.HashMap$Node; (java.base@10)
   5:          3427         870104  [B (java.base@10)
   6:           185         746312  [I (java.base@10)
   7:           839         102696  java.lang.Class (java.base@10)
   8:          1164          89088  [Ljava.lang.Object; (java.base@10)

For each Map entry, a Long, a HashMap$Node and a Point object need to be created on the heap. There are also a number of arrays with HashMap$Node objects created. In total, these objects and arrays consume 88,515,056 bytes of heap memory. Thus, each entry consumes on average 88.5 bytes.

NB: The extra 2429 HashMap$Node objects come from other HashMap objects used internally by Java.

Off-Heap Allocation

Contrary to this, a CronicleMap uses very little heap memory as can be observed when running the following code:

final Map<Long, Point> m2 = LongStream.range(0, 1_000_000)
    .boxed()
    .collect(
        toMap(
            Function.identity(),
            FillMaps::pointFrom,
            (u,v) -> { throw new IllegalStateException(); },
            () -> ChronicleMap
                .of(Long.class, Point.class)
                .averageValueSize(8)
                .valueMarshaller(PointSerializer.getInstance())
                .entries(1_000_000)
                .create()
        )
    );

Pers-MacBook-Pro:chronicle-test pemi$ jmap -histo 34413 | head
 num     #instances         #bytes  class name (module)
-------------------------------------------------------
   1:          6537        1017768  [B (java.base@10)
   2:           448         563936  [I (java.base@10)
   3:          1899         227480  java.lang.Class (java.base@10)
   4:          6294         151056  java.lang.String (java.base@10)
   5:          2456         145992  [Ljava.lang.Object; (java.base@10)
   6:          3351         107232  java.util.concurrent.ConcurrentHashMap$Node (java.base@10)
   7:          2537          81184  java.util.HashMap$Node (java.base@10)
   8:           512          49360  [Ljava.util.HashMap$Node; (java.base@10)

As can be seen, there are no Java heap objects allocated for the CronicleMap entries and consequently no heap memory either.

Instead of allocating heap memory, CronicleMap allocates its memory off-heap. Provided that we start our JVM with the flag -XX:NativeMemoryTracking=summary, we can retrieve the amount off-heap memory being used by issuing the following command:

Pers-MacBook-Pro:chronicle-test pemi$ jcmd 34413 VM.native_memory | grep Internal
-                  Internal (reserved=30229KB, committed=30229KB)

Apparently, our one million objects were laid out in off-heap memory using a little more than 30 MB of off-heap RAM. This means that each entry in the CronicleMap used above needs on average 30 bytes.

This is much more memory effective than a HashMap that required 88.5 bytes. In fact, we saved 66% of RAM memory and almost 100% of heap memory. The latter is important because the Java Garbage Collector only sees objects that are on the heap.

Note that we have to decide upon creation how many entries the CronicleMap can hold at maximum. This is different compared to HashMap which can grow dynamically as we add new associations. We also have to provide a serializer (i.e. PointSerializer.getInstance()), which will be discussed in detail later in this article.

Garbage Collection

Many Garbage Collection (GC) algorithms complete in a time that is proportional to the square of objects that exist on the heap. So if we, for example, double the number of objects on the heap, we can expect the GC would take four times longer to complete.

If we, on the other hand, create 64 times more objects, we can expect to suffer an agonizing 1,024 fold increase in expected GC time. This effectively prevents us from ever being able to create really large HashMap objects.

With ChronicleMap we could just put new associations without any concern of garbage collection times.

Serializer

The mediator between heap and off-heap memory is often called a serializer. ChronicleMap comes with a number of pre-configured serializers for most built-in Java types such as Integer, Long, String and many more.

In the example above, we used a custom serializer that was used to convert a Point back and forth between heap and off-heap memory. The serializer class looks like this:

public final class PointSerializer implements
    SizedReader<Point>,
    SizedWriter<Point> {

    private static PointSerializer INSTANCE = new PointSerializer();

    public static PointSerializer getInstance() { return INSTANCE; }

    private PointSerializer() {}

    @Override
    public long size(@NotNull Point toWrite) {
        return Integer.BYTES * 2;
    }

    @Override
    public void write(Bytes out, long size, @NotNull Point point) {
        out.writeInt(point.getX());
        out.writeInt(point.getY());
    }

    @NotNull
    @Override
    public Point read(Bytes in, long size, @Nullable Point using) {
        if (using == null) {
            using = new Point();
        }
        using.setX(in.readInt());
        using.setY(in.readInt());
        return using;
    }

}

The serializer above is implemented as a stateless singleton and the actual serialization in the methods write() and read() are fairly straight forward. The only tricky part is that we need to have a null check in the read() method if the “using” variable does not reference an instantiated/reused object.

How to Install it?

When we want to use ChronicleMap in our project, we just add the following Maven dependency in our pom.xml file and we have access to the library.

<dependency>
    <groupId>net.openhft</groupId>
    <artifactId>chronicle-map</artifactId>
    <version>3.17.3</version>
</dependency>

If you are using another build tool, for example, Gradle, you can see how to depend on ChronicleMap by clicking this link.

The Short Story

Here are some properties of ChronicleMap:

Stores data off-heap
Is almost always more memory efficient than a HashMap
Implements ConcurrentMap
Does not affect garbage collection times
Sometimes needs a serializer
Has a fixed max entry size
Can hold billions of associations
Is free and open-source