Minborg

Minborg
Minborg

Friday, July 26, 2019

Java: ChronicleMap Part 1, Go Off-Heap

Filling up a HashMap with millions of objects will quickly lead to problems such as inefficient memory usage, low performance and garbage collection problems. Learn how to use off-heap CronicleMap that can contain billions of objects with little or no heap impact.

The built-in Map implementations, such as HashMap and ConcurrentHashMap are excellent tools when we want to work with small to medium-sized data sets. However, as the amount of data grows, these Map implementations are deteriorating and start to exhibit a number of unpleasant drawbacks as shown in this first article in an article series about open-sourceed  CronicleMap.

Heap Allocation

In the examples below, we will use Point objects. Point is a POJO with a public default constructor and getters and setters for X and Y properties (int). The following snippet adds a million Point objects to a HashMap:

final Map<Long, Point> m = LongStream.range(0, 1_000_000)
    .boxed()
    .collect(
        toMap(
            Function.identity(),
            FillMaps::pointFrom,
            (u,v) -> { throw new IllegalStateException(); },
             HashMap::new
        )
    );

    // Conveniency method that creates a Point from
    // a long by applying modulo prime number operations
    private static Point pointFrom(long seed) {
        final Point point = new Point();
        point.setX((int) seed % 4517);
        point.setY((int) seed % 5011);
        return point;
    }

We can easily see the number of objects allocated on the heap and how much heap memory these objects consume:

Pers-MacBook-Pro:chronicle-test pemi$ jmap -histo 34366 | head
 num     #instances         #bytes  class name (module)
-------------------------------------------------------
   1:       1002429       32077728  java.util.HashMap$Node (java.base@10)
   2:       1000128       24003072  java.lang.Long (java.base@10)
   3:       1000000       24000000  com.speedment.chronicle.test.map.Point
   4:           454        8434256  [Ljava.util.HashMap$Node; (java.base@10)
   5:          3427         870104  [B (java.base@10)
   6:           185         746312  [I (java.base@10)
   7:           839         102696  java.lang.Class (java.base@10)
   8:          1164          89088  [Ljava.lang.Object; (java.base@10)
For each Map entry, a Long, a HashMap$Node and a Point object need to be created on the heap. There are also a number of arrays with HashMap$Node objects created. In total, these objects and arrays consume 88,515,056 bytes of heap memory. Thus, each entry consumes on average 88.5 bytes.

NB: The extra 2429 HashMap$Node objects come from other HashMap objects used internally by Java.

Off-Heap Allocation

Contrary to this, a CronicleMap uses very little heap memory as can be observed when running the following code:

final Map<Long, Point> m2 = LongStream.range(0, 1_000_000)
    .boxed()
    .collect(
        toMap(
            Function.identity(),
            FillMaps::pointFrom,
            (u,v) -> { throw new IllegalStateException(); },
            () -> ChronicleMap
                .of(Long.class, Point.class)
                .averageValueSize(8)
                .valueMarshaller(PointSerializer.getInstance())
                .entries(1_000_000)
                .create()
        )
    );
Pers-MacBook-Pro:chronicle-test pemi$ jmap -histo 34413 | head
 num     #instances         #bytes  class name (module)
-------------------------------------------------------
   1:          6537        1017768  [B (java.base@10)
   2:           448         563936  [I (java.base@10)
   3:          1899         227480  java.lang.Class (java.base@10)
   4:          6294         151056  java.lang.String (java.base@10)
   5:          2456         145992  [Ljava.lang.Object; (java.base@10)
   6:          3351         107232  java.util.concurrent.ConcurrentHashMap$Node (java.base@10)
   7:          2537          81184  java.util.HashMap$Node (java.base@10)
   8:           512          49360  [Ljava.util.HashMap$Node; (java.base@10)
As can be seen, there are no Java heap objects allocated for the CronicleMap entries and consequently no heap memory either.

Instead of allocating heap memory, CronicleMap allocates its memory off-heap. Provided that we start our JVM with the flag -XX:NativeMemoryTracking=summary, we can retrieve the amount off-heap memory being used by issuing the following command:

Pers-MacBook-Pro:chronicle-test pemi$ jcmd 34413 VM.native_memory | grep Internal
-                  Internal (reserved=30229KB, committed=30229KB)
Apparently, our one million objects were laid out in off-heap memory using a little more than 30 MB of off-heap RAM. This means that each entry in the CronicleMap used above needs on average 30 bytes.

This is much more memory effective than a HashMap that required 88.5 bytes. In fact, we saved 66% of RAM memory and almost 100% of heap memory. The latter is important because the Java Garbage Collector only sees objects that are on the heap.

Note that we have to decide upon creation how many entries the CronicleMap can hold at maximum. This is different compared to HashMap which can grow dynamically as we add new associations. We also have to provide a serializer (i.e. PointSerializer.getInstance()), which will be discussed in detail later in this article.

Garbage Collection

Many Garbage Collection (GC) algorithms complete in a time that is proportional to the square of objects that exist on the heap. So if we, for example, double the number of objects on the heap, we can expect the GC would take four times longer to complete.

If we, on the other hand, create 64 times more objects, we can expect to suffer an agonizing 1,024 fold increase in expected GC time. This effectively prevents us from ever being able to create really large HashMap objects.

With ChronicleMap we could just put new associations without any concern of garbage collection times.

Serializer

The mediator between heap and off-heap memory is often called a serializer. ChronicleMap comes with a number of pre-configured serializers for most built-in Java types such as Integer, Long, String and many more.

In the example above, we used a custom serializer that was used to convert a Point back and forth between heap and off-heap memory. The serializer class looks like this:

public final class PointSerializer implements
    SizedReader<Point>,
    SizedWriter<Point> {

    private static PointSerializer INSTANCE = new PointSerializer();

    public static PointSerializer getInstance() { return INSTANCE; }

    private PointSerializer() {}

    @Override
    public long size(@NotNull Point toWrite) {
        return Integer.BYTES * 2;
    }

    @Override
    public void write(Bytes out, long size, @NotNull Point point) {
        out.writeInt(point.getX());
        out.writeInt(point.getY());
    }

    @NotNull
    @Override
    public Point read(Bytes in, long size, @Nullable Point using) {
        if (using == null) {
            using = new Point();
        }
        using.setX(in.readInt());
        using.setY(in.readInt());
        return using;
    }

}
The serializer above is implemented as a stateless singleton and the actual serialization in the methods write() and read() are fairly straight forward. The only tricky part is that we need to have a null check in the read() method if the “using” variable does not reference an instantiated/reused object.

How to Install it?

When we want to use ChronicleMap in our project, we just add the following Maven dependency in our pom.xml file and we have access to the library.

<dependency>
    <groupId>net.openhft</groupId>
    <artifactId>chronicle-map</artifactId>
    <version>3.17.3</version>
</dependency>
If you are using another build tool, for example, Gradle, you can see how to depend on ChronicleMap by clicking this link.

The Short Story

Here are some properties of ChronicleMap:

Stores data off-heap
Is almost always more memory efficient than a HashMap
Implements ConcurrentMap
Does not affect garbage collection times
Sometimes needs a serializer
Has a fixed max entry size
Can hold billions of associations
Is free and open-source

2 comments:

  1. Great article. Do you have any tips for handling large csv files in memory efficient way?

    ReplyDelete
    Replies
    1. Thanks!
      One way of handling large CVS files is to map the files to memory using MappedByteBuffer or using Chronicle Bytes as described in one of my previous articles: https://dzone.com/articles/java-chronicle-bytes-kicking-the-tires

      Delete