A HashMap with millions of objects will quickly lead to problems such as inefficient memory usage, low performance and garbage collection problems. Learn how to use off-heap ChronicleMap, which can contain billions of objects with little or no heap impact.

The built-in Map implementations, such as HashMap and ConcurrentHashMap, are excellent tools when we want to work with small to medium-sized data sets. However, as the amount of data grows, these Map implementations deteriorate and start to exhibit a number of unpleasant drawbacks, as shown in this first article in a series about the open-source ChronicleMap.

Heap Allocation
In the examples below, we will use Point objects. Point is a POJO with a public default constructor and getters and setters for its X and Y properties (both int).
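The article does not list the Point class itself. A minimal sketch that matches the description above could look as follows. The field names and the package-private class modifier are assumptions made for illustration, not the author's actual source.

```java
// Hypothetical reconstruction of the Point POJO described in the text.
// Declared package-private so the sketch compiles in any source file;
// in a real project it would typically be public in its own Point.java.
class Point {

    private int x;
    private int y;

    // Public default constructor, as stated in the description
    public Point() {}

    public int getX() { return x; }

    public void setX(int x) { this.x = x; }

    public int getY() { return y; }

    public void setY(int y) { this.y = y; }
}
```

Because the object is mutable and has a no-argument constructor, an existing instance can be refilled with new values instead of allocating a fresh one for every read.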
The following snippet adds a million Point objects to a HashMap:

// Assumes: import static java.util.stream.Collectors.toMap;
final Map<Long, Point> m = LongStream.range(0, 1_000_000)
    .boxed()
    .collect(
        toMap(
            Function.identity(),
            FillMaps::pointFrom,
            (u, v) -> { throw new IllegalStateException(); },
            HashMap::new
        )
    );

// Convenience method that creates a Point from
// a long by applying modulo prime number operations
private static Point pointFrom(long seed) {
    final Point point = new Point();
    // Take the remainder first, then narrow the result to int
    point.setX((int) (seed % 4517));
    point.setY((int) (seed % 5011));
    return point;
}
We can easily see the number of objects allocated on the heap and how much heap memory these objects consume:
Pers-MacBook-Pro:chronicle-test pemi$ jmap -histo 34366 | head
 num     #instances         #bytes  class name (module)
-------------------------------------------------------
   1:       1002429       32077728  java.util.HashMap$Node (java.base@10)
   2:       1000128       24003072  java.lang.Long (java.base@10)
   3:       1000000       24000000  com.speedment.chronicle.test.map.Point
   4:           454        8434256  [Ljava.util.HashMap$Node; (java.base@10)
   5:          3427         870104  [B (java.base@10)
   6:           185         746312  [I (java.base@10)
   7:           839         102696  java.lang.Class (java.base@10)
   8:          1164          89088  [Ljava.lang.Object; (java.base@10)

For each Map entry, a Long, a HashMap$Node and a Point object need to be created on the heap. There are also a number of arrays with HashMap$Node objects created. In total, these objects and arrays consume 88,515,056 bytes of heap memory. Thus, each entry consumes on average 88.5 bytes.

NB: The extra 2429 HashMap$Node objects come from other HashMap objects used internally by Java.

Off-Heap Allocation
In contrast, a ChronicleMap uses very little heap memory, as can be observed when running the following code:

final Map<Long, Point> m2 = LongStream.range(0, 1_000_000)
    .boxed()
    .collect(
        toMap(
            Function.identity(),
            FillMaps::pointFrom,
            (u, v) -> { throw new IllegalStateException(); },
            () -> ChronicleMap
                .of(Long.class, Point.class)
                .averageValueSize(8) // two ints, as written by PointSerializer
                .valueMarshaller(PointSerializer.getInstance())
                .entries(1_000_000)
                .create()
        )
    );
Pers-MacBook-Pro:chronicle-test pemi$ jmap -histo 34413 | head
 num     #instances         #bytes  class name (module)
-------------------------------------------------------
   1:          6537        1017768  [B (java.base@10)
   2:           448         563936  [I (java.base@10)
   3:          1899         227480  java.lang.Class (java.base@10)
   4:          6294         151056  java.lang.String (java.base@10)
   5:          2456         145992  [Ljava.lang.Object; (java.base@10)
   6:          3351         107232  java.util.concurrent.ConcurrentHashMap$Node (java.base@10)
   7:          2537          81184  java.util.HashMap$Node (java.base@10)
   8:           512          49360  [Ljava.util.HashMap$Node; (java.base@10)

As can be seen, there are no Java heap objects allocated for the ChronicleMap entries and consequently no heap memory either.

Instead of allocating heap memory, ChronicleMap allocates its memory off-heap. Provided that we start our JVM with the flag -XX:NativeMemoryTracking=summary, we can retrieve the amount of off-heap memory being used by issuing the following command:

Pers-MacBook-Pro:chronicle-test pemi$ jcmd 34413 VM.native_memory | grep Internal
-                  Internal (reserved=30229KB, committed=30229KB)

Apparently, our one million objects were laid out in off-heap memory using a little more than 30 MB of off-heap RAM. This means that each entry in the ChronicleMap used above needs on average about 30 bytes.

This is much more memory efficient than a HashMap, which required 88.5 bytes per entry. In fact, we saved 66% of RAM (1 - 30/88.5 ≈ 0.66) and almost 100% of heap memory. The latter is important because the Java Garbage Collector only sees objects that are on the heap.

Note that we have to decide upon creation how many entries the ChronicleMap can hold at maximum. This is different from HashMap, which can grow dynamically as we add new associations. We also have to provide a serializer (i.e. PointSerializer.getInstance()), which will be discussed in detail later in this article.

Garbage Collection
Many Garbage Collection (GC) algorithms complete in a time that is proportional to the square of the number of objects that exist on the heap. So if we, for example, double the number of objects on the heap, we can expect that the GC would take four times longer to complete.

If we, on the other hand, create 64 times more objects, we can expect to suffer an agonizing 4,096-fold (64 × 64) increase in expected GC time. This effectively prevents us from ever being able to create really large HashMap objects.

With ChronicleMap we could just put new associations without any concern for garbage collection times.

Serializer
The mediator between heap and off-heap memory is often called a serializer. ChronicleMap comes with a number of pre-configured serializers for most built-in Java types such as Integer, Long, String and many more.

In the example above, we used a custom serializer to convert a Point back and forth between heap and off-heap memory. The serializer class looks like this:

public final class PointSerializer implements
    SizedReader<Point>,
    SizedWriter<Point> {

    private static final PointSerializer INSTANCE = new PointSerializer();

    public static PointSerializer getInstance() { return INSTANCE; }

    private PointSerializer() {}

    // Every Point is written as exactly two ints
    @Override
    public long size(@NotNull Point toWrite) {
        return Integer.BYTES * 2;
    }

    @Override
    public void write(Bytes out, long size, @NotNull Point point) {
        out.writeInt(point.getX());
        out.writeInt(point.getY());
    }

    @NotNull
    @Override
    public Point read(Bytes in, long size, @Nullable Point using) {
        // Reuse the supplied instance if there is one
        if (using == null) {
            using = new Point();
        }
        using.setX(in.readInt());
        using.setY(in.readInt());
        return using;
    }
}

The serializer above is implemented as a stateless singleton, and the actual serialization in the methods write() and read() is fairly straightforward. The only tricky part is that we need a null check in the read() method in case the “using” variable does not reference an instantiated/reused object.

How to Install it?
When we want to use ChronicleMap in our project, we just add the following Maven dependency to our pom.xml file and we have access to the library.

<dependency>
    <groupId>net.openhft</groupId>
    <artifactId>chronicle-map</artifactId>
    <version>3.17.3</version>
</dependency>

If you are using another build tool, for example, Gradle, you can see how to depend on ChronicleMap by clicking this link.

The Short Story
Here are some properties of ChronicleMap:

Stores data off-heap
Is almost always more memory efficient than a HashMap
Implements ConcurrentMap
Does not affect garbage collection times
Sometimes needs a serializer
Has a maximum number of entries that is fixed at creation
Can hold billions of associations
Is free and open-source
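To make a few of the properties in the list above more concrete without depending on ChronicleMap, here is a small JDK-only sketch. It stores fixed-size (x, y) records in a direct ByteBuffer, which typically resides outside the garbage-collected heap. It has a capacity that is fixed at creation and uses a read method that reuses a supplied instance, much like the "using" parameter of the serializer. This is only an illustration of the general idea and not how ChronicleMap is implemented; the class OffHeapPoints, its nested Point stand-in and all method names are hypothetical.

```java
import java.nio.ByteBuffer;

// Illustrative only: NOT ChronicleMap's implementation. All names are hypothetical.
final class OffHeapPoints {

    // Minimal stand-in for the article's Point POJO
    static final class Point {
        int x;
        int y;
    }

    private static final int RECORD_BYTES = Integer.BYTES * 2; // x and y

    private final ByteBuffer buffer;
    private final int capacity;

    OffHeapPoints(int capacity) {
        this.capacity = capacity;
        // A direct buffer is typically allocated in native memory outside the
        // Java heap; its size is decided once, when the buffer is created.
        this.buffer = ByteBuffer.allocateDirect(Math.multiplyExact(capacity, RECORD_BYTES));
    }

    // "Serializes" a point by copying its two ints into the given slot
    void write(int index, Point point) {
        final int offset = offsetOf(index);
        buffer.putInt(offset, point.x);
        buffer.putInt(offset + Integer.BYTES, point.y);
    }

    // "Deserializes" a slot, reusing the supplied instance when there is one
    Point read(int index, Point using) {
        final Point result = (using != null) ? using : new Point();
        final int offset = offsetOf(index);
        result.x = buffer.getInt(offset);
        result.y = buffer.getInt(offset + Integer.BYTES);
        return result;
    }

    private int offsetOf(int index) {
        if (index < 0 || index >= capacity) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        return index * RECORD_BYTES;
    }
}
```

Reading many slots through one reused Point instance creates no new heap objects per read, which is the same reason the serializer accepts a reusable instance.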
Great article. Do you have any tips for handling large CSV files in a memory-efficient way?

Thanks! One way of handling large CSV files is to map the files to memory using MappedByteBuffer or using Chronicle Bytes as described in one of my previous articles: https://dzone.com/articles/java-chronicle-bytes-kicking-the-tires