Tuesday, July 30, 2019

Java: ChronicleMap Part 2, Super RAM Maps

The standard Java Maps, such as the ubiquitous HashMap, are ultimately limited by the available RAM. Read this article and learn how you can create Java Maps of virtually unlimited size, even exceeding the target machine’s RAM size.

The built-in Map implementations, such as HashMap and ConcurrentHashMap, work fine as long as they are relatively small. In all cases, they are limited by the available heap and therefore, ultimately, by the available RAM. ChronicleMap can store its contents in files, thereby circumventing this limitation and opening the door to terabyte-sized maps, as shown in this second article in a series about ChronicleMap.

Read more about the fundamentals of ChronicleMap in my previous article.

File Mapping

A file mapping is created by invoking the createPersistedTo() method on a ChronicleMap builder, as shown in the method below:
private static Map<Long, Point> createFileMapped() {
    try {
        return ChronicleMap
            .of(Long.class, Point.class)
            .averageValueSize(8)                             // average serialized value size, in bytes
            .valueMarshaller(PointSerializer.getInstance())  // custom Point serializer (not shown here)
            .entries(10_000_000)                             // expected number of entries, used for sizing
            .createPersistedTo(new File("my-map"));          // back the map with the file "my-map"
    } catch (IOException ioe) {
        throw new RuntimeException(ioe);
    }
}

This creates a Map that lays out its contents in a memory-mapped file named “my-map” rather than in direct memory. The following example shows how we can create 10 million Point objects and store them all in a file-mapped map:

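final Map<Long, Point> m3 = LongStream.range(0, 10_000_000)
    .boxed()
    .collect(
        toMap(
            Function.identity(),          // key: the long value itself
            FillMaps::pointFrom,          // value: a Point created from the key
            (u, v) -> {                   // keys are unique, so a merge should never happen
                throw new IllegalStateException();
            },
            FillMaps::createFileMapped    // map factory: the file-backed ChronicleMap
        )
    );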
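The four-argument Collectors.toMap overload used above takes a key mapper, a value mapper, a merge function and a map factory; the collector fills whichever Map the factory returns. The sketch below shows the same shape using only the JDK. It assumes a hypothetical stand-in Point class and pointFrom method (the article's own versions are not shown here) and uses TreeMap::new in place of FillMaps::createFileMapped:

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

final class ToMapSupplierDemo {

    // Hypothetical stand-in for the article's Point class.
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // Hypothetical stand-in for FillMaps::pointFrom.
    static Point pointFrom(long i) {
        return new Point((int) i, (int) (i * 2));
    }

    // Same collector shape as the article: key mapper, value mapper,
    // a merge function that rejects duplicate keys, and a map factory.
    static Map<Long, Point> fill(long count) {
        return LongStream.range(0, count)
            .boxed()
            .collect(Collectors.toMap(
                Function.identity(),
                ToMapSupplierDemo::pointFrom,
                (u, v) -> {
                    throw new IllegalStateException("duplicate key");
                },
                TreeMap::new)); // the factory decides which Map gets filled
    }

    public static void main(String[] args) {
        Map<Long, Point> m = fill(5);
        System.out.println(m.getClass().getSimpleName() + " " + m.keySet());
    }
}
```

Running it prints TreeMap [0, 1, 2, 3, 4]. Swapping the factory for a file-backed ChronicleMap changes where the entries are stored, not how the stream produces them.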
The following command shows the newly created file:

Pers-MacBook-Pro:target pemi$ ls -lart my-map 
-rw-r--r--  1 pemi  staff  330305536 Jul 10 16:56 my-map
As can be seen, the file is about 330 MB and thus each entry occupies about 33 bytes on average (330,305,536 bytes / 10,000,000 entries).

Persistence

When the JVM terminates, the mapped file is still there, making it easy to pick up a previously created map including its contents. In this respect, it works much like a rudimentary, super-fast database. Here is how we can start off from an existing file:

return ChronicleMap
    .of(Long.class, Point.class)
    .averageValueSize(8)
    .valueMarshaller(PointSerializer.getInstance())
    .entries(10_000_000)
    .createOrRecoverPersistedTo(new File("my-map"));

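If the file already exists, createOrRecoverPersistedTo() opens it (recovering it if needed); otherwise, it creates a new one. Either way, the Map is available directly, including any previous contents.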
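The persistence described above is a property of memory-mapped files in general: data written through a mapping ends up in the file and outlives the process. The following JDK-only sketch illustrates that mechanism with FileChannel.map. It does not use ChronicleMap or its file format, and the class and method names are hypothetical. It writes a long through one mapping, then reads it back through a second, independent mapping of the same file, much as a later JVM run would:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MappedPersistenceDemo {

    // Writes a long into the given slot through a read-write mapping.
    // Mapping past the current end of the file grows the file.
    static void writeSlot(Path file, int slot, long value) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            long needed = (long) (slot + 1) * Long.BYTES;
            MappedByteBuffer buf =
                ch.map(FileChannel.MapMode.READ_WRITE, 0, Math.max(needed, ch.size()));
            buf.putLong(slot * Long.BYTES, value);
            buf.force(); // write the modified pages back to the file
        }
    }

    // Reads the slot back through a fresh, read-only mapping of the same file.
    static long readSlot(Path file, int slot) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            return buf.getLong(slot * Long.BYTES);
        }
    }

    // Write, close, reopen, read: the value survives because it lives in the file.
    static long roundTrip(int slot, long value) {
        try {
            Path file = Files.createTempFile("mapped-demo", ".bin");
            file.toFile().deleteOnExit();
            writeSlot(file, slot, value);
            return readSlot(file, slot);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(3, 42L)); // prints 42
    }
}
```

In a real program, the write and the read would happen in separate JVM runs. The force() call asks the operating system to write the modified pages to the file before the channel is closed.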

Java Map Exceeding RAM Limit

One interesting aspect of memory-mapped files is that they can exceed both the heap and RAM limits. The file mapping logic makes sure that the parts currently in use are loaded into RAM on demand. It also retains recently accessed portions of the mapped memory in physical memory to improve performance. All of this happens behind the scenes and need not be managed by the application itself.
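The idea that only the parts in use need to be mapped can also be sketched at the Java level. A single MappedByteBuffer covers at most Integer.MAX_VALUE bytes, so a multi-gigabyte file has to be mapped in pieces. The hypothetical WindowedMappedFile class below is an illustration, not how ChronicleMap is implemented internally. It maps fixed-size windows of a file only when an offset inside them is first accessed. Windows that are never touched are never mapped, and the operating system pages the mapped ones in and out of RAM as needed:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;

// Maps a (possibly huge) file lazily in fixed-size windows.
final class WindowedMappedFile implements AutoCloseable {
    private final FileChannel channel;
    private final int windowSize;
    private final Map<Long, MappedByteBuffer> windows = new HashMap<>();

    WindowedMappedFile(Path file, int windowSize) {
        if (windowSize <= 0 || windowSize % Long.BYTES != 0) {
            throw new IllegalArgumentException("windowSize must be a positive multiple of 8");
        }
        this.windowSize = windowSize;
        try {
            this.channel = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns the window with the given index, mapping it on first use.
    // Mapping beyond the current end of the file grows the file.
    private MappedByteBuffer window(long index) {
        return windows.computeIfAbsent(index, i -> {
            try {
                return channel.map(FileChannel.MapMode.READ_WRITE, i * windowSize, windowSize);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }

    void putLong(long offset, long value) {
        checkAligned(offset);
        window(offset / windowSize).putLong((int) (offset % windowSize), value);
    }

    long getLong(long offset) {
        checkAligned(offset);
        return window(offset / windowSize).getLong((int) (offset % windowSize));
    }

    int mappedWindows() {
        return windows.size();
    }

    long fileSize() {
        try {
            return channel.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // 8-byte alignment guarantees that a long never straddles two windows.
    private static void checkAligned(long offset) {
        if (offset < 0 || offset % Long.BYTES != 0) {
            throw new IllegalArgumentException("offset must be a non-negative multiple of 8");
        }
    }

    @Override
    public void close() {
        try {
            channel.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, with 4,096-byte windows, writing a long at offset 10 * 4096 + 8 maps only window 10 and grows the file to 11 * 4096 = 45,056 bytes. Reading offset 0 afterwards maps window 0; on Linux and macOS, the region created by growing the file reads as zeros.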

My desktop computer is an older MacBook Pro with only 16GB of memory (Yes, I know that sucks). Nevertheless, I can allocate a Map with 1 billion entries, potentially occupying 33 * 1,000,000,000 = 33 GB of memory (recall from above that each entry occupied 33 bytes on average). The code looks like this:

return ChronicleMap
    .of(Long.class, Point.class)
    .averageValueSize(8)
    .valueMarshaller(PointSerializer.getInstance())
    .entries(1_000_000_000)
    .createPersistedTo(new File("huge-map"));

Even though the Java Map is about twice the size of my RAM, the code runs flawlessly and produces this file:

Pers-MacBook-Pro:target pemi$ ls -lart | grep huge-map 
-rw-r--r--   1 pemi  staff  34573651968 Jul 10 18:52 huge-map
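The sizing numbers in this section can be checked with a few lines of arithmetic. The sketch below uses illustrative class and method names. It derives the average entry size from the 10-million-entry file, projects that size to 1 billion entries, and compares the actual huge-map file with the machine's RAM, assuming that the 16GB of memory means 16 × 2^30 bytes:

```java
import java.util.Locale;

final class HugeMapSizing {

    static final long GIB = 1L << 30;

    // Average bytes per entry, from a measured file size and entry count.
    static double bytesPerEntry(long fileBytes, long entries) {
        if (entries <= 0) {
            throw new IllegalArgumentException("entries must be positive");
        }
        return (double) fileBytes / entries;
    }

    // Projected file size for a given number of entries.
    static long projectedBytes(long entries, double bytesPerEntry) {
        return Math.round(entries * bytesPerEntry);
    }

    // How many times larger the file is than the given amount of RAM.
    static double ramRatio(long fileBytes, long ramBytes) {
        return (double) fileBytes / ramBytes;
    }

    public static void main(String[] args) {
        double perEntry = bytesPerEntry(330_305_536L, 10_000_000L); // "my-map"
        long projected = projectedBytes(1_000_000_000L, perEntry);
        long actual = 34_573_651_968L;                              // "huge-map"
        long ram = 16 * GIB;                                        // assumed 16 GiB of RAM

        System.out.printf(Locale.ROOT, "measured: %.2f bytes per entry%n", perEntry);
        System.out.printf(Locale.ROOT, "projected: %.1f GB, actual: %.1f GB%n",
            projected / 1e9, actual / 1e9);
        System.out.printf(Locale.ROOT, "actual file is %.2fx the RAM size%n",
            ramRatio(actual, ram));
    }
}
```

With these inputs, it reports about 33.03 bytes per entry, a projection of about 33.0 GB against an actual size of about 34.6 GB, and a file-to-RAM ratio of about 2.01, which agrees with the "twice the size of my RAM" observation.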

Needless to say, you should make sure that the file you are mapping to is located on a file system with high random-access performance, such as a file system on a local SSD.

Summary

ChronicleMap can be mapped to an external file
The mapped file is retained when the JVM exits
New applications can pick up an existing mapped file
ChronicleMap can hold more data than there is RAM
Mapped files are best placed on file systems with high random access performance
