Monday, January 9, 2023

Java 20: Colossal Sparse Memory Segments

Did you know you can allocate memory segments that are larger than the physical size of your machine’s RAM and indeed larger than the size of your entire file system? Read this article and learn how to make use of mapped memory segments that may or may not be “sparse” and how to allocate 64 terabytes of sparse data on a laptop.

Mapped Memory

Mapped memory is virtual memory that has been assigned a one-to-one mapping to a portion of a file. The term “file” is quite broad here and may be represented by a regular file, a device, shared memory or any other thing that the operating system may refer to via a file descriptor.

Accessing files via mapped memory is often much faster than accessing a file via the standard file operations like read and write. Because mapped memory is operated on directly, some interesting solutions can also be constructed via atomic memory operations such as compare-and-set operations, allowing very efficient inter-thread and inter-process communication channels. 
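As a rough illustration of the compare-and-set idea, here is a minimal sketch. It assumes the long-standing ByteBuffer-based mapping API together with a VarHandle view, rather than the preview MemorySegment API, so that it runs without preview flags. The class name MappedCas, the method casFirstLong and the temporary file are made up for the example.

```java
import java.io.IOException;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

import static java.nio.channels.FileChannel.MapMode.READ_WRITE;
import static java.nio.file.StandardOpenOption.CREATE;
import static java.nio.file.StandardOpenOption.READ;
import static java.nio.file.StandardOpenOption.WRITE;

class MappedCas {

    // Views a ByteBuffer as longs; atomic access requires an aligned byte offset.
    private static final VarHandle LONG =
            MethodHandles.byteBufferViewVarHandle(long[].class, ByteOrder.nativeOrder());

    /** Atomically replaces the long at byte offset 0 of the file if it equals "expected". */
    static boolean casFirstLong(Path file, long expected, long update) throws IOException {
        try (FileChannel fc = FileChannel.open(file, CREATE, READ, WRITE)) {
            // Mapping READ_WRITE grows an empty file to 8 zero-filled bytes.
            ByteBuffer mapped = fc.map(READ_WRITE, 0, Long.BYTES);
            return (boolean) LONG.compareAndSet(mapped, 0, expected, update);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("cas", ".bin"); // hypothetical scratch file
        file.toFile().deleteOnExit();
        System.out.println(casFirstLong(file, 0L, 42L)); // slot holds 0, so the swap succeeds
        System.out.println(casFirstLong(file, 0L, 7L));  // slot now holds 42, so the swap fails
    }
}
```

Two processes mapping the same file could use such a slot as a simple lock or sequence counter, provided the platform keeps shared mappings coherent.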

Because not all parts of the mapped virtual memory must reside in real memory at the same time, a mapped memory segment can be much larger than the physical RAM of the machine it runs on. If a portion of the mapped memory is not resident when accessed, the operating system temporarily suspends the current thread and loads the missing page, after which execution resumes.

Mapped files have other advantages too: they can be shared across processes running in different JVMs, and the files remain persistent and can be inspected using any file tool such as hexdump.

Setting up a Mapped Memory Segment

The Foreign Function & Memory API, which previews for the second time in Java 20 (JEP 434), allows large memory segments to be mapped to a file. Here is how you can create a memory segment of size 4 GiB backed by a file.

// Static imports assumed: java.nio.file.StandardOpenOption.* and
// java.nio.channels.FileChannel.MapMode.READ_WRITE
Set<OpenOption> opts = Set.of(CREATE, READ, WRITE);

try (FileChannel fc = FileChannel.open(Path.of("myFile"), opts);
     Arena arena = Arena.openConfined()) {
    MemorySegment mapped =
            fc.map(READ_WRITE, 0, 1L << 32, arena.scope());
    // Use "mapped" here
} // Resources allocated by "mapped" are released here via TwR

Sparse Files

A sparse file is a file where information can be stored in an efficient way if not all portions of the file are actually used. A file with large unused “holes” is an example of such a file whereby only the used sections are actually stored in the underlying physical file. In reality, however, the unused holes also consume some resources albeit much less than their used counterparts.

Figure 1 illustrates a logical sparse file where only actual data elements are stored in the physical file.

As long as the sparse file is not filled with too much data, it is possible to allocate a sparse file that is much larger than the available physical disk space. For example, it is possible to allocate an empty 10 TB memory segment backed by a sparse file on a filesystem with very little available capacity. 

It should be noted that not all platforms support sparse files.
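To make the notion of a "hole" concrete, here is a small sketch that writes a single byte far into a newly created file using ordinary channel operations. The SPARSE option is only a hint that is honored together with CREATE_NEW and ignored where the file system lacks sparse-file support. The class name SparseFileDemo, the method name and the offsets are assumptions made for the example.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

import static java.nio.file.StandardOpenOption.CREATE_NEW;
import static java.nio.file.StandardOpenOption.SPARSE;
import static java.nio.file.StandardOpenOption.WRITE;

class SparseFileDemo {

    /** Creates a new file, writes one byte at "offset" and returns the logical file size. */
    static long writeOneByteAt(Path file, long offset) throws IOException {
        try (FileChannel fc = FileChannel.open(file, CREATE_NEW, SPARSE, WRITE)) {
            // A positional write beyond the end of the file leaves a hole before "offset".
            fc.write(ByteBuffer.wrap(new byte[] {1}), offset);
        }
        return Files.size(file);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("sparse-demo"); // hypothetical scratch directory
        Path file = dir.resolve("sparse.bin");
        System.out.println(writeOneByteAt(file, 1L << 28)); // logical size: (1L << 28) + 1
        Files.delete(file);
        Files.delete(dir);
    }
}
```

Files.size reports the logical size only. How much physical space the file occupies has to be checked with a platform tool such as du.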

Setting up a Sparsely Mapped Memory Segment 

Here is an example of how to create a file and access its contents via a memory-mapped MemorySegment whereby the contents are sparse, meaning that the real underlying data in the file is expanded automatically as needed:

Set<OpenOption> sparse = Set.of(CREATE_NEW, SPARSE, READ, WRITE);

try (var fc = FileChannel.open(Path.of("sparse"), sparse);
     var arena = Arena.openConfined()) {
    MemorySegment mapped =
            fc.map(READ_WRITE, 0, 1L << 32, arena.scope());
    // Use "mapped" here
} // Resources allocated by "mapped" are released here via TwR

Note: The file will appear to consist of 4 GiB of data but in reality the file does not use any (apparent) file-system space at all:

pminborg@pminborg-mac ntive % ll sparse
-rw-r--r--  1 pminborg  staff  4294967296 Nov 14 16:12 sparse
pminborg@pminborg-mac ntive % du -h sparse
  0B sparse

Going Colossal

The implementation of sparse files varies across the many platforms that are supported by Java and consequently, various sparse-file properties will vary depending on where an application is deployed.

I am using a Mac M1 under macOS Monterey (12.6.1) with 32 GiB RAM and 1 TiB storage (of which 900 GiB are available).

I was able to map a single sparse file of up to 64 TiB using a single mapped memory segment on my machine (using its standard settings):

  4 GiB -> ok as demonstrated above

  1 TiB -> ok

 32 TiB -> ok

 64 TiB -> ok

128 TiB -> failed with OutOfMemoryError

It is possible to increase the amount of mappable memory but this is out of scope for this article. In real applications, it is better to have smaller portions of a sparse file mapped into memory rather than mapping the entire sparse file in one chunk. These smaller mappings will then act as “windows” into the larger underlying file.
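The windowing idea can be sketched roughly as follows. As an assumption, the sketch uses the long-standing ByteBuffer-based FileChannel.map(mode, position, size) overload rather than the preview MemorySegment overload, so it runs without preview flags. The class name WindowedMapping, the helper names and the 4 KiB window size are invented for the example.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

import static java.nio.channels.FileChannel.MapMode.READ_WRITE;
import static java.nio.file.StandardOpenOption.CREATE_NEW;
import static java.nio.file.StandardOpenOption.READ;
import static java.nio.file.StandardOpenOption.SPARSE;
import static java.nio.file.StandardOpenOption.WRITE;

class WindowedMapping {

    static final int WINDOW_SIZE = 4096; // hypothetical window size

    /** Maps WINDOW_SIZE bytes starting at "offset": a small window into a larger file. */
    static MappedByteBuffer window(FileChannel fc, long offset) throws IOException {
        return fc.map(READ_WRITE, offset, WINDOW_SIZE);
    }

    /** Writes an int through a window and reads it back with ordinary positional reads. */
    static int roundTrip(Path file, long offset, int value) throws IOException {
        try (FileChannel fc = FileChannel.open(file, CREATE_NEW, SPARSE, READ, WRITE)) {
            MappedByteBuffer win = window(fc, offset);
            win.putInt(0, value); // index 0 of the window is byte "offset" of the file
            win.force();          // flush the window's changes to the file
            ByteBuffer check = ByteBuffer.allocate(Integer.BYTES);
            while (check.hasRemaining()) {
                if (fc.read(check, offset + check.position()) < 0) {
                    break; // unexpected end of file
                }
            }
            return check.getInt(0);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("window-demo"); // hypothetical scratch directory
        Path file = dir.resolve("sparse.bin");
        System.out.println(roundTrip(file, 1L << 26, 12345)); // prints 12345
        System.out.println(Files.size(file));                 // (1L << 26) + 4096
    }
}
```

Only the window is ever mapped, so the address-space cost stays at one page regardless of how large the logical file grows.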

 Anyhow, this looks pretty colossal:

-rw-r--r--   1 pminborg  staff  70368744177664 Nov 22 13:34 sparse

Creating the empty 64 TiB sparse file took about 200 ms on my machine.
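The byte counts in the listings map directly onto shift expressions like the 1L << 32 used in the code above. The following sketch merely spells out that arithmetic; the class name Sizes and its helper are made up for the example.

```java
class Sizes {

    static final long GIB = 1L << 30; // bytes in one gibibyte
    static final long TIB = 1L << 40; // bytes in one tebibyte

    /** Returns the number of bytes in "n" TiB, failing loudly on overflow. */
    static long tib(long n) {
        return Math.multiplyExact(n, TIB);
    }

    public static void main(String[] args) {
        System.out.println(4 * GIB);  // 4294967296, the size of the 4 GiB file
        System.out.println(tib(64));  // 70368744177664, the size of the 64 TiB file
        System.out.println(1L << 46); // the same 64 TiB value as a single shift
    }
}
```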

Unrelated Observations on Thread Confinement

As can be seen above, it is possible to access the same underlying physical memory from different threads (and indeed even from different processes) via file mapping, even though that memory is viewed through several distinct thread-confined MemorySegment instances.
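The following sketch hints at how this plays out. Note the assumptions: it uses ByteBuffer-based mappings, which are not thread-confined, as a stand-in for confined MemorySegment instances so that it runs without preview flags, and it relies on the platform keeping shared mappings of the same file coherent. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

import static java.nio.channels.FileChannel.MapMode.READ_ONLY;
import static java.nio.channels.FileChannel.MapMode.READ_WRITE;
import static java.nio.file.StandardOpenOption.CREATE;
import static java.nio.file.StandardOpenOption.READ;
import static java.nio.file.StandardOpenOption.WRITE;

class SharedMappingDemo {

    /** Writes "value" via one thread's mapping and reads it via another thread's mapping. */
    static long writeThenRead(Path file, long value) throws IOException, InterruptedException {
        Thread writer = new Thread(() -> {
            // The writer thread creates and uses its own mapping of the file.
            try (FileChannel fc = FileChannel.open(file, CREATE, READ, WRITE)) {
                fc.map(READ_WRITE, 0, Long.BYTES).putLong(0, value);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        writer.start();
        writer.join(); // wait until the write has completed

        // The calling thread views the same bytes through a separate mapping.
        try (FileChannel fc = FileChannel.open(file, READ)) {
            return fc.map(READ_ONLY, 0, Long.BYTES).getLong(0);
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("shared", ".bin"); // hypothetical scratch file
        file.toFile().deleteOnExit();
        System.out.println(writeThenRead(file, 2023L)); // prints 2023 where mappings are coherent
    }
}
```

With the preview API, each thread would presumably open its own confined arena and map the file into its own segment in the same fashion.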

What’s Next?

Try out mapped segments today by downloading a JDK 20 Early-Access Build. Do not forget to pass the --enable-preview flag, both when compiling and when running, or your code will not run.
