Thursday, February 16, 2023

Video: Project Panama and the Foreign Function and Memory API


The “Foreign Function and Memory API” (FFM), previewed in Java 20, allows Java programs to interoperate safely with code and data outside the Java runtime. In this fast-paced talk, we will explore what the FFM API has to offer and, via a hands-on live coding example, see how the promises of FFM can be realized in your code today. The live-code example involves integrating and using a native system call directly from Java.




Here is a link to the 15-minute presentation.

The presentation was made at jFocus 2023 in Stockholm.

Monday, February 13, 2023

JDK 21: Image Performance Improvements


Introduction

In a previous article, I talked about how serialization and file I/O performance was improved in JDK 21 thanks to the use of VarHandle constructs. The method employed there has now also been applied to Java’s image-handling library, making it faster. Here is what happened:

Background

When packing/unpacking primitive values (such as int and long primitives) into/from a byte array, conversion was previously made using explicit bit shifting, as shown in the ImageInputStreamImpl::readInt method below:

public int readInt() throws IOException {
    if (read(byteBuf, 0, 4) !=  4) {
        throw new EOFException();
    }

    if (byteOrder == ByteOrder.BIG_ENDIAN) {
        return
            (((byteBuf[0] & 0xff) << 24) | ((byteBuf[1] & 0xff) << 16) |      // (1)
             ((byteBuf[2] & 0xff) <<  8) | ((byteBuf[3] & 0xff) <<  0));
    } else {
        return
            (((byteBuf[3] & 0xff) << 24) | ((byteBuf[2] & 0xff) << 16) |      // (2)
             ((byteBuf[1] & 0xff) <<  8) | ((byteBuf[0] & 0xff) <<  0));
    }
}
  1. Big-endian unpacking via bit shifting

  2. Little-endian unpacking via bit shifting

The scheme used here is similar to what is described in my previous article, so I will not dive into the details again. In short, this method is complex and challenging for Java to fully optimize. It is also hard for us humans to read.

Improvements in JDK 21

In Java 21, conversions are instead made with VarHandle constructs via the new jdk.internal.util.ByteArray class. Here is what parts of the internal ByteArray class look like:

private static final VarHandle INT =
        MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.BIG_ENDIAN);


static int getInt(byte[] b, int off) {
    return (int) INT.get(b, off);
}

Using VarHandles means Java is able to optimize methods better compared to explicit bit shifting. Again, you can read more about how VarHandles work in my previous article.

The class above handles big-endian data. As image handling also needs to support little-endian data, a new class called ByteArrayLittleEndian was added. This means the readInt() method can be simplified and improved like this:

public int readInt() throws IOException {
    if (read(byteBuf, 0, 4) !=  4) {
        throw new EOFException();
    }

    return (byteOrder == ByteOrder.BIG_ENDIAN)
            ? ByteArray.getInt(byteBuf, 0)
            : ByteArrayLittleEndian.getInt(byteBuf, 0);
}

Nice! It looks much cleaner now.

Affected Classes and Impact

The following classes were directly improved:

  • ImageInputStreamImpl

  • ImageOutputStreamImpl

The good news is that these classes provide the foundation for a large number of other image-handling classes in the javax.imageio.stream package and perhaps elsewhere (after all, the above classes are in the public API).

This means, in many cases, image handling becomes faster and all third-party libraries relying on any of the classes above (directly or indirectly) will also run faster with no change in your application code.

Benchmarks

In the benchmarks below, I have used Java 17 as a baseline, meaning that other Java 21 performance improvements will also contribute to higher performance. I have run the benchmarks on my Mac M1 (aarch64), but the benchmarks are available here for anyone to run.
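For readers who want to see the general shape of such a measurement, here is a minimal JMH-style sketch (my own illustration, not the actual JDK benchmark) that exercises ImageInputStreamImpl::readInt via the public MemoryCacheImageInputStream subclass:

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

import javax.imageio.stream.MemoryCacheImageInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;

@State(Scope.Benchmark)
public class ReadIntBenchmark {

    private byte[] data;

    @Setup
    public void setup() {
        data = new byte[4096]; // 1,024 int values
    }

    @Benchmark
    public int readInts() throws IOException {
        var in = new MemoryCacheImageInputStream(new ByteArrayInputStream(data));
        int sum = 0;
        for (int i = 0; i < data.length / Integer.BYTES; i++) {
            sum += in.readInt(); // exercises ImageInputStreamImpl::readInt
        }
        return sum;
    }
}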

Graph 1

Graph 1 shows the performance of the ImageInputStreamImpl::readInt method for Java 17 and Java 21.

So, the throughput of the benchmarked method has improved from about 579,800,000 bytes/s to around 639,000,000 bytes/s on my machine, which is more than a 10% improvement. Not too bad!

Actual Application Performance Increase

How much faster will your image applications run under Java 21 in reality? There is only one way to find out: Run your own code on JDK 21 today by downloading a JDK 21 Early-Access Build.

Thursday, January 26, 2023

Java 21: Performance Improvements Revealed

 In Java 21, old code might run significantly faster due to recent internal performance optimizations made in the Java Core Libraries. In this article, we will take a closer look at some of these changes and see how much faster your favorite programming language has become. Buckle up, for we are about to run at full speed!

Background

When converting primitive values such as int and long values back and forth to certain external representations, such as a file, the internal class java.io.Bits is used. In previous Java versions, conversion in this class was made using explicit bit shifting as shown hereunder:

static long getLong(byte[] b, int off) {
    return ((b[off + 7] & 0xFFL)      ) +
           ((b[off + 6] & 0xFFL) <<  8) +
           ((b[off + 5] & 0xFFL) << 16) +
           ((b[off + 4] & 0xFFL) << 24) +
           ((b[off + 3] & 0xFFL) << 32) +
           ((b[off + 2] & 0xFFL) << 40) +
           ((b[off + 1] & 0xFFL) << 48) +
           (((long) b[off])      << 56);
}

When taking a closer look, it can be seen that the code extracts a long from a backing byte array by successively extracting byte values, left-shifting them various steps, and then summing them together.

As the lowest-index byte is the most significant (i.e. it is shifted to the left the most), extraction is made in big-endian order (also called “network order”). There are eight similar steps in the algorithm, where each step is on a separate line, and each step comprises six sub-operations:

  1. Add a constant to the provided off parameter
  2. Extract a byte value at an index from the provided b array including checking index bounds
  3. Convert the byte value to a long (as an AND operation with another long on the LHS is imminent)
  4. Perform an AND operation with the long value 0xFF
  5. Shift the result to the left a number of steps
  6. Accumulate the resulting value (via the + operation)

Hence, there are eight times six operations in total (= 48 operations) that need to be performed. In reality, Java is able to optimize these operations slightly, for example by leveraging CPU instructions that can perform several operations in a single step.

Calling getLong() from an outer loop entails checking index bounds many times, as it is difficult to hoist the boundary checks out of the outer loop due to the method’s complexity.
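To see why, consider a hypothetical caller like the one below; with the bit-shifting implementation, the bounds checks buried inside getLong() are hard for the JIT to hoist out of the loop:

static long sumLongs(byte[] b, int n) {
    long sum = 0;
    for (int i = 0; i < n; i++) {
        sum += getLong(b, i * Long.BYTES); // bounds re-checked on every call
    }
    return sum;
}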

Improvements in Java 21

In Java 21, conversions are made with VarHandle constructs instead and the class java.io.Bits was moved and renamed to jdk.internal.util.ByteArray so that other classes from various packages could benefit from it too. Here is what the ByteArray::getLong method looks like in Java 21:

private static final VarHandle LONG =
        MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.BIG_ENDIAN);


static long getLong(byte[] b, int off) {
    return (long) LONG.get(b, off);
}

Here, it looks like only one operation is performed. However, in reality, there are several things going on under the covers of the VarHandle::get operation. On platforms using little-endian byte order (which is almost 100% of the user base), the byte order needs to be swapped. Also, index bounds must be checked.

The cast (long) is needed in order to prevent auto-boxing/un-boxing for the return value of the LONG VarHandle. The inner workings of VarHandle objects and their coordinates are otherwise beyond the scope of this article.

As VarHandles are first-class citizens of the Java language, significant effort has been put into making them efficient. One can only assume the byte-swapping operations are optimized for the platform at hand. Also, the array bounds checking can be hoisted outside the many sub-steps so only one check is needed.

In addition to internal boundary-check hoisting, the VarHandle construct makes it easier for Java to further hoist boundary checks out of an outer loop, compared to the older, more complex implementation used before Java 21.

Almost all methods in Bits/ByteArray were rewritten, not only getLong(). So, both reading and writing short, int, float, long, and double values are now much faster.
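As a sketch of the pattern (my illustration; the actual internal class may differ in detail), the writer and the floating-point reader can be built on the very same LONG handle shown above:

static void setLong(byte[] b, int off, long val) {
    LONG.set(b, off, val);
}

static double getDouble(byte[] b, int off) {
    // Reuse the long view and convert the raw bits
    return Double.longBitsToDouble((long) LONG.get(b, off));
}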

Affected Classes and Impact

The improved jdk.internal.util.ByteArray class is used directly by the following Core Library classes:

  • ObjectInputStream
  • ObjectOutputStream
  • ObjectStreamClass
  • RandomAccessFile


Even though it appears the direct usage of ByteArray is limited, there is an enormous transitive use of these classes. For example, the three Object Stream classes above are used extensively in conjunction with serialization. 

This means, in many cases, Java serialization is much faster now!
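As a trivial example (a sketch; any Serializable type with primitive fields will do), the following round-trip exercises the improved packing code without any application change:

record Point(int x, int y) implements Serializable {}

var bos = new ByteArrayOutputStream();
try (var oos = new ObjectOutputStream(bos)) {
    oos.writeObject(new Point(1, 2)); // primitive fields are packed via ByteArray
}
byte[] bytes = bos.toByteArray(); // the serialized form, produced faster in Java 21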

The RandomAccessFile class is used internally in the JDK for graphics and sound input/output as well as Zip and directory handling. 

More importantly, a large number of third-party libraries rely on these improved classes. They, and all applications using them, will automatically benefit from these improvements in speed. No change in your application code is needed. It just runs faster!


Raw Benchmarks

The details of the first benchmarks shown hereunder are described in this pull request. The actual change of Bits was made via this pull request.

I have run the benchmarks under Linux x64, Windows x64, and Mac aarch64. Note that this implied running them on different hardware, so these results can’t be compared across operating systems. In other words, Mac aarch64 is not necessarily faster than Linux x64.

I’ve run the tests using the above ByteArray::getLong method, with an outer loop of two iterations reading long values from an array. The more iterations in the outer loop, the more pronounced the advantage of VarHandle access becomes. One likely reason is that the C2 compiler is able to hoist boundary checks out of the outer loop.



Graph 1 shows the improvement in speed in Bits for various platforms.


Serialization Benchmarks

So, given that the performance increase in ByteArray looks awesome, what will be the practical effect on serialization, given all the other things that need to happen during the serialization process?

Consider the following classes that contain all the primitive types (except boolean):

static final class MyData implements Serializable {
    byte b;
    char c;
    short s;
    int i;
    float f;
    long l;
    double d;

    public MyData(byte b, char c, short s, int i, float f, long l, double d) {
        this.b = b;
        this.c = c;
        this.s = s;
        this.i = i;
        this.f = f;
        this.l = l;
        this.d = d;
    }
}


record MyRecord(byte b,
                char c,
                short s,
                int i,
                float f,
                long l,
                double d) implements Serializable {}



where the complete PrimitiveFieldSerializationBenchmark is available here. Running these benchmarks that serialize instances of the classes above on my laptop (macOS 12.6.1, MacBook Pro (16-inch, 2021) M1 Max) produced the following result:


Baseline (20-ea+30-2297)

Benchmark                           Mode  Cnt  Score   Error  Units
SerializeBenchmark.serializeData    avgt    8  7.283 ± 0.070  ns/op
SerializeBenchmark.serializeRecord  avgt    8  7.275 ± 0.201  ns/op

Java 21

SerializeBenchmark.serializeData    avgt    8  6.793 ± 0.132  ns/op
SerializeBenchmark.serializeRecord  avgt    8  6.733 ± 0.032  ns/op


This is good news! Our classes now serialize more than 5% faster.

Graph 2 shows the improvement in serialization for two classes.


Future Improvements

There are several other classes in the JDK that look similar and that might benefit from the same type of performance improvements once they are optimized with VarHandle access. 

Caring for old code is a trait of good stewardship!


Actual Application Performance Increase

How much faster will your applications run under Java 21 in reality if you use one or more of these improved classes (directly or indirectly)? There is only one way to find out: Run your own code on Java 21 today by downloading a JDK 21 Early-Access Build.


Wednesday, January 18, 2023

Java 20: An Almost Infinite Memory Segment Allocator

Wouldn’t it be cool if you could allocate an infinite amount of memory? In a previous article, I elaborated a bit on how to create memory-mapped files which could be sparse. In this article, we will learn how this can be leveraged as an under-carriage for providing a memory-allocating arena that can return an almost infinite amount of native memory without ever throwing an OutOfMemoryError.


Arena

An Arena controls the lifecycle of native memory segments, providing both flexible allocation and timely deallocation.


There are two built-in Arena types in Java 20:


  • A Confined Arena (available via the Arena::openConfined factory)
  • A Shared Arena (available via the Arena::openShared factory)


As the names imply, memory segments obtained from a Confined Arena can only be used by the thread that initially created the Arena, whereas memory segments from a Shared Arena can be used by any thread. Both types will allocate pure unmapped native memory.
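For reference, here is a minimal sketch of how the two built-in types are obtained and used (sizes are illustrative):

// Confined: segments are accessible only by the creating thread
try (Arena confined = Arena.openConfined()) {
    MemorySegment s = confined.allocate(64);
    // ... use s from this thread only ...
}

// Shared: segments are accessible by any thread
try (Arena shared = Arena.openShared()) {
    MemorySegment s = shared.allocate(64);
    // ... use s from any thread ...
}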


The InfiniteArena Class

By creating a new class as shown hereunder, we can provide an implementation that differs from the built-in Arena types in that it provides memory-mapped memory instead of pure native memory.


The class uses sparse files to reduce the required file space on platforms where this is supported.


...


import static java.nio.channels.FileChannel.MapMode.READ_WRITE;
import static java.nio.file.StandardOpenOption.*;
import static java.util.Objects.requireNonNull;

public final class InfiniteArena implements Arena {

    private static final Set<OpenOption> OPTS =
            Set.of(CREATE_NEW, SPARSE, READ, WRITE);

    private final String fileName;
    private final AtomicLong cnt;
    private final Arena delegate;

    public InfiniteArena(String fileName) {
        this.fileName = requireNonNull(fileName);
        this.cnt = new AtomicLong();
        this.delegate = Arena.openShared();
    }

    @Override
    public MemorySegment allocate(long byteSize, long byteAlignment) {
        try {
            try (var fc = FileChannel.open(
                    Path.of(fileName + "-" + cnt.getAndIncrement()), OPTS)) {
                return fc.map(READ_WRITE, 0, byteSize, delegate.scope());
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public SegmentScope scope() {
        return delegate.scope();
    }

    @Override
    public void close() {
        delegate.close();
    }

    @Override
    public boolean isCloseableBy(Thread thread) {
        return delegate.isCloseableBy(thread);
    }

}


As seen above, the parameter byteAlignment in Arena::allocate is ignored, in anticipation that mapped memory is highly aligned by default on all supporting platforms and that byteAlignment is relatively low. Obviously, in a production system, this would have to be handled more strictly.
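One stricter approach (a hypothetical sketch, using the empirical 2^14-byte alignment observed below as a bound) would be to reject alignment requests that the mapping cannot be assumed to honor:

// Hypothetical guard for InfiniteArena::allocate: fail fast on alignment
// requests that exceed what mapped memory is assumed to provide.
private static final long MAPPED_ALIGNMENT = 1L << 14;

private static void checkAlignment(long byteAlignment) {
    if (byteAlignment > MAPPED_ALIGNMENT) {
        throw new IllegalArgumentException(
                "Cannot guarantee an alignment of " + byteAlignment + " bytes");
    }
}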


On my machine (macOS 12.6.1) and using the examples below, mapped memory addresses are always aligned to at least 2^14 = 16,384-byte boundaries.


Using the InfiniteArena

Here is an example of how the InfiniteArena can be used in an application:


public static void main(String[] args) {

    try (Arena arena = new InfiniteArena("my-mapped-memory")) {
        MemorySegment s0 = arena.allocate(1L << 40);
        // Do nothing with s0

        MemorySegment s1 = arena.allocate(1L << 40);
        // Fill the region 1024 to 1024+256-1 with the value 2
        s1.asSlice(1024, 256)
                .fill((byte) 2);

        MemorySegment s2 = arena.allocate(16);
        // Write a String to the segment
        s2.setUtf8String(0, "Hello World");
    }
}


In the try-with-resources block, we create an InfiniteArena with the base file name "my-mapped-memory" to be used for the backing mapped files. The Arena is then used to allocate three native MemorySegment instances, where the first two are of size 1 TiB and the last is only 16 bytes. Note that, since we never touch the first segment s0 and only use a small portion of the second segment s1, the required physical disk space for these segments is minimal.


Lastly, a small segment s2 of 16 bytes is created, in which we put the all-familiar “Hello World” string.


Inspecting the Files

After the code completes, we can inspect the lingering files:


% ls -lart
...
-rw-r--r--   1 pminborg  staff  1099511627776 Jan  9 17:18 my-mapped-memory-0
-rw-r--r--   1 pminborg  staff  1099511627776 Jan  9 17:18 my-mapped-memory-1
-rw-r--r--   1 pminborg  staff             16 Jan  9 17:18 my-mapped-memory-2



As can be seen, the files are named my-mapped-memory-X, where X is the sequence number of the created MemorySegment instances: 0, 1, … The first two are large (1 TiB as expected). We can inspect the actual disk usage of all the files:


% du -h my-mapped-memory-*
  0B    my-mapped-memory-0
 16K    my-mapped-memory-1
4.0K    my-mapped-memory-2


Here is how they look in detail:


% hexdump -C my-mapped-memory-0
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|


% hexdump -C my-mapped-memory-1
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000400  02 02 02 02 02 02 02 02  02 02 02 02 02 02 02 02  |................|
*
00000500  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|



Note 1: “*” indicates the lines are the same (except the address) until there is another address indication.

Note 2: The hexdump command takes a long time to complete for 1 TiB files so only the first lines of output are shown above.


Here is what the smaller my-mapped-memory-2 file looks like:


% hexdump -C my-mapped-memory-2
00000000  48 65 6c 6c 6f 20 57 6f  72 6c 64 00 00 00 00 00  |Hello World.....|
00000010


Cool! Being able to inspect used memory post-mortem could provide invaluable insights in many cases.


Drawbacks and Future Improvements

As a file needs to be created upon every allocation, allocation will be slower than with the built-in arenas. Also, if all space is actually used, a sparse file requires more resources than a non-sparse file. It would be trivial to modify the code above to use non-sparse files.


In the implementation above, the allocated files remain after the Arena has been closed, and indeed even after the JVM exits. This allows the inspection and tracing of allocated memory segments, as exemplified above. It would be a small thing to add cleanup of lingering files if that is needed, for example so the program can be re-run without complaining about files that already exist. It is also possible to use a unique name each time the application runs to avoid file-name collisions.
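A sketch of such a cleanup (hypothetical; not part of the class above) could override close() to delete the backing files once the delegate Arena has been closed:

@Override
public void close() {
    delegate.close();
    // Best-effort removal of the backing files (requires java.nio.file.Files)
    for (long i = 0; i < cnt.get(); i++) {
        try {
            Files.deleteIfExists(Path.of(fileName + "-" + i));
        } catch (IOException ignored) {
            // A lingering file is harmless; leave it for manual inspection
        }
    }
}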


Mapped files can be much larger than the physical RAM, but that comes at the price of potentially swapping virtual memory in and out should the accessed memory not be resident.


What’s Next?

Try out the InfiniteArena today by downloading a JDK 20 Early-Access Build. Do not forget to pass the --enable-preview JVM flag or your code will not run. 



Monday, January 9, 2023

Java 20: Colossal Sparse Memory Segments

Did you know you can allocate memory segments that are larger than the physical size of your machine’s RAM and indeed larger than the size of your entire file system? Read this article and learn how to make use of mapped memory segments that may or may not be “sparse” and how to allocate 64 terabytes of sparse data on a laptop.


Mapped Memory

Mapped memory is virtual memory that has been assigned a one-to-one mapping to a portion of a file. The term “file” is quite broad here and may be represented by a regular file, a device, shared memory or any other thing that the operating system may refer to via a file descriptor.


Accessing files via mapped memory is often much faster than accessing them via standard file operations like read and write. Because mapped memory is operated on directly, some interesting solutions can also be built on atomic memory operations, such as compare-and-set, allowing very efficient inter-thread and inter-process communication channels.
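To sketch the idea (illustrative only, assuming the Java 20 preview API where a sequence var handle takes (segment, index) coordinates), a simple cross-process flag over a mapped segment could look like this:

// A mapped int that exactly one thread or process can claim atomically,
// where "mapped" is a MemorySegment backed by a shared, memory-mapped file
VarHandle intElem = MemoryLayout.sequenceLayout(1024, ValueLayout.JAVA_INT)
        .varHandle(MemoryLayout.PathElement.sequenceElement());

boolean acquired = intElem.compareAndSet(mapped, 0L, 0, 1);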


Because not all parts of the mapped virtual memory must reside in real memory at the same time, a mapped memory segment might be much larger than the physical RAM of the machine it is running on. If a portion of the mapped memory is not available when accessed, the operating system will temporarily suspend the current thread and load the missing page, after which operation may resume.


Other advantages of mapped files are that they can be shared across processes running in different JVMs and that the files remain persistent and can be inspected using any file tool, like hexdump.


Setting up a Mapped Memory Segment


The new Foreign Function and Memory feature that previews for the second time in Java 20 allows large memory segments to be mapped to a file. Here is how you can create a memory segment of size 4 GiB backed by a file.


Set<OpenOption> opts = Set.of(CREATE, READ, WRITE);
try (FileChannel fc = FileChannel.open(Path.of("myFile"), opts);
     Arena arena = Arena.openConfined()) {

    MemorySegment mapped =
            fc.map(READ_WRITE, 0, 1L << 32, arena.scope());

    use(mapped);

} // Resources allocated by "mapped" are released here via TwR



Sparse Files

A sparse file is a file where information can be stored in an efficient way if not all portions of the file are actually used. A file with large unused “holes” is an example of such a file whereby only the used sections are actually stored in the underlying physical file. In reality, however, the unused holes also consume some resources albeit much less than their used counterparts.

Figure 1 illustrates a logical sparse file where only actual data elements are stored in the physical file.


As long as the sparse file is not filled with too much data, it is possible to allocate a sparse file that is much larger than the available physical disk space. For example, it is possible to allocate an empty 10 TB memory segment backed by a sparse file on a filesystem with very little available capacity. 


It should be noted that not all platforms support sparse files.


Setting up a Sparsely Mapped Memory Segment 


Here is an example of how to create and access the contents of a file via a memory-mapped MemorySegment whereby the contents are sparse, with the real underlying data in the file expanding automatically as needed:


Set<OpenOption> sparse = Set.of(CREATE_NEW, SPARSE, READ, WRITE);
try (var fc = FileChannel.open(Path.of("sparse"), sparse);
     var arena = Arena.openConfined()) {

    MemorySegment mapped =
            fc.map(READ_WRITE, 0, 1L << 32, arena.scope());

    use(mapped);
} // Resources allocated by "mapped" are released here via TwR


Note: The file will appear to consist of 4 GiB of data but in reality the file does not use any (apparent) file-system space at all:


pminborg@pminborg-mac ntive % ll sparse
-rw-r--r--  1 pminborg  staff  4294967296 Nov 14 16:12 sparse

pminborg@pminborg-mac ntive % du -h sparse
  0B sparse



Going Colossal

The implementation of sparse files varies across the many platforms that are supported by Java and consequently, various sparse-file properties will vary depending on where an application is deployed.


I am using a Mac M1 under macOS Monterey (12.6.1) with 32 GiB RAM and 1 TiB storage (of which 900 GiB are available).


I was able to map a single sparse file of up to 64 TiB using a single mapped memory segment on my machine (using its standard settings):


  4 GiB -> ok as demonstrated above
  1 TiB -> ok
 32 TiB -> ok
 64 TiB -> ok
128 TiB -> failed with OutOfMemoryError


It is possible to increase the amount of mappable memory, but this is out of scope for this article. In real applications, it is better to map smaller portions of a sparse file into memory rather than mapping the entire sparse file in one chunk. These smaller mappings then act as “windows” into the larger underlying file.
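For example (a sketch reusing the fc and arena variables from the snippet above), a 1 GiB window at a chosen position could be mapped like this:

// Map a 1 GiB "window" at a chosen offset into the much larger sparse file
long windowSize = 1L << 30;
long windowIndex = 42; // illustrative: which window to view
MemorySegment window =
        fc.map(READ_WRITE, windowIndex * windowSize, windowSize, arena.scope());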


 Anyhow, this looks pretty colossal:


-rw-r--r--   1 pminborg  staff  70368744177664 Nov 22 13:34 sparse


Creating the empty 64 TiB sparse file took about 200 ms on my machine.


Unrelated Observations on Thread Confinement

As can be seen above, it is possible to access the same underlying physical memory from different threads (and indeed even different processes) via file mapping, even though it is viewed through several distinct thread-confined MemorySegment instances.


What’s Next?

Try out mapped segments today by downloading a JDK 20 Early-Access Build. Do not forget to pass the --enable-preview JVM flag or your code will not run.

Monday, December 5, 2022

Java 20: A Sneak Peek on the Panama FFM API (Second Preview)

The new JEP 434 has just seen daylight, describing the second preview of the ”Foreign Function & Memory API” (or FFM for short), which is going to be incorporated in the upcoming Java 20 release! In this article, we will take a closer look at some of the improvements made since the first preview, which debuted in Java 19 via the older JEP 424.


Getting familiar with the FFM

This article assumes you are familiar with the FFM API. If not, you can get a good overview via the new JEP.


Short Summary

Here is a short summary of the FFM changes made in Java 20 compared to Java 19:


  • The MemorySegment and MemoryAddress abstractions are unified (memory addresses are now modeled by zero-length memory segments);

  • MemorySession has been split into Arena and SegmentScope to facilitate sharing segments across maintenance boundaries;

  • The sealed MemoryLayout hierarchy is enhanced to facilitate usage with pattern matching in switch expressions and statements (JEP 433).



MemorySegment

A MemorySegment models a contiguous region of memory, residing either inside or outside the Java heap. A MemorySegment can also be used in conjunction with memory mapping, whereby file contents can be directly accessed via a MemorySegment.


Some changes were made between Java 19 and Java 20 with respect to the MemorySegment concept. In Java 19, there was a notion named MemoryAddress used for “pointers to memory” and function addresses. In Java 20, MemorySegment::address returns a raw memory address in the form of a long rather than a MemoryAddress object. Additionally, function addresses are now modeled as a MemorySegment of length zero. This means the MemoryAddress class was dropped entirely.


SegmentScope

All MemorySegment instances need a SegmentScope, which models the lifecycle of MemorySegment instances. A scope can be associated with several segments, meaning these segments share the same lifecycle and, consequently, their backing resources will be released at essentially the same time.


In Java 19, the term MemorySession was used for lifecycles but was also a closeable segment allocator. In Java 20, a SegmentScope is a much more concise, lifecycle-only concept.


Perpetual Global Scope Allocation

Native MemorySegment instances that should live during the entire JVM lifetime can be allocated through the SegmentScope.global() scope (i.e. segment memory associated with this scope will never be released unless the JVM exits). The SegmentScope.global() scope is guaranteed to be a singleton.


Automatic JVM-Managed Deallocation

Native MemorySegment instances that are managed by the JVM can now be allocated through the SegmentScope.auto() factory:


MemorySegment instances associated with new scopes created via the auto() method are also available to all threads but will be automatically managed by the Java garbage collector. This means segments will be released some unspecified time after the segment becomes unreachable. Thus, segments will be released when they are no longer referenced, just like ByteBuffer objects allocated via the ByteBuffer.allocateDirect() method. 


This allows a convenient create-and-forget scheme but also implies giving up exact control of when potentially large segments of off-heap memory are actually released.


Deterministic User-Managed Deallocation via Arena

Native MemorySegment instances can also be managed directly and deterministically via the Arena factory methods:


  • Arena.openConfined()


  • Arena.openShared()


MemorySegment instances associated with an openConfined() Arena will only be available to the thread that first invokes the factory method and the backing memory will exist merely until the Arena::close method is invoked (either explicitly or by participating in a try-with-resources clause) whereafter accessing any segments associated with the closed Arena will throw an exception. 


MemorySegment instances associated with an openShared() Arena behave in a similar way except they are available to any thread. Another difference is when arenas of this type are closed, the JVM has to make sure no other threads are in a critical section (to ensure memory addressing integrity while maintaining performance) and so, closing a shared Arena is slower than closing a confined Arena.


It should be mentioned that forgetting to invoke the Arena::close method means that any and all memory associated with the Arena will remain allocated until the JVM exits. There are no safety nets here, so a try-with-resources clause fits nicely for short-lived arenas, as it guarantees all the resources of an Arena are released no matter what.


An Arena can also be used to co-allocate segments in the same scope. This is convenient when using certain data structures with pointers, for example, a linked list that can be dynamically grown by creating new segments when the old ones become full. The referencing pointers are guaranteed to remain valid, as all the participating segments are associated with the same scope. Only when the common scope is closed can all the underlying segment resources be released.
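A minimal sketch of such co-allocation (illustrative; layout and offsets are my own choices) could look like this:

try (Arena arena = Arena.openConfined()) {
    MemorySegment first  = arena.allocate(ValueLayout.ADDRESS);
    MemorySegment second = MemorySegment.allocateNative(ValueLayout.ADDRESS, arena.scope());
    first.set(ValueLayout.ADDRESS, 0, second); // link first -> second
    // Both nodes live and die together with the arena's scope
}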


In Java 19, the MemorySession was similar to the Java 20 Arena but, crucially, an Arena is not itself a lifecycle; it is instead associated with a lifecycle (accessible via the Arena::scope method).



MemoryLayout and Pattern Matching

In FFM, a MemoryLayout can be used to describe the contents of a MemorySegment. If we, for example, have the following C struct declaration:


typedef struct Point {
    int x;
    int y;
} Points[5];


Then, we can model it in FFM like this:


SequenceLayout points = MemoryLayout.sequenceLayout(5,
    MemoryLayout.structLayout(
        ValueLayout.JAVA_INT.withName("x"),
        ValueLayout.JAVA_INT.withName("y")
    )
).withName("Points");
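Such a layout can then be used to derive var handles for element access. Here is a sketch (assuming the Java 20 preview coordinates of (segment, index) and a segment sized for the layout):

// Access Points[i].x via a var handle derived from the layout above
VarHandle xHandle = points.varHandle(
        MemoryLayout.PathElement.sequenceElement(),
        MemoryLayout.PathElement.groupElement("x"));

int x2 = (int) xHandle.get(segment, 2L); // reads Points[2].x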



Pattern matching (as recently described in JEP 427) will arguably be one of the largest improvements to the Java language ever made, of a similar dignity as generics (appearing in Java 5) and lambdas/functions (appearing in Java 8). In Java 20, the sealed hierarchy of MemoryLayout was overhauled to provide a pattern-matching-friendly definition. This allows, for example, uncomplicated and concise rendering of memory segments as shown hereunder:


default String render(MemorySegment segment,
                      long offset,
                      ValueLayout layout) {

    return layout.name().orElse(layout.toString()) + " = " +
        switch (layout) {
            case OfBoolean b -> Boolean.toString(segment.get(b, offset));
            case OfByte b -> Byte.toString(segment.get(b, offset));
            case OfChar c -> Character.toString(segment.get(c, offset));
            case OfShort s -> Short.toString(segment.get(s, offset));
            case OfInt i -> Integer.toString(segment.get(i, offset));
            case OfLong l -> Long.toString(segment.get(l, offset));
            case OfFloat f -> Float.toString(segment.get(f, offset));
            case OfDouble d -> Double.toString(segment.get(d, offset));
            case OfAddress a ->
                "0x" + Long.toHexString(segment.get(a, offset).address());
        };
}



The code above can relatively easily be expanded with cases for the complete MemoryLayout sealed hierarchy, including recursive calls for the types SequenceLayout and GroupLayout and for the simpler PaddingLayout.


As a side note, the javadocs in Java 20 will likely come with a pattern-matching nudger in the form of a graphic rendering of the sealed hierarchy for selected classes (i.e. those tagged with “@sealedGraph”). Here is how the graph for MemoryLayout might look once Java 20 hits GA:



As can be seen, the graph and the pattern-matching switch example above correspond and the cases are exhaustive with respect to the ValueLayout type.


Other Improvements

Java 20 will also see many other improvements in the FFM API, some of which are summarized hereunder:


  • Reduced API surface, making it easier to learn and understand the new API

  • Improved documentation

  • Ability to access thread-local variables in native calls, including errno


Show Me the Code!

Here are some examples of creating MemorySegment instances for various purposes:


Allocate a 1K MemorySegment for the Entire Duration of an Application’s Lifetime


public static final MemorySegment SHARED_DATA =
        MemorySegment.allocateNative(1024, SegmentScope.global());



Allocate a Small, 32-byte Temporary MemorySegment, not Bothering When the Underlying Native Memory is Released


var seg = MemorySegment.allocateNative(32, SegmentScope.auto());



Co-allocate a New MemorySegment with an Existing Segment


var coSegment = MemorySegment.allocateNative(32, seg.scope());
// Store a pointer to the original segment
coSegment.set(ValueLayout.ADDRESS, 0, seg);



Allocate a Large, 4 GiB Temporary MemorySegment Used by the Current Thread Only


try (var arena = Arena.openConfined()) {
    var confined = arena.allocate(1L << 32);
    use(confined);
} // Memory in "confined" is released here via TwR



Allocate a Large, 4 GiB Temporary MemorySegment to be Used by Several Threads


try (var arena = Arena.openShared()) {
    var shared = arena.allocate(1L << 32);
    useInParallel(shared);
} // Memory in "shared" is released here via TwR



Access an Array via a MemorySegment


int[] intArray = new int[10];
var intSeg = MemorySegment.ofArray(intArray);



Access a Buffer (of long in This Example) via a MemorySegment



LongBuffer longBuffer = LongBuffer.allocate(20);
var longSeg = MemorySegment.ofBuffer(longBuffer);



What’s Next?

Take FFM for a spin today by downloading a Java 20 Early-Access build. Do not forget to pass the --enable-preview JVM flag or the code will not run. 


Test how you can benefit from FFM today, and engage with the open-source community via the panama mailing list.