Minborg


Tuesday, June 7, 2022

Exhaustive JUnit5 Testing with Combinations, Permutations, and Products


Unit testing is an integral part of providing high-quality software. But how can one write tests covering all variants of several operations? Read this article and learn how to use JUnit5 in conjunction with combinations, permutations, and products.

Test Support Libraries

There are many libraries that make testing better in different aspects. Here are some of them:

Agitar One

Agitator automatically creates dynamic test cases, synthesizes sets of input data, and analyzes the results.


http://www.agitar.com/solutions/products/software_agitation.html


Chaos Monkey

Chaos Monkey is responsible for randomly terminating instances in production to ensure that engineers implement their services to be resilient to instance failures.


https://netflix.github.io/chaosmonkey/


Chronicle Test Framework

This library provides support classes for writing JUnit tests. It supports both JUnit4 and JUnit5. Tools like permutations, combinations, and products can be leveraged to create exhaustive tests. Agitar One can automatically identify and generate tests, whereas the Chronicle Test Framework provides programmatic support for writing your own tests.

 

https://github.com/OpenHFT/Chronicle-Test-Framework


In this article, we will focus on the Chronicle Test Framework.

Objective

Suppose we have written a custom implementation of java.util.List called MyList and that we want to test the implementation against a reference implementation such as ArrayList. Further, assume that we have a number of operations O1, O2, …, On that we can apply to these objects. The question now is: how can we write tests assuring that the two objects provide the same result regardless of how the operations are applied?

Combinations, Permutations, and Products

Before we continue, let us remind ourselves what combinations, permutations, and products are from a mathematical perspective. Assume we have a set X = {A, B, C}; then:


The combinations of X are:

[]

[A]

[B]

[C]

[A, B]

[A, C]

[B, C]

[A, B, C]


The permutations of X are:

[A, B, C]

[A, C, B]

[B, A, C]

[B, C, A]

[C, A, B]

[C, B, A]


The permutations of all combinations of X are:

[]

[A]

[B]

[C]

[A, B]

[B, A]

[A, C]

[C, A]

[B, C]

[C, B]

[A, B, C]

[A, C, B]

[B, A, C]

[B, C, A]

[C, A, B]

[C, B, A]


Where [] denotes the empty sequence. As can be seen from the sequences above, the number of variants increases very rapidly as the number of set members grows. This puts a practical limit on the exhaustiveness of the tests one can write.
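Using only the JDK, the enumeration above can be reproduced in a few lines. The following is my own illustrative sketch (class and method names are mine), not the library discussed below:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class PocDemo {

    // All combinations (subsets) of the given elements, in insertion order
    static List<List<String>> combinations(List<String> elements) {
        List<List<String>> result = new ArrayList<>();
        result.add(new ArrayList<>()); // start with the empty sequence []
        for (String e : elements) {
            int size = result.size();
            for (int i = 0; i < size; i++) {
                // copy each existing subset and extend it with the new element
                List<String> extended = new ArrayList<>(result.get(i));
                extended.add(e);
                result.add(extended);
            }
        }
        return result;
    }

    // All permutations of one sequence (classic swap-based recursion)
    static List<List<String>> permutations(List<String> seq) {
        List<List<String>> result = new ArrayList<>();
        permute(new ArrayList<>(seq), 0, result);
        return result;
    }

    private static void permute(List<String> seq, int k, List<List<String>> out) {
        if (k == seq.size()) {
            out.add(new ArrayList<>(seq));
            return;
        }
        for (int i = k; i < seq.size(); i++) {
            Collections.swap(seq, k, i);
            permute(seq, k + 1, out);
            Collections.swap(seq, k, i);
        }
    }

    // Permutations of all combinations
    static List<List<String>> poc(List<String> elements) {
        List<List<String>> result = new ArrayList<>();
        combinations(elements).forEach(c -> result.addAll(permutations(c)));
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> all = poc(List.of("A", "B", "C"));
        all.forEach(System.out::println);
        System.out.println(all.size()); // prints 16, matching the listing above
    }
}
```

Counting the sequences (8 combinations, 6 permutations of the full set, 16 permutations of all combinations) confirms the listings above.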


Another concept (not used in the tests below) is “Products” (aka Cartesian Products). Suppose we have a sequence s1 = [A, B] of one type and another sequence s2 = [1, 2] of another type; then:


The product of s1 and s2 is:

[A, 1]

[A, 2]

[B, 1]

[B, 2]


Products have many similarities with database “join” operations and relate to nested loops.
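The nested-loop connection can be made concrete with a small plain-Java sketch of my own (an illustration, not the library's implementation):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ProductDemo {

    // The Cartesian product is literally a nested loop:
    // the outer loop runs over s1 and the inner loop over s2
    static List<String> product(List<String> s1, List<Integer> s2) {
        List<String> result = new ArrayList<>();
        for (String a : s1) {
            for (Integer b : s2) {
                result.add("[" + a + ", " + b + "]");
            }
        }
        return result;
    }

    public static void main(String[] args) {
        product(Arrays.asList("A", "B"), Arrays.asList(1, 2))
                .forEach(System.out::println);
        // [A, 1]
        // [A, 2]
        // [B, 1]
        // [B, 2]
    }
}
```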

The Test Framework

In this test, we will be using the open-source Chronicle-Test-Framework, which supports the aforementioned combination, permutation, and product features. Here is an example:


// Prints: [], [A], [B], [C], [A, B], [A, C], … (8 sequences)
Combination.of("A", "B", "C")
    .forEach(System.out::println);


This prints all the combinations of {A, B, C}, the result being the same as in the previous section. In the same way, the following example prints all the permutations of {A, B, C}:


// Prints: [A, B, C], [A, C, B], [B, A, C], … (6 sequences)
Permutation.of("A", "B", "C")
    .forEach(System.out::println);


In the following example, we combine the capabilities of combinations and permutations:


// Prints: [], [A], [B], [C], [A, B], [B, A], … (16 sequences)
Combination.of("A", "B", "C")
    .flatMap(Permutation::of)
    .forEach(System.out::println);


The methods above produce a Stream of Collection elements, allowing easy adaptation to other frameworks such as JUnit5.


When it comes to products, things work slightly differently. Here is an example:


final List<String> strings = Arrays.asList("A", "B");
final List<Integer> integers = Arrays.asList(1, 2);

Product.of(strings, integers)
        .forEach(System.out::println);


This will print:


Product2Impl{first=A, second=1}
Product2Impl{first=A, second=2}
Product2Impl{first=B, second=1}
Product2Impl{first=B, second=2}


If a more recent Java version is used, the following scheme would typically be used instead:


record StringInteger(String string, Integer integer) {}

Product.of(strings, integers, StringInteger::new)
        .forEach(System.out::println);


This will produce:


StringInteger[string=A, integer=1]
StringInteger[string=A, integer=2]
StringInteger[string=B, integer=1]
StringInteger[string=B, integer=2]


In my opinion, the above is better than the default Product2Impl tuple because the record is a “nominal tuple”: the names and types of its state elements are declared in the record header, whereas Product2Impl relies on generic types and the non-descriptive names “first” and “second”.

The Objective

Suppose that we want to make sure two java.util.List implementations behave the same way for a number of mutating operations. Here, the lists contain Integer values (i.e., they implement List<Integer>).


We start by identifying the mutating operations to use. One (arbitrary) suggestion is to use the following operations:


list.clear()
list.add(1)
list.remove((Integer) 1) // Removes the object 1, not the element at index 1
list.addAll(Arrays.asList(2, 3, 4, 5))
list.removeIf(ODD)


Where Predicate<Integer> ODD = v -> v % 2 == 1.
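To get a feel for these operations, here is one (arbitrary) order applied to a plain ArrayList; the comments trace the list state after each step:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class OperationsDemo {

    static final Predicate<Integer> ODD = v -> v % 2 == 1;

    // Applies the five mutating operations in one particular order
    static List<Integer> run() {
        List<Integer> list = new ArrayList<>();
        list.clear();                            // []
        list.add(1);                             // [1]
        list.remove((Integer) 1);                // [] (removes the object 1, not index 1)
        list.addAll(Arrays.asList(2, 3, 4, 5));  // [2, 3, 4, 5]
        list.removeIf(ODD);                      // [2, 4]
        return list;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [2, 4]
    }
}
```

A correct List implementation must produce the same end state for every order of these operations as the reference implementation does.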


In this article, we will initially use ArrayList and LinkedList for comparison.


As the reader might suspect, the List types are going to be tested against each other with all the possible sequences of operations. But, how can this be achieved?

The Solution

JUnit5 provides a @TestFactory annotation and DynamicTest objects, allowing a test factory to return a Stream of DynamicTest instances. This is something that we can leverage, as shown below:


private static final Collection<NamedConsumer<List<Integer>>> OPERATIONS =
    Arrays.asList(
        NamedConsumer.of(List::clear, "clear()"),
        NamedConsumer.of(list -> list.add(1), "add(1)"),
        NamedConsumer.of(list -> list.remove((Integer) 1), "remove(1)"),
        NamedConsumer.of(list -> list.addAll(Arrays.asList(2, 3, 4, 5)),
                "addAll(2,3,4,5)"),
        NamedConsumer.of(list -> list.removeIf(ODD), "removeIf(ODD)")
    );



@TestFactory
Stream<DynamicTest> validate() {
    return DynamicTest.stream(Combination.of(OPERATIONS)
                    .flatMap(Permutation::of),
            Object::toString,
            operations -> {
                List<Integer> first = new ArrayList<>();
                List<Integer> second = new LinkedList<>();
                operations.forEach(op -> {
                    op.accept(first);
                    op.accept(second);
                });
                assertEquals(first, second);
            });
}


This is the entire solution. When run in IntelliJ (or similar tools), the following tests are performed (only the first 16 of the 326 tests are shown for brevity):



Image 1, Shows the first 16 test runs of the 326 tests.


The reason NamedConsumer is used over the standard java.util.function.Consumer is that it allows real names to be shown rather than a cryptic lambda reference like ListDemoTest$$Lambda$350/0x0000000800ca9a88@3d8314f0. It certainly looks better in the output and provides much better debugging capabilities.
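The idea can be sketched with a minimal wrapper of my own; the actual NamedConsumer in Chronicle-Test-Framework may differ, but the principle is a Consumer whose toString() returns a meaningful name:

```java
import java.util.List;
import java.util.function.Consumer;

public class NamedConsumerDemo {

    // A Consumer that carries a human-readable name for test output
    static final class Named<T> implements Consumer<T> {
        private final Consumer<T> delegate;
        private final String name;

        Named(Consumer<T> delegate, String name) {
            this.delegate = delegate;
            this.name = name;
        }

        @Override
        public void accept(T t) {
            delegate.accept(t); // forward to the wrapped consumer
        }

        @Override
        public String toString() {
            return name; // shown in test names instead of a cryptic lambda reference
        }
    }

    public static void main(String[] args) {
        Named<List<Integer>> clear = new Named<>(List::clear, "clear()");
        System.out.println(clear); // prints clear()
    }
}
```

Because DynamicTest.stream() above names each test via Object::toString, such a wrapper makes every generated test name readable.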

Expanding the Concept

Suppose we have a more significant number of List implementations we’d like to test, such as:


ArrayList

LinkedList

CopyOnWriteArrayList

Stack

Vector


To expand the concept, we begin by creating a collection of List<Integer> constructors:


private static final Collection<Supplier<List<Integer>>> CONSTRUCTORS =
        Arrays.asList(
                ArrayList::new,
                LinkedList::new,
                CopyOnWriteArrayList::new,
                Stack::new,
                Vector::new);


The reason for working with constructors rather than instances is that we need to be able to create new List implementations for each dynamic test.
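The point is easy to demonstrate: each get() call on a Supplier produces a new, independent instance, so no state leaks between dynamic tests:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class SupplierDemo {

    // Returns true if the supplier creates a new instance on every call
    static boolean createsFreshInstances(Supplier<List<Integer>> constructor) {
        return constructor.get() != constructor.get();
    }

    public static void main(String[] args) {
        Supplier<List<Integer>> constructor = ArrayList::new;
        List<Integer> a = constructor.get();
        List<Integer> b = constructor.get();
        a.add(42);
        System.out.println(a != b);      // true: distinct instances
        System.out.println(b.isEmpty()); // true: mutating a does not affect b
    }
}
```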


Now, we only need to do minor modifications to the previous test:


@TestFactory
Stream<DynamicTest> validateMany() {
    return DynamicTest.stream(Combination.of(OPERATIONS)
                    .flatMap(Permutation::of),
            Object::toString,
            operations -> {

                // Create a fresh list of List implementations
                List<List<Integer>> lists = CONSTRUCTORS.stream()
                        .map(Supplier::get)
                        .collect(Collectors.toList());

                // For each operation, apply the operation on each list
                operations.forEach(lists::forEach);

                // Test the lists pairwise
                Combination.of(lists)
                        // Filter out only combinations with two lists
                        .filter(set -> set.size() == 2)
                        // Convert the Set to a List for easy access below
                        .map(ArrayList::new)
                        // Assert the pairs are equal
                        .forEach(pair -> assertEquals(pair.get(0), pair.get(1)));
            });
}


Strictly speaking, this is a bit of a cheat, as a single dynamic test contains several subtests for a plurality of List pairs. However, it would be reasonably easy to modify the code above to flatMap() the various assertions into separate dynamic tests. This is left as an exercise for the reader.

A Final Warning

Combinations, permutations, and products are potent tools that can increase test coverage, but they can also increase the time it takes to run all tests as the number of sequences explodes even with only a moderate increase in the number of input variants. 


For example, here is a list of the number of permutations of all combinations for some given sets S of size s:

 

s = size(S)    poc(s) (permutations of combinations)
0              1
1              2
2              5
3              16
4              65
5              326
6              1,957
7              13,700
8              109,601
9              986,410
10             9,864,101
11             108,505,112


Table 1, shows the number of permutations of combinations for some set sizes.


More generally, it can be shown that: poc(s) = poc(s - 1) * s + 1. 
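The recurrence (with poc(0) = 1 for the empty sequence) reproduces Table 1 exactly; here is a quick sketch of mine that checks it:

```java
public class PocCount {

    // poc(s) = poc(s - 1) * s + 1, with poc(0) = 1 (only the empty sequence)
    static long poc(int s) {
        long result = 1;
        for (int i = 1; i <= s; i++) {
            result = result * i + 1;
        }
        return result;
    }

    public static void main(String[] args) {
        for (int s = 0; s <= 11; s++) {
            System.out.println(s + " -> " + poc(s));
        }
        // 5 -> 326, 6 -> 1957, ..., 11 -> 108505112, matching Table 1
    }
}
```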


If in doubt, the number of sequences can be checked using the .count() stream operation, as shown below:


long poc6 = Combination.of(1, 2, 3, 4, 5, 6)
        .flatMap(Permutation::of)
        .count();


The stream above will, unsurprisingly, return 1,957 as there are six input elements.


Combinatorics can make a huge impact. However, make sure to keep the number of dynamic tests under control, or else test time might increase significantly.


Libraries like Chronicle Queue and Chronicle Map make use of the Chronicle-Test-Framework to improve testing. Here is an example from Chronicle Map where five Map operations are exhaustively tested, providing 326 dynamic tests in just a few lines of code.



Resources

Open-Source Chronicle-Test-Framework

The code in this article (Open-Source)

Monday, May 9, 2022

Which JVM Version is Fastest?


How is a high-performance, low-latency Java application affected by the JVM version used? Every nanosecond counts for trading and other applications where messages between two different threads are exchanged in about 250 ns! Read this article and find out which JDK variant comes out on top!

Benchmarks

This article will use open-source Chronicle Queue to exchange 256-byte messages between two threads whereby all messages are also stored in shared memory (/dev/shm is used to minimize the impact of the disk subsystem). 

Chronicle Queue is a persisted low-latency Java messaging framework for high-performance and critical applications. Because Chronicle Queue operates on memory-mapped native memory, it eliminates the need for garbage collection, giving developers deterministic high performance.

In the benchmarks, a single producer thread writes messages to a queue with a nanosecond timestamp. Another consumer thread reads the messages from the queue and records the time deltas in a histogram. The producer maintains a sustained message output rate of 100,000 messages per second with a 256-byte payload in each message. Data is measured over 100 seconds so that most jitter is reflected in the measurements, ensuring a reasonable confidence interval for the higher percentiles.

The target machine has an AMD Ryzen 9 5950X 16-Core Processor running at 3.4 GHz under Linux 5.11.0-49-generic #55-Ubuntu SMP. The CPU cores 2-8 are isolated, meaning the operating system will not automatically schedule any user processes and will avoid most interrupts on these cores.

The Java Code

Below, part of the producer's inner loop is shown:


// Pin the producer thread to CPU 2

Affinity.setAffinity(2);

try (ChronicleQueue cq = SingleChronicleQueueBuilder.binary(tmp)
        .blockSize(blocksize)
        .rollCycle(ROLL_CYCLE)
        .build()) {

    ExcerptAppender appender = cq.acquireAppender();

    final long nano_delay = 1_000_000_000L/MSGS_PER_SECOND;

    for (int i = -WARMUP; i < COUNT; ++i) {

        long startTime = System.nanoTime();

        try (DocumentContext dc = appender.writingDocument()) {

            Bytes bytes = dc.wire().bytes();

            data.writeLong(0, startTime);

            bytes.write(data, 0, MSGSIZE);

        }

        long delay = nano_delay - (System.nanoTime() - startTime);

        spin_wait(delay);

    }

}
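The spin_wait(delay) helper is not shown in the listing above. One possible implementation (my own sketch, not necessarily the benchmark's actual code) busy-waits on System.nanoTime(), trading a burned CPU core for the absence of scheduler wake-up jitter:

```java
public class SpinWaitDemo {

    // Busy-wait for approximately delayNanos; non-positive delays return immediately
    static void spinWait(long delayNanos) {
        long deadline = System.nanoTime() + delayNanos;
        while (System.nanoTime() < deadline) {
            Thread.onSpinWait(); // CPU hint that we are in a spin loop (Java 9+)
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        spinWait(1_000_000); // ~1 ms
        System.out.println((System.nanoTime() - start) >= 1_000_000); // true
    }
}
```

Spinning rather than calling Thread.sleep() keeps the producer on-core and lets it hit its target message rate with nanosecond precision.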


In another thread, the consumer thread is running this code in its inner loop (shortened code):

// Pin the consumer thread to CPU 4

Affinity.setAffinity(4);

try (ChronicleQueue cq = SingleChronicleQueueBuilder.binary(tmp)         .blockSize(blocksize)         .rollCycle(ROLL_CYCLE)         .build()) {

    ExcerptTailer tailer = cq.createTailer();

    int idx = -APPENDERS * WARMUP;

    while(idx < APPENDERS * COUNT) {

        try (DocumentContext dc = tailer.readingDocument()) {

            if(!dc.isPresent())

                continue;

            Bytes bytes = dc.wire().bytes();

            data.clear();

            bytes.read(data, (int)MSGSIZE);

            long startTime = data.readLong(0);

            if(idx >= 0)

                deltas[idx] = System.nanoTime() - startTime;

            ++idx;

        }

    }

}


As can be seen, the consumer thread reads each nano timestamp and records the corresponding latency in an array. These latencies are later put into a histogram, which is printed when the benchmark completes. Measurements start only after the JVM has warmed up properly and the C2 compiler has JIT-compiled the hot execution path.

JVM Variants

Chronicle Queue officially supports all the recent LTS versions: Java 8, Java 11, and Java 17, so these will be used in the benchmarks. We will also use the GraalVM Community and Enterprise Editions. Here is a list of the specific JVM variants used:


Legend (JVM Variant)    Detail
OpenJDK 8               1.8.0_322, vendor: Temurin
OpenJDK 11              11.0.14.1, vendor: Eclipse Adoptium
OpenJDK 17              17.0.2, vendor: Eclipse Adoptium
GraalVM CE 17           17.0.2, vendor: GraalVM Community
GraalVM EE 17           17.0.2, vendor: Oracle Corporation

Table 1, Shows the specific JVM variants used.

Measurements

As 100,000 messages per second are produced, and the benchmarks run for 100 seconds, there will be 100,000 * 100 = 10 million messages sampled during each benchmark. The histogram used places each sample in a certain percentile: 50% (median), 90%, 99%, 99.9% etc. Here is a table showing the total number of messages received for some percentiles:


Percentile                    # Messages
0% (all)                      10,000,000
50% (“Median”, used below)    5,000,000
99%                           100,000
99.9%                         10,000
99.99% (used below)           1,000
99.999%                       100

Table 2, Shows the number of messages for each percentile.


Assuming a relatively small variance of the measurement values, the confidence interval is likely reasonable for percentiles up to 99.99%. The percentile 99.999% probably requires gathering data for at least half an hour or so rather than just 100 seconds to produce any figures with a reasonable confidence interval.
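The counts in Table 2 follow from simple arithmetic: with N samples in total, roughly N * (1 - p) of them lie at or above percentile p. A small sketch of mine:

```java
public class PercentileCounts {

    // Approximate number of samples at or above percentile p (0.0 to 1.0)
    static long samplesAbove(long total, double p) {
        return Math.round(total * (1.0 - p));
    }

    public static void main(String[] args) {
        long total = 100_000L * 100; // 100,000 msg/s over 100 s = 10,000,000 samples
        System.out.println(samplesAbove(total, 0.50));    // 5000000
        System.out.println(samplesAbove(total, 0.9999));  // 1000
        System.out.println(samplesAbove(total, 0.99999)); // 100
    }
}
```

At 99.999%, only about 100 samples remain, which is why that percentile needs a far longer measurement window for a stable estimate.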

Benchmarks Results

For each Java variant, the benchmarks are run like this:

mvn exec:java@QueuePerformance

Remember that our producer and consumer threads will be locked down to run on the isolated CPU cores 2 and 4, respectively. 

Here is what a typical process looks like after it has run for a while:

$ top

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                    

3216555 per.min+  20   0   92.3g   1.5g   1.1g S 200.0   2.3   0:50.15 java 


As can be seen, the producer and consumer threads spin-wait between each message and therefore each consume an entire CPU core. If CPU consumption is a concern, latency and determinism can be traded for lower power consumption by parking threads for a short period (e.g., LockSupport.parkNanos(1000)) when no messages are available.

The figures below are given in nanoseconds (ns) which is essential to understand. 

Many other latency measurements are made in microseconds (= 1,000 ns) or even milliseconds (= 1,000,000 ns). One ns corresponds roughly to the access time of a CPU L1 cache.

Here is the result of the benchmarks where all values are given in ns:


JDK Variant          Median    99.99%
OpenJDK 8            280       3,951
OpenJDK 11           370       4,210
OpenJDK 17           290       4,041
GraalVM CE 17 (*)    310       3,950
GraalVM EE 17 (*)    270       3,800

Table 3, Shows the latency figures for the various JDKs used.

(*) Not officially supported by Chronicle Queue.


Typical Latency (median)

For the typical (median) values, there is no significant difference between the various JDKs except for OpenJDK 11 which is about 30% slower than the other versions. 

The fastest of them all is GraalVM EE 17, but the difference compared to OpenJDK 8/OpenJDK 17 is marginal.

Here is a graph with the typical 256-byte message latency for the various JDK variants used (lower is better):

Graph 1, Shows the median (typical) latency in ns for the various JDK variants.

The typical (median) latency varied slightly from run to run, with figures fluctuating around 5%.

Higher Percentiles

Looking at the higher percentiles, there is not much difference between the supported JDK variants either. GraalVM EE is slightly faster again, but here the relative difference is even smaller. OpenJDK 11 appears to be marginally worse (~5%) than the other variants, but the delta is within the estimated margin of error.

Here is another graph showing latencies for the 99.99% percentile for the various JDK variants (lower is better):

Graph 2, Shows the 99.99% percentile latency [ns] for the various JDK variants.

Conclusions

In my opinion, the latency figures of Chronicle Queue are excellent. Accessing 64-bit data from main memory takes about 100 cycles (which corresponds to about 30 ns on current hardware), and the benchmark code above has some logic of its own that must be executed. Additionally, Chronicle Queue obtains data from the producer, persists it (writes to a memory-mapped file), applies appropriate memory fencing for inter-thread communication and happens-before guarantees, and then makes the data available to the consumer. All this typically happens in around 600 ns for 256 bytes, compared to the 30 ns of a single 64-bit memory access. Very impressive indeed.

OpenJDK 17 and GraalVM EE 17 seem to be the best choices for this application, providing the best latency figures. Consider using GraalVM EE 17 over OpenJDK 17 if outliers need to be suppressed or if you really need the lowest possible overall latency.

Resources

Open-source Chronicle Queue 

Open-source JDK