Wednesday, January 19, 2022

Did You Know the Fastest Way of Serializing a Java Field is not Serializing it at All?

 



This article elaborates on different ways of serializing Java objects and benchmarks performance for the variants. Read this article and become aware of different ways to improve Java serialization performance.


In a previous article about open-source Chronicle Queue, there was some benchmarking and method profiling indicating that the speed of serialization had a significant impact on execution performance. After all, this is only to be expected as Chronicle Queue (and other persisted queue libraries) must convert Java objects located on the heap to binary data which is subsequently stored in files. Even for the most internally efficient libraries, this inevitable serialization procedure will largely dictate performance.

Data Transfer Object

In this article, we will use a Data Transfer Object (hereafter DTO) named  MarketData which contains financial information with a relatively large number of fields. The same principles apply to other DTOs in any other business area.


import net.openhft.chronicle.wire.SelfDescribingMarshallable;


abstract class MarketData extends SelfDescribingMarshallable {


    long securityId;

    long time;


    // bid and ask quantities

    double bidQty0, bidQty1, bidQty2, bidQty3;

    double askQty0, askQty1, askQty2, askQty3;

    // bid and ask prices

    double bidPrice0, bidPrice1, bidPrice2, bidPrice3;

    double askPrice0, askPrice1, askPrice2, askPrice3;


    // Getters and setters not shown for clarity

}

Default Serialization

Java’s Serializable marker interface provides a default way to serialize Java objects to/from binary format, usually via the ObjectOutputStream and ObjectInputStream classes. The default way (whereby the magic writeObject() and readObject() are not explicitly declared) entails reflecting over an object's non-transient fields and reading/writing them one by one, which can be a relatively costly operation.
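As a point of reference, default JDK serialization can be sketched as follows. This is a minimal illustration using only the JDK, not Chronicle code; the cut-down Quote class and the helper names toBytes() and fromBytes() are hypothetical and introduced only for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class DefaultJavaSerializationSketch {

    // Hypothetical, cut-down DTO. Neither writeObject() nor readObject() is
    // declared, so the JDK falls back to reflective default serialization.
    static final class Quote implements Serializable {
        private static final long serialVersionUID = 1L;
        long securityId;
        double bidPrice0;
        transient long scratch; // transient fields are skipped
    }

    // Serializes any Serializable object to a byte array.
    static byte[] toBytes(Serializable object) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(object);
        }
        return buffer.toByteArray();
    }

    // Reads a single object back from a byte array.
    static Object fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Quote quote = new Quote();
        quote.securityId = 42L;
        quote.bidPrice0 = 101.25;
        quote.scratch = 7L;

        byte[] bytes = toBytes(quote);
        Quote copy = (Quote) fromBytes(bytes);
        System.out.println(bytes.length + " bytes, securityId=" + copy.securityId
                + ", bidPrice0=" + copy.bidPrice0 + ", scratch=" + copy.scratch);
    }
}
```

Note that the transient field is not written and therefore comes back with its default value after deserialization.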


Chronicle Queue can work with Serializable objects but also provides a similar, but faster and more space-efficient way to serialize data via the abstract class SelfDescribingMarshallable. Akin to Serializable objects, this relies on reflection but comes with substantially less overhead in terms of payload, CPU cycles, and garbage.


Default serialization often comprises the steps of:

  • Identifying the non-transient fields using reflection

  • Reading/writing the identified non-transient field values using reflection

  • Writing/reading the field values to a target format (e.g. binary format)


The identification of non-transient fields can be cached, eliminating this step to improve performance.
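The caching idea can be sketched with plain JDK reflection. This is an illustration of the principle under the assumption that a per-class cache such as ClassValue is acceptable; it is not how Chronicle implements it, and the names CachedFieldsSketch, serializableFields() and Quote are hypothetical.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class CachedFieldsSketch {

    // ClassValue computes the field list at most once per class and caches it,
    // so the reflective scan is not repeated for every object serialized.
    private static final ClassValue<List<Field>> FIELDS = new ClassValue<List<Field>>() {
        @Override
        protected List<Field> computeValue(Class<?> type) {
            List<Field> fields = new ArrayList<>();
            // Walk the class hierarchy, skipping static and transient fields.
            for (Class<?> c = type; c != null && c != Object.class; c = c.getSuperclass()) {
                for (Field field : c.getDeclaredFields()) {
                    int modifiers = field.getModifiers();
                    if (!Modifier.isStatic(modifiers) && !Modifier.isTransient(modifiers)) {
                        field.setAccessible(true);
                        fields.add(field);
                    }
                }
            }
            return Collections.unmodifiableList(fields);
        }
    };

    static List<Field> serializableFields(Class<?> type) {
        return FIELDS.get(type);
    }

    // Hypothetical DTO used only to exercise the cache.
    static final class Quote {
        static int instances;
        long securityId;
        double bidPrice0;
        transient int scratch;
    }

    public static void main(String[] args) {
        for (Field field : serializableFields(Quote.class)) {
            System.out.println(field.getType() + " " + field.getName());
        }
    }
}
```

The second and subsequent lookups for the same class return the already computed list, which is the step the text above describes as being eliminated.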


Here is an example of a class using default serialization:


public final class DefaultMarketData extends MarketData {}


As can be seen, the class does not add anything over its base class and so it will use default serialization as transitively provided by SelfDescribingMarshallable.

Explicit Serialization

Classes implementing Serializable can elect to implement two magic private (sic!) methods whereby these methods will be invoked instead of resorting to default serialization.


This provides full control of the serialization process and allows fields to be read using custom code rather than via reflection which will improve performance. A drawback with this method is that if a field is added to the class, then the corresponding logic must be added in the two magic methods above or else the new field will not participate in serialization. Another problem is that private methods are invoked by external classes. This is a fundamental violation of encapsulation.
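For plain JDK serialization, the two magic methods are the private writeObject() and readObject(). The sketch below shows how they might look for a cut-down, hypothetical Quote class (the class and the roundTrip() helper are assumptions made for this example). The fields are declared transient so that only the hand-written code handles them.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class ExplicitJavaSerializationSketch {

    // Hypothetical, cut-down DTO with hand-written serialization logic.
    static final class Quote implements Serializable {
        private static final long serialVersionUID = 1L;

        // transient: excluded from default serialization, written by hand below
        transient long securityId;
        transient double bidPrice0;

        // Invoked by ObjectOutputStream even though it is private.
        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject(); // writes no fields here, but keeps the stream well-formed
            out.writeLong(securityId);
            out.writeDouble(bidPrice0);
        }

        // Must read the fields in exactly the order they were written.
        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            securityId = in.readLong();
            bidPrice0 = in.readDouble();
        }
    }

    // Serializes and deserializes a Quote in memory.
    static Quote roundTrip(Quote quote) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(quote);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
            return (Quote) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Quote quote = new Quote();
        quote.securityId = 42L;
        quote.bidPrice0 = 101.25;
        Quote copy = roundTrip(quote);
        System.out.println(copy.securityId + " " + copy.bidPrice0);
    }
}
```

If a third field were added to Quote without touching the two methods, it would silently be left out of the stream, which is the maintenance drawback described above.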


SelfDescribingMarshallable classes work in a similar fashion but thankfully they do not rely on magic methods or on private methods being invoked externally. A SelfDescribingMarshallable class provides two fundamentally different ways of serializing: one via the intermediary open-source Chronicle Wire (which can be binary, text, YAML, JSON, etc.), providing flexibility, and one that is implicitly binary, providing high performance. We will take a closer look at the latter in the sections below.


Here is an example of a class using explicit serialization whereby the public methods of the implemented interfaces are explicitly declared:


import net.openhft.chronicle.bytes.BytesIn;

import net.openhft.chronicle.bytes.BytesOut;


public final class ExplicitMarketData extends MarketData {

    @Override

    public void readMarshallable(BytesIn bytes) {

        securityId = bytes.readLong();

        time = bytes.readLong();

        bidQty0 = bytes.readDouble();

        bidQty1 = bytes.readDouble();

        bidQty2 = bytes.readDouble();

        bidQty3 = bytes.readDouble();

        askQty0 = bytes.readDouble();

        askQty1 = bytes.readDouble();

        askQty2 = bytes.readDouble();

        askQty3 = bytes.readDouble();

        bidPrice0 = bytes.readDouble();

        bidPrice1 = bytes.readDouble();

        bidPrice2 = bytes.readDouble();

        bidPrice3 = bytes.readDouble();

        askPrice0 = bytes.readDouble();

        askPrice1 = bytes.readDouble();

        askPrice2 = bytes.readDouble();

        askPrice3 = bytes.readDouble();

    }


    @Override

    public void writeMarshallable(BytesOut bytes) {

        bytes.writeLong(securityId);

        bytes.writeLong(time);

        bytes.writeDouble(bidQty0);

        bytes.writeDouble(bidQty1);

        bytes.writeDouble(bidQty2);

        bytes.writeDouble(bidQty3);

        bytes.writeDouble(askQty0);

        bytes.writeDouble(askQty1);

        bytes.writeDouble(askQty2);

        bytes.writeDouble(askQty3);

        bytes.writeDouble(bidPrice0);

        bytes.writeDouble(bidPrice1);

        bytes.writeDouble(bidPrice2);

        bytes.writeDouble(bidPrice3);

        bytes.writeDouble(askPrice0);

        bytes.writeDouble(askPrice1);

        bytes.writeDouble(askPrice2);

        bytes.writeDouble(askPrice3);

    }

}


It can be concluded that this scheme relies on reading or writing each field explicitly and directly, eliminating the need to resort to slower reflection. Care must be taken to ensure fields are referenced in the same order in both methods, and any field added to the class must also be added to both methods above.

Trivially Copyable Serialization

The concept of Trivially Copyable Java Objects is derived from and inspired by C++. 


As can be seen, the MarketData class above contains only primitive fields. In other words, there are no reference fields like String, List or the like. This means that when the JVM lays out the fields in memory, field values can be put adjacent to one another. The way fields are laid out is not specified in the Java standard which allows for individual JVM implementation optimizations. 


Many JVMs will sort primitive class fields in descending field size order and lay them out in succession. This has the advantage that read and write operations can be performed on even primitive type boundaries. Applying this scheme to ExplicitMarketData, for example, means that the 64-bit long and double fields (securityId, time and the sixteen quantity and price fields) are laid out first and, assuming the initial field space is 64-bit aligned, each of them can be accessed on an even 64-bit boundary. Had the class also declared 32-bit fields such as int or float, these might be laid out next, allowing them to be accessed on an even 32-bit boundary.


Imagine instead that a byte field were laid out first; subsequent larger fields would then have to be accessed on uneven field boundaries. This would add a performance overhead for some operations, and would indeed prevent a small set of operations from being performed at all (e.g. unaligned CAS operations on the ARM architecture).
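The layout a particular JVM actually chooses can be inspected. The sketch below is an exploratory aid only, assuming a HotSpot-style JVM where sun.misc.Unsafe is reachable via reflection; objectFieldOffset() is deprecated in recent JDKs and may print warnings. The Sample class and the offsets() helper are hypothetical, and the offsets printed will differ between JVMs and settings.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Map;
import java.util.TreeMap;
import sun.misc.Unsafe;

final class FieldLayoutSketch {

    // Hypothetical class that deliberately declares a byte field first.
    static final class Sample {
        byte flag;
        long time;
        int id;
        double price;
    }

    // Returns the instance fields of a class keyed and sorted by memory offset.
    static Map<Long, String> offsets(Class<?> type) throws ReflectiveOperationException {
        // Assumption: the JVM exposes the Unsafe singleton via the "theUnsafe" field.
        Field theUnsafe = Unsafe.class.getDeclaredField("theUnsafe");
        theUnsafe.setAccessible(true);
        Unsafe unsafe = (Unsafe) theUnsafe.get(null);

        Map<Long, String> result = new TreeMap<>();
        for (Field field : type.getDeclaredFields()) {
            if (!Modifier.isStatic(field.getModifiers())) {
                result.put(unsafe.objectFieldOffset(field),
                        field.getType().getName() + " " + field.getName());
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Prints one "offset -> field" line per instance field, lowest offset first.
        offsets(Sample.class).forEach((offset, name) -> System.out.println(offset + " -> " + name));
    }
}
```

On many JVMs the declaration order (byte first) is not the order that comes out, which is the reordering described above.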


How is this relevant to high-performance serialization? Well, as it turns out, it is possible to access an object’s field memory region directly via Unsafe and use memcpy to directly copy the fields in one single sweep to memory or to a memory-mapped file. This effectively bypasses individual field access and replaces, in the example above, the many individual field accesses with a single bulk operation. 


The way this can be done in a correct, convenient, reasonably portable and safe way is outside the scope of this article. Luckily, this feature is readily available in Chronicle Queue, open-source Chronicle Bytes and other similar products out-of-the-box.


Here is an example of a class using trivially copyable serialization:


import net.openhft.chronicle.bytes.BytesIn;

import net.openhft.chronicle.bytes.BytesOut;


import static net.openhft.chronicle.bytes.BytesUtil.*;


public final class TriviallyCopyableMarketData extends MarketData {


    static final int START = 

            triviallyCopyableStart(TriviallyCopyableMarketData.class);

    

    static final int LENGTH = 

            triviallyCopyableLength(TriviallyCopyableMarketData.class);


    @Override

    public void readMarshallable(BytesIn bytes) {

        bytes.unsafeReadObject(this, START, LENGTH);

    }


    @Override

    public void writeMarshallable(BytesOut bytes) {

        bytes.unsafeWriteObject(this, START, LENGTH);

    }


}


This pattern lends itself well to scenarios where the DTO is reused. Fundamentally, it relies on invoking Unsafe under the covers for improved performance.

Benchmarks

Using JMH, serialization performance was assessed for the various serialization alternatives above using this class:


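import net.openhft.chronicle.bytes.Bytes;

import org.openjdk.jmh.annotations.*;


import static java.util.concurrent.TimeUnit.MILLISECONDS;

import static java.util.concurrent.TimeUnit.NANOSECONDS;


@State(Scope.Benchmark)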

@BenchmarkMode(Mode.AverageTime)

@OutputTimeUnit(NANOSECONDS)

@Fork(value = 1, warmups = 1)

@Warmup(iterations = 5, time = 200, timeUnit = MILLISECONDS)

@Measurement(iterations = 5, time = 500, timeUnit = MILLISECONDS)

public class BenchmarkRunner {


    private final MarketData defaultMarketData = new DefaultMarketData();

    private final MarketData explicitMarketData = new ExplicitMarketData();

    private final MarketData triviallyCopyableMarketData = new TriviallyCopyableMarketData();

    private final Bytes<Void> toBytes = Bytes.allocateElasticDirect();

    private final Bytes<Void> fromBytesDefault = Bytes.allocateElasticDirect();

    private final Bytes<Void> fromBytesExplicit = Bytes.allocateElasticDirect();

    private final Bytes<Void> fromBytesTriviallyCopyable = Bytes.allocateElasticDirect();


    public BenchmarkRunner() {

        defaultMarketData.writeMarshallable(fromBytesDefault);

        explicitMarketData.writeMarshallable(fromBytesExplicit);

        triviallyCopyableMarketData.writeMarshallable(fromBytesTriviallyCopyable);

    }


    public static void main(String[] args) throws Exception {

        org.openjdk.jmh.Main.main(args);

    }


    @Benchmark

    public void defaultWrite() {

        toBytes.writePosition(0);

        defaultMarketData.writeMarshallable(toBytes);

    }


    @Benchmark

    public void defaultRead() {

        fromBytesDefault.readPosition(0);

        defaultMarketData.readMarshallable(fromBytesDefault);

    }


    @Benchmark

    public void explicitWrite() {

        toBytes.writePosition(0);

        explicitMarketData.writeMarshallable(toBytes);

    }


    @Benchmark

    public void explicitRead() {

        fromBytesExplicit.readPosition(0);

        explicitMarketData.readMarshallable(fromBytesExplicit);

    }


    @Benchmark

    public void trivialWrite() {

        toBytes.writePosition(0);

        triviallyCopyableMarketData.writeMarshallable(toBytes);

    }


    @Benchmark

    public void trivialRead() {

        fromBytesTriviallyCopyable.readPosition(0);

        triviallyCopyableMarketData.readMarshallable(fromBytesTriviallyCopyable);

    }

}


This produced the following output on a MacBook Pro (16-inch, 2019) with 2.3 GHz 8-Core Intel Core i9 CPU under JDK 1.8.0_312, OpenJDK 64-Bit Server VM, 25.312-b07:


Benchmark                      Mode  Cnt   Score   Error  Units

BenchmarkRunner.defaultRead    avgt    5  88.772 ± 1.766  ns/op

BenchmarkRunner.defaultWrite   avgt    5  90.679 ± 2.923  ns/op

BenchmarkRunner.explicitRead   avgt    5  32.419 ± 2.673  ns/op

BenchmarkRunner.explicitWrite  avgt    5  38.048 ± 0.778  ns/op

BenchmarkRunner.trivialRead    avgt    5   7.437 ± 0.339  ns/op

BenchmarkRunner.trivialWrite   avgt    5   7.911 ± 0.431  ns/op


Using the various MarketData variants, explicit serialization is more than two times faster than default serialization. Trivially copyable serialization is more than four times faster than explicit serialization and more than ten times faster than default serialization, as illustrated in the graph below (lower is better):


More fields generally favour trivially copyable serialization over explicit serialization. Experience shows break-even is reached at around six fields in many cases. 
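The speed-up factors quoted above can be re-derived from the JMH scores in the table. The small sketch below does only that arithmetic; the class and method names are hypothetical and the numbers are copied from the benchmark output.

```java
final class SpeedupSketch {

    // How many times faster the candidate is than the baseline (average ns/op).
    static double speedup(double baselineNs, double candidateNs) {
        return baselineNs / candidateNs;
    }

    public static void main(String[] args) {
        // Scores (ns/op) taken from the JMH table above.
        double defaultRead = 88.772, defaultWrite = 90.679;
        double explicitRead = 32.419, explicitWrite = 38.048;
        double trivialRead = 7.437, trivialWrite = 7.911;

        System.out.printf("explicit vs default: read %.1fx, write %.1fx%n",
                speedup(defaultRead, explicitRead), speedup(defaultWrite, explicitWrite));
        System.out.printf("trivial vs explicit: read %.1fx, write %.1fx%n",
                speedup(explicitRead, trivialRead), speedup(explicitWrite, trivialWrite));
        System.out.printf("trivial vs default:  read %.1fx, write %.1fx%n",
                speedup(defaultRead, trivialRead), speedup(defaultWrite, trivialWrite));
    }
}
```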


Interestingly, the concept of trivially copyable can be extended to hold data normally stored in reference fields such as a String or an array field. This will provide an even greater relative performance increase for such classes. Contact the Chronicle team if you want to learn more.

Why Does it Matter?

Serialization is a fundamental feature of externalizing DTOs to persistent queues, sending them over the wire or putting them in an off-heap Map and otherwise handling DTOs outside the Java heap. Such data-intensive applications will almost always gain performance and experience reduced latencies when the underlying serialization performance is improved.

Resources

Chronicle Queue (open-source)

GitHub Chronicle Bytes (open-source)


Wednesday, January 12, 2022

How the Java Language Could Better Support Composition and Delegation

 



This article outlines a way of improving the Java language to better support composition and delegation. Engage in the discussion and contribute to evolving the Java Language.


The Java language lacks explicit semantic support for composition and delegation. This makes delegating classes hard to write, error-prone, hard to read and maintain. For example, delegating a JDBC ResultSet interface entails writing more than 190 delegating methods that essentially provide no additional information, as illustrated at the end of this article, and only add ceremony.


More generally, in the case of composition, Σ m(i) delegating methods need to be written where m(i) is the number of methods for delegate i (provided that all delegate method signatures are disjoint across all the delegates).


The concept of language support for delegation is not new and there are numerous articles on the subject, including [Bettini08] and [Kabanov11]. Many other programming languages like Kotlin (“by”) and Scala (“export”) have language support for delegation.

In one of my previous articles, “Why General Inheritance is Flawed and How to Finally Fix it”, I described why composition and delegation are so important.


External Tools

Many IDEs have support for generating delegated methods. However, this makes a delegating class neither easier to read nor easier to understand. Studies show that code is generally read more often than it is written. There are third-party libraries that provide delegation (e.g. Lombok) but these are non-standard and come with a number of other drawbacks.


More generally, it would be possible to implement a subset of the functionality proposed here in third-party libraries leveraging annotation processors and/or dynamic proxies.
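As an illustration of the dynamic proxy route, the following sketch forwards every interface method to a delegate using java.lang.reflect.Proxy. It is a minimal example rather than a full substitute for the proposal; the class name ProxyDelegationSketch and the helper delegating() are hypothetical.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;

final class ProxyDelegationSketch {

    // Creates an object that implements 'type' and forwards every call to 'delegate'.
    static <T> T delegating(Class<T> type, T delegate) {
        Object proxy = Proxy.newProxyInstance(
                type.getClassLoader(),
                new Class<?>[]{type},
                (self, method, args) -> {
                    try {
                        return method.invoke(delegate, args);
                    } catch (InvocationTargetException e) {
                        // Rethrow the delegate's own exception, not the reflective wrapper.
                        throw e.getCause();
                    }
                });
        return type.cast(proxy);
    }

    public static void main(String[] args) {
        CharSequence wrapped = delegating(CharSequence.class, "I am a bit incognito.");
        System.out.println(wrapped.length() + " chars, is a String: " + (wrapped instanceof String));
    }
}
```

This removes the hand-written forwarding methods, but at the cost of reflective invocation on every call and with no compile-time view of what is delegated, which is part of why language-level support is argued for here.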


Trends and Industry Standards

As the drawbacks of inheritance have become more deeply understood, the trend has been to move towards composition instead. With the advent of the module system and generally stricter encapsulation policies, the need for semantic delegation support in the Java language has increased even more.


I think this is a feature that is best provided within the language itself and not via various third-party libraries. Delegation is a cornerstone of contemporary coding.


In essence, it should be much easier to “Favor composition over inheritance” as stated in the book “Effective Java” by Joshua Bloch [Bloch18, Item 18].


Java Record Classes

Many of the problems identified above were also true for data classes before record classes were introduced in Java 14 (as a preview feature, finalized in Java 16). Upon more thorough analysis, there might be a substantial opportunity to harvest many of the findings made during the development of records and apply these in the field of delegation and composition.


On the Proposal

My intention with this article is not to present a concrete proposal of a way to introduce semantic support for composition and delegation in Java. On the contrary, if this proposal is one of the often 10-15 different discarded initial proposals and sketches on the path that needs to be traversed before a real feature can be proposed in the Java language, it will be a huge success. The way towards semantic support for composition and delegation in Java is likely paved with a number of research papers, several design proposals, incubation, etc. This feature will also compete against other features, potentially deemed to be more important to the Java ecosystem as a whole.


One motto for records was “model data as data” and I think that we should also “model delegation as delegation”.  But what is delegation? There are likely different views on this within the community. 


When I think of delegation, a delegating class with the following properties springs to mind:


  1. Has one or more delegates

  2. Delegates methods from its delegates

  3. Encapsulates its delegates completely

  4. Implements and/or uses methods from its delegates (arguably)



An Outline - The Emissary

In the following, I will present an outline to tackle the problem. In order to de-bikeshed the discussion, I will introduce a new keyword placeholder called “emissary” which is very unlikely ever to be used in a real implementation. This word could later be replaced by “delegator” or any other descriptive word suitable for the purpose or perhaps even an existing keyword.


 An emissary class has many similarities to a record class and can be used as shown in the example below:


public emissary Bazz(Foo foo, Bar bar);


As can be seen, the Bazz class has two delegates (Foo and Bar) and consequently an equivalent desugared class  is created having two private final fields:


private final Foo foo;

private final Bar bar;


An emissary class is also provided with a constructor. This process could be the same as for records with canonical and compact constructors:


public final class Bazz {


    private final Foo foo;

    private final Bar bar;


    public Bazz(Foo foo, Bar bar) {

       this.foo = foo;

       this.bar = bar;

    }


}


It also makes the emissary class implement Foo and Bar. Because of this, Foo and Bar must be interfaces and not abstract or concrete classes. (In a variant of the current proposal, the implementing interfaces could be explicitly declared).


public final class Bazz implements Foo, Bar {

    private final Foo foo;

    private final Bar bar;


   public Bazz(Foo foo, Bar bar) {

       this.foo = foo;

       this.bar = bar;

   }


}


Now, in order to continue the discussion, we need to describe the example interfaces Foo and Bar a bit more, which is done below:


public interface Foo {


    void f();


}


public interface Bar {


    void b();


}


By declaring an emissary class we, unsurprisingly, also get the actual delegation methods so that Bazz will actually implement its interfaces Foo and Bar:


public final class Bazz implements Foo, Bar {


    private final Foo foo;

    private final Bar bar;


    public Bazz(Foo foo, Bar bar) {

        this.foo = foo;

        this.bar = bar;

    }


    @Override

    public void f() {

        foo.f();

    }


    @Override

    public void b() {

        bar.b();

    }


}


If the delegates contain methods with the same signature, these would have to be explicitly disambiguated, for example in the same way as default methods in interfaces. Hence, if Foo and Bar both declare c(), then Bazz needs to explicitly declare c() to provide reconciliation. One example of this is shown here where both delegates are invoked:


@Override

public void c() {

    foo.c();

    bar.c();

}
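Since emissary is not an existing Java keyword, the desugared form is what can be compiled today. The sketch below is a hand-written equivalent of what the proposal would generate, assuming Foo and Bar both also declare c(); the recording delegates and the demo() helper are hypothetical and exist only to make the call order observable.

```java
import java.util.ArrayList;
import java.util.List;

final class DelegationConflictSketch {

    interface Foo { void f(); void c(); }

    interface Bar { void b(); void c(); }

    // Hand-written equivalent of the proposed "emissary Bazz(Foo foo, Bar bar)".
    static final class Bazz implements Foo, Bar {
        private final Foo foo;
        private final Bar bar;

        Bazz(Foo foo, Bar bar) {
            this.foo = foo;
            this.bar = bar;
        }

        @Override public void f() { foo.f(); }

        @Override public void b() { bar.b(); }

        // c() is declared by both delegates, so it is reconciled by hand.
        @Override public void c() {
            foo.c();
            bar.c();
        }
    }

    // Invokes every method once and returns the order in which delegates were called.
    static List<String> demo() {
        List<String> log = new ArrayList<>();
        Foo foo = new Foo() {
            @Override public void f() { log.add("foo.f"); }
            @Override public void c() { log.add("foo.c"); }
        };
        Bar bar = new Bar() {
            @Override public void b() { log.add("bar.b"); }
            @Override public void c() { log.add("bar.c"); }
        };
        Bazz bazz = new Bazz(foo, bar);
        bazz.f();
        bazz.b();
        bazz.c();
        return log;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```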


Nothing prevents us from adding additional methods by hand, for example, to implement additional interfaces that the emissary class explicitly implements but that are not covered by any of the delegates.


It is also worth noting that the proposed emissary classes should not get hashCode(), equals() or toString() methods generated. If they did, they would violate property 3 (complete encapsulation) and leak information about their delegates. For the same reason, there should be no deconstructor for an emissary class, as this would bluntly break encapsulation. Emissary classes should not implement Serializable and the like by default.


An emissary class, just like a record class, is immutable (or at least unmodifiable and therefore shallowly immutable) and is hence thread-safe if all the delegates are.


Finally, an emissary class would extend java.lang.Emissary, a new proposed abstract class similar to java.lang.Enum and java.lang.Record.


Comparing Record with Emissary

Comparing the existing record and the proposed emissary classes yields some interesting facts:


record


  • Provides a generated hashCode() method

  • Provides a generated equals() method

  • Provides a generated toString() method

  • Provides component getters

  • Cannot declare instance fields other than the private final fields which correspond to components of the state description


emissary


  • Does not provide a generated hashCode() method

  • Does not provide a generated equals() method

  • Does not provide a generated  toString() method

  • Provides delegating methods

  • Implements delegates (in one variant)

  • Can declare additional final instance fields other than the private final fields which correspond to delegates


both 


  • A private final field for each component/delegate of the state description

  • A public constructor, whose signature is the same as the state/delegate description, that initializes each field from the corresponding argument; (canonical constructor and compact constructor)

  • Gives up the ability to decouple API from representation

  • Implicitly final, and cannot be abstract (ensuring immutability)

  • Cannot extend any other class (ensures immutability)

  • Extends a java.lang class other than Object.

  • Can declare additional methods not covered by the properties/delegates


Anticipated Use Cases


Here are some use cases of the emissary class:


Composition


Providing an implementation for one or several interfaces using composition:


  public emissary FooAndBar(Foo foo, Bar bar);


Encapsulation


Encapsulating an existing instance of a class, hiding the details of the actual implementation:


  private emissary EncapsulatedResultSet(ResultSet resultSet);


  …


  ResultSet rs = stmt.executeQuery(query);


  return new EncapsulatedResultSet(rs);


Disallow down-casting


Disallow the down-casting of an instance. That is, an emissary class implements a restricted subset of its delegate’s methods where the non-exposed methods cannot be invoked via casting or reflection.


String implements CharSequence and in the example below, we provide a String viewed as a CharSequence whereby we cannot down-cast the CharSequence wrapper back to a String:


  private emissary AsCharSequence(CharSequence s);


  return new AsCharSequence("I am a bit incognito.");
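With today's Java, the same effect requires a hand-written wrapper. The sketch below shows roughly what the one-line emissary declaration would stand for; the helper isString() is hypothetical and only demonstrates that the down-cast is no longer possible. A toString() method is added here because the CharSequence contract expects it to return the characters, which is a design decision of this sketch and not part of the proposal.

```java
final class AsCharSequenceSketch {

    // Hand-written wrapper exposing only the CharSequence view of its delegate.
    static final class AsCharSequence implements CharSequence {
        private final CharSequence delegate;

        AsCharSequence(CharSequence delegate) {
            this.delegate = delegate;
        }

        @Override public int length() { return delegate.length(); }

        @Override public char charAt(int index) { return delegate.charAt(index); }

        // Wrap the result too, so that sub-sequences stay incognito as well.
        @Override public CharSequence subSequence(int start, int end) {
            return new AsCharSequence(delegate.subSequence(start, end));
        }

        @Override public String toString() { return delegate.toString(); }
    }

    // True if the given CharSequence could be down-cast to String.
    static boolean isString(CharSequence sequence) {
        return sequence instanceof String;
    }

    public static void main(String[] args) {
        CharSequence plain = "I am a bit incognito.";
        CharSequence wrapped = new AsCharSequence(plain);
        System.out.println("plain is a String:   " + isString(plain));
        System.out.println("wrapped is a String: " + isString(wrapped));
    }
}
```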


Services and Components


Providing an implementation of an interface that has an internal implementation. The internal component package is typically not exported in the module-info file:


  public emissary MyComponent(MyComponent comp) {


      public MyComponent() {

          this(new InternalMyComponentImpl());

      }


      // Optionally, we may want to hide the public 

      // constructor

      private MyComponent(MyComponent comp) {

         this.comp = comp;

      } 


  }


  MyComponent myComp = ServiceLoader.load(MyComponent.class)

                           .iterator()

                           .next();


Note: If InternalMyComponentImpl is composed of an internal base class, contains annotations, has non-public methods, has fields, etc., these will be completely hidden from direct discovery via reflection by the emissary class and, under JPMS, they will be completely protected from deep reflection.


Comparing Two ResultSet Delegators


Comparison between two classes delegating a ResultSet:


Emissary Class


// Using an emissary class. A one-liner

public emissary EncapsulatedResultSet(ResultSet resultSet);


IDE Generation


// Using automatic IDE delegation. About 1,000 lines!

public final class EncapsulatedResultSet implements ResultSet {


    private final ResultSet delegate;


    public EncapsulatedResultSet(ResultSet delegate) {

        this.delegate = delegate;

    }


    @Override

    public boolean next() throws SQLException {

        return delegate.next();

    }


  // About 1000 additional lines are not shown here for brevity…



Conclusions


We may conceptually reuse record classes for providing semantic composition and delegation support in the Java language. This would greatly reduce the language ceremony needed for these kinds of constructs and would very likely nudge developers towards using composition just like record classes nudged developers towards immutability. 


The scientific field of composition and delegation, and what is related to it, is much bigger than indicated in this article. Further studies are needed before arriving at a concrete proposal. Perhaps this is just a part of something bigger?


Language support for composition and delegation in some form would make Java an even better language in my opinion.


References

[Bettini08]

Bettini, Lorenzo. “Typesafe dynamic object delegation in class-based languages”, PPPJ '08: Proceedings of the 6th international symposium on Principles and practice of programming in Java, September 2008, Pages 171–180, https://doi.org/10.1145/1411732.1411756


[Kabanov11]

Kabanov, Jevgeni. “On designing safe and flexible embedded DSLs with Java 5”, Science of Computer Programming, Volume 76, Issue 11, November 2011 pp 970–991, https://doi.org/10.1016/j.scico.2010.04.005


[Bloch18]

Bloch, Joshua. “Effective Java”, Third Edition, 2018, ISBN 0-13-468599-7