Monday, December 15, 2014

Java 8, BYO Super Efficient Maps! Quadruple Your Map Performance

Background

Maps are used ubiquitously in almost all Java applications. There are several built-in implementations of the Map interface tailored for different purposes. All these Map implementations rely on calculating a hash value for the key, and using that hash value, we are able to put or get objects from a backing array. But sometimes we know more about the keys than that they are just Objects. In this post I will show how you can take advantage of that fact to write super-fast and super-efficient Map implementations. The implementation devised here is really nothing more than an offsetted array decorated with the Map interface methods. Since it does not have to deal with hashing, rehashing and hash collision, it is unsurprising that it is so fast. Its potential memory efficiency is however, something to elaborate on.

Performance

When comparing one of the Map implementation depicted in this post (namely the IntArrayMap) we will see that it is substantially more memory efficient than any other of the built-in Map implementation.

           HashMap   5470 MB     54 bytes per entry 
           TreeMap   5599 MB     55 bytes per entry
     LinkedHashMap   6271 MB     62 bytes per entry 
 ConcurrentHashMap   5471 MB     54 bytes per entry 
       IntArrayMap    400 MB      4 bytes per entry

(Bytes per entry is excluding size of value object itself.)

Applying the same comparative test but for get() method performance, we will also be able to conclude that the IntArrayMap is much faster than the other Maps.

           HashMap    44864 ms,   111 MTPS
           TreeMap   364277 ms,    13 MTPS
     LinkedHashMap    74616 ms,    67 MTPS
 ConcurrentHashMap    62002 ms,    80 MTPS
       IntArrayMap(*) 29635 ms,   168 MTPS
       IntArrayMap(**) 9520 ms,   525 MTPS

All tests are made using a JIT warmup of 5 000 000 000 iterations and then an additional 5 000 000 000 iterations for actual testing.

So the IntArrayMap is potentially ten times more memory efficient and more than four times quicker than any other built-in Map.

IntArrayMap(*) using the get(Object) method.
IntArrayMap(**) using the get(int) method.

Restrictions

In this post, the keys in the Map implementations must be an Object that extends Number, for example Integer, Long or Byte. The Object must not be able to hold any decimals (excluding the classes Double, Float, BigInteger and BigDecimal). Furthermore the key space must be known in advance. This might first appear as an unacceptable restriction and I agree that this certainly is the case for many applications. However, there are, more often than one might think, times when we know the key space in advance. For example, when dealing with Enums or other static mappings or smaller dynamic maps. A final restriction in the solutions provided in this post is that the key set must be within the range of Integer.

Overall Solution

Because the key space is know, the solutions can rely on direct mapping of the key onto an array that holds the values. So, when we create a Map, we also provide the minBound and maxBound index we want to use. If we, for example, create a Map with key space (10-14) we will create an array with 5 elements (maxBound-minBound+1) and when we get() from that map, we will take the key, subtract it with 10 (the minBound) and then get the value in that array position. The solution is related to the way EnumMap works.

Benefits


  • Much more memory efficient because the key is not stored in the Map. It is instead derived implicitly from the index.
  • Deterministic behavior since we do not have to re-hash when the Map is too small and needs to be grown.
  • Faster put() and get() operations because no hash code needs to be computed.
  • Reduced object creation rate (no boxing/un-boxing) because the put() and get() methods can work directly with primitive classes (e.g. int instead of Integer)


Drawbacks


  • Static key space that must be decided upon before map creation.
  • Keys must extend Number.
  • Less memory efficient for sparse maps (i.e. maps with few entries in the key space).
  • Key space must be within Integer range.
  • The Map implementations does not support null values.


Base Number Array Map

This is how the overall abstract base class NumberArrayMap looks like (yes, I know it it a big class. There are many things a Map has to do):

public abstract class NumberArrayMap<K extends Number, V> extends AbstractMap<K, V> implements Map<K, V>, Serializable {

    private static final long serialVersionUID = -2304239764179123L;

    private int minBound;
    private int maxBound;
    //
    private transient int size;
    private transient int modCount;
    private transient int capacity;
    private transient V[] values;

    private transient Set<Map.Entry<K, V>> entrySet;

    protected NumberArrayMap(int minBound, int maxBound) {
        if (minBound > maxBound) {
            throw new IllegalArgumentException("Error: minBound (" + minBound + ") must not be greater than maxBound (" + maxBound + ")");
        }
        this.minBound = minBound;
        this.maxBound = maxBound;
        init();
    }

    private void init() {
        initCapacity();
        clear();
    }

    @Override
    public int size() {
        return size;
    }

    @Override
    public boolean isEmpty() {
        return size == 0;
    }

    @Override
    public boolean containsKey(Object key) {
        if (key instanceof Number) {
            return containsKey(((Number) key).intValue());
        }
        return false;
    }

    public boolean containsKey(int key) {
        return isInRange(key) && values[index(key)] != null;
    }

    @Override
    public boolean containsValue(Object value) {
        for (final V v : values) {
            if (v != null && v.equals(value)) {
                return true;
            }
        }
        return false;
    }

    @Override
    public V get(Object key) {
        if ((key instanceof Number)) {
            return get(((Number) key).intValue());
        }
        return null;
    }

    public V get(int key) {
        if (isInRange(key)) {
            return values[index(key)];
        }
        return null;
    }

    @Override
    public V put(K key, V value) {
        return put(key.intValue(), value);
    }

    public V put(int key, V value) {
        assertInRange(key);
        Objects.requireNonNull(value, "This Map does not support null values");
        final V oldValue = get(key);
        if (oldValue == null) {
            // A new value, increment size
            size++;
        }
        values[index(key)] = value;
        modCount++;
        return oldValue;
    }

    @Override
    public V remove(Object key) {
        if (key instanceof Number) {
            return remove(((Number) key).intValue());
        }
        return null;
    }

    public V remove(int key) {
        final V oldValue = get(key);
        if (oldValue != null) {
            // An existing value, decrement size
            size--;
        }
        values[index(key)] = null;
        modCount++;
        return oldValue;
    }

    @Override
    public void putAll(Map<? extends K, ? extends V> m) {
        for (final Entry<? extends K, ? extends V> e : m.entrySet()) {
            put(e.getKey(), e.getValue());
        }
    }

    @Override
    public void clear() {
        @SuppressWarnings("unchecked")
        final V[] newValues = (V[]) new Object[capacity];
        this.values = newValues;
        size = 0;
        modCount++;
    }

    @Override
    public Set<Entry<K, V>> entrySet() {
        Set<Entry<K, V>> es;
        return (es = entrySet) == null ? (entrySet = new NumberArrayMap<K, V>.EntrySet()) : es;
    }

    final class EntrySet extends AbstractSet<Map.Entry<K, V>> {

        @Override
        public final int size() {
            return size;
        }

        @Override
        public final void clear() {
            NumberArrayMap.this.clear();
        }

        @Override
        public final Iterator<Map.Entry<K, V>> iterator() {
            return new NumberArrayMap<K, V>.EntryIterator();
        }

        @Override
        public final boolean contains(Object o) {
            if (!(o instanceof Map.Entry)) {
                return false;
            }
            final Map.Entry<?, ?> e = (Map.Entry<?, ?>) o;
            final Object key = e.getKey();
            final V valueCandidate = get(key);
            return valueCandidate != null && valueCandidate.equals(e.getValue());
        }

        @Override
        public final boolean remove(Object o) {
            if (o instanceof Map.Entry) {
                final Map.Entry<?, ?> e = (Map.Entry<?, ?>) o;
                final Object key = e.getKey();
                final V valueCandidate = get(key);
                if (valueCandidate != null && valueCandidate.equals(e.getValue())) {
                    return NumberArrayMap.this.remove(key) != null;
                }
            }
            return false;
        }

        @Override
        public final void forEach(Consumer<? super Map.Entry<K, V>> action) {
            Objects.requireNonNull(action);
            if (size > 0) {
                int mc = modCount;
                for (int i = minBound; i <= maxBound; ++i) {
                    final V value = get(i);
                    if (value != null) {
                        action.accept(makeEntry(makeKeyFromInt(i), value));
                    }
                    if (modCount != mc) {
                        throw new ConcurrentModificationException();
                    }
                }
            }
        }
    }

    private abstract class BucketIterator<N> implements Iterator<N> {

        private Entry<K, V> next;
        private Entry<K, V> current;
        private int expectedModCount;
        private int currentIndex;

        public BucketIterator() {
            expectedModCount = modCount;
            currentIndex = 0;
            if (size > 0) {
                // advance to first entry
                forwardToNext(minBound);
            }
        }

        @Override
        public final boolean hasNext() {
            return next != null;
        }

        public final Entry<K, V> nextEntry() {
            if (modCount != expectedModCount) {
                throw new ConcurrentModificationException();
            }
            if (next == null) {
                throw new NoSuchElementException();
            }
            current = next;
            forwardToNext(currentIndex + 1);
            return current;
        }

        @Override
        public final void remove() {
            final Entry<K, V> p = current;
            if (p == null) {
                throw new IllegalStateException();
            }
            if (modCount != expectedModCount) {
                throw new ConcurrentModificationException();
            }
            current = null;
            final K key = p.getKey();
            NumberArrayMap.this.remove(key);
            expectedModCount = modCount;
        }

        private void forwardToNext(int start) {
            if (start <= maxBound) {
                for (int i = start; i <= maxBound; ++i) {
                    final V value = get(i);
                    if (value != null) {
                        currentIndex = i;
                        next = makeEntry(makeKeyFromInt(i), value);
                        return;
                    }
                }
            }
            // We have reached the end...
            next = null;
        }

    }

    final class KeyIterator extends BucketIterator<K> {

        @Override
        public final K next() {
            return nextEntry().getKey();
        }
    }

    final class ValueIterator extends BucketIterator<V> {

        @Override
        public final V next() {
            return nextEntry().getValue();
        }
    }

    final class EntryIterator extends BucketIterator<Map.Entry<K, V>> {

        @Override
        public final Map.Entry<K, V> next() {
            return nextEntry();
        }
    }

    protected abstract K makeKeyFromInt(int k);

    private Entry<K, V> makeEntry(K key, V value) {
        return new AbstractMap.SimpleImmutableEntry<>(key, value);
    }

    private int index(int key) {
        return key - minBound;
    }

    private boolean isInRange(int key) {
        return (key >= minBound && key <= maxBound);
    }

    private void assertInRange(int key) {
        if (!isInRange(key)) {
            throw new ArrayIndexOutOfBoundsException("Key " + key + " out of range. Key shoud be >= " + minBound + " and <= " + maxBound);
        }
    }

    private void initCapacity() {
        this.capacity = maxBound - minBound + 1;
    }

    private void writeObject(java.io.ObjectOutputStream s) throws IOException {
        s.defaultWriteObject();
        s.writeInt(minBound);
        s.writeInt(maxBound);
        s.writeInt(size);
        for (final Entry<K, V> e : entrySet()) {
            final V value = e.getValue();
            if (value != null) {
                s.writeObject(e.getKey());
                s.writeObject(e.getValue());
            }
        }
    }

    private void readObject(java.io.ObjectInputStream s) throws IOException, ClassNotFoundException {
        s.defaultReadObject();
        minBound = s.readInt();
        maxBound = s.readInt();
        init();
        final int noItems = s.readInt();
        if (noItems < 0) {
            throw new InvalidObjectException("Illegal noItems count: " + noItems);
        }
        if (noItems > 0) {
            // Read the keys and values, and put the mappings in the NumberMap
            for (int i = 0; i < noItems; i++) {
                @SuppressWarnings("unchecked")
                final K key = (K) s.readObject();
                @SuppressWarnings("unchecked")
                final V value = (V) s.readObject();
                put(key, value);
            }
        }
    }

    @SuppressWarnings("unchecked")
    @Override
    public Object clone() throws CloneNotSupportedException {
        NumberArrayMap<K, V> result;
        try {
            result = (NumberArrayMap<K, V>) super.clone();
        } catch (CloneNotSupportedException e) {
            // this shouldn't happen, since we are Cloneable
            throw new InternalError(e);
        }
        result.init();
        result.putAll(this);
        return result;
    }


}


All concrete implementing classes that inherit from this class must implement the K makeKeyFromInt(int k) method in order to be able to reproduce a key for any given index. This is due to the reason that we do not store the keys as the built-in Map implementations do. If, for example, the key type K is Byte, then we need to cast k to byte like this: return (byte)k;  We will see how that works in the next chapter.

It is worth noticing that the normal contains(), get(), remove() and put() methods all have corresponding methods that works directly with int as key. This is important form a performance perspective because that way, we can avoid unnecessary boxing/un-boxing of keys.

By letting our class inherit from AbstractMap we get a number of methods for free, including the hashCode(), equals() and toString() methods. We also elect to override some of the AbstractMap methods for performance reasons.



Sub Classes

Now that the abstract NumberArrayMap class is implemented, it is very easy to create concrete Map implementations for different Number classes such as Integer and Long.  Here is a list of the ones I have made:

public class ByteArrayMap<V> extends NumberArrayMap<Byte, V> {

    private static final long serialVersionUID = -2304239764179124L;
  
    public ByteArrayMap() {
        this(Byte.MIN_VALUE, Byte.MAX_VALUE);
    }

    public ByteArrayMap(byte minBound, byte maxBound) {
        super(minBound, maxBound);
    }

    @Override
    protected Byte makeKeyFromInt(int k) {
        return (byte) k;
    }

}

public class ShortArrayMap<V> extends NumberArrayMap<Short, V> {

    private static final long serialVersionUID = -2304239764179125L;
  
    public ShortArrayMap(short minBound, short maxBound) {
        super(minBound, maxBound);
    }

    @Override
    protected Short makeKeyFromInt(int k) {
        return (short)k;
    }

}


public class IntArrayMap<V> extends NumberArrayMap<Integer, V> {

    private static final long serialVersionUID = -2304239764179126L;
  
    public IntArrayMap(int minBound, int maxBound) {
        super(minBound, maxBound);
    }

    @Override
    protected Integer makeKeyFromInt(int k) {
        return k;
    }

}


public class LongArrayMap<V> extends NumberArrayMap<Long, V> {

    private static final long serialVersionUID = -2304239764179127L;

    public LongArrayMap(int minBound, int maxBound) {
        super(minBound, maxBound);
    }

    @Override
    protected Long makeKeyFromInt(int k) {
        return (long) k;
    }

}



// This class will erroneously map Numbers with equal integers (e.g. 1.2 and 1.3) to the same key
// The same goes for Double and Float, so use only integer Number implementations
@Depricated

public class BigDecimalArrayMap<V> extends NumberArrayMap<BigDecimal, V> {

    private static final long serialVersionUID = -2304239764179128L;

    public BigDecimalArrayMap(int minBound, int maxBound) {
        super(minBound, maxBound);
    }

    @Override
    protected BigDecimal makeKeyFromInt(int k) {
        return new BigDecimal(k);
    }

}



Working with Enums

Often, Enums comes with additional properties of some kind. Consider the following Enum class:

public enum SomeEnum 

    ADAM(0, 4), BERT(1, 42), CHARLIE(2, 134), SOME_DUDE(10, 13);

    private SomeEnum(int value, int luckyNumber) {
        this.value = value;
        this.luckyNumber = luckyNumber;
    }

    private final int value;
    private final int luckyNumber;
    public int getValue() {         return value;     }
    public int getLuckyNumber() {         return luckyNumber;     } }


By implementing a Map support class, we can very easily implement lookup methods (or "finders" as they sometimes are called). Here is the support class:

public class IntEnumArrayMap<E extends Enum<?>> extends IntArrayMap<E> {

    private static final long serialVersionUID = -2304239764179128L;

    public static <E extends Enum<?>> IntEnumArrayMap<E> fromEnums(E[] enums, ToIntFunction<E> function) {
        final int min = Stream.of(enums).mapToInt(function).min().getAsInt();
        final int max = Stream.of(enums).mapToInt(function).max().getAsInt();
        final IntEnumArrayMap<E> result = new IntEnumArrayMap<>(min, max);
        Stream.of(enums).forEach((e) -> result.put(function.applyAsInt(e), e));
        return result;
    }

    public IntEnumArrayMap(int minBound, int maxBound) {
        super(minBound, maxBound);
    }
}

Now we can implement our finders like this:

public enum SomeEnum {

    ADAM(0, 4), BERT(1, 42), CHARLIE(2, 134), SOME_DUDE(10, 13);

    private SomeEnum(int value, int luckyNumber) {
        this.value = value;
        this.luckyNumber = luckyNumber;
    }

    private final int value;
    private final int luckyNumber;

    private final static IntEnumArrayMap<SomeEnum> valueMap =
             IntEnumArrayMap.fromEnums(values(), SomeEnum::getValue);
    private final static IntEnumArrayMap<SomeEnum> luckyNumberMap =
             IntEnumArrayMap.fromEnums(values(), SomeEnum::getLuckyNumber);

    public static SomeEnum findByValue(int value) {
        return valueMap.get(value);
    }

    public static SomeEnum findByLuckyNumber(int value) {
        return luckyNumberMap.get(value);
    }

    public int getValue() {
        return value;
    }

    public int getLuckyNumber() {
        return luckyNumber;
    }

}

Very convenient and efficient! We know the key set of the Map we are creating so the NumberArrayMap seams like a perfect fit. Note how nice we just provide the method we want to use for mapping (e.g. SomeEnum::getValue) using Java 8's new function interface.


Future work

The implementation above relies on the inherited basic Spliterator from AbstractMap. It would be advantages to write a better Spliterator that can parallelize the entrySet(). Any one up to the challenge?

Perhaps it would be possible to create other type of Map implementations that can handle other types of key sets (i.e other than Number) that can be mapped to int. Enum is one candidate (using the ordinal() as key) but there is already an EnumMap.

Grab the source code using "git clone https://github.com/minborg/javapot.git" and expand the code and also see how I made the performance tests.

Concusion

If you know your key space, you can create much more efficient Map implementations than the built-in Java ones. Enums are a perfect place where these Map implementations can live. Static maps and small maps that are created dynamically are also a perfect match.

Let's Map the World!

1 comment:

Note: Only a member of this blog may post a comment.