Wednesday, June 13, 2018

Infinite Sets in Java 9

A Set

A Set is a collection of elements whereby any given element in the Set only appears once. More formally, a set contains no pair of elements e1 and e2 such that e1.equals(e2). We can easily create Set in Java 9 like this:
and use a

final Set<Integer> s = Set.of(1, 2, 3);
System.out.println(s);

This might produce the following output:
[2, 3, 1]

The Set produced above is immutable, i.e. it cannot change and it is also finite because there is a distinct number of elements in the Set, namely three. The order in which the elements are returned via its read methods (such as stream(), iterator() and forEach()) is unspecified.

An Infinite Set

An infinite set contains an unlimited number of elements. One example of an infinite set is the set of all integers [..., -1, 0, 1, 2, ...] where an integer is not of a Java Integer class but an integer according to the mathematical definition of an integer whereby there is always a larger integer n+1 for any given integer n.

There are many infinite sets such as the set of all primes, the set of even integers, the set of fibonacci numbers etc.

For obvious reasons, we cannot precompute and store all the elements of an infinite Java Set. If we try, we would eventually run out of memory.

A fundamental question we have to ask ourselves is: Are there actually infinite sets for the Java types we have? If we have a Set<Byte> there are at most 256 elements in the Set and that is far from infinite, same reasoning goes for Short and even Integer. After all, there are only about four billion different Integer objects and if we would use a bit-map to represent membership, we could fit a Set<Integer> in just 0.5 GB. Albeit big, is not infinite.

But if we are talking about Long or String elements, we are approaching at least virtually infinite sets. To store a bitmap of all Longs would require a number of PB of internal storage. A true infinite Set would be a Set of String with all possible combination of characters [a-z] of any length.

Before we continue, I would like to mention that the code in this post is also available on GitHub as described at the very end of the post.

The ImmutableStreamSet

To move away from a paradigm where we store the elements of a Set, we could create an ImmutableStreamSet that defines the elements of the Set only through its stream() method. The ImmutableStreamSet could be defined as a FunctionalInterface like this:

@FunctionalInterface
public interface ImmutableStreamSet<E> extends Set<E> {

    // This is the only method we need to implements
    @Override
    public Stream<E> stream(); 

    @Override
    default int size() {
        return (int) stream().limit(Integer.MAX_VALUE).count();
    }

    @Override
    default boolean contains(Object o) {
        return stream().anyMatch(e -> Objects.equals(e, o));
    }

    @Override
    default boolean containsAll(Collection<?> c) {
        return (this == c) ? true : c.stream().allMatch(this::contains);
    }

    @Override
    default boolean isEmpty() {
        return !stream().findAny().isPresent();
    }

    @Override
    default <T> T[] toArray(T[] a) {
        return stream().collect(toList()).toArray(a);
    }

    @Override
    default Object[] toArray() {
        return stream().toArray();
    }

    @Override
    default Spliterator<E> spliterator() {
        return stream().spliterator();
    }

    @Override
    default Iterator<E> iterator() {
        return stream().iterator();
    }

    @Override
    default Stream<E> parallelStream() {
        return stream().parallel();
    }

    @Override
    default void forEach(Consumer<? super E> action) {
        stream().forEach(action);
    }

    // We are immutable
    @Override
    default boolean removeIf(Predicate<? super E> filter) {
        throw new UnsupportedOperationException();
    }

    @Override
    default void clear() {
        throw new UnsupportedOperationException();
    }

    @Override
    default boolean removeAll(Collection<?> c) {
        throw new UnsupportedOperationException();
    }

    @Override
    default boolean retainAll(Collection<?> c) {
        throw new UnsupportedOperationException();
    }

    @Override
    default boolean addAll(Collection<? extends E> c) {
        throw new UnsupportedOperationException();
    }

    @Override
    default boolean remove(Object o) {
        throw new UnsupportedOperationException();
    }

    @Override
    default boolean add(E e) {
        throw new UnsupportedOperationException();
    }

    static <E> ImmutableStreamSet<E> of(Supplier<Stream<E>> supplier) {
        // Check out GitHub to see how this Impl class is implemented
        return new ImmutableStreamSetImpl<>(supplier);
    }

}

Awesome, now we can create infinite sets by just providing a stream supplier like this:

    ImmutableStreamSet<Long> setOfAllLong
            = LongStream.rangeClosed(Long.MIN_VALUE, Long.MAX_VALUE)::boxed;

This will create a Set of all Long values (e.g. with 2^64 elements).  When providing a stream supplier, it is imperative to make sure to adhere to the Set property of element uniqueness.  Consider the following illegal Set:

final ImmutableStreamSet<Long> illegalSet = 
            () -> Stream.of(1l, 2l, 1l);
Clearly, 11 occurs two times in the set which makes this object violate the Set requirements.

As we will see, it would be better to create concrete classes of the infinite sets we are considering. One particular problem with the default implementation above is that the contains() method might be very slow. Read the next chapters and find out why and how to solve it.

PositiveLongSet

Let us assume that we want to create a Set with all the positive long values and that we want to be able to use the Set efficiently with other sets and objects. This is how we can go about:

public final class PositiveLongSet implements ImmutableStreamSet<Long> {

    public static final PositiveLongSet INSTANCE = new PositiveLongSet();

    private PositiveLongSet() {
    }

    @Override
    public Stream<Long> stream() {
        return LongStream.rangeClosed(1, Long.MAX_VALUE).boxed();
    }

    @Override
    public int size() {
        return Integer.MAX_VALUE;
    }

    @Override
    public boolean contains(Object o) {
        return SetUtil.contains(this, Long.class, other -> other > 0, o);
    }

    @Override
    public boolean isEmpty() {
        return false;
    }

    @Override
    public String toString() {
        return SetUtil.toString(this);
    }

}

Note how we comply with the formal requirement in the method size() where we return Integer.MAX_VALUE even though the Set is much larger. If Set had been defined today, it is likely that size() would have returned a long instead of an int. But in the beginning of the 90s, internal RAM was usually less than 1 GB. We are using two utility methods in the class:

The SetUtil.toString() takes a Set, iterates over the first eight elements and returns a String representation of those elements.

The SetUtil.contains() method takes a Set, the Element type class (here Long.class) and a Predicate that is called if the object we are comparing agains is of the given Element type class (if the object we are comparing against is null or of another type, then trivially the Set does not contain the object).

Here is how the SetUtil looks like:

final class SetUtil {

    private static final int TO_STRING_MAX_ELEMENTS = 8;

    static <E> String toString(Set<E> set) {
        final List<String> first = set.stream()
            .limit(TO_STRING_MAX_ELEMENTS + 1)
            .map(Object::toString)
            .collect(toList());

        final String endMarker = first.size() > TO_STRING_MAX_ELEMENTS ? ", ...]" : "]";

        return first.stream()
            .limit(TO_STRING_MAX_ELEMENTS)
            .collect(
                joining(", ", "[", endMarker)
            );
    }

    static <E> boolean contains(
        final Set<E> set,
        final Class<E> clazz,
        final Predicate<E> predicate,
        final Object o
    ) {
        if (o == null) {
            return false;
        }
        if (!(clazz.isAssignableFrom(o.getClass()))) {
            return false;
        }
        final E other = clazz.cast(o);
        return predicate.test(other);
    }

}

Armed with the classes ImmutableStreamSet and SetUtil we can now easily create other infinite sets like PostitiveEvenLongSet (not shown hereunder, try writing it by yourself), PrimeLongSet (containing all primes that can be represented by a Long) and even FibonacciLongSet (containing all fibonacci numbers that can be represented by a Long). Here is how these classes may look like:

PrimeLongSet


public final class PrimeLongSet implements ImmutableStreamSet<Long> {

    public static final PrimeLongSet INSTANCE = new PrimeLongSet();

    private PrimeLongSet() {
    }

    private static final LongPredicate IS_PRIME =
        x -> LongStream.rangeClosed(2, (long) Math.sqrt(x)).allMatch(n -> x % n != 0);

    @Override
    public Stream<Long> stream() {
        return LongStream.rangeClosed(2, Long.MAX_VALUE)
            .filter(IS_PRIME)
            .boxed();
    }

    @Override
    public int size() {
        return Integer.MAX_VALUE; 
    }

    @Override
    public boolean contains(Object o) {
        return SetUtil.contains(this, Long.class, IS_PRIME::test, o);
    }

    @Override
    public boolean isEmpty() {
        return false;
    }

    @Override
    public String toString() {
        return SetUtil.toString(this);
    }

}

FibonacciLongSet


public final class FibonacciLongSet implements ImmutableStreamSet<Long> {

    public static final FibonacciLongSet INSTANCE = new FibonacciLongSet();

    private FibonacciLongSet() {
    }

    @Override
    public Stream<Long> stream() {
        return Stream.concat(
            Stream.of(0l),
            Stream.iterate(new Fibonacci(0, 1), Fibonacci::next)
                .mapToLong(Fibonacci::getAsLong)
                .takeWhile(fib -> fib > 0)
                .boxed()
        );
    }

    @Override
    public int size() {
        return 92;
    }

    @Override
    public boolean contains(Object o) {
        return SetUtil.contains(
            this,
            Long.class,
            other -> stream().anyMatch(fib -> Objects.equals(fib, other)),
            o
        );
    }

    @Override
    public boolean isEmpty() {
        return false;
    }

    @Override
    public String toString() {
        return SetUtil.toString(this);
    }

    private static class Fibonacci {

        final long beforeLast;
        final long last;

        public Fibonacci(long beforeLast, long last) {
            this.beforeLast = beforeLast;
            this.last = last;
        }

        public Fibonacci next() {
            return new Fibonacci(last, last + beforeLast);
        }

        public long getAsLong() {
            return beforeLast + last;
        }

    }

}

Note how we are using Stream::takeWhile to break the stream when long wraps around to a negative value. Arguably, we are "cheating" when we precompute and provide a size of 92 but otherwise size() would have been a bit slower.

Stitching it all up

By providing an interface with static providers to instances of these classes, we can encapsulate our predefined sets and make sure that there are only one instance of them in the JVM like this:
public interface Sets {

    static Set<Long> positiveLongSet() {
        return PositiveLongSet.INSTANCE;
    }

    static Set<Long> positiveEvenLongSet() {
        return PositiveEvenLongSet.INSTANCE;
    }

    static Set<Long> primeLongSet() {
        return PrimeLongSet.INSTANCE;
    }

    static Set<Long> fibonacciLongSet() {
        return FibonacciLongSet.INSTANCE;
    }

}

We could also encapsulate our code in a Java 9 module to make sure only the classes Sets and ImmutableStreamSet are visible by exposing them in the projects top-most package and putting all the other classes in a package named "internal" (which is not exposed). This is how our module-info.java could look like provided that the two exposed classes are in the com.speedment.infinite_sets and the implementation classes in a package like com.speedment.infinite_sets.internal:

module-info.java

module com.speedment.infinite_sets {
    exports com.speedment.infinite_sets;
}

Trying it Out

We can now create another module that is using our infinite sets by first declaring usage of our existing module like this:

module-info.java

module Infinite_sets_app {
    requires com.speedment.infinite_sets;
}


And then we have access to the exposed parts of the module. here is one way of trying out the infinite sets:

import static com.speedment.infinite_sets.Sets.*;

public class Main {

    public static void main(String[] args) {

        Stream.of(
            Set.of(1, 2, 3),
            positiveLongSet(),
            positiveEvenLongSet(),
            primeLongSet(),
            fibonacciLongSet()
        ).forEachOrdered(System.out::println);

        // This actually completes fast due to identity equality
        positiveLongSet().containsAll(positiveLongSet());

    }

}

This might produce the following output:


[3, 2, 1]
[1, 2, 3, 4, 5, 6, 7, 8, ...]
[2, 4, 6, 8, 10, 12, 14, 16, ...]
[2, 3, 5, 7, 11, 13, 17, 19, ...]
[0, 1, 2, 3, 5, 8, 13, 21, ...]


Engage on GitHub


The source code in this post is available on GitHub here.




Game, Set and Match...



You can read a Java Set Example and learn more about Set fundamentals.

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.