Minborg

Minborg
Minborg

Wednesday, December 10, 2014

Java 8, Initializing Maps in the Smartest Way

Background

Create maps
Illustration: Elis Minborg
When I was a kid, I was taught that, in the Swedish language, one should write out numbers using their text representation in the range from zero to twelve, otherwise one should use their number representation. For example, one should write "There are two birds and 21 flowers" (but in Swedish then of course).

In Java, we are likely to use Maps to perform such translations. The question in this post is: How does one initialize Maps in the best way? What are the alternatives and what are their pros and cons?


Objective

The objective in this post is to write a method that returns a Map with a translation from an Integer to a String. The Map shall be Unmodifiable, because we might want to reuse the Map in several places in our code and we want to be sure that is has not been tampered with. The Map shall contain the mapping pairs (or Entries as they are called in Java): 0->"zero", 1->"one", 2->"two", ... , 12->"twelve".

The Imperative Way

The straight forward solution is to declare a Map and then just put() entries into the Map. After that is done, we return an Unmodifiable view of the newly created map like this:
protected static Map<Integer, String> imperative() {
        final Map<Integer, String> numMap = new HashMap<>();
        numMap.put(0, "zero");
        numMap.put(1, "one");
        numMap.put(2, "two");
        numMap.put(3, "three");
        numMap.put(4, "four");
        numMap.put(5, "five");
        numMap.put(6, "six");
        numMap.put(7, "seven");
        numMap.put(8, "eight");
        numMap.put(9, "nine");
        numMap.put(10, "ten");
        numMap.put(11, "eleven");
        numMap.put(12, "twelve");
        return Collections.unmodifiableMap(numMap);
    }
In this solution, as opposed to the other solutions that are shown later in this post, we have to declare an intermediate result variable (numMap). We then have to reference this result variable over and over again for each put() operation , which is disturbing. There is a risk that this result variable can "leak", effectively rendering the Unmodifiable map modifiable, because the Collections.unmodifiableMap() method just provides a view of the provided Map. So if we retain a reference to the provided Map, we can still change it.

Double Brace Initialization

The Double Brace Initialization Idiom firsts appears as very appealing. In the beginning, I liked it and started to use it frequently in my code. This is how it works:
    @SuppressWarnings("serial")
    protected static Map<Integer, String> doubleBracket() {

        return Collections.unmodifiableMap(new HashMap<Integer, String>() {
            {
                put(0, "zero");
                put(1, "one");
                put(2, "two");
                put(3, "three");
                put(4, "four");
                put(5, "five");
                put(6, "six");
                put(7, "seven");
                put(8, "eight");
                put(9, "nine");
                put(10, "ten");
                put(11, "eleven");
                put(12, "twelve");
            }
        });
    }
There is actually no magic with it at all. The first pair of braces will create an anonymous class and the second pair of braces will create an instance initializer block that is run when the anonymous inner class is instantiated. In any class, you can have such brackets containing code that will be run when your create instances of your class (or when the class is used the first time, if you precede the brackets with the keyword static). This looks much better because now you do not have to create an intermediate result and you do not have to reference a variable each time you call put().

However, there are a number of snags with this scheme that is not immediately apparent. As previously stated, when you declare a class with double braces, an anonymous class will be created. Because the anonymous class is not static, it will also hold a reference (this$0) to its containing class (if defined within another class, which is the most normal case). This means that the containing instance (this$0) can not be garbage-collected as long as your Map is alive. A big concern according to my view.

Another disadvantage is that you must provide the types of the Map explicitly, because the compiler is not able to infer the types using the diamond (<>) operator. Also, the new anonymous class does not define a serialVersionUID variable, so we must suppress the resulting warning using the @SuppressWarnings("serial") annotation if you want it to compile nicely. You should really think twice before you use the Double Brace Idiom.

The Builder Pattern

This is one of my favorites and you can read more about this pattern in my previous posts "Creating Objects Using The Builder Pattern" and "The Interface Builder Pattern". The idea here is to create a Builder that has methods that returns the Builder itself again and again (like the StringBuilder does). This way the Builder can be called several time until the final build() method is called and the result is returned. I have created builders that can be used to create instance of Map and ConcurrentMap and placed them in a utility class called Maps:
public class Maps {

    private Maps() {
    }

    public static <K, V> MapBuilder<K, V> builder() {
        return builder(HashMap::new);
    }

    public static <K, V> MapBuilder<K, V> builder(Supplier<Map<K, V>> mapSupplier) {
        return new MapBuilder<>(mapSupplier.get());
    }

    public static <K, V> ConcurrentMapBuilder<K, V> concurrentBuilder() {
        return concurrentBuilder(ConcurrentHashMap::new);
    }

    public static <K, V> ConcurrentMapBuilder<K, V> concurrentBuilder(Supplier<ConcurrentMap<K, V>> mapSupplier) {
        return new ConcurrentMapBuilder<>(mapSupplier.get());
    }

    private static class BaseBuilder<M extends Map<K, V>, K, V> {

        protected final M map;

        public BaseBuilder(M map) {
            this.map = map;
        }

        public BaseBuilder<M, K, V> put(K key, V value) {
            map.put(key, value);
            return this;
        }

        public M build() {
            return map;
        }

    }

    public static class MapBuilder<K, V> extends BaseBuilder<Map<K, V>, K, V> {

        private boolean unmodifiable;

        public MapBuilder(Map<K, V> map) {
            super(map);
        }

        @Override
        public MapBuilder<K, V> put(K key, V value) {
            super.put(key, value);
            return this;
        }

        public MapBuilder<K, V> unmodifiable(boolean unmodifiable) {
            this.unmodifiable = unmodifiable;
            return this;
        }

        @Override
        public Map<K, V> build() {
            if (unmodifiable) {
                return Collections.unmodifiableMap(super.build());
            } else {
                return super.build();
            }
        }

    }

    public static class ConcurrentMapBuilder<K, V> extends BaseBuilder<ConcurrentMap<K, V>, K, V> {

        public ConcurrentMapBuilder(ConcurrentMap<K, V> map) {
            super(map);
        }

        @Override
        public ConcurrentMapBuilder<K, V> put(K key, V value) {
            super.put(key, value);
            return this;
        }

    }

}
For the sake of completeness, I have also included methods to create concurrent maps and also two builder()  methods that takes a map Provider, allowing any type of Map to be created using the Builder. Using the support methods, we can now create maps easily like this:

    protected static Map<Integer, String> builderPattern() {
        return Maps.<Integer, String>builder().
                put(0, "zero").
                put(1, "one").
                put(2, "two").
                put(3, "three").
                put(4, "four").
                put(5, "five").
                put(6, "six").
                put(7, "seven").
                put(8, "eight").
                put(9, "nine").
                put(10, "ten").
                put(11, "eleven").
                put(12, "twelve").
                unmodifiable(true).
                build();
    }

Although we must enter the generic types of the Map (i.e. <Integer, String>) it looks much better. We can avoid all the other disadvantages that the Double Brace Idiom had. Note the Builder method unmodifiable() that sets a flag, controlling if the build() method shall return a normal Map or an unmodifiable Map. Very convenient! Another advantage is that this pattern works with older Java versions and that no extra objects are created during map creation.

The Java 8 Way

Java 8 has a number of features that can be used to create and initialize maps. Take a look at this method:      

    protected static Map<Integer, String> stdJava8() {

        return Collections.unmodifiableMap(Stream.of(
                new SimpleEntry<>(0, "zero"),
                new SimpleEntry<>(1, "one"),
                new SimpleEntry<>(2, "two"),
                new SimpleEntry<>(3, "three"),
                new SimpleEntry<>(4, "four"),
                new SimpleEntry<>(5, "five"),
                new SimpleEntry<>(6, "six"),
                new SimpleEntry<>(7, "seven"),
                new SimpleEntry<>(8, "eight"),
                new SimpleEntry<>(9, "nine"),
                new SimpleEntry<>(10, "ten"),
                new SimpleEntry<>(11, "eleven"),
                new SimpleEntry<>(12, "twelve"))
                .collect(Collectors.toMap((e) -> e.getKey(), (e) -> e.getValue())));
    }

Here we create a Stream of map entries. At least two implementations of Entry already exists in the standard Java libraries and here I have used SimpleEntry. After the Stream is constructed, we collect all the entries and creates a Map by splitting up each Entry in a key and a value. It is important to include the diamond operator for each SimpleEntry or else the toMap() method will not be able to infer the types of the Entry.

As can be seen, many intermediate objects need to be created before the final Map can be constructed. This can be a disadvantage if many instances are created rapidly. One advantage with this method is that it is using a standard Java 8 stream pattern, so it is very easy to start out with this pattern and then modify it to do something else.

The Java 8 Way Simplified  

We can further simplify the process of creating maps the Java 8 way, by adding a small number of support methods to our Maps class like this:      

    public static <K, V> Map.Entry<K, V> entry(K key, V value) {
        return new AbstractMap.SimpleEntry<>(key, value);
    }

    public static <K, U> Collector<Map.Entry<K, U>, ?, Map<K, U>> entriesToMap() {
        return Collectors.toMap((e) -> e.getKey(), (e) -> e.getValue());
    }

    public static <K, U> Collector<Map.Entry<K, U>, ?, ConcurrentMap<K, U>> entriesToConcurrentMap() {
        return Collectors.toConcurrentMap((e) -> e.getKey(), (e) -> e.getValue());
    }



 Now we can create the desired Map using this method:  

    protected static Map<Integer, String> extJava8() {
        return Collections.unmodifiableMap(Stream.of(
                entry(0, "zero"),
                entry(1, "one"),
                entry(2, "two"),
                entry(3, "three"),
                entry(4, "four"),
                entry(5, "five"),
                entry(6, "six"),
                entry(7, "seven"),
                entry(8, "eight"),
                entry(9, "nine"),
                entry(10, "ten"),
                entry(11, "eleven"),
                entry(12, "twelve")).
                collect(entriesToMap()));
    }

Do not forget to import the support methods statically. Now, this looks much nicer than the previous attempt because is is much shorter and does not contain so much boiler plate code.

Conclusion

There are many ways to create and initialize a Map. The choice is really up to you. Personally, I like the Builder Pattern best because it is very efficient, flexible and straight forward to use. The Java 8 pattern is also nice. A final warning on using the Double Brace Idiom should also be made.

How do we do the actual translation of text using the Map we have created using any of the various methods in this post? One nice way of doing it is to use Java 8's Map method getOrDefault() like this:

     public static String toText(Map<Integer, String> map, int val) {
         return map.getOrDefault(val, Integer.toString(val));
     }
    
     final Map<Integer, String> map = imperative();
     System.out.println("There are " + toText(map, 2) + " birds and " + toText(map, 21) + " flowers.");


   "There are two birds and 21 flowers."

UPDATE: Learn how you can implement a type safe dynamic Map builder in this newer post.

22 comments:

  1. Nice post! Just to expand on the double-braced pattern a bit, for the perf-wary:

    Because the double-braced pattern makes N subclasses of, in this case, HashMap, it becomes more difficult for JITs to make fast code. For example, imagine a function that takes an argument of type Map, and performs a fair number of operations on said Map. If you always pass in a HashMap (not a subclass of HashMap), modern JITs will likely notice and make the code faster if you pass in an argument where arg.getClass() == HashMap.class*. OTOH, if you pass in N classes of different concrete types (all HashMaps, just brace initialized), your code becomes far more expensive for JITs to optimize, so it's less likely for them to do so. What does this mean for you? Calls to any Map function become linear time, instead of constant**.

    Naturally, there are issues of cache efficiency as well, because instead of having one vtable (I forget the exact Java terminology for them), you have N.

    Also, there are a few comments here (substantiated with benchmarks, though I'm not 100% confident in them):
    https://stackoverflow.com/questions/924285/efficiency-of-java-double-brace-initialization

    * - They can do this through generating a fast path or using PICs in places where interface dispatch would otherwise be used.
    ** - https://stackoverflow.com/questions/4423968/how-are-java-interfaces-implemented-internally-vtables

    Caveat: If it's possible for a JIT to recognize this pattern and *just* create a HashMap instance, then ignore everything I just said; you get more convenient syntax for free. However, I'm rusty with the Java lang spec, so I'm not sure that such an optimization would be allowed in the first place.

    ReplyDelete
  2. > Because the anonymous class is not static

    It actually *is* static in your case, because it is created inside of a static method, so there is no enclosing instance.

    ReplyDelete
  3. @SuppressWarnings("unchecked")
    public static HashMap newMap(Object... elems) {
    HashMap m = new HashMap();

    for (int i = 0; i < elems.length; i += 2) {
    m.put((K) elems[i], (V) elems[i + 1]);
    }

    return m;
    }

    Map m = newMap("k1", 1, "k2", 2);

    ReplyDelete
    Replies
    1. Unfortunately, this solution will not provide any type safe way of entering data into a Map and the Map itself is just a raw type. There is nothing that prevents or warns one from calling Map m = newMap("k1", new Thread(), 1, new Long(2)) for example.

      Delete
  4. Regarding the HotSpot performance, I’d expect the optimizer to detect that all these subclasses, created via double-brace initialization, do not override the invoked method(s), thus the optimizations are still possible. However, double brace initialization creates a lot of class files taking up disk-space, needing to be loaded, analyzed/verified and residing in the memory during the application’s lifetime. Not a justified trade-off for a minor source code improvement (/if/ you consider code with four additional curly braces an improvement over the ordinary use of a local variable)…

    ReplyDelete
  5. I have used com.google.common.collect.ImmutableMap. Where does that fit in here?

    ReplyDelete
    Replies
    1. The built in Collections.unmodifiableMap(java.util.Map) is a *view* of an underlying map which can still change under the hood (if you retain a reference to it) whereas Google's ImmutableMap has its own backing implementation that will never change. So the latter is truly immutable. You need, of course, to include Guava to be able to use it.

      Delete
  6. [...]
    ImmutableMap.of(
    OPIAttributeNames.EXTERNAL_ID_SUBSCRIPTION_PRODUCT, "Helios",
    OPIAttributeNames.EXTERNAL_ID_SUBSCRIPTION_TYPE, "continuous",
    OPIAttributeNames.EXTERNAL_ID_SUBSCRIPTION_EXTERNAL_REFERENCE, "SUBEXTREF_NothingNothing")
    [...]

    Cheers, Girish

    ReplyDelete
  7. C'mon Java, just give us a language construct! This is the 21st century for goodness sake.

    ReplyDelete
    Replies
    1. Java 9 will have that. You could do like Map.of(1,"Alpha",2,"Beta");

      Delete
  8. In Groovy , Just type:

    def simpleMap = [key1: value1, key2: value2]

    And that's all!
    Groovy rocks! \m/

    ReplyDelete
  9. Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue)

    ReplyDelete
  10. Another disadvantage is that you must provide the types of the Map implicitly

    should be

    Another disadvantage is that you must provide the types of the Map **explicitly**

    ReplyDelete
  11. You are right Andrew. I have updated the post accordingly. Thank you!

    ReplyDelete
  12. And the good old simple way:

    Object[] map = new Object[]{
        new Object[]{"key", "value"},
        new Object[]{"key", "value"},
        new Object[]{"key", "value"},
    };

    ReplyDelete
  13. Guava has a builder as well: http://stackoverflow.com/a/17374339/435605

    ReplyDelete
  14. Map data = Arrays.stream(
    new Object[][]{
    {"key1", "1"},
    {"key2", 2},
    {"key3",new LinkedList()}}
    ).collect(Collectors.toMap(e -> (String) e[0], e -> e[1]));

    ReplyDelete
  15. Why does this does not compile?!

    class A {}

    final A a = new A();
    Map map = Stream.of(entry("A", a)).collect(entriesToMap());

    But it compiles with casting the entry value to Object like this:

    Map map = Stream.of(entry("A", (Object)a)).collect(entriesToMap());

    ReplyDelete
    Replies
    1. I am unable to reproduce the error you describe. This woks for me:


      public class NewMain {

      public static <K, V> Map.Entry<K, V> entry(K key, V value) {
      return new AbstractMap.SimpleEntry<>(key, value);
      }

      public static <K, U> Collector<Map.Entry<K, U>, ?, Map<K, U>> entriesToMap() {
      return Collectors.toMap((e) -> e.getKey(), (e) -> e.getValue());
      }

      public static <K, U> Collector<Map.Entry<K, U>, ?, ConcurrentMap<K, U>> entriesToConcurrentMap() {
      return Collectors.toConcurrentMap((e) -> e.getKey(), (e) -> e.getValue());
      }

      static class A {};

      /**
      * @param args the command line arguments
      */
      public static void main(String[] args) {
      final A a = new A();
      Map<String, A> map = Stream.of(entry("A", a)).collect(entriesToMap());
      }

      }


      Delete
    2. Thanks for your reply and I saw my mistake!
      It does not compile because my Map is a Map and therefore I have to cast the value to Object!

      Delete
    3. Follow up to my reply: my map uses key=String and value=Object (Map<String,Object>), and therefore I need the cast!

      Delete