Minborg

Minborg
Minborg

Wednesday, June 13, 2018

Go Full Stack with Java in a Jiffy

Here is a look at how you can write a full stack database web application without using SQL, HQL, PHP, ASP, HTML, CSS or Javascript and instead relying purely on Java using Vaadin’s UI layer and Speedment Stream ORM.

Ever wanted to quickly create a web application connected to your existing database or build a professional application with short time-to-market requirements? The Java Stream API has unleashed the possibility to write database queries in pure Java.

In this article, we will demonstrate how fast and easy this can be done by leveraging two Java frameworks; Vaadin and Speedment. Because they both use Java Streams, it easy to connect them together. This means we will end up with a short, concise and type-safe application.

For this mini-project, we will use the My SQL sample database named "Employees" which provides approximately 160MB of data spread over six separate tables and comprising 4 million records.

The full application code is available at GitHub and you can clone this repository if you want to run the application in your own environment. You will also need trial licenses from both Vaadin and Speedment to use the features used in this article. These are available for free.


The intended end result is a web application where it is possible to analyze gender balance and salary distribution among different departments. The result is displayed graphically, using pure standard Vaadin Charts Java components as depicted in the video below:




Setting Up the Data Model

We are using Speedment Stream ORM to access the database. It is easy to set up any project using the Speedment initializer. Speedment can generate Java classes directly from the database’s schema data. After generation, we can create our Speedment instance like this:

Speedment speedment = new EmployeesApplicationBuilder()
                .withUsername("...") // Username need to match database
                .withPassword("...") // Password need to match database
                .build();

Create a Dropdown for Departments

In our web application, we want to have a drop-down list of all departments. It is easy to retrieve the departments from the database as can be seen in this method:

public Stream<Departments> departments() {
    DepartmentsManager depts = speedment.getOrThrow(DepartmentsManager.class);
    return depts.stream();
}

Joining Departments and Employees Together

Now we are going to create a join relation between Departments and Employees. In the database, there is a many-to-many relation table that connects these tables together named DeptEmpl.
First, we create a custom tuple class that will hold our three entries from the joined tables:

public final class DeptEmplEmployeesSalaries {

    private final DeptEmp deptEmp;
    private final Employees employees;
    private final Salaries salaries;

    public DeptEmplEmployeesSalaries(
        DeptEmp deptEmp, 
        Employees employees, 
        Salaries salaries
    ) {
        this.deptEmp = requireNonNull(deptEmp);
        this.employees = requireNonNull(employees);
        this.salaries = requireNonNull(salaries);
    }

    public DeptEmp deptEmp() { return deptEmp; }
    
    public Employees employees() { return employees; }

    public Salaries salaries() { return salaries; }

    public static TupleGetter0 deptEmpGetter() {
            return DeptEmplEmployeesSalaries::deptEmp;
    }

    public static TupleGetter1 employeesGetter() {
            return DeptEmplEmployeesSalaries::employees;
    }

    public static TupleGetter2 salariesGetter() {
            return DeptEmplEmployeesSalaries::salaries;
    }

}
The DeptEmplEmployeesSalaries is simply an immutable holder of the three entities, except it has three additional “getter” methods that can be applied to extract the individual entities. Note that they return TupleGetter, which allows joins and aggregations to use optimized versions compared to just using an anonymous lambda or method reference.

Now that we have the custom tuple, we can easily define our Join relation:

   private Join joinDeptEmpSal(Departments dept) {
        // The JoinComponent is needed when creating joins
        JoinComponent jc = speedment.getOrThrow(JoinComponent.class);

        return jc.from(DeptEmpManager.IDENTIFIER)
                    // Only include data from the selected department
                    .where(DeptEmp.DEPT_NO.equal(dept.getDeptNo()))

                // Join in Employees with Employees.EMP_NO equal DeptEmp.EMP_NO
                .innerJoinOn(Employees.EMP_NO).equal(DeptEmp.EMP_NO)

                // Join Salaries with Salaries.EMP_NO) equal Employees.EMP_NO
                .innerJoinOn(Salaries.EMP_NO).equal(Employees.EMP_NO)
                      // Filter out historic salary data
                     .where(Salaries.TO_DATE.greaterOrEqual(currentDate))

                .build(DeptEmplEmployeesSalaries::new);
    }
When we are building our Join expression, we start off by first using the DeptEmp table (as we recall, this is the many-to-many relation table between Departments and Employees). For this table, we apply a where() statement so that we are able to filter out only those many-to-many relation that belongs to the department we want to appear in the join.

Next, we join in the Employees table and specify a join relation where newly joined table’s column Employees.EMP_NO equal DeptEmp.EMP_NO.

After that, we join in the Salaries table and specify another join relation where Salaries.EMP_NO equal Employees.EMP_NO. For this particular join relation, we also apply a where() statement so that we filter out salaries that are current (and not historic, past salaries for an employee).

Finally, we call the build() method and defines the constructor of our DeptEmplEmployeesSalaries class that holds the three entities DeptEmp, Employees, and Salaries.

Counting the Number of Employees for a Department

Armed with the join method above, it is very easy to count the number of Employees for a certain department in the Join stream. This is how we can go about:

public long countEmployees(Departments department) {
    return joinDeptEmpSal(department)
               .stream()
               .count();
}

Calculating a Salary Distribution Aggregation

By using the built-in Speedment Aggregator, we can express aggregations quite easily. The Aggregator can consume regular Java Collections, Java Streams from a single table as well as Join Streams without constructing intermediary Java objects on the heap. This is because it stores all its data structures completely off-heap.

We first start with creating a “result object” in the form of a simple POJO that is going to be used as a bridge between the completed off-heap aggregation and the Java heap world:

public class GenderIntervalFrequency {

    private Employees.Gender gender;
    private int interval;
    private long frequency;

    private void setGender(Employees.Gender gender) { this.gender = gender; }

    private void setInterval(int interval) { this.interval = interval; }

    private void setFrequency(long frequency) { this.frequency = frequency;}

    private Employees.Gender getGender() { return gender; }

    private int getInterval() { return interval; }
        
    private long getFrequency() { return frequency; }

}
Now that we have the POJO, we are able to build a method that returns an Aggregation like this:

public Aggregation freqAggregation(Departments dept) {

    Aggregator aggregator =

        // Provide a constructor for the "result object"
        Aggregator.builder(GenderIntervalFrequency::new)

            // Create a key on Gender
            .firstOn(DeptEmplEmployeesSalaries.employeesGetter())
            .andThen(Employees.GENDER)
            .key(GenderIntervalFrequency::setGender)

            // Create a key on salary divided by 1,000 as an integer
            .firstOn(DeptEmplEmployeesSalaries.salariesGetter())
            .andThen(Salaries.SALARY.divide(SALARY_BUCKET_SIZE).asInt())
            .key(GenderIntervalFrequency::setInterval)

            // For each unique set of keys, count the number of entitites
            .count(GenderIntervalFrequency::setFrequency)
            .build();


    return joinDeptEmpSal(dept)
        .stream()
        .parallel()
        .collect(aggregator.createCollector());

}
This requires a bit of explanation. When we invoke the Aggregator.builder() method, we provide a constructor of the “result object” that we are using as a bridge between the off-heap and the on-heap world.

After we have a builder, we can start defining our aggregation and usually the clearest way is to start off with the keys (i.e. groups) that we are going to use in the aggregation. When we are aggregating results for a Join operation, we first need to specify which entity we want to extract our key from. In this case, we want to use the employee’s gender so we invoke .firstOn(eptEmplEmployeesSalaries.employeesGetter()) which will extract the Employees entity from the tuple. Then we apply .andThen(Employees.GENDER) which, in turn, will extract the gender property from theEmployees entity. The key() method takes a method reference for a method that is going to be called once we want to actually read the result of the aggregation.

The second key is specified in much the same way, only here we apply the .firstOn(DeptEmplEmployeesSalaries.salariesGetter()) method to extract the Salaries entity instead of the Employees entity. When we then apply the .andThen() method we are using an expression to convert the salary so it is divided by 1,000 and seen as an integer. This will create separate income brackets for every thousand dollars in salary.

The count() operator simply says that we want to count the occurrence of each key pair. So, if there are two males that have an income in the 57 bracket (i.e. a salary between 57,000 and 57,999) the count operation will count those two for those keys.

Finally, in the line starting with return, the actual computation of the aggregation will take place whereby the application will aggregate all the thousands of salaries in parallel and return an Aggregation for all the income data in the database. An Aggregation can be thought of as a kind of List with all the keys and values, only that the data is stored off-heap.

Adding In-JVM-Memory Acceleration

By just adding two lines to our application, we can get a high-performance application with in-JVM-memory acceleration.

Speedment speedment = new EmployeesApplicationBuilder()
        .withUsername("...") // Username need to match database
        .withPassword("...") // Password need to match database
        .withBundle(InMemoryBundle.class) // Add in-JVM-acceleration
        .build();

        // Load a snapshot of the database into off-heap JVM-memoory   
        speedment.get(DataStoreComponent.class)
            .ifPresent(DataStoreComponent::load);

The InMemoryBundle allows the entire database to be pulled in to the JVM using off-heap memory and then allows Streams and Joins to be executed directly from RAM instead of using the database. This will improve performance and will make the Java application work more deterministically. Having data off-heap also means that data will not affect Java Garbage Collect allowing huge JVMs to be used with no GC impact.

Thanks to the In-memory acceleration, even the biggest department with over 60,000 salaries will be computed in less than 100 ms on my laptop. This will ensure that our UI stays responsive.

Building the UI in Java

Now that the data model is finished, we move on to the visual aspects of the application. This is as mentioned earlier done utilizing Vaadin, a framework which allows implementation of HTML5 web user interfaces using Java. The Vaadin framework is built on the notion of components, which could be a layout, a button or anything in between. The components are modeled as objects which can be customized and styled in an abundance of ways.

The image above describes the structure of the GUI we intend to build for our DataModel. It constitutes of nine components, out of which five read information from the database and present it to the user while the rest are static. Without further ado, let’s start configuring the UI.

A sketch showing the hierarchy of the components included in our GUI.


The Vaadin UI Layer

To integrate Vaadin in the application, we downloaded a starter pack from Vaadin to set up a simple project base. This will automatically generate a UI class which is the base of any Vaadin application.

@Theme("mytheme")
public class EmployeeUI extends UI {

    @Override // Called by the server when the application starts
    protected void init(VaadinRequest vaadinRequest) { }

    // Standard Vaadin servlet which was not modified 
    @WebServlet(urlPatterns = "/*", name = "MyUIServlet", asyncSupported = true)
    @VaadinServletConfiguration(ui = EmployeeUI.class, productionMode = false)
    public static class MyUIServlet extends VaadinServlet { }
}

The overridden init() is called from the server when the application is started, hence this is where we soon will state what actions are to be performed when the application is running. EmployeeUI also contains MyUIServlet, which is a standard servlet class used for deployment. No modification was needed for the sake of this application.

Creation of Components

As mentioned above, all of our components will be declared in init(). This is not suggested as a best practice but works well for an application with a small scope. Although, we would like to collectively update the majority of the components from a separate method when a new department is selected, meaning those will be declared as instance variables along the way.

Application Title

We start off simple by creating a Label for the title. Since its value will not change, it can be locally
declared.

Label appTitle = new Label("Employee Application");
appTitle.setStyleName("h2");

In addition to a value, we give it a style name. Style names allow full control of the appearance of the component. In this case, we use the built-in Vaadin Valo Theme and select a header styling simply by setting the parameter to “h2”. This style name can also be used to target the component with custom CSS (for example .h2 { font-family: ‘Times New Roman; }).

Text Fields

To view the number of employees and the average salary for the selected department, we use the TextField component. TextField is mainly used for user text input, although by setting it to read-only, we prohibit any user interaction. Notice how two style name can be used by separating them with a blank space.


noOfEmployees = new TextField("Number of employees"); // Instance variable
noOfEmployees.setReadOnly(true);
// Multiple style names are separated with a blank space 
noOfEmployees.setStyleName("huge borderless"); 

This code is duplicated for the averageSalary TextField although with a different caption and variable name.

Charts

Charts can easily be created with the Vaadin Charts addon, and just like any other component, a chart Java Object with corresponding properties. For this application, we used the COLUMN chart to view gender balance and an AREASPLINE for the salary distribution.



/* Column chart to view balance between female and male employees at a certain department */
genderChart = new Chart(ChartType.COLUMN);
Configuration genderChartConfig = genderChart.getConfiguration();
genderChartConfig.setTitle("Gender Balance");

// 0 is only used as an init value, chart is populated with data in updateUI() 
maleCount = new ListSeries("Male", 0);
femaleCount = new ListSeries("Female", 0);
genderChartConfig.setSeries(maleCount, femaleCount);

XAxis x1 = new XAxis();
x1.setCategories("Gender");
genderChartConfig.addxAxis(x1);

YAxis y1 = new YAxis();
y1.setTitle("Number of employees");
genderChartConfig.addyAxis(y1);
Most of the properties associated with a chart are controlled by its configuration which is retrieved with getConfiguration(). This is then used to add a chart title, two data series, and the axis properties. For the genderChart, a simple ListSeries was used to hold the data because of its simple nature. Although for the salaryChart below, a DataSeries was chosen since it handles a larger and more complicated data sets.

The declaration of the salaryChart is very similar to that of the genderChart. Likewise, the configuration is retrieved and used to add a title and axes.

salaryChart = new Chart(ChartType.AREASPLINE);
Since both charts display data for male and females we decide to use a shared legend that we fix in the upper right corner of the salaryChart.

/* Legend settings */
Legend legend = salaryChartConfig.getLegend();
legend.setLayout(LayoutDirection.VERTICAL);
legend.setAlign(HorizontalAlign.RIGHT);
legend.setVerticalAlign(VerticalAlign.TOP);
legend.setX(-50);
legend.setY(50);
legend.setFloating(true);
Lastly, we add two empty DataSeries which will be populated with data at a later stage.

// Instance variables to allow update from UpdateUI() 
maleSalaryData = new DataSeries("Male"); 
femaleSalaryData = new DataSeries("Female");
salaryChartConfig.setSeries(maleSalaryData, femaleSalaryData);

Department Selector

The final piece is the department selector which controls the rest of the application.



/* Native Select component to enable selection of Department */
NativeSelect<Departments> selectDepartment = new NativeSelect<>("Select department");
selectDepartment.setItems(DataModel.departments());
selectDepartment.setItemCaptionGenerator(Departments::getDeptName);
selectDepartment.setEmptySelectionAllowed(false);


We implement it as a NativeSelect<T> component that calls departments(), which was previously defined in DataModel, to retrieve a Stream of Departments from the database. Next, we specify what property of Department to display in the dropdown list (default is toString()).

Since we do not allow empty selections, we set the defaultDept to the first element of the Department Stream. Note that the defaultDept is stored as a variable for later use.

/* Default department to use when starting application */
final Departments defaultDept = DataModel.departments().findFirst().orElseThrow(NoSuchElementException::new);
selectDepartment.setSelectedItem(defaultDept);

Adding the Components to the UI

So far we have only declared the components without adding them to the actual canvas. To be displayed in the application they all need to be added to the UI. This is usually done by attaching them to a Layout. Layouts are used to create a structured hierarchy and can be nested into one and other.

HorizontalLayout contents = new HorizontalLayout();
contents.setSizeFull();

VerticalLayout menu = new VerticalLayout();
menu.setWidth(350, Unit.PIXELS);

VerticalLayout body = new VerticalLayout();
body.setSizeFull();
As revealed in the code above, three layouts were used for this purpose, one horizontal and two vertical. Once the layouts are defined we can add the components.

menu.addComponents(appTitle, selectDepartment, noOfEmployees, averageSalary);
body.addComponents(genderChart, salaryChart);
contents.addComponent(menu);
// Body fills the area to the right of the menu
contents.addComponentsAndExpand(body); 
// Adds contents to the UI 
setContent(contents);
Components appear in the UI in the order they are added. For a VerticalLayout such as the menu, this means from top to bottom. Notice how the HorizontalLayout contents hold the two VerticalLayouts, placing them next to each other. This is necessary because the UI itself can hold only one component, namely contents which holds all components as one unit.

Reflecting the DataModel in the UI

Now that all visuals are in place, it is time to let them reflect the database content. This means we need to add values to the components by retrieving information from the DataModel. Bridging between our data model and EmployeeUI will be done by handling events from selectDepartment. This is accomplished by adding a selection listener as follows in init():

selectDepartment.addSelectionListener(e ->
    updateUI(e.getSelectedItem().orElseThrow()) 
);
Since updateUI() was not yet defined, that is our next task.

private void updateUI(Departments dept) { }
Here is a quick reminder of what we want updateUI() to accomplish: When a new department is selected we want to calculate and display the total number of employees, the number of males and females, the total average salary and the salary distribution for males and females for that department.

Conveniently enough, we designed our DataModel with this in mind, making it easy to collect the information from the database.

We start with the values of the text fields:

final Map<Employees.Gender, Long> counts = DataModel.countEmployees(dept);

noOfEmployees.setValue(String.format("%,d", counts.values().stream().mapToLong(l -> l).sum()));

averageSalary.setValue(String.format("$%,d", DataModel.averageSalary(dept).intValue()));
The sum of the males and females gives the total number of employees. averageSalary() returns a Double which is cast to an int. Both values are formatted as a String before being passed to the text fields.

We can also use the Map counts to populate the first graph by retrieving the separate counts for male and female.

final List<DataSeriesItem> maleSalaries = new ArrayList<>();
final List<DataSeriesItem> femaleSalaries = new ArrayList<>();
   
DataModel.freqAggregation(dept)
   .streamAndClose()
   .forEach(agg -> {
       (agg.getGender() == Gender.F ? femaleSalaries : maleSalaries)
           .add(new DataSeriesItem(agg.getInterval() * 1_000, agg.getFrequency()));
   });
Our DataModel provides an Aggregation which we can think of as a list containing tuples of a gender, a salary and a corresponding salary frequency (how many persons share that salary). By streaming over the Aggregation we can separate male and female data in two Lists containing DataSeriesItems. A DataSeriesItem is in this case used like a point with an x- and y-value.

Comparator<DataSeriesItem> comparator = Comparator.comparingDouble((DataSeriesItem dsi) -> dsi.getX().doubleValue());

maleSalaries.sort(comparator);
femaleSalaries.sort(comparator);
Before adding the data to the chart, we sort it in rising order of the x-values, otherwise, the graph will look very chaotic. Now our two sorted List<DataSeriesItem> will fit perfectly with the DataSeries of salaryChart.

//Updates salaryChart 
maleSalaryData.setData(maleSalaries);
femaleSalaryData.setData(femaleSalaries);
salaryChart.drawChart();
Since we are changing the whole data set rather than just a single point, we set the data for our DataSeries to the Lists of x and ys we just created. Unlike a change in a ListSeries, this will not trigger an update of the chart, meaning we have to force a manual update with drawChart().

Lastly, we need to fill the components with default values when the application starts. This can now be done by calling updateUI(defaultDept) at the end of init().

Styling in Java

Vaadin offers complete freedom when it comes to adding a personal feel to components. Since this is a pure Java application only the styling options available in their Java framework were used, although CSS styling will naturally give total control of the visuals.

A comparison before and after applying the ChartTheme.


To give our charts a personal touch we created a class ChartTheme which extends Theme. In the constructor, we defined what properties we would like to change, namely the color of the data series, background, legend, and text.

public class ChartTheme extends Theme {
   public ChartTheme() {
       Color[] colors = new Color[2];
       colors[0] = new SolidColor("#5abf95"); // Light green
       colors[1] = new SolidColor("#fce390"); // Yellow
       setColors(colors);

       getChart().setBackgroundColor(new SolidColor("#3C474C"));
       getLegend().setBackgroundColor(new SolidColor("#ffffff"));

       Style textStyle = new Style();
       textStyle.setColor(new SolidColor("#ffffff")); // White text
       setTitle(textStyle);
   }
}
Then theme was applied to all charts by adding this row to init():

ChartOptions.get().setTheme(new ChartTheme());

Conclusion

We have used Speedment to interface the database and Vaadin to interface the end user. The only code needed in between is just a few Java Streams constructs that declaratively describe the application logic, which grants minimal time to market and cost of maintenance.

Feel free to fork this repo from GitHub and start experimenting on your own.

Authors

Julia Gustafsson
Per Minborg

Infinite Sets in Java 9

A Set

A Set is a collection of elements whereby any given element in the Set only appears once. More formally, a set contains no pair of elements e1 and e2 such that e1.equals(e2). We can easily create Set in Java 9 like this:
and use a

final Set<Integer> s = Set.of(1, 2, 3);
System.out.println(s);

This might produce the following output:
[2, 3, 1]

The Set produced above is immutable, i.e. it cannot change and it is also finite because there is a distinct number of elements in the Set, namely three. The order in which the elements are returned via its read methods (such as stream(), iterator() and forEach()) is unspecified.

An Infinite Set

An infinite set contains an unlimited number of elements. One example of an infinite set is the set of all integers [..., -1, 0, 1, 2, ...] where an integer is not of a Java Integer class but an integer according to the mathematical definition of an integer whereby there is always a larger integer n+1 for any given integer n.

There are many infinite sets such as the set of all primes, the set of even integers, the set of fibonacci numbers etc.

For obvious reasons, we cannot precompute and store all the elements of an infinite Java Set. If we try, we would eventually run out of memory.

A fundamental question we have to ask ourselves is: Are there actually infinite sets for the Java types we have? If we have a Set<Byte> there are at most 256 elements in the Set and that is far from infinite, same reasoning goes for Short and even Integer. After all, there are only about four billion different Integer objects and if we would use a bit-map to represent membership, we could fit a Set<Integer> in just 0.5 GB. Albeit big, is not infinite.

But if we are talking about Long or String elements, we are approaching at least virtually infinite sets. To store a bitmap of all Longs would require a number of PB of internal storage. A true infinite Set would be a Set of String with all possible combination of characters [a-z] of any length.

Before we continue, I would like to mention that the code in this post is also available on GitHub as described at the very end of the post.

The ImmutableStreamSet

To move away from a paradigm where we store the elements of a Set, we could create an ImmutableStreamSet that defines the elements of the Set only through its stream() method. The ImmutableStreamSet could be defined as a FunctionalInterface like this:

@FunctionalInterface
public interface ImmutableStreamSet<E> extends Set<E> {

    // This is the only method we need to implements
    @Override
    public Stream<E> stream(); 

    @Override
    default int size() {
        return (int) stream().limit(Integer.MAX_VALUE).count();
    }

    @Override
    default boolean contains(Object o) {
        return stream().anyMatch(e -> Objects.equals(e, o));
    }

    @Override
    default boolean containsAll(Collection<?> c) {
        return (this == c) ? true : c.stream().allMatch(this::contains);
    }

    @Override
    default boolean isEmpty() {
        return !stream().findAny().isPresent();
    }

    @Override
    default <T> T[] toArray(T[] a) {
        return stream().collect(toList()).toArray(a);
    }

    @Override
    default Object[] toArray() {
        return stream().toArray();
    }

    @Override
    default Spliterator<E> spliterator() {
        return stream().spliterator();
    }

    @Override
    default Iterator<E> iterator() {
        return stream().iterator();
    }

    @Override
    default Stream<E> parallelStream() {
        return stream().parallel();
    }

    @Override
    default void forEach(Consumer<? super E> action) {
        stream().forEach(action);
    }

    // We are immutable
    @Override
    default boolean removeIf(Predicate<? super E> filter) {
        throw new UnsupportedOperationException();
    }

    @Override
    default void clear() {
        throw new UnsupportedOperationException();
    }

    @Override
    default boolean removeAll(Collection<?> c) {
        throw new UnsupportedOperationException();
    }

    @Override
    default boolean retainAll(Collection<?> c) {
        throw new UnsupportedOperationException();
    }

    @Override
    default boolean addAll(Collection<? extends E> c) {
        throw new UnsupportedOperationException();
    }

    @Override
    default boolean remove(Object o) {
        throw new UnsupportedOperationException();
    }

    @Override
    default boolean add(E e) {
        throw new UnsupportedOperationException();
    }

    static <E> ImmutableStreamSet<E> of(Supplier<Stream<E>> supplier) {
        // Check out GitHub to see how this Impl class is implemented
        return new ImmutableStreamSetImpl<>(supplier);
    }

}

Awesome, now we can create infinite sets by just providing a stream supplier like this:

    ImmutableStreamSet<Long> setOfAllLong
            = LongStream.rangeClosed(Long.MIN_VALUE, Long.MAX_VALUE)::boxed;

This will create a Set of all Long values (e.g. with 2^64 elements).  When providing a stream supplier, it is imperative to make sure to adhere to the Set property of element uniqueness.  Consider the following illegal Set:

final ImmutableStreamSet<Long> illegalSet = 
            () -> Stream.of(1l, 2l, 1l);
Clearly, 11 occurs two times in the set which makes this object violate the Set requirements.

As we will see, it would be better to create concrete classes of the infinite sets we are considering. One particular problem with the default implementation above is that the contains() method might be very slow. Read the next chapters and find out why and how to solve it.

PositiveLongSet

Let us assume that we want to create a Set with all the positive long values and that we want to be able to use the Set efficiently with other sets and objects. This is how we can go about:

public final class PositiveLongSet implements ImmutableStreamSet<Long> {

    public static final PositiveLongSet INSTANCE = new PositiveLongSet();

    private PositiveLongSet() {
    }

    @Override
    public Stream<Long> stream() {
        return LongStream.rangeClosed(1, Long.MAX_VALUE).boxed();
    }

    @Override
    public int size() {
        return Integer.MAX_VALUE;
    }

    @Override
    public boolean contains(Object o) {
        return SetUtil.contains(this, Long.class, other -> other > 0, o);
    }

    @Override
    public boolean isEmpty() {
        return false;
    }

    @Override
    public String toString() {
        return SetUtil.toString(this);
    }

}

Note how we comply with the formal requirement in the method size() where we return Integer.MAX_VALUE even though the Set is much larger. If Set had been defined today, it is likely that size() would have returned a long instead of an int. But in the beginning of the 90s, internal RAM was usually less than 1 GB. We are using two utility methods in the class:

The SetUtil.toString() takes a Set, iterates over the first eight elements and returns a String representation of those elements.

The SetUtil.contains() method takes a Set, the Element type class (here Long.class) and a Predicate that is called if the object we are comparing agains is of the given Element type class (if the object we are comparing against is null or of another type, then trivially the Set does not contain the object).

Here is how the SetUtil looks like:

final class SetUtil {

    private static final int TO_STRING_MAX_ELEMENTS = 8;

    static <E> String toString(Set<E> set) {
        final List<String> first = set.stream()
            .limit(TO_STRING_MAX_ELEMENTS + 1)
            .map(Object::toString)
            .collect(toList());

        final String endMarker = first.size() > TO_STRING_MAX_ELEMENTS ? ", ...]" : "]";

        return first.stream()
            .limit(TO_STRING_MAX_ELEMENTS)
            .collect(
                joining(", ", "[", endMarker)
            );
    }

    static <E> boolean contains(
        final Set<E> set,
        final Class<E> clazz,
        final Predicate<E> predicate,
        final Object o
    ) {
        if (o == null) {
            return false;
        }
        if (!(clazz.isAssignableFrom(o.getClass()))) {
            return false;
        }
        final E other = clazz.cast(o);
        return predicate.test(other);
    }

}

Armed with the classes ImmutableStreamSet and SetUtil we can now easily create other infinite sets like PostitiveEvenLongSet (not shown hereunder, try writing it by yourself), PrimeLongSet (containing all primes that can be represented by a Long) and even FibonacciLongSet (containing all fibonacci numbers that can be represented by a Long). Here is how these classes may look like:

PrimeLongSet


public final class PrimeLongSet implements ImmutableStreamSet<Long> {

    public static final PrimeLongSet INSTANCE = new PrimeLongSet();

    private PrimeLongSet() {
    }

    private static final LongPredicate IS_PRIME =
        x -> LongStream.rangeClosed(2, (long) Math.sqrt(x)).allMatch(n -> x % n != 0);

    @Override
    public Stream<Long> stream() {
        return LongStream.rangeClosed(2, Long.MAX_VALUE)
            .filter(IS_PRIME)
            .boxed();
    }

    @Override
    public int size() {
        return Integer.MAX_VALUE; 
    }

    @Override
    public boolean contains(Object o) {
        return SetUtil.contains(this, Long.class, IS_PRIME::test, o);
    }

    @Override
    public boolean isEmpty() {
        return false;
    }

    @Override
    public String toString() {
        return SetUtil.toString(this);
    }

}

FibonacciLongSet


public final class FibonacciLongSet implements ImmutableStreamSet<Long> {

    public static final FibonacciLongSet INSTANCE = new FibonacciLongSet();

    private FibonacciLongSet() {
    }

    @Override
    public Stream<Long> stream() {
        return Stream.concat(
            Stream.of(0l),
            Stream.iterate(new Fibonacci(0, 1), Fibonacci::next)
                .mapToLong(Fibonacci::getAsLong)
                .takeWhile(fib -> fib > 0)
                .boxed()
        );
    }

    @Override
    public int size() {
        return 92;
    }

    @Override
    public boolean contains(Object o) {
        return SetUtil.contains(
            this,
            Long.class,
            other -> stream().anyMatch(fib -> Objects.equals(fib, other)),
            o
        );
    }

    @Override
    public boolean isEmpty() {
        return false;
    }

    @Override
    public String toString() {
        return SetUtil.toString(this);
    }

    private static class Fibonacci {

        final long beforeLast;
        final long last;

        public Fibonacci(long beforeLast, long last) {
            this.beforeLast = beforeLast;
            this.last = last;
        }

        public Fibonacci next() {
            return new Fibonacci(last, last + beforeLast);
        }

        public long getAsLong() {
            return beforeLast + last;
        }

    }

}

Note how we are using Stream::takeWhile to break the stream when long wraps around to a negative value. Arguably, we are "cheating" when we precompute and provide a size of 92 but otherwise size() would have been a bit slower.

Stitching it all up

By providing an interface with static providers to instances of these classes, we can encapsulate our predefined sets and make sure that there are only one instance of them in the JVM like this:
public interface Sets {

    static Set<Long> positiveLongSet() {
        return PositiveLongSet.INSTANCE;
    }

    static Set<Long> positiveEvenLongSet() {
        return PositiveEvenLongSet.INSTANCE;
    }

    static Set<Long> primeLongSet() {
        return PrimeLongSet.INSTANCE;
    }

    static Set<Long> fibonacciLongSet() {
        return FibonacciLongSet.INSTANCE;
    }

}

We could also encapsulate our code in a Java 9 module to make sure only the classes Sets and ImmutableStreamSet are visible by exposing them in the projects top-most package and putting all the other classes in a package named "internal" (which is not exposed). This is how our module-info.java could look like provided that the two exposed classes are in the com.speedment.infinite_sets and the implementation classes in a package like com.speedment.infinite_sets.internal:

module-info.java

module com.speedment.infinite_sets {
    exports com.speedment.infinite_sets;
}

Trying it Out

We can now create another module that is using our infinite sets by first declaring usage of our existing module like this:

module-info.java

module Infinite_sets_app {
    requires com.speedment.infinite_sets;
}


And then we have access to the exposed parts of the module. here is one way of trying out the infinite sets:

import static com.speedment.infinite_sets.Sets.*;

public class Main {

    public static void main(String[] args) {

        Stream.of(
            Set.of(1, 2, 3),
            positiveLongSet(),
            positiveEvenLongSet(),
            primeLongSet(),
            fibonacciLongSet()
        ).forEachOrdered(System.out::println);

        // This actually completes fast due to identity equality
        positiveLongSet().containsAll(positiveLongSet());

    }

}

This might produce the following output:


[3, 2, 1]
[1, 2, 3, 4, 5, 6, 7, 8, ...]
[2, 4, 6, 8, 10, 12, 14, 16, ...]
[2, 3, 5, 7, 11, 13, 17, 19, ...]
[0, 1, 2, 3, 5, 8, 13, 21, ...]


Engage on GitHub


The source code in this post is available on GitHub here.




Game, Set and Match...

Wednesday, May 23, 2018

Making Pivot Tables with Java Streams from Databases

Making Pivot Tables with Java Streams from Databases

Raw data from database rows and tables does not provide so much insight to human readers. Instead, humans are much more likely to see data patterns if we perform some kind of aggregation on the data
before it is being presented to us. A pivot table is a specific form of aggregation where we can apply operations like sorting, averaging, or summing, and also often grouping of columns values.

In this article, I will show how you can compute pivot tables of data from a database in pure Java without writing a single line of SQL. You can easily reuse and modify the examples in this article to fit your own specific needs.

In the examples below, I have used open-source Speedment, which is a Java Stream ORM, and the open-source Sakila film database content for MySQL. Speedment works for any major relational database type such as MySQL, PostgreSQL, Oracle, MariaDB, Microsoft SQL Server, DB2, AS400 and more.

Pivoting

I will construct a Map of Actor objects and, for each Actor, a corresponding List of film ratings of films that a particular Actor has appeared in. Here is an example of how a pivot entry for a specific Actor might look like expressed verbally:

“John Doe participated in 9 films that were rated ‘PG-13’ and 4 films that were rated ‘R’”.

We are going to compute pivot values for all actors in the database. The Sakila database has three tables of interest for this particular application:

1) “film” containing all the films and how the films are rated (e.g. “PG-13”, “R”, etc.).
2) “actors” containing (made up) actors (e.g. “MICHAEL BOLGER”, “LAURA BRODY”, etc.).
3) “film_actor” which links films and actors together in a many-to-many relation.

The first part of the solution involves joining these three tables together. Joins are created using Speedment’s JoinComponent which can be obtained like this:

// Visit https://github.com/speedment/speedment
// to see how a Speedment app is created. It is easy!
Speedment app = …;

JoinComponent joinComponent = app.getOrThrow(JoinComponent.class);

Once we have the JoinComponent, we can start defining Join relations that we need to compute our pivot table:
Join<Tuple3<FilmActor, Film, Actor>> join = joinComponent
        .from(FilmActorManager.IDENTIFIER)
        .innerJoinOn(Film.FILM_ID).equal(FilmActor.FILM_ID)
        .innerJoinOn(Actor.ACTOR_ID).equal(FilmActor.ACTOR_ID)
        .build(Tuples::of);
The build() takes a method reference Tuples::of that will resolve to a constructor that takes three entities of type; FilmActor, Film and Actor and that will create a compound immutable Tuple3 comprising those specific entities. Tuples are built into Speedment.

Armed with our Join object we now can create our pivot Map using a standard Java Stream obtained from the Join object:

Map<Actor, Map<String, Long>> pivot = join.stream()
    .collect(
        groupingBy(
            // Applies Actor as a first classifier
            Tuple3::get2,
            groupingBy(
                // Applies rating as second level classifier
                tu -> tu.get1().getRating().get(),
                counting() // Counts the elements 
                )
            )
        );

Now that the pivot Map has been computed, we can print its content like this:
// pivot keys: Actor, values: Map<String, Long>
pivot.forEach((k, v) -> { 
    System.out.format(
        "%22s  %5s %n",
        k.getFirstName() + " " + k.getLastName(),
        V
    );
});
This will produce the following output:
        MICHAEL BOLGER  {PG-13=9, R=3, NC-17=6, PG=4, G=8} 
           LAURA BRODY  {PG-13=8, R=3, NC-17=6, PG=6, G=3} 
     CAMERON ZELLWEGER  {PG-13=8, R=2, NC-17=3, PG=15, G=5}
...


Mission completed! In the code above, the method Tuple3::get2 will retrieve the third element from the tuple (an Actor) whereas the method tu.get1() will retrieve the second element from the tuple (a Film).

Speedment will render SQL code automatically from Java and convert the result to a Java Stream. If we enable Stream logging, we can see exactly how the SQL was rendered:
SELECT 
    A.`actor_id`,A.`film_id`,A.`last_update`, 
    B.`film_id`,B.`title`,B.`description`,
    B.`release_year`,B.`language_id`,B.`original_language_id`,
    B.`rental_duration`,B.`rental_rate`,B.`length`,
    B.`replacement_cost`,B.`rating`,B.`special_features`,
    B.`last_update`, C.`actor_id`,C.`first_name`,
    C.`last_name`,C.`last_update`
FROM 
    `sakila`.`film_actor` AS A
INNER JOIN 
    `sakila`.`film` AS B ON (B.`film_id` = A.`film_id`) 
INNER JOIN 
    `sakila`.`actor` AS C ON (C.`actor_id` = A.`actor_id`)

Joins with Custom Tuples

As we noticed in the example above, we have no actual use of the FilmActor object in the Stream since it is only used to link Film and Actor entities together during the Join phase. Also, the generic Tuple3 had general get0(), get1() and get2() methods that did not say anything about what they contained.

All this can be fixed by defining our own custom “tuple” called ActorRating like this:

private static class ActorRating {
    private final Actor actor;
    private final String rating;

    public ActorRating(FilmActor fa, Film film, Actor actor) {
        // fa is not used. See below why
        this.actor = actor;
        this.rating = film.getRating().get();
    }

    public Actor actor() {
        return actor;
    }

    public String rating() {
        return rating;
    }

}


When Join objects are built using the build() method, we can provide a custom constructor that we want to apply on the incoming entities from the database. This is a feature that we are going use as depicted below:
Join<ActorRating> join = joinComponent
    .from(FilmActorManager.IDENTIFIER)
    .innerJoinOn(Film.FILM_ID).equal(FilmActor.FILM_ID)
    .innerJoinOn(Actor.ACTOR_ID).equal(FilmActor.ACTOR_ID)
    .build(ActorRating::new); // Use a custom constructor

Map<Actor, Map<String, Long>> pivot = join.stream()
    .collect(
        groupingBy(
            ActorRating::actor,
            groupingBy(
                ActorRating::rating,
                counting()
            )
         )
    );
In this example, we proved a class with a constructor (the method reference ActorRating:new gets resolved to new ActorRating(fa, actor, film)) that just discards the linking FilmActor object altogether. The class also provided better names for its properties which made the code more readable. The solution with the custom ActorRating class will produce exactly the same output result as the first example but it looks much nicer when used. I think the effort of writing a custom tuple is worth the extra effort over using generic Tuples in most cases.

Using Parallel Pivoting

One cool thing with Speedment is that it supports the Stream method parallel() out-of-the-box. So, if you have a server with many CPUs, you can take advantage of all those CPU cores when running database queries and joins. This is how parallel pivoting would look like:

Map<Actor, Map<String, Long>> pivot = join.stream()
    .parallel()  // Make our Stream parallel
    .collect(
        groupingBy(
            ActorRating::actor,
            groupingBy(
                ActorRating::rating,
                counting()
            )
         )
    );
We only have to add a single line of code to get parallel aggregation. The default parallel split strategy kicks in when we reach 1024 elements. Thus, parallel pivoting will only take place on tables or joins larger than this. It should be noted that the Sakila database only contains 1000 films, so we would have to run the code on a bigger database to actually be able to benefit from parallelism.

Take it for a Spin!

In this article, we have shown how you can compute pivot data from a database in Java without writing a single line of SQL code. Visit Speedment open-source on GitHub to learn more.

Read more about other features in the the User's Guide.

Wednesday, May 16, 2018

Java Stream ORM Now with JOINs

Java Stream ORM Now with JOINs

Speedment is a Java Stream ORM Toolkit and Runtime that allows you to view database tables as standard Java Streams. Because you do not have to mix Java and SQL, the application becomes much more compact making it faster to develop, less prone to errors and easier to maintain. Streams are also strictly type-safe and lazily constructed so that only a minimum amount of data is pulled in from the database as elements are consumed by the streams.

The new version Speedment 3.1.1 “Homer” now also supports dynamically joined tables to be viewed as standard Java Streams. This is a big deal when developing Java applications that explore relations between database tables.

In the examples below, I have used the open-source Sakila film database content for MySQL that you can download here. Speedment works for any major relational database type such as Oracle, MySQL, Microsoft SQL Server, PostgreSQL, DB2, MariaDB, AS400 and more.

Streaming over a Single Table

The following code snippet will create a List of all Film objects that has a Film.RATING of “PG-13” and where the List is sorted in Film.LENGTH order:
List<Film> list = films.stream()
    .filter(Film.RATING.equal("PG-13"))
    .sorted(Film.LENGTH)
    .collect(toList());

The stream will be automatically rendered to a SQL query under the hood. If we enable Stream logging, we will see the following (prepared statement “?”-variables given as values in the end):
SELECT 
    `film_id`,`title`,`description`,`release_year`,
    `language_id`,`original_language_id`,
    `rental_duration`,`rental_rate`,
    `length`,`replacement_cost`,`rating`,`special_features`,
    `last_update` 
FROM 
    `sakila`.`film` 
WHERE 
    (`rating`  = ? COLLATE utf8_bin) 
ORDER BY
    `length` ASC

values:[PG-13]

Thus, the advantage is that you can express your database queries using type-safe Java and then consume the result by means of standard Java Streams. You do not have to write any SQL code.

Joining Several Tables

Appart from the table “film”, the Sakila database also contains other tables. One of these is a table called “language”. Each Film entity has a foreign key to the Language being spoken in the film using a column named “language_id”.

In this example, I will show how we can create a standard Java Stream that represents a join of these two tables. This way, we can get a Java Stream of matching pairs of Film/Language entities.

Join objects are created using the JoinComponent which can be obtained like this:
// Visit https://github.com/speedment/speedment
// to see how a Speedment app is created. It is easy!
Speedment app = …;

JoinComponent joinComponent = app.getOrThrow(JoinComponent.class);

Once we have grabbed the JoinComponent, we can start creating Join objects like this:
Join<Tuple2<Film, Language>> join = joinComponent
    .from(FilmManager.IDENTIFIER)
    .innerJoinOn(Language.LANGUAGE_ID).equal(Film.LANGUAGE_ID)
    .build(Tuples::of);

Now that we have defined our Join object we can create the actual Java Stream:
join.stream()
    .map(t2 -> String.format(
        "The film '%s' is in %s",
            t2.get0().getTitle(), // get0() -> Film
            t2.get1().getName()   // get1() -> Language
    ))
    .forEach(System.out::println);

This will produce the following output:
The film 'ACADEMY DINOSAUR' is in English
The film 'ACE GOLDFINGER' is in English
The film 'ADAPTATION HOLES' is in English
...

In the code above, the method t2.get0() will retrieve the first element from the tuple (a Film) whereas the method t2.get1() will retrieve the second element from the tuple (a Language). Default generic tuples are built into Speedment and thus Tuple2 is not a Guava class. Speedment does not depend on any other library. Below you will see how you can use any class constructor for the joined tables. Again, Speedment will render SQL code automatically from Java and convert the result to a Java Stream. If we enable Stream logging, we can see exactly how the SQL code was rendered:
SELECT
    A.`film_id`,A.`title`,A.`description`,
    A.`release_year`,A.`language_id`,A.`original_language_id`,
    A.`rental_duration`,A.`rental_rate`,A.`length`,
    A.`replacement_cost`,A.`rating`,A.`special_features`,
    A.`last_update`,
    B.`language_id`,B.`name`,B.`last_update` 
FROM 
    `sakila`.`film` AS A
INNER JOIN 
    `sakila`.`language` AS B 
ON
    (B.`language_id` = A.`language_id`)

Interestingly, the Join object can be created once and be re-used over and over again to create new Streams.

Many-to-Many Relations

The Sakila database also defines a handful of Many-to-Many relations. For example, the table “film_actor” contains rows links films to actors. Each film can have multiple actors and each actor might have appeared in multiple films. Every row in the table links a particular Film to a specific Actor. For example, If a Film depicts 12 Actor entities, then FilmActor contains 12 entries all having the same film_id but different actor_ids. The purpose of this example is to create a complete list of all films and the appearing actors in a Java Stream. This is how we can join the three tables together:
Join<Tuple3<FilmActor, Film, Actor>> join = joinComponent
    .from(FilmActorManager.IDENTIFIER)
    .innerJoinOn(Film.FILM_ID).equal(FilmActor.FILM_ID)
    .innerJoinOn(Actor.ACTOR_ID).equal(FilmActor.ACTOR_ID)
    .build(Tuples::of);

    join.stream()
        .forEach(System.out::println);


The code above will produce the following output (formatted for readability):
...
Tuple3Impl {
    FilmActorImpl { actorId = 137, filmId = 249, lastUpdate = 2006-02-15 05:05:03.0 },
    FilmImpl { filmId = 249, title = DRACULA CRYSTAL, description =...,
    ActorImpl { actorId = 137, firstName = MORGAN, lastName = WILLIAMS,...}
}

Tuple3Impl {
    FilmActorImpl { actorId = 137, filmId = 254, lastUpdate = 2006-02-15 05:05:03.0 },
    FilmImpl { filmId = 254, title = DRIVER ANNIE, description = ...,
    ActorImpl { actorId = 137, firstName = MORGAN, lastName = WILLIAMS, ...}
}

Tuple3Impl {
    FilmActorImpl { actorId = 137, filmId = 263, lastUpdate = 2006-02-15 05:05:03.0 },
    FilmImpl { filmId = 263, title = DURHAM PANKY, description = ... },
    ActorImpl { actorId = 137, firstName = MORGAN, lastName = WILLIAMS,... }
}
...

Joins with Custom Tuples

As we noticed in the example above, we have no actual use of the FilmActor object in the Stream since it is only used to link Film and Actor objects together during the Join phase.

When Join objects are built using the build() method, we can provide a custom constructor that we want to apply on the incoming entities from the database. The constructor can be of any type so you can write your own Java objects that holds, for example, Film and Actor or any of the columns they contain and that are of interest.

In this example, I proved a (lambda) constructor that just discards the linking FilmActor objects altogether:
Join<Tuple2<Film, Actor>> join = joinComponent
    .from(FilmActorManager.IDENTIFIER)
    .innerJoinOn(Film.FILM_ID).equal(FilmActor.FILM_ID)
    .innerJoinOn(Actor.ACTOR_ID).equal(FilmActor.ACTOR_ID)
    .build((fa, f, a) -> Tuples.of(f, a));

    join.stream()
        .forEach(System.out::println);

The code above will produce the following output (formatted for readability):
...
Tuple2Impl {
    FilmImpl { filmId = 249, title = DRACULA CRYSTAL, description = ... },
    ActorImpl { actorId = 137, firstName = MORGAN, lastName = WILLIAMS, ...}
}
Tuple2Impl {
    FilmImpl { filmId = 254, title = DRIVER ANNIE, description = A... }, 
    ActorImpl { actorId = 137, firstName = MORGAN, lastName = WILLIAMS,...}
}
Tuple2Impl {
    FilmImpl { filmId = 263, title = DURHAM PANKY, description = ... }, 
    ActorImpl { actorId = 137, firstName = MORGAN, lastName = WILLIAMS,...}
}
...
Thus, we only get matching pairs of Film and Actor entities where there is an appearance of an actor in a film. The linking object FilmActor is never seen in the Stream.

Take it for a Spin!

Over the course of this article, you have learned how to stream over one or several database tables using Speedment.

Visit Speedment open-source on GitHub and try it out!

Read all about the new JOIN functionality in the User's Guide.

Wednesday, November 22, 2017

Easily Return Values From a Transaction with Speedment 3.0.17 or Later

Transactions

In my previous post, I wrote about how to use Transactions in an easy way using Speedment where we updated two bank accounts atomically. As you all might know, transactions are a way of combining a number of database operation into a single operation that is atomically executed.

But transactions are not only about updating the database but also about performing atomic reads. With Speedment, we can compute values atomically using Java streams and then return the result to something outside the scope of the transaction in an easy way.

Returning Transaction Values

Suppose we have a database with films and artists (e.g. using the open-sourced Sakila database content for MySQL) and we would like to count the number of films plus the number of artist in the database in a single transaction. This might be the case if we want to show some kind of database size in a GUI or the likes. Here is how it might look like:

    long sumCount = txHandler.createAndApply(tx -> 
        films.stream().count() + actors.stream().count()
    );

When the transaction starts, the view of the database is "frozen" so that new films or artist that are inserted into the database by other threads are not visible within the transaction. Thus, we can safely assume that our view is unaffected by other threads.

As can be seen, with really minimal boiler plate code, we can express functions that are to be performed within transactions using Speedment.

Preparations

Before transactions can be used, we need to obtain a TransactionHandler like this:

    SakilaApplication app = ....
    TransactionComponent transactionComponent = app.getOrThrow(TransactionComponent.class);
    TransactionHandler txHandler = transactionComponent.createTransactionHandler();

The FilmManager and the ActorManager can be retrieved from the application as shown hereunder:

    FilmManager films = app.getOrThrow(FilmManager.class);
    ActorManager actors = app.getOrThrow(ActorManager.class);

Read more on Speedment transactions here.

What's Next?


Visit Speedment open-source on GitHub and try it out!

Tuesday, November 14, 2017

Transactions Made Simple Using Speedment 3.0.17 and Above

Transactions

Sometimes we want to make sure that our database operations are executed atomically and separated from other operations. This is where transactions come into play. A transaction is a set of operation
proposals that may or may not be accepted by the database as an atomic operation. So, either all operations in the transaction are accepted or no operations of the transaction are accepted. Another advantage with transaction is that the state of the database will be locally "frozen" when the transaction starts, so we will not see updates by other threads while in the transaction.

Speedment is an open-source Stream ORM Java Toolkit and Runtime Java tool that wraps an existing database and its tables into Java 8 streams. Later versions of Speedment support database transactions in an easy-to-use way.

Updates

Imagine we are writing a bank application with accounts and we are going to move $100 from one account (1) to another (2). In this case it is important that money does not disappear (i.e. is deducted from 1 but never deposited in 2) or perhaps even worse, will be duplicated (i.e. deposited in 2 but not deducted from 1). This can be assured using a Speedment database transaction like this:
txHandler.createAndAccept(tx ->

    Account sender = accounts.stream()
        .filter(Account.ID.equal(1))
        .findAny()
        .get();

    Account receiver = accounts.stream()
        .filter(Account.ID.equal(2))
        .findAny()
        .get();

    accounts.update(sender.setBalance(sender.getBalance() - 100));
    accounts.update(receiver.setBalance(receiver.getBalance() + 100));

    tx.commit();
}
When the method tx.commit() is called, the two updates are committed to the database atomically and will be visible to all other threads. If we do not call tx.commit() explicitly then the transaction will be automatically rolled back (i.e. the updates will not have any effect and will be discarded).

Preparations

Before transactions can be used, we need to obtain a TransactionHandler like this:
    BankApplication app = ....
    TransactionComponent transactionComponent = app.getOrThrow(TransactionComponent.class);
    TransactionHandler txHandler = transactionComponent.createTransactionHandler();
The AccountManager can be retrieved from the application as shown hereunder:
    AccountManager accounts = app.getOrThrow(AccountManager.class);

What's Next?

Read more on Speedment transactions here.

Visit GitHub and read all about Speedment open-source here.

Monday, September 11, 2017

The Need for Speed, Access Existing Data 1,000x Faster

Learn how you can speed up your analytics database applications by a factor of 1,000 by using standard Java 8 streams and Speedment’s In-JVM-Memory accelerator.

Web and mobile applications are sometimes slow because the backing database is slow and/or the connection to the database imposes latencies. Modern UIs and interactive applications require fast back-ends with ideally no observable latency or else users will move on to other services or will just get tired and stop using the service altogether.

In this article we will learn how analytic database applications can be speed up by orders of magnitude using standard Java 8 streams and Speedment's in-JVM-memory acceleration technology. At the end, we will run a JMH test suit with representative benchmarks that indicate a speedup factor exceeding 1,000 times.

Viewing the Database as Streams

Speedment is a modern stream based ORM meaning that the tables are viewed as standard Java 8 streams. In this article we will use the “Sakila” database that is an open-source example database available directly from Oracle here. The Sakila example database contains films, actors, etcetera. This is how a Java 8 stream from the database might look like:

    List<Film> secondPage = films.stream()
        .filter(Film.RATING.equal("PG-13"))
        .sorted(Film.TITLE.comparator())
        .skip(50)
        .limit(50)
        .collect(Collectors.toList());

This stream will filter out only those films that has a rating equal to “PG-13” and will then sort the remaining films by film title. After that, the first 50 films are skipped and then the next 50 films are collected to a list. Thus, we get the second page of all PG-13 films sorted in title order.

Usually, we also need to know how many films there are in total that has a rating of “PG-13” in order to show a correctly scaled scrollbar in our application. This can be done like this:

     long count = films.stream()
        .filter(Film.RATING.equal("PG-13"))
        .count();

Using a Database

Speedment will automatically render the Streams to SQL. This way, we can remain in a pure typesafe Java environment without having to write SQL code. By enabling logging, we can see that the first paging stream will be rendered to the following SQL query (assuming we are using MySQL):
 
SELECT
    `film_id`,`title`,`description``release_year`,
    `language_id`,`original_language_id`,`rental_duration`,`rental_rate`,
    `length`,`replacement_cost`,`rating`,`special_features`,`last_update` 
FROM 
    `sakila`.`film` 
WHERE 
    (`sakila`.`film`.`rating`  = ? COLLATE utf8_bin) 
ORDER BY 
    `sakila`.`film`.`title` ASC 
LIMIT ? OFFSET ?

values:[PG-13, 50, 50]
The second count stream will be rendered to:
 
SELECT COUNT(*) FROM (
    SELECT
        `film_id`,`title`,`description``release_year`,
        `language_id`,`original_language_id`,`rental_duration`,`rental_rate`,
        `length`,`replacement_cost`,`rating`,`special_features`,`last_update`
     FROM
             `sakila`.`film` 
        WHERE 
            (`sakila`.`film`.`rating`  = ? COLLATE utf8_bin)
) AS A

values:[PG-13]

Thus, the stream operations are rendered to efficient SQL. When running one thousand of these queries in parallel on a laptop class of computer with the MySQL standard server configuration, they complete in 700 ms and 175 ms aggregated latency respectively. If you are thinking about how the second SQL statement could be efficient, the fact is that the database will be able to basically eliminate the inner select.

Using In-JVM-Memory Acceleration

Now to the fun part. Let us activate the in-JVM-memory acceleration component in Speedment, called DataStore, in our application. This is done in this way:

     SakilaApplication app = new SakilaApplicationBuilder()
        .withPassword("sakila-password")
        // Activate DataStore
        .withBundle(DataStoreBundle.class)
        .build();

        // Load a snapshot of the database into off heap memory
        app.get(DataStoreComponent.class)
            .ifPresent(DataStoreComponent::load);

When the application is started, a snapshot of the database is pulled into the JVM and is stored off-heap. Because data is stored off-heap, data will not influence garbage collection and the amount of data is only limited by the available RAM. Nothing prevents us from loading terabytes of data if we have that amount of RAM available.

If we now run the same application again, we get 22 ms and 1 ms aggregated latency. This means that the latency is reduced by a factor of 30 and 170 respectively. A significant improvement it has to be said. But, it is getting better still.

Using In-JVM-Memory Acceleration and Json

REST and JSON are commonly used to serve clients that request data these days. Speedment has a special collector that can collect JSON data using something called in-place de-serialization whereby only the fields that are needed by the collector are deserialized from off-heap memory. We can depend on the Json plugin by first adding a dependency in our pom file:

     <dependency>
        <groupId>com.speedment.enterprise.plugins</groupId>
        <artifactId>json-stream</artifactId>
        <version>${speedment.enterprise.version}</version>
    </dependency>

Then, we install the plugin in the ApplicationBuilder as shown hereunder:


     SakilaApplication app = new SakilaApplicationBuilder()
        .withPassword("sakila-password")
        .withBundle(DataStoreBundle.class)
        // Install the Json Plugin
        .withBundle(JsonBundle.class)
        .build();

If we only want the Film fields “title”, “rating” and “length” in the json output, we can create a Json encoder like this:


    final JsonComponent json = app.getOrThrow(JsonComponent.class);

    final JsonEncoder<Film> filmEncoder = json.<Film>emptyEncoder()
        .put(Film.TITLE)
        .put(Film.RATING)
        .put(Film.LENGTH)
        .build();

This decoder is immutable and can be reused over and over again in our application:


     String json = films.stream()
        .filter(Film.RATING.equal("PG-13"))
        .sorted(Film.TITLE.comparator())
        .skip(50 * pageNo)
        .limit(50)
        .collect(JsonCollectors.toList(filmEncoder));

This gives us an additional speed up factor of 2 compared to dealing with entire entities. The JsonComponent can do much more than simply collecting things to a list. For example, it can  create aggregations, also using in-place de-serialization. 



Run Your Own Projects with In-JVM-Memory Acceleration

It is easy to try in-JVM-Memory acceleration by yourself. There is a free Initializer that can be found here. Just tick in your desired database type and you will get a POM and an application template automatically generated for you. You also need a license key to run. Just click “Request a Free Trial License Key” on the same page to get one. If you need more help setting up your project, check out the Speedment GitHub page or explore the manual.

How Fast is it for Real?

Speedment supports a number of database types including Oracle, MySQL, MariaDB, PostgreSQL, Microsoft SQL Server, DB2 and AS400. Speedment can also work with Avro files that is used by Hadoop. In this example, we will run MySQL.

Testing performance in Java application is notoriously difficult. Using the JMH framework, I have written a number of typical applications and have run each test hundreds of thousand times and compared the result for pure MySQL and MySQL with Speedment’s in-JVM accelerator. The performance figures below are given as operations/second (higher is better).


Benchmark
Pure MySQL
MySQL with Speedment in-JVM
Speed up factor
Count all
5,324
43,615,967
8,000
Count with filter
5,107
2,465,928
400
Filtering
449
597,702
1,300
Sorting
109
171,304
1,500
Paging
1,547
1,443,015
900
Iterate all
108
5,556
50
Aggregation
117
167,728
1,400
Aggregation filter
453
608,763
1,300

As can be seen, MySQL with Speedment In-JVM accelerator outperforms Pure MySQL by a factor of 1,000 or more in most cases. The smallest speed up factor observed was 50 times which is still very good.

Test Environment

MySQL, 5.7.16 standard installation, MySQL JDBC Driver 5.1.42, Oracle Java 1.8.0_131, Speedment Enterprise 1.1.10, macOS Sierra 10.12.6, Macbook Pro 2.2 GHz i7 (mid 2015), 16 GB RAM.

Benchmark Code

Below are some examples of how the benchmark code looks like. The complete benchmark application can be found on GitHub here. I encourage you to clone it an run it to see the speedup factor on your own target machine.
    @Benchmark
    public String paging() {
        return films.stream()
            .filter(Film.RATING.equal("PG-13"))
            .skip(50)
            .limit(50)
            .collect(filmCollector);
    }


    @Benchmark
    public String aggregationWithFilter() {
        return films.stream()
            .filter(Film.RATING.equal("PG-13"))
            .collect(sumLengthCollector);
    }

How Much RAM is Needed?

Speedment can often store data in RAM more efficiently than the database itself. The Sakila database in the benchmark takes 6.6 MB on disk but Speedment is only using 3 MB of memory. Considering that Speedment indexes all columns by default whereas the database only indexes a few columns, the Speedment is remarkably memory efficient.

How Long Time Does it Take to Load Data?

The Sakila database was loaded and indexed by Speedment in less than 1 second. Speedment can refresh data from the database in the background and will keep track of which streams are running against which database snapshot version (MVCC).

How Much Faster Will My Own Applications Run?

How much latency reduction one would get in a particular project is anyone's guess. Is is x10, x50, x100 or even more? Take the opportunity and find out how much speed you can gain in your own projects!

Take it for a Spin

Learn more about Speedment on GitHub and kick-start your own project using the Speedment Initializer and remember to tick “enable in-memory acceleration” and use the initializer to get a free evaluation license key too. Explore the manual section for the Speedment in-JVM accelerator here or use my twitter handle @PMinborg