MOLAP

Getting started

Getting started

This chapter will introduce the core features of MOLAP and is intended to help new users get started with analyzing their data.

Populating the table with data

A multidimensional data model is typically organized around a central theme, such as sales. This theme is represented by a table that we call DataFrame. The data for the data frame can typically come from CSV or Excel file, an SQL database, a Swing TableModel, or from an array as in the example below:

DataFrame<Integer,String,?> dataFrame = DataFrameFactory.fromRowMajorArray(
    new String[]{"Item", "Origin", "Quantity", "Price"},
    new Class[]{String.class, String.class, Integer.class, Double.class},
    new Object[][]{
            {"Apple", "Switzerland", 12, 112.0},
            {"Pear", "Switzerland", 24, 125.0},
            {"Pear", "Switzerland", 6, 130.0},
            {"Tomato", "Switzerland", 50, 122.0},
            {"Tomato", "Italy", 40, 44.0},
            {"Tomato", "Italy", 100, 80.0}
    }
);

Manipulating the data

MOLAP provides a good range of methods to modifying, slicing, joining, and rearranging data frames. All of this can be made elegantly, through an indexing mechanism that allows stitching data together similarly to what can be made with R or the Pandas libraries. We won't cover this in this introduction, but most of these methods can be found in the DataFrame interface as well its column and row elements that can be extracted as Series.

Aggregating the data

The cube models the data through a multidimensional data model that allows the efficient analysis of the data. The cube is automatically instanciated as soon as you use the data structure returned by DataFrame.aggregate().

Querying the highest level of abstraction

A cube at the highest level of abstraction is the apex cuboid. When a query is placed, this the the default cuboid being used.

Aggregation quantity = dataFrame.getSum("Quantity");
Aggregation revenue = dataFrame.getSum("Price").as("Revenue");
AggregateDataFrame<String> apexQuery = dataFrame.aggregate(quantity, revenue);

Table 1.1. Overall quantity and revenue

Sum(Quantity) Revenue
232.0 613.0

Drilling down one level

Drill Down/Up allows the user to navigate among levels of data ranging from the most summarized (up) to the most detailed (down).

AggregateDataFrame<String> byItemQuery = dataFrame.aggregate(quantity, revenue)
            .drillDown("Item");

Table 1.2. Quantities and revenues by item

Item Sum(Quantity) Revenue
Apple 12.0 112.0
Pear 30.0 255.0
Tomato 190.0 246.0

Order by a specific value

Sorting can be defined through one or more aggregation value.

AggregateDataFrame<String> orderQuery = dataFrame.aggregate(quantity, revenue)
            .drillDown("Item").order(revenue);

Table 1.3. Quantities and revenues by item ordered by quantity

Item Sum(Quantity) Revenue
Apple 12.0 112.0
Tomato 190.0 246.0
Pear 30.0 255.0

Deriving an existing query

An existing query can be further refined.

AggregateDataFrame<String> byOriginQuery = apexQuery.drillDown("Origin");

Table 1.4. Quantities and revenues by origin

Origin Sum(Quantity) Revenue
Switzerland 92.0 489.0
Italy 140.0 124.0

Drilling down multiple levels

Drilling can be done on any number of levels, the cube will adapt automatically.

AggregateDataFrame<String> byItemOriginQuery = byItemQuery.drillDown("Origin");

Table 1.5. Quantities and revenues by item and origin

Item Origin Sum(Quantity) Revenue
Apple Switzerland 12.0 112.0
Pear Switzerland 30.0 255.0
Tomato Switzerland 50.0 122.0
Tomato Italy 140.0 124.0

Alternate group by

A different drilling path should lead to the same results!

AggregateDataFrame<String> byOriginItemQuery = byOriginQuery.drillDown("Item");

Table 1.6. Quantities and revenues by origin and item

Origin Item Sum(Quantity) Revenue
Switzerland Apple 12.0 112.0
Switzerland Pear 30.0 255.0
Switzerland Tomato 50.0 122.0
Italy Tomato 140.0 124.0

Pivoting

Pivot allows an analyst to rotate the cube in space to see its various faces.

AggregateDataFrame<String> pivotQuery = byItemOriginQuery.pivot(revenue);

Table 1.7. Revenues by item and origin (pivot)

Item Italy Switzerland
Apple n.a. 112.0
Pear n.a. 255.0
Tomato 124.0 122.0

Adding summary values

Summary information (e.g. totals) can be added by navigating up the the cuboid hierarchy.

AggregateDataFrame<String> byItemWithTotalQuery = dataFrame.aggregate(quantity, revenue)
            .on(byItemQuery.getCuboid(), byItemQuery.getCuboid().drillUp());

Table 1.8. Quantities and revenues by item with totals

Item Sum(Quantity) Revenue
Apple 12.0 112.0
Pear 30.0 255.0
Tomato 190.0 246.0
(all) 232.0 613.0

All cuboids

A view of all the possible combinations provides interesting analytic possibilities.

AggregateDataFrame<String> cubeQuery = dataFrame.aggregate(quantity, revenue)
            .on(byItemOriginQuery.getCuboid(),
                    byItemQuery.getCuboid(),
                    byOriginQuery.getCuboid(),
                    apexQuery.getCuboid());

Table 1.9. Quantities and revenues by all combinations

Origin Item Sum(Quantity) Revenue
Switzerland Apple 12.0 112.0
Switzerland Pear 30.0 255.0
Switzerland Tomato 50.0 122.0
Italy Tomato 140.0 124.0
(all) Apple 12.0 112.0
(all) Pear 30.0 255.0
(all) Tomato 190.0 246.0
Switzerland (all) 92.0 489.0
Italy (all) 140.0 124.0
(all) (all) 232.0 613.0

Advanced statistics

The cube allow for computing advanced statistics without having to compute everything at every level.

Series unitPriceColumn = dataFrame.getColumn("Price")
            .divide(dataFrame.getColumn("Quantity"));
Series squareUnitPriceColumn = unitPriceColumn.pow(2.0)
            .multiply(dataFrame.getColumn("Quantity"));
Aggregation meanPrice = dataFrame.getMean("Price");
Aggregation stdDevPrice = dataFrame.getStdDev("Price");
Aggregation meanUnitPrice = revenue.dividedBy(quantity)
            .as("Mean unit price");
Aggregation stdDevUnitPrice = new StdDevAggregation(squareUnitPriceColumn,
            dataFrame.getColumn("Price"), dataFrame.getColumn("Quantity"))
            .as("Standard deviation of unit price");
AggregateDataFrame<String> statQuery = dataFrame
            .aggregate(meanPrice, stdDevPrice, meanUnitPrice, stdDevUnitPrice)
            .drillDown("Item")
            .drillDown("Origin");

Table 1.10. Mean and standard deviation of unit price by item

Item Origin Mean(Price) StdDev(Price) Mean unit price Standard deviation of unit price
Apple Switzerland 112.0 0.0 9.333333333333334 0.0
Pear Switzerland 127.5 2.5 8.5 6.583333333333334
Tomato Switzerland 122.0 0.0 2.44 0.0
Tomato Italy 62.0 18.0 0.8857142857142857 0.1355261854357885