MOLAP | Getting Started

Getting started

Getting started

Table of Contents

Populating the table with data
Manipulating the data
Aggregating the data
Querying the highest level of abstraction
Drilling down one level
Order by a specific value
Deriving an existing query
Drilling down multiple levels
Alternate group by
Pivoting
Adding summary values
All cuboids
Advanced statistics

This chapter will introduce the core features of MOLAP and is intended to help new users get started with analyzing their data.

Populating the table with data

A multidimensional data model is typically organized around a central theme, such as sales. This theme is represented by a table that we call DataFrame. The data for the data frame can typically come from CSV or Excel file, an SQL database, a Swing TableModel, or from an array as in the example below:

DataFrame<Integer,String,?> dataFrame = DataFrameFactory.fromRowMajorArray(
    new String[]{"Item", "Origin", "Quantity", "Price"},
    new Class[]{String.class, String.class, Integer.class, Double.class},
    new Object[][]{
            {"Apple", "Switzerland", 12, 112.0},
            {"Pear", "Switzerland", 24, 125.0},
            {"Pear", "Switzerland", 6, 130.0},
            {"Tomato", "Switzerland", 50, 122.0},
            {"Tomato", "Italy", 40, 44.0},
            {"Tomato", "Italy", 100, 80.0}
    }
);

Manipulating the data

MOLAP provides a good range of methods to modifying, slicing, joining, and rearranging data frames. All of this can be made elegantly, through an indexing mechanism that allows stitching data together similarly to what can be made with R or the Pandas libraries. We won't cover this in this introduction, but most of these methods can be found in the DataFrame interface as well its column and row elements that can be extracted as Series.

Aggregating the data

The cube models the data through a multidimensional data model that allows the efficient analysis of the data. The cube is automatically instanciated as soon as you use the data structure returned by DataFrame.aggregate().

Querying the highest level of abstraction

A cube at the highest level of abstraction is the apex cuboid. When a query is placed, this the the default cuboid being used.

Aggregation quantity = dataFrame.getSum("Quantity");
Aggregation revenue = dataFrame.getSum("Price").as("Revenue");
AggregateDataFrame<String> apexQuery = dataFrame.aggregate(quantity, revenue);

Table 1.1. Overall quantity and revenue

Sum(Quantity)	Revenue
232.0	613.0

Drilling down one level

Drill Down/Up allows the user to navigate among levels of data ranging from the most summarized (up) to the most detailed (down).

AggregateDataFrame<String> byItemQuery = dataFrame.aggregate(quantity, revenue)
            .drillDown("Item");

Table 1.2. Quantities and revenues by item

Item	Sum(Quantity)	Revenue
Apple	12.0	112.0
Pear	30.0	255.0
Tomato	190.0	246.0

Order by a specific value

Sorting can be defined through one or more aggregation value.

AggregateDataFrame<String> orderQuery = dataFrame.aggregate(quantity, revenue)
            .drillDown("Item").order(revenue);

Table 1.3. Quantities and revenues by item ordered by quantity

Item	Sum(Quantity)	Revenue
Apple	12.0	112.0
Tomato	190.0	246.0
Pear	30.0	255.0

Deriving an existing query

An existing query can be further refined.

AggregateDataFrame<String> byOriginQuery = apexQuery.drillDown("Origin");

Table 1.4. Quantities and revenues by origin

Origin	Sum(Quantity)	Revenue
Switzerland	92.0	489.0
Italy	140.0	124.0

Drilling down multiple levels

Drilling can be done on any number of levels, the cube will adapt automatically.

AggregateDataFrame<String> byItemOriginQuery = byItemQuery.drillDown("Origin");

Table 1.5. Quantities and revenues by item and origin

Item	Origin	Sum(Quantity)	Revenue
Apple	Switzerland	12.0	112.0
Pear	Switzerland	30.0	255.0
Tomato	Switzerland	50.0	122.0
Tomato	Italy	140.0	124.0

Alternate group by

A different drilling path should lead to the same results!

AggregateDataFrame<String> byOriginItemQuery = byOriginQuery.drillDown("Item");

Table 1.6. Quantities and revenues by origin and item

Origin	Item	Sum(Quantity)	Revenue
Switzerland	Apple	12.0	112.0
Switzerland	Pear	30.0	255.0
Switzerland	Tomato	50.0	122.0
Italy	Tomato	140.0	124.0

Pivoting

Pivot allows an analyst to rotate the cube in space to see its various faces.

AggregateDataFrame<String> pivotQuery = byItemOriginQuery.pivot(revenue);

Table 1.7. Revenues by item and origin (pivot)

Item	Italy	Switzerland
Apple	n.a.	112.0
Pear	n.a.	255.0
Tomato	124.0	122.0

Adding summary values

Summary information (e.g. totals) can be added by navigating up the the cuboid hierarchy.

AggregateDataFrame<String> byItemWithTotalQuery = dataFrame.aggregate(quantity, revenue)
            .on(byItemQuery.getCuboid(), byItemQuery.getCuboid().drillUp());

Table 1.8. Quantities and revenues by item with totals

Item	Sum(Quantity)	Revenue
Apple	12.0	112.0
Pear	30.0	255.0
Tomato	190.0	246.0
(all)	232.0	613.0

All cuboids

A view of all the possible combinations provides interesting analytic possibilities.

AggregateDataFrame<String> cubeQuery = dataFrame.aggregate(quantity, revenue)
            .on(byItemOriginQuery.getCuboid(),
                    byItemQuery.getCuboid(),
                    byOriginQuery.getCuboid(),
                    apexQuery.getCuboid());

Table 1.9. Quantities and revenues by all combinations

Origin	Item	Sum(Quantity)	Revenue
Switzerland	Apple	12.0	112.0
Switzerland	Pear	30.0	255.0
Switzerland	Tomato	50.0	122.0
Italy	Tomato	140.0	124.0
(all)	Apple	12.0	112.0
(all)	Pear	30.0	255.0
(all)	Tomato	190.0	246.0
Switzerland	(all)	92.0	489.0
Italy	(all)	140.0	124.0
(all)	(all)	232.0	613.0

Advanced statistics

The cube allow for computing advanced statistics without having to compute everything at every level.

Series unitPriceColumn = dataFrame.getColumn("Price")
            .divide(dataFrame.getColumn("Quantity"));
Series squareUnitPriceColumn = unitPriceColumn.pow(2.0)
            .multiply(dataFrame.getColumn("Quantity"));
Aggregation meanPrice = dataFrame.getMean("Price");
Aggregation stdDevPrice = dataFrame.getStdDev("Price");
Aggregation meanUnitPrice = revenue.dividedBy(quantity)
            .as("Mean unit price");
Aggregation stdDevUnitPrice = new StdDevAggregation(squareUnitPriceColumn,
            dataFrame.getColumn("Price"), dataFrame.getColumn("Quantity"))
            .as("Standard deviation of unit price");
AggregateDataFrame<String> statQuery = dataFrame
            .aggregate(meanPrice, stdDevPrice, meanUnitPrice, stdDevUnitPrice)
            .drillDown("Item")
            .drillDown("Origin");

Table 1.10. Mean and standard deviation of unit price by item

Item	Origin	Mean(Price)	StdDev(Price)	Mean unit price	Standard deviation of unit price
Apple	Switzerland	112.0	0.0	9.333333333333334	0.0
Pear	Switzerland	127.5	2.5	8.5	6.583333333333334
Tomato	Switzerland	122.0	0.0	2.44	0.0
Tomato	Italy	62.0	18.0	0.8857142857142857	0.1355261854357885