Table of Contents
This chapter will introduce the core features of MOLAP and is intended to help new users get started with analyzing their data.
A multidimensional data model is typically organized around a central theme, such as
sales. This theme is represented by a table that we call DataFrame
. The
data for the data frame can typically come from CSV or Excel file, an SQL database, a
Swing TableModel, or from an array as in the example below:
DataFrame<Integer,String,?> dataFrame = DataFrameFactory.fromRowMajorArray( new String[]{"Item", "Origin", "Quantity", "Price"}, new Class[]{String.class, String.class, Integer.class, Double.class}, new Object[][]{ {"Apple", "Switzerland", 12, 112.0}, {"Pear", "Switzerland", 24, 125.0}, {"Pear", "Switzerland", 6, 130.0}, {"Tomato", "Switzerland", 50, 122.0}, {"Tomato", "Italy", 40, 44.0}, {"Tomato", "Italy", 100, 80.0} } );
MOLAP provides a good range of methods to modifying, slicing, joining, and rearranging
data frames. All of this can be made elegantly, through an indexing mechanism that
allows stitching data together similarly to what can be made with
R or the Pandas libraries. We
won't cover this in this introduction, but most of these methods can be found in the
DataFrame
interface as well its column and row elements that can be
extracted as Series
.
The cube models the data through a multidimensional data model that allows the
efficient analysis of the data. The cube is automatically instanciated as soon as you use
the data structure returned by DataFrame.aggregate()
.
A cube at the highest level of abstraction is the apex cuboid. When a query is placed, this the the default cuboid being used.
Aggregation quantity = dataFrame.getSum("Quantity"); Aggregation revenue = dataFrame.getSum("Price").as("Revenue"); AggregateDataFrame<String> apexQuery = dataFrame.aggregate(quantity, revenue);
Drill Down/Up allows the user to navigate among levels of data ranging from the most summarized (up) to the most detailed (down).
AggregateDataFrame<String> byItemQuery = dataFrame.aggregate(quantity, revenue) .drillDown("Item");
Table 1.2. Quantities and revenues by item
Item | Sum(Quantity) | Revenue |
---|---|---|
Apple | 12.0 | 112.0 |
Pear | 30.0 | 255.0 |
Tomato | 190.0 | 246.0 |
Sorting can be defined through one or more aggregation value.
AggregateDataFrame<String> orderQuery = dataFrame.aggregate(quantity, revenue) .drillDown("Item").order(revenue);
Table 1.3. Quantities and revenues by item ordered by quantity
Item | Sum(Quantity) | Revenue |
---|---|---|
Apple | 12.0 | 112.0 |
Tomato | 190.0 | 246.0 |
Pear | 30.0 | 255.0 |
An existing query can be further refined.
AggregateDataFrame<String> byOriginQuery = apexQuery.drillDown("Origin");
Table 1.4. Quantities and revenues by origin
Origin | Sum(Quantity) | Revenue |
---|---|---|
Switzerland | 92.0 | 489.0 |
Italy | 140.0 | 124.0 |
Drilling can be done on any number of levels, the cube will adapt automatically.
AggregateDataFrame<String> byItemOriginQuery = byItemQuery.drillDown("Origin");
Table 1.5. Quantities and revenues by item and origin
Item | Origin | Sum(Quantity) | Revenue |
---|---|---|---|
Apple | Switzerland | 12.0 | 112.0 |
Pear | Switzerland | 30.0 | 255.0 |
Tomato | Switzerland | 50.0 | 122.0 |
Tomato | Italy | 140.0 | 124.0 |
A different drilling path should lead to the same results!
AggregateDataFrame<String> byOriginItemQuery = byOriginQuery.drillDown("Item");
Table 1.6. Quantities and revenues by origin and item
Origin | Item | Sum(Quantity) | Revenue |
---|---|---|---|
Switzerland | Apple | 12.0 | 112.0 |
Switzerland | Pear | 30.0 | 255.0 |
Switzerland | Tomato | 50.0 | 122.0 |
Italy | Tomato | 140.0 | 124.0 |
Pivot allows an analyst to rotate the cube in space to see its various faces.
AggregateDataFrame<String> pivotQuery = byItemOriginQuery.pivot(revenue);
Table 1.7. Revenues by item and origin (pivot)
Item | Italy | Switzerland |
---|---|---|
Apple | n.a. | 112.0 |
Pear | n.a. | 255.0 |
Tomato | 124.0 | 122.0 |
Summary information (e.g. totals) can be added by navigating up the the cuboid hierarchy.
AggregateDataFrame<String> byItemWithTotalQuery = dataFrame.aggregate(quantity, revenue) .on(byItemQuery.getCuboid(), byItemQuery.getCuboid().drillUp());
Table 1.8. Quantities and revenues by item with totals
Item | Sum(Quantity) | Revenue |
---|---|---|
Apple | 12.0 | 112.0 |
Pear | 30.0 | 255.0 |
Tomato | 190.0 | 246.0 |
(all) | 232.0 | 613.0 |
A view of all the possible combinations provides interesting analytic possibilities.
AggregateDataFrame<String> cubeQuery = dataFrame.aggregate(quantity, revenue) .on(byItemOriginQuery.getCuboid(), byItemQuery.getCuboid(), byOriginQuery.getCuboid(), apexQuery.getCuboid());
Table 1.9. Quantities and revenues by all combinations
Origin | Item | Sum(Quantity) | Revenue |
---|---|---|---|
Switzerland | Apple | 12.0 | 112.0 |
Switzerland | Pear | 30.0 | 255.0 |
Switzerland | Tomato | 50.0 | 122.0 |
Italy | Tomato | 140.0 | 124.0 |
(all) | Apple | 12.0 | 112.0 |
(all) | Pear | 30.0 | 255.0 |
(all) | Tomato | 190.0 | 246.0 |
Switzerland | (all) | 92.0 | 489.0 |
Italy | (all) | 140.0 | 124.0 |
(all) | (all) | 232.0 | 613.0 |
The cube allow for computing advanced statistics without having to compute everything at every level.
Series unitPriceColumn = dataFrame.getColumn("Price") .divide(dataFrame.getColumn("Quantity")); Series squareUnitPriceColumn = unitPriceColumn.pow(2.0) .multiply(dataFrame.getColumn("Quantity")); Aggregation meanPrice = dataFrame.getMean("Price"); Aggregation stdDevPrice = dataFrame.getStdDev("Price"); Aggregation meanUnitPrice = revenue.dividedBy(quantity) .as("Mean unit price"); Aggregation stdDevUnitPrice = new StdDevAggregation(squareUnitPriceColumn, dataFrame.getColumn("Price"), dataFrame.getColumn("Quantity")) .as("Standard deviation of unit price"); AggregateDataFrame<String> statQuery = dataFrame .aggregate(meanPrice, stdDevPrice, meanUnitPrice, stdDevUnitPrice) .drillDown("Item") .drillDown("Origin");
Table 1.10. Mean and standard deviation of unit price by item
Item | Origin | Mean(Price) | StdDev(Price) | Mean unit price | Standard deviation of unit price |
---|---|---|---|---|---|
Apple | Switzerland | 112.0 | 0.0 | 9.333333333333334 | 0.0 |
Pear | Switzerland | 127.5 | 2.5 | 8.5 | 6.583333333333334 |
Tomato | Switzerland | 122.0 | 0.0 | 2.44 | 0.0 |
Tomato | Italy | 62.0 | 18.0 | 0.8857142857142857 | 0.1355261854357885 |