SynopSys: Foundations for Multidimensional Graph Analytics

Michael Rudolf (1), Hannes Voigt (1), Christof Bornhoevd (2), and Wolfgang Lehner (1)

Business Intelligence for the Real-Time Enterprise (BIRTE 2014)

(1) Database Technology Group, Technische Universität Dresden
(2) SAP Labs, LLC, Palo Alto

September 1, 2014


Motivation: Big (Graph) Data

Peak Performance

Nov. 26, 2012: 26.5M items (306/sec)
Nov. 23, 2013: 36.8M items (426/sec)

645M users
135K new every day

58M tweets & 2.1G searches / day


Intensional vs. Extensional

• Schema & integrity constraints
• Created at design time by domain experts

• Collect lots of data first
• Try to deduce the intension
• "The Fourth Paradigm" [Mic09]

[Figure: timeline (axis: Time) with the labels "ETL", "Once & forever", and ". . ."]


The Property Graph Model

[Figure: example property graph. Products: 1 "Apple iPad MC707LL/A" (black, 64 GB), 2 "Apple iPhone 5" (black, 32 GB), 3 "Apple iPhone 4" (white, 16 GB). Categories: 4 "Consumer Electronics", 5 "Phones", 7 "Tablets". Customers: 8 "Freddy" (FR), 9 "Karl" (DE), 10 "Mike" (US), 11 "Steve" (US). Reviews: 12 (5/5 stars), 13 (5/5 stars), 14 (4/5 stars). Orders: 15 (delivered 24/02/14), 16 (ordered 24/02/14). Edge types: part of, in, authors, rates, likes, records, contains (with quantities 1, 2, 1).]

• Provides directed, attributed multi-relational graphs

• Attributes on vertices and edges as key-value pairs (instance-level instead of class-level)
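A minimal sketch of such a graph in Python, assuming a plain in-memory representation. The class name `PropertyGraph`, its methods, and the attribute keys (`type`, `name`, `color`, `storage`) are illustrative and not part of the talk; the two edges shown are also assumed for the example.

```python
from dataclasses import dataclass, field


@dataclass
class PropertyGraph:
    """Directed, attributed multigraph; attributes are per-instance key-value pairs."""
    vertices: dict = field(default_factory=dict)   # vertex id -> attribute dict
    edges: list = field(default_factory=list)      # (source id, target id, attribute dict)

    def add_vertex(self, vid, **attrs):
        self.vertices[vid] = dict(attrs)

    def add_edge(self, src, dst, **attrs):
        # Parallel edges are allowed, so edges live in a list rather than a set.
        self.edges.append((src, dst, dict(attrs)))

    def out_edges(self, vid, edge_type=None):
        """Return the outgoing edges of vid, optionally restricted to one edge type."""
        return [(s, d, a) for (s, d, a) in self.edges
                if s == vid and (edge_type is None or a.get("type") == edge_type)]


g = PropertyGraph()
g.add_vertex(1, type="product", name="Apple iPad MC707LL/A", color="black", storage="64 GB")
g.add_vertex(7, type="category", name="Tablets")
g.add_vertex(4, type="category", name="Consumer Electronics")
g.add_edge(1, 7, type="in")         # assumed edge, for illustration only
g.add_edge(7, 4, type="part of")    # assumed edge, for illustration only
```

Because attributes are attached to instances, two vertices of the same `type` may carry different keys; nothing in this sketch enforces a class-level schema.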


Agenda

Analytical Scenario: From Graphs to Cubes

Operations: Roll-up, Drill-down, Slice & Dice

Challenges: Unbalanced Hierarchies & OLAP Anomalies


Graph Cube

[Figure: the example property graph from the slide "The Property Graph Model"]

1. Identify facts

2. Specify dimensions

3. Define measures


Facts

Depending on the use case, a (base) fact can be

• a vertex attribute, an edge attribute, or

• the presence of an edge.

In general: a subgraph

→ Use pattern matching → graphical specification instead of DSL

Example

Match reviews of products and their authors (vertex types indicated via color)

[Figure: fact pattern with the edges "authors" and "rates"]
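The example pattern can be sketched as a small matcher in Python. This is only an illustration under assumptions: the edge directions (customer authors review, review rates product), the vertex `type` attribute, the function name `match_review_facts`, and all data values are hypothetical and not taken from the talk.

```python
# Vertices: id -> attributes; edges: (source, target, type). All data is illustrative.
vertices = {
    8: {"type": "customer", "name": "Freddy", "nationality": "FR"},
    9: {"type": "customer", "name": "Karl", "nationality": "DE"},
    12: {"type": "review", "stars": 5},
    14: {"type": "review", "stars": 4},
    1: {"type": "product", "name": "Apple iPad MC707LL/A"},
    2: {"type": "product", "name": "Apple iPhone 5"},
}
edges = [
    (8, 12, "authors"), (12, 1, "rates"),
    (9, 14, "authors"), (14, 2, "rates"),
    (9, 2, "likes"),   # not part of the pattern, so it produces no fact
]


def match_review_facts(vertices, edges):
    """Return one binding {'c', 'r', 'p'} per match of customer-authors->review-rates->product."""
    facts = []
    for c, r, t1 in edges:
        if t1 != "authors" or vertices[c]["type"] != "customer" or vertices[r]["type"] != "review":
            continue
        for r2, p, t2 in edges:
            if r2 == r and t2 == "rates" and vertices[p]["type"] == "product":
                facts.append({"c": c, "r": r, "p": p})
    return facts


facts = match_review_facts(vertices, edges)
```

Each binding is one base fact: a matched subgraph rather than a single attribute value.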


Dimensions

Dimensions can be

1. vertex or edge attributes

2. connectivity

[Figure: the example property graph from the slide "The Property Graph Model"]

Structure in Dimensions

• extrinsic: not contained in graph data, needs to be provided externally (e.g., GeoNames)

• intrinsic: embodied in graph data
    explicit: captured as topological information
    implicit: has to be derived from attribute values


Intrinsic Dimensions

Explicit Dimensions

• Can be specified using path expressions
• In general requires one path expression per level, e.g.

-[@type='belongsTo']->[@type='state']-[@type='partOf']->[@type='country']

Implicit Dimensions

• Might require bucketization
• In general requires one expression per level, e.g.

GetWeekOfYear(@ordered) and GetYear(@ordered)

Syntax

• @attribute: access to a vertex or edge attribute, optionally prefixed with an alias (e.g. $c@nationality)
• -[edge predicate]->[vertex predicate](length): paths (with optional recursion depth), optionally satisfying the predicates
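For an implicit dimension, the per-level expressions can be sketched in Python as follows. The helper names mirror `GetWeekOfYear` and `GetYear`, but reading them as ISO week and ISO year is an assumption; the talk does not define them.

```python
from datetime import date


def get_week_of_year(d):
    """ISO week number of a date (an assumed reading of GetWeekOfYear)."""
    return d.isocalendar()[1]


def get_year(d):
    """ISO year of the date, so that every week belongs to exactly one year."""
    return d.isocalendar()[0]


# The "ordered 24/02/14" attribute from the example graph, read as day/month/year.
ordered = date(2014, 2, 24)
levels = {"week": get_week_of_year(ordered), "year": get_year(ordered)}
```

Using the ISO year rather than the calendar year is a deliberate choice in this sketch: it keeps the week level functionally dependent on the year level, which a calendar year does not guarantee around New Year.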


Dimension Specification

Example

Name         Seed Pattern  Levels
Nationality  $c            $c@nationality
Category     $p            Product category: $p-[@type='in']->
                           Product group:    $p-[@type='in']->-[@type='part-of']->
                           Product area:     $p-[@type='in']->-[@type='part-of']->(2)

Seed Pattern

• Connects facts to dimensions
• Is matched against facts

→ Has to be a super pattern of the fact pattern (i.e., more general)
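A rough Python sketch of how the three Category levels could be evaluated from a seed vertex. The hierarchy edges, the `follow` helper, and the encoding of each level as (edge type, repetition) steps are assumptions for illustration; in particular, reading `(2)` as "repeat the preceding step twice" is one interpretation of the recursion depth.

```python
# Edges as (source, target, type); the concrete hierarchy is assumed for illustration.
edges = [
    ("iPhone 5", "Smartphones", "in"),
    ("Smartphones", "Phones", "part-of"),
    ("Phones", "Consumer Electronics", "part-of"),
]

# Each level is a list of (edge type, repetition count) steps starting at the seed $p.
category_levels = {
    "Product category": [("in", 1)],
    "Product group": [("in", 1), ("part-of", 1)],
    "Product area": [("in", 1), ("part-of", 2)],
}


def follow(start, steps, edges):
    """Return all vertices reached from start by following the steps in order."""
    frontier = {start}
    for edge_type, count in steps:
        for _ in range(count):
            frontier = {dst for (src, dst, t) in edges
                        if src in frontier and t == edge_type}
    return frontier


members = {level: follow("iPhone 5", steps, edges)
           for level, steps in category_levels.items()}
```

The seed `$p` is the only link between the fact and the dimension: every level is expressed relative to the product vertex bound by the fact pattern.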


Properties of Dimensions

Monotony

Levels should be ordered such that the number of items decreases.

Level  Name       # Elements
1      Region     125
2      Country    30
3      Continent  3

Hierarchy

Levels should form hierarchies. If two facts map to the same element in l_i, they should map to the same element in l_{i+1} as well.

→ Functional dependency

Fact  Level 1  Level 2  Level 3
A     Saxony   Germany  Europe
B     Saxony   Germany  Europe
C     Bavaria  Germany  Europe
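Both properties can be checked mechanically. The sketch below uses the fact-to-level mapping from the table above; the function names are hypothetical, and treating "decreases" as "does not increase" is an assumption made so that levels with equally many elements are accepted.

```python
# Fact-to-level mapping taken from the table above.
mapping = {
    "A": ("Saxony", "Germany", "Europe"),
    "B": ("Saxony", "Germany", "Europe"),
    "C": ("Bavaria", "Germany", "Europe"),
}


def is_monotone(mapping):
    """The number of distinct elements must not grow from level i to level i+1."""
    columns = list(zip(*mapping.values()))
    sizes = [len(set(col)) for col in columns]
    return all(a >= b for a, b in zip(sizes, sizes[1:]))


def is_hierarchy(mapping):
    """Functional dependency l_i -> l_(i+1): equal elements at level i imply equal parents."""
    rows = list(mapping.values())
    depth = len(rows[0])
    for i in range(depth - 1):
        parent = {}
        for row in rows:
            # setdefault records the first parent seen and returns it on later lookups.
            if parent.setdefault(row[i], row[i + 1]) != row[i + 1]:
                return False
    return True
```

A mapping in which "Saxony" appeared once under "Germany" and once under "France" would fail `is_hierarchy`, because roll-up results would then depend on which fact is consulted.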


Measures

A measure is a derived fact

• combining several facts

• computed by a specified function (e.g., scalar, aggregation).

→ Annotate the fact pattern
→ Introduce representative vertex

Example

• Average product rating by product category

• Minimum age of customers by nationality

[Figure: fact pattern over the vertices $c, $r, and $p with the edges authors ($a) and rates ($e), annotated with the measures (Min. Age, $c@age, MIN) and (Avg. Rtg., $r@stars, AVG)]
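The two example measures can be sketched in Python as (name, attribute accessor, aggregation function) triples, mirroring the annotations (Min. Age, $c@age, MIN) and (Avg. Rtg., $r@stars, AVG). The flat fact records, the `evaluate` helper, and all values are assumptions for illustration, not data from the talk.

```python
from collections import defaultdict
from statistics import mean

# One dict per matched fact; the values are illustrative, not taken from the talk.
facts = [
    {"category": "Tablets", "nationality": "FR", "stars": 5, "age": 31},
    {"category": "Phones", "nationality": "DE", "stars": 5, "age": 45},
    {"category": "Phones", "nationality": "US", "stars": 4, "age": 27},
    {"category": "Phones", "nationality": "US", "stars": 3, "age": 22},
]

# A measure is (name, accessor for the fact attribute, aggregation function).
avg_rating = ("Avg. Rtg.", lambda f: f["stars"], mean)
min_age = ("Min. Age", lambda f: f["age"], min)


def evaluate(measure, facts, group_key):
    """Group facts by group_key and aggregate the measure within each group."""
    _name, accessor, aggregate = measure
    groups = defaultdict(list)
    for fact in facts:
        groups[fact[group_key]].append(accessor(fact))
    return {key: aggregate(values) for key, values in groups.items()}


rating_by_category = evaluate(avg_rating, facts, "category")
age_by_nationality = evaluate(min_age, facts, "nationality")
```

Each key of the result corresponds to one group, which is where a representative vertex would carry the computed value as an attribute.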


Operations: Roll-up, Drill-down, Slice & Dice


Roll-up/Drill-down

Granularity of the Cube

• Represents the "grouping": the current levels of interest

• Initially: the lowest level of each dimension

Roll-up

• Reduces the granularity

• For dimension d, move up one level from l_i to l_{i+1}

Drill-down

• Increases the granularity

• For dimension d, move down one level from l_i to l_{i-1}

→ Introduce representative vertex for each group
→ Expose computed values for measures as attributes
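One way to picture the granularity is as a mapping from each dimension to the index of its current level. The sketch below follows that assumption; the `LEVELS` table reuses the dimension names from the earlier example, and the function names are hypothetical.

```python
# Levels per dimension, ordered from the lowest (finest) to the highest (coarsest).
LEVELS = {
    "Category": ["Product category", "Product group", "Product area"],
    "Nationality": ["Nationality"],
}


def initial_granularity(levels):
    """Start at the lowest level (index 0) of every dimension."""
    return {dim: 0 for dim in levels}


def roll_up(granularity, dim, levels=LEVELS):
    """Move dimension dim from l_i to l_(i+1); raise if already at the top level."""
    i = granularity[dim]
    if i + 1 >= len(levels[dim]):
        raise ValueError(f"{dim} is already at its highest level")
    return {**granularity, dim: i + 1}


def drill_down(granularity, dim):
    """Move dimension dim from l_i to l_(i-1); raise if already at the lowest level."""
    i = granularity[dim]
    if i == 0:
        raise ValueError(f"{dim} is already at its lowest level")
    return {**granularity, dim: i - 1}


g0 = initial_granularity(LEVELS)
g1 = roll_up(g0, "Category")      # Product category -> Product group
g2 = drill_down(g1, "Category")   # back to Product category
```

Both operations return a new granularity instead of mutating the old one, so a roll-up followed by a drill-down on the same dimension restores the previous grouping.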


Slice & Dice

Function filter transforms fact base of cube

• evaluates level-predicate pairs

• removes facts not matching the predicates

For a single predicate applied to one dimension → slice

Example

Slice product reviews by German customers from the cube c:

filter(c, {(Nationality, λ = "DE")}).
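A minimal Python sketch of such a filter, assuming each fact already carries its element for every level of interest. The name `filter_cube`, the record layout, and the data are illustrative only; the dice call uses the common OLAP meaning of predicates on several dimensions.

```python
# Each fact carries its element per dimension level; values are illustrative.
facts = [
    {"id": "f1", "Nationality": "DE", "Product category": "Phones"},
    {"id": "f2", "Nationality": "US", "Product category": "Phones"},
    {"id": "f3", "Nationality": "DE", "Product category": "Tablets"},
]


def filter_cube(facts, level_predicates):
    """Keep only the facts whose element at each given level satisfies its predicate."""
    return [fact for fact in facts
            if all(pred(fact[level]) for level, pred in level_predicates)]


# Slice: a single predicate on one dimension.
german = filter_cube(facts, [("Nationality", lambda x: x == "DE")])

# Dice: predicates on several dimensions at once.
german_phones = filter_cube(facts, [
    ("Nationality", lambda x: x == "DE"),
    ("Product category", lambda x: x == "Phones"),
])
```

The filter changes only the fact base; measures are then recomputed over the remaining facts at the current granularity.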


Challenges: Unbalanced Hierarchies & OLAP Anomalies


Unbalanced Hierarchies

Facts with different granularities

Relative dimension specification:

Product category: $p-[@type='in']->
Product group:    $p-[@type='in']->-[@type='part-of']->
Product area:     $p-[@type='in']->-[@type='part-of']->(2)

→ Absolute instead of relative dimension specification required

Example

Products in categories and groups

[Figure: products 15 "Google Nexus 5" (red, 16 GB) and 16 "Samsung E1200" (black); categories 4 "Cell Phones & Accessories", 5 "Phones", 6 "Computers & Accessories", 7 "Tablets", 12 "Smartphones"; edge types: in, part of]
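The problem with relative specifications can be reproduced in a few lines of Python. The edges below are an assumed reading of the figure (the Nexus 5 is filed one level deeper than the E1200); the helper names are hypothetical.

```python
# Assumed edges for illustration: the Nexus 5 sits one level deeper than the E1200.
edges = [
    ("Google Nexus 5", "Smartphones", "in"),
    ("Samsung E1200", "Phones", "in"),
    ("Smartphones", "Phones", "part-of"),
    ("Phones", "Cell Phones & Accessories", "part-of"),
]


def step(vertices, edge_type):
    """Follow one edge of the given type from every vertex in the set."""
    return {dst for (src, dst, t) in edges if src in vertices and t == edge_type}


def product_group(product):
    """Relative level: $p-[@type='in']->-[@type='part-of']->"""
    return step(step({product}, "in"), "part-of")


groups = {p: product_group(p) for p in ("Google Nexus 5", "Samsung E1200")}
```

The same relative expression yields "Phones" for one product and "Cell Phones & Accessories" for the other, so the "Product group" level mixes elements from different depths of the hierarchy.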


Unbalanced Hierarchies

Solution: Pre-process the graph

Data Cleansing

• Balance hierarchies

• Add missing root nodes

Tagging

• Add attributes for absolute referencing

[Figure: the unbalanced product hierarchy before pre-processing: products 15 "Google Nexus 5" (red, 16 GB) and 16 "Samsung E1200" (black); categories 4 "Cell Phones & Accessories", 5 "Phones", 6 "Computers & Accessories", 7 "Tablets", 12 "Smartphones"; edge types: in, part of]


[Figure, after data cleansing: the same graph with the added vertices 13 "Dumbphones" and 14 "Consumer Electronics" and additional "in" and "part of" edges]


[Figure, after tagging: category vertices carry an absolute level attribute: 4 "Cell Phones & Accessories": 3; 5 "Phones": 2; 6 "Computers & Accessories": 2; 7 "Tablets": 1; 12 "Smartphones": 1]
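Tagging could be sketched in Python as below. This assumes that the tag is a category's height above the leaf categories (leaves get 1), which is one reading of the figure; the edge lists and the function names are hypothetical.

```python
# "part-of" edges (child, parent); assumed to match the figure for illustration.
part_of = [
    ("Smartphones", "Phones"),
    ("Phones", "Cell Phones & Accessories"),
    ("Tablets", "Computers & Accessories"),
]
in_edges = {"Google Nexus 5": "Smartphones", "Samsung E1200": "Phones"}


def tag_levels(part_of):
    """Tag each category with its height: leaves get 1, parents 1 + max child tag."""
    children = {}
    for child, parent in part_of:
        children.setdefault(parent, []).append(child)
        children.setdefault(child, [])

    def height(v):
        return 1 + max((height(c) for c in children[v]), default=0)

    return {v: height(v) for v in children}


def ancestor_at_level(product, level, tags):
    """Absolute reference: climb from the product's category until the tag matches."""
    parent = dict(part_of)
    v = in_edges[product]
    while v is not None and tags[v] < level:
        v = parent.get(v)
    return v if v is not None and tags[v] == level else None


tags = tag_levels(part_of)
```

With absolute tags, both products resolve to "Phones" at level 2. The E1200 still has no level-1 category in this data, which is the kind of gap the data cleansing step is meant to close.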


OLAP Anomalies

It depends on the model:

• double counting can occur if a cardinality assumption is violated (1:1 vs. 1:n relationship)

[Figure: products 15 "Apple iPad Air" (128 GB) and 16 "Samsung E1200" (black), categories 7 "Tablets" and 5 "Phones", connected by "in" edges]

• incompleteness can occur if a connectivity assumption is violated

[Figure: the same products and categories, with fewer "in" edges]
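Both anomalies can be detected by checking the "exactly one category per product" assumption before aggregating. The sketch below combines the two cases in one hypothetical data set: the edges, the function names, and the result keys are assumptions, not taken from the figures.

```python
from collections import Counter

# Assumed "in" edges: the iPad Air is filed under two categories, the E1200 under none.
products = ["Apple iPad Air", "Samsung E1200"]
in_edges = [("Apple iPad Air", "Tablets"), ("Apple iPad Air", "Phones")]


def count_by_category(in_edges):
    """Naive aggregation: every "in" edge contributes one product to its category."""
    return Counter(category for _product, category in in_edges)


def find_anomalies(products, in_edges):
    """Report products violating the 'exactly one category' assumption."""
    degree = Counter(product for product, _category in in_edges)
    return {
        "double_counted": [p for p in products if degree[p] > 1],
        "missing": [p for p in products if degree[p] == 0],
    }


counts = count_by_category(in_edges)
anomalies = find_anomalies(products, in_edges)
```

Note that the naive per-category counts here still sum to the number of products, so comparing totals alone would not reveal either anomaly.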


Conclusion

Powerful Mapping of Multidimensional Analytics

• Expose well-known concepts and operations

• Emphasize challenges posed by graph data

→ Open up the graph world to Business Intelligence

Flexible Workflow for the Big Graph Data Era

• No up-front schema design

• Adapt to changing data and requirements

→ What is a fact today can be a dimension tomorrow


1 Additional Material & References


References

Chen Chen, Xifeng Yan, Feida Zhu, Jiawei Han, and Philip S. Yu. Graph OLAP: Towards Online Analytical Processing on Graphs. In Proceedings of the Eighth International Conference on Data Mining, pages 103-112, Pisa, Italy, December 2008. IEEE.

Microsoft Research. The Fourth Paradigm: Data-Intensive Scientific Discovery. Microsoft Press, 2009.

Marko A. Rodriguez and Peter Neubauer. Constructions from Dots and Lines. Bulletin of the American Society for Information Science and Technology, 36(6):35-41, 2010.

Yuanyuan Tian and Jignesh M. Patel. TALE: A Tool for Approximate Large Graph Matching. In 2008 IEEE 24th International Conference on Data Engineering, pages 963-972. IEEE, April 2008.

Peixiang Zhao, Xiaolei Li, Dong Xin, and Jiawei Han. Graph Cube: On Warehousing and OLAP Multidimensional Networks. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 853-864, Athens, Greece, 2011. ACM.

Ning Zhang, Yuanyuan Tian, and Jignesh M. Patel. Discovery-Driven Graph Summarization. In Proceedings of the 26th International Conference on Data Engineering, pages 880-891, Long Beach, CA, USA, 2010. IEEE.