Parallel OLAP. Andrew Rau-Chaplin, Faculty of Computer Science, Dalhousie University. Joint work with F. Dehne, T. Eavis, S. Hambrusch.


Posted on 24-Dec-2015


Page 1

Parallel OLAP

Andrew Rau-Chaplin, Faculty of Computer Science, Dalhousie University

Joint work with F. Dehne, T. Eavis, S. Hambrusch

Page 2

Decision Support Systems

A time-oriented analysis of scientific or organizational data

Information Processing

Online Analytical Processing (OLAP)

Data Mining

Page 3

Data Warehousing for Decision Support

Operational data is collected into a data warehouse (DW)

The DW supports multidimensional views

These views form the basis of OLAP processing

Our focus: the OLAP server

[Figure: data warehouse architecture. External sources and operational databases pass through data cleaning and integration (extract, clean, transform, load, refresh) into the data warehouse and data marts (data storage), supported by monitoring, administration, and a metadata repository; OLAP servers (OLAP engines) feed front-end tools for analysis, query reports, and data mining (output).]

Page 4

Data Cube Generation

Proposed by Gray et al. in 1995

Can be generated from a relational DB, but…

[Figure: the cuboid ABC (or CAB) drawn as a 3-D cube over dimensions A, B, C, beside its lattice of cuboids: ABC; AB, AC, BC; A, B, C; ALL.]
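
The lattice in the figure can be generated mechanically: each cuboid is a subset of the d dimensions, so a cube has 2^d cuboids, and any cuboid can be computed from a parent that has exactly one extra dimension by aggregating that dimension away. A minimal Python sketch follows; the helper names `lattice`, `parents` and `name` are hypothetical, not taken from the talk.

```python
from itertools import combinations

DIMS = ("A", "B", "C")  # the three dimensions from the figure


def lattice(dims):
    """Map each level k (number of retained dimensions) to its cuboids.

    Cuboids are tuples of dimension names; the empty tuple is ALL.
    A cube over d dimensions has 2**d cuboids in total.
    """
    return {k: list(combinations(dims, k)) for k in range(len(dims), -1, -1)}


def parents(cuboid, dims):
    """Cuboids with exactly one extra dimension; `cuboid` can be computed
    from any of them by aggregating that extra dimension away."""
    return [tuple(d for d in dims if d in cuboid or d == extra)
            for extra in dims if extra not in cuboid]


def name(cuboid):
    """Print-friendly label: ('A', 'B') -> 'AB', () -> 'ALL'."""
    return "".join(cuboid) or "ALL"


if __name__ == "__main__":
    for k, cuboids in lattice(DIMS).items():
        print(k, [name(c) for c in cuboids])
    print("parents of C:", [name(p) for p in parents(("C",), DIMS)])
```

Running it lists the four levels (ABC; AB, AC, BC; A, B, C; ALL) and shows that C can be aggregated from either AC or BC; picking the cheaper parent is one of the choices a cube-generation algorithm has to make.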

Page 5

Core OLAP Operations

Five fundamental OLAP operations: roll-up, drill-down, slice, dice, and pivot

Range queries
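
As an illustration only, the operations can be sketched over the six-row Model/Year/Colour sales table from the MOLAP vs. ROLAP slide. The helpers (`roll_up`, `slice_`, `dice`, `pivot`, `range_sum`) are hypothetical names; a ROLAP server would typically express the same operations as SQL GROUP BY and WHERE clauses. Drill-down is the inverse of roll-up (moving back to a cuboid with more dimensions), so it is not coded separately.

```python
from collections import defaultdict

# Base table from the MOLAP vs. ROLAP slide: (model, year, colour, sales).
FACTS = [
    ("Chevy", 1990, "Red", 5), ("Chevy", 1990, "Blue", 87),
    ("Ford", 1990, "Green", 64), ("Ford", 1990, "Blue", 99),
    ("Ford", 1991, "Red", 8), ("Ford", 1991, "Blue", 7),
]
DIMS = ("model", "year", "colour")


def roll_up(rows, keep):
    """Sum sales over every dimension not named in `keep`.
    Drill-down is the same call with more dimensions kept."""
    idx = [DIMS.index(d) for d in keep]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)


def slice_(rows, dim, value):
    """Slice: fix one dimension to a single value."""
    i = DIMS.index(dim)
    return [r for r in rows if r[i] == value]


def dice(rows, **allowed):
    """Dice: keep rows whose value lies in the given set for each named dimension."""
    return [r for r in rows
            if all(r[DIMS.index(d)] in vals for d, vals in allowed.items())]


def pivot(rows, row_dim, col_dim):
    """Pivot: cross-tabulate sales with row_dim as rows and col_dim as columns."""
    grid = defaultdict(dict)
    for (r, c), s in roll_up(rows, (row_dim, col_dim)).items():
        grid[r][c] = s
    return dict(grid)


def range_sum(rows, dim, lo, hi):
    """Range query: total sales where lo <= value of `dim` <= hi."""
    i = DIMS.index(dim)
    return sum(r[-1] for r in rows if lo <= r[i] <= hi)


if __name__ == "__main__":
    print(roll_up(FACTS, ("model",)))
    print(slice_(FACTS, "year", 1991))
    print(dice(FACTS, model={"Ford"}, colour={"Blue", "Red"}))
    print(pivot(FACTS, "model", "year"))
    print(range_sum(FACTS, "year", 1990, 1990))
```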

Page 6

The Challenge

Design and build a parallel ROLAP system for:

Full cube generation

Partial cube generation

Indexing and query resolution

Targeting:

High dimensionality: 10–30 dimensions

Large input data sizes: gigabytes

Large output data sizes: terabytes

Implications: parallel + external-memory algorithms, for both shared-disk and shared-nothing architectures
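
The jump from gigabytes of input to terabytes of output follows from the cube's structure: d dimensions give 2^d cuboids, and every cuboid can hold up to as many rows as the input. The sketch below makes that arithmetic explicit; the helper names are hypothetical, and the example cardinalities (2 models, 2 years, 3 colours) are those of the small Model/Year/Colour table used later in the talk.

```python
from itertools import combinations
from math import prod


def cuboid_count(d):
    """A full cube over d dimensions has one cuboid per subset of dimensions."""
    return 2 ** d


def cube_row_bound(n, cardinalities):
    """Upper bound on the rows of a full cube built from n input rows.

    A cuboid can have no more rows than the input (n) and no more than the
    number of distinct keys its dimensions allow (product of cardinalities).
    This enumerates all 2**d cuboids, so it is only practical for small d.
    """
    total = 0
    for k in range(len(cardinalities) + 1):
        for cards in combinations(cardinalities, k):
            total += min(n, prod(cards))  # prod(()) == 1 for the ALL cuboid
    return total


if __name__ == "__main__":
    for d in (10, 20, 30):
        print(f"d = {d}: {cuboid_count(d):,} cuboids")
    print(cube_row_bound(6, (2, 2, 3)))
```

At d = 30 there are already 1,073,741,824 cuboids. For the six-row example the bound is 30 rows, close to the 27 rows the full cube actually has; for high-dimensional, sparse data the min(n, ...) term is usually n, which is what drives the output toward terabytes.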

Page 7

The Architectural Model

Shared Disk

A set of p processors connected via an interconnection fabric

Standard-sized local memory

Concurrent access to a shared disk array

Shared Nothing

A set of p processors connected via an interconnection fabric

Standard-sized local memory

Independent local disk(s)

Algorithm design: CGM (Coarse Grained Multicomputer)

[Figure: for each model, processors p1, p2, p3, p4, …, pn attached to a communication fabric.]

Page 8

Coarse Grained Multicomputer

A set of p processors

Arbitrary communication topology or shared memory

Local memory of size m per processor, with m >> p

Each communication round is an h-relation in which every processor sends and receives O(m) data

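
CGM algorithms alternate local computation with a small number of communication rounds. As a generic illustration (a standard CGM-style sort by regular sampling, not an algorithm from this talk), the sketch below simulates p processors as Python lists and marks each round; sorting is a natural example because many ROLAP cube-generation methods are sort-based. It assumes, hypothetically, that processor 0 can hold all p^2 samples, i.e. m >= p^2.

```python
import random
from bisect import bisect_right


def cgm_sample_sort(local_data):
    """Simulated CGM sort by regular sampling over p processors.

    local_data[i] is the block held by processor i. Only data movement is
    modelled; communication rounds are marked in the comments.
    Returns the p output blocks, which concatenate to sorted order.
    """
    p = len(local_data)
    # Local computation: each processor sorts its block and takes p regular samples.
    blocks = [sorted(b) for b in local_data]
    samples = [b[j * len(b) // p] for b in blocks if b for j in range(p)]
    if not samples:
        return [[] for _ in range(p)]

    # Round 1 (gather): all samples go to processor 0, which needs room for
    # p*p of them (the m >= p^2 assumption above).
    samples.sort()
    splitters = [samples[k * len(samples) // p] for k in range(1, p)]
    # Round 2 (broadcast): the p - 1 splitters are sent to every processor.

    # Round 3 (h-relation): each item goes to the processor owning its key range.
    outbox = [[[] for _ in range(p)] for _ in range(p)]  # outbox[src][dst]
    for src, block in enumerate(blocks):
        for x in block:
            outbox[src][bisect_right(splitters, x)].append(x)

    # Local computation: each processor sorts the items it received.
    return [sorted(x for src in range(p) for x in outbox[src][dst])
            for dst in range(p)]


if __name__ == "__main__":
    random.seed(1)
    data = [[random.randrange(1000) for _ in range(25)] for _ in range(4)]
    result = cgm_sample_sort(data)
    print([len(b) for b in result])
    print([x for b in result for x in b] == sorted(x for b in data for x in b))
```

The number of rounds is constant regardless of n; only the routing round moves bulk data. How evenly that round is received depends on the splitters, and this sketch does not enforce a worst-case bound.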

Page 9

MOLAP vs. ROLAP

Full data cube in relational (ROLAP) form; ALL marks a dimension that has been aggregated out:

Model  Year  Colour  Sales
Chevy  1990  Blue       87
Chevy  1990  Red         5
Chevy  1990  ALL        92
Chevy  ALL   Blue       87
Chevy  ALL   Red         5
Chevy  ALL   ALL        92
Ford   1990  Blue       99
Ford   1990  Green      64
Ford   1990  ALL       163
Ford   1991  Blue        7
Ford   1991  Red         8
Ford   1991  ALL        15
Ford   ALL   Blue      106
Ford   ALL   Green      64
Ford   ALL   Red         8
Ford   ALL   ALL       178
ALL    1990  Blue      186
ALL    1990  Green      64
ALL    1990  Red         5
ALL    1991  Blue        7
ALL    1991  Red         8
ALL    1990  ALL       255
ALL    1991  ALL        15
ALL    ALL   Blue      193
ALL    ALL   Green      64
ALL    ALL   Red        13
ALL    ALL   ALL       270

Base table:

Model  Year  Colour  Sales
Chevy  1990  Red         5
Chevy  1990  Blue       87
Ford   1990  Green      64
Ford   1990  Blue       99
Ford   1991  Red         8
Ford   1991  Blue        7
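
The contrast in the slide's title can be made concrete. A MOLAP server typically stores the cube as a dense multidimensional array, while a ROLAP server stores cube cells as relational rows like those above, with ALL marking a dimension that has been aggregated out. The sketch below (hypothetical helpers `rolap_cube` and `molap_cells`) derives the relational cube from the six-row base table by aggregating over every subset of {Model, Year, Colour}, and counts the cells a dense array with one extra ALL slot per dimension would reserve.

```python
from collections import defaultdict
from itertools import combinations
from math import prod

ALL = "ALL"

# Base table: (model, year, colour, sales).
FACTS = [
    ("Chevy", 1990, "Red", 5), ("Chevy", 1990, "Blue", 87),
    ("Ford", 1990, "Green", 64), ("Ford", 1990, "Blue", 99),
    ("Ford", 1991, "Red", 8), ("Ford", 1991, "Blue", 7),
]


def rolap_cube(facts, d):
    """Naive full cube: for every subset of the first d columns, scan the
    base table and sum the measure (column d), writing ALL in each
    aggregated-out position. Returns {cell key: total}."""
    cube = defaultdict(int)
    for k in range(d + 1):
        for kept in combinations(range(d), k):
            for row in facts:
                key = tuple(row[i] if i in kept else ALL for i in range(d))
                cube[key] += row[d]
    return dict(cube)


def molap_cells(facts, d):
    """Cells in a dense MOLAP array with one extra ALL slot per dimension."""
    return prod(len({row[i] for row in facts}) + 1 for i in range(d))


if __name__ == "__main__":
    cube = rolap_cube(FACTS, 3)
    for key, sales in sorted(cube.items(), key=lambda kv: str(kv[0])):
        print(*key, sales)
    print(len(cube), "ROLAP rows vs", molap_cells(FACTS, 3), "MOLAP cells")
```

For this data the sketch yields 27 relational rows against 36 dense-array cells; with more dimensions and sparser data, the empty cells a dense array reserves typically grow much faster than the rows ROLAP has to store.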

Page 10

Existing Parallel Results

Goil & Choudhary: MOLAP approach

Parallelize the generation of each cuboid

Challenge: more than 2^d communication rounds

Page 11

Parallelizing the Data Cube

Generating Data Cubes (Shared Disk)

Generating Data Cubes (Shared Nothing)

Generating Partial Data Cubes

Parallel Multi-dimensional Indexing

Conclusions and Future Work