

Page 1: Presentation

A New OLAP Aggregation Based on the AHC Technique

DOLAP 2004

R. Ben Messaoud, O. Boussaid, S. Rabaséda

Laboratoire ERIC – Université de Lyon 2
5, avenue Pierre-Mendès-France
69676, Bron Cedex – France
http://eric.univ-lyon2.fr

Page 2: Presentation

November 13, 2004 Ben Messaoud et al. 2

Complex data


Definition:

Data are considered complex if they are …

Multi-formats: information can be carried by different kinds of data (numeric, symbolic, texts, images, sounds, videos …)

Multi-structures: structured, unstructured or semi-structured (relational databases, XML documents …)

Multi-sources: data come from different sources (distributed databases, web …)

Multi-modals: the same information can be described differently (data in different languages …)

Multi-versions: data are updated through time (temporal databases, periodical inventory …)

Page 3: Presentation


General context


Complex data:
Huge volumes of complex data
Warehousing complex data …
OLAP facts as complex objects

Analyze complex data:
Current OLAP tools aren’t suited to process complex data
Data mining is able to process complex data like images, texts, videos …

Coupling OLAP and data mining:
Analyze complex data on-line
New operator OpAC: Operator of Aggregation by Clustering (AHC)

[Diagram labels: Data mining, OLAP, Complex data, MDBMS, OpAC]

Page 4: Presentation


Outline

Complex data and general context

Related work: Coupling OLAP and data mining

Objectives of the proposed operator

Formalization of the operator

Implementation and demonstration

Conclusion and future works


Page 5: Presentation


Related work

Three approaches for coupling OLAP and data mining:
First approach: Extending the query languages of decision support systems
Second approach: Adapting multidimensional environment to classical data mining techniques
Third approach: Adapting data mining methods for multidimensional data

[Diagram labels: Data mining, OLAP, DBMS, First approach, Second approach, Third approach]

Page 6: Presentation


Related work

These works proved that:
Associating data mining with OLAP is a promising way to involve rich analysis tasks
Data mining is able to extend the analysis power of OLAP

Use data mining to enhance OLAP tools in order to process complex data

OpAC: A new OLAP operator based on a data mining technique

[Diagram labels: Data mining, OLAP, OpAC]

Page 7: Presentation


Objectives

Classic OLAP aggregation vs. OpAC aggregation

Classic OLAP:
Summarizes numerical data into a smaller number of values
Computes additive measures (Sum, Average, Max, Min …)

Example: Sales cube

Detailed view (states expanded):

                      Sales    Count
- Washington          $2520    120
    + Bellingham      $700     32
    + Bremerton       $400     20
    + Olympia         $850     44
    + Redmond         $250     9
    + Seattle         $320     15
- California          $2410    129
    + Berkeley        $820     41
    + Beverly Hills   $910     50
    + Los Angeles     $680     38

Aggregated view (states collapsed):

                      Sales    Count
+ Washington          $2520    120
+ California          $2410    129
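The roll-up in the Sales cube example can be reproduced with a few lines of Python. This is only an illustrative sketch: the names `FACTS` and `roll_up` are hypothetical, and the pairing of each city with its Sales and Count values is read off the order of the figures on the slide.

```python
from collections import defaultdict

# City-level facts from the Sales cube example: (state, city, sales in $, count).
# The city/value pairing is assumed from the order of the values on the slide.
FACTS = [
    ("Washington", "Bellingham", 700, 32),
    ("Washington", "Bremerton", 400, 20),
    ("Washington", "Olympia", 850, 44),
    ("Washington", "Redmond", 250, 9),
    ("Washington", "Seattle", 320, 15),
    ("California", "Berkeley", 820, 41),
    ("California", "Beverly Hills", 910, 50),
    ("California", "Los Angeles", 680, 38),
]


def roll_up(facts):
    """Aggregate city-level facts to the state level by summing both measures."""
    totals = defaultdict(lambda: {"sales": 0, "count": 0})
    for state, _city, sales, count in facts:
        totals[state]["sales"] += sales
        totals[state]["count"] += count
    return dict(totals)


if __name__ == "__main__":
    for state, measures in roll_up(FACTS).items():
        print(state, measures)
```

The sum works here because Sales and Count are additive numeric measures; nothing comparable exists for an image or a text, which is the gap OpAC addresses.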


Page 8: Presentation


Objectives

Classic OLAP aggregation vs. OpAC aggregation

OpAC aggregation:
What about aggregating complex objects?
How to aggregate images, texts or videos with classic OLAP tools?
Complex objects are not additive OLAP measures …

Example: Images cube

Images            Size      ASM
Orange coral      3560px    0.016
Nebraska, USA     2340px    0.021
Toco toucan       4434px    0.014
Maldives          3260px    0.012

Aggregate of these images: ?

Page 9: Presentation


Objectives

How to aggregate complex objects?

Using a data mining technique: AHC (Agglomerative Hierarchical Clustering)

The AHC aggregates data

The hierarchical aspect of the AHC

Page 10: Presentation

Objectives

[Figure: images cube with the dimensions Images, Entropy and Homogeneity. The Entropy and Homogeneity dimensions each have the modalities Very high, High, Medium, Low, Very low. Highlighted cells: L1Normalized for high homogeneity, L1Normalized for low entropy.]

Page 11: Presentation


Formalization


$D_i$: the $i$th dimension of a data cube $C$
$h_{ij}$: the $j$th hierarchical level of the dimension $D_i$
$g_{ijt}$: the $t$th modality of $h_{ij}$

The set of individuals:
$\Omega = \{\, g_{ijt} \mid g_{ijt} \in h_{ij} \,\}$

The set of variables:
$\Sigma = \{\, X \mid X(g_{ijt}) = \text{Measure of } g_{srv} \text{ crossed with } g_{ijt} \,\}$
where $g_{srv} \in h_{sr}$, $s \neq i$, and $r$ is unique for each $s$

Dimension retained for individuals can’t generate variables ($s \neq i$)
Only one hierarchical level of a dimension is allowed to generate variables ($r$ is unique for each $s$)
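The construction of individuals and variables can be illustrated with a small sketch. Everything here is an assumption made for the example: the facts are invented, `build_table` is a hypothetical helper, each dimension is taken at a single hierarchical level, and the measure "crossed with" an individual is taken to be the sum of the matching cells.

```python
from collections import defaultdict

# Hypothetical facts of a small images cube:
# (image, entropy level, homogeneity level, measure value).
FACTS = [
    ("img1", "Low", "High", 2.0),
    ("img1", "High", "High", 1.0),
    ("img2", "Low", "Low", 4.0),
    ("img2", "Low", "High", 3.0),
]


def build_table(facts, individual_axis, variable_axes):
    """Build the individuals x variables table fed to the clustering.

    individual_axis: position of the dimension whose modalities are the
        individuals (index i in the formalization).
    variable_axes: positions of the dimensions that generate variables
        (each s != i); one level per dimension, so r is unique for each s.
    """
    if individual_axis in variable_axes:
        raise ValueError("the dimension retained for individuals cannot generate variables")
    cells = defaultdict(float)
    individuals, variables = [], []
    for *coords, value in facts:
        g = coords[individual_axis]
        if g not in individuals:
            individuals.append(g)
        for s in variable_axes:
            var = (s, coords[s])  # one variable per modality of dimension s
            if var not in variables:
                variables.append(var)
            cells[(g, var)] += value  # assumed aggregation: sum of the measure
    rows = [[cells[(g, var)] for var in variables] for g in individuals]
    return individuals, variables, rows


if __name__ == "__main__":
    individuals, variables, rows = build_table(FACTS, 0, [1, 2])
    print(variables)
    for name, row in zip(individuals, rows):
        print(name, row)
```

Each row describes one individual by its measure values across the modalities of the other dimensions, which is the numeric table a clustering method needs.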

Page 12: Presentation


Formalization

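Evaluation tools

Minimize the intra-cluster distances
Maximize the inter-cluster distances

Inter and intra-cluster inertia

$A_1, A_2, \dots, A_k$ is a partition of the set of individuals $\Omega$
$P(A_i)$ is the weight of $A_i$
$G(A_i)$ is the gravity center of $A_i$

$I_{intra}(k) = \sum_{i=1}^{k} I(A_i)$

$I_{inter}(k) = \sum_{i=1}^{k} P(A_i)\, d(G(A_i), G)$

where $I(A_i)$ is the inertia of $A_i$, $G$ is the gravity center of $\Omega$, and $d$ is the chosen distance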
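The two inertia indicators can be computed as in the sketch below. It rests on assumptions the slides do not state: every individual has weight 1 (so the weight of a cluster is its size), the distance is the squared Euclidean distance, and the inertia of a cluster is the sum of the distances of its points to its gravity center. All function names are hypothetical.

```python
def centroid(points):
    """Gravity center of a list of equally weighted points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def sq_dist(a, b):
    """Squared Euclidean distance (assumed choice of d)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def inertia(points):
    """Inertia of a cluster: distances of its points to its gravity center."""
    g = centroid(points)
    return sum(sq_dist(p, g) for p in points)


def intra_inertia(partition):
    """Sum of the inertias of the clusters of the partition."""
    return sum(inertia(cluster) for cluster in partition)


def inter_inertia(partition):
    """Weighted distances of the cluster centers to the global gravity center."""
    everything = [p for cluster in partition for p in cluster]
    g = centroid(everything)
    return sum(len(cluster) * sq_dist(centroid(cluster), g) for cluster in partition)


if __name__ == "__main__":
    partition = [[(0, 0), (2, 0)], [(10, 0), (12, 0)]]
    print("intra:", intra_inertia(partition))
    print("inter:", inter_inertia(partition))
```

Under these assumptions the two values add up to the total inertia of the whole set, so a partition that lowers the intra-cluster inertia raises the inter-cluster inertia by the same amount.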

Page 13: Presentation

Formalization

Individuals: Modalities from the dimension of images

Variables:
L1Normalized values of images for all possible modalities of the entropy dimension
L1Normalized values of images for all possible modalities of the homogeneity dimension

[Figure: images cube with the dimensions Entropy and Homogeneity, each with the modalities Very high, High, Medium, Low, Very low]

[Plot: inter-cluster and intra-cluster inertia curves, vertical axis from 0 to 500, horizontal axis from 7 down to 1]

Page 14: Presentation


Formalization

Results:

Exploits the cube’s facts describing images to construct groups of similar complex objects

Highlights significant groups of objects by a clustering technique

Clusters (aggregates) are defined from both dimensions and measures of a data cube

Implementation of a prototype


Page 15: Presentation


Implementation

Prototype:

Data loading module:
Connects to a data cube on Analysis Services of MS SQL Server
Uses MDX queries to import information about the cube’s structure
Extracts data selected by the user

Parameter setting interface:
Assists the user to extract individuals and variables from the cube
Selects modalities and measures
Defines the clustering problem

Clustering module:
Allows the definition of the clustering parameters like dissimilarity metric and aggregation criterion
Constructs the AHC
Plots the results of the AHC on a dendrogram
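What the clustering module does can be sketched in plain Python with the dissimilarity metric and the aggregation criterion as parameters. This is not the prototype's code: the function names, the default choices (Euclidean distance, average linkage) and the sample points are assumptions, and the dendrogram plot is left out.

```python
import math


def euclidean(a, b):
    """Dissimilarity metric (assumed default): Euclidean distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_link(c1, c2, points, metric):
    """Aggregation criterion: mean pairwise dissimilarity between two clusters."""
    total = sum(metric(points[i], points[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def single_link(c1, c2, points, metric):
    """Aggregation criterion: smallest pairwise dissimilarity."""
    return min(metric(points[i], points[j]) for i in c1 for j in c2)


def ahc(points, metric=euclidean, linkage=average_link):
    """Agglomerative clustering; returns merges as (cluster_a, cluster_b, level)."""
    clusters = [(i,) for i in range(len(points))]  # start from singletons
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b], points, metric)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges


def cut(n, merges, k):
    """Replay the first n - k merges to obtain a partition into k clusters."""
    clusters = {(i,) for i in range(n)}
    for a, b, _level in merges[: n - k]:
        clusters -= {a, b}
        clusters.add(a + b)
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]  # hypothetical individuals
    history = ahc(pts)
    print(cut(len(pts), history, 3))
```

The merge history carries the same information a dendrogram displays: which clusters were joined and at what level. Cutting it at different values of k yields the nested aggregates.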

Page 16: Presentation


Implementation

Images dataset:

3000 images collected from the web:

Semantic annotation: Description, subject and theme

Descriptors of texture like:
ENT: Entropy
CON: Contrast
L1Normalized: Medium Color Characteristic
…

Three color channels: RGB

Page 17: Presentation


Implementation

Demonstration:

Page 18: Presentation


Conclusion

OpAC is a possible way to realize on-line analysis over complex data

OpAC aggregates complex objects

Aggregates (clusters) are defined from both dimensions and measures of a data cube

Prototype available at: http://bdd.univ-lyon2.fr/?page=logiciel&id=5

Page 19: Presentation


Future works

The current evaluation tool may present some limits:
Use other evaluation indicators to evaluate the quality of partitions
Assist the user in finding the best number of clusters

Exploit the aggregates generated by OpAC in order to reorganize the cube’s dimensions:
Get a new cube with remarkable regions

Use other data mining techniques to enhance the OLAP power with explanation and prediction capabilities

Page 20: Presentation


The End