Reference Representation in Large Metamodel-based Datasets


Upload: markus-scheidgen

Post on 18-Dec-2014


DESCRIPTION

This presentation was given at the BigMDE Workshop (at STAF) in Budapest, 2013.

TRANSCRIPT

Page 1: Reference Representation in Large Metamodel-based Datasets

Markus Scheidgen

Model representations for large meta-model based data-sets

■ Introduction: technological spaces and model representations
■ Comparison of representations
■ Implementation
■ Application

Page 2

Introduction: Technological Spaces

[Figure: the technological spaces software models, code, XML, databases, and objects (e.g. POJOs), connected by bridges such as reverse engineering, code generation, persistence/exchange, persistence/versioning, processing via ORMs (e.g. JPA), debugging/profiling, reflection, runtime modeling, XML processing (e.g. DOM/JAXB), and exchange (e.g. in web services); tooling within the spaces: XSLT/XSL/XQuery/XPath, model transformations/constraints/queries, static analysis/compilation/refactoring, SQL, running programs; each space also exchanges other data.]

Page 3

Introduction: State of the Art

[Figure: pairs of description and data (meta-models/models, schemas/XML, grammars/code, classes/objects, ER-schemas/relational data) and the purposes they serve: visualization and editing by human users, processing in computer programs, exchange, and large data-sets/persistence and querying.]

Page 4

Introduction: New Class of DBMS

[Figure: the same pairs (meta-models/models, schemas/XML, grammars/code, classes/objects, ER-schemas/relational data), extended by big data and graphs, i.e. ER-schemas/big relational data; the corresponding counterpart for large models is marked "?".]

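Page 5

Representation: Strategies

Strategies by how objects are stored (object-by-object or in fragments) and how references are represented (as part of the source or as relations):
■ part-of-source, object-by-object: Morsa, (Java)
■ part-of-source, fragments: XMI, EMF-Frag
■ relations, object-by-object: CDO
■ relations, fragments: ?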
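The two reference representations can be contrasted in a few lines of Python. This is a minimal in-memory sketch with hypothetical class names, not code from Morsa, CDO, XMI, or EMF-Frag: in the part-of-source representation the target ids belong to the source object's record; in the relation representation every reference is a separate row of a (source, target) relation.

```python
class PartOfSourceModel:
    """Hypothetical sketch: each object's record embeds its outgoing references."""

    def __init__(self):
        self.records = {}  # object id -> list of target ids, stored with the object

    def add_reference(self, source, target):
        # Adding a reference modifies (and in a datastore would rewrite) the source record.
        self.records.setdefault(source, []).append(target)

    def targets(self, source):
        return list(self.records.get(source, []))


class RelationModel:
    """Hypothetical sketch: references live in a separate relation of (source, target) rows."""

    def __init__(self):
        self.rows = []  # the relation; object records are never touched

    def add_reference(self, source, target):
        self.rows.append((source, target))

    def targets(self, source):
        # Without an index, reading one object's references scans the whole relation.
        return [target for s, target in self.rows if s == source]


pos, rel = PartOfSourceModel(), RelationModel()
for target in ("b", "c"):
    pos.add_reference("a", target)
    rel.add_reference("a", target)
```

The relation variant adds references without touching the source record, but needs a lookup (here a plain scan) to find an object's references; pages 10 and 11 compare both variants in an actual implementation.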

Page 6

Representation: Object-by-object vs. Fragmentation (considering traversal, theoretical results)

[Figure: execution time t (in ms, 10^0 to 10^5) over the number of loaded objects l (10^0 to 10^6) for fragment sizes f from 10^0 to 10^6; curves for no fragmentation [f=m], optimal fragmentation, and total fragmentation [f=1].]
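The trade-off in this plot can be illustrated with a toy cost model. This is a hypothetical sketch, not the model behind the slide: it assumes a traversal of l objects touches ceil(l/f) fragments, that each fragment load costs a fixed access latency a plus a per-object cost b times the fragment size f, and that no fragment is larger than the model size m. The constants a and b are made up.

```python
import math


def traversal_time(l, f, m, a=1.0, b=0.01):
    """Hypothetical cost of loading l objects from a model of m objects with fragment size f.

    a: assumed fixed latency per fragment access
    b: assumed cost per deserialized object
    """
    f = min(f, m)                 # a fragment cannot be larger than the whole model
    fragments = math.ceil(l / f)  # fragments touched when l objects are loaded
    return fragments * (a + f * b)


def best_fragment_size(l, m, sizes, a=1.0, b=0.01):
    """Cheapest fragment size among the candidates for a traversal of l objects."""
    return min(sizes, key=lambda f: traversal_time(l, f, m, a, b))


m = 10**6
sizes = [10**k for k in range(7)]  # 1, 10, ..., 10^6 (= m, i.e. no fragmentation)
for l in (1, 10**3, 10**6):
    print(l, best_fragment_size(l, m, sizes))
```

With these constants, total fragmentation (f = 1) wins for a single object, a fragment size of about l wins for medium traversals, and no fragmentation (f = m) wins when the whole model is traversed.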

Page 7

Representation: Object-by-object vs. Fragmentation (considering traversal, theoretical results vs. implementation)

[Figure: two panels of execution time t (in ms) over the number of loaded objects l for different fragment sizes f: the theoretical results (curves for no fragmentation [f=m], optimal fragmentation, and total fragmentation [f=1]) next to measurements of the implementation (fragment sizes 10^1 to 10^5, with the optimal fragmentation curve).]

Page 8

Representation: Object-by-object vs. Fragmentation (considering traversal, implementation with actual model)

■ Model traversal of Grabats models of different sizes and characteristics (set0 to set4)

[Figure: left, traversal speed in objects per second (×10^4, 0 to 8) per model set for XMI, CDO, Morsa, EMFFrag coarse, and EMFFrag fine (some values not measured, extrapolated); right, number of fragments (10^3 to 10^7) per model set for CDO/Morsa, EMFFrag coarse, and EMFFrag fine.]

Page 9

Representation: Object-by-object vs. Fragmentation (considering query, implementation with actual model)

■ Query of Grabats models of different sizes and characteristics (set0 to set4)

[Figure: left, number of fragments (10^3 to 10^7) per model set for CDO/Morsa, EMFFrag coarse, and EMFFrag fine; right, query execution time (in s, 0 to 350) per model set for XMI, CDO w/o SQL, CDO, Morsa w/o index, Morsa, EMFFrag coarse, and EMFFrag fine (some values not measured, extrapolated).]

Page 10

Representation: Part-of-source vs. Relations (real implementation, artificial model)

[Figure: two panels of execution time (in ms, 10^1 to 10^4) over the number of outgoing references (10^0 to 10^6), one for the part-of-source implementation and one for the relation implementation with individual access; each shows the access of one outgoing reference and the traversal of all outgoing references.]

Page 11

Representation: Part-of-source vs. Relations (real implementation, artificial model)

[Figure: two panels of execution time (in ms, 10^1 to 10^4) over the number of outgoing references (10^0 to 10^6), one for the part-of-source implementation and one for the relation implementation with scanning; each shows the access of one outgoing reference and the traversal of all outgoing references.]
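Pages 10 and 11 contrast two ways of reading a relation. A minimal sketch, assuming a store with lexicographically sorted row keys in the style of HBase (simulated here with Python's `bisect`); the key layout `source/index` and all names are hypothetical. Individual access fetches one reference by its exact key, while scanning iterates over the key range that belongs to a source object.

```python
import bisect


class SortedKeyValueStore:
    """Hypothetical in-memory stand-in for a store with sorted row keys."""

    def __init__(self):
        self.keys = []    # kept sorted
        self.values = {}

    def put(self, key, value):
        if key not in self.values:
            bisect.insort(self.keys, key)
        self.values[key] = value

    def get(self, key):
        return self.values.get(key)

    def scan(self, start, stop):
        """Yield (key, value) pairs with start <= key < stop, in key order."""
        lo = bisect.bisect_left(self.keys, start)
        hi = bisect.bisect_left(self.keys, stop)
        for key in self.keys[lo:hi]:
            yield key, self.values[key]


def ref_key(source, index):
    # Zero-padding keeps lexicographic key order equal to numeric index order.
    return f"{source}/{index:08d}"


def add_references(store, source, targets):
    for i, target in enumerate(targets):
        store.put(ref_key(source, i), target)


def access_one(store, source, index):
    # Individual access: a single point lookup by exact key.
    return store.get(ref_key(source, index))


def traverse_all(store, source):
    # Scanning: all keys starting with "source/" ('0' is the character after '/').
    return [target for _, target in store.scan(source + "/", source + "0")]


store = SortedKeyValueStore()
add_references(store, "m1", ["m2", "m3", "m4"])
add_references(store, "m10", ["m1"])
```

Source ids are assumed not to contain "/", so the key range of one object never overlaps with another's.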

Page 12

Implementation: EMF-Fragments

[Figure: layered platform of "share nothing" nodes (cluster, ad-hoc network), a distributed file system (HDFS), a key-value store (HBase), and map/reduce (Hadoop); on top, structured data and data-sets described by an application meta-model and processed by model transformations.]

Page 13

Implementation: Datastore mapping

[Figure: mapping of a metamodel's containment onto the datastore in three variants: regular containment, part-of-source fragmentation, and relation-based fragmentation.]

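Page 14

Implementation: Meta-model-based declaration of representations

[Figure: example meta-model in which a Project contains Packages, Packages contain CompilationUnits, CompilationUnits contain Classes, and Classes contain Fields and Methods (all with * multiplicity); three of the containment references are annotated «fragments», and Call is annotated «relation».]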
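The effect of a «fragments» annotation can be sketched as a walk over the containment tree: an object reached through an annotated containment reference starts a new fragment, and every other object stays in its container's fragment. A minimal Python sketch; the object layout, the feature names, and which features are annotated are hypothetical simplifications, not the EMF-Fragments API.

```python
from dataclasses import dataclass, field


@dataclass
class Obj:
    """Hypothetical model object: a name plus its contained children per feature."""
    name: str
    contents: dict = field(default_factory=dict)  # feature name -> list of Obj


def assign_fragments(root, fragmenting_features):
    """Map each object name to the name of the root of the fragment it belongs to."""
    assignment = {}

    def visit(obj, fragment_root):
        assignment[obj.name] = fragment_root
        for feature, children in obj.contents.items():
            for child in children:
                # Crossing a «fragments» containment reference starts a new fragment.
                child_root = child.name if feature in fragmenting_features else fragment_root
                visit(child, child_root)

    visit(root, root.name)
    return assignment


cls = Obj("C", {"fields": [Obj("C.f")], "methods": [Obj("C.m")]})
unit = Obj("U", {"classes": [cls]})
pkg = Obj("P", {"units": [unit]})
project = Obj("Prj", {"packages": [pkg]})
fragments = assign_fragments(project, {"packages", "units", "classes"})
```

Here fields and methods end up in the fragment of their class, so a class with its members is loaded as one unit.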

Page 15

Implementation: Architecture

[Figure: class diagram. EMF-Fragments classes (visible API): FragmentedModel extends Resource (in a ResourceSet); FObject extends EObject «reflective feature delegation»; FStore extends EStore «singleton, stateless»; Fragment extends Resource (in a separate ResourceSet); FInternalObject extends DynamicEObject; URIHandler; DataStore «derived», backed by one database; FValueSetList, next to EObjectEList, as an implementation of EList. «delegates» associations connect FObject, FStore, and FInternalObject. Regular EMF classes: Resource, ResourceSet, EObject, EStore, DynamicEObject, EList, EObjectEList.]
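The pattern behind «reflective feature delegation» and a «singleton, stateless» store can be sketched in Python: the user-visible object keeps no feature values itself, and a store without state of its own forwards every access to an internal object. The class names mirror the diagram, but this is a simplifying assumption for illustration, not EMF's `EStore` interface; a plain dict stands in for the internal object.

```python
class FStore:
    """Hypothetical stateless singleton: it holds no model data itself."""

    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def get(self, owner, feature):
        # Delegate to the internal object that actually holds the value.
        return owner._internal.get(feature)

    def set(self, owner, feature, value):
        owner._internal[feature] = value


class FObject:
    """Hypothetical user-facing object: every feature access goes through FStore."""

    def __init__(self, internal=None):
        # Bypass our own __setattr__ so _internal is a real attribute.
        object.__setattr__(self, "_internal", internal if internal is not None else {})

    def __getattr__(self, feature):
        # Only called for names not found normally, i.e. model features.
        return FStore().get(self, feature)

    def __setattr__(self, feature, value):
        FStore().set(self, feature, value)


pkg = FObject()
pkg.name = "org.example"  # stored in the internal object, not in pkg itself
```

Because the facade only points at its internal object, the internal object can be exchanged (for example after its fragment is reloaded) without invalidating references to the facade.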

Page 16

Applications: Mining and Analyzing Software Repositories

■ Software repositories contain more information than the current software code:
  ■ "developers who changed class/method/statement X also changed class/method/statement Y"
  ■ this information leads to knowledge about dependencies that cannot be determined through static or even dynamic analysis
  ■ this can be used to
    • predict/find bugs
    • understand/improve the code base

■ Dependency information should be stored as relational data.

■ When a piece of software evolves, its metrics change. Such dynamic metrics describe software better than static code metrics and could lead to a better assessment of methodologies or a better understanding of software engineering in general.
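The co-change knowledge described above ("developers who changed X also changed Y") can be computed from revision data. A minimal sketch, assuming each revision is given as the set of changed elements (class, method, or statement ids); the input format and the example revisions are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_changes(revisions):
    """Count, for each unordered pair of elements, the revisions that changed both."""
    counts = Counter()
    for changed in revisions:
        for pair in combinations(sorted(set(changed)), 2):
            counts[pair] += 1
    return counts


def confidence(counts, revisions, x, y):
    """Fraction of the revisions changing x that also changed y."""
    changes_x = sum(1 for changed in revisions if x in changed)
    pair = tuple(sorted((x, y)))
    return counts[pair] / changes_x if changes_x else 0.0


revisions = [{"A", "B"}, {"A", "B", "C"}, {"A"}, {"C", "D"}]
counts = co_changes(revisions)
```

Pairs with their counts are the kind of dependency information that the slide suggests storing as relational data.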

Page 17

Applications: Mining and Analyzing Software Repositories

■ JGit: Java implementation of the Git version control system
■ MoDisco: reverse engineering framework for Eclipse Java projects based on EMF
■ EMF-Compare: determines matches and differences between models
■ EMF-Fragments: my own framework for large models
■ over 300 Git repositories with Eclipse plug-ins that constitute the whole Eclipse Foundation source base as "example" data-set

Page 18

Applications: Model of a Software Repository

[Figure: left, an example repository P with branches B1 (revisions B1.R1 to B1.R4) and B2 (revisions B2.R1, B2.R2), each revision touching some of the compilation units A, B, C, D; right, the corresponding model and meta-model built with JGit and MoDisco (Repository, Revision with prev/next, Diff, CompilationUnit, Model, Package, Class, ...), whose references are annotated «fragmentation», «relation», or «relation,fragmentation» (e.g. usageInPackageAccess, package, extends).]

Page 19

Summary

■ Choosing the right representation makes a difference
■ Meta-model-based declaration of representations works (might not be good enough)
■ There are applications that can benefit from different representations

[Figure: the strategy matrix of page 5: part-of-source references with object-by-object storage (Morsa, (Java)) or fragments (XMI, EMF-Frag); relations with object-by-object storage (CDO) or fragments (?).]

Page 20

Backup

Page 21

Possible Approaches: Different Target Platforms

[Figure: target platforms for models: XMI and XMI+resources (schemas/XML); ORM onto ER-schemas/relational data (ACID, structured data); big data and graphs (BASE, structured data); ER-schemas/big relational data (BASE, structured data), possibly via ORM (ORM?)²; ACID vs. BASE as characterized by the CAP theorem¹.]

1. Eric A. Brewer: Towards robust distributed systems. 19th ACM Symposium on Principles of Distributed Computing, 2000.
2. K. Barmpis and D. S. Kolovos: Comparative Analysis of Data Persistence Technologies for Large-Scale Models. XM 2012.

Page 22

Possible Approaches: Different Types of Mapping

[Figure: per-object mapping vs. fragmentation. Per-object mapping onto ER-schemas/relational data and onto big data (e.g. Morsa¹): fast query, slow traversal, slow entry (fine transactions). Fragmentation onto big data / big relational data: slow query, fast traversal, fast entry (coarse transactions).]

1. Javier Espinazo-Pagán, Jesús Sánchez Cuadrado, Jesús García Molina: Morsa: A Scalable Approach for Persisting and Accessing Large Models. MoDELS 2011.

Page 23

Fragmentation: Types of references

■ Organizing large artifacts in different resources is already implemented in EMF.
■ Resources are loaded when necessary; objects in unloaded resources are represented by proxy objects.
■ Objects in different resources are, like all related objects, related through references; therefore, models are fragmented along references.
■ EMF-Fragments automatically fragments large models based on annotations in the meta-model.
■ Resources are identified via URIs and can be serialized (e.g. as XMI); therefore, resources can be stored in a key-value store.
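The last points combine into a simple persistence scheme: each fragment is serialized and stored under its URI as key, and a cross-fragment reference is kept as a proxy (fragment URI plus object id) that is resolved, loading its fragment, only on first access. A minimal sketch; JSON stands in for XMI, a dict stands in for the key-value store, and the URI layout and class names are hypothetical.

```python
import json


class Proxy:
    """Placeholder for an object in a fragment that may not be loaded yet."""

    def __init__(self, fragment_uri, object_id):
        self.fragment_uri = fragment_uri
        self.object_id = object_id


class FragmentStore:
    def __init__(self):
        self.kv = {}      # key-value store: fragment URI -> serialized fragment
        self.loaded = {}  # deserialized fragments, filled lazily

    def save(self, uri, objects):
        self.kv[uri] = json.dumps(objects)

    def load(self, uri):
        # A fragment is deserialized only when one of its objects is needed.
        if uri not in self.loaded:
            self.loaded[uri] = json.loads(self.kv[uri])
        return self.loaded[uri]

    def resolve(self, proxy):
        return self.load(proxy.fragment_uri)[proxy.object_id]


store = FragmentStore()
store.save("model/f1", {"o1": {"name": "Package", "ref": ["model/f2", "o7"]}})
store.save("model/f2", {"o7": {"name": "Class"}})

o1 = store.resolve(Proxy("model/f1", "o1"))
target = store.resolve(Proxy(*o1["ref"]))  # loads fragment model/f2 on demand
```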

Page 24

Fragmentation: Types of references

[Figure: three kinds of references: normal references (*), fragmenting references (* «fragments»), and large value sets (*).]

Page 25

Applications

■ HWL sensor and network operation data (or experiment data in general)
  ■ real-time persistence required ➜ fast data entry
  ■ hierarchically structured data (different sensors and other data sources) ➜ meta-modeling
  ■ queries for experiments, sensors, specific time periods ➜ only coarse, simple queries
  ■ traversal of larger sub-trees, mostly applications based on data aggregation
  ■ actual demand for big data depends on the size of the sensor network ➜ scalability

■ CityGML models (or geo-spatial data in general)
  ■ standardized as XML schemas ➜ XML-based data
  ■ special proprietary indexes (e.g. spatial indexes like R-trees) and corresponding queries
  ■ rather query-intensive applications
  ■ actual demand for big data depends on the LOD (level of detail) of the models ➜ scalability

■ Software engineering
  ■ code/model version control
  ■ Mining Software Repositories (MSR)
  ■ revisions of ASTs and differences between ASTs ➜ existing meta-model-based frameworks (e.g. designed for reverse engineering purposes)
  ■ a large number of revisions causes many large value sets
  ■ queries for revisions, compilation units ➜ rather coarse queries
  ■ aggregations and statistics ➜ can be expressed in an OCL-like language
  ■ immediate demand for processing in (at least smaller) clusters
  ■ has to be mixed with relational data for some applications

Page 26

Applications: Scientific Data

[Figure: data from a WSN, XML documents, and Click turned into models via xml-to-model and text-to-model transformations.]

Page 27

Applications: CityGML

■ XML-based standard ➜ meta-models can be generated (1-to-1 mapping)
■ different standards define XML schemas that extend each other: GML ⇽ CityGML ⇽ extensions
■ transparent use of spatial indexes
  ■ map onto existing platforms (e.g. SpatialHadoop)
  ■ use existing implementations and persist into the key-value store

■ extensions to CityGML can be used to reference CityGML models as spatial context for sensor data

Page 28

Backup

Page 29

Research Overview

[Figure: three research areas, wireless sensor networks, data analysis framework, and geo information systems, with the topics sensor data, heterogeneous networks, mesh networks, cellular networks, spatial data, regular databases, spatial databases, distributed data stores, distributed analysis, data homogenisation, and domain-specific analysis languages.]

Page 30

HWL: Commodity Hardware

Page 31

Page 32

‣ 120+ nodes

‣ indoor and outdoor

‣ dense and sparse

‣ short and long links

‣ stationary and mobile nodes

Page 33

[Repeats the bullet points of page 32.]

Page 34

Page 35

Experiments: The Test Site

[Figure: map of the test site with nodes 1 to 10 on both sides of the road on stone ground, a 10 m scale bar, and the directions towards Groß-Berliner Damm and towards the institute.]

Markus Scheidgen: HWL – A High-Performance Wireless Sensor Research Network

§ simplest case: two-lane, newly paved road

§ spatially equally distributed nodes on both sides of the road

§ 2x5 nodes

§ homogeneous test-bed: same nodes, equally calibrated, same stone ground

§ one camera to record control data

Page 36

Experiments: Example Data

[Figure: example accelerometer data for channels X, Y, and Z. Left ("Amplitudes"): the time signal of all 3 channels, accelerometer value (-2 to 2) over time samples of 1/400 s (0 to 3000). Right ("Frequencies"): the single-sided amplitude spectrum |Y(f)| over frequency (0 to 200 Hz).]

Page 37

Page 38

Experiment: Algorithm

§ Similar to earthquake detection: comparison of short and long moving averages (S = 0.2 s, L = 4 s)

$$s_x = \text{acceleration value at sample } x \tag{1}$$

$$\mathrm{mavg}(s_x, W) = \frac{\sum_{i=x-W}^{x} s_i}{W} \tag{2}$$

$$\hat{s}_x = \left| s_x - \mathrm{mavg}(s_x, L) \right| \tag{3}$$

$$w^S_x = \mathrm{mavg}(\hat{s}_x, S) \tag{4}$$

$$w^L_x = \mathrm{mavg}(\hat{s}_x, L) \tag{5}$$

$$\Delta w = w^S_x - w^L_x \tag{6}$$
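Equations (1) to (6) translate directly into code. A minimal sketch, assuming a sampling rate of 400 Hz (from the 1/400 s time samples on page 36), so S = 0.2 s and L = 4 s correspond to 80 and 1600 samples. It divides by the number of samples actually in the window rather than by W, and the detection threshold is a hypothetical parameter that the slide does not give.

```python
from itertools import accumulate


def moving_averages(values, window):
    """Mean of values[x - window .. x] for every x (window clipped at the start)."""
    prefix = [0.0, *accumulate(values)]  # prefix[k] = sum(values[:k])
    result = []
    for x in range(len(values)):
        lo = max(0, x - window)
        result.append((prefix[x + 1] - prefix[lo]) / (x + 1 - lo))
    return result


def detection_signal(samples, rate_hz=400, short_s=0.2, long_s=4.0):
    """Delta w for every sample index x, following eqs. (3) to (6)."""
    S = round(short_s * rate_hz)  # 0.2 s at 400 Hz -> 80 samples
    L = round(long_s * rate_hz)   # 4 s at 400 Hz -> 1600 samples
    # (3): absolute deviation from the long moving average
    dev = [abs(s - m) for s, m in zip(samples, moving_averages(samples, L))]
    # (4), (5): short and long moving averages of the deviation
    w_short = moving_averages(dev, S)
    w_long = moving_averages(dev, L)
    # (6): difference of both averages
    return [ws - wl for ws, wl in zip(w_short, w_long)]


def detect(delta_w, threshold):
    """Sample indices where the signal exceeds a (hypothetical) threshold."""
    return [x for x, d in enumerate(delta_w) if d > threshold]
```

For a recorded channel, `detect(detection_signal(channel), threshold)` returns the samples at which the short-term average of the deviation exceeds the long-term one by more than the threshold, e.g. while a vehicle passes a node.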

Page 39

Data Management

Page 40

Research Overview

[Repeats the research overview figure of page 29.]

Page 41

Technological Infrastructure

[Figure: communication infrastructure linking the nodes via ZigBee, Wi-Fi, cellular networks, and the internet.]

Page 42

Logical Infrastructure

[Figure: sensors, information, visualization, and actions.]

Page 43

[Figure: the technological infrastructure (ZigBee, Wi-Fi, cellular, internet) related to the layers of the software stack: CPUs, radios, hard drives; machine code, network protocols; programming languages; algorithms/processes; data representation; databases; distributed programming models; information/knowledge. The layers are split into generic and domain-specific parts, linked by software engineering and DSLs.]

Page 44

Complex Data Types

➡ complex data structures
➡ lots of links between data objects
➡ evolving structures
➡ requires a type-safe programming environment that promotes re-use

Page 45

Large Amounts of Data

➡ a certain amount of data needs to be stored per second (HWL: 120 nodes): ~140×10^3 data objects per second, ~7 MB/s serialized

➡ a certain amount of data needs to be stored altogether (24 h): ~12×10^9 data objects, ~600 GB serialized

➡ Data analysis must complete in reasonable time; for live applications, in real time.
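The 24-hour totals follow from the per-second rates. A short worked check, assuming decimal units (1 MB = 10^6 bytes, 1 GB = 10^9 bytes); the average size per serialized data object is a derived figure, not stated on the slide.

```python
# Decimal units assumed: 1 MB = 10**6 bytes, 1 GB = 10**9 bytes.
objects_per_second = 140e3   # HWL, 120 nodes
bytes_per_second = 7e6       # ~7 MB/s serialized
seconds_per_day = 24 * 60 * 60

objects_per_day = objects_per_second * seconds_per_day         # 1.2096e10, i.e. ~12 * 10**9
gigabytes_per_day = bytes_per_second * seconds_per_day / 1e9   # 604.8, i.e. ~600 GB
bytes_per_object = bytes_per_second / objects_per_second       # 50 bytes per data object
```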

Page 46

From Click to ClickWatch

[Figure: Click software with elements and their handlers (including a compound handler), accessed through the Click API over a network interface.]

Page 47

Complex Data Types: Meta-Modeling

This [ ] happens all the time in software modeling: state charts, class diagrams, MSCs, OCL

context Foo inv: self.properties->forAll(a | a.x <> a.y)

Eclipse Modeling Framework (EMF)

➡ Distributed storage and links between different types of data are only a simple extension of existing technology: multi-resource persistence is already implemented

Page 48

Large Amounts of Data: Problem Statement

[Figure: generic layers "share nothing" nodes (cluster, ad-hoc network), DFS (HDFS), key-value store¹ (HBase), and map/reduce² (Hadoop), beneath the domain-specific parts: hierarchical data (XML, OGC standards), data series (sensor data), and signal analysis, statistics, sensor fusion.]

1. Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Michael Burrows, Tushar Chandra, Andrew Fikes, and Robert Gruber. Bigtable: A distributed storage system for structured data (awarded best paper!). In Brian N. Bershad and Jeffrey C. Mogul, editors, OSDI, pages 205–218. USENIX Association, 2006.

2. Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified data processing on large clusters. In OSDI, pages 137–150. USENIX Association, 2004.

Page 49

Large Amounts of Data: Approach

[Figure: the layered platform of "share nothing" nodes (cluster, ad-hoc network), DFS (HDFS), key-value store (HBase), and map/reduce (Hadoop); on top, a meta-model describes hierarchical data (XML, OGC standards) and data series (sensor data) as structured data, processed by model transformations for signal analysis, statistics, and sensor fusion.]