Efficient Model Partitioning for Distributed Model Transformations

SLE’16, 1 Nov. 2016, Amsterdam, Netherlands

Amine Benelallam, Massimo Tisi (AtlanMod team, Nantes, France)
Jesús Sánchez Cuadrado, Juan de Lara (Universidad Autónoma de Madrid, Spain)
Jordi Cabot (ICREA, Open University of Catalonia, Spain)

Upload: amine-benelallam

Post on 07-Jan-2017



Page 2: Efficient Model Partitioning for Distributed Model Transformations

[Figure: distributed transformation of a model with elements a to h. The model is divided into Split 1 (a, b, c, d) and Split 2 (e, f, g, h) and processed in three phases: data distribution, parallel local transformation, and parallel global composition. Legend: task node (worker), data node, concurrent read/write, coordination. Task nodes reach the data nodes through a distributed (MOF-compliant) model access and persistence API.]

System assumptions

● On-demand loading, to ensure that only needed elements are loaded

● Concurrent read/write to the persistence backend

● Fast look-up of already loaded elements by using caching and/or indexing mechanisms

A. Benelallam et al.: Distributed model-to-model transformation with ATL on MapReduce. In Proceedings of the 2015 ACM SIGPLAN Int. Conf. on SLE
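The first and third assumptions (on-demand loading and fast look-up of already loaded elements) can be illustrated with a minimal Python sketch. The class `LazyModelStore`, its `backend` dictionary, and the `backend_reads` counter are hypothetical names introduced only for this illustration; they are not part of ATL-MR or of any persistence API mentioned in the slides.

```python
class LazyModelStore:
    """Loads model elements from a backend only when first requested."""

    def __init__(self, backend):
        self._backend = backend      # stand-in for the persistence layer (assumption)
        self._cache = {}             # already loaded elements, indexed by id
        self.backend_reads = 0       # how many times the backend was hit

    def get(self, element_id):
        # Fast look-up: serve from the cache when the element is already loaded.
        if element_id in self._cache:
            return self._cache[element_id]
        # On-demand loading: fetch only the element that is actually needed.
        self.backend_reads += 1
        element = self._backend[element_id]
        self._cache[element_id] = element
        return element


backend = {"a": {"children": ["b", "c", "d"]}, "b": {}, "c": {}, "d": {}}
store = LazyModelStore(backend)
store.get("a")
store.get("a")               # second access is served from the cache
store.get("b")
print(store.backend_reads)   # 2
```

Elements c and d are never requested, so they are never read from the backend.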

Page 3: Efficient Model Partitioning for Distributed Model Transformations

Model Partitioning for Distributed MTs

What makes it different from partitioning for other distributed applications?

Page 4: Efficient Model Partitioning for Distributed Model Transformations

4

I need an example!

Class2Relational

Page 5: Efficient Model Partitioning for Distributed Model Transformations

Atlanmod Transformation Language (ATL)

module Class2Relational;                                   -- module
create OUT : Relational from IN : Class;

rule Class2Table {                                         -- rule
  from                                                     -- input pattern
    c : Class!Class (not c.isAbstract)                     -- guard
  to                                                       -- output pattern
    out : Relational!Table (
      col <- Sequence {key}                                -- binding
               ->union(c.attr->select(e | not e.multiValued))
               ->union(c.assoc->select(e | not e.mvalued)),
      keys <- Sequence {key}
               ->union(c.assoc->select(e | not e.mvalued))
    ),
    key : Relational!Column (
      name <- c.name + 'objectId',
      type <- thisModule.getObjectIdType                   -- ATL helper
    )
}
[ … ]

Page 6: Efficient Model Partitioning for Distributed Model Transformations

Class2Relational

6

Page 7: Efficient Model Partitioning for Distributed Model Transformations

Running example

Input model:

p1 : Package
c1 : Class
c2 : Class
t1 : Type
a1 : Assoc
att1 : Attribute
att2 : Attribute
att3 : Attribute

Model elmt.   Dependencies
p1            {p1, c1, a1, c2, t1}
c1            {c1, a1, att1, t1}
a1            {a1, c2}
att1          {att1, t1}
c2            {c2, att2, att3, t1}
att2          {att2, t1}
att3          {att3, t1}
t1            {t1}

Page 8: Efficient Model Partitioning for Distributed Model Transformations

Partitioning: Scenario I

[Figure: the input model (p1, c1, c2, a1, att1, att2, att3, t1) before partitioning, shown next to the dependency table of the running example. Task nodes read the model through the distributed (MOF-compliant) model access API.]

Page 9: Efficient Model Partitioning for Distributed Model Transformations

Partitioning: Scenario I

[Figure: the model elements are assigned to two task nodes. Task node 1: p1, att1, t1, c2. Task node 2: a1, c1, att3, att2.]

Page 10: Efficient Model Partitioning for Distributed Model Transformations

Partitioning: Scenario I

[Figure: elements loaded by each task node once the dependencies are resolved. Task node 1: p1, att1, a1, c1, t1, att3, att2, c2 (8 elements). Task node 2: att1, a1, c1, t1, att3, att2, c2 (7 elements).]

8 + 7 = 15 loaded elements

Page 11: Efficient Model Partitioning for Distributed Model Transformations

Partitioning: Scenario II

[Figure: elements loaded by each task node under a better assignment. Task node 1: p1, att1, a1, c1, t1, c2 (6 elements). Task node 2: c2, t1, att3, att2 (4 elements).]

6 + 4 = 10 loaded elements (33% improvement over Scenario I)
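The two totals can be checked with a short Python sketch that takes the dependency table of the running example and counts, for each task node, the union of the dependencies of its assigned elements. The split assignments below are a reading of the slide figures (an assumption), and the function names are illustrative only.

```python
# Dependency table from the running example: element -> elements it needs.
DEPENDENCIES = {
    "p1": {"p1", "c1", "a1", "c2", "t1"},
    "c1": {"c1", "a1", "att1", "t1"},
    "a1": {"a1", "c2"},
    "att1": {"att1", "t1"},
    "c2": {"c2", "att2", "att3", "t1"},
    "att2": {"att2", "t1"},
    "att3": {"att3", "t1"},
    "t1": {"t1"},
}


def loaded_elements(split):
    """Elements a task node must load to transform the elements in its split."""
    loaded = set()
    for element in split:
        loaded |= DEPENDENCIES[element]
    return loaded


def total_loaded(splits):
    """Sum of loaded elements over all task nodes (lower is better)."""
    return sum(len(loaded_elements(split)) for split in splits)


# Assumed assignments, read off the Scenario I and Scenario II figures.
scenario_1 = [{"p1", "att1", "t1", "c2"}, {"a1", "c1", "att3", "att2"}]
scenario_2 = [{"p1", "c1", "a1", "att1"}, {"c2", "att2", "att3", "t1"}]

cost_1 = total_loaded(scenario_1)                        # 8 + 7 = 15
cost_2 = total_loaded(scenario_2)                        # 6 + 4 = 10
improvement = round(100 * (cost_1 - cost_2) / cost_1)    # 33
print(cost_1, cost_2, improvement)
```

Both scenarios assign four elements to each task node, so the load is balanced in both; only the data locality differs.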

Page 12: Efficient Model Partitioning for Distributed Model Transformations

# 1 Dense structure

● Even though models are structured:

○ Their density is often high and irregular

○ The structure of the computation is only known at runtime

# 2 Varying complexity

● Graph computations are often data-driven and dictated by the structure of the graph

● Irregular computation structure => irregular computation cost

[Figure: example graphs labelled simple, complex, and highly dense]

Page 13: Efficient Model Partitioning for Distributed Model Transformations

Model-data partitioning

● Access patterns tend to have poor data locality

● High data access to computation ratio

Difficult to:

● Guarantee a balanced computational load

● Ensure a good data locality

Page 14: Efficient Model Partitioning for Distributed Model Transformations

Proposal

● Existing graph-data partitioning approaches are not suitable; they either:

a. Assume that the dependency graph exists

b. Reason only on the vertex connectivity

● We propose a two-step approach:

I- Footprint extraction

II- Model partitioning

Page 15: Efficient Model Partitioning for Distributed Model Transformations

I- Footprint extraction

15

● Extract access patterns as sequences of steps

● Resulting footprints have the form:

[sourceType][. ‘( ‘?[propertyName][ ‘ ) ‘ ?∗]?]+

● Parse OCL expressions in guards, bindings, and helpers

● Visit OCL’s AST and perform one of the following unary|binary operations:

○ ⊲ : chain the navigationCallExp

○ ⊕ : decouple the LHS and RHS into two separate footprints (e.g. conditional expression)

○ Ⓧ : if RHS is accessible from LHS then ⊲ otherwise ⊕ (e.g. select)

● Organize footprints by sourceType

Page 16: Efficient Model Partitioning for Distributed Model Transformations

I- Footprint extraction

p.classes -> reject (c | c.isAbstract) -> collect (cc | cc.assoc -> select (a | a.mValued)) -> flatten();

[Figure: AST of the expression]

OpCallExp (flatten)
  IteratorExp (collect)
    IteratorExp (reject)
      NavCallExp (classes)
        VarExp (p)
      AttrCallExp (isAbstract)
        VarExp (c)
    IteratorExp (select)
      NavCallExp (assoc)
        VarExp (cc)
      AttrCallExp (multiValued)
        VarExp (a)

Page 20: Efficient Model Partitioning for Distributed Model Transformations

I- Footprint extraction

p.classes -> reject (c | c.isAbstract) -> collect (cc | cc.assoc -> select (a | a.mValued)) -> flatten();

[Figure: AST of the expression, annotated bottom-up with the footprint (FP) computed at each node. Operators applied along the way: ⊲ (four times), Ⓧ (twice).]

OpCallExp (flatten)                FP = { Package.classes.ass }
  IteratorExp (collect)            FP = { Package.classes.ass }
    IteratorExp (reject)           FP = { Package.classes }
      NavCallExp (classes)         FP = { Package.classes }
        VarExp (p)                 FP = { Package(p) }
      AttrCallExp (isAbstract)
        VarExp (c)                 FP = { Class(c) }
    IteratorExp (select)           FP = { Class.ass }
      NavCallExp (assoc)           FP = { Class.ass }
        VarExp (cc)                FP = { Class(cc) }
      AttrCallExp (multiValued)
        VarExp (a)                 FP = { Attribute(a) }
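The bottom-up traversal of this AST can be sketched in Python. This is a simplified illustration, not the authors' implementation: the node classes (`Var`, `Nav`, `Attr`, `Iter`, `Op`) and the function `extract` are hypothetical, attribute reads are assumed to leave a footprint unchanged, and an iterator body is chained (⊲) onto its source only when the body starts from the iterator variable, otherwise the two footprints are kept apart (⊕).

```python
from dataclasses import dataclass


@dataclass
class Var:            # VarExp: a typed variable such as p : Package
    name: str
    type: str


@dataclass
class Nav:            # NavCallExp: follows a reference to other elements
    source: object
    prop: str


@dataclass
class Attr:           # AttrCallExp: reads an attribute, stays on the element
    source: object
    prop: str


@dataclass
class Iter:           # IteratorExp: select / reject / collect
    kind: str
    source: object
    var: Var
    body: object


@dataclass
class Op:             # OpCallExp: e.g. flatten()
    name: str
    source: object


def root_var(expr):
    """Variable from which a navigation chain starts."""
    while not isinstance(expr, Var):
        expr = expr.source
    return expr


def extract(expr):
    """Return the footprints of expr as tuples (sourceType, step, step, ...)."""
    if isinstance(expr, Var):
        return [(expr.type,)]
    if isinstance(expr, Nav):
        return [fp + (expr.prop,) for fp in extract(expr.source)]
    if isinstance(expr, (Attr, Op)):
        return extract(expr.source)
    if isinstance(expr, Iter):
        lhs, rhs = extract(expr.source), extract(expr.body)
        if root_var(expr.body) is expr.var:               # RHS reachable from LHS: chain
            return [l + r[1:] for l in lhs for r in rhs]
        return lhs + rhs                                  # otherwise: decouple
    raise TypeError(f"unsupported node: {expr!r}")


p, c, cc = Var("p", "Package"), Var("c", "Class"), Var("cc", "Class")
a = Var("a", "Association")   # the slide annotates this variable as Attribute(a)
rejected = Iter("reject", Nav(p, "classes"), c, Attr(c, "isAbstract"))
selected = Iter("select", Nav(cc, "assoc"), a, Attr(a, "mValued"))
expression = Op("flatten", Iter("collect", rejected, cc, selected))

footprints = [".".join(fp) for fp in extract(expression)]
print(footprints)   # ['Package.classes.assoc']
```

The result corresponds to the footprint Package.classes.ass shown at the root of the annotated AST.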

Page 21: Efficient Model Partitioning for Distributed Model Transformations

I- Resulting Footprints

Rules                  Footprints
Package2Schema         Package.classes.assoc, Package.types
Class2Table            Class.assoc, Class.attr, DataType.allInstances
Attribute2Column       Attribute.type
MVAttribute2Column     Attribute.type, Attribute.owner
Association2Column     DataType.allInstances
MVAssociation2Column   Association.type, DataType.allInstances

Types         Footprints
Package       Package.classes.assoc, Package.types
Class         Class.assoc, Class.attr, DataType.allInstances
Attribute     Attribute.type, Attribute.owner
Association   Association.type, DataType.allInstances
Type          Ø
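The step from the per-rule table to the per-type table can be sketched as a simple grouping in Python. The mapping from each rule to its source type is an assumption inferred from the rule names (only Class2Table's input pattern appears in the slides), and the function name is illustrative.

```python
from collections import defaultdict

# Footprints per rule, as listed in the first table.
RULE_FOOTPRINTS = {
    "Package2Schema": ["Package.classes.assoc", "Package.types"],
    "Class2Table": ["Class.assoc", "Class.attr", "DataType.allInstances"],
    "Attribute2Column": ["Attribute.type"],
    "MVAttribute2Column": ["Attribute.type", "Attribute.owner"],
    "Association2Column": ["DataType.allInstances"],
    "MVAssociation2Column": ["Association.type", "DataType.allInstances"],
}

# Source type matched by each rule (assumed from the rule names).
RULE_SOURCE_TYPE = {
    "Package2Schema": "Package",
    "Class2Table": "Class",
    "Attribute2Column": "Attribute",
    "MVAttribute2Column": "Attribute",
    "Association2Column": "Association",
    "MVAssociation2Column": "Association",
}


def footprints_by_type(rule_footprints, rule_source_type):
    """Merge the footprints of all rules that match the same source type."""
    grouped = defaultdict(list)
    for rule, footprints in rule_footprints.items():
        source_type = rule_source_type[rule]
        for footprint in footprints:
            if footprint not in grouped[source_type]:   # drop duplicates
                grouped[source_type].append(footprint)
    return dict(grouped)


by_type = footprints_by_type(RULE_FOOTPRINTS, RULE_SOURCE_TYPE)
for source_type in ["Package", "Class", "Attribute", "Association", "Type"]:
    # No rule matches Type, so its footprint set is empty (Ø in the table).
    print(source_type, by_type.get(source_type, []))
```

Grouping is done by the type a rule matches, not by the first segment of the footprint, which is why DataType.allInstances appears under Class and Association.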

Page 22: Efficient Model Partitioning for Distributed Model Transformations

II- Model partitioning

22

[Figure: the input model: p1 : Package; c1, c2 : Class; t1 : Type; a1 : Assoc; att1, att2, att3 : Attribute]

● Greedy, bi-objective algorithm

a. Maximizing data locality

b. Balancing the machine load

● On-the-fly approximation of the dependency graph in the form of <machine-id, nextStep>

● A buffer to delay the processing of elements that do not participate in the construction of the approximate dependency graph

● Instant assignment based on a score function
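The general shape of such a greedy, buffered, streaming assignment can be sketched in Python. This is an illustration under strong assumptions, not the authors' algorithm: the slides do not give the score function, so the one below (locality plus remaining room) is invented for the example; the full dependency table of the running example stands in for the approximated <machine-id, nextStep> dependencies; and the stream order is hypothetical. Only the parameter names avgSize, var, and buffCap come from the slides.

```python
from collections import deque

DEPENDENCIES = {
    "p1": {"p1", "c1", "a1", "c2", "t1"},
    "c1": {"c1", "a1", "att1", "t1"},
    "a1": {"a1", "c2"},
    "att1": {"att1", "t1"},
    "c2": {"c2", "att2", "att3", "t1"},
    "att2": {"att2", "t1"},
    "att3": {"att3", "t1"},
    "t1": {"t1"},
}


def greedy_partition(stream, deps, n_machines, avg_size, var, buff_cap):
    """Assign each streamed element to a machine as soon as it is seen."""
    capacity = avg_size + var                  # upper bound on the size of a split
    splits = [set() for _ in range(n_machines)]
    buffer = deque()

    def score(element, machine):
        if len(splits[machine]) >= capacity:
            return -1.0                                       # machine is full
        locality = len(deps[element] & splits[machine])       # dependencies already there
        balance = 1.0 - len(splits[machine]) / capacity       # room left on the machine
        return locality + balance

    def assign(element):
        best = max(range(n_machines), key=lambda m: score(element, m))
        if score(element, best) < 0:
            raise ValueError("all machines are full")
        splits[best].add(element)

    for element in stream:
        has_hint = any(deps[element] & split for split in splits)
        if not has_hint and len(buffer) < buff_cap:
            buffer.append(element)             # delay: nothing to base a decision on yet
        else:
            assign(element)
    while buffer:                              # place the delayed elements at the end
        assign(buffer.popleft())
    return splits


stream = ["p1", "t1", "att1", "att2", "c2", "c1", "a1", "att3"]   # hypothetical order
splits = greedy_partition(stream, DEPENDENCIES, n_machines=2,
                          avg_size=4, var=2, buff_cap=2)
print(splits)
```

The locality term rewards machines that already hold an element's dependencies, while the balance term favours emptier machines, which mirrors the two objectives listed above; how the two are actually weighted in the original work is not stated in the slides.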

Page 23: Efficient Model Partitioning for Distributed Model Transformations

II- Model partitioning

[Figure: the input model: p1 : Package; c1, c2 : Class; t1 : Type; a1 : Assoc; att1, att2, att3 : Attribute]

Page 24: Efficient Model Partitioning for Distributed Model Transformations

II- Model partitioning

● Parameters

○ avgSize = 4

○ var = 2

○ buffCap = 2

[Figure: the input stream containing the eight model elements (p1, c1, c2, t1, a1, att1, att2, att3), the buffer, and the per-machine dependency table, which is still empty]

Types         Footprints
Package       Package.classes.assoc, Package.types
Class         Class.assoc, Class.attr, DataType.allInstances
Attribute     Attribute.type, Attribute.owner
Association   Association.type, DataType.allInstances
Type          Ø

elmt.   Per Machine Dependencies
c1
c2
t1
att2
a1

Page 25: Efficient Model Partitioning for Distributed Model Transformations

II- Model partitioning

[Figure: p1 is taken from the input stream; its footprints add entries for c1 and c2 to the per-machine dependency table]

elmt.   Per Machine Dependencies
c1      {<1,assoc>}
c2      {<1,assoc>}
t1
att2
a1

Page 26: Efficient Model Partitioning for Distributed Model Transformations

II- Model partitioning

[Figure: t1 is taken from the input stream; the per-machine dependency table is unchanged]

elmt.   Per Machine Dependencies
c1      {<1,assoc>}
c2      {<1,assoc>}
t1
att2
a1

Page 27: Efficient Model Partitioning for Distributed Model Transformations

II- Model partitioning

[Figure: att1 and att2 are taken from the input stream; att3, a1, c1, and c2 remain]

elmt.   Per Machine Dependencies
c1      {<1,assoc>; <2,Ø>}
c2      {<1,assoc>}
t1      {<2,Ø>}
att2
a1

Page 28: Efficient Model Partitioning for Distributed Model Transformations

II- Model partitioning

[Figure: c2 is taken from the input stream; att3, a1, and c1 remain]

elmt.   Per Machine Dependencies
c1      {<1,assoc>; <2,Ø>}
c2      {<1,assoc>}
t1      {<2,Ø>}
att2    {<1,Ø>}
att3    {<1,Ø>}
a1

Page 29: Efficient Model Partitioning for Distributed Model Transformations

II- Model partitioning

[Figure: c1, a1, and att3 are taken from the input stream, which is now empty]

elmt.   Per Machine Dependencies
c1      {<1,assoc>; <2,Ø>}
c2      {<1,assoc>}
t1      {<2,Ø>; <1,Ø>; <1,Ø>; <1,Ø>}
att2    {<1,Ø>}
att3    {<1,Ø>}
a1      {<2,Ø>}

Page 31: Efficient Model Partitioning for Distributed Model Transformations

II- Model partitioning

[Figure: final partitioning; the input stream and the buffer are empty. Machine 1 is assigned p1, c2, att2, att3 and also loads c1, a1, t1. Machine 2 is assigned att1, c1, a1, t1 and also loads c2.]

7 + 5 = 12 loaded elements (20% improvement over Scenario I)

elmt.   Per Machine Dependencies
c1      {<1,assoc>; <2,Ø>}
c2      {<1,assoc>}
t1      {<2,Ø>; <1,Ø>; <1,Ø>; <1,Ø>}
att2    {<1,Ø>}
att3    {<1,Ø>}
a1      {<2,Ø>}

Page 32: Efficient Model Partitioning for Distributed Model Transformations

Evaluation

32

[Figure: evaluation architecture. Labelled components: Eclipse Modeling Framework, NeoEMF/HBase, HDFS, XML Metadata Interchange, ATL-MR, ATL-MR Master, ATL-MR Slaves, Hadoop task nodes, Hadoop data nodes. Steps: 1. Distribute input, 2. Monitor, 3. Return output.]

Page 33: Efficient Model Partitioning for Distributed Model Transformations

Evaluation results

33

Page 34: Efficient Model Partitioning for Distributed Model Transformations

34

Limitations

● The performance of our approach may be reduced when:

a. elements have a large number of dependencies, sometimes exceeding the average size of the split (e.g. the allInstances() operation)

b. the approximate graph contains false-positive dependencies (e.g. select() or reject())

c. the order of the streamed elements is unfavourable.

Page 35: Efficient Model Partitioning for Distributed Model Transformations

Conclusion

● We presented our solution for efficient partitioning of distributed MTs as a greedy algorithm.

○ We introduced an algorithm for footprint extraction

○ We presented our greedy algorithm for stream model partitioning

○ We experimentally showed the scalability of our solution (up to 16% on average)

● In future work we plan to:

○ Extend our work to balanced edge partitioning and conduct a more exhaustive study of the impact of model density on the partitioning strategy

○ Improve the distribution of the intermediate transformation data (tracing information)

Page 36: Efficient Model Partitioning for Distributed Model Transformations

Questions

Check us out on GitHub: https://github.com/atlanmod/ATL_MR

36