Efficient Model Partitioning for Distributed Model Transformations

SLE’16, 1 Nov. 2016, Amsterdam, Netherlands

Amine Benelallam, Massimo Tisi (AtlanMod team, Nantes, France)

Jesús Sánchez Cuadrado, Juan de Lara (Universidad Autónoma de Madrid, Spain)

Jordi Cabot (ICREA, Open University of Catalonia, Spain)

[Figure: overview of a distributed model transformation. The input model (elements a to h) is divided into Split 1 and Split 2. Task nodes (workers) access data nodes through a distributed (MOF-compliant) model access and persistence API with concurrent read/write. Phases: data distribution, parallel local transformation, parallel global composition, with a coordination layer.]

System assumptions

● On-demand loading, to ensure that only needed elements are loaded

● Concurrent read/write to the persistence backend

● Fast look-up of already loaded elements by using caching and/or indexing mechanisms

A. Benelallam et al.: Distributed model-to-model transformation with ATL on MapReduce. In Proceedings of the 2015 ACM SIGPLAN Int. Conf. on SLE

Model Partitioning for Distributed MTs

What makes it different from partitioning for other distributed applications?

I need an example!

Class2Relational

AtlanMod Transformation Language (ATL)

module Class2Relational;
create OUT : Relational from IN : Class;

rule Class2Table {
    from
        c : Class!Class (not c.isAbstract)
    to
        out : Relational!Table (
            col <- Sequence {key}
                       ->union(c.attr->select(e | not e.multiValued))
                       ->union(c.assoc->select(e | not e.mvalued)),
            keys <- Sequence {key}->union(c.assoc->select(e | not e.mvalued))
        ),
        key : Relational!Column (
            name <- c.name + 'objectId',
            type <- thisModule.getObjectIdType
        )
}
[ … ]

Annotated constructs in the listing: module, rule, input pattern, guard, output pattern, binding, ATL helper.

Running example

Model elements: p1 : Package; c1 : Class; c2 : Class; t1 : Type; a1 : Assoc; att1 : Attribute; att2 : Attribute; att3 : Attribute

Model elmt.   Dependencies
p1            {p1, c1, a1, c2, t1}
c1            {c1, a1, att1, t1}
a1            {a1, c2}
att1          {att1, t1}
c2            {c2, att2, att3, t1}
att2          {att2, t1}
att3          {att3, t1}
t1            {t1}

Partitioning: Scenario I

[Figure: the input model is distributed over two task nodes through the distributed (MOF-compliant) model access API, without considering the dependencies of the running example. The first task node is assigned p1, att1, t1 and c2 and ends up loading all 8 elements; the second is assigned a1, c1, att3 and att2 and loads 7 elements (all except p1).]

Loaded elements: 8 + 7 = 15

Partitioning: Scenario II


[Figure: the same input model distributed over two task nodes, this time grouping elements with their dependencies. The first task node loads 6 elements (p1, att1, a1, c1, t1, c2); the second loads 4 (c2, t1, att3, att2).]

Loaded elements: 6 + 4 = 10 (33% improvement)
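The totals of the two scenarios can be checked with a few lines of Python. This is only a sketch: the dependency sets come from the running example, but the assignment of elements to task nodes (`scenario_1`, `scenario_2`) is an assumption read off the figures, and the function names are made up for illustration.

```python
# Dependencies of each model element, copied from the running example.
DEPS = {
    "p1":   {"p1", "c1", "a1", "c2", "t1"},
    "c1":   {"c1", "a1", "att1", "t1"},
    "a1":   {"a1", "c2"},
    "att1": {"att1", "t1"},
    "c2":   {"c2", "att2", "att3", "t1"},
    "att2": {"att2", "t1"},
    "att3": {"att3", "t1"},
    "t1":   {"t1"},
}

def loaded(split):
    """Elements a task node must load: the union of its elements' dependencies."""
    return set().union(*(DEPS[e] for e in split))

def cost(splits):
    """Total number of elements loaded over all task nodes."""
    return sum(len(loaded(s)) for s in splits)

# Assumed assignments of elements to the two task nodes (not stated in the text).
scenario_1 = [{"p1", "att1", "t1", "c2"}, {"a1", "c1", "att3", "att2"}]
scenario_2 = [{"p1", "c1", "a1", "att1"}, {"c2", "att2", "att3", "t1"}]

cost_1 = cost(scenario_1)   # 8 + 7
cost_2 = cost(scenario_2)   # 6 + 4
gain = round(100 * (cost_1 - cost_2) / cost_1)
print(cost_1, cost_2, gain)  # prints "15 10 33"
```

Both splits hold four elements each, so the load is balanced in both cases; only the locality of the dependencies differs.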

#1 Dense structure

● Even though models are structured:

○ Their density is often high and irregular

○ The structure of the computation is only known at runtime

#2 Varying complexity

● Graph computations are often data-driven and dictated by the structure of the graph

● Irregular computation structure => irregular computation cost

[Figure: example graphs labelled simple, complex, and highly dense.]

Model-data partitioning

● Access patterns tend to have poor data locality

● High ratio of data access to computation

Difficult to:

● Guarantee a balanced computational load

● Ensure good data locality

Proposal

● Existing graph-data partitioning approaches are not suitable; they either:

a. Assume that the dependency graph already exists

b. Reason only on vertex connectivity

● We propose a two-step approach: footprint extraction, then model partitioning

I- Footprint extraction

● Extract access patterns as sequences of steps

● Resulting footprints have the form:

[sourceType][.'('?[propertyName][')'?∗]?]+

● Parse OCL expressions in guards, bindings, and helpers

● Visit OCL’s AST and perform one of the following unary|binary operations:

○ ⊲ : chain the navigationCallExp

○ ⊕ : decouple the LHS and RHS into two separate footprints (e.g. conditional expression)

○ Ⓧ : if RHS is accessible from LHS then ⊲ otherwise ⊕ (e.g. select)

● Organize footprints by sourceType

Footprint extraction

p.classes->reject(c | c.isAbstract)->collect(cc | cc.assoc->select(a | a.mValued))->flatten();

[Figure: AST of the expression above, with nodes OpCallExp (flatten), IteratorExp (collect), IteratorExp (reject), IteratorExp (select), NavCallExp (classes), NavCallExp (assoc), AttrCallExp (isAbstract), AttrCallExp (multiValued), VarExp (p), VarExp (c), VarExp (cc), VarExp (a).]

[Figure sequence: bottom-up footprint computation on the AST. VarExp(p) yields FP = {Package(p)}; ⊲ with NavCallExp(classes) yields FP = {Package.classes}. VarExp(c) yields FP = {Class(c)}, which is combined with it by Ⓧ at reject. VarExp(cc) yields FP = {Class(cc)}; ⊲ with NavCallExp(assoc) yields FP = {Class.ass}. VarExp(a) yields FP = {Attribute(a)}, which is combined by Ⓧ at select, leaving FP = {Class.ass}. Ⓧ at collect finally yields FP = {Package.classes.ass}.]
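The bottom-up computation can be sketched in Python on a tiny hand-built AST. This is a simplified illustration, not the implementation used in the talk: the node classes, the `REFS` table of reference target types, and the rule that attribute calls and `flatten()` add no step are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical metamodel excerpt: (owner type, reference) -> target type.
REFS = {("Package", "classes"): "Class", ("Class", "assoc"): "Association"}

@dataclass
class Var:                 # variable, e.g. p : Package
    type: str

@dataclass
class Nav:                 # source.ref, a reference navigation
    source: "Expr"
    ref: str

@dataclass
class Attr:                # source.attr, a primitive attribute (adds no step)
    source: "Expr"
    name: str

@dataclass
class Iter:                # source->select/reject/collect(v | body)
    source: "Expr"
    body: "Expr"

Expr = Union[Var, Nav, Attr, Iter]

def end_type(fp):
    """Type reached by a footprint given as [sourceType, ref, ref, ...]."""
    t = fp[0]
    for ref in fp[1:]:
        t = REFS[(t, ref)]
    return t

def footprints(e):
    """Footprints of e as lists of steps; the last one is the main chain."""
    if isinstance(e, Var):
        return [[e.type]]
    if isinstance(e, Attr):
        return footprints(e.source)
    if isinstance(e, Nav):                     # chain the navigation step
        fps = footprints(e.source)
        return fps[:-1] + [fps[-1] + [e.ref]]
    if isinstance(e, Iter):
        src, body = footprints(e.source), footprints(e.body)
        main, inner = src[-1], body[-1]
        if inner[0] == end_type(main):         # body reachable from source: chain
            return src[:-1] + body[:-1] + [main + inner[1:]]
        return src[:-1] + body + [main]        # otherwise keep them separate
    raise TypeError(f"unsupported node: {e!r}")

# p.classes->reject(c | c.isAbstract)
#          ->collect(cc | cc.assoc->select(a | a.mValued))   (flatten() dropped)
expr = Iter(
    source=Iter(Nav(Var("Package"), "classes"), Attr(Var("Class"), "isAbstract")),
    body=Iter(Nav(Var("Class"), "assoc"), Attr(Var("Association"), "mValued")),
)
result = [".".join(fp) for fp in footprints(expr)]
print(result)  # prints "['Package.classes.assoc']"
```

The iterator case plays the role of the combined operator: it chains when the body starts at the type the source ends on, and otherwise keeps two separate footprints.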




I- Resulting Footprints

Rules                  Footprints
Package2Schema         Package.classes.assoc, Package.types
Class2Table            Class.assoc, Class.attr, DataType.allInstances
Attribute2Column       Attribute.type
MVAttribute2Column     Attribute.type, Attribute.owner
Association2Column     DataType.allInstances
MVAssociation2Column   Association.type, DataType.allInstances

Types         Footprints
Package       Package.classes.assoc, Package.types
Class         Class.assoc, Class.attr, DataType.allInstances
Attribute     Attribute.type, Attribute.owner
Association   Association.type, DataType.allInstances
Type          Ø
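The step from the per-rule table to the per-type table amounts to merging the footprints of all rules that match the same source type. A minimal Python sketch follows; the footprints are those listed above, while the input type attached to each rule is an assumption inferred from the rule name, and `footprints_by_type` is a hypothetical helper.

```python
# rule name -> (assumed input type, footprints from the table)
RULES = {
    "Package2Schema": ("Package", ["Package.classes.assoc", "Package.types"]),
    "Class2Table": ("Class", ["Class.assoc", "Class.attr", "DataType.allInstances"]),
    "Attribute2Column": ("Attribute", ["Attribute.type"]),
    "MVAttribute2Column": ("Attribute", ["Attribute.type", "Attribute.owner"]),
    "Association2Column": ("Association", ["DataType.allInstances"]),
    "MVAssociation2Column": ("Association", ["Association.type", "DataType.allInstances"]),
}

TYPES = ["Package", "Class", "Attribute", "Association", "Type"]

def footprints_by_type(rules, types):
    """Merge rule footprints per source type, dropping duplicates."""
    by_type = {t: [] for t in types}
    for source_type, fps in rules.values():
        for fp in fps:
            if fp not in by_type[source_type]:
                by_type[source_type].append(fp)
    return by_type

by_type = footprints_by_type(RULES, TYPES)
for t in TYPES:
    print(t, by_type[t] or "Ø")
```

A type that no rule matches, such as Type here, keeps an empty footprint list.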

II- Model partitioning

● Greedy, bi-objective algorithm

a. Maximizing data locality

b. Balancing the machine load

● On-the-fly approximation of the dependency graph, stored as <machine-id, nextStep> pairs

● A buffer to delay the processing of elements that do not contribute to the construction of the approximate dependency graph

● Instant assignment based on a score function
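The following Python sketch shows what a streaming greedy partitioner of this kind might look like. It is not the algorithm from the talk: the score function (locality votes weighted by remaining capacity), the capacity `avg_size + var`, the buffering rule, and the stream order are all assumptions, and the dependency table keeps only machine ids rather than <machine-id, nextStep> pairs.

```python
from collections import defaultdict

# Dependencies from the running example.
DEPS = {
    "p1": {"p1", "c1", "a1", "c2", "t1"}, "c1": {"c1", "a1", "att1", "t1"},
    "a1": {"a1", "c2"}, "att1": {"att1", "t1"},
    "c2": {"c2", "att2", "att3", "t1"}, "att2": {"att2", "t1"},
    "att3": {"att3", "t1"}, "t1": {"t1"},
}

def greedy_partition(stream, deps, machines=2, avg_size=4, var=2, buff_cap=2):
    cap = avg_size + var                              # assumed size limit per machine
    parts = [[] for _ in range(machines)]
    needed_by = defaultdict(lambda: [0] * machines)   # element -> votes per machine
    buffer = []

    def assign(e):
        def score(m):
            if len(parts[m]) >= cap:
                return -1.0                           # machine is full
            # locality (votes) weighted by the remaining capacity (balance)
            return needed_by[e][m] * (1 - len(parts[m]) / cap)
        # highest score wins; ties go to the least loaded machine
        best = max(range(machines), key=lambda m: (score(m), -len(parts[m])))
        parts[best].append(e)
        for d in deps[e]:
            if d != e:
                needed_by[d][best] += 1               # machine `best` will need d

    for e in stream:
        if deps[e] == {e} and len(buffer) < buff_cap:
            buffer.append(e)      # adds nothing to the graph: decide later
        else:
            assign(e)
    for e in buffer:              # flush once the stream is exhausted
        assign(e)
    return parts

STREAM = ["p1", "c1", "c2", "a1", "att1", "att2", "att3", "t1"]  # assumed order
parts = greedy_partition(STREAM, DEPS)
print(parts)
```

With a purely multiplicative score, a machine that already holds neighbours of an element keeps attracting it until the machine is full, so the resulting balance depends heavily on the capacity and on the order of the stream.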

[Figure sequence: step-by-step run of the partitioning algorithm on the running example with two machines. Each step shows the input stream of the eight model elements, the buffer, the footprints per type, and the per-machine dependencies collected so far.]

● Parameters

○ avgSize = 4

○ var = 2

○ buffCap = 2

Per-machine dependencies at the end of the run:

elmt.   Dependencies
c1      {<1,assoc>; <2,Ø>}
c2      {<1,assoc>}
t1      {<2,Ø>; <1,Ø>; <1,Ø>; <1,Ø>}
att2    {<1,Ø>}
att3    {<1,Ø>}
a1      {<2,Ø>}

Loaded elements: 7 + 5 = 12 (20% improvement)

Evaluation

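[Figure: evaluation setup. ATL-MR (one master, several slaves) runs on Hadoop task nodes. Models are accessed through the Eclipse Modeling Framework and stored on Hadoop data nodes, either with NeoEMF/HBase or as XML Metadata Interchange files on HDFS. Steps: 1. Distribute input, 2. Monitor, 3. Return output.]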

Evaluation results


Limitations

● The performance of our approach may degrade when:

a. elements have a large number of dependencies, sometimes exceeding the average size of the split (e.g. the allInstances() operation)

b. the approximate graph contains false-positive dependencies (e.g. from select() or reject())

c. the elements are streamed in an unfavourable order

Conclusion

● We presented a greedy algorithm for efficient model partitioning in distributed MTs.

○ We introduced an algorithm for footprint extraction

○ We presented our greedy algorithm for streaming model partitioning

○ We showed experimentally the scalability of our solution (up to 16% on average)

● In future work we plan to:

○ Extend our work to balanced edge partitioning and conduct a more exhaustive study of the impact of model density on the partitioning strategy

○ Improve the distribution of the intermediate transformation data (tracing information)

Questions

Check us out on GitHub: https://github.com/atlanmod/ATL_MR

http://www.freepik.com/

http://www.flaticon.com/authors/yannick

http://www.flaticon.com/
