Post on 13-Feb-2017

Page 1: Design of Experiments on Federator Polystore Architecture

Design of Experiments on Federator Polystore Architecture

Luiz Henrique Zambom Santana

Page 2: Design of Experiments on Federator Polystore Architecture

Agenda
● Introduction
● Federator Polystore
● Experiment Design
● Data and experiment
● Conclusions
● References

Page 3: Design of Experiments on Federator Polystore Architecture

“No one size fits all”
● “A panoply of data models, and they typically operate on flexible storage formats such as JSON” [1]
● “Increasingly, we see applications that deploy multiple engines, resulting in a need to join data across systems.” [1]
● “Increasingly, desktop and mobile applications are using the cloud infrastructure to take advantage of the high-availability and scalability characteristics. In the past, these type of systems used local databases to store information and application state. There are many new applications that share some or all their data with applications running on other hosts or in the cloud and use these data stores for persistence.” [2]

Page 4: Design of Experiments on Federator Polystore Architecture

Federation Polystore

[Architecture diagram; labels: Application 1, Application 2, ..., Application N; API; Federator; Canonical Model; Rendezvous; NoSQL; NoSQL; RDF + SPARQL]

Must be ROBUST to deliver a <1 sec. response

Page 5: Design of Experiments on Federator Polystore Architecture

Very complex architecture

[Figure: Execution Flow diagram for an insert. Example documents sent by the client:]

{"user":{"username":"luiz","password":"luiz","address":"lagoa"}}

{"post":{"body":"hello","author":"12345"}}

[Component labels: Client; Federator; MongoDB; Cassandra; Insert; Endpoint; Mappings; Planner; Plan; Access 1 .. Access N; Transaction management; EntityManager; Cache (graph); PersistentHashMap; DictionaryReader]

[Data labels: {"address":"lagoa"}; {"body":"hello"}; INSERT INTO user..; INSERT INTO post...; {12345}; {123456}; {12345} + Meta; {12345} + Entire document]

Page 6: Design of Experiments on Federator Polystore Architecture

Experiment Design
● Measure the response time
● Factors and levels
  ○ Architecture
    ■ Local
    ■ Mixed
    ■ Fully distributed
  ○ Protocol
    ■ HTTP
    ■ HTTPS
    ■ Thrifty
  ○ Mapping
    ■ Social Network (graph)
    ■ E-commerce (key/value)
    ■ Bank (columnar)
  ○ Cache
    ■ Bloom filter
    ■ Hash
    ■ None

Robust to the mapping

Page 7: Design of Experiments on Federator Polystore Architecture

Experiment Design
● 3^4 experiment with 100 replications
  ○ Four factors
  ○ Three levels each
● Goals:
  ○ The architecture must be robust enough to deliver a response in <1 second in both big and small installations
  ○ The architecture must be robust to mapping variation, because one of the most important results of my doctoral work is to have a real-time response.
● Data collection:
  ○ 100 accesses with 4 blocks: insert, update, query, and delete, with a 100 ms delay between each access
  ○ Each block with 81 experiments, 8100 queries
  ○ Data generated synthetically, based on LUBM
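The run counts on this slide follow from full-factorial arithmetic: four factors at three levels each give 3^4 = 81 combinations, and 100 replications give 8100 runs per block. A minimal Python sketch of that enumeration, using the factor and level names from the slides (the variable names are illustrative and are not taken from the Federator code):

```python
from itertools import product

# Factors and levels as listed on the Experiment Design slide.
factors = {
    "architecture": ["local", "mixed", "distributed"],
    "protocol": ["http", "https", "thrifty"],
    "mapping": ["socialnetwork", "ecommerce", "bank"],
    "cache": ["bloom", "hash", "none"],
}
replications = 100

# Full factorial: every combination of one level per factor.
combos = [dict(zip(factors, levels)) for levels in product(*factors.values())]

runs_per_block = len(combos) * replications
print(len(combos), runs_per_block)
```

Each entry of `combos` is one experimental condition; a test harness would run every condition 100 times within each operation block.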

Page 8: Design of Experiments on Federator Polystore Architecture

Experiment execution
https://github.com/lhzsantana/federator

Page 9: Design of Experiments on Federator Polystore Architecture

Hypotheses
● A, B, C and D
  ○ H0: the response time is the same for all the installations
● ...
● ABCD
  ○ H0: changing the factors generates the same effect in all the installations
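One way to write these hypotheses formally, assuming the usual fixed-effects factorial model. The symbols are illustrative and do not appear in the slides, and reading A, B, C, D as architecture, protocol, mapping and cache is an assumption based on the order of the factor list:

```latex
% Response for levels i, j, k, l of factors A, B, C, D and replication r
y_{ijklr} = \mu + \alpha_i + \beta_j + \gamma_k + \delta_l
          + (\alpha\beta)_{ij} + \dots + (\alpha\beta\gamma\delta)_{ijkl}
          + \varepsilon_{ijklr}

% Main effect of A: no level shifts the mean response time
H_0^{A}: \alpha_1 = \alpha_2 = \alpha_3 = 0

% Four-way interaction: the combined effect is the same everywhere
H_0^{ABCD}: (\alpha\beta\gamma\delta)_{ijkl} = 0 \quad \text{for all } i, j, k, l
```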

Page 10: Design of Experiments on Federator Polystore Architecture

Creating the Design in R

Page 11: Design of Experiments on Federator Polystore Architecture

Creating the design in R
● Linear model
  ○ > library(DoE.base)  # provides fac.design and add.response
  ○ > Design.1 <- fac.design(nfactors = 4, replications = 100, repeat.only = FALSE,
        blocks = 3, randomize = TRUE, seed = 29063, nlevels = c(3, 3, 3, 3),
        factor.names = list(architecture = c("local", "mixed", "distributed"),
                            protocol = c("http", "https", "thrifty"),
                            mapping = c("socialnetwork", "ecommerce", "bank"),
                            cache = c("bloom", "hash", "none")))

● One of the four blocks was removed because R reported “Too few factors with even number of factor levels for this number of blocks”. “Update” was chosen for removal because it can be implemented with the other operations.

Page 12: Design of Experiments on Federator Polystore Architecture

Data distribution

Page 13: Design of Experiments on Federator Polystore Architecture

Creating the design in R
● > Design.1 <- add.response(Design.1, data, replace = FALSE)
● > LinearModel.1 <- lm(response ~ Blocks + (architecture + protocol + mapping + cache)^2, data = Design.1)
● > anova(LinearModel.1)
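In an R model formula, `(architecture + protocol + mapping + cache)^2` expands to the four main effects plus every two-factor interaction, so the fitted model does not include three-way or four-way terms. A small Python sketch of that expansion and the matching degrees of freedom (the `Blocks` term is left out, and the variable names are illustrative):

```python
from itertools import combinations

factors = ["architecture", "protocol", "mapping", "cache"]
levels = 3  # every factor has three levels

# Terms produced by (a + b + c + d)^2: main effects and two-factor interactions.
main_effects = list(factors)
two_way = [f"{a}:{b}" for a, b in combinations(factors, 2)]

# Degrees of freedom: (levels - 1) per main effect,
# (levels - 1)^2 per two-factor interaction.
df_main = {term: levels - 1 for term in main_effects}
df_two_way = {term: (levels - 1) ** 2 for term in two_way}

total_df = sum(df_main.values()) + sum(df_two_way.values())
print(main_effects + two_way)
print(total_df)
```

The interaction names use R's `a:b` notation, which is how the terms are labelled in the `anova` table.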

Page 14: Design of Experiments on Federator Polystore Architecture
Page 15: Design of Experiments on Federator Polystore Architecture

Plot of means

The more distributed the architecture, the better Thrifty performs

Hash is better for local, and the Bloom filter for distributed

Bloom and Thrifty seem to be the best mix

Page 16: Design of Experiments on Federator Polystore Architecture

Conclusions
● Not a fractional factorial problem, because there is virtually no limitation to the number of replications
● There are still unmapped sources of variability (internal database caching, network, testing machine)
● Delete is not affected by the factors, or it is not very well implemented
● Query average is very high, and the variability is unacceptable
● Local + Thrifty + HashMap makes the architecture use too much memory
● Somewhat frustrating...
● The only good news: the mapping was not a relevant factor

Page 17: Design of Experiments on Federator Polystore Architecture

Future work
● There is still a lot of work :)
● Limitations of the test:
  ○ Mixed queries not tested yet (another block is necessary for real-world testing)
  ○ Fully distributed and parallel architecture is very expensive to test
● When I solve the “small” architecture variation, I will begin the more expensive tests (fully distributed)
● Real-world example
● Finally, I will repeat the tests on the Semantic Layer and compare how the architecture behaves

Page 18: Design of Experiments on Federator Polystore Architecture

References
1. Duggan, Jennie, et al. "The BigDAWG Polystore System." ACM SIGMOD Record 44.2 (2015): 11-16.
2. Dey, Akon, Alan Fekete, and Uwe Röhm. "Scalable transactions across heterogeneous NoSQL key-value data stores." Proceedings of the VLDB Endowment 6.12 (2013): 1434-1439.
3. Berners-Lee, Tim, James Hendler, and Ora Lassila. "The semantic web." Scientific American 284.5 (2001): 28-37.
4. Montgomery, Douglas C. Design and analysis of experiments. Vol. 7. New York: Wiley, 1984.
5. Barbetta, Pedro Alberto, Marcelo Menezes Reis, and Antonio Cezar Bornia. Estatística: para cursos de engenharia e informática. Vol. 3. São Paulo: Atlas, 2004.