Efficient Storing of Metadata for Distributed Data Management

Rafał Słota, Michał Wrzeszcz, Renata G. Słota, Łukasz Dutka, Jacek Kitowski
ACC Cyfronet AGH
Department of Computer Science, AGH - UST
CGW 2015, Kraków, Poland, October 26-28, 2015




Agenda

Distributed data management in a global environment
onedata: system's description
onedata: data and metadata organization
Metadata challenges in onedata
Analyzed solutions
Proposed solution
Performance tests
Conclusions


28.10.15

Distributed Data Management in a Global Environment

Managing data across different storage solutions in globally dispersed environments is a hot topic. Global data management challenges are being investigated by many research and commercial groups.


Onedata – overall description

Onedata is a distributed data management system that virtualizes access to organizationally distributed data and hides the complexity of an environment in which there is no trust between resource providers.

Data and metadata organization is key to providing:

an easy view of data for each user,
automatic data management for better efficiency.


Onedata – work in a distributed environment

Direct access whenever possible

Management of block replicas to minimize delays

Caching, prefetching and fast parallel transport
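The replica-related points above can be illustrated with a small read planner. This is a minimal sketch under assumptions that do not come from the slides: a file's replicas are modeled as byte ranges held by named providers (the hypothetical `Block` type), and the hypothetical `plan_read` serves each part of a read from the local provider's replica when one exists (direct access), falls back to a remote replica otherwise, and marks parts that have no replica. Onedata's actual replica management is not shown here.

```python
from dataclasses import dataclass
from typing import List, Tuple

MISSING = "<missing>"  # hypothetical marker for byte ranges with no replica


@dataclass(frozen=True)
class Block:
    """A replica of the byte range [start, end) held by one provider (hypothetical model)."""
    start: int
    end: int
    provider: str


def plan_read(offset: int, size: int, blocks: List[Block], local: str) -> List[Tuple[int, int, str]]:
    """Split the read [offset, offset + size) into (start, end, source) segments.

    Segments covered by a replica held by `local` are read directly; other
    segments come from a remote replica, or are marked MISSING if none exists.
    """
    end = offset + size
    # Every block boundary strictly inside the requested range is a possible switch point.
    cuts = {offset, end}
    for blk in blocks:
        for p in (blk.start, blk.end):
            if offset < p < end:
                cuts.add(p)
    points = sorted(cuts)

    plan: List[Tuple[int, int, str]] = []
    for lo, hi in zip(points, points[1:]):
        holders = sorted({blk.provider for blk in blocks if blk.start <= lo and hi <= blk.end})
        if local in holders:
            source = local        # direct access to the local replica
        elif holders:
            source = holders[0]   # deterministic pick among remote replicas
        else:
            source = MISSING
        # Segments are contiguous, so merge with the previous one when the source is unchanged.
        if plan and plan[-1][2] == source:
            plan[-1] = (plan[-1][0], hi, source)
        else:
            plan.append((lo, hi, source))
    return plan
```

For example, with a remote replica of bytes 0-100 on `provider-A` and a local replica of bytes 50-80, `plan_read(0, 100, blocks, "local")` yields three segments: bytes 0-50 and 80-100 from `provider-A` and bytes 50-80 read directly from the local replica.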


Data organization

[Diagram: Users, Groups, Spaces, Logical files, Providers, Storages]

Organizing logical files into spaces separates users from the problems of managing resources and data locations.
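A minimal sketch of the user-facing side of this organization, assuming a simple membership model that the diagram only names: a space lists its member users and groups, and a user sees a space either directly or through a group. `Space` and `visible_spaces` are hypothetical names. Note that the function never consults providers or storages, which is the separation this slide describes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Space:
    """A space groups logical files; its members are users and/or groups (hypothetical model)."""
    name: str
    users: Set[str] = field(default_factory=set)
    groups: Set[str] = field(default_factory=set)


def visible_spaces(user: str, spaces: List[Space], group_members: Dict[str, Set[str]]) -> List[str]:
    """Names of the spaces a user can see, either directly or through group membership."""
    user_groups = {g for g, members in group_members.items() if user in members}
    return sorted(s.name for s in spaces if user in s.users or (s.groups & user_groups))
```

For instance, if `alice` is a direct member of space `climate` and also belongs to group `lab`, which is a member of space `genomics`, she sees both spaces without knowing which providers store their files.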


Results of data organization design

Easy management and sharing of data for users.

Limitation of the metadata that each provider stores and processes.
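The second result can be sketched as a filter. Assuming, hypothetically, that every file metadata record names its space and that each space lists the providers supporting it, a provider only needs the records of the spaces it supports. The function name and record layout below are illustrative, not onedata's.

```python
from typing import Any, Dict, Set


def metadata_for_provider(provider: str,
                          file_meta: Dict[str, Dict[str, Any]],
                          space_support: Dict[str, Set[str]]) -> Dict[str, Dict[str, Any]]:
    """Return only the file metadata a provider needs: records of spaces it supports.

    file_meta maps a file id to its metadata, which must include the file's "space";
    space_support maps a space name to the set of providers supporting it.
    """
    return {
        file_id: meta
        for file_id, meta in file_meta.items()
        if provider in space_support.get(meta["space"], set())
    }
```

A provider that supports none of a file's spaces never stores or processes that file's metadata, so the metadata volume per provider grows with the spaces it supports rather than with the whole system.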


Metadata organization

Three levels of metadata describe data organization and usage:

1. Metadata used to coordinate providers' cooperation
2. File metadata stored by each provider
3. Current usage metadata, used for usage optimization

Lower level -> more frequent usage -> higher distribution


Metadata challenges in onedata

Storing metadata is too slow when all metadata is kept on disk.
There is a risk of losing important metadata when it is kept only in memory.

Examples:
metadata that describes the location of the actual data file has to be persistent;
metadata that describes how files are used by current sessions needs to be available at most as long as the session is active, but it must be available extremely fast.


Analyzed solutions

Various solutions:
in-memory vs. persistent databases,
standalone vs. built-in applications.
Examples: Mnesia, Redis, Riak, Couchbase, Cassandra

No solution has all three features:
safety,
high throughput (many operations per second),
low delay.


Proposed solution - datastore

Models: API that defines how specific types of metadata should be stored (e.g., in global memory)

Stores: elements where data is kept

Worker with API: set of functionalities for data access optimization
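The models/stores split above can be sketched as follows. This is a hypothetical Python outline, not onedata's implementation: `Store` is the interface of an element where data is kept, `MemoryStore` and `DiskStore` (SQLite-backed) are two example store levels, and `Datastore` routes each model to the level it was registered with. The "global memory" level (memory shared across nodes) and the optimizing worker are not modeled, and mapping `file_meta` to disk and `session` to memory is our assumption, based on the location and session examples given for the metadata challenges.

```python
import json
import os
import sqlite3
import tempfile
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional


class Store(ABC):
    """Element where data is kept (hypothetical interface)."""
    @abstractmethod
    def get(self, key: str) -> Optional[Any]: ...
    @abstractmethod
    def put(self, key: str, value: Any) -> None: ...
    @abstractmethod
    def delete(self, key: str) -> None: ...


class MemoryStore(Store):
    """Fast and node-local; contents are lost when the process stops."""
    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}
    def get(self, key: str) -> Optional[Any]:
        return self._data.get(key)
    def put(self, key: str, value: Any) -> None:
        self._data[key] = value
    def delete(self, key: str) -> None:
        self._data.pop(key, None)


class DiskStore(Store):
    """Persistent store backed by an SQLite file; values are JSON-encoded."""
    def __init__(self, path: str) -> None:
        self._db = sqlite3.connect(path)
        with self._db:
            self._db.execute("CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT)")
    def get(self, key: str) -> Optional[Any]:
        row = self._db.execute("SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
        return None if row is None else json.loads(row[0])
    def put(self, key: str, value: Any) -> None:
        with self._db:  # commits on success, rolls back on error
            self._db.execute("INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)",
                             (key, json.dumps(value)))
    def delete(self, key: str) -> None:
        with self._db:
            self._db.execute("DELETE FROM kv WHERE key = ?", (key,))


class Datastore:
    """Routes each model's operations to the store level the model was registered with."""
    def __init__(self, stores: Dict[str, Store]) -> None:
        self._stores = stores
        self._levels: Dict[str, str] = {}
    def register_model(self, model: str, level: str) -> None:
        if level not in self._stores:
            raise ValueError(f"unknown store level: {level!r}")
        self._levels[model] = level
    def _store(self, model: str) -> Store:
        return self._stores[self._levels[model]]
    def save(self, model: str, key: str, value: Any) -> None:
        self._store(model).put(f"{model}/{key}", value)
    def get(self, model: str, key: str) -> Optional[Any]:
        return self._store(model).get(f"{model}/{key}")
    def delete(self, model: str, key: str) -> None:
        self._store(model).delete(f"{model}/{key}")


def open_datastore(db_path: str) -> Datastore:
    """Example wiring (assumed): location metadata on disk, session metadata in memory."""
    ds = Datastore({"memory": MemoryStore(), "disk": DiskStore(db_path)})
    ds.register_model("file_meta", "disk")
    ds.register_model("session", "memory")
    return ds


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "metadata.db")
    ds = open_datastore(path)
    ds.save("file_meta", "f1", {"location": "storage-1"})
    ds.save("session", "s1", {"open_files": ["f1"]})
    restarted = open_datastore(path)  # simulates a process restart
    print(restarted.get("file_meta", "f1"), restarted.get("session", "s1"))
```

After the simulated restart, the `file_meta` record is still readable while the `session` record is gone, which mirrors the persistence requirements stated for the two kinds of metadata.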


Datastore key features and examples

Dynamic cache system
Datastore allows one store to be set as a cache for another
Reads and writes are done on the cache
Writes are aggregated and done asynchronously
Data is dynamically loaded into and unloaded from the cache when needed
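The cache behaviour listed above corresponds to a write-back cache. The sketch below is a minimal single-process illustration under our own assumptions: a plain mapping stands in for the slower persistent store, eviction unloads any clean (already flushed) entry, and the hypothetical `flush_interval` parameter drives an optional background flusher. `WriteBackCache` and its methods are illustrative names, not the datastore's API.

```python
import threading
from typing import Any, Dict, MutableMapping, Optional, Set


class WriteBackCache:
    """One store used as a cache for another, slower store (hypothetical sketch).

    Reads and writes go to the cache. Writes only mark keys dirty, so repeated
    writes to the same key are aggregated; dirty keys are copied to the backing
    store by flush(), optionally run periodically in a background thread.
    """

    def __init__(self, backing: MutableMapping[str, Any], capacity: int = 1000,
                 flush_interval: Optional[float] = None) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._backing = backing              # stands in for the persistent store
        self._cache: Dict[str, Any] = {}
        self._dirty: Set[str] = set()
        self._capacity = capacity
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self.backend_writes = 0              # writes that actually reached the backing store
        if flush_interval is not None:
            threading.Thread(target=self._flush_loop, args=(flush_interval,), daemon=True).start()

    def get(self, key: str) -> Optional[Any]:
        with self._lock:
            if key not in self._cache:
                if key not in self._backing:
                    return None
                self._load(key, self._backing[key])  # dynamic load on a cache miss
            return self._cache[key]

    def put(self, key: str, value: Any) -> None:
        with self._lock:
            self._load(key, value)
            self._dirty.add(key)                     # written to the backing store later

    def flush(self) -> None:
        with self._lock:
            self._flush_locked()

    def close(self) -> None:
        """Stop the background flusher (if any) and write out pending changes."""
        self._stop.set()
        self.flush()

    def _load(self, key: str, value: Any) -> None:
        # Caller holds the lock. Unload an entry first if the cache is full.
        if key not in self._cache and len(self._cache) >= self._capacity:
            victim = next((k for k in self._cache if k not in self._dirty), None)
            if victim is None:                       # everything is dirty: flush, then evict
                self._flush_locked()
                victim = next(iter(self._cache))
            del self._cache[victim]
        self._cache[key] = value

    def _flush_locked(self) -> None:
        for key in self._dirty:
            self._backing[key] = self._cache[key]
            self.backend_writes += 1
        self._dirty.clear()

    def _flush_loop(self, interval: float) -> None:
        while not self._stop.wait(interval):
            self.flush()
```

Because writes are aggregated per key, three updates of the same key cost a single write to the backing store at flush time. Anything written since the last flush is lost if the process crashes, so a shorter flush interval trades throughput for a smaller window of possible metadata loss.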

Hooks for model cooperation
Separation of models
Easy reaction to other models' actions

Exemplary models: file_meta, session, task_pool
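Hooks can be sketched as callbacks registered on another model's operations. In this hypothetical example (the class, the operation names and the record layout are assumptions), the `session` model reacts when a `file_meta` record is deleted by dropping that file from every session's set of open files, while the data of the two models stays separate.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, List, Tuple

Hook = Callable[[str, Any], None]  # called with (key, value) after an operation


class ModelRegistry:
    """Keeps each model's records separate and runs hooks registered on other models' operations."""

    def __init__(self) -> None:
        self._data: DefaultDict[str, Dict[str, Any]] = defaultdict(dict)
        self._hooks: DefaultDict[Tuple[str, str], List[Hook]] = defaultdict(list)

    def add_hook(self, model: str, operation: str, hook: Hook) -> None:
        self._hooks[(model, operation)].append(hook)

    def get(self, model: str, key: str) -> Any:
        return self._data[model].get(key)

    def items(self, model: str) -> List[Tuple[str, Any]]:
        # Return a copy so hooks may modify the model while iterating.
        return list(self._data[model].items())

    def save(self, model: str, key: str, value: Any) -> None:
        self._data[model][key] = value
        self._run_hooks(model, "save", key, value)

    def delete(self, model: str, key: str) -> None:
        old = self._data[model].pop(key, None)
        self._run_hooks(model, "delete", key, old)

    def _run_hooks(self, model: str, operation: str, key: str, value: Any) -> None:
        for hook in self._hooks[(model, operation)]:
            hook(key, value)


def make_registry() -> ModelRegistry:
    """Example wiring: when a file_meta record is deleted, sessions stop referring to it."""
    registry = ModelRegistry()

    def forget_deleted_file(file_key: str, _old: Any) -> None:
        for session_key, session in registry.items("session"):
            if file_key in session["open_files"]:
                registry.save("session", session_key,
                              {"open_files": session["open_files"] - {file_key}})

    registry.add_hook("file_meta", "delete", forget_deleted_file)
    return registry
```

The `file_meta` model needs no knowledge of sessions; the reaction lives entirely in the hook registered by the other side.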


Performance tests

Speed vs. risk of metadata loss
Cache as a compromise


Conclusions

For systems that globalize data access, efficient metadata management is a key element. The proposed datastore provides a flexible, efficient and safe solution for storing metadata.

The proposed datastore allows onedata to provide data access in a globally distributed environment.


Thank you

onedata homepage: http://www.onedata.org

See also:

Łukasz Dutka, Michał Wrzeszcz, Tomasz Lichoń, Rafał Słota, Konrad Zemek, Krzysztof Trzepla, Łukasz Opioła, Renata Słota, and Jacek Kitowski. Onedata - a Step Forward towards Globalization of Data Access for Computing Infrastructures. ICCS 2015 Computational Science at the Gates of Nature, Procedia Computer Science, volume 51, pages 2843–2847, 2015.

M. Wrzeszcz, T. Lichoń, R. Słota, K. Zemek, K. Trzepla, Ł. Opioła, D. Nikolow, Ł. Dutka, R. Słota, and J. Kitowski. Metadata Organization and Management for Globalization of Data Access with onedata. PPAM 2015: Book of Abstracts, page 31, 2015.

Michał Wrzeszcz, Łukasz Dutka, Renata Słota, and Jacek Kitowski. VeilFS - A New Face of Storage as a Service. eChallenges e-2014 Conference, pages 1–10, October 2014.

Łukasz Dutka, Renata Słota, Michał Wrzeszcz, Dariusz Król, and Jacek Kitowski. Uniform and Efficient Access to Data in Organizationally Distributed Environments. eScience on Distributed Computing Infrastructure, Lecture Notes in Computer Science, volume 8500, pages 178–194. Springer International Publishing, 2014.

R. Słota, Ł. Dutka, M. Wrzeszcz, B. Kryza, D. Nikolow, D. Król, and J. Kitowski. Storage Management Systems for Organizationally Distributed Environments - PLGrid PLUS Case Study. Lecture Notes in Computer Science, volume 8384, pages 724–733, 2014.