
Bringing cloud technology to distributed data infrastructures
EGI CF 2013
Martin Hellmich (presenter), Jedrzej Rybicki, Maciej Brzezniak

A bit of context
Towards a pan-European Collaborative Data Infrastructure
- Production services: safe replication, data staging, metadata, AAI
- Research & development: scalable federation architectures, data preservation, data access and transfer, workflows

Three projects
- Cloud storage integration:
  - iRODS managing an OpenStack Swift backend
  - Extending DPM with S3 storage
- In-storage processing:
  - Calling Hadoop jobs from iRODS

My goal
- Show the projects
- Find interest in the communities (we are interdisciplinary)
- Start a discussion about cloud integration: Backend or frontend? Outsource or restructure? Where are the limitations?

The cloud integration projects
- iRODS-OpenStack:
  - Expose existing S3/OpenStack storage (managed otherwise)
  - iRODS frontend protocols
  - Local storage as a cache
- DPM-S3:
  - Add new storage to DPM
  - Expose HTTP only (but grid-aware: X.509, VOMS)
  - Outsource storage and network traffic

iRODS-OpenStack Swift
Maciej Brzezniak

Sidestep: iRODS compound resources
- iRODS resources: cache, archive, virtual
- iRODS compound resources:
  - Virtual resource
  - Maps from PUT/GET to POSIX
  - Provides a cache

iRODS managing an S3 backend
(Diagram: iRODS, site disks, OpenStack Swift/S3)
Ingredients:
- iRODS server
- S3 driver (in C)
- iRODS-S3 driver glue
- Swift-to-S3 frontend

Achievements
(Diagram: iRODS, site disks, S3/OpenStack)
- Transparent cloud storage
- Cloud authentication through central accounts
- Low overhead through iRODS
- Speedups with caching
Limitations:
- File-size limit (2/5 GB)
- Issue with moving files inside the cloud

DPM-S3
Martin Hellmich

DPM now uses dmlite
(Diagram: S3)

Sidestep: the S3 protocol
- HTTP + custom headers
- Access key ID + secret key + HTTP command + time => signature
- The signature can be sent:
  - In a header: Authorization: AWS AWSAccessKeyId:Signature
  - In the URL: ?AWSAccessKeyId=AKIAIOSFODNN7EXAMPLE&Signature=NpgCjnDzr%2BWFzoENXmpNDUsSn8%3D&Expires=

Extending DPM with S3 storage
(Diagram: site disks, S3, signed-URL redirect)
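The signing step from the S3 protocol sidestep can be sketched as follows. This is a minimal sketch of the legacy S3 Signature Version 2 scheme (Base64 of an HMAC-SHA1 over a newline-separated string-to-sign), which is the scheme behind the header and URL forms shown above. It assumes no `x-amz-*` headers and no sub-resources; the function names, endpoint, bucket, object and secret key are hypothetical placeholders.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def sign_v2(secret_key: str, verb: str, resource: str, time_field: str,
            content_md5: str = "", content_type: str = "") -> str:
    """Base64 HMAC-SHA1 signature over an S3 Signature V2 string-to-sign.

    time_field is the Date header value for header auth, or the Expires
    Unix timestamp for query-string auth. Assumes no x-amz-* headers.
    """
    string_to_sign = "\n".join([verb, content_md5, content_type, time_field, resource])
    digest = hmac.new(secret_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def authorization_header(access_key: str, secret_key: str, verb: str,
                         resource: str, date: str) -> str:
    """Header form: 'AWS <AccessKeyId>:<Signature>'."""
    return f"AWS {access_key}:{sign_v2(secret_key, verb, resource, date)}"


def presigned_get_url(endpoint: str, access_key: str, secret_key: str,
                      resource: str, expires: int) -> str:
    """URL form: the signature travels URL-encoded in the query string."""
    signature = sign_v2(secret_key, "GET", resource, str(expires))
    return (f"{endpoint}{resource}?AWSAccessKeyId={access_key}"
            f"&Signature={quote(signature, safe='')}&Expires={expires}")


if __name__ == "__main__":
    # Hypothetical credentials and object, for illustration only.
    ak, sk = "AKIAIOSFODNN7EXAMPLE", "example-secret-key"
    print(authorization_header(ak, sk, "GET", "/mybucket/data.root",
                               "Tue, 27 Mar 2007 19:36:42 +0000"))
    print(presigned_get_url("https://s3.example.org", ak, sk,
                            "/mybucket/data.root", 1700000000))
```

Whoever holds such a URL can fetch the object until `Expires` without knowing the secret key, which is what makes the redirect-based setup in DPM-S3 possible.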
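The signed-URL redirect in the DPM-S3 diagram is what keeps only nameserver traffic local: the frontend resolves the requested path and answers with an HTTP redirect to a time-limited signed S3 URL, and the client then downloads the bytes directly from the S3 endpoint. The sketch below shows only that control flow under stated assumptions. It is not dmlite code; the catalogue table, `handle_get`, and the presigner callable are hypothetical, and client authentication (X.509/VOMS) is left out.

```python
from typing import Callable, Dict, Tuple

# Hypothetical namespace table: logical path -> (bucket, object key).
Catalogue = Dict[str, Tuple[str, str]]
# Any function returning a time-limited signed GET URL for (bucket, key, lifetime).
Presigner = Callable[[str, str, int], str]


def handle_get(path: str, catalogue: Catalogue, presign: Presigner,
               lifetime_s: int = 300) -> Tuple[int, Dict[str, str]]:
    """Answer a client GET by redirecting it to a signed S3 URL.

    Only the namespace lookup happens locally; the data transfer
    itself goes straight from the S3 endpoint to the client.
    """
    entry = catalogue.get(path)
    if entry is None:
        return 404, {}
    bucket, key = entry
    return 302, {"Location": presign(bucket, key, lifetime_s)}


if __name__ == "__main__":
    catalogue = {"/dpm/example.org/home/vo/file.dat": ("vo-bucket", "file.dat")}

    # Stub presigner for illustration; a real one would sign the URL.
    def stub(bucket: str, key: str, lifetime: int) -> str:
        return f"https://s3.example.org/{bucket}/{key}?lifetime={lifetime}"

    print(handle_get("/dpm/example.org/home/vo/file.dat", catalogue, stub))
```

Any standard HTTP client that follows redirects can use this, which matches the "standard HTTP clients" point on the slide.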
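Going back to the iRODS compound resources from the iRODS-OpenStack Swift part, the cache-plus-archive idea can be modelled in a few lines. This is a toy model, not iRODS code: the archive is a dictionary that only supports whole-object PUT/GET (like an S3/Swift bucket), and the cache holds full local copies so that POSIX-style partial reads are served locally. The class and method names are hypothetical, and the write-through policy is an assumption made for simplicity.

```python
from typing import Dict


class CompoundResource:
    """Toy model of a cache + archive resource pair (not iRODS code)."""

    def __init__(self, archive: Dict[str, bytes]) -> None:
        self.archive = archive             # stands in for the object store
        self.cache: Dict[str, bytes] = {}  # stands in for local site disks
        self.archive_gets = 0              # how often we went to the cloud

    def write(self, path: str, data: bytes) -> None:
        # Assumption: write-through, i.e. the object is PUT to the archive
        # as soon as it lands in the cache.
        self.cache[path] = data
        self.archive[path] = data

    def read(self, path: str, offset: int = 0, length: int = -1) -> bytes:
        if path not in self.cache:
            # Cache miss: stage the whole object with a single GET.
            self.archive_gets += 1
            self.cache[path] = self.archive[path]
        data = self.cache[path]
        end = len(data) if length < 0 else offset + length
        # POSIX-style partial read served from the local copy.
        return data[offset:end]

    def evict(self, path: str) -> None:
        self.cache.pop(path, None)
```

Repeated reads after the first staging never touch the archive, which is where the "speedups with caching" come from; the price is that every miss transfers the whole object.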
Extending DPM with S3 storage (continued)
Ingredients:
- dmlite
- dmlite-plugins-s3
- Amazon S3, OpenStack Swift S3 frontend, Ceph/RadosGW
Achievements:
- Only nameserver traffic is local
- Cloud storage managed with a central account
- Grid-enabled HTTP
- Standard HTTP clients
Limitations:
- File-size limit (or S3 client)

In-storage processing
Jedrzej Rybicki & Benedikt von St. Vieth

Motivation
Example HPC workflow:
(Diagram: site with high-performance computing, storage, and preprocessing)
(Diagram: site with high-performance computing and storage + preprocessing)

Sidestep: iRODS rules
- Condition: $objPath like /x/y/z/* or $rescName == demoResc8
- Rule: printHello { print_hello; }
- Act freely on certain triggers
- At least C and Python

(Slide credit: Benedikt von St. Vieth & Jedrzej Rybicki)

In-storage processing: achievements
- Everything is a file
- Easy job specification in Apache Pig
- Caching of results
- Predefined scripts or custom jobs?

Summary
- There are different ways to integrate cloud storage for different scenarios.
- Storage-based computing can be made transparent.

Thank you!

Project contacts
- OpenStack/iRODS: Maciej Brzezniak (PSNC)
- DPM-S3: Martin Hellmich (CERN)
- In-storage processing on iRODS: Jedrzej Rybicki / Benedikt von St. Vieth (JSC)

Any questions?
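The condition/action idea behind the iRODS rules sidestep can be modelled as a list of (condition, action) pairs that are checked whenever an event arrives. This is a toy model in Python, not the iRODS rule language: the event dictionary, `on_event`, and the string returned by `print_hello` are hypothetical, and only the field names `objPath` and `rescName` and the two conditions mirror the slide. It also assumes that the slide's `*` wildcard matches across `/`, as `fnmatchcase`'s does.

```python
from fnmatch import fnmatchcase
from typing import Callable, Dict, List, Tuple

Event = Dict[str, str]
Rule = Tuple[Callable[[Event], bool], Callable[[Event], str]]


def print_hello(event: Event) -> str:
    # Stand-in for the slide's print_hello action.
    return f"hello from {event['objPath']}"


# Mirrors: $objPath like /x/y/z/* or $rescName == demoResc8
RULES: List[Rule] = [
    (lambda e: fnmatchcase(e["objPath"], "/x/y/z/*") or e["rescName"] == "demoResc8",
     print_hello),
]


def on_event(event: Event, rules: List[Rule] = RULES) -> List[str]:
    """Run every action whose condition matches the triggering event."""
    return [action(event) for condition, action in rules if condition(event)]
```

In the in-storage processing setup, the action would submit a job (for example a Pig script on Hadoop) instead of returning a greeting.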
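The "caching of results" point from the in-storage processing achievements can be illustrated with a small memoisation layer. The slides do not say how results are keyed; this sketch assumes a key derived from the content of both the job script and the input object, so rerunning the same script on unchanged data is served from the cache. `ResultCache` and `run_job` are hypothetical; `run_job` stands in for submitting the actual Hadoop job.

```python
import hashlib
from typing import Callable, Dict


class ResultCache:
    """Memoise job outputs by (script, input) content (illustrative only)."""

    def __init__(self, run_job: Callable[[str, bytes], bytes]) -> None:
        self.run_job = run_job          # hypothetical: submits the real job
        self.results: Dict[str, bytes] = {}
        self.runs = 0

    @staticmethod
    def key(script: str, data: bytes) -> str:
        # Hash each part separately so that different (script, data) splits
        # of the same concatenated bytes cannot map to the same key.
        h = hashlib.sha256()
        h.update(hashlib.sha256(script.encode("utf-8")).digest())
        h.update(hashlib.sha256(data).digest())
        return h.hexdigest()

    def get(self, script: str, data: bytes) -> bytes:
        k = self.key(script, data)
        if k not in self.results:
            self.runs += 1
            self.results[k] = self.run_job(script, data)
        return self.results[k]
```

Keying on content rather than on file names means a modified input or a modified script automatically triggers a fresh run.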

