Policy-based Cloud Storage: Persisting Data in a Multi-Site, Multi-Cloud World

© 2015 Apcera Policy-based Cloud Storage Persisting Data in a Multi-Site, Multi-Cloud World V 2015-09-23.1 Earl C. Ruby III Principal Software Engineer Apcera [email protected] @earlruby http://earlruby.org

Upload: apcera

Post on 15-Apr-2017

TRANSCRIPT

Page 1: Policy-based Cloud Storage: Persisting Data in a Multi-Site, Multi-Cloud World

© 2015 Apcera

Policy-based Cloud Storage Persisting Data in a Multi-Site, Multi-Cloud World

V 2015-09-23.1

Earl C. Ruby III

Principal Software Engineer

Apcera

[email protected]

@earlruby

http://earlruby.org

Page 2: Policy-based Cloud Storage: Persisting Data in a Multi-Site, Multi-Cloud World

Introduction

My name is Earl, and I work at Apcera

This is the picture I used last year at RICON

Storage covers a lot of ground, so I’m going to focus on file system storage and policy

Q&A afterwards, but feel free to ask questions

Page 3: Policy-based Cloud Storage: Persisting Data in a Multi-Site, Multi-Cloud World

What problem are we trying to solve?

Page 4: Policy-based Cloud Storage: Persisting Data in a Multi-Site, Multi-Cloud World

What problem are we trying to solve?

We want to reduce application development time by letting engineers easily provision services (network, DNS, NoSQL, DB, web, etc.) without having to request them from another group and without compromising system security or stability

Page 5: Policy-based Cloud Storage: Persisting Data in a Multi-Site, Multi-Cloud World

File systems in the cloud

● Provide a temporary file system to a single job

● Provide a persistent file system to a single job

● Provide a persistent file system shared with multiple jobs

Do all of the above on any cloud, anywhere

Page 6: Policy-based Cloud Storage: Persisting Data in a Multi-Site, Multi-Cloud World

Temporary file system / single job

Not too interesting...

● Job starts, has a file system (how much data?)

● Job writes data to the file system (what kind of performance?)

● Data is stored locally on the same host where the job runs (limits the available volume size)

● Job ends, data goes away (hence the name “temporary”)

● Linux containers handle this well today

Page 7: Policy-based Cloud Storage: Persisting Data in a Multi-Site, Multi-Cloud World

Persistent file system / single job

More interesting...

● Job starts, has a file system

● Job writes data to the file system (what performance?)

● Job migrates to a new host, data moves with it (but how quickly?)

● Job restarts, data persists (how durable? RAID? 3 copy?)

● Job ends, data goes away (difference between restart and end?)

Page 8: Policy-based Cloud Storage: Persisting Data in a Multi-Site, Multi-Cloud World

Persistent FS / shared / multiple jobs

Very interesting...

● First job starts, gets a file system (how much data?)

● More jobs start, can access same file system (how many jobs?)

● Jobs write data to the file system (how many jobs at the same time? same file? same directory?)

● Jobs migrate to a new cloud (all jobs or some? does the shared data migrate and if so, when?)

● Will it scale?

Page 9: Policy-based Cloud Storage: Persisting Data in a Multi-Site, Multi-Cloud World

Define “Scale”

When we say “Scale”, what are we talking about?

● Total volume of data

● Number of simultaneous read/write/update/delete operations

● Number of simultaneous connections

● Total IOPS / Gbps (“Noisy Neighbor” problem)

● Consistency / Availability / Partition Tolerance (CAP)

● Predictable performance as all of the above increase

It depends...

Page 10: Policy-based Cloud Storage: Persisting Data in a Multi-Site, Multi-Cloud World

Trade-offs

Different types of storage systems have different trade-offs

● NFS - all traffic across network, tends to degrade if many jobs write to the same file or directory (locking issues)

● Local SSD - fast, but total space is limited to the size of the disk, SPOF

● HDFS - Optimized for large files and sequential reads

● AWS EBS - Only works on AWS, can get expensive for high IOPS

● AWS Glacier - Cheap to store, slow and expensive to read. Optimized for large files, write once, read rarely

● Legacy SAN - fast, works for on-premises cloud, not AWS, expensive to maintain and extend

Page 11: Policy-based Cloud Storage: Persisting Data in a Multi-Site, Multi-Cloud World

Policy!

Use policy for Provisioning, Security, Performance, and Business Logic.

Instead of trying to reinvent the wheel, describe what you want to happen and let the application platform make it happen.

Page 12: Policy-based Cloud Storage: Persisting Data in a Multi-Site, Multi-Cloud World

Storage creation policy

● Max capacity (disk quota)

● Geo-replicated / HA fail-over required? (Y/N)

● Data Durability

○ Durability affects performance during recovery from hardware failure

○ Let the system decide whether an app’s requirements mean 1-copy, 2-copy, 3-copy, a RAID level, or erasure coding

● Thick provisioning required? (Y/N)
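As a rough sketch of how a platform might turn a durability requirement into a storage layout: the names below (`choose_scheme`, `failures_to_tolerate`, `max_overhead`) and the fixed count of 4 data shards are assumptions made for illustration, not Apcera's implementation. The sketch relies only on two general facts: N full copies survive the loss of N-1 copies at N times the raw space, and a k+m erasure code survives the loss of m shards at (k+m)/k times the raw space.

```python
from typing import Optional

DATA_SHARDS = 4  # assumed shard count for the erasure-coded option


def replication_overhead(copies: int) -> float:
    """Raw space used per byte stored when keeping full copies."""
    return float(copies)


def erasure_overhead(data_shards: int, parity_shards: int) -> float:
    """Raw space used per byte stored with a k+m erasure code."""
    return (data_shards + parity_shards) / data_shards


def choose_scheme(failures_to_tolerate: int, max_overhead: float) -> Optional[str]:
    """Pick a layout that survives the given failures within the space budget.

    Tries plain replication first (an assumed preference, since recovery is
    simpler), then erasure coding. Returns None when neither option fits.
    """
    copies = failures_to_tolerate + 1
    if replication_overhead(copies) <= max_overhead:
        return f"{copies}-copy"
    if (failures_to_tolerate > 0
            and erasure_overhead(DATA_SHARDS, failures_to_tolerate) <= max_overhead):
        return f"erasure {DATA_SHARDS}+{failures_to_tolerate}"
    return None
```

The application states what it needs (how many failures to survive, how much space it can afford) and never names a layout itself.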

Page 13: Policy-based Cloud Storage: Persisting Data in a Multi-Site, Multi-Cloud World

Storage performance policy

● Min / Max IOPS (or best effort)

● Min / Max bandwidth

● Max Latency

● Max Concurrent Access

DO NOT define SSD / HDD / SAN. Define the performance you require and let the platform figure out how to deliver it
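One way to picture this, as a sketch only: the policy carries performance numbers, and the platform matches them against a catalog of whatever storage it has. The class names, the catalog entries, and every figure below are invented for illustration; none of them come from Apcera or from any real storage product.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PerformancePolicy:
    min_iops: int
    max_latency_ms: float


@dataclass(frozen=True)
class Backend:
    name: str
    iops: int
    latency_ms: float
    cost_per_gb: float


# Hypothetical catalog kept by the platform; the numbers are made up.
BACKENDS = [
    Backend("nfs-pool", iops=2_000, latency_ms=5.0, cost_per_gb=0.05),
    Backend("local-ssd", iops=50_000, latency_ms=0.2, cost_per_gb=0.30),
    Backend("san", iops=20_000, latency_ms=1.0, cost_per_gb=0.50),
]


def choose_backend(policy: PerformancePolicy,
                   backends: List[Backend]) -> Optional[Backend]:
    """Return the cheapest backend that meets the policy, or None."""
    candidates = [
        b for b in backends
        if b.iops >= policy.min_iops and b.latency_ms <= policy.max_latency_ms
    ]
    return min(candidates, key=lambda b: b.cost_per_gb, default=None)
```

The policy never mentions a media type. If the platform later adds a cheaper backend that meets the same numbers, the same policy picks it up without being edited.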

Page 14: Policy-based Cloud Storage: Persisting Data in a Multi-Site, Multi-Cloud World

Data handling policy

● Deduplication (Y/N)

● Compression (Y/N)

● At-rest encryption (None, LUKS, etc.)

● Point-in-time recovery required? (Y/N)

○ At what points? (schedule)

Page 15: Policy-based Cloud Storage: Persisting Data in a Multi-Site, Multi-Cloud World

Business policy

● Max Cost / GB

● Location - where is this data allowed to be physically located?

● Retention - keep data forever or delete after some date?
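The three business rules above could be checked along these lines. This is a minimal sketch: `BusinessPolicy`, its field names, the region labels, and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class BusinessPolicy:
    max_cost_per_gb: float                 # price ceiling per GB
    allowed_locations: FrozenSet[str]      # where the data may physically live
    retention_days: Optional[int] = None   # None means keep forever


# Hypothetical example policy; labels and numbers are illustrative.
EU_POLICY = BusinessPolicy(
    max_cost_per_gb=0.10,
    allowed_locations=frozenset({"eu-west", "eu-central"}),
    retention_days=365,
)


def placement_allowed(policy: BusinessPolicy, location: str,
                      cost_per_gb: float) -> bool:
    """True if data may be stored at this location for this price."""
    return (location in policy.allowed_locations
            and cost_per_gb <= policy.max_cost_per_gb)


def is_expired(policy: BusinessPolicy, created: date, today: date) -> bool:
    """True once the retention window has fully passed."""
    if policy.retention_days is None:
        return False
    return today > created + timedelta(days=policy.retention_days)
```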

Page 16: Policy-based Cloud Storage: Persisting Data in a Multi-Site, Multi-Cloud World

Play nice with others

In a multi-cloud world, your platform has to play nice with others

● Connect to other vendors’ storage solutions

● Be able to determine what policies those solutions support

● Apply policies that are supported by those solutions

● React gracefully when policies are not supported

● Self-heal if policy is not supported (or permitted) in any cloud -- give the user a suggestion on how to move forward
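The list above amounts to a capability negotiation, which might be sketched as follows. The provider names and feature labels are placeholders, and `plan_placement` is a hypothetical helper rather than a real platform API.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical catalog: which policy features each storage provider reports.
PROVIDERS: Dict[str, Set[str]] = {
    "provider-a": {"quota", "encryption"},
    "provider-b": {"quota", "encryption", "compression"},
}


def plan_placement(required: Set[str],
                   providers: Dict[str, Set[str]]) -> Tuple[Optional[str], str]:
    """Return (provider, message); provider is None when nothing fits."""
    # Use the first provider that supports every requested policy.
    for name, supported in providers.items():
        if required <= supported:
            return name, f"all requested policies are supported by {name}"
    if not providers:
        return None, "no storage providers are configured"
    # Nothing fits: find the provider missing the fewest features and
    # tell the user what would have to change.
    closest = min(providers, key=lambda name: len(required - providers[name]))
    missing = sorted(required - providers[closest])
    return None, f"closest match is {closest}; consider dropping: {', '.join(missing)}"
```

When no provider fits, the function returns a suggestion instead of failing outright, which is the "react gracefully" behaviour in miniature.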

Page 17: Policy-based Cloud Storage: Persisting Data in a Multi-Site, Multi-Cloud World

Where are we now?

● We currently provide NFS volumes to Apcera jobs

● Provide persistent storage to Docker containers

● We are actively engaged with ClusterHQ/Flocker, ObjectiveFS, and ConvergeIO, among others

● We are part of the Open Container Initiative

● Quotas and access permissions are supported by policy today

Page 18: Policy-based Cloud Storage: Persisting Data in a Multi-Site, Multi-Cloud World

Policy example 1

The following is a basic quota policy that limits package size, RAM, disk, and network resources for the user Sam’s sandboxed namespace. Without such a policy, Sam has unrestricted use of resources.

quota::/sandbox/sam {
  { max.package.size 2GB }
  { total.package.size 6GB }
  { total.memory 5GB }
  { total.disk 15GB }
  { total.network 1Gbps }
}
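To make the two package limits concrete, here is a small sketch of the check they imply for a new upload. It is not how the platform evaluates policy: `parse_size` and `can_upload` are hypothetical helpers, and reading GB as a binary unit (1024³ bytes) is an assumption.

```python
from typing import List

MAX_PACKAGE_SIZE = "2GB"    # max.package.size from the policy above
TOTAL_PACKAGE_SIZE = "6GB"  # total.package.size from the policy above

# Binary units are an assumption; the platform may count differently.
UNITS = {"MB": 1024 ** 2, "GB": 1024 ** 3, "TB": 1024 ** 4}


def parse_size(text: str) -> int:
    """Convert a string such as '2GB' into a number of bytes."""
    for suffix, factor in UNITS.items():
        if text.endswith(suffix):
            return int(text[: -len(suffix)]) * factor
    raise ValueError(f"unrecognised size: {text!r}")


def can_upload(new_size: int, existing_sizes: List[int]) -> bool:
    """True if the new package fits both the per-package and total limits."""
    within_single = new_size <= parse_size(MAX_PACKAGE_SIZE)
    within_total = sum(existing_sizes) + new_size <= parse_size(TOTAL_PACKAGE_SIZE)
    return within_single and within_total
```

A 1GB package is accepted when Sam already holds two 2GB packages (5GB in total), while a single 3GB package is rejected even in an empty namespace.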

Page 19: Policy-based Cloud Storage: Persisting Data in a Multi-Site, Multi-Cloud World

Policy example 2

The following policy block limits the maximum amount of resources individual jobs and job instances in the /dev namespace may consume.

quota::/dev {
  { max.job.cpu 200 }
  { max.instance.cpu 100 }
  { max.job.memory 64GB }
  { max.instance.memory 32GB }
  { max.job.disk 50TB }
  { max.instance.disk 25TB }
  { max.job.network 10Mbps }
  { max.instance.network 5Mbps }
}

Page 20: Policy-based Cloud Storage: Persisting Data in a Multi-Site, Multi-Cloud World

Policy example 3

The following policy block limits the total amount of resources all jobs in the /prod/website namespace may consume.

quota::/prod/website {
  { total.cpu 1000 }
  { total.memory 100GB }
  { total.disk 10TB }
  { total.network 250Gbps }
}

Page 21: Policy-based Cloud Storage: Persisting Data in a Multi-Site, Multi-Cloud World

Policy example 4

The following policy block limits the total memory and disk space that all job instances in the /prod namespace may consume together, and the maximum memory and disk space that each individual job instance may consume.

quota::/prod {
  { max.instance.memory 256MB }
  { total.memory 20GB }
  { max.instance.disk 5GB }
  { total.disk 100GB }
}
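The way the per-instance and namespace-wide limits interact can be sketched as an admission check for one new job instance. The `violations` helper is hypothetical, and converting at 1GB = 1024MB is an assumption.

```python
from typing import List

# Limits from the policy above, converted to MB (1GB = 1024MB assumed).
QUOTA_MB = {
    "max.instance.memory": 256,
    "total.memory": 20 * 1024,
    "max.instance.disk": 5 * 1024,
    "total.disk": 100 * 1024,
}


def violations(mem_mb: int, disk_mb: int,
               used_mem_mb: int, used_disk_mb: int) -> List[str]:
    """Return the names of the quota rules a new instance would break."""
    broken = []
    if mem_mb > QUOTA_MB["max.instance.memory"]:
        broken.append("max.instance.memory")
    if used_mem_mb + mem_mb > QUOTA_MB["total.memory"]:
        broken.append("total.memory")
    if disk_mb > QUOTA_MB["max.instance.disk"]:
        broken.append("max.instance.disk")
    if used_disk_mb + disk_mb > QUOTA_MB["total.disk"]:
        broken.append("total.disk")
    return broken
```

With these numbers, at most 80 instances running at the full 256MB each fit under total.memory (20 × 1024 / 256 = 80), so the per-instance and total limits together bound how many instances the namespace can run.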

Page 22: Policy-based Cloud Storage: Persisting Data in a Multi-Site, Multi-Cloud World

When will we get there?

"WHERE ARE WE GOING?"

"PLANET TEN!"

"WHEN WILL WE GET THERE?"

"REAL SOON!"

-- Buckaroo Banzai

Page 23: Policy-based Cloud Storage: Persisting Data in a Multi-Site, Multi-Cloud World

Thanks for listening!

Earl C. Ruby III

Principal Software Engineer

Apcera

[email protected]

@earlruby

http://earlruby.org