Transcript
Page 1: Cloud and Big Data trends

Sebastien Goasguen, January 29th

@sebgoa

Cloud and Big Data

Page 2: Cloud and Big Data trends

A view on Big Data

Page 3: Cloud and Big Data trends

http://www.economist.com/node/15557443?story_id=15557443

Page 4: Cloud and Big Data trends

SKA

Page 5: Cloud and Big Data trends
Page 6: Cloud and Big Data trends
Page 7: Cloud and Big Data trends
Page 8: Cloud and Big Data trends

How did we get there?

Page 9: Cloud and Big Data trends

A natural evolution

Page 10: Cloud and Big Data trends

New Distributed systems for:

Large scale datasets
• From scientific instruments
• From web app logs

Complex datasets
• Not necessarily large.

Object stores
• S3 clones

Page 11: Cloud and Big Data trends

BigData and map-reduce

• While BigData is often associated with HDFS, Map-Reduce is the algorithm used to parallelize data processing.

• BigData ≠ Map-Reduce ≠ HDFS
• Map-reduce is a way to express embarrassingly parallel work easily.
• You can do Map-Reduce without HDFS.
  • e.g. Basho map-reduce on riakCS
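To make the point that map-reduce is independent of HDFS concrete, here is a minimal in-memory sketch in plain Python. It is not Hadoop's or Basho's API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `map_reduce`) and the sample log lines are assumptions made up for illustration. Each record is mapped independently, which is the embarrassingly parallel part; the shuffle groups values by key, and the reduce step aggregates each group.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map step: emit a (token, 1) pair for every token in one record."""
    return [(token.lower(), 1) for token in record.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: aggregate the values collected for one key."""
    return key, sum(values)


def map_reduce(records):
    """Run map, shuffle and reduce over an in-memory list of records."""
    # Each record is mapped on its own, so this loop could be spread
    # across workers; here it runs sequentially in one process.
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


# Hypothetical web log lines, standing in for a real dataset.
logs = ["GET /index.html", "GET /about.html", "POST /login"]
counts = map_reduce(logs)
```

The input here is a Python list, but it could just as well be objects read from an S3-compatible store or rows from a key-value database; nothing in the pattern depends on a particular file system.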

Page 12: Cloud and Big Data trends

A really quick view on Clouds

Page 13: Cloud and Big Data trends
Page 14: Cloud and Big Data trends
Page 15: Cloud and Big Data trends

Open Source IaaS

Page 16: Cloud and Big Data trends

Today

Page 17: Cloud and Big Data trends

BigData at peak

Page 18: Cloud and Big Data trends

History

2003 – Google File System
2005 – Hadoop
2006 – Hadoop enters ASF incubator (Feb)
2006 – S3 launched
2007 – Paper on Amazon Dynamo
2009 – EMR launched
2013 – CloudStack as an ASF TLP (March)
2013 – Spark/Mesos enters ASF incubator

Page 19: Cloud and Big Data trends

The Apache Software Foundation

Page 20: Cloud and Big Data trends

Apache Software Foundation

Page 21: Cloud and Big Data trends

35 projects in incubation:
• 12 Hadoop related
• ~30% Big Data related
• Spark

117 top level projects:
• ~16 cloud or bigdata +10%
• Deltacloud, Libcloud, Whirr, jclouds
• Hadoop, CouchDB, Cassandra, Mesos
• Bigtop, Accumulo, Lucene, UIMA
• CloudStack

Page 22: Cloud and Big Data trends

Hadoop Ecosystem

+ Up-coming next generation BD systems

Page 23: Cloud and Big Data trends

Big Data and Cloud (Stack)s

Page 24: Cloud and Big Data trends

Clouds and BigData

• Object store + compute IaaS to build EC2+S3 clone

• BigData solutions as storage backends for image catalogue and large scale instance storage.

• BigData solutions as workloads to CloudStack based clouds.

Page 25: Cloud and Big Data trends

EC2, S3 clone
• An open source IaaS with an EC2 wrapper, e.g. OpenNebula
• Deploy an S3 compatible object store separately, e.g. riakCS
• Two independent distributed systems deployed

Cloud = EC2 + S3

Page 26: Cloud and Big Data trends

Big Data as IaaS backend

“Big Data” solutions can be used as secondary storage.

Page 27: Cloud and Big Data trends

Example
• Open source IaaS + EC2 wrapper, e.g. CloudStack
• Deploy S3 compatible object store, e.g. riakCS or Ceph or GlusterFS
• Use S3 as image store
• Your EC2 service is a customer of your S3 service
• Logstash + Elasticsearch for logs/monitoring
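As a rough illustration of "use S3 as image store", the sketch below builds a path-style object URL (`scheme://endpoint/bucket/key`) for an image held in an S3-compatible store. It only constructs the URL; it does not sign or send a request. The endpoint host, bucket name and key are hypothetical placeholders, and `object_url` is a helper invented for this example, not part of CloudStack, riakCS, Ceph or GlusterFS.

```python
from urllib.parse import quote, urlunsplit


def object_url(endpoint_host, bucket, key, scheme="https"):
    """Build a path-style URL for an object in an S3-compatible store.

    endpoint_host is the host[:port] of your own object store
    (an assumption here), not an AWS endpoint.
    """
    # quote() leaves "/" unescaped by default, so nested keys keep
    # their folder-like structure.
    path = "/" + quote(bucket) + "/" + quote(key)
    return urlunsplit((scheme, endpoint_host, path, "", ""))


# Hypothetical values: the compute layer would fetch its VM template
# from the private object store instead of from local disk.
image_url = object_url(
    "s3.example.internal:8080", "images", "templates/centos-6.qcow2"
)
```

The point of the design is that the compute layer addresses the image store like any other S3 client would, so the object store can be swapped or scaled independently of the IaaS.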

Page 28: Cloud and Big Data trends

Even use Bare Metal

Page 29: Cloud and Big Data trends

Big Data as a Workload to the Cloud

Page 30: Cloud and Big Data trends

Mesos, Spark are EC2 native

o ec2_deploy.py
o ec2_deploy.sh
o …

Page 31: Cloud and Big Data trends

Tools

Page 32: Cloud and Big Data trends

“PaaS”

Page 33: Cloud and Big Data trends

Dev Pipeline

Page 34: Cloud and Big Data trends

Conclusions

• Big Data is “catching up”
• Tackle the big three head on:
  • BigData, Cloud and DevOps
• Add a big data backend to your cloud from the start
• Provide Big Data services on your cloud

Page 35: Cloud and Big Data trends

Still behind!

Page 36: Cloud and Big Data trends

Final Thoughts

Who manages my data transfers?

Page 37: Cloud and Big Data trends

Event

ApacheCon + CloudStack Collaboration Conference

Denver, April 7-11.

Cloud and Big Data

Page 38: Cloud and Big Data trends

Get Involved with Apache CloudStack

Web: http://cloudstack.apache.org/

Mailing Lists: cloudstack.apache.org/mailing-lists.html

IRC: irc.freenode.net:6667 #cloudstack #cloudstack-dev

Twitter: @cloudstack

LinkedIn: www.linkedin.com/groups/CloudStack-Users-Group-3144859

If it didn’t happen on the mailing list, it didn’t happen.

