Cloud and Big Data trends


Post on 01-Nov-2014







A look at cloud and big data trends and history. While Big Data arrived first on the scene (Google File System, Hadoop, Dynamo), Cloud was first in the hype cycle, as Google Trends shows clearly. Amazon AWS, however, has already deployed analytics services on its cloud, while open source IaaS solutions are still struggling to deliver an EC2 clone. Cloud and Big Data have three points in common: 1) use an EC2 clone and an S3 clone (riakCS, GlusterFS, etc.) to build a cloud; 2) use a big data solution as a backend to your cloud to provide EBS or a large-scale image catalogue; 3) deploy big data solutions on your cloud with tools like Apache Whirr, Pallet, and newer DevOps toolchains with Vagrant and co.


<ul>
<li> 1. Cloud and Big Data. Sebastien Goasguen, January 29th, @sebgoa </li>
<li> 2. A view on Big Data </li>
<li> 3. </li>
<li> 4. SKA </li>
<li> 5. How did we get there? </li>
<li> 6. A natural evolution </li>
<li> 7. New distributed systems for: large-scale datasets, from scientific instruments and from web app logs; complex datasets, not necessarily large. Object stores: S3 clones. </li>
<li> 8. BigData and map-reduce. While BigData is often associated with HDFS, Map-Reduce is the algorithm used to parallelize data processing. BigData, Map-Reduce, HDFS. Map-reduce is a way to express embarrassingly parallel work easily. You can do Map-Reduce without HDFS, e.g. Basho map-reduce on riakCS. </li>
<li> 9. A really quick view on Clouds </li>
<li> 10. Open Source IaaS </li>
<li> 11. Today </li>
<li> 12. BigData at peak </li>
<li> 13. History: 2003 Google File System; 2005 Hadoop; 2006 Hadoop enters ASF incubator (Feb); 2006 S3 launched; 2007 Paper on Amazon Dynamo; 2009 EMR launched; 2013 CloudStack as an ASF TLP (March); 2013 Spark/Mesos enters ASF incubator. </li>
<li> 14. The Apache Software Foundation </li>
<li> 15. Apache Software Foundation </li>
<li> 16. 35 projects in incubation: 12 Hadoop related; ~30% Big Data related; Spark. 117 top level projects: ~16 cloud or bigdata, +10%: Deltacloud, Libcloud, Whirr, jclouds, Hadoop, CouchDB, Cassandra, Mesos, Bigtop, Accumulo, Lucene, UIMA, CloudStack. </li>
<li> 17. Hadoop Ecosystem + upcoming next generation BD systems </li>
<li> 18. Big Data and Cloud (Stack)s </li>
<li> 19. Clouds and BigData: Object store + compute IaaS to build an EC2+S3 clone. BigData solutions as storage backends for image catalogue and large-scale instance storage. BigData solutions as workloads on CloudStack-based clouds. </li>
<li> 20. EC2, S3 clone: An open source IaaS with an EC2 wrapper, e.g. OpenNebula. Deploy an S3-compatible object store separately, e.g. riakCS. Two independent distributed systems deployed. Cloud = EC2 + S3. </li>
<li> 21. Big Data as IaaS backend: Big Data solutions can be used as secondary storage. </li>
<li> 22. Example: Open source IaaS + EC2 wrapper, e.g. CloudStack. Deploy an S3-compatible object store, e.g. riakCS, Ceph, or GlusterFS. Use S3 as image store. Your EC2 service is a customer of your S3 service. Logstash + Elasticsearch for logs/monitoring. </li>
<li> 23. Even use Bare Metal </li>
<li> 24. Big Data as a Workload to the Cloud </li>
<li> 25. Mesos, Spark are EC2 native </li>
<li> 26. Tools </li>
<li> 27. PaaS </li>
<li> 28. Dev Pipeline </li>
<li> 29. Conclusions: Big Data is catching up. Tackle the big three head on: BigData, Cloud and DevOps. Add a big data backend to your cloud from the start. Provide Big Data services on your cloud. </li>
<li> 30. Still behind! </li>
<li> 31. Final Thoughts: Who manages my data transfers? </li>
<li> 32. Event: ApacheCON + CloudStack Collaboration Conference, Denver, April 7-11th. Cloud and Big Data </li>
<li> 33. Get Involved with Apache CloudStack. Web: Mailing Lists: IRC: 6667 #cloudstack #cloudstack-dev Twitter: @cloudstack LinkedIn: If it didn't happen on the mailing list, it didn't happen. </li>
</ul>
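
Slide 8 describes map-reduce as a way to express embarrassingly parallel work, and notes that it does not require HDFS. The following is a minimal Python sketch of that idea, not taken from the slides: a word count where the map step runs independently on each chunk and the reduce step merges the partial results. The names `map_chunk`, `merge_counts`, and `word_count` are hypothetical, the in-memory strings stand in for log files or stored objects, and a local thread pool stands in for a cluster of workers.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_chunk(chunk):
    """Map step: count words in one chunk, independently of all other chunks."""
    return Counter(chunk.lower().split())


def merge_counts(left, right):
    """Reduce step: combine two partial counts into one."""
    return left + right


def word_count(chunks, workers=4):
    """Run the map step concurrently over the chunks, then fold the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce(merge_counts, partials, Counter())


# Hypothetical input: in-memory strings standing in for log files or stored objects.
chunks = [
    "cloud and big data",
    "big data on the cloud",
    "cloud cloud cloud",
]
totals = word_count(chunks)
print(totals.most_common(3))
```

Because each call to `map_chunk` depends only on its own chunk, the chunks could equally come from a local disk, an object store, or any other source; the storage layer and the processing pattern are independent choices.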