may 2013 hug: building common denominator of hadoop distributions with bigtop

25
Hadoop.next

Upload: yahoo-developer-network

Post on 26-Jan-2015

105 views

Category:

Technology


1 download

DESCRIPTION

Bigtop is stepping up in its role as the foundation of a standard Hadoop-based data analytics stack, essentially bringing most of the commercial offering to the standard footing. 6 out of 7 commercial vendors using Bigtop framework to power their distributions based on ASF Hadoop. Bigtop is also the must have stabilization tool for Hadoop platform where's any downstream application or system developer can make sure that their software would work with the next version of Hadoop. Presenter(s): Dr. Konstantin Boudnik, ASF Hadoop committer, Bigtop PMC; Director of Engineering, WANdisco Roman Shaposhnik, VP, Apache Bigtop, IPMC member at ASF; Software engineer, Cloudera inc.

TRANSCRIPT

Page 1: May 2013 HUG: Building common denominator of Hadoop distributions with Bigtop

Hadoop.next

Page 2: May 2013 HUG: Building common denominator of Hadoop distributions with Bigtop
Page 3: May 2013 HUG: Building common denominator of Hadoop distributions with Bigtop
Page 4: May 2013 HUG: Building common denominator of Hadoop distributions with Bigtop
Page 5: May 2013 HUG: Building common denominator of Hadoop distributions with Bigtop
Page 6: May 2013 HUG: Building common denominator of Hadoop distributions with Bigtop
Page 7: May 2013 HUG: Building common denominator of Hadoop distributions with Bigtop
Page 8: May 2013 HUG: Building common denominator of Hadoop distributions with Bigtop

Who are we?● Hadoop downstream community● Well, specifically:

– Roman Shaposhnik● VP, Apache Bigtop, IPMC member at ASF● Software engineer, Cloudera inc.

– Dr. Konstantin Boudnik,● ASF Hadoop committer, Bigtop PMC,● Director of Engineering, WANdisco

Page 9: May 2013 HUG: Building common denominator of Hadoop distributions with Bigtop

What are we dealing with?● Hadoop 1.x

– stable, but old

● Hadoop 2.0.x– modern, used to be alpha, now stabilizing

● Hadoop 2.1.x– modern, feature-driven

● Hadoop 3.x– perpetual trunk

Page 10: May 2013 HUG: Building common denominator of Hadoop distributions with Bigtop

What are the implications?● YARN's appeal as IaaS● Fragmentation● Repeat of “UNIX vendor wars”● Cutting off vital sources of feedback● Jaded downstream● Confused users● Delayed world domination

Page 11: May 2013 HUG: Building common denominator of Hadoop distributions with Bigtop

What's downstream to do?● mvn help:all-profiles

Profile Id: hadoop_0.20.203 (active)Profile Id: hadoop_1.0Profile Id: hadoop_non_secureProfile Id: hadoop_facebookProfile Id: hadoop_0.23Profile Id: hadoop_yarnProfile Id: hadoop_2.0.0Profile Id: hadoop_2.0.1Profile Id: hadoop_2.0.2Profile Id: hadoop_2.0.3Profile Id: hadoop_trunkProfile Id: hadoop_cdh4.1.2

Page 12: May 2013 HUG: Building common denominator of Hadoop distributions with Bigtop

That active profile?● http://mvnrepository.com/artifact/

org.apache.hbase/hbase/0.94.3

<dependency> <groupId>org.apache.hbase</groupId> <artifactId>hbase</artifactId> <version>0.94.3</version></dependency>

● Try finding Apache Giraph artifacts!

Page 13: May 2013 HUG: Building common denominator of Hadoop distributions with Bigtop

13

We don't have the TCK, but...

Zookeeper

HBase

Pig

Hive

Impala

Giraph

Hama

Hue

Solr

Crunch

Sqoop

Oozie

Whirr

Mahout

Flume

Page 14: May 2013 HUG: Building common denominator of Hadoop distributions with Bigtop

Apache Bigtop

“open-source software related to a system for integration, packaging, deployment and validation of a big data management software distribution based on Apache Hadoop”

Page 15: May 2013 HUG: Building common denominator of Hadoop distributions with Bigtop

15

Remember what Debian did to Linux?

GNU Software Linux kernelLinux kernel

Page 16: May 2013 HUG: Building common denominator of Hadoop distributions with Bigtop

16

Bigtop is trying to do it with Hadoop

Hadoop Ecosystem(Pig, Hive, Mahout)

Linux kernelHadoop(HDFS + MR)

Page 17: May 2013 HUG: Building common denominator of Hadoop distributions with Bigtop

What does Bigtop offer:● Community focused on all of the above● Software for:

– Integration

– Build (make, Maven)

– Packaging (RPM, DEB)

– Deployment (Puppet)

– Testing (iTest)

● A continuous integration Jenkins server

Page 18: May 2013 HUG: Building common denominator of Hadoop distributions with Bigtop

Embrace asynchronous nature ● Don't expect flag days● Don't expect agreement on releases● Do practice Last Known Good Builds

Av1 Bv22

Cv3 Dv4

Av1 Bv2

Cv3 Dv2

........Av1 Bv2

Cv3 Dv4

Bv22

Dv44

Page 19: May 2013 HUG: Building common denominator of Hadoop distributions with Bigtop

Who's on-board?● Cloudera

– CDH4 is 100% based on Bigtop (hadoop v2)

● WANdisco● TrendMicro● Hortonworks, EMC, EBay, Intel (partially)● Canonical

– Ubuntu Server: Hadoop and Bigdata blueprint

● Illumos (early stages of interest)

Page 20: May 2013 HUG: Building common denominator of Hadoop distributions with Bigtop

Who's on-board?● Cloudera

– CDH4 is 100% based on Bigtop (hadoop v2)

● WANdisco● TrendMicro● Hortonworks, EMC, EBay, Intel (partially)● Canonical

– Ubuntu Server: Hadoop and Bigdata blueprint

● Illumos (early stages of interest)

Page 21: May 2013 HUG: Building common denominator of Hadoop distributions with Bigtop

Who's on-board?● Cloudera

– CDH4 is 100% based on Bigtop (hadoop v2)

● WANdisco● TrendMicro● Hortonworks, EMC, EBay, Intel (partially)● Canonical

– Ubuntu Server: Hadoop and Bigdata blueprint

● Illumos (early stages of interest)

Page 22: May 2013 HUG: Building common denominator of Hadoop distributions with Bigtop

Who's on-board?● Cloudera

– CDH4 is 100% based on Bigtop (hadoop v2)

● WANdisco● TrendMicro● Hortonworks, EMC, EBay, Intel (partially)● Canonical

– Ubuntu Server: Hadoop and Bigdata blueprint

● Illumos (early stages of interest)

Page 23: May 2013 HUG: Building common denominator of Hadoop distributions with Bigtop

What's happening● A special release: Bigtop 0.3.0-incubating

– Hadoop 1.0.1

● Last stable release: Bigtop 0.5.0– Hadoop 2.0.2-alpha

● Next stable release: Bigtop 0.6.0– End of Mar 2013 release

– Hadoop 2.0.4-alpha

– Major focus on developers

Page 24: May 2013 HUG: Building common denominator of Hadoop distributions with Bigtop

A special note on 2.0.4-alpha● It really will be 2.0.4.1● First release to use Bigtop as release criteria● A 100% community effort● First non-vendor stabilization effort● A stable base for 18 applications and

counting!

Page 25: May 2013 HUG: Building common denominator of Hadoop distributions with Bigtop

<your idea here>