Apache Bigtop Working Group
Cluster stuff
Cloud computing
Bigtop Administration
• Make sure you are signed up on the bigtop-dev mailing list. Lots of info which will never get repeated if you miss it
• Bigtop-user, bigtop-dev
Bigtop Administration
• Sign up for jira
Bigtop Administration
– Registration, join Biocurious. Pays for the space; nobody takes a cut of this
– Free drinks
– Registration = AWS credits. Cancelling IntelliJ. Expires end of April.
– [email protected]
Newbie Slide
• Structure:
– Do labs
• Lab 1: Modified to take 1-2 weeks. Update the wiki with your findings
• Lab 2: Build Bigtop 0.3.0
• Can start projects here, do Jira tickets
• Lab 3: Map reduce program
• Lab 4: Run the unit tests under the component downloads
• Lab 5: Run the integration tests
• Lab 6: Puppet, deploy and run
• Lab 7: Port a module
– Labs are changing; not a class. Requires time commitment
– Demo, doesn’t need to be working; for your benefit not ours
Lab 1
• Install bigtop. Web search for apache bigtop, go to wiki link http://incubator.apache.org/bigtop/
• https://cwiki.apache.org/confluence/display/BIGTOP/Index
• https://cwiki.apache.org/confluence/display/BIGTOP/How+to+install+Hadoop+distribution+from+Bigtop
Lab 1
• Install bigtop, run all the components, Hive/Hbase/Pig/Hadoop/Mahout/Oozie
• There are bugs, document them
• Add the sample programs in quickstart to the wiki. Not all are included yet
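A quick way to see which components from the Lab 1 list actually landed on the machine is to check for their command-line clients. This is a minimal sketch; the command names are assumptions about how each package exposes its client, so adjust the list if your install differs.

```shell
#!/bin/sh
# Report which Bigtop component CLIs are on PATH.
# Assumption: each component installs a client command with this name.
check_components() {
    for cmd in hadoop hive hbase pig mahout oozie; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "found:   $cmd"
        else
            echo "missing: $cmd"
        fi
    done
}

check_components
```

Anything reported as missing is a candidate for a wiki note or a Jira ticket.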
Lab 1
• Update the wiki
• Sqoop open (User group meeting next week)
• Flume/Flume NG (open/nothing)
• Zookeeper (open/nothing)
Hadoop Components
• Old: Don’t stop at running Pi as test of HDFS
• Still missing: Run Terasort in Hadoop, need cluster
• https://cwiki.apache.org/confluence/display/BIGTOP/How+to+install+Hadoop+distribution+from+Bigtop
• Whirr may need patch depending on where you run it from
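Going from Pi to Terasort could look roughly like the sketch below. The examples jar location and the HDFS scratch directory are assumptions, not values from the slides. The script defaults to a dry run that only prints the commands; set DRY_RUN=0 to execute them against a real cluster.

```shell
#!/bin/sh
# Assumption: location of the Hadoop examples jar; override via EXAMPLES_JAR.
EXAMPLES_JAR="${EXAMPLES_JAR:-/usr/lib/hadoop/hadoop-examples.jar}"
# DRY_RUN=1 (default) prints each command; DRY_RUN=0 runs it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

hadoop_smoke_test() {
    # Hypothetical HDFS scratch directory; pass another as the first argument.
    work="${1:-/tmp/terasort-demo}"
    # Pi: 10 maps, 1000 samples per map.
    run hadoop jar "$EXAMPLES_JAR" pi 10 1000 &&
    # TeraGen: 100000 rows of 100 bytes each (about 10 MB).
    run hadoop jar "$EXAMPLES_JAR" teragen 100000 "$work/in" &&
    run hadoop jar "$EXAMPLES_JAR" terasort "$work/in" "$work/out" &&
    # TeraValidate checks that the sorted output is globally ordered.
    run hadoop jar "$EXAMPLES_JAR" teravalidate "$work/out" "$work/report"
}

hadoop_smoke_test
```

The row count here is deliberately tiny; a meaningful Terasort run needs a real cluster and far more data.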
Mahout
• Don’t run jar like in Hadoop
• Scripts handle downloading and clustering, email demo, etc. Under /examples/bin.
• Bigtop puts example/bin under /usr/share/doc/mahout. Is this correct? Not documentation
• Add documentation to wiki
• Ticket filed
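To see which example scripts the package actually shipped, something like the following may help. The default directory is an assumption pieced together from the slide (examples/bin under /usr/share/doc/mahout); pass the real path as the first argument if it differs.

```shell
#!/bin/sh
# List shell scripts in a Mahout examples directory.
# Assumption: default path below; override with the first argument.
list_mahout_examples() {
    dir="${1:-/usr/share/doc/mahout/examples/bin}"
    if [ ! -d "$dir" ]; then
        echo "no such directory: $dir" >&2
        return 1
    fi
    find "$dir" -maxdepth 1 -type f -name '*.sh' | sort
}

list_mahout_examples || true
```

The output is a reasonable starting point for the wiki page listing the Mahout samples.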
Oozie
• Oozie runs, forget the error message, set to highest version
Oozie
Flume/Flume NG
• New patch checkin for Flume NG
• Testing
Whirr
• sudo apt-get install whirr
• Run as: whirr launch-cluster --config /usr/lib/whirr/recipes/mahout-ec2.properties
• If successful will see directory under ~/.whirr
• whirr.log
• mvn clean install
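Before launching, it can save time to check the inputs Whirr needs. The sketch below is a hypothetical pre-flight helper, not part of Whirr. It assumes the recipe reads AWS credentials from the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables; check your recipe file to confirm.

```shell
#!/bin/sh
# Pre-flight check before `whirr launch-cluster`.
# Assumption: the recipe takes credentials from the two AWS_* variables below.
whirr_preflight() {
    recipe="${1:-}"
    if [ -z "$recipe" ] || [ ! -f "$recipe" ]; then
        echo "recipe not found: $recipe" >&2
        return 1
    fi
    if [ -z "${AWS_ACCESS_KEY_ID:-}" ] || [ -z "${AWS_SECRET_ACCESS_KEY:-}" ]; then
        echo "AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY not set" >&2
        return 1
    fi
    # Print the launch command rather than running it.
    echo "whirr launch-cluster --config $recipe"
}

# After a successful launch, cluster state should appear under ~/.whirr.
whirr_state_exists() {
    [ -d "${HOME}/.whirr" ]
}
```

Call whirr_preflight with the recipe path from the slide, run the printed command, then use whirr_state_exists to confirm the directory was created. When finished, whirr destroy-cluster --config with the same recipe shuts the instances down so the AWS credits are not burned.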
Puppet
• sudo apt-get install puppet facter: fails
Ticket Questions/Demo
• Bigtop install should include stable for Ubuntu? Diff between stable and bigtop-0.3.0-incubating. There used to be a diff.
• Monitoring, metrics.properties -> metrics2
• Ganglia or JMX? All components w/daemon
• Bruno has Ganglia recipes to monitor status of cluster. Hadoop monitoring: performance and functionality. Hooked up to Kerberos; commercial version is Cloudera Manager. Networking, I/O, block sizes, swap space, disk space.
• Stable vs. incubating?
• Anwar: LogMining (M/R, clickstream and FE log data, exception on day to day basis);
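On the metrics.properties to metrics2 question above: if Ganglia is the choice, a metrics2 configuration might look like the fragment this sketch writes out. The helper name, the output path, and the Ganglia host are placeholders and not values from the slides; the sink class is the Ganglia 3.1 sink that ships with Hadoop's metrics2 framework.

```shell
#!/bin/sh
# Write a sample hadoop-metrics2.properties fragment with a Ganglia sink.
# Assumptions: output path ($1) and Ganglia host:port ($2) are placeholders.
write_metrics2_sample() {
    out="$1"
    ganglia="${2:-ganglia.example.com:8649}"
    cat > "$out" <<EOF
# Ganglia 3.1+ sink for the metrics2 framework
*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
*.sink.ganglia.period=10
namenode.sink.ganglia.servers=$ganglia
datanode.sink.ganglia.servers=$ganglia
EOF
}
```

For example, write_metrics2_sample /tmp/hadoop-metrics2.properties gmond-host:8649 produces a file to compare against the one under the Hadoop conf directory before changing anything there.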