apache mahout - last.fmstatic.last.fm/.../isabel_drost-introducing_apache_mahout.pdf · agenda...
![Page 1: Apache Mahout - Last.fmstatic.last.fm/.../isabel_drost-introducing_apache_mahout.pdf · Agenda Motivation. Machine learning? Introducing Mahout. How can you help?](https://reader030.vdocuments.mx/reader030/viewer/2022021808/5bf5df1109d3f2941d8bed32/html5/thumbnails/1.jpg)
Apache Mahout
Scaling Machine Learning
Presented by: Isabel Drost
Agenda
● Motivation.
● Machine learning?
● Introducing Mahout.
● How can you help?
Some motivation.
January 3, 2006 by Matt Callow, http://www.flickr.com/photos/blackcustard/81680010
Follow news stories
Automatic topic tracker.
Search through papers.
September 10, 2008 by Alex Barth, http://www.flickr.com/photos/a-barth/2846621384
March 7, 2008 by extranoise, http://www.flickr.com/photos/extranoise/2317950586/
Movie recommendation
Aggregate reviews from IMDB, Twitter, ...
IMDB + movie reviews.
March 22, 2008 by Crystian Cruz, http://www.flickr.com/photos/crystiancruz/2353895708
● Lots and lots of data.
● Structured and unstructured.
Mission
Provide scalable data mining algorithms.
Machine Learning?
Archimedes generates the model:
$$\frac{\text{Density of object}}{\text{Density of fluid}} = \frac{\text{Weight}}{\text{Weight} - \text{Apparent immersed weight}}$$
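As a worked instance of this formula (numbers made up for illustration): an object weighing 10.0 N in air and 9.5 N when fully submerged displaces fluid weighing 0.5 N, so it is 10.0 / 0.5 = 20 times as dense as the fluid. A minimal Python sketch of the same calculation; the function name `relative_density` is our own:

```python
def relative_density(weight: float, apparent_immersed_weight: float) -> float:
    """Density of object / density of fluid, following Archimedes' principle.

    weight: weight of the object measured in air.
    apparent_immersed_weight: weight measured while the object is fully submerged.
    """
    # The weight lost on immersion equals the weight of the displaced fluid.
    buoyant_force = weight - apparent_immersed_weight
    if buoyant_force <= 0:
        raise ValueError("object must be fully immersed and appear lighter in the fluid")
    return weight / buoyant_force


print(relative_density(10.0, 9.5))  # 20.0
print(relative_density(10.0, 6.0))  # 2.5
```

Here the "model" is a closed-form rule a human derived; the rest of the talk is about letting the machine derive such rules from data.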
June 25, 2008 by chase-me, http://www.flickr.com/photos/sasy/2609508999
March 28, 2007 by dullhunk, http://www.flickr.com/photos/dullhunk/437551254
Machine learning generates the model.
Machine learning pipeline
● Gather data (and metadata).
● Identify characteristics.
● Choose the right algorithm.
● Tune the parameters of your algorithm.
● Train on the gathered data.
● Keep the model in sync when nature changes.
January 8, 2008 by Pink Sherbet Photography, http://www.flickr.com/photos/pinksherbet/2177961471/
E-Bay + password:
● Different topic?
● Auction status?
● Phishing spam?
● Requested password?
Figure: one of your mails ("Apache London Hadoop Lucene London ...") represented as a vector of 0/1 features.
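One simple way to turn a mail like this into a vector of 0s and 1s is a binary bag of words: fix a vocabulary and mark each vocabulary word with 1 if it occurs in the mail, 0 otherwise. The sketch below assumes a small, hypothetical vocabulary and a naive lowercase tokenizer; real systems build the vocabulary from the whole mail corpus and may use counts or weights instead of 0/1.

```python
import re


def binary_feature_vector(text, vocabulary):
    """Map a mail to a 0/1 list: 1 if the vocabulary word occurs in the text."""
    # Naive tokenizer (an assumption): lowercase runs of letters, digits, hyphens.
    tokens = set(re.findall(r"[a-z0-9-]+", text.lower()))
    return [1 if word in tokens else 0 for word in vocabulary]


# Hypothetical vocabulary; a real one would come from the whole mail corpus.
vocabulary = ["auction", "apache", "hadoop", "password", "london", "lucene", "phishing"]
mail = "Apache London Hadoop Lucene London"
print(binary_feature_vector(mail, vocabulary))  # [0, 1, 1, 0, 1, 1, 0]
```

Note that "London" appears twice but still maps to a single 1: a binary encoding records presence, not frequency.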
Parameter tuning
● Penalty for mistakes.
● Kernel type for data transformation.
● Tune kernel parameters.
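The slide does not say how these values are chosen. One common, if brute-force, approach is grid search: evaluate every combination on held-out data and keep the best. The sketch assumes a caller-supplied `score` function (hypothetical here; in practice, for example, cross-validated accuracy) and uses SVM-style names `C`, `kernel` and `gamma` as illustrative stand-ins for the penalty, the kernel type and a kernel parameter.

```python
from itertools import product


def grid_search(score, grid):
    """Try every parameter combination in `grid` and return (best_params, best_score).

    score: callable taking the parameters as keyword arguments and returning a
           validation score, higher is better (supplied by the caller).
    grid:  dict mapping each parameter name to a list of candidate values.
    """
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        s = score(**params)
        if s > best_score:
            best_params, best_score = params, s
    return best_params, best_score


grid = {
    "C": [0.1, 1.0, 10.0],        # penalty for mistakes
    "kernel": ["linear", "rbf"],  # kernel type for data transformation
    "gamma": [0.01, 0.1, 1.0],    # kernel parameter (only meaningful for rbf)
}


def toy_score(C, kernel, gamma):
    # Made-up stand-in for a real validation score, for illustration only.
    return -abs(C - 1.0) - abs(gamma - 0.1) + (0.5 if kernel == "rbf" else 0.0)


print(grid_search(toy_score, grid))
# ({'C': 1.0, 'kernel': 'rbf', 'gamma': 0.1}, 0.5)
```

Every combination is scored independently, so grid search is easy to parallelize, but its cost multiplies with each parameter added to the grid.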
Training
● Build model from data.
Nature changes?
● Spammers adapt to spam filters.
● Users write mails in different styles.
● Expand to new languages.
● ...
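One way to keep a model in sync is an online learner that updates itself with each newly labelled example instead of being retrained from scratch. The perceptron (listed under Mahout's classifiers as Winnow/Perceptron) is such a learner. Below is a minimal plain-Python sketch, not Mahout's implementation, on hypothetical binary feature vectors labelled +1 (spam) and -1 (ham).

```python
def predict(weights, bias, x):
    """Return +1 (spam) or -1 (ham) for a binary feature vector x."""
    activation = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if activation > 0 else -1


def update(weights, bias, x, label, learning_rate=1.0):
    """Perceptron rule: adjust the model only when it misclassifies (x, label)."""
    if predict(weights, bias, x) != label:
        weights = [w + learning_rate * label * xi for w, xi in zip(weights, x)]
        bias += learning_rate * label
    return weights, bias


# Start from an empty model and feed labelled mails as they arrive.
weights, bias = [0.0, 0.0, 0.0], 0.0
stream = [([1, 0, 1], 1), ([0, 1, 0], -1), ([1, 1, 1], 1), ([0, 1, 1], -1)]
for _ in range(5):  # a few passes here; a live filter keeps consuming new mail
    for x, label in stream:
        weights, bias = update(weights, bias, x, label)

print(weights, bias)                                   # [1.0, -1.0, 1.0] 0.0
print([predict(weights, bias, x) for x, _ in stream])  # [1, -1, 1, -1]
```

When spammers change tactics, newly labelled mails that the current weights get wrong trigger further updates, so the model drifts along with the data.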
Introducing Mahout
Classification
● Categorize data.
● Examples:
  ● Identify spam mails.
  ● Classify movies as “Action”, “Comedy”, ...
Classification
● Naive Bayes.
● Complementary Naive Bayes.
● Winnow/Perceptron.
● Others upcoming.
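To make the first entry concrete, here is a minimal multinomial Naive Bayes spam filter with add-one (Laplace) smoothing, written in plain Python for illustration. It is not Mahout's implementation or API, and the four training mails are made up.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """docs: list of (list_of_words, label). Returns the counts used for prediction."""
    class_docs = Counter()              # number of training mails per label
    word_counts = defaultdict(Counter)  # per-label word frequencies
    vocabulary = set()
    for words, label in docs:
        class_docs[label] += 1
        word_counts[label].update(words)
        vocabulary.update(words)
    return class_docs, word_counts, vocabulary


def classify(model, words):
    """Pick the label maximising log P(label) + sum of log P(word | label)."""
    class_docs, word_counts, vocabulary = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, float("-inf")
    for label in class_docs:
        total_words = sum(word_counts[label].values())
        score = math.log(class_docs[label] / total_docs)
        for w in words:
            if w not in vocabulary:
                continue  # ignore words never seen during training
            # Add-one smoothing keeps unseen (word, label) pairs from zeroing the product.
            score += math.log((word_counts[label][w] + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


docs = [
    ("cheap password reset click now".split(), "spam"),
    ("verify your ebay password now".split(), "spam"),
    ("hadoop user meeting in berlin".split(), "ham"),
    ("mahout release notes for lucene users".split(), "ham"),
]
model = train(docs)
print(classify(model, "reset your password".split()))   # spam
print(classify(model, "berlin hadoop meeting".split()))  # ham
```

Working in log space avoids numeric underflow when a mail contains many words.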
Discovering groups of data
● Group data by similarity.
● Examples:
  ● News articles by topic.
  ● Developers by favorite modules.
Discovering groups of data
● Canopy.
● K-Means.
● Dirichlet-based.
● PLSI.
● Others upcoming.
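A minimal single-machine k-means sketch in plain Python (not Mahout's code) shows the two alternating steps. Because the assignment step treats each point independently and the update step is a per-cluster average, the algorithm splits naturally into a map phase and a reduce phase, which is one reason it suits MapReduce-style processing on Hadoop. The initialization (k random input points) and the toy data are assumptions.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means on tuples of coordinates; returns (centroids, assignment)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise with k distinct input points
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment ("map") step: each point picks its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update ("reduce") step: move each centroid to the mean of its points.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                new_centroids.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, assignment


points = [(1.0, 1.0), (1.5, 2.0), (1.2, 1.1), (8.0, 8.0), (8.5, 9.0), (9.0, 8.2)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

The three points near (1, 1) end up in one cluster and the three near (8.5, 8.5) in the other; which cluster gets index 0 depends on the random initialization.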
Recommendation mining
● Recommend items.
● Examples:
  ● Find books a user may like.
  ● Identify movies a user likes.
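A minimal user-based collaborative filtering sketch: measure how similar other users' ratings are to the target user's (cosine similarity over co-rated items), then score each item the target user has not rated by the similarity-weighted average of those users' ratings. The ratings, item names and the choice of cosine similarity are illustrative assumptions, not Mahout's recommender API.

```python
import math


def cosine(a, b):
    """Cosine similarity between two users' rating dicts, over co-rated items."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)


def recommend(ratings, user, top_n=2):
    """Rank items `user` has not rated by similarity-weighted average rating."""
    scores, weights = {}, {}
    for other, their_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], their_ratings)
        if sim <= 0:
            continue  # ignore users with nothing in common
        for item, rating in their_ratings.items():
            if item in ratings[user]:
                continue  # only recommend items the user has not rated yet
            scores[item] = scores.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    predicted = {item: scores[item] / weights[item] for item in scores}
    return sorted(predicted, key=predicted.get, reverse=True)[:top_n]


ratings = {
    "alice": {"Matrix": 5, "Alien": 4, "Amelie": 1},
    "bob": {"Matrix": 5, "Alien": 5, "Terminator": 5, "Amelie": 1, "Chocolat": 1},
    "carol": {"Matrix": 1, "Amelie": 5, "Chocolat": 4},
}
print(recommend(ratings, "alice"))  # ['Terminator', 'Chocolat']
```

Bob's tastes are much closer to Alice's than Carol's are, so his low rating for "Chocolat" outweighs Carol's high one.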
Upcoming
● More algorithms.
● More examples.
What Mahout can do for you
“Why should I participate?”
Jumpstart your project with proven code.
January 8, 2008 by dreizehn28, http://www.flickr.com/photos/1328/2176949559
Discuss with researchers and engineers.
November 16, 2005 by [phil h], http://www.flickr.com/photos/hi-phi/64055296
Become a community member.
October 22, 2008 by e_calamar, http://www.flickr.com/photos/e_calamar/2964991182/
http://.../pub/mirrors/apache/lucene/mahout/0.1/
Thank you to all those making this possible.
● We need You:
  ● Enthusiasm.
  ● Mathematical knowledge.
  ● Proficiency in Hadoop.
  ● Interest in understanding data.
July 9, 2006 by trackrecord, http://www.flickr.com/photos/trackrecord/185514449
Some advertising
Berlin, June* at 5 p.m.
newthinking store Berlin
Tucholskystr. 48
Hadoop** User/Developer Meeting Germany
* The exact date is set by the speakers, that is, by you!
** Lucene, Tika, Solr, UIMA, Mahout, katta, ... people welcome.