Tools and Technologies for Large Scale Data Mining

Post on 19-May-2015


Jaganadh G

Project Lead, NLP R&D

365Media Pvt. Ltd.

jaganadhg@gmail.com

DRDO Sponsored National Level Seminar on Challenging Issues on Data Mining Semantic Web, Sri Krishna College of Engineering and Technology, Coimbatore

27th Jan 2012

Jaganadh G, Tools and Technologies for Large Scale Data Mining

About me

Software engineer specializing in text analytics research and development

When free, teaches Python, speaks about FOSS, and blogs at http://jaganadhg.in

Working as Project Lead (NLP) at 365Media Pvt. Ltd., Coimbatore

I am a computational linguist, linguist and Indologist, and book reviewer

Master's degree holder in Sanskrit from University of Kerala


Machine Learning

Machine learning is a subfield of artificial intelligence (AI) concerned with algorithms that allow computers to learn.

This talk does not aim to give an introduction to Machine Learning

Don't expect any mathy equations here


Machine Learning and Our Life

Do you think that Machine Learning has any impact on our lives?

Yes

In our day-to-day life we may use many Machine Learning powered tools

E-mail spam filtering, product recommendations, etc.

Fraud detection


Examples


Tool for building Machine Learning powered products/services

Apache Mahout

Apache Mahout is a scalable machine learning library that supports large data sets. Apache Mahout’s goal is to build scalable machine learning libraries.

Commercially friendly licence

Well documented

Healthy community

Targeted to developers


Algorithms in Apache Mahout

Collaborative Filtering

User and Item based recommenders

K-Means, Fuzzy K-Means clustering

Mean Shift clustering

Dirichlet process clustering

Latent Dirichlet Allocation

Singular value decomposition

Parallel Frequent Pattern mining

Complementary Naive Bayes classifier

Random forest decision tree based classifier
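
To make the first items in the list above concrete, here is a minimal sketch of a user-based recommender in plain Python. It is not Mahout's API (Mahout is a Java library) and not the code from the demo: the `RATINGS` data, the function names, and the choice of cosine similarity over co-rated items are all assumptions made for illustration.

```python
from math import sqrt

# Hypothetical toy ratings (user -> {item: rating}); not data from the talk.
RATINGS = {
    "alice": {"A": 5.0, "B": 3.0, "C": 4.0},
    "bob": {"A": 5.0, "B": 3.0, "C": 4.0, "D": 5.0},
    "carol": {"A": 1.0, "B": 5.0, "D": 1.0},
}


def cosine_similarity(r1, r2):
    """Cosine similarity over the items both users rated (0.0 if none)."""
    common = set(r1) & set(r2)
    if not common:
        return 0.0
    dot = sum(r1[i] * r2[i] for i in common)
    norm1 = sqrt(sum(r1[i] ** 2 for i in common))
    norm2 = sqrt(sum(r2[i] ** 2 for i in common))
    if norm1 == 0 or norm2 == 0:
        return 0.0
    return dot / (norm1 * norm2)


def recommend(user, ratings, top_n=3):
    """Rank items the user has not rated by the similarity-weighted
    average of the ratings given by other users."""
    seen = ratings[user]
    totals, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(seen, other_ratings)
        if sim <= 0:
            continue
        for item, rating in other_ratings.items():
            if item in seen:
                continue
            totals[item] = totals.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    scored = [(item, totals[item] / weights[item]) for item in totals]
    # Highest predicted rating first; item name breaks ties.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]


if __name__ == "__main__":
    print(recommend("alice", RATINGS))
```

In this toy data the only item alice has not rated is D, and its predicted rating is pulled toward bob's rating because bob's tastes match alice's more closely than carol's do. An item-based recommender would apply the same idea with the roles of users and items swapped.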


Demo

Building recommendation engines with Mahout

Document Classification with Mahout

Some Python stuff on Machine Learning
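
As a rough idea of what document classification involves, the following is a minimal sketch of a standard multinomial Naive Bayes text classifier in plain Python. It is not the demo code and not Mahout's Complementary Naive Bayes, which is a different variant. The training sentences are invented for illustration; only the two label names are borrowed from the Twenty Newsgroups categories.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy training set of (label, text) pairs; not the real data set.
TRAIN = [
    ("sci.space", "nasa launched the shuttle into orbit"),
    ("sci.space", "the orbit of the moon and the shuttle mission"),
    ("rec.autos", "the car engine needs new oil"),
    ("rec.autos", "my car has a fast engine and good brakes"),
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def train(labelled_docs):
    """Count documents per label and word occurrences per label."""
    doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for label, text in labelled_docs:
        doc_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return {"doc_counts": doc_counts, "word_counts": word_counts, "vocab": vocab}


def classify(model, text):
    """Return the label with the highest log-probability for the text."""
    total_docs = sum(model["doc_counts"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, n_docs in model["doc_counts"].items():
        counts = model["word_counts"][label]
        total_words = sum(counts.values())
        score = math.log(n_docs / total_docs)  # log prior
        for tok in tokenize(text):
            if tok not in model["vocab"]:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids log(0) for unseen pairs.
            score += math.log((counts[tok] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAIN)

if __name__ == "__main__":
    print(classify(model, "shuttle orbit mission"))
    print(classify(model, "car engine oil"))
```

Working in log space keeps the product of many small probabilities from underflowing, which matters once documents are longer than a few words.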


Reference

Mahout in Action - Book by Sean Owen and Robin Anil, published by Manning Publications.

Taming Text - By Grant Ingersoll and Tom Morton, published by Manning Publications.

Introducing Apache Mahout - Grant Ingersoll - Intro to Apache Mahout focused on clustering, classification and collaborative filtering. https://www.ibm.com/developerworks/java/library/j-mahout/index.html

Programming Collective Intelligence: Building Smart Web 2.0 Applications http://www.amazon.com/Programming-Collective-Intelligence-Building-Applications/dp/0596529325


Useful Resources

Apache Mahout Site http://mahout.apache.org/

Apache Mahout Mailing List user@mahout.apache.org

The code which I used for the Mahout demo is available at http://bitbucket.org/jaganadhg/blog/src/tip/bck9/java/

Twenty Newsgroups data set http://people.csail.mit.edu/jrennie/20Newsgroups/20news-bydate.tar.gz


Questions?


Acknowledgments

Thanks to:

Manning Publications for a review copy of the book "Mahout in Action"

Apache Mahout mailing list members

Ted Dunning and Robin Anil for suggestions

Sreejith S and Biju B for Java help

@chelakkandupoda for review and criticism

Mukundhanchari, R&D Director, 365Media Pvt. Ltd., for support and encouragement


Finally
