pycon 2012 scikit-learn

27
Learning from the Past: with Scikit-Learn ANOOP THOMAS MATHEW Profoundis Labs Pvt. Ltd.

Upload: anoop-thomas-mathew

Post on 27-Jan-2015

135 views

Category:

Technology


2 download

DESCRIPTION

Presentation for PyCon India 2012

TRANSCRIPT

Page 1: Pycon 2012 Scikit-Learn

Learning from the Past: with Scikit-Learn

ANOOP THOMAS MATHEWProfoundis Labs Pvt. Ltd.

Page 2: Pycon 2012 Scikit-Learn
Page 3: Pycon 2012 Scikit-Learn

●Basics of Machine Learning

● Introduction some common techniques

●Let you know scikit-learn exists

●Some inspiration on using machine learning in daily life scenarios and live projects.

Agenda

Page 4: Pycon 2012 Scikit-Learn

How to draw a snake?How to draw a snake?

Page 5: Pycon 2012 Scikit-Learn

How to draw a snake?How to draw a snake?

This is WEIRD!

Page 6: Pycon 2012 Scikit-Learn

A lot of Data!

What to do???

Introduction

Page 7: Pycon 2012 Scikit-Learn

What is

Machine Learning (Data Mining)?

(in plain english)

Introduction

Page 8: Pycon 2012 Scikit-Learn

"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E"

Tom M. Mitchell

Machine Learning

Page 9: Pycon 2012 Scikit-Learn

●Supervised Learning - model.fit(X, y)●Unsupervised Learning - model.fit(X)

Machine Learning

Page 10: Pycon 2012 Scikit-Learn

Supervised Learning

Page 11: Pycon 2012 Scikit-Learn

from sklearn.linear_model import Ridge as RidgeRegressionfrom sklearn import datasetsfrom matplotlib import pyplot as plt

boston = datasets.load_boston()X = boston.datay = boston.targetclf = RidgeRegression()clf.fit(X, y)clf.predict(X)

For example ...

Page 12: Pycon 2012 Scikit-Learn

Unsupervised Learning

Page 13: Pycon 2012 Scikit-Learn

from sklearn.cluster import KMeansfrom numpy.random import RandomStaterng = RandomState(42)k_means = KMeans(3, random_state=rng)k_means.fit(X)

For example ...

Page 14: Pycon 2012 Scikit-Learn

ClusteringClassificationRegression

What can Scikit-learn do?

Page 15: Pycon 2012 Scikit-Learn

• Model the collection of parameters you are trying to fit

• Data what you are using to fit the model

• Target the value you are trying to predict with your model

• Features attributes of your data that will be used in prediction

• Methods algorithms that will use your data to fit a model

Terminology

Page 16: Pycon 2012 Scikit-Learn

● Understand the task. See how to measure the performance.

● Choose the source of training experience.

● Decide what will be input and output.

● Choose a set of models to the output function.

● Choose a learning algorithm.

Steps for Analysis

Page 17: Pycon 2012 Scikit-Learn

● Understand the task. See how to measure the performance. Find the right question to ask.

● Choose the source of training experience.● Keep training and testing dataset separate. Beware of overfitting !

● Decide what will be input and expected output.

● Choose a set of models to approximate the output function. (use dimensinality reduction)

● Choose a learning algorithm. Try different ones ;)

Steps for Analysis

Page 18: Pycon 2012 Scikit-Learn

Principal Component Analysis

Some Common AlgorithmsSome Common Algorithms

Page 19: Pycon 2012 Scikit-Learn

● Support Vector Machine

Some Common AlgorithmsSome Common Algorithms

Page 20: Pycon 2012 Scikit-Learn

● Nearest Neighbour Classifier

Some Common AlgorithmsSome Common Algorithms

Page 21: Pycon 2012 Scikit-Learn

● Decision Tree Learning

Some Common AlgorithmsSome Common Algorithms

Page 22: Pycon 2012 Scikit-Learn

k-means clustering

Some Common AlgorithmsSome Common Algorithms

Page 23: Pycon 2012 Scikit-Learn

DB SCAN Clustering

Some Common AlgorithmsSome Common Algorithms

Page 24: Pycon 2012 Scikit-Learn

●Log file analysis●Outlier dectection●Fraud Dectection●Forcasting●User patterns

Some Example UsecasesSome Example Usecases

Page 25: Pycon 2012 Scikit-Learn

●nltk is a good(better) for text processing

●scikit-learn is for medium size problems

● for humongous projects, think of mahout●matplotlib can be used for visualization

● visualize it in browser using d3.js● have a look at pandas for numerical analysis

A few commentsA few comments

Page 26: Pycon 2012 Scikit-Learn

●This is just the tip of an iceberg.●Scikit-learn is really cool to hack with.●A lot of examples(http://scikit-learn.org/stable/auto_examples/index.html)

Conclusion Conclusion

Page 27: Pycon 2012 Scikit-Learn

pip install scikit-learn

Its all in the internet.

Happy Hacking!

Final words Final words