Page 1: A Beginner's Guide to Machine Learning with Scikit-Learn

A Beginner’s Guide to Machine Learning with Scikit-Learn

Sarah Guido

PyTennessee 2014

All about me

• Grad student at the University of Michigan
• Data analyst for HathiTrust
• Organizer of the Ann Arbor PyLadies chapter

My talk

• Machine learning and scikit-learn
• Supervised and unsupervised learning
• Preprocessing, validation and testing, and strategies for machine learning

What is machine learning?

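• Application of algorithms that learn from examples
• Representation and generalization: how instances are described as features, and how well what is learned carries over to new, unseen instances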

Why should we care?

• Useful in everyday life
  • Email spam, handwriting analysis, stock market analysis, Netflix recommendations
• Especially useful in data analysis
  • Feature extraction, linear regression, classification, clustering

Machine Learning Vocab

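• Instance: one example (a row of data)
• Feature: a measured attribute of an instance (a column)
• Class: the label to predict for an instance
• Categorical: values drawn from a fixed set of categories
  • Nominal: categories with no natural order (e.g., color)
  • Ordinal: categories with a natural order (e.g., low < medium < high)
• Continuous: numeric values on a continuous scale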

Machine Learning Vocab

[Figure: example data table annotated with the labels "Feature", "Class", and "Instance"]
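
In scikit-learn, these terms map onto two arrays: a feature matrix with one row per instance and one column per feature, and a target array holding one class label per instance. A minimal sketch, using the built-in Iris dataset as an illustrative choice (it is not the table pictured on the slide):

```python
from sklearn.datasets import load_iris

iris = load_iris()

# Feature matrix: rows are instances, columns are features.
print(iris.data.shape)      # (150, 4): 150 instances, 4 features
print(iris.feature_names)   # names of the four measured features

# Target array: one class label per instance, stored as integer codes.
print(iris.target.shape)    # (150,)
print(iris.target_names)    # the class names behind codes 0, 1, 2
```

Iris's four measurements are continuous features; its three species are nominal classes.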

Scikit-Learn

• Machine learning module
• Open-source
• Built-in datasets
• Good resources for learning

Scikit-Learn

• model = EstimatorObject()
• model.fit(dataset.data, dataset.target)
  • dataset.data = the feature matrix (one row per instance)
  • dataset.target = the labels (one per instance)
• model.predict(dataset.data)
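
A runnable version of this pattern, as a sketch assuming a current scikit-learn release. `KNeighborsClassifier` stands in for the slide's placeholder `EstimatorObject`, and Iris is an illustrative dataset choice:

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

dataset = load_iris()

# 1. Create an estimator object (the placeholder EstimatorObject on the slide).
model = KNeighborsClassifier(n_neighbors=5)

# 2. Fit it on the feature matrix and the labels.
model.fit(dataset.data, dataset.target)

# 3. Predict labels for instances (here, the first five rows of the data).
predictions = model.predict(dataset.data[:5])
print(predictions)
```

Swapping in a different estimator (for example `GaussianNB` or `DecisionTreeClassifier`) changes only the constructor line; `fit` and `predict` stay the same.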

Scikit-Learn

• Supervised
• Unsupervised
• Semi-supervised
• Neural networks
• …and many more!

Supervised learning

• Labeled data
• You know what you’re looking for
• Classification: predict categorical labels
• Regression: predict continuous target variables
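
Classification gets the detailed treatment in the slides that follow. For the regression side, here is a minimal sketch with `LinearRegression` on made-up data whose true relationship is y = 2x + 1 (the data are hypothetical, chosen so the right answer is known in advance):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical noise-free data: 10 instances, 1 feature, y = 2x + 1.
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 2 * X.ravel() + 1  # continuous target variable

reg = LinearRegression().fit(X, y)

print(reg.coef_, reg.intercept_)  # close to [2.] and 1.0
print(reg.predict([[20.0]]))      # close to [41.]
```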

Classification

• Predicts categorical variables (classes)
• Learns the relationship between an instance’s features and its class
• Classification algorithms == classifiers

Classification

• Naïve Bayes classifier
  • Assumes features are independent of each other, given the class
  • Fast to train and to predict
  • Often a decent classifier despite that simplifying assumption
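
scikit-learn provides several Naïve Bayes variants in `sklearn.naive_bayes`. The sketch below uses `GaussianNB`, which additionally assumes each continuous feature is normally distributed within a class; the choice of variant and of the Iris data is illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

iris = load_iris()

nb = GaussianNB()
nb.fit(iris.data, iris.target)

# predict_proba exposes the per-class probabilities behind each prediction.
print(nb.predict(iris.data[:3]))
print(nb.predict_proba(iris.data[:3]).round(3))

# Accuracy on the same instances used for fitting.
train_accuracy = nb.score(iris.data, iris.target)
print(train_accuracy)
```

Training-set accuracy is optimistic, because the model has already seen these instances; the train/test split discussed under preprocessing gives a fairer estimate.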

Classification

• Car Evaluation dataset (UCI Machine Learning Repository)
  • Features: buying price, maintenance price, number of doors, number of seats, trunk size, and safety rating
  • Labels: unacceptable, acceptable, good, or very good
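
The Car Evaluation features are all categorical strings, so they must be encoded as numbers before fitting. A sketch, assuming scikit-learn 0.22 or later (for `CategoricalNB`). The rows below are invented for illustration and are not taken from the dataset, though the category spellings (`vhigh`, `5more`, `unacc`, and so on) follow the UCI file's abbreviations:

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical rows (NOT real dataset rows), columns in the dataset's order:
# buying, maint, doors, persons, lug_boot, safety
X_raw = np.array([
    ["vhigh", "vhigh", "2", "2", "small", "low"],
    ["high", "high", "4", "4", "med", "med"],
    ["med", "low", "4", "more", "big", "high"],
    ["low", "low", "5more", "more", "big", "high"],
    ["vhigh", "med", "3", "2", "small", "med"],
    ["low", "med", "4", "4", "med", "high"],
])
y = np.array(["unacc", "acc", "good", "vgood", "unacc", "good"])

# Map each category string to an integer code, column by column.
encoder = OrdinalEncoder()
X = encoder.fit_transform(X_raw)

# CategoricalNB treats each code as a category label, not as a quantity.
clf = CategoricalNB()
clf.fit(X, y)

# Every value in this new instance was seen during fitting.
new_car = encoder.transform([["low", "low", "4", "more", "big", "high"]])
print(clf.predict(new_car))
```

Loading the real dataset (for example with pandas from the UCI repository) would replace the hand-written rows; the encode-then-fit steps stay the same.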

Unsupervised algorithms

• Unlabeled data
• You might have no idea what you’re looking for
• Clustering: splitting observations into groups
• Dimensionality reduction: compressing data into fewer dimensions (features)
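
Principal component analysis (PCA) is one common dimensionality-reduction technique. The sketch below (an illustrative choice of method and dataset) projects Iris's four features onto two components without ever looking at the labels:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

iris = load_iris()

# Unsupervised: only the feature matrix is passed in, never iris.target.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(iris.data)

print(X_2d.shape)                     # (150, 2)
print(pca.explained_variance_ratio_)  # share of the variance each component keeps
```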

Clustering

• Exploring the data
• Similar objects in the same group
• Similarity measured by the distance between data points

Clustering

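• K-means clustering
  • Three steps
    • Chooses initial cluster centers
    • Assigns each data instance to the nearest cluster center
    • Recalculates each cluster center as the mean of its assigned instances
  • Repeats the last two steps until the centers stop changing
  • Efficient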
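
The three steps translate almost line for line into NumPy. This is a from-scratch sketch for illustration; `kmeans` and `nearest_center` are hypothetical helper names, and the two blobs are made-up data:

```python
import numpy as np


def nearest_center(X, centers):
    """Index of the closest center (Euclidean distance) for every instance."""
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(X, k, n_iter=100, seed=0):
    """Hypothetical helper: plain k-means following the three steps above."""
    rng = np.random.default_rng(seed)
    # Step 1: choose k distinct instances as the initial cluster centers.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Step 2: assign each instance to its nearest cluster center.
        labels = nearest_center(X, centers)
        # Step 3: recalculate each center as the mean of its assigned instances
        # (an empty cluster keeps its previous center).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break  # centers stopped moving: converged
        centers = new_centers
    return nearest_center(X, centers), centers


# Two made-up, well-separated blobs of 50 points each.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),
               rng.normal(5.0, 0.5, size=(50, 2))])

labels, centers = kmeans(X, k=2)
print(np.round(centers, 2))  # one center near (0, 0), the other near (5, 5)
```

In practice you would use scikit-learn's `KMeans` from `sklearn.cluster`, which runs the same assign-and-recompute loop and uses k-means++ to pick the initial centers by default.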

Data preprocessing

• Encoding categorical features as numbers, since most scikit-learn estimators expect numeric input
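
One-hot encoding is a common choice for nominal features: each category becomes its own 0/1 column, so no false ordering is implied. A sketch with `OneHotEncoder` on a hypothetical single-column example:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical categorical column: a car's safety rating, one row per instance.
safety = np.array([["low"], ["med"], ["high"], ["med"]])

encoder = OneHotEncoder()
# fit_transform returns a sparse matrix by default; toarray() makes it dense.
onehot = encoder.fit_transform(safety).toarray()

print(encoder.categories_)  # categories are sorted: high, low, med
print(onehot)               # one column per category, a single 1 per row
```

For ordinal features, `OrdinalEncoder(categories=[["low", "med", "high"]])` keeps the natural order as the integer codes 0, 1, 2.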

Data preprocessing

• Split the dataset into training and test data
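
A sketch of the split with `train_test_split`, assuming scikit-learn 0.18 or later, where it lives in `sklearn.model_selection` (releases from the time of this talk had it in `sklearn.cross_validation`). The 25% test size and the Iris data are illustrative choices:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

iris = load_iris()

# Hold out 25% of the instances; stratify keeps class proportions equal in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=0, stratify=iris.target
)

model = GaussianNB().fit(X_train, y_train)

# Accuracy on instances the model never saw during fitting.
test_accuracy = model.score(X_test, y_test)
print(test_accuracy)
```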

Validation and testing

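• Model evaluation: measure performance on held-out test data
• Cross-validation: repeatedly split the data into training and validation folds, then average the scores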
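
`cross_val_score` runs the whole fit-and-score loop over k folds. A sketch with 5 folds on Iris (illustrative choices, assuming scikit-learn 0.18 or later; for classifiers, an integer `cv` uses stratified folds):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

iris = load_iris()

# 5-fold cross-validation: fit on 4 folds, score on the 5th, rotate 5 times.
scores = cross_val_score(GaussianNB(), iris.data, iris.target, cv=5)

print(scores)                       # one accuracy per fold
print(scores.mean(), scores.std())  # summary across the folds
```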

Good strategies

• Avoid overfitting
• Use lots of data
• Intuition fails in high dimensions
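
One way to spot overfitting is to compare training and test accuracy. In this sketch (synthetic data from `make_classification`, all parameters chosen for illustration), `flip_y=0.2` assigns the class of 20% of samples at random, so an unrestricted decision tree that scores perfectly on the training set has necessarily memorized noise:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with random labels on 20% of the samples.
X, y = make_classification(n_samples=500, n_features=20, flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

results = {}
for depth in (None, 3):  # None = grow the tree until every leaf is pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    results[depth] = (tree.score(X_train, y_train), tree.score(X_test, y_test))
    train_acc, test_acc = results[depth]
    print(f"max_depth={depth}: train={train_acc:.2f}, test={test_acc:.2f}")
```

A large gap between the two scores is the warning sign; limiting model complexity (here `max_depth`) or adding more data are the usual ways to narrow it.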

My materials

• Scikit-learn.org documentation and tutorials
• Machine learning class at U of M
• Scikit-learn talks

Resources

• Scikit-learn documentation and tutorials
  • scikit-learn.org/stable/documentation.html
• Other resources
  • http://archive.ics.uci.edu/ml/datasets.html
  • Mldata.org
• Videos
  • Scikit-learn tutorial: http://vimeo.com/53062607
  • Intro to scikit-learn: http://vimeo.com/72859487

Contact me!

• @sarah_guido
• Linkedin.com/sarahguido
• github.com/sarguido

