Know Thy Neighbor: An Introduction to Scikit-learn and K-NN
Portia Burton
Portland Data Science Group
March 25, 2014
What We Will Cover Today
1. Define Machine Learning
2. Go Over Scikit-learn
3. Explain k-Nearest Neighbor
4. Demo of Scikit-learn and k-Nearest Neighbor
What Is Machine Learning?
• The art of creating predictive models
• Uses input data to make predictions
• Enables computers to find patterns in data
Scikit-learn
What is scikit-learn?
• Python machine learning package
• Built on NumPy, SciPy, and matplotlib
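A minimal sketch of scikit-learn's usual pattern, assuming a toy two-feature dataset made up for illustration: the data are plain NumPy arrays, an estimator is trained with `fit`, and it is used with `predict`. `KNeighborsClassifier` is chosen here only because it is the topic of this talk; other scikit-learn estimators follow the same calls.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy data (made up for illustration): 4 samples, 2 features each
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
y = np.array([0, 0, 1, 1])  # one label per sample

model = KNeighborsClassifier(n_neighbors=1)
model.fit(X, y)  # train on labeled NumPy arrays

new_points = np.array([[0.2, 0.4], [4.8, 5.4]])
predictions = model.predict(new_points)
print(predictions)  # [0 1]
```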
k-NN
• k-Nearest Neighbor algorithm
– One of the simplest machine learning algorithms
– k is a constant chosen in advance: the number of neighbors consulted
Basic Information about KNN
• It is a lazy algorithm: it does not generalize from the training data until it is asked about a new data point
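The laziness can be made concrete with a from-scratch sketch. `LazyKNN` is a hypothetical class written for illustration, not scikit-learn's implementation, and it assumes Euclidean distance with a simple majority vote. `fit` only stores the training data; all of the distance computation happens in `predict`.

```python
from collections import Counter

import numpy as np


class LazyKNN:
    """Hypothetical from-scratch k-NN classifier illustrating lazy learning."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # No model is built here: the stored training set *is* the model.
        self.X_train = np.asarray(X, dtype=float)
        self.y_train = np.asarray(y)
        return self

    def predict_one(self, x):
        # Euclidean distance from x to every stored training point
        distances = np.linalg.norm(self.X_train - np.asarray(x, dtype=float), axis=1)
        # Indices of the k closest points (stable sort keeps ties in data order)
        nearest = np.argsort(distances, kind="stable")[: self.k]
        # Majority vote; Counter breaks count ties by first-seen label
        votes = Counter(self.y_train[nearest].tolist())
        return votes.most_common(1)[0][0]

    def predict(self, X):
        return np.array([self.predict_one(x) for x in X])


X = [[1, 1], [1, 2], [2, 1], [6, 6], [6, 7], [7, 6]]
y = ["red", "red", "red", "blue", "blue", "blue"]
knn = LazyKNN(k=3).fit(X, y)
print(knn.predict([[1.5, 1.5], [6.5, 6.2]]).tolist())  # ['red', 'blue']
```

Because nothing is precomputed, every call to `predict` pays for a full pass over the training set.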
Supervised vs. Unsupervised Learning
Supervised Learning
When your samples are labeled
Example: Spam Filters
Unsupervised Learning
The given instances are not labeled, and the categories must be discovered from the data itself
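A sketch of the difference in scikit-learn, on made-up toy data where the labels 0 and 1 stand in for "not spam" and "spam": `KNeighborsClassifier` needs the labels `y`, while `NearestNeighbors` is fit on `X` alone and only answers which stored points are closest. Unsupervised neighbor search does not assign categories by itself; it is the label-free building block.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier, NearestNeighbors

# Toy data (made up): two well-separated groups of 2-D points
X = np.array([[0, 0], [0, 1], [1, 0], [8, 8], [8, 9], [9, 8]])

# Supervised: every sample has a label (0 = not spam, 1 = spam)
y = np.array([0, 0, 0, 1, 1, 1])
clf = KNeighborsClassifier(n_neighbors=3).fit(X, y)
label = int(clf.predict([[8.5, 8.5]])[0])

# Unsupervised: no labels; only ask which stored points are closest
nn = NearestNeighbors(n_neighbors=3).fit(X)
distances, indices = nn.kneighbors([[0.2, 0.2]])

print(label)                        # 1
print(sorted(indices[0].tolist()))  # [0, 1, 2]
```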
How k-NN works
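A small sketch of the idea on made-up data: an unlabeled query point takes the majority label of its k nearest training points, so the answer can change with k. The labels "A" and "B" are placeholders.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Made-up training data; distances from the query at (0, 0) are
# 1.0 (A), 1.5 (B), 1.6 (B), about 4.24 (A) and 5.0 (A)
X = np.array([[1.0, 0.0], [0.0, 1.5], [-1.6, 0.0], [3.0, 3.0], [4.0, 3.0]])
y = np.array(["A", "B", "B", "A", "A"])
query = np.array([[0.0, 0.0]])  # the unlabeled point to classify

results = {}
for k in (1, 3, 5):
    clf = KNeighborsClassifier(n_neighbors=k).fit(X, y)
    results[k] = str(clf.predict(query)[0])  # majority vote of k neighbors

print(results)  # {1: 'A', 3: 'B', 5: 'A'}
```

With k = 1 the single closest point decides; with k = 3 the two B neighbors outvote it; with k = 5 the A points are back in the majority. Choosing k is part of using k-NN, and with two classes an odd k avoids tied votes.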
What Can k-NN Be Used For?
• Classification
• Regression
• Unsupervised neighbor search, a building block for methods such as spectral clustering
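For regression, a k-NN prediction (with scikit-learn's default uniform weights) is the mean target value of the k nearest neighbors. A minimal sketch with made-up one-feature data:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Made-up data: one feature, one continuous target per sample
X = np.array([[1.0], [2.0], [3.0], [10.0], [11.0]])
y = np.array([10.0, 20.0, 30.0, 100.0, 110.0])

reg = KNeighborsRegressor(n_neighbors=2).fit(X, y)
pred = reg.predict(np.array([[2.4], [10.6]]))
# 2.4  -> nearest x are 2 and 3   -> mean(20, 30)   = 25
# 10.6 -> nearest x are 11 and 10 -> mean(110, 100) = 105
print(pred.tolist())  # [25.0, 105.0]
```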
Downsides of KNN
• Since training is minimal (the data is only stored), predicting new data is costly: each query is compared against the stored training points
• Correlation is falsely high (data points can be given too much weight)
Alternatives to Brute-Force k-NN Search
• KDTree
• BallTree
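KD trees and ball trees organize the training points so that a neighbor query can often avoid comparing against every stored point. They return the same neighbors as a brute-force search; what changes is the cost. A sketch on random data (the sizes are arbitrary) checking that agreement:

```python
import numpy as np
from sklearn.neighbors import BallTree, KDTree

rng = np.random.default_rng(0)
X = rng.random((1000, 3))   # 1,000 random training points in 3-D
query = rng.random((1, 3))  # one query point

# Brute force: distance to every point, then sort
brute_ind = np.argsort(np.linalg.norm(X - query, axis=1))[:5]

# Tree-based searches: build the tree once, then query it
kd_dist, kd_ind = KDTree(X).query(query, k=5)
ball_dist, ball_ind = BallTree(X).query(query, k=5)

print(brute_ind.tolist())
print(kd_ind[0].tolist())
print(ball_ind[0].tolist())
```

In a classifier the same choice is made through the `algorithm` parameter, e.g. `KNeighborsClassifier(algorithm="kd_tree")`; the default `"auto"` picks a method based on the data.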
References:
http://www.solver.com/xlminer/help/k-nearest-neighbors-prediction-example
http://saravananthirumuruganathan.wordpress.com/2010/05/17/a-detailed-introduction-to-k-nearest-neighbor-knn-algorithm/
http://scikit-learn.org/stable/modules/neighbors.html
http://peekaboo-vision.blogspot.com/2013/01/machine-learning-cheat-sheet-for-scikit.html
http://stackoverflow.com/questions/1832076/what-is-the-difference-between-supervised-learning-and-unsupervised-learning
http://stackoverflow.com/questions/2620343/what-is-machine-learning