![Page 1: Know Thy Neighbor: An Introduction to Scikit-learn and K-NN](https://reader036.vdocuments.mx/reader036/viewer/2022062323/56816223550346895dd24eee/html5/thumbnails/1.jpg)
Know Thy Neighbor: An Introduction to Scikit-learn and K-NN
Portia Burton
Portland Data Science Group
March 25, 2014
![Page 2: Know Thy Neighbor: An Introduction to Scikit-learn and K-NN](https://reader036.vdocuments.mx/reader036/viewer/2022062323/56816223550346895dd24eee/html5/thumbnails/2.jpg)
What We Will Cover Today
1. Define what machine learning is
2. Go over scikit-learn
3. Explain k-Nearest Neighbor
4. Demo of scikit-learn and k-Nearest Neighbor
![Page 3: Know Thy Neighbor: An Introduction to Scikit-learn and K-NN](https://reader036.vdocuments.mx/reader036/viewer/2022062323/56816223550346895dd24eee/html5/thumbnails/3.jpg)
What is Machine Learning?
• The art of creating predictive models
• Uses input data to make predictions
• Enables computers to match patterns in data
![Page 4: Know Thy Neighbor: An Introduction to Scikit-learn and K-NN](https://reader036.vdocuments.mx/reader036/viewer/2022062323/56816223550346895dd24eee/html5/thumbnails/4.jpg)
![Page 5: Know Thy Neighbor: An Introduction to Scikit-learn and K-NN](https://reader036.vdocuments.mx/reader036/viewer/2022062323/56816223550346895dd24eee/html5/thumbnails/5.jpg)
![Page 6: Know Thy Neighbor: An Introduction to Scikit-learn and K-NN](https://reader036.vdocuments.mx/reader036/viewer/2022062323/56816223550346895dd24eee/html5/thumbnails/6.jpg)
Scikit-Learn
![Page 7: Know Thy Neighbor: An Introduction to Scikit-learn and K-NN](https://reader036.vdocuments.mx/reader036/viewer/2022062323/56816223550346895dd24eee/html5/thumbnails/7.jpg)
What is scikit-learn?
• Python machine learning package
• Built on NumPy, SciPy, and matplotlib
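A minimal sketch of what using the package looks like, assuming the bundled iris dataset and a 75/25 train/test split (neither is specified in the slides; they are chosen here only for illustration). Most scikit-learn models follow the same pattern: create the model, call `fit`, then `predict` or `score`.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumption: the iris dataset stands in for "your data".
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=0
)

# Create the model, fit it on the training split, evaluate on the held-out split.
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

The value of `n_neighbors` is a free choice here; trying a few values on held-out data is a common way to pick it.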
![Page 8: Know Thy Neighbor: An Introduction to Scikit-learn and K-NN](https://reader036.vdocuments.mx/reader036/viewer/2022062323/56816223550346895dd24eee/html5/thumbnails/8.jpg)
![Page 9: Know Thy Neighbor: An Introduction to Scikit-learn and K-NN](https://reader036.vdocuments.mx/reader036/viewer/2022062323/56816223550346895dd24eee/html5/thumbnails/9.jpg)
k-NN
• k-Nearest Neighbor algorithm
– One of the simplest machine learning algorithms
– k is a constant: the number of neighbors consulted
![Page 10: Know Thy Neighbor: An Introduction to Scikit-learn and K-NN](https://reader036.vdocuments.mx/reader036/viewer/2022062323/56816223550346895dd24eee/html5/thumbnails/10.jpg)
Basic Information about KNN
• It is a lazy algorithm: it does not generalize from the training data until it is queried with a new data point
![Page 11: Know Thy Neighbor: An Introduction to Scikit-learn and K-NN](https://reader036.vdocuments.mx/reader036/viewer/2022062323/56816223550346895dd24eee/html5/thumbnails/11.jpg)
Supervised vs. Unsupervised Learning
![Page 12: Know Thy Neighbor: An Introduction to Scikit-learn and K-NN](https://reader036.vdocuments.mx/reader036/viewer/2022062323/56816223550346895dd24eee/html5/thumbnails/12.jpg)
Supervised Learning
When your samples are labeled
![Page 13: Know Thy Neighbor: An Introduction to Scikit-learn and K-NN](https://reader036.vdocuments.mx/reader036/viewer/2022062323/56816223550346895dd24eee/html5/thumbnails/13.jpg)
Example: Spam Filters
![Page 14: Know Thy Neighbor: An Introduction to Scikit-learn and K-NN](https://reader036.vdocuments.mx/reader036/viewer/2022062323/56816223550346895dd24eee/html5/thumbnails/14.jpg)
Unsupervised Learning
The given instances are not labeled, and the categories are determined from the data alone
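The difference shows up directly in what is passed to `fit`. The sketch below is only an illustration: the toy points, the "ham"/"spam" labels, and the use of `KMeans` as the unsupervised example are assumptions, not something taken from the talk.

```python
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

# Six toy samples forming two well-separated groups (made-up data).
X = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]]

# Supervised: labels are provided alongside the samples.
y = ["ham", "ham", "ham", "spam", "spam", "spam"]
classifier = KNeighborsClassifier(n_neighbors=3).fit(X, y)
predicted_label = classifier.predict([[7.9, 8.0]])[0]

# Unsupervised: only the samples are given; the groups are inferred.
clusterer = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
cluster_ids = clusterer.labels_
```

The cluster ids returned by the unsupervised model are arbitrary integers; they group the samples but carry no meaning such as "spam" until a person interprets them.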
![Page 15: Know Thy Neighbor: An Introduction to Scikit-learn and K-NN](https://reader036.vdocuments.mx/reader036/viewer/2022062323/56816223550346895dd24eee/html5/thumbnails/15.jpg)
How k-NN works
![Page 16: Know Thy Neighbor: An Introduction to Scikit-learn and K-NN](https://reader036.vdocuments.mx/reader036/viewer/2022062323/56816223550346895dd24eee/html5/thumbnails/16.jpg)
How k-NN works
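The idea can be sketched in a few lines of plain Python: measure the distance from the new point to every stored point, keep the k closest, and let them vote. The function name `knn_predict`, the Euclidean distance, and the toy data are assumptions made for this sketch; it is not the scikit-learn implementation.

```python
from collections import Counter
import math

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every stored training point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest points.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote among the labels of those neighbors.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Made-up data: three "blue" points near the origin, three "red" points far away.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 7.5), (6.5, 5.0)]
labels = ["blue", "blue", "blue", "red", "red", "red"]
prediction = knn_predict(points, labels, query=(2.0, 2.0), k=3)
```

Note that nothing happens at "training" time: all the work is done when a query arrives, which is why the algorithm is called lazy.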
![Page 17: Know Thy Neighbor: An Introduction to Scikit-learn and K-NN](https://reader036.vdocuments.mx/reader036/viewer/2022062323/56816223550346895dd24eee/html5/thumbnails/17.jpg)
What can KNN be used for?
• Classification
• Regression
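scikit-learn ships k-NN as both a classifier and a regressor. A small sketch on made-up one-dimensional data (the values and the choices of `n_neighbors` are assumptions for illustration):

```python
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

X = [[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]]

# Classification: predict a discrete label by majority vote of the neighbors.
y_class = [0, 0, 0, 1, 1, 1]
clf = KNeighborsClassifier(n_neighbors=3).fit(X, y_class)
class_pred = clf.predict([[1.5], [10.5]])

# Regression: predict a number by averaging the neighbors' target values.
y_value = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
reg = KNeighborsRegressor(n_neighbors=2).fit(X, y_value)
value_pred = reg.predict([[1.5]])
```

With the default uniform weights, the regressor's answer for 1.5 is the plain average of the targets at its two nearest neighbors, 1.0 and 2.0.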
![Page 18: Know Thy Neighbor: An Introduction to Scikit-learn and K-NN](https://reader036.vdocuments.mx/reader036/viewer/2022062323/56816223550346895dd24eee/html5/thumbnails/18.jpg)
Downsides of KNN
• Because there is minimal training, there is a high cost when predicting on new data: each new point must be compared against the stored training points
• Correlation can appear falsely high (individual data points can be given too much weight)
![Page 19: Know Thy Neighbor: An Introduction to Scikit-learn and K-NN](https://reader036.vdocuments.mx/reader036/viewer/2022062323/56816223550346895dd24eee/html5/thumbnails/19.jpg)
Alternatives to Brute-Force kNN Search
• KDTree
• BallTree
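In scikit-learn, KDTree and BallTree are index structures used to speed up the neighbor search, and they can be selected through the `algorithm` parameter of the k-NN estimators. A sketch on made-up data showing that the choice changes how neighbors are found, not which neighbors are found:

```python
from sklearn.neighbors import KNeighborsClassifier

# Toy data (an assumption for this sketch): two small groups of points.
X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y = [0, 0, 0, 1, 1, 1]

predictions = {}
for algorithm in ("brute", "kd_tree", "ball_tree"):
    # Same model and data; only the neighbor-search strategy differs.
    model = KNeighborsClassifier(n_neighbors=3, algorithm=algorithm).fit(X, y)
    predictions[algorithm] = model.predict([[0.5, 0.5], [5.5, 5.5]]).tolist()
```

On a dataset this small the difference in speed is negligible; the tree-based options are meant for larger training sets. Leaving `algorithm` at its default, `"auto"`, lets scikit-learn pick a strategy.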
![Page 20: Know Thy Neighbor: An Introduction to Scikit-learn and K-NN](https://reader036.vdocuments.mx/reader036/viewer/2022062323/56816223550346895dd24eee/html5/thumbnails/20.jpg)
References:
http://www.solver.com/xlminer/help/k-nearest-neighbors-prediction-example
http://saravananthirumuruganathan.wordpress.com/2010/05/17/a-detailed-introduction-to-k-nearest-neighbor-knn-algorithm/
http://scikit-learn.org/stable/modules/neighbors.html
http://peekaboo-vision.blogspot.com/2013/01/machine-learning-cheat-sheet-for-scikit.html
http://stackoverflow.com/questions/1832076/what-is-the-difference-between-supervised-learning-and-unsupervised-learning
http://stackoverflow.com/questions/2620343/what-is-machine-learning