Scikit Learn: Data Normalization Techniques That Work

HELP YOUR DATA BE NORMAL

Upload: damian-r-mingle-mba

Post on 13-Apr-2017


Category:

Data & Analytics



TRANSCRIPT

Page 1: Scikit Learn: Data Normalization Techniques That Work

HELP YOUR DATA BE NORMAL


DAMIAN MINGLE
CHIEF DATA SCIENTIST

@DamianMingle


GET THE FULL STORY
bit.ly/UseSciKitNow


Want faster model run times and better accuracy?

Try Normalizing Your Data


What’s Normal Anyway?

Often stated as “scaling individual samples to have unit norm” or “scale input vectors individually to unit norm (vector length).”

More broadly, it means adjusting values measured on different scales to a notionally common scale.
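To make "unit norm" concrete, here is a minimal sketch of row-wise L2 normalization written with plain NumPy. The helper name `l2_normalize_rows` and the two-row sample are made up for illustration; they are not part of the slides or of scikit-learn.

```python
import numpy as np

def l2_normalize_rows(X):
    """Scale each row (sample) of X so its Euclidean length is 1."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0.0] = 1.0  # leave all-zero rows unchanged
    return X / norms

# Illustrative sample: the first row has length 5, the second length 1.
sample = np.array([[3.0, 4.0], [1.0, 0.0]])
unit = l2_normalize_rows(sample)
print(unit)  # first row becomes [0.6, 0.8]
```

Each row keeps its direction but is rescaled to length 1, which is what "vector length" refers to in the definition above.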


Why Normalization Matters

In truth, not all machine learning models are sensitive to magnitude.

Putting data on the same scale can help machine learning models learn (think of distances in k-nearest neighbors and of coefficients in regression).
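A small sketch of why scale matters for a distance-based method such as k-nearest neighbors. The data points are invented for illustration (height in meters, income in dollars), and the column-wise min-max rescaling used here is one assumed way to reach a common scale; it is a different operation from the row-wise `preprocessing.normalize` used later in the recipe.

```python
import numpy as np

# Hypothetical data: columns are height (m) and income (USD).
points = np.array([[1.79, 51000.0],
                   [1.50, 50500.0],
                   [1.65, 60000.0]])
query = np.array([1.80, 50000.0])

def nearest_index(points, query):
    """Return the row index of the point closest to query (Euclidean)."""
    distances = np.linalg.norm(points - query, axis=1)
    return int(np.argmin(distances))

# On raw values, income differences swamp height differences.
raw_nearest = nearest_index(points, query)

# Min-max scale each column to [0, 1] using the range over all rows,
# including the query.
stacked = np.vstack([points, query])
col_min = stacked.min(axis=0)
col_range = stacked.max(axis=0) - col_min
scaled_points = (points - col_min) / col_range
scaled_query = (query - col_min) / col_range
scaled_nearest = nearest_index(scaled_points, scaled_query)

print(raw_nearest, scaled_nearest)
```

On the raw values the nearest neighbor is the row with the closest income, even though its height is very different; after rescaling, the row that is close on both features wins.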


Power in SciKit Learn

Preprocessing, Clustering, Regression, Classification, Dimensionality Reduction, Model Selection



Let’s Look at an ML Recipe

Normalization


The Imports

from sklearn.datasets import load_iris
from sklearn import preprocessing


Separate Features from Target

iris = load_iris()
print(iris.data.shape)
X = iris.data
y = iris.target


Normalize the Features

normalized_X = preprocessing.normalize(X)
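As a quick sanity check on this step, the sketch below confirms that every row of the normalized Iris data has length 1, and tries the other values that `normalize` accepts for its `norm` parameter (`"l1"` and `"max"`; the default is `"l2"`). The variable names other than `X` and `normalized_X` are assumptions added for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn import preprocessing

X = load_iris().data

# Default behavior: each row (sample) is scaled to unit L2 norm.
normalized_X = preprocessing.normalize(X)
row_lengths = np.linalg.norm(normalized_X, axis=1)
print(normalized_X.shape)
print(np.allclose(row_lengths, 1.0))

# Alternative norms: rows sum to 1 in absolute value ("l1"),
# or the largest absolute value in each row becomes 1 ("max").
l1_X = preprocessing.normalize(X, norm="l1")
max_X = preprocessing.normalize(X, norm="max")
```

Note that the shape is unchanged: normalization rescales each sample but does not add or drop features.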


Normalization Recipe

# Normalize the data attributes for the Iris dataset.
from sklearn.datasets import load_iris
from sklearn import preprocessing

# load the iris dataset
iris = load_iris()
print(iris.data.shape)

# separate the data from the target attributes
X = iris.data
y = iris.target

# normalize the data attributes
normalized_X = preprocessing.normalize(X)
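The recipe stops once the features are normalized and never uses `y`. One possible next step, sketched here under assumptions the slides do not state, is to put the same normalization inside a pipeline so it is applied consistently during cross-validation. The choice of `KNeighborsClassifier`, `n_neighbors=5`, and 5-fold cross-validation is illustrative only.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Normalizer

iris = load_iris()
X, y = iris.data, iris.target

# Normalizer is the estimator form of preprocessing.normalize:
# it scales each sample to unit L2 norm by default.
model = make_pipeline(Normalizer(), KNeighborsClassifier(n_neighbors=5))

# Assumed evaluation setup: 5-fold cross-validated accuracy.
scores = cross_val_score(model, X, y, cv=5)
print(scores.mean())
```

Wrapping the step in a pipeline keeps preprocessing and model together, so the same transformation is applied to new data at prediction time.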
