Analytics machine learning in Weka

Modeling using WEKA

Upload: sudhakar-chavan

Post on 15-Apr-2017

TRANSCRIPT

Page 1: Analytics machine learning in weka

Modeling using WEKA

Page 2: Analytics machine learning in weka

Index
• WEKA Introduction
• WEKA file formats
• Loading data
• Univariate analysis
• Data Manipulation
• Feature Selection
• Creating Training, Validation and Test Sets
• Model Execution - Logistic Regression
• Model Analysis - ROC Curve
• Model Analysis - Cost/Benefit Analysis
• Re-apply model on new data
• WEKA Pluses and Limitations

Page 3: Analytics machine learning in weka

Introduction
• Weka is a collection of machine learning algorithms for data mining tasks.
• The algorithms can either be applied directly to a dataset or called from your own Java code.
• ARFF (Attribute-Relation File Format) is Weka's native dataset file format.

Page 4: Analytics machine learning in weka

Dataset File formats
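As a rough illustration of the ARFF layout, the sketch below builds a tiny ARFF text in plain Python. The relation name, attribute names and rows are made up for the example; only the `@relation` / `@attribute` / `@data` structure, the `{...}` list for nominal values and `?` for a missing value come from the ARFF format itself.

```python
def to_arff(relation, attributes, rows):
    """Build ARFF text. attributes: list of (name, kind), where kind is
    'numeric' or a list of nominal labels. None in a row means missing."""
    lines = [f"@relation {relation}", ""]
    for name, kind in attributes:
        spec = "{" + ",".join(kind) + "}" if isinstance(kind, (list, tuple)) else kind
        lines.append(f"@attribute {name} {spec}")
    lines += ["", "@data"]
    for row in rows:
        # ARFF writes a missing value as a question mark
        lines.append(",".join("?" if v is None else str(v) for v in row))
    return "\n".join(lines) + "\n"

# Hypothetical dataset, for illustration only
arff_text = to_arff(
    "loans",
    [("age", "numeric"), ("income", "numeric"), ("default", ["yes", "no"])],
    [(34, 52000, "no"), (51, None, "yes")],
)
print(arff_text)
```

Saving this text with an .arff extension should give a file that can be opened from the Preprocess tab.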

Page 5: Analytics machine learning in weka

Load Data Set

Page 6: Analytics machine learning in weka

Univariate Analysis

Page 7: Analytics machine learning in weka

Univariate Analysis
• Current relation: dataset name, number of records, and number of attributes in the dataset.
• Attribute details: all attributes, from which one is selected for univariate analysis.

Page 8: Analytics machine learning in weka

Univariate Analysis
• Selected attribute
• Provides information about the attribute type, missing values, distinct values, etc.
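The figures in the Selected attribute panel can be reproduced by hand. This is a minimal sketch in plain Python, not Weka code; the `summarize` helper and the sample `ages` column are assumptions for illustration, with None standing in for a missing value.

```python
from collections import Counter
from statistics import mean, stdev

def summarize(values):
    """Summarize one attribute; None marks a missing value."""
    present = [v for v in values if v is not None]
    counts = Counter(present)
    summary = {
        "missing": len(values) - len(present),
        "distinct": len(counts),                                  # different values seen
        "unique": sum(1 for c in counts.values() if c == 1),      # values seen exactly once
    }
    if present and all(isinstance(v, (int, float)) for v in present):
        summary.update(
            min=min(present),
            max=max(present),
            mean=mean(present),
            stddev=stdev(present) if len(present) > 1 else 0.0,
        )
    return summary

ages = [23, 35, None, 35, 41, 52, None, 23, 60]  # made-up column
age_summary = summarize(ages)
print(age_summary)
```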

Page 9: Analytics machine learning in weka

Univariate Analysis
• Selected attribute: histogram
• Shows the dispersion of the attribute.

Page 10: Analytics machine learning in weka

Univariate Analysis
• All attribute visualization/plots

Page 11: Analytics machine learning in weka

Data Manipulation
• Changing data type of field
• Missing values update
• Creating BINS from data
• Standardize data
• Outlier Treatment
• Creating new calculated fields

Page 12: Analytics machine learning in weka

1. Convert NA to 0
• Flow > Preprocess > Edit > Right Click on Attribute > Replace Values

Page 13: Analytics machine learning in weka

1. Convert NA to 0
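The effect of this replacement on one column can be sketched outside Weka as follows. The `replace_missing` helper and the sample `income` values are hypothetical; the sketch treats both the text "NA" and None as missing.

```python
def replace_missing(column, placeholder="NA", replacement=0):
    """Return a copy of the column with missing markers replaced."""
    return [replacement if v is None or v == placeholder else v for v in column]

income = [52000, "NA", 61000, None, 47000]  # made-up column
cleaned = replace_missing(income)
print(cleaned)
```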

Page 14: Analytics machine learning in weka

2. Changing data types of attribute
• Flow > Preprocess > Filter > Choose > Filters > Unsupervised > Attribute >

Page 15: Analytics machine learning in weka

2. Changing data types

Page 16: Analytics machine learning in weka

3. Creating BINS
• Flow > Preprocess > Filter > Choose > Filters > Unsupervised > Attribute > Discretize

Page 17: Analytics machine learning in weka

3. Creating BINS
• Provide the attribute number and the number of bins to be created, then click on ‘Apply’

Page 18: Analytics machine learning in weka

3. Creating BINS
• Click on the attribute to see the bins and their distribution
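By default the Discretize filter uses equal-width binning. The sketch below shows that idea in plain Python; it is not the Weka implementation, the `equal_width_bins` helper and the `ages` values are assumptions, and the handling of values that fall exactly on a cut point may differ from Weka's.

```python
def equal_width_bins(values, bins):
    """Assign each value to one of `bins` intervals of equal width (0-based index)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    if width == 0:
        return [0] * len(values)
    # The maximum sits on the upper edge, so clamp it into the last bin.
    return [min(int((v - lo) / width), bins - 1) for v in values]

ages = [18, 22, 25, 31, 38, 44, 52, 60, 67, 78]  # made-up column
bin_index = equal_width_bins(ages, bins=3)
print(bin_index)
```

Here the range 18 to 78 is cut into three intervals of width 20.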

Page 19: Analytics machine learning in weka

3. Creating Custom BINS
• Flow > Preprocess > Filter > Choose > Filters > Unsupervised > Attribute > AddExpression
• ifelse(a2 > 0, ifelse(a2 > 10, ifelse(a2 > 20, 4, 3), 2), 1)
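The nested ifelse is easier to read when unrolled. The Python sketch below mirrors the expression term by term, with `a2` standing for the value of the second attribute; the function name and the sample values are only for illustration.

```python
def custom_bin(a2):
    """Mirror of ifelse(a2 > 0, ifelse(a2 > 10, ifelse(a2 > 20, 4, 3), 2), 1)."""
    if a2 > 0:
        if a2 > 10:
            return 4 if a2 > 20 else 3   # above 20 gives 4, (10, 20] gives 3
        return 2                          # (0, 10] gives 2
    return 1                              # 0 or below gives 1

sample = [-5, 0, 7, 10, 15, 20, 35]
custom_bins = [custom_bin(v) for v in sample]
print(custom_bins)
```

Because every comparison is a strict "greater than", the boundary values 0, 10 and 20 fall into the lower bin.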

Page 20: Analytics machine learning in weka

4. Standardize data
• Converts all numeric attributes in the data to zero mean and unit variance.
• Flow > Preprocess > Filter > Choose > Filters > Unsupervised > Attribute > Standardize

Page 21: Analytics machine learning in weka

4. Standardize data
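Standardizing one numeric attribute amounts to subtracting its mean and dividing by its standard deviation. This is a minimal sketch in plain Python; it assumes the sample standard deviation (n - 1 in the denominator), which may or may not match Weka's choice exactly, and the input values are made up.

```python
from statistics import mean, stdev

def standardize(values):
    """Rescale values to zero mean and unit (sample) standard deviation."""
    mu, sigma = mean(values), stdev(values)
    return [(v - mu) / sigma for v in values]

z = standardize([10.0, 20.0, 30.0, 40.0, 50.0])
print([round(v, 4) for v in z])
```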

Page 22: Analytics machine learning in weka

4. Standardize data - Log values
• Converts specific numeric attributes to their log values.
• Flow > Preprocess > Filter > Choose > Filters > Unsupervised > Attribute > NumericTransform

Page 23: Analytics machine learning in weka

4. Standardize data - Log values
• Provide the number of the attribute that is to be converted to its log value.
• Also provide the method name: log. Other methods, such as abs, round or floor, can be given here instead.
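The same idea, applying one named math function to one chosen column, can be sketched in plain Python. The `numeric_transform` helper, the 0-based `column` argument and the sample rows are assumptions for the example; `math.log` is the natural logarithm.

```python
import math

def numeric_transform(rows, column, method=math.log):
    """Apply `method` to one column of every row and leave the other columns unchanged."""
    return [[method(v) if i == column else v for i, v in enumerate(row)] for row in rows]

rows = [[1, 1.0], [2, 10.0], [3, 100.0]]          # made-up data: id, amount
logged = numeric_transform(rows, column=1)         # natural log of the amount
floored = numeric_transform([[1, 2.7]], column=1, method=math.floor)
print(logged, floored)
```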

Page 24: Analytics machine learning in weka

5. Identify Outliers
• Flow > Preprocess > Filter > Choose > Filters > Unsupervised > Attribute > InterquartileRange
• Outliers can be identified for individual attributes or for all attributes together

Page 25: Analytics machine learning in weka

5. Identify Outliers
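The interquartile-range rule behind this step can be sketched in plain Python. This is not the Weka filter: the `iqr_outliers` helper, the default factor of 3.0 and the sample values are assumptions, and the quartiles come from Python's `statistics.quantiles`, which may differ slightly from the way Weka computes them.

```python
from statistics import quantiles

def iqr_outliers(values, factor=3.0):
    """Flag values lying more than factor * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - factor * iqr, q3 + factor * iqr
    return [v < lower or v > upper for v in values]

values = [10, 12, 12, 13, 12, 11, 14, 13, 15, 102]  # made-up column
flags = iqr_outliers(values)
print(flags)
```

Only the last value, 102, lies far enough from the bulk of the data to be flagged.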

Page 26: Analytics machine learning in weka

5. Remove Outliers
• Flow > Preprocess > Filter > Choose > Filters > Unsupervised > Instance > RemoveWithValues

Page 27: Analytics machine learning in weka

5. Remove Outliers
• Params: attributeIndex = the attribute number; nominalIndices = the nominal value that marks an outlier in the Outlier attribute
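In effect this step drops every record whose Outlier attribute has the flagged value. A plain-Python sketch of that filtering, with a hypothetical `remove_with_value` helper and made-up rows:

```python
def remove_with_value(rows, attribute, value):
    """Keep only the rows whose `attribute` is not equal to `value`."""
    return [r for r in rows if r[attribute] != value]

rows = [
    {"income": 52000, "Outlier": "no"},
    {"income": 990000, "Outlier": "yes"},
    {"income": 61000, "Outlier": "no"},
]
kept = remove_with_value(rows, "Outlier", "yes")
print(kept)
```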

Page 28: Analytics machine learning in weka

5. Transform Outliers
• Flow > Preprocess > Filter > Choose > Filters > Unsupervised > Attribute > AddExpression
• This option creates a new field, e.g.: ifelse(a2 > 1000, 200, 1)

Page 29: Analytics machine learning in weka

6. New Calculated fields
• This is helpful when a new field is to be derived from existing fields
• Flow > Preprocess > Filter > Choose > Filters > Unsupervised > Attribute > AddExpression

Page 30: Analytics machine learning in weka

6. New Calculated fields
• This is helpful when a new field is to be derived from existing fields
• Provide the expression/equation and the new field name
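Deriving a field from existing ones can be sketched in plain Python as below. The `add_expression` helper, the field names and the debt-to-income ratio are hypothetical examples, not taken from the slides.

```python
def add_expression(rows, name, expression):
    """Return copies of the rows with one extra field computed from the existing ones."""
    return [{**row, name: expression(row)} for row in rows]

loans = [{"income": 50000, "debt": 10000}, {"income": 80000, "debt": 20000}]
with_ratio = add_expression(loans, "debt_to_income", lambda r: r["debt"] / r["income"])
print(with_ratio)
```

The original rows are left untouched, much as ‘Undo’ in the Preprocess tab brings back the data without the new field.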

Page 31: Analytics machine learning in weka

Feature Selection/Attribute selection

1. Info Gain
2. Correlation

Page 32: Analytics machine learning in weka

Feature Selection – Info Gain
• Flow > Select Attribute > Attribute evaluator > Choose >
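Information gain measures how much knowing an attribute reduces the entropy of the class. The sketch below computes it by hand in plain Python for one nominal attribute; the helper names are assumptions, and the data follows the well-known "weather" toy example (outlook versus play).

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def info_gain(feature, labels):
    """Class entropy minus the weighted entropy within each feature value."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature):
        subset = [l for f, l in zip(feature, labels) if f == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

outlook = ["sunny", "sunny", "overcast", "rain", "rain", "rain", "overcast",
           "sunny", "sunny", "rain", "sunny", "overcast", "overcast", "rain"]
play = ["no", "no", "yes", "yes", "yes", "no", "yes",
        "no", "yes", "yes", "yes", "yes", "yes", "no"]
gain = info_gain(outlook, play)
print(round(gain, 3))
```

Attributes with a higher gain are ranked higher for the model.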

Page 33: Analytics machine learning in weka

Feature Selection – Correlation
• Flow > Select Attribute > Attribute evaluator > Choose >
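Correlation-based ranking scores each attribute by its correlation with the class. A minimal plain-Python sketch using Pearson's correlation follows; the feature names `f1` and `f2`, the 0/1 class coding and the values are made up for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equally long numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

label = [0, 0, 0, 1, 1, 1]                 # class coded as 0/1
features = {
    "f1": [1, 2, 3, 4, 5, 6],              # rises with the class
    "f2": [5, 1, 4, 2, 6, 3],              # mostly unrelated to the class
}
# Rank features by absolute correlation with the class, strongest first
ranking = sorted(((name, pearson(col, label)) for name, col in features.items()),
                 key=lambda item: -abs(item[1]))
print(ranking)
```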

Page 34: Analytics machine learning in weka

Features for Model
• Features selected for model

Page 35: Analytics machine learning in weka

Creating Training, Validation and Test Sets

Page 36: Analytics machine learning in weka

Creating data sets
1. Dividing data into 60-20-20 % (Train-Test-Evaluate)
2. Weka inbuilt methods

Page 37: Analytics machine learning in weka

Creating data sets
• For 60%-20%-20%
• Step 1 – Flow > Preprocess > Filter > Choose > Unsupervised > Instance > Resample

Page 38: Analytics machine learning in weka

Creating data sets
• Step 2 – Parameters for Resample
• Flow > Preprocess > Filter > Choose > Unsupervised > Instance > Resample >
• Set noReplacement = True and sampleSizePercent = 60, then OK > Apply

Page 39: Analytics machine learning in weka

Creating data sets
• Step 3 – Check: after Apply, check the current relation for the number of records selected
• Step 4 – Save the result as filename_train.arff
• Step 5 – Click on ‘Undo’ to get back to the original data set
• Step 6 – Change the Resample parameters again
• Parameters: invertSelection = True, noReplacement = True, sampleSizePercent = 60 (this selects the remaining 40% of the records)

Page 40: Analytics machine learning in weka

Creating data sets
• Step 7 – Apply and check results as below

Page 41: Analytics machine learning in weka

Creating data sets
• Step 8 – Don’t save the results
• Step 9 – Open the Resample parameters and set the parameters below
• invertSelection = False, noReplacement = True, sampleSizePercent = 50
• OK > Apply, then check the results

Page 42: Analytics machine learning in weka

Creating data sets
• Step 10 – Check the results
• Step 11 – Save the results as Test Data.
• Step 12 – Click on Undo to get back the earlier 40% of the dataset
• Step 13 – Parameters: invertSelection = True, noReplacement = True, sampleSizePercent = 50
• Step 14 – OK > Apply
• Step 15 – Save as Evaluation Data
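The net effect of these steps is a 60-20-20 split without replacement: 60% for training, then the remaining 40% cut in half for test and evaluation. A plain-Python sketch of the same outcome, with a hypothetical `split_60_20_20` helper and a fixed random seed so the split is repeatable:

```python
import random

def split_60_20_20(rows, seed=1):
    """Shuffle once, take 60% for training, then halve the rest into test and evaluation."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * 0.6)
    train, rest = shuffled[:n_train], shuffled[n_train:]
    n_test = round(len(rest) * 0.5)
    return train, rest[:n_test], rest[n_test:]

records = list(range(100))                 # stand-in for 100 data rows
train, test, evaluation = split_60_20_20(records)
print(len(train), len(test), len(evaluation))
```

Every record ends up in exactly one of the three sets, which is what the invertSelection steps achieve in the GUI.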

Page 43: Analytics machine learning in weka

Creating data sets


Page 44: Analytics machine learning in weka

Creating data sets
2. Weka inbuilt – Flow > Classify > Test Options > Use Training Set
• With this option, the loaded data set is used as the training set to create the model, and the model is evaluated on that same data

Page 45: Analytics machine learning in weka

Creating data sets
2. Weka inbuilt – Flow > Classify > Test Options > Use Supplied Test Set
• With this option, the selected data set is used as the test set to test the model

Page 46: Analytics machine learning in weka

Creating data sets
2. Weka inbuilt – Flow > Classify > Test Options > Use Supplied Test Set
• With this option, the selected data set is used as the test set to test the model

Page 47: Analytics machine learning in weka

Creating data sets
2. Weka inbuilt – Flow > Classify > Test Options > Cross Validation
• With this option, the data set is divided into folds (10 by default). For each fold, a model is built on the other folds and tested on the held-out fold. Weka reports the performance aggregated over all folds, and the model shown on the UI is built on the full data set
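How k-fold cross-validation partitions the records can be sketched in plain Python. The `k_fold_indices` helper is an assumption for illustration; it assigns rows to folds round-robin, whereas Weka also randomizes and stratifies the folds.

```python
def k_fold_indices(n_rows, k=10):
    """Yield (train_indices, test_indices) once per fold."""
    folds = [list(range(i, n_rows, k)) for i in range(k)]
    for held_out in range(k):
        test_idx = folds[held_out]
        train_idx = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train_idx, test_idx

splits = list(k_fold_indices(25, k=5))
for train_idx, test_idx in splits:
    print(len(train_idx), "train rows,", len(test_idx), "test rows")
```

Each row is used for testing exactly once and for training in all the other rounds.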

Page 48: Analytics machine learning in weka

Creating data sets
2. Weka inbuilt – Flow > Classify > Test Options > Percentage Split
• With this option, the data set is divided into a training set and a test set: the model is created on the training part and tested on the rest

Page 49: Analytics machine learning in weka

Logistic Regression

Page 50: Analytics machine learning in weka

Logistic Regression
• Flow > Classify > Functions > Logistic

Page 51: Analytics machine learning in weka

Logistic Regression
• Parameter selection

Page 52: Analytics machine learning in weka

Logistic Regression

• Model Results:
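To read the model results, it helps to recall how a logistic regression turns coefficients into a probability. The sketch below is plain Python with entirely hypothetical coefficients and feature values; it shows the general logistic formula, not output from this model.

```python
import math

def predict_proba(intercept, coefficients, features):
    """Probability of the positive class: 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    z = intercept + sum(c * x for c, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients for two standardized attributes
p = predict_proba(intercept=-0.5, coefficients=[1.2, -0.8], features=[1.0, 0.5])
predicted_class = 1 if p >= 0.5 else 0     # 0.5 is the usual default threshold
print(round(p, 3), predicted_class)
```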

Page 53: Analytics machine learning in weka

Model Analysis
• Flow > Right Click on Model > Visualize threshold curve > ROC Curve

Page 54: Analytics machine learning in weka

Model Analysis- ROC Curve
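An ROC curve plots the true positive rate against the false positive rate as the classification threshold is lowered. A minimal plain-Python sketch follows; the scores and labels are made up, the helper names are assumptions, and tied scores are not treated specially.

```python
def roc_points(scores, labels):
    """(FPR, TPR) pairs obtained by lowering the threshold one instance at a time."""
    pos = sum(labels)
    neg = len(labels) - pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, label in sorted(zip(scores, labels), key=lambda p: -p[0]):
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the curve by the trapezoid rule."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))

scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]   # predicted probability of class 1
labels = [1, 1, 0, 1, 0, 1, 0, 0]                    # actual class
points = roc_points(scores, labels)
area = auc(points)
print(area)
```

An area of 0.5 corresponds to random guessing and 1.0 to a perfect ranking.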

Page 55: Analytics machine learning in weka

Model Analysis - Cost/Benefit Analysis
• Flow > Classify > Right click model > Cost/Benefit Analysis

Page 56: Analytics machine learning in weka

Model Analysis - Cost/Benefit Analysis
• Flow > Classify > Right click model > Cost/Benefit Analysis > Threshold Bar
• Sliding the bar under the Threshold label changes the classification threshold, and with it the accuracy and the position on the threshold curve
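What the threshold slider trades off can be sketched in plain Python: each threshold gives a number of false positives and false negatives, and each kind of error has a cost. The scores, labels, candidate thresholds and cost values below are all hypothetical.

```python
def total_cost(scores, labels, threshold, cost_fp=1.0, cost_fn=1.0):
    """Cost of classifying as positive every instance whose score reaches the threshold."""
    fp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 0)
    fn = sum(1 for s, l in zip(scores, labels) if s < threshold and l == 1)
    return fp * cost_fp + fn * cost_fn

scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
thresholds = [0.25, 0.5, 0.75]

# Equal costs, and a case where a missed positive is five times as costly
equal = {t: total_cost(scores, labels, t) for t in thresholds}
fn_heavy = {t: total_cost(scores, labels, t, cost_fn=5.0) for t in thresholds}
best_equal = min(equal, key=equal.get)
best_fn_heavy = min(fn_heavy, key=fn_heavy.get)
print(equal, best_equal)
print(fn_heavy, best_fn_heavy)
```

When missed positives become more expensive, the cheapest threshold moves lower.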

Page 57: Analytics machine learning in weka

Save prediction output to file
• Flow > Classify > Test Options > More Options > Output Predictions > Text Bar to provide the file name
• Parameters: Choose: to select the file type; attributes: first-last to get all fields; outputFile: file name to save the data to

Page 58: Analytics machine learning in weka

Re-apply model on new data

Page 59: Analytics machine learning in weka

WEKA Pluses:
• Platform independent and portable; the Java library can be invoked from programs written in other languages
• User-friendly GUI with built-in visualization; simpler to use than R; large collection of different data mining algorithms
• Better results for classification and cluster modeling
• Ease of designing solutions.
• Provides 3 ways to use the software: the GUI, a Java API, and a command line interface (CLI)
• Can work with Spark and big data using other packages, in the Experimenter or in batch mode.

Page 60: Analytics machine learning in weka

WEKA Limitations:
• Visualizations can be managed better in R with packages like ggplot
• Not really flexible for data manipulation
• Accepts only a limited set of file formats, like CSV and ARFF
• Limited documentation available on the Explorer.

Page 61: Analytics machine learning in weka

THANK YOU !

Page 62: Analytics machine learning in weka

Decision Tree

Page 63: Analytics machine learning in weka

Decision Tree: Algorithm selection

Page 64: Analytics machine learning in weka

Decision Tree: Setting params for algo

Page 65: Analytics machine learning in weka

Decision Tree: Execution

Page 66: Analytics machine learning in weka

Tree Visualization