DATA MINING


DESCRIPTION

Data mining (DM) manual. Data mining refers to the process of analysing data from different perspectives and summarizing it into useful information. Data mining software is one of a number of tools used for analysing data: it allows users to analyse data from many different dimensions and angles, categorize it, and summarize the relationships identified. Data mining is a set of techniques for finding and describing structural patterns in data.


MSc IT Part – I, Semester 1

Sonali Parab

PRACTICAL NO: 1

Aim: Build the data mining model structure, build the decision tree with proper decision nodes, and infer at least five different types of reports. Implement using RTool.

Solution:

Dataset Used: Iris

Step 1: Display the structure of the iris data.

Fig 1.1: Structure of iris data

Step 2: The random seed is set to a fixed value below to make the results reproducible.

Fig 1.2: Random Seed Set
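The exact commands behind Figs 1.1 and 1.2 are not visible in this transcript; the minimal R sketch below shows the usual approach, assuming a 70/30 training/test split and an arbitrary seed (both illustrative, not taken from the figures).

str(iris)                                  # Step 1: display the structure of the iris data

set.seed(1234)                             # Step 2: fix the seed so results are reproducible
ind <- sample(2, nrow(iris), replace = TRUE, prob = c(0.7, 0.3))
trainData <- iris[ind == 1, ]              # training subset
testData  <- iris[ind == 2, ]              # test subset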


Step 3: Install the party package if it is not installed. Load the party package, build a decision tree, and check the prediction result.

Fig 1.3: Load Party library

Fig 1.4: iris table
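A hedged sketch of Step 3, assuming the tree is built with party::ctree() on the training data created above; the formula and object names are illustrative.

# install.packages("party")                # install the party package if it is not installed
library(party)                             # load the party package (Fig 1.3)
myFormula <- Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width
iris_ctree <- ctree(myFormula, data = trainData)        # build the decision tree
table(predict(iris_ctree), trainData$Species)           # check the prediction result (Fig 1.4)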

Step 4: Print the rules and plot the tree.

Fig 1.5: Rules of data
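A minimal sketch of Step 4, assuming the ctree object from the previous step:

print(iris_ctree)    # print the decision rules (Fig 1.5)
plot(iris_ctree)     # plot the decision tree (Report 1, Fig 1.6)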


A. Report 1

Fig 1.6: Decision Tree


Step 5: Plot the decision tree in simple style.

Fig 1.7: Command to plot decision tree in simple style
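The command in Fig 1.7 is presumably the simple-style variant of plot(); a one-line sketch:

plot(iris_ctree, type = "simple")    # decision tree in simple style (Report 2, Fig 1.8)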

B. Report 2

Fig 1.8: Decision tree (Simple Style)


Step 6: Plot the iris species in a bar plot.

Fig 1.9: Bar plot command
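A sketch of the bar-plot command behind Fig 1.9; the title and axis labels are illustrative.

barplot(table(iris$Species),
        main = "Species distribution", xlab = "Species", ylab = "Count")   # Report 3, Fig 1.10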

C. Report 3

Fig 1.10: Barplot of Species


Step 7: Plot the iris species in a pie chart.

Fig 1.11: Command for pie chart
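A sketch of the pie-chart command behind Fig 1.11; the title is illustrative.

pie(table(iris$Species), main = "Iris species")    # Report 4, Fig 1.12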

D. Report 4

Fig 1.12: Pie Chart


Step 8: Plot a histogram of iris petal length.

Fig 1.13: Command to plot histogram
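A sketch of the histogram command behind Fig 1.13; the title and axis label are illustrative.

hist(iris$Petal.Length, main = "Histogram of Petal Length",
     xlab = "Petal length (cm)")                   # Report 5, Fig 1.14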

E. Report 5

Fig 1.14: Histogram of iris Petal Length


PRACTICAL NO: 2

Aim: Build the data mining model structure and implement the Naïve Bayes algorithm. Implement using WEKA.

Solution:

Dataset Used: Diabetes.arff

Step 1: Pre-processing

Go to Weka → Open file → go to the weka folder → select the diabetes.arff dataset → Open

Fig 2.1 Choosing diabetes.arff dataset


Step 2: Filter the data

Filter → supervised → attribute → Discretize → Apply

Fig 2.2 Selecting the Filter

Fig 2.3 Structure of Filtered Diabetes.arff Dataset


Step 3: Classify the data using the Naïve Bayes algorithm

Fig 2.4 Select Classification Algorithm

Fig 2.5 Running and Displaying Result
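This practical runs Naïve Bayes through the Weka GUI. As a hedged R-side counterpart (not the manual's method), the e1071 package provides naiveBayes(); here mlbench::PimaIndiansDiabetes stands in for diabetes.arff and the check is on the training data rather than 10-fold cross-validation.

library(e1071)
library(mlbench)
data(PimaIndiansDiabetes)                                     # Pima Indians diabetes data
nb <- naiveBayes(diabetes ~ ., data = PimaIndiansDiabetes)    # train the Naive Bayes model
pred <- predict(nb, PimaIndiansDiabetes)
table(pred, PimaIndiansDiabetes$diabetes)                     # confusion matrix on the training set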


=== Run information ===

Scheme:weka.classifiers.bayes.NaiveBayes

Relation: pima_diabetes-weka.filters.supervised.attribute.Discretize-Rfirst-last

Instances: 768

Attributes: 9

preg

plas

pres

skin

insu

mass

pedi

age

class

Test mode:10-fold cross-validation

=== Classifier model (full training set) ===

Naive Bayes Classifier

Class

Attribute tested_negative tested_positive

(0.65) (0.35)

====================================================

preg


'(-inf-6.5]' 427.0 174.0

'(6.5-inf)' 75.0 96.0

[total] 502.0 270.0

plas

'(-inf-99.5]' 182.0 17.0

'(99.5-127.5]' 211.0 79.0

'(127.5-154.5]' 86.0 77.0

'(154.5-inf)' 25.0 99.0

[total] 504.0 272.0

pres

'All' 501.0 269.0

[total] 501.0 269.0

skin

'All' 501.0 269.0

[total] 501.0 269.0

insu

'(-inf-14.5]' 237.0 140.0

'(14.5-121]' 165.0 28.0

'(121-inf)' 101.0 103.0

[total] 503.0 271.0

mass

'(-inf-27.85]' 196.0 28.0


'(27.85-inf)' 306.0 242.0

[total] 502.0 270.0

pedi

'(-inf-0.5275]' 362.0 149.0

'(0.5275-inf)' 140.0 121.0

[total] 502.0 270.0

age

'(-inf-28.5]' 297.0 72.0

'(28.5-inf)' 205.0 198.0

[total] 502.0 270.0

Time taken to build model: 0 seconds


Step 4: Visualize classifier errors

Fig 2.6 Visualization of Classification Errors


PRACTICAL NO: 3

Aim: Implement the clustering algorithm using the Weka tool.

Solution:

Dataset Used: Iris.arff

Step 1: Preprocess

Open file → go to the weka folder → select the iris dataset → Choose Filter → supervised → attribute → Discretize

Fig 3.1: Structure of iris data


Fig 3.2: Filtering the Data

Fig 3.3: Filtered Dataset


Step 2: Cluster

Select the Cluster tab → Choose button → clusterers → select SimpleKMeans → click the "Use training set" radio button → right-click → "Properties" → set numClusters = 3 → click the Start button.

Fig 3.4 Configuring Clustering Algorithm

Fig 3.5 Generating Result
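As a hedged R counterpart to the Weka SimpleKMeans run above (not the manual's method), base R's kmeans() gives a comparable clustering of the four numeric iris attributes; the seed only loosely mirrors the -S 10 option in the Weka scheme.

set.seed(10)
km <- kmeans(iris[, 1:4], centers = 3)   # k-means with 3 clusters, as in the Weka run
table(km$cluster, iris$Species)          # compare clusters against the known species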


=== Run information ===

Scheme:weka.clusterers.SimpleKMeans -N 3 -A "weka.core.EuclideanDistance -R first-last" -I 500 -S 10

Relation: iris-weka.filters.supervised.attribute.Discretize-Rfirst-last

Instances: 150

Attributes: 5

sepallength

sepalwidth

petallength

petalwidth

class

Test mode:evaluate on training data

=== Model and evaluation on training set ===

kMeans

======

Number of iterations: 5

Within cluster sum of squared errors: 109.0

Missing values globally replaced with mean/mode

Cluster centroids:

Cluster#


Attribute Full Data 0 1 2

(150) (50) (50) (50)

=====================================================

sepallength '(-inf-5.55]' '(-inf-5.55]' '(5.55-6.15]' '(6.15-inf)'

sepalwidth '(-inf-2.95]' '(3.35-inf)' '(-inf-2.95]' '(2.95-3.35]'

petallength '(4.75-inf)' '(-inf-2.45]' '(2.45-4.75]' '(4.75-inf)'

petalwidth '(0.8-1.75]' '(-inf-0.8]' '(0.8-1.75]' '(1.75-inf)'

class Iris-setosa Iris-setosa Iris-versicolor Iris-virginica

Time taken to build model (full training data) : 0 seconds

=== Model and evaluation on training set ===

Clustered Instances

0 50 ( 33%)

1 50 ( 33%)

2 50 ( 33%)


Step 3: Visualizing the result

Right-click on the result → Visualize cluster assignments

Fig 3.6 Selecting Visualization

Fig 3.7 Displaying Visualization Result


PRACTICAL NO: 4

Aim: Build the basic time series model structure and create predictions on the BodyFat dataset using RTool.

Solution:

Dataset Used: BodyFat

Step 1: Load the mboost package.

Fig 4.1: Loading of the mboost package


Step 2: Show the data stored in the BodyFat dataset.

Fig 4.2: The data stored in the BodyFat dataset

Step 3: Display the summary of the BodyFat dataset.

Fig 4.3: Summary of the BodyFat dataset


Step 4: Apply the prediction method and plot a graph on the BodyFat dataset.

Fig 4.4: The prediction method and plot formula applied to the BodyFat dataset

Step 5: Prediction graph for the BodyFat dataset.

Fig 4.5: The prediction graph for the BodyFat dataset
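The exact modelling call is not visible in Figs 4.1-4.5, so the sketch below is an assumption: a boosted linear model via mboost::glmboost() with DEXfat as the response, followed by an observed-versus-predicted plot. Depending on the installed version, the bodyfat data ships with TH.data rather than mboost.

library(mboost)                                   # Step 1 (Fig 4.1)
data("bodyfat", package = "TH.data")              # older installs: data("bodyfat", package = "mboost")
bodyfat                                           # Step 2 (Fig 4.2): data stored in the dataset
summary(bodyfat)                                  # Step 3 (Fig 4.3): summary of the dataset
model <- glmboost(DEXfat ~ ., data = bodyfat)     # Step 4: fit the boosted regression model
pred  <- predict(model)                           # predictions on the training data
plot(bodyfat$DEXfat, pred,
     xlab = "Observed DEXfat", ylab = "Predicted DEXfat")   # Step 5 (Fig 4.5)
abline(0, 1)                                      # reference line: perfect prediction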


PRACTICAL NO: 5

Aim: Build the data mining model and implement k-nearest neighbor using the Weka tool.

Solution:

Dataset Used: ContactLenses.arff

Step 1: Preprocess

Open file → go to the weka folder → select the contact-lenses dataset → Choose Filter → supervised → attribute → Discretize

Fig 5.1: Structure of contact lens dataset


Fig 5.2: Filtering the Data

Fig 5.3: Filtered Dataset


Step 2: Classify

Select the Classify tab → Choose button → expand the lazy folder → select IBk → click the "Use training set" radio button → click the Start button.

Fig 5.4 Choosing K-nearest neighbour algorithm

Fig 5.5 Generating Result
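Weka's IBk handles the nominal contact-lenses attributes directly. Base R's class::knn() needs numeric features, so the hedged sketch below only illustrates 1-nearest-neighbour classification on the numeric iris data; it is not the manual's Weka run.

library(class)
set.seed(1)
idx  <- sample(nrow(iris), 0.7 * nrow(iris))      # 70% training indices
pred <- knn(train = iris[idx, 1:4],
            test  = iris[-idx, 1:4],
            cl    = iris$Species[idx], k = 1)     # 1 nearest neighbour, as in IB1
table(pred, iris$Species[-idx])                   # confusion matrix on the held-out rows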


=== Run information ===

Scheme:weka.classifiers.lazy.IBk -K 1 -W 0 -A "weka.core.neighboursearch.LinearNNSearch -A \"weka.core.EuclideanDistance -R first-last\""

Relation: contact-lenses-weka.filters.supervised.attribute.Discretize-Rfirst-last

Instances: 24

Attributes: 5

age

spectacle-prescrip

astigmatism

tear-prod-rate

contact-lenses

Test mode:evaluate on training data

=== Classifier model (full training set) ===

IB1 instance-based classifier

using 1 nearest neighbour(s) for classification

Time taken to build model: 0 seconds

=== Evaluation on training set ===

=== Summary ===

Correctly Classified Instances 24 100 %


Incorrectly Classified Instances 0 0 %

Kappa statistic 1

Mean absolute error 0.0494

Root mean squared error 0.0524

Relative absolute error 13.4078 %

Root relative squared error 12.3482 %

Total Number of Instances 24

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure ROC Area Class

1 0 1 1 1 1 soft

1 0 1 1 1 1 hard

1 0 1 1 1 1 none

Weighted Avg. 1 0 1 1 1 1

=== Confusion Matrix ===

a b c <-- classified as

5 0 0 | a = soft

0 4 0 | b = hard

0 0 15 | c = none


PRACTICAL NO: 6

Aim: Build the data mining model and implement the Apriori association rule algorithm using the Weka tool.

Solution:

Dataset Used: Supermarket.arff

Step 1: Preprocess

Open file → go to the Weka folder → select the Supermarket dataset → Choose Filter → AllFilter

Fig 6.1: Structure of Supermarket dataset


Fig 6.2: Filtering the Data

Fig 6.3: Filtered Dataset


Step 2: Associate

Select the Associate tab → choose the Apriori algorithm → Properties → configure the algorithm according to requirements → click Start.

Fig 6.4 Choosing Apriori Algorithm

Fig 6.5 Configuring Algorithm
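As a hedged R counterpart to the Weka Apriori run (not the manual's method), the arules package can mine the Groceries transactions that ship with it; the thresholds below are illustrative and only loosely mirror the Weka settings.

library(arules)
data(Groceries)                                        # stand-in for supermarket.arff
rules <- apriori(Groceries,
                 parameter = list(supp = 0.01, conf = 0.5))
inspect(head(sort(rules, by = "confidence"), 12))      # best rules by confidence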


Fig 6.6 Displaying Association Results

=== Run information ===

Scheme: weka.associations.Apriori -N 12 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1

Relation: supermarket-weka.filters.AllFilter-weka.filters.AllFilter-weka.filters.AllFilter-weka.filters.AllFilter-weka.filters.MultiFilter-Fweka.filters.AllFilter-weka.filters.AllFilter-weka.filters.AllFilter

Instances: 4627

Attributes: 217

[list of attributes omitted]

=== Associator model (full training set) ===


Apriori

=======

Minimum support: 0.15 (694 instances)

Minimum metric <confidence>: 0.9

Number of cycles performed: 17

Generated sets of large itemsets:

Size of set of large itemsets L(1): 44

Size of set of large itemsets L(2): 380

Size of set of large itemsets L(3): 910

Size of set of large itemsets L(4): 633

Size of set of large itemsets L(5): 105

Size of set of large itemsets L(6): 1

Best rules found:

1. biscuits=t frozen foods=t fruit=t total=high 788 ==> bread and cake=t 723 conf:(0.92)

2. baking needs=t biscuits=t fruit=t total=high 760 ==> bread and cake=t 696 conf:(0.92)

3. baking needs=t frozen foods=t fruit=t total=high 770 ==> bread and cake=t 705 conf:(0.92)


4. biscuits=t fruit=t vegetables=t total=high 815 ==> bread and cake=t 746 conf:(0.92)

5. party snack foods=t fruit=t total=high 854 ==> bread and cake=t 779 conf:(0.91)

6. biscuits=t frozen foods=t vegetables=t total=high 797 ==> bread and cake=t 725 conf:(0.91)

7. baking needs=t biscuits=t vegetables=t total=high 772 ==> bread and cake=t 701 conf:(0.91)

8. biscuits=t fruit=t total=high 954 ==> bread and cake=t 866 conf:(0.91)

9. frozen foods=t fruit=t vegetables=t total=high 834 ==> bread and cake=t 757 conf:(0.91)

10. frozen foods=t fruit=t total=high 969 ==> bread and cake=t 877 conf:(0.91)

11. baking needs=t fruit=t vegetables=t total=high 831 ==> bread and cake=t 752 conf:(0.9)

12. biscuits=t milk-cream=t total=high 907 ==> bread and cake=t 820 conf:(0.9)


PRACTICAL NO: 7

Aim: Build the data mining model and implement association rule mining with the Apriori algorithm using RTool.

Solution:

Dataset Used: Titanic

Step 1: Preprocess

Loading the Data in Data Frame

Transforming the Data into Suitable Format

Fig 7.1: Structure of Titanic dataset


Fig 7.2 Summary of Titanic Dataset
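A hedged sketch of Step 1, assuming the built-in Titanic contingency table is expanded into one row per passenger (the usual preparation for association mining); object names are illustrative.

df <- as.data.frame(Titanic)                        # load the data into a data frame
titanic.raw <- df[rep(1:nrow(df), df$Freq), 1:4]    # repeat each row Freq times, drop Freq
str(titanic.raw)                                    # structure of the Titanic dataset (Fig 7.1)
summary(titanic.raw)                                # summary of the Titanic dataset (Fig 7.2)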

Step 2: Associate

Loading the 'arules' library, which contains functions for association mining

Function used to apply the Apriori algorithm with its default configuration

Fig 7.3 Choosing Apriori Algorithm


Fig 7.4 Inspecting the Results of Apriori Algorithm

Fig 7.5 Applying Settings to Display Rules with RHS containing survived only
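A hedged sketch of Step 2; the support and confidence thresholds are illustrative, while the restriction of the right-hand side to the Survived item follows Fig 7.5.

library(arules)
rules.all <- apriori(titanic.raw)                   # Apriori with the default configuration (Fig 7.3)
inspect(rules.all)                                  # inspect the resulting rules (Fig 7.4)

rules <- apriori(titanic.raw,                       # rules with Survived only on the RHS (Fig 7.5)
                 parameter  = list(minlen = 2, supp = 0.005, conf = 0.8),
                 appearance = list(rhs = c("Survived=No", "Survived=Yes"),
                                   default = "lhs"))
rules.sorted <- sort(rules, by = "lift")
inspect(rules.sorted)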


Step 3: Finding and Removing Redundant Rules

Code to Find Redundant Rules

Code to Remove Redundant Rules

Fig 7.6 Finding & Removing Redundant Rules
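A hedged sketch of Step 3: arules provides is.redundant() for exactly this purpose (older write-ups do the same with an is.subset() matrix); the object names carry on from the sketch above.

redundant <- is.redundant(rules.sorted)    # TRUE for rules implied by a more general rule
which(redundant)                           # code to find the redundant rules
rules.pruned <- rules.sorted[!redundant]   # code to remove the redundant rules
inspect(rules.pruned)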


Step 4: Visualizing

Loading the arulesViz library, which contains functions for visualizing association results

Function to plot the results using a scatter plot

X axis: Support

Y axis: Confidence

Fig 7.7 Scatter Plot


Function to plot the association results as a graph plot

Fig 7.8 Graph Plot Showing How the Data Items are Associated
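A minimal sketch of the two visualisations in Step 4, assuming the pruned rule set from Step 3:

library(arulesViz)
plot(rules.pruned)                      # scatter plot: support (x) vs. confidence (y), Fig 7.7
plot(rules.pruned, method = "graph")    # graph plot of how the items are associated, Fig 7.8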


PRACTICAL NO: 8

Aim: Consider suitable data for text mining and implement the text mining technique using RTool.

Solution:

Dataset Used: Plain Text File (www.txt)

Step 1: Loading the Text File

Loading the essential libraries for text mining: tm, SnowballC and twitteR

Loading the data from the text file into RTool using readLines()

Fig 8.1: Using the tail() and head() functions to display the start and end of the paragraphs
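A minimal sketch of Step 1; the file name follows the manual's www.txt and is assumed to sit in the working directory.

library(tm)
library(SnowballC)
text <- readLines("www.txt")    # load the data from the text file
head(text)                      # first few lines of the document (Fig 8.1)
tail(text)                      # last few lines of the document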


Step 2: Transforming

Loading the tm library and transforming the document into a corpus, corpusdoc

Fig 8.2 Inspecting Corpusdoc

Function to Remove Punctuation

Fig 8.3 Removing Punctuation


Function to Strip White Spaces

Fig 8.4 Stripping White Spaces

Function to Remove Stop Words from Document

Fig 8.5 Removing Stop Words From Document


Function to Stem the Document

Fig 8.6 Stemming the Document

Function to Convert corpusdoc to TermDocumentMatrix

Fig 8.7 Inspecting TermDocumentMatrix
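A hedged sketch of the Step 2 transformations with the tm package; the object names corpusdoc and tdm are illustrative.

corpusdoc <- Corpus(VectorSource(text))                            # build the corpus
inspect(corpusdoc[1:2])                                            # Fig 8.2: inspect the corpus

corpusdoc <- tm_map(corpusdoc, removePunctuation)                  # Fig 8.3: remove punctuation
corpusdoc <- tm_map(corpusdoc, stripWhitespace)                    # Fig 8.4: strip white space
corpusdoc <- tm_map(corpusdoc, removeWords, stopwords("english"))  # Fig 8.5: remove stop words
corpusdoc <- tm_map(corpusdoc, stemDocument)                       # Fig 8.6: stem the document

tdm <- TermDocumentMatrix(corpusdoc)                               # Fig 8.7: term-document matrix
inspect(tdm)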


Step 3: Finding Frequent Terms in the Document

Fig 8.8 Find Frequent Terms From the Document

Step 4: Finding Associations among Terms

Function to find associations among different terms in the document

Fig 8.9 Result of How Strongly Terms Are Associated with the Term “information”
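A minimal sketch of Steps 3 and 4; the frequency threshold and correlation limit are illustrative, while the query term "information" follows Fig 8.9.

findFreqTerms(tdm, lowfreq = 5)                   # terms occurring at least 5 times (Fig 8.8)
findAssocs(tdm, "information", corlimit = 0.5)    # terms associated with "information" (Fig 8.9)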
