data mining
DESCRIPTION
Data mining (DM) manual. Data mining refers to the process of analysing data from different perspectives and summarizing it into useful information. Data mining software is one of a number of tools used for analysing data: it allows users to examine data from many different dimensions and angles, categorize it, and summarize the relationships identified. Data mining is a set of techniques for finding and describing structural patterns in data.
TRANSCRIPT
MSc IT Part – I, Semester-1 Page No:- ________DATA MINING Date:- ____________
PRACTICAL NO: 1
Aim: Build the data mining model structure, build a decision tree with proper decision nodes, and infer at least five different types of reports. Implement using the R tool.
Solution:
Dataset Used: Iris
Step 1: Display the structure of the iris data.
Fig 1.1: Structure of iris data
Step 2: Set the random seed to a fixed value so the results below are reproducible.
Fig 1.2: Random seed set
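The commands behind Figs 1.1 and 1.2 can be sketched as follows (a minimal sketch using R's built-in iris dataset; the 70/30 split ratio and seed value are assumptions, since the exact commands are only visible in the figures):

```r
# Step 1: display the structure of the iris data (Fig 1.1)
str(iris)

# Step 2: fix the random seed so the sampling below is reproducible (Fig 1.2)
set.seed(1234)
ind <- sample(2, nrow(iris), replace = TRUE, prob = c(0.7, 0.3))
trainData <- iris[ind == 1, ]   # roughly 70% of the rows, for training
testData  <- iris[ind == 2, ]   # roughly 30% of the rows, for testing
```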
Sonali. Parab.
Step 3: Install the party package if it is not installed. Load the party package, build a decision tree, and check the prediction result.
Fig 1.3: Load Party library
Fig 1.4: iris table
Step 4: Print the rules and plot the tree.
Fig 1.5: Rules of data
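A sketch of the commands behind Steps 3-5 (assuming the party package is available; the formula and 70/30 split follow the usual iris ctree example rather than the figures, which are not reproduced here):

```r
# Step 3: install party if missing, then load it
if (!requireNamespace("party", quietly = TRUE)) install.packages("party")
library(party)

set.seed(1234)
ind <- sample(2, nrow(iris), replace = TRUE, prob = c(0.7, 0.3))
trainData <- iris[ind == 1, ]

# build the decision tree and check the prediction result (Fig 1.4)
myFormula <- Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width
iris_ctree <- ctree(myFormula, data = trainData)
table(predict(iris_ctree), trainData$Species)

# Step 4: print the rules and plot the tree
print(iris_ctree)                    # rules (Fig 1.5)
plot(iris_ctree)                     # full tree (Report 1, Fig 1.6)
plot(iris_ctree, type = "simple")    # simple style (Report 2, Fig 1.8)
```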
A. Report 1
Fig 1.6: Decision Tree
Step 5: Plot the decision tree in simple style.
Fig 1.7: Command to plot decision tree in simple style
B. Report 2
Fig 1.8: Decision tree (Simple Style)
Step 6: Plot the iris species in a bar plot.
Fig 1.9: bar plot command
C. Report 3
Fig 1.10: Barplot of Species
Step 7: Plot the iris species in a pie chart.
Fig 1.11: Command for pie chart
D. Report 4
Fig 1.12: Pie Chart
Step 8: Plot a histogram of iris petal length.
Fig 1.13: Command to plot histogram
E. Report 5
Fig 1.14: Histogram of iris Petal Length
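The plotting commands behind Figs 1.9-1.14 can be sketched in base R (the colours and titles are assumptions; the figures themselves are not reproduced here):

```r
counts <- table(iris$Species)   # 50 observations per species

barplot(counts, main = "Species", col = rainbow(3))    # Report 3, Fig 1.10
pie(counts, main = "Species")                          # Report 4, Fig 1.12
hist(iris$Petal.Length,
     main = "Histogram of iris Petal Length",
     xlab = "Petal.Length")                            # Report 5, Fig 1.14
```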
PRACTICAL NO: 2
Aim: Build the data mining model structure and implement the Naïve Bayes algorithm. Implement using WEKA.
Solution:
Dataset Used: Diabetes.arff
Step 1: Pre-processing
Go to Weka → Open file → go to the weka data folder → select the diabetes.arff dataset → Open
Fig 2.1 Choosing diabetes.arff dataset
Step 2: Filter the data
Filters → supervised → attribute → Discretize → Apply
Fig 2.2 Selecting the Filter
Fig 2.3 Structure of Filtered Diabetes.arff Dataset
Step 3: Classify the data using the Naïve Bayes algorithm
Fig 2.4 Select Classification Algorithm
Fig 2.5 Running and Displaying Result
=== Run information ===
Scheme:weka.classifiers.bayes.NaiveBayes
Relation: pima_diabetes-weka.filters.supervised.attribute.Discretize-Rfirst-last
Instances: 768
Attributes: 9
preg
plas
pres
skin
insu
mass
pedi
age
class
Test mode:10-fold cross-validation
=== Classifier model (full training set) ===
Naive Bayes Classifier
Class
Attribute tested_negative tested_positive
(0.65) (0.35)
====================================================
preg
'(-inf-6.5]' 427.0 174.0
'(6.5-inf)' 75.0 96.0
[total] 502.0 270.0
plas
'(-inf-99.5]' 182.0 17.0
'(99.5-127.5]' 211.0 79.0
'(127.5-154.5]' 86.0 77.0
'(154.5-inf)' 25.0 99.0
[total] 504.0 272.0
pres
'All' 501.0 269.0
[total] 501.0 269.0
skin
'All' 501.0 269.0
[total] 501.0 269.0
insu
'(-inf-14.5]' 237.0 140.0
'(14.5-121]' 165.0 28.0
'(121-inf)' 101.0 103.0
[total] 503.0 271.0
mass
'(-inf-27.85]' 196.0 28.0
'(27.85-inf)' 306.0 242.0
[total] 502.0 270.0
pedi
'(-inf-0.5275]' 362.0 149.0
'(0.5275-inf)' 140.0 121.0
[total] 502.0 270.0
age
'(-inf-28.5]' 297.0 72.0
'(28.5-inf)' 205.0 198.0
[total] 502.0 270.0
Time taken to build model: 0 seconds
Step 4: Visualize classifier errors
Fig 2.6 Visualization of Classification Errors
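WEKA is driven from the GUI here, but for comparison the same kind of classifier can be sketched in R (assuming the e1071 package; iris stands in for diabetes.arff, which ships with WEKA rather than with R):

```r
# fit a naive Bayes classifier, analogous to weka.classifiers.bayes.NaiveBayes
if (!requireNamespace("e1071", quietly = TRUE)) install.packages("e1071")
library(e1071)

model <- naiveBayes(Species ~ ., data = iris)   # class-conditional distributions
pred  <- predict(model, iris)
table(pred, iris$Species)                       # training-set confusion matrix
```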
PRACTICAL NO: 3
Aim: Implement a clustering algorithm using the Weka tool.
Solution:
Dataset Used: Iris.arff
Step 1: Preprocess
Open file → go to the weka data folder → select the iris dataset → Choose → Filters → supervised → attribute → Discretize
Fig 3.1: Structure of iris data
Fig 3.2: Filtering the Data
Fig 3.3: Filtered Dataset
Step 2: Cluster
Select the Cluster tab → Choose button → clusterers → select SimpleKMeans → select the 'Use training set' radio button → right click → 'Properties' → set numClusters = 3 → click the Start button.
Fig 3.4 Configuring Clustering Algorithm
Fig 3.5 Generating Result
=== Run information ===
Scheme:weka.clusterers.SimpleKMeans -N 3 -A "weka.core.EuclideanDistance -R first-last" -I 500 -S 10
Relation: iris-weka.filters.supervised.attribute.Discretize-Rfirst-last
Instances: 150
Attributes: 5
sepallength
sepalwidth
petallength
petalwidth
class
Test mode:evaluate on training data
=== Model and evaluation on training set ===
kMeans
======
Number of iterations: 5
Within cluster sum of squared errors: 109.0
Missing values globally replaced with mean/mode
Cluster centroids:
Cluster#
Attribute Full Data 0 1 2
(150) (50) (50) (50)
=====================================================
sepallength '(-inf-5.55]' '(-inf-5.55]' '(5.55-6.15]' '(6.15-inf)'
sepalwidth '(-inf-2.95]' '(3.35-inf)' '(-inf-2.95]' '(2.95-3.35]'
petallength '(4.75-inf)' '(-inf-2.45]' '(2.45-4.75]' '(4.75-inf)'
petalwidth '(0.8-1.75]' '(-inf-0.8]' '(0.8-1.75]' '(1.75-inf)'
class Iris-setosa Iris-setosa Iris-versicolor Iris-virginica
Time taken to build model (full training data) : 0 seconds
=== Model and evaluation on training set ===
Clustered Instances
0 50 ( 33%)
1 50 ( 33%)
2 50 ( 33%)
Step 3: Visualize the result
Right click on the result → Visualize cluster assignments
Fig 3.6 Selecting Visualization
Fig 3.7 Displaying Visualization Result
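For comparison, the same 3-cluster k-means run can be sketched in base R (kmeans is built in; the seed value here is an arbitrary assumption):

```r
# cluster the four iris measurements into 3 groups, like SimpleKMeans -N 3
set.seed(10)
km <- kmeans(iris[, 1:4], centers = 3)
table(km$cluster, iris$Species)   # compare clusters against the true species
```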
PRACTICAL NO: 4
Aim: Build the basic time series model structure and create predictions for the BodyFat dataset using the R tool.
Solution:
Dataset Used: BodyFat
Step 1: Load the mboost package.
Fig 4.1: Loading the mboost package
Step 2: Show the data stored in the BodyFat dataset.
Fig 4.2: Data stored in the BodyFat dataset
Step 3: Display the summary of the BodyFat dataset.
Fig 4.3: Summary of the BodyFat dataset
Step 4: Apply a prediction method and plot a graph on the BodyFat dataset.
Fig 4.4: Prediction method and plot formula applied to the BodyFat dataset
Step 5: Prediction graph for the BodyFat dataset.
Fig 4.5: Prediction graph for the BodyFat dataset
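The R commands behind Figs 4.1-4.5 can be sketched roughly as follows (the bodyfat data shipped with older mboost versions and now lives in the TH.data package; the glmboost formula is an assumption, since the exact call is only visible in the figures):

```r
# Step 1: load the mboost package (Fig 4.1)
if (!requireNamespace("mboost", quietly = TRUE)) install.packages("mboost")
library(mboost)

# Steps 2-3: show the data and its summary (Figs 4.2-4.3)
data("bodyfat", package = "TH.data")
head(bodyfat)
summary(bodyfat)

# Steps 4-5: fit a boosted model, predict, and plot (Figs 4.4-4.5)
fit  <- glmboost(DEXfat ~ ., data = bodyfat)
pred <- predict(fit)
plot(bodyfat$DEXfat, pred,
     xlab = "Observed DEXfat", ylab = "Predicted DEXfat")
abline(0, 1)   # points near this line indicate good predictions
```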
PRACTICAL NO: 5
Aim: Build the data mining model and implement k-nearest neighbour using the Weka tool.
Solution:
Dataset Used: ContactLenses.arff
Step 1: Preprocess
Open file → go to the weka data folder → select the contact-lenses dataset → Choose → Filters → supervised → attribute → Discretize
Fig 5.1: Structure of contact lens dataset
Fig 5.2: Filtering the Data
Fig 5.3:Filtered Dataset
Step 2: Classify
Select the Classify tab → Choose button → expand the lazy folder → select IBk → select the 'Use training set' radio button → click the Start button.
Fig 5.4 Choosing K-nearest neighbour algorithm
Fig 5.5 Generating Result
=== Run information ===
Scheme:weka.classifiers.lazy.IBk -K 1 -W 0 -A "weka.core.neighboursearch.LinearNNSearch -A \"weka.core.EuclideanDistance -R first-last\""
Relation: contact-lenses-weka.filters.supervised.attribute.Discretize-Rfirst-last
Instances: 24
Attributes: 5
age
spectacle-prescrip
astigmatism
tear-prod-rate
contact-lenses
Test mode:evaluate on training data
=== Classifier model (full training set) ===
IB1 instance-based classifier
using 1 nearest neighbour(s) for classification
Time taken to build model: 0 seconds
=== Evaluation on training set ===
=== Summary ===
Correctly Classified Instances 24 100 %
Incorrectly Classified Instances 0 0 %
Kappa statistic 1
Mean absolute error 0.0494
Root mean squared error 0.0524
Relative absolute error 13.4078 %
Root relative squared error 12.3482 %
Total Number of Instances 24
=== Detailed Accuracy By Class ===
TP Rate FP Rate Precision Recall F-Measure ROC Area Class
1 0 1 1 1 1 soft
1 0 1 1 1 1 hard
1 0 1 1 1 1 none
Weighted Avg. 1 0 1 1 1 1
=== Confusion Matrix ===
a b c <-- classified as
5 0 0 | a = soft
0 4 0 | b = hard
0 0 15 | c = none
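For comparison, nearest-neighbour classification can be sketched in R using the class package (a recommended package bundled with R); iris stands in for the WEKA-specific contact-lenses data, and the split size is an assumption:

```r
library(class)

set.seed(1)
idx  <- sample(nrow(iris), 100)                 # 100 training rows
pred <- knn(train = iris[idx, 1:4],
            test  = iris[-idx, 1:4],
            cl    = iris$Species[idx],
            k     = 1)                          # 1 nearest neighbour, like IBk -K 1
mean(pred == iris$Species[-idx])                # accuracy on the held-out rows
```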
PRACTICAL NO: 6
Aim: Build the data mining model and implement the Apriori association rule algorithm using the Weka tool.
Solution:
Dataset Used: Supermarket.arff
Step 1: Preprocess
Open file → go to the weka data folder → select the Supermarket dataset → Choose → Filters → AllFilter
Fig 6.1: Structure of Supermarket dataset
Fig 6.2: Filtering the Data
Fig 6.3: Filtered Dataset
Step 2: Associate
Select the Associate tab → choose the Apriori algorithm → properties → configure the algorithm according to requirements → click 'Start'.
Fig 6.4 Choosing Apriori Algorithm
Fig 6.5 Configuring Algorithm
Fig 6.6 Displaying Association Results
=== Run information ===
Scheme: weka.associations.Apriori -N 12 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1
Relation: supermarket-weka.filters.AllFilter-weka.filters.AllFilter-weka.filters.AllFilter-weka.filters.AllFilter-weka.filters.MultiFilter-Fweka.filters.AllFilter-weka.filters.AllFilter-weka.filters.AllFilter
Instances: 4627
Attributes: 217
[list of attributes omitted]
=== Associator model (full training set) ===
Apriori
=======
Minimum support: 0.15 (694 instances)
Minimum metric <confidence>: 0.9
Number of cycles performed: 17
Generated sets of large itemsets:
Size of set of large itemsets L(1): 44
Size of set of large itemsets L(2): 380
Size of set of large itemsets L(3): 910
Size of set of large itemsets L(4): 633
Size of set of large itemsets L(5): 105
Size of set of large itemsets L(6): 1
Best rules found:
1. biscuits=t frozen foods=t fruit=t total=high 788 ==> bread and cake=t 723 conf:(0.92)
2. baking needs=t biscuits=t fruit=t total=high 760 ==> bread and cake=t 696 conf:(0.92)
3. baking needs=t frozen foods=t fruit=t total=high 770 ==> bread and cake=t 705 conf:(0.92)
4. biscuits=t fruit=t vegetables=t total=high 815 ==> bread and cake=t 746 conf:(0.92)
5. party snack foods=t fruit=t total=high 854 ==> bread and cake=t 779 conf:(0.91)
6. biscuits=t frozen foods=t vegetables=t total=high 797 ==> bread and cake=t 725 conf:(0.91)
7. baking needs=t biscuits=t vegetables=t total=high 772 ==> bread and cake=t 701 conf:(0.91)
8. biscuits=t fruit=t total=high 954 ==> bread and cake=t 866 conf:(0.91)
9. frozen foods=t fruit=t vegetables=t total=high 834 ==> bread and cake=t 757 conf:(0.91)
10. frozen foods=t fruit=t total=high 969 ==> bread and cake=t 877 conf:(0.91)
11. baking needs=t fruit=t vegetables=t total=high 831 ==> bread and cake=t 752 conf:(0.9)
12. biscuits=t milk-cream=t total=high 907 ==> bread and cake=t 820 conf:(0.9)
PRACTICAL NO: 7
Aim: Build the data mining model and implement the Apriori association rule algorithm using the R tool.
Solution:
Dataset Used: Titanic
Step 1: Preprocess
Load the data into a data frame.
Transform the data into a suitable format.
Fig 7.1: Structure of Titanic dataset
Fig 7.2 Summary of Titanic Dataset
Step 2: Associate
Load the 'arules' library, which contains functions for association mining.
Apply the Apriori algorithm with its default configuration.
Fig 7.3 Choosing Apriori Algorithm
Fig 7.4 Inspecting the Results of Apriori Algorithm
Fig 7.5 Applying settings to display only rules whose RHS contains Survived
Step 3: Find and remove redundant rules
Code to find redundant rules
Code to remove redundant rules
Fig 7.6 Finding & Removing Redundant Rules
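The R commands behind Steps 1-3 can be sketched as follows (assuming the arules package; the support and confidence thresholds follow the widely used Titanic example and are assumptions here, since the exact calls are only visible in the figures):

```r
if (!requireNamespace("arules", quietly = TRUE)) install.packages("arules")
library(arules)

# Step 1: expand the 4-way Titanic contingency table into one row per passenger
tab <- as.data.frame(Titanic)
titanic.raw <- tab[rep(seq_len(nrow(tab)), tab$Freq), 1:4]

# Step 2: mine rules whose RHS mentions survival only (Figs 7.3-7.5)
rules <- apriori(titanic.raw,
                 parameter  = list(minlen = 2, supp = 0.005, conf = 0.8),
                 appearance = list(rhs = c("Survived=No", "Survived=Yes"),
                                   default = "lhs"))
inspect(head(sort(rules, by = "lift")))

# Step 3: find and remove redundant rules (Fig 7.6)
rules.sorted <- sort(rules, by = "lift")
rules.pruned <- rules.sorted[!is.redundant(rules.sorted)]
```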
Step 4: Visualize
Load the 'arulesViz' library, which contains functions for visualizing association results.
Plot the results as a scatter plot:
X axis: Support
Y axis: Confidence
Fig 7.7 Scatter Plot
Plot the association results as a graph plot.
Fig 7.8 Graph plot showing how data items are associated
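The Step 4 visualizations can be sketched as follows (assuming arules and arulesViz; the rules are re-mined here so the sketch stands alone):

```r
if (!requireNamespace("arulesViz", quietly = TRUE)) install.packages("arulesViz")
library(arules)
library(arulesViz)

tab <- as.data.frame(Titanic)
titanic.raw <- tab[rep(seq_len(nrow(tab)), tab$Freq), 1:4]
rules <- apriori(titanic.raw,
                 parameter = list(minlen = 2, supp = 0.005, conf = 0.8))

plot(rules)                    # scatter: x = support, y = confidence (Fig 7.7)
plot(rules, method = "graph")  # graph plot of item associations (Fig 7.8)
```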
PRACTICAL NO: 8
Aim: Consider suitable data for text mining and implement text mining techniques using the R tool.
Solution:
Dataset Used: Plain text file (www.txt)
Step 1: Load the text file
Load the essential libraries for text mining: tm, SnowballC and twitteR.
Load the data from the text file into R using readLines().
Fig 8.1: Using the tail() and head() functions to display the end and start of the paragraphs
Step 2: Transform
Load the tm library and transform the document into a corpus (corpusdoc).
Fig 8.2 Inspecting Corpusdoc
Function to remove punctuation
Fig 8.3 Removing Punctuations
Function to Strip White Spaces
Fig 8.4 Stripping White Spaces
Function to Remove Stop Words from Document
Fig 8.5 Removing Stop Words From Document
Function to Stem the Document
Fig 8.6 Stemming the Document
Function to Convert corpusdoc to TermDocumentMatrix
Fig 8.7 Inspecting TermDocumentMatrix
Step 3: Find frequent terms in the document
Fig 8.8 Finding frequent terms in the document
Step 4: Find associations among terms
Function to find associations among different terms in the document
Fig 8.9 Result of how strongly terms are associated with the term “information”
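The whole Practical 8 pipeline can be sketched as follows (assuming the tm and SnowballC packages; the sketch falls back to two sample lines when www.txt is absent so it runs standalone, and the frequency/correlation thresholds are assumptions):

```r
if (!requireNamespace("tm", quietly = TRUE)) install.packages("tm")
if (!requireNamespace("SnowballC", quietly = TRUE)) install.packages("SnowballC")
library(tm)
library(SnowballC)

# Step 1: load the text file (Fig 8.1); sample lines if www.txt is missing
txt <- if (file.exists("www.txt")) readLines("www.txt") else
  c("Data mining finds useful information in data.",
    "Text mining extracts information from documents.")
head(txt); tail(txt)

# Step 2: build the corpus and transform it (Figs 8.2-8.7)
corpusdoc <- Corpus(VectorSource(txt))
corpusdoc <- tm_map(corpusdoc, removePunctuation)                  # Fig 8.3
corpusdoc <- tm_map(corpusdoc, stripWhitespace)                    # Fig 8.4
corpusdoc <- tm_map(corpusdoc, removeWords, stopwords("english"))  # Fig 8.5
corpusdoc <- tm_map(corpusdoc, stemDocument)                       # Fig 8.6
tdm <- TermDocumentMatrix(corpusdoc)                               # Fig 8.7

# Steps 3-4: frequent terms and associations
findFreqTerms(tdm, lowfreq = 2)
findAssocs(tdm, "inform", 0.3)   # after stemming, "information" becomes "inform"
```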