data mining using neural networks



Neural Network Model For Data Mining

Lubna Shaikh

Guided By- Prof Jitali Patel

January 2013

Computer Science Department Seminar




Outline

1 Major Steps

2 Network Construction and Training
Is Neural Network Approach Appropriate?
Select Appropriate Paradigm
Select Input Data and Facts
Data Preparation
Training Strategies

3 Network Pruning

4 Rule Extraction
Rule Extraction Algorithm

5 Traditional Approaches vs. Neural Networks

6 Conclusion

7 References





The following are the major steps required to develop a neural network model for data mining:

Network Construction and Training

Network Pruning

Rule Extraction

Knowledge Representation








Steps Involved

Is Neural Network approach appropriate?

Select appropriate Paradigm

Select input data and facts

Prepare data

Train and test network

Use the network for data mining







Is Neural Network Approach Appropriate?

Inadequate knowledge base

Volatile knowledge bases

Data-intensive system

Standard technology is inadequate

Qualitative or complex quantitative reasoning is required

Data is intrinsically noisy and error-prone

Project development time is short and training time for the neural network is reasonable.







Select Appropriate Paradigm

Decide network architecture according to general problem area:

Classification
Filtering
Pattern recognition
Optimization
Data compression
Prediction

Select network size:

No. of inputs
No. of outputs
No. of hidden layers
No. of neurons per layer



Decide on learning method

Decide on transfer function

Decide on nature of input/output

Decide on type of training used.







Data Set Considerations

Size

Noise

Knowledge domain representation

Training set and test set

Insufficient data

Coding the input data







Data Set Size

Optimal size of the training set depends on the type of network used.

Size is relatively large

Rule of thumb for backpropagation networks:

Training Set Size = Number of hidden layers / Testing Tolerance + Number of input neurons







Noise and Knowledge Domain Representation

For backpropagation networks, the training is more successful when the data contain noise.

Training set should contain a good representation of the entire universe of the domain

May result in an increase in number of training facts, causing the network size to change.






Selection of Variables

Reduce the size of input data without degrading the performance of the network

Principal Component Analysis

Manual Method




Insufficient Data

Scarce data makes the allocation of the data into a training set and a testing set critical.

Rotation Scheme:

Data set has N facts

Set aside one of the facts, training the system with the other N-1 facts.

Then set aside another fact and retrain the network with the other N-1 facts.

Repeat the process N times.

Made-up Data:

Include made-up data; the idea of bootstrapping is also used.

A decision should be made as to whether the distribution of the data should be maintained.

Expert-made Data
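
The rotation scheme and the bootstrapping idea can be sketched in a few lines of Python. This is only an illustration: the function names and the four-fact data set are hypothetical, and sampling with replacement is assumed as the bootstrapping variant.

```python
import random

def leave_one_out_splits(facts):
    """Yield (training_facts, held_out_fact) for each of the N rotations."""
    for i in range(len(facts)):
        yield facts[:i] + facts[i + 1:], facts[i]

def bootstrap_sample(facts, size, seed=0):
    """Draw `size` facts with replacement (assumed bootstrapping variant)."""
    rng = random.Random(seed)
    return [rng.choice(facts) for _ in range(size)]

facts = ["f1", "f2", "f3", "f4"]  # hypothetical data set with N = 4 facts
splits = list(leave_one_out_splits(facts))  # N rotations, each trains on N-1 facts
```

Each of the N rotations trains on N-1 facts and tests on the one fact set aside, so every fact is used for testing exactly once.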







Coding the Input Data

The training data set should be properly normalized and match the design of the network.

Functions used:

Zero-mean, unit-variance (Z-score)
Min-Max
Cut-off
Sigmoidal
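
A minimal sketch of the four normalization functions, using only the Python standard library. The slide names the functions but not their formulas, so two interpretations are assumptions: "cut-off" is read as clipping to a fixed range, and "sigmoidal" as a logistic squashing of the Z-score.

```python
import math
import statistics

def z_score(values):
    """Zero mean, unit variance (fails if all values are equal)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

def min_max(values, low=0.0, high=1.0):
    """Linear rescaling of the observed range onto [low, high]."""
    lo, hi = min(values), max(values)
    return [low + (v - lo) * (high - low) / (hi - lo) for v in values]

def cut_off(values, lower, upper):
    """Clip values to [lower, upper] (assumed meaning of 'cut-off')."""
    return [min(max(v, lower), upper) for v in values]

def sigmoidal(values):
    """Logistic squashing of the Z-score into (0, 1) (assumed variant)."""
    return [1.0 / (1.0 + math.exp(-z)) for z in z_score(values)]
```

Which function fits best depends on the transfer function of the input layer: min-max and sigmoidal scaling keep inputs inside a bounded interval, while the Z-score does not.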







Data Preparation

In a distributed data set, the qualities that define a unique pattern are spread out over more than one neuron.

For example, a purple object can be described as being half red and half blue.

The two neurons assigned to red and blue can together define purple, eliminating the need to assign a third purple neuron.
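
The purple example can be written out as a small Python sketch. The table, the function names and the activation values 0.5/0.5 are hypothetical; the one-neuron-per-colour coding is shown only for comparison.

```python
# Hypothetical colour coding with two input neurons (red, blue).
DISTRIBUTED_CODE = {
    "red": (1.0, 0.0),
    "blue": (0.0, 1.0),
    "purple": (0.5, 0.5),  # half red, half blue: no third neuron needed
}

def encode(colour):
    """Distributed coding: the pattern is spread over the red and blue neurons."""
    return DISTRIBUTED_CODE[colour]

def one_hot(colour, colours=("red", "blue", "purple")):
    """Local coding for comparison: one dedicated neuron per colour."""
    return tuple(1.0 if c == colour else 0.0 for c in colours)
```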







Training Strategies

The main intention of training is not to memorize the examples of the training set, but to build a general model of the input/output relationships based on the training examples.

Generalization

A general model means that the set of input/output relationships derived from the training set applies equally well to new sets of data from the same problem that were not included in the training set. The main goal of a neural network is thus the generalization to new data of the relationships learned on the training set.

Overfitting

Too much training can make the network memorize all the examples of the training set, together with their noise, errors, and inconsistencies, and therefore generalize poorly to new data.






Network Dimension

The overfitting problem depends on the model size, the number of free parameters, the number of constraints, and the number of independent training examples.

A rule of thumb for obtaining good generalization is to use the smallest network that fits the training data.

A small network, besides the better expected generalization, is also faster to train.
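
To make "number of free parameters" concrete, the sketch below counts the weights and biases of a fully connected feedforward network. The layer sizes are hypothetical, and one bias per neuron is assumed.

```python
def count_free_parameters(layer_sizes):
    """Weights plus one bias per neuron for a fully connected feedforward net."""
    return sum((n_in + 1) * n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

small = count_free_parameters([4, 3, 2])    # 4 inputs, 3 hidden, 2 outputs
large = count_free_parameters([4, 20, 2])   # same task, 20 hidden neurons
```

Here the small network has 23 free parameters and the large one 142, so the large network needs many more independent training examples to be constrained equally well.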






Network pruning

Pruning Algorithms

The general pruning approach consists of training a relatively large network and gradually removing either weights or complete units that seem not to be necessary.

The large initial size allows the network to learn quickly and with a lower sensitivity to initial conditions and local minima.

The reduced final size helps to improve generalization. There are basically two ways of reducing the size of the original network.





Sensitivity methods: After learning, the sensitivity of the error function to the removal of every element (unit or weight) is estimated; the element with the least effect can be removed.

Penalty-term methods: Weight decay terms are added to the error function to reward the network for choosing efficient solutions; that is, networks with small weight values are favoured. At the end of the learning process, the weights with the smallest values can be removed, but even if they are not, a network with several weights close to 0 already acts as a smaller system.
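
Both families of methods can be illustrated on a flat list of weights. This is a sketch under stated assumptions, not the pruning algorithm of any cited paper: the penalty is a plain L2 weight-decay term, the threshold and decay values are arbitrary, and `error_fn` stands for whatever error function the network is trained with.

```python
def least_sensitive_index(weights, error_fn):
    """Sensitivity method: index of the weight whose removal changes the error least."""
    base = error_fn(weights)
    def effect(i):
        trial = weights[:i] + [0.0] + weights[i + 1:]  # weight i removed
        return abs(error_fn(trial) - base)
    return min(range(len(weights)), key=effect)

def penalized_error(error, weights, decay=0.01):
    """Penalty-term method: error plus an L2 weight-decay term (assumed form)."""
    return error + decay * sum(w * w for w in weights)

def prune_small_weights(weights, threshold=0.05):
    """Remove (zero out) weights whose magnitude is below the threshold."""
    return [0.0 if abs(w) < threshold else w for w in weights]

weights = [0.8, -0.02, 0.003, -1.1]      # hypothetical trained weights
pruned = prune_small_weights(weights)    # the two near-zero links are removed
```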








Rule extraction

Extracts classification rules from the pruned network

Rules generated are of the form: if (a1 θ v1) and (a2 θ v2) and ... and (an θ vn) then Cj

where the ai are the attributes of an input tuple

the vi are constants

the θ are relational operators (=, ≤, ≥, !=)

Cj is one of the class labels
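
A rule of this form can be represented as a list of (attribute, operator, constant) conditions plus a class label. The sketch below is illustrative only; the attribute names, constants and class labels are hypothetical.

```python
import operator

# Relational operators allowed in a condition.
OPERATORS = {"=": operator.eq, "!=": operator.ne,
             "<=": operator.le, ">=": operator.ge}

def rule_fires(conditions, record):
    """True if every (attribute, op, constant) condition holds for the record."""
    return all(OPERATORS[op](record[attr], const)
               for attr, op, const in conditions)

def classify(rules, record, default=None):
    """Return the class label of the first rule whose conditions all hold."""
    for conditions, label in rules:
        if rule_fires(conditions, record):
            return label
    return default

# Hypothetical rules, for illustration only.
rules = [
    ([("age", ">=", 40), ("salary", "<=", 50000)], "A"),
    ([("age", "!=", 40)], "B"),
]
```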






Difficulties in defining relationships:

Links may still be too many to express the relationship between an input tuple and its class label in the form of if ... then rules.

If a network has n input links with binary values, there could be as many as 2^n distinct input patterns.

The rules could be quite lengthy or complex even for a small n.

The activation values of a hidden unit could be anywhere in the range [-1, 1], depending on the input tuple.

It is difficult to derive an explicit relationship between the continuous activation values of the hidden units and the output values of a unit in the output layer.
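
The growth in the number of input patterns is easy to check by enumeration. A small sketch (the function name is hypothetical):

```python
from itertools import product

def binary_input_patterns(n):
    """All distinct patterns over n binary input links."""
    return list(product((0, 1), repeat=n))

patterns = binary_input_patterns(3)   # 8 patterns for n = 3
```

With only 10 binary links there are already 1024 patterns, which is why pruning the links before rule extraction matters.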








Rule Extraction Algorithm (RX)

1 Apply a clustering algorithm to find clusters of hidden node activation values.

2 Enumerate the discretized activation values and compute the network outputs.

3 Generate rules that describe the network outputs in terms of the discretized hidden unit activation values.

4 For each hidden unit, enumerate the input values that lead to them and generate a set of rules to describe the hidden units' discretized values in terms of the inputs.

5 Merge the two sets of rules obtained in the previous two steps to obtain rules that relate the inputs and outputs.
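
Steps 1 and 2 can be sketched as follows. This is a simplified illustration, not the RX algorithm as published: a greedy one-dimensional clustering with tolerance `epsilon` is assumed for step 1, and `output_fn` stands in for the output layer of the trained network. All names and activation values are hypothetical.

```python
from itertools import product

def cluster_activations(values, epsilon=0.3):
    """Step 1: greedy 1-D clustering of one hidden unit's activation values."""
    clusters = []  # each cluster is a list of activation values
    for v in values:
        for members in clusters:
            if abs(v - sum(members) / len(members)) <= epsilon:
                members.append(v)  # close enough to this cluster's mean
                break
        else:
            clusters.append([v])   # no cluster is close: start a new one
    return [sum(m) / len(m) for m in clusters]  # one representative per cluster

def discretize(value, representatives):
    """Replace an activation value by its nearest cluster representative."""
    return min(representatives, key=lambda r: abs(value - r))

def enumerate_outputs(reps_per_unit, output_fn):
    """Step 2: output for every combination of discretized hidden values."""
    return {combo: output_fn(combo) for combo in product(*reps_per_unit)}

hidden_1 = [-0.95, -0.9, 0.92, 0.97, -1.0]   # hypothetical activations
reps = cluster_activations(hidden_1)          # two clusters, near -0.95 and 0.945
table = enumerate_outputs([reps, reps],
                          lambda combo: "C1" if sum(combo) > 0 else "C2")
```

The resulting table maps each combination of discretized hidden values to a class label, which is the input that step 3 turns into rules.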







Clustering

The 1st step of RX clusters the activation values of the hidden units into a manageable number of discrete values without sacrificing the classification accuracy of the network.

Neural network based clustering method:

Represents each cluster as an exemplar, which acts as a prototype

New objects can be distributed to the cluster whose exemplar is the most similar, based on some distance measure.

The neural network approach has strong theoretical links with actual brain processing.

Competitive Learning
Self-organizing feature maps







Competitive Learning

Hierarchical architecture of several artificial neurons

Operates in a winner-takes-all fashion: the winning unit within each cluster becomes active (filled circles) while the others are inactive.

Connections between layers are excitatory: inputs are received from lower levels.

The units within a cluster compete to respond to the pattern that is output from the layer below.








Connections within layers are inhibitory so that only 1 unit in a given cluster may be active.

The winning unit adjusts the weights on its connections between other units in the cluster so that it will respond more strongly in future.

The number of clusters and the number of units per cluster are input parameters.
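
One winner-takes-all update can be sketched as below. The slides do not give the update rule, so the usual form is assumed: the winner is the unit whose weight vector is closest to the input, and only its weights move toward the input. The start values and learning rate are hypothetical.

```python
def winner_index(weights, x):
    """Index of the unit whose weight vector is closest to input x."""
    def dist2(w):
        return sum((wi - xi) ** 2 for wi, xi in zip(w, x))
    return min(range(len(weights)), key=lambda i: dist2(weights[i]))

def competitive_step(weights, x, rate=0.5):
    """Move only the winning unit's weights toward x; the others stay unchanged."""
    k = winner_index(weights, x)
    weights[k] = [wi + rate * (xi - wi) for wi, xi in zip(weights[k], x)]
    return k

units = [[0.0, 0.0], [1.0, 1.0]]   # two units in one cluster (hypothetical)
k = competitive_step(units, [0.8, 1.0])
```

After the step the winner lies closer to the input, so it responds more strongly when a similar pattern is presented again.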








Figure: Competitive Learning






Self-Organizing Feature Maps (SOMs)

The learning algorithm in SOM still follows the competitive model, but the updating rule produces an output layer where the topology of the patterns in the input space is preserved.

That means that if patterns xr and xs are close in the input space (close on the basis of the similarity measure adopted in the winner-take-all rule), the corresponding firing neural units are topologically close in the network layer.

A network that performs such a mapping is called a feature map. Feature maps not only group input patterns into clusters, but also visually describe the relationships among these clusters in the input space.







A Kohonen map usually consists of a two-dimensional array of neurons fully connected with the input vector, without lateral connections, arranged on a square or hexagonal lattice.

The topology-preserving property is obtained by a learning rule that involves the winner unit and its neighbors in the weight updating process.

As a consequence, close neurons in the output layer learn to fire for input vectors with similar characteristics.

During training the network assigns to firing neurons a position on the map, based on the dominant feature of the activating input vector. For this reason Kohonen maps are also called Self-Organizing Maps (SOMs).
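
A single SOM update on a square lattice can be sketched as below. The slides do not specify the neighbourhood function, so a Gaussian over lattice distance is assumed; the grid size, learning rate and radius are hypothetical.

```python
import math

def som_step(grid, x, rate=0.5, radius=1.0):
    """One SOM update: the winner and its lattice neighbours move toward x."""
    # grid[r][c] is the weight vector of the neuron at lattice position (r, c)
    positions = [(r, c) for r in range(len(grid)) for c in range(len(grid[0]))]
    def dist2(p):
        return sum((w - xi) ** 2 for w, xi in zip(grid[p[0]][p[1]], x))
    winner = min(positions, key=dist2)
    for r, c in positions:
        lattice_d2 = (r - winner[0]) ** 2 + (c - winner[1]) ** 2
        # Gaussian neighbourhood (an assumption): 1 at the winner, smaller far away
        h = math.exp(-lattice_d2 / (2 * radius ** 2))
        grid[r][c] = [w + rate * h * (xi - w) for w, xi in zip(grid[r][c], x)]
    return winner

grid = [[[0.0, 0.0], [0.0, 1.0]],
        [[1.0, 0.0], [1.0, 1.0]]]       # 2 x 2 map of 2-D weight vectors
winner = som_step(grid, [0.9, 0.9])
```

Unlike plain competitive learning, the neighbours of the winner are updated too, only more weakly; this is what keeps nearby neurons tuned to similar inputs.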







Figure: Self-Organizing Maps (SOMs)






The 2nd step is to relate these discretized activation values to the output layer activation values, i.e., the class labels.

The 3rd step is to relate them to the attribute values at the nodes connected to the hidden node.

Input:

A set of discrete patterns with their class labels; the step produces the rules describing the relationship between the patterns and their class labels.







Knowledge Extraction

One of the reasons for the success of neural networks is their ability to develop an internal representation of the knowledge necessary to solve a given problem.

However, such internal knowledge representation is very difficult to understand and to translate into symbolic knowledge, due to its distributed nature.

At the end of the learning process, the network's knowledge is spread all over its weights and units.

In addition, even if a translation into symbolic rules is possible, it might not have physical meaning, because the network's computation does not take into account the physical ranges of the input variables.




Generally the internal knowledge representation of neural networks presents a very low degree of human comprehensibility and, for this reason, it has often been described as opaque to the outside world.

This lack of comprehension of how decisions are made inside a neural network definitely represents a strong limitation for the application of ANNs to intelligent data analysis.

Several real-world applications need an explanation of how a given decision is reached.







Advantages of Neural Networks

High Accuracy: Neural networks are able to approximate complex non-linear mappings.

Noise Tolerance: Neural networks are very flexible with respect to incomplete, missing and noisy data.

Independence from prior assumptions

Neural networks can be updated with fresh data, making them useful for dynamic environments.

Hidden nodes in supervised neural networks can be regarded as latent variables.

Neural networks can be implemented in parallel hardware.







Disadvantages of Neural Networks

One of the main drawbacks of ANN paradigms is the lack of criteria for the a priori definition of the optimal network size for a given task.

The space generated by all possible ANN structures of different sizes for a selected ANN paradigm can then become the object of other data analysis techniques.

Genetic algorithms, for example, have recently been applied to this problem, to build a population of good ANN architectures with respect to a given task.



ANN decision processes still remain quite opaque, and a translation into meaningful symbolic knowledge is hard to perform.

By contrast, fuzzy systems are usually appreciated for the transparency of their decisional algorithms.

The combination of the ANN approach and fuzzy logic has produced hybrid architectures, called neuro-fuzzy networks.

Learning rules are no longer constrained to traditional crisp logic, but exploit the linguistic power of fuzzy logic.






Traditional Approaches of Data Mining vs. Neural Networks

Foundation: Logic vs. Brain

Traditional Approach: Simulates and formalizes the human reasoning and logic process. TA treats the brain as a black box. TA focuses on how the elements are related to each other and how to give the machine the same capabilities.

Neural Networks: Simulate the intelligence functions of the brain. NN focuses on modeling the brain structure. NN attempts to create a system that functions like the brain because it has a structure similar to the structure of the brain.






Processing Techniques: Sequential vs. Parallel

Traditional Approach: The processing method of TA is inherently sequential.

Neural Networks: The processing method of NN is inherently parallel.

Each neuron in a neural network system functions in parallel with the others.





Learning: Static and External vs. Dynamic and Internal

Traditional Approach: Learning takes place outside of the system.

The knowledge is obtained outside the system and then coded into the system.

Neural Networks: Learning is an integral part of the system and its design.

Knowledge is stored as the strength of the connections among the neurons, and it is the job of NN to learn these weights from a data set presented to it.





Reasoning Method: Deductive vs. Inductive

Traditional Approach: Is deductive in nature.

The use of the system involves a deductive reasoning process, applying the generalized knowledge to a given case.

Neural Networks: Is inductive in nature.

It constructs an internal knowledge base from the data presented to it.

It generalizes from the data, such that when it is presented with a new set of data, it can make a decision based on the generalized internal knowledge.





Knowledge Representation: Explicit vs. Implicit

Traditional Approach: It represents knowledge in an explicit form.

Rules and relationships can be inspected and altered.

Neural Networks: The knowledge is stored in the form of interconnection strengths among neurons.

Nowhere in the system can one pick up a piece of computer code or a numerical value as a discernible piece of knowledge.





Conclusion

A study of the neural network based data mining process shows that neural networks are very suitable for solving data mining problems because of their good robustness, self-organizing adaptivity, parallel processing, distributed storage, and high degree of fault tolerance. The combination of data mining methods and the neural network model can greatly improve the efficiency of data mining methods, and it has been widely used.

One of the open issues is to reduce the training time of neural networks. The speed of network training can be improved by developing fast algorithms. The time required to extract rules with the neural network approach is still longer than the time needed by the decision tree based approach.





References

Hongjun Lu, Rudy Setiono, and Huan Liu. Effective Data Mining Using Neural Networks.

Hamid Nemati. Introduction to Data Mining Using Artificial Neural Networks.

Kurt Thearling. An Introduction to Data Mining. www.thearling.com

Jiawei Han and Micheline Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann, 2001.

J. A. Anderson. An Introduction to Neural Networks. Prentice Hall, 2003.




http://www.mathworks.in/products/neuralnetwork/description2.html

