chapter 7 neural networks in data mining

Upload: tyrone-strong

Post on 31-Dec-2015

Page 1: Chapter 7 Neural Networks in Data Mining

Chapter 7 Neural Networks in Data Mining

Automatic Model Building

(Machine Learning)

Artificial Intelligence

7-2

Contents

Describe neural networks as used in Data mining

Reviews real applications of each model

Shows the application of models to larger data sets

7-3

High-Growth Product

Neural network models usually outperform other methods on some types of data, particularly when there are complicated (nonlinear) relationships in the data.

Used for classifying data (DATA MINING):
• target customers
• bank loan approval
• hiring
• stock purchase

Used for prediction

7-4

Neural Network

Neural networks are among the most widely used methods in data mining.

The idea of neural networks was derived from how neurons operate in the brain.

Real neurons are connected to each other: they accept electrical charges across synapses and pass the charge on to neighboring neurons.

An artificial neural network (ANN) is usually arranged in at least three layers (input, at least one hidden, and output) and has a defined, fixed structure that lets it represent complex nonlinear relationships.

7-5

Network

[Figure: a network with an input layer, hidden layers, and an output layer whose two nodes are labeled Good and Bad]

7-6

Neural Network

For classification neural network models, the output layer has one node for each classification category (e.g., true or false).

Each node is connected by an arc to nodes in the next layer. These arcs have weights, which are multiplied by the value of incoming nodes and summed.

Middle-layer node values are the sum of incoming node values multiplied by the arc weights, usually passed through an activation (transfer) function.
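As a rough illustration of the arc-weight arithmetic above, the following Python sketch computes the node values of one fully connected layer. The logistic (sigmoid) activation and the example weights are assumptions chosen for illustration, not values from the slides; bias terms are omitted to match the plain weighted-sum description.

```python
import math


def sigmoid(z):
    """Logistic activation: squashes any weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs, weights):
    """Compute the node values of one fully connected layer.

    weights[j][i] is the weight on the arc from incoming node i to node j.
    Each node value is the sum of incoming values times arc weights,
    passed through the sigmoid activation.
    """
    return [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in weights]


# Hypothetical 3-input, 2-hidden-node, 2-output ("Good"/"Bad") network.
x = [0.5, 1.0, 0.0]
hidden = layer_forward(x, [[0.2, -0.4, 0.1], [0.7, 0.3, -0.5]])
output = layer_forward(hidden, [[1.0, -1.0], [-1.0, 1.0]])
```

Each output node's value can be read as the network's support for its category; here the larger of the two would be taken as the classification.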

ANNs learn through feedback loops. The output is compared to target values, and the difference between attained and target output is fed back to the system to adjust the weights on the arcs.

Measure the fit, then fine-tune around the best fit.
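The feedback loop can be sketched for a single sigmoid output node using the delta rule (gradient descent on squared error). This is a minimal illustration, assuming a sigmoid activation, a hypothetical learning rate of 0.5, and a constant 1.0 input that plays the role of a bias; it is not the training algorithm of any particular package.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(weights, x):
    return sum(w * xi for w, xi in zip(weights, x))


def train_single_node(samples, n_inputs, learning_rate=0.5, epochs=200):
    """Adjust arc weights from the difference between target and attained output.

    samples: list of (inputs, target) pairs with targets in {0, 1}.
    Returns the final weights and the sum of squared errors after each epoch.
    """
    weights = [0.0] * n_inputs
    history = []
    for _ in range(epochs):
        for x, target in samples:
            out = sigmoid(dot(weights, x))
            error = target - out                 # target minus attained output
            delta = error * out * (1.0 - out)    # scaled by the sigmoid's slope
            # Feed the error back: move each weight in proportion to its input.
            weights = [w + learning_rate * delta * xi for w, xi in zip(weights, x)]
        # Measure the fit on all samples after this pass.
        history.append(sum((t - sigmoid(dot(weights, x))) ** 2 for x, t in samples))
    return weights, history
```

For example, with samples for the logical OR function, `[([1.0, 0.0, 0.0], 0), ([1.0, 0.0, 1.0], 1), ([1.0, 1.0, 0.0], 1), ([1.0, 1.0, 1.0], 1)]`, the recorded squared error should shrink as the epochs go by.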

7-7

Neural Network

ANNs can apply learned experience to new cases for decisions, classifications, and forecasts.

ANN modeling should consider:
• Input variable selection and manipulation
• Learning parameters, such as the number of hidden layers, learning rate, momentum, activation function…

About 95% of business applications were reported to use a multilayered feedforward neural network with the backpropagation learning rule:
• Supervised learning
• Each element in each layer is connected to all elements of the next layer.
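A multilayered feedforward network trained with backpropagation can be sketched in plain Python as below. This is a minimal illustration under stated assumptions: one hidden layer, sigmoid activations, squared-error loss, a bias handled as an extra constant input, and momentum applied to each weight change. The class `FeedforwardNet` and its methods are hypothetical, not the API of any data mining package.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class FeedforwardNet:
    """One hidden layer, fully connected, trained by backpropagation with momentum."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # w1[j][i]: arc from input i to hidden node j (last column = bias input)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)]
        # w2[k][j]: arc from hidden node j to output node k (last column = bias input)
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
                   for _ in range(n_out)]
        # Previous weight changes, reused by the momentum term
        self.v1 = [[0.0] * len(row) for row in self.w1]
        self.v2 = [[0.0] * len(row) for row in self.w2]

    def forward(self, x):
        xb = list(x) + [1.0]                      # inputs plus constant bias input
        h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w1]
        hb = h + [1.0]                            # hidden values plus bias input
        y = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in self.w2]
        return xb, hb, y

    def loss(self, x, target):
        """Half the squared difference between attained and target output."""
        _, _, y = self.forward(x)
        return 0.5 * sum((t - o) ** 2 for t, o in zip(target, y))

    def gradients(self, x, target):
        """Backpropagate the output error to get dLoss/dw for both layers."""
        xb, hb, y = self.forward(x)
        d_out = [(o - t) * o * (1.0 - o) for o, t in zip(y, target)]
        # Each hidden node's share of the error, fed back through its output arcs
        d_hid = [hb[j] * (1.0 - hb[j])
                 * sum(d_out[k] * self.w2[k][j] for k in range(len(d_out)))
                 for j in range(len(self.w1))]
        g1 = [[dj * xi for xi in xb] for dj in d_hid]
        g2 = [[dk * hj for hj in hb] for dk in d_out]
        return g1, g2

    def train_step(self, x, target, lr=0.5, momentum=0.9):
        """One supervised update: gradient step plus a fraction of the last change."""
        g1, g2 = self.gradients(x, target)
        for w, v, g in ((self.w1, self.v1, g1), (self.w2, self.v2, g2)):
            for r in range(len(w)):
                for c in range(len(w[r])):
                    v[r][c] = momentum * v[r][c] - lr * g[r][c]
                    w[r][c] += v[r][c]
```

Training consists of calling `train_step` on each (inputs, target) pair for many epochs. Comparing `gradients` with finite differences of `loss` is a common way to check that a backpropagation implementation is correct.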

7-8

Neural Network

Multilayered feedforward neural networks are analogous to regression and discriminant analysis in dealing with cases where training data is available.

A self-organizing map (SOM) is analogous to the clustering techniques used when there is no training data. It classifies data to maximize the similarity of patterns within clusters while minimizing the similarity to patterns of different clusters.

Kohonen SOMs were developed to detect strong features of large data sets.
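A one-dimensional Kohonen SOM can be sketched as follows. Each unit holds a weight vector; for every sample, the best-matching unit (BMU) and its neighbors on the grid are pulled toward the sample, so similar patterns end up mapped to the same or nearby units. The Gaussian neighborhood, the linearly shrinking learning rate and radius, and the parameter defaults are assumptions chosen for illustration.

```python
import math
import random


def best_matching_unit(units, x):
    """Index of the unit whose weight vector is closest (squared Euclidean) to x."""
    return min(range(len(units)),
               key=lambda i: sum((w - v) ** 2 for w, v in zip(units[i], x)))


def train_som(data, n_units, epochs=100, lr0=0.5, seed=0):
    """Train a 1-D SOM without target values (unsupervised)."""
    rng = random.Random(seed)
    dim = len(data[0])
    units = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    radius0 = max(n_units / 2, 1.0)
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):        # shuffled pass over the data
            frac = step / total
            lr = lr0 * (1.0 - frac)                  # learning rate shrinks over time
            radius = max(radius0 * (1.0 - frac), 0.5)  # neighborhood shrinks too
            bmu = best_matching_unit(units, x)
            for i, w in enumerate(units):
                # Units near the BMU on the grid move more than distant ones.
                influence = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
                for d in range(dim):
                    w[d] += lr * influence * (x[d] - w[d])
            step += 1
    return units
```

After training, `best_matching_unit(units, x)` assigns a new pattern `x` to a cluster.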

7-9

Neural Network Testing

Usually train on part of the available data:
• the package tries weights until it successfully categorizes a selected proportion of the training data

When trained, test the model on another part of the data:
• if a given proportion is successfully categorized, it quits
• if not, it works some more to get a better fit

The “model” is internal to the package

Model can be applied to new data
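The train-then-test loop above can be sketched with a simple perceptron standing in for the package's network (an assumption: real packages use their own training algorithms and stopping rules). The hypothetical `target` parameter is the selected proportion that must be categorized correctly on both the training and the test data.

```python
import random


def predict(weights, x):
    """Categorize as 1 when the weighted sum is positive, else 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) > 0 else 0


def proportion_correct(weights, data):
    """Fraction of (inputs, label) pairs that are categorized correctly."""
    return sum(predict(weights, x) == y for x, y in data) / len(data)


def train_until(train, test, target=0.9, max_rounds=100, seed=0):
    """Adjust weights on the training data, then check the test data.

    Stops once both the training and the test proportions reach `target`;
    otherwise keeps working, up to `max_rounds` passes.
    Returns the weights and the number of passes used.
    """
    rng = random.Random(seed)
    weights = [0.0] * len(train[0][0])
    for rounds in range(1, max_rounds + 1):
        for x, y in rng.sample(train, len(train)):   # shuffled pass
            error = y - predict(weights, x)           # -1, 0 or +1
            weights = [w + error * xi for w, xi in zip(weights, x)]
        if (proportion_correct(weights, train) >= target
                and proportion_correct(weights, test) >= target):
            return weights, rounds
    return weights, max_rounds
```

The returned weights are the "model"; applying `predict` to new inputs corresponds to applying the model to new data.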

7-10

Neural Network Process

1. Collect data

2. Separate into training and test sets

3. Transform data to appropriate units
• Categorical data works better, but is not necessary

4. Select, train, and test the network
• Can set number of hidden layers
• Can set number of nodes per layer
• A number of algorithmic options

5. Apply (the model must be used on the system on which it was built)
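Steps 2 and 3 can be sketched as below, assuming a random 70/30 split and min-max scaling of numeric columns to the 0-1 range; the slides do not specify either choice. The scaling ranges are learned from the training rows only and then reused on test and new data, so every set is transformed the same way.

```python
import random


def train_test_split(records, test_fraction=0.3, seed=0):
    """Shuffle records and separate them into training and test sets."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_min_max(rows):
    """Learn each column's (min, max) from the training rows only."""
    return [(min(col), max(col)) for col in zip(*rows)]


def min_max_scale(rows, ranges):
    """Transform numeric columns to 0-1; values outside the training range are clipped."""
    scaled = []
    for row in rows:
        scaled.append([
            0.0 if hi == lo else min(max((v - lo) / (hi - lo), 0.0), 1.0)
            for v, (lo, hi) in zip(row, ranges)
        ])
    return scaled
```

Typical use: `train, test = train_test_split(records)`, then `ranges = fit_min_max(train)` and `min_max_scale(test, ranges)`.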

7-11

Loan Applications

Loan decisions are repetitive and time-consuming, and every attempt should be made to reach a decision that is fair to the applicant while reducing the lender's risk of default.

1. Data collection: sex, marital status, number of dependent children, occupation, …

2. Separating data: learning data (at least 100 sets) and testing data (100 sets)

3. Transform the inputs: ANNs require numeric data. See page 125.
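Because ANNs require numeric inputs, categorical loan fields have to be coded as numbers. A minimal sketch, assuming one-hot (0/1 indicator) coding; the category lists, the 0/1 coding of sex, and the cap of five children are hypothetical and do not reproduce the scheme on page 125.

```python
def one_hot(value, categories):
    """Encode a categorical value as 0/1 indicator inputs, one per category."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1.0 if value == c else 0.0 for c in categories]


# Hypothetical category lists for the loan fields named in step 1.
MARITAL = ["single", "married", "divorced", "widowed"]
OCCUPATION = ["professional", "clerical", "skilled", "unskilled", "other"]


def encode_applicant(sex, marital_status, n_children, occupation, max_children=5):
    """Turn one applicant record into a numeric input vector for the ANN."""
    return (
        [1.0 if sex == "female" else 0.0]                  # binary field: one 0/1 input
        + one_hot(marital_status, MARITAL)
        + [min(n_children, max_children) / max_children]   # count scaled to 0-1
        + one_hot(occupation, OCCUPATION)
    )
```

One-hot coding avoids implying an order between categories (for example, that "divorced" lies between "married" and "widowed"), at the cost of one input node per category.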

7-12

Loan Applications

4. Select, train, and test the network:
   1. Choose the number of middle-layer nodes, the transfer function, and the learning algorithm.
   2. Too many hidden-layer nodes result in the ANN memorizing the input data without learning a generalizable pattern for accurate analysis of new data. Too few nodes require more training time and result in less accurate models.

5. Repeat steps 1 through 4 until the prescribed tolerance is reached.
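The trade-off between too many and too few hidden nodes can be handled by trying network sizes from small to large and stopping at the first one whose test error meets the tolerance. In this sketch, `train_and_test` is a hypothetical, caller-supplied function that trains a network with a given number of hidden nodes and returns its test-set error; only the search loop is shown, and the other steps (recollecting or re-transforming data) are left out.

```python
from typing import Callable, Iterable, Optional, Tuple


def choose_hidden_nodes(
    train_and_test: Callable[[int], float],
    tolerance: float,
    candidates: Iterable[int] = range(1, 21),
) -> Tuple[Optional[int], float]:
    """Grow the hidden layer until the test error meets the tolerance.

    Starting small and stopping at the first size that meets the tolerance
    avoids adding nodes that would let the network memorize the training data.
    Returns (chosen size, its error), or (None, best error seen) if no size
    reaches the tolerance.
    """
    best_error = float("inf")
    for n in candidates:
        error = train_and_test(n)   # train with n hidden nodes, measure on test set
        best_error = min(best_error, error)
        if error <= tolerance:
            return n, error
    return None, best_error
```

Judging each size by its error on the test set, rather than the training set, is what exposes memorization: a network that has memorized its inputs scores well on training data but poorly on new data.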

7-13

Neural Nets to Predict Bankruptcy

Wilson & Sharda (1994)

Monitor firm financial performance: useful to identify internal problems, and for investment evaluation and auditing

Predict bankruptcy - multivariate discriminant analysis of financial ratios (develop formula of weights over independent variables)

Neural network - inputs were 5 financial ratios - data from Moody’s Industrial Manuals (129 firms, 1975-1982; 65 went bankrupt)

Tested against discriminant analysis

Neural network significantly better
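For comparison with the neural network, the discriminant analysis approach develops a formula of weights over the financial ratios. A minimal sketch of a two-class Fisher linear discriminant is shown below, assuming only two ratios (the study used five) so the 2x2 scatter matrix can be inverted directly, and assuming a cutoff halfway between the projected class means.

```python
def mean(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def score(w, x):
    """Discriminant score z = w . x for one firm's ratios."""
    return sum(wi * xi for wi, xi in zip(w, x))


def fisher_discriminant(class0, class1):
    """Two-class Fisher linear discriminant for two input ratios.

    Returns weights w and cutoff c; a firm is assigned to class 1
    (e.g. bankrupt) when score(w, x) > c.
    """
    m0, m1 = mean(class0), mean(class1)
    # Pooled within-class scatter matrix (2x2)
    s = [[0.0, 0.0], [0.0, 0.0]]
    for rows, m in ((class0, m0), (class1, m1)):
        for x in rows:
            d = [x[0] - m[0], x[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]   # zero if the ratios are collinear
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    # Weights: inverse scatter times the difference of class means
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    # Cutoff halfway between the projected class means
    c = 0.5 * (score(w, m0) + score(w, m1))
    return w, c
```

Unlike a neural network, this formula is linear in the ratios, which is one reason a neural network can do better when the relationship is nonlinear.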

7-14

Ranking Neural Network

Wilson (1994)

Decision problem - ranking candidates for position, computer systems, etc.

INPUT - manager’s ranking of alternatives

Real decision - hire 2 sales people from 15 applicants

Each applicant scored by manager

Neural network took scores, rank ordered

Rankings compared for best fit to the manager’s ranking of alternatives (AHP)
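Rank-ordering scored applicants and measuring agreement with the manager’s ranking can be sketched as below. The slides do not say how "best fit" was measured; Spearman’s rank correlation is assumed here purely as an illustration, and the functions assume no tied scores.

```python
def rank_order(scores):
    """Return applicant indices sorted from best (highest score) to worst."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def spearman_rho(rank_a, rank_b):
    """Spearman rank correlation between two orderings of the same n items.

    rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d is the difference in
    each item's position; 1 means identical order, -1 means reversed.
    """
    n = len(rank_a)
    pos_a = {item: r for r, item in enumerate(rank_a)}
    pos_b = {item: r for r, item in enumerate(rank_b)}
    d2 = sum((pos_a[i] - pos_b[i]) ** 2 for i in pos_a)
    return 1 - 6 * d2 / (n * (n * n - 1))
```

With the network’s scores for the 15 applicants, `rank_order(scores)[:2]` would give the two people to hire, and `spearman_rho` would compare the full ordering with the manager’s.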

7-15

Application results

7-16

Application results

7-17

Application results

7-18

Exercise

Data coding (see page 117):
• Age: under 20 = 0; 20 to 50 = (age - 20)/30; over 50 = 1.0
• State: CA = 1.0; rest = 0
• Degree: Cert = 0; UG = 0.5; rest = 1.0
• Major: IS = 1.0; Csci, Engr Sci = 0.9; BusAd = 0.7; other = 0.5; none = 0
• Experience: Max Years/5 Minimal 2 Adequate 3
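The coding rules in the exercise can be written as small functions, shown here as a sketch. Two interpretations are assumed: "Csci, Engr Sci" is read as two majors that both receive 0.9, and "None" is matched as the string `"None"`. Experience is left out because its coding on the slide (Max Years/5, Minimal 2, Adequate 3) is ambiguous.

```python
def code_age(age):
    """Age: under 20 -> 0, 20 to 50 -> (age - 20) / 30, over 50 -> 1.0."""
    if age < 20:
        return 0.0
    if age > 50:
        return 1.0
    return (age - 20) / 30


def code_state(state):
    """State: CA -> 1.0, any other state -> 0."""
    return 1.0 if state == "CA" else 0.0


def code_degree(degree):
    """Degree: Cert -> 0, UG -> 0.5, anything else -> 1.0."""
    return {"Cert": 0.0, "UG": 0.5}.get(degree, 1.0)


# Assumed reading of the Major row; any major not listed is coded 0.5 ("Other").
MAJOR_CODES = {"IS": 1.0, "Csci": 0.9, "Engr Sci": 0.9, "BusAd": 0.7, "None": 0.0}


def code_major(major):
    return MAJOR_CODES.get(major, 0.5)
```

Every coded value falls in the 0-1 range, which suits sigmoid-activation networks.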