

Journal of Manufacturing Systems, Vol. 24/No. 2

2005

Threefold vs. Fivefold Cross Validation in One-Hidden-Layer and Two-Hidden-Layer Predictive Neural Network Modeling of Machining Surface Roughness Data

Chang-Xue Jack Feng (E-mail: cfeng@bradley.edu), Zhi-Guang (Samuel) Yu*, Unnati Kingi†, and M. Pervaiz Baig†, Dept. of Industrial & Manufacturing Engineering & Technology, Bradley University, Peoria, Illinois, USA

Abstract

Predictability of a manufacturing process or system is vital in virtual manufacturing. Various data mining techniques are available in developing predictive models. Cross validation is critical in determining the quality of a predictive model and the costs in data collection and data mining. Several cross-validation (CV) techniques are available, including the v-fold CV, leave-one-out CV, and the bootstrap type of CV. Some past studies have not revealed any statistical advantages of using tenfold cross validation over fivefold cross validation. Determining the number of hidden layers is important in predictive modeling with neural networks. This study attempts to compare the performance of fivefold over threefold CV and that of one-hidden-layer over two-hidden-layer neural nets in predictive modeling for surface roughness parameters defined in ISO 13565 for turning and honing. Statistical hypothesis tests and different prediction errors are employed to compare the competitive models. This study does not reveal any significant statistical advantages of using fivefold CV over threefold CV or of using two-hidden-layer neural nets over one-hidden-layer neural nets for the cases under study. Furthermore, the procedure presented here is applicable in comparing competitive data modeling or data mining methods.

Keywords: Data Mining; Cross Validation; Neural Networks; Predictive Modeling; Machining Surface Roughness; ISO 13565

Introduction

Predictability of a manufacturing process or system is vital in moving toward virtual manufacturing. Without a proper model, it is not possible to predict the outcome of a manufacturing process or system.

*Z-G. Yu is currently a project manager at Supply Chain Services International Inc., Peoria, Illinois, USA.

†Unnati Kingi is a manufacturing engineer and M. Pervaiz Baig is a design engineer at CBT Companies Inc., Peoria, Illinois, USA.

Various data mining techniques are available in developing predictive models. Finding such a model is difficult, interesting, and sometimes rewarding. A model may originate from introspection or observation (or both) (Gershenfeld 1999). Although developing an analytical model is feasible in some simplified situations, most manufacturing processes are complex and, therefore, empirical models that are less general, more practical, and less expensive than the analytical models are of interest.

Box and Draper (1987) and Gershenfeld (1999) classified mathematical models into analytical (or mechanistic) and empirical (or observational). Both regression analysis (RA) and neural networks (NN) have been used for years in empirical modeling and have only recently been termed data mining (Witten and Frank 2000; Groth 1998).

Neural networks possess a number of attractive properties for modeling a complex product system and manufacturing process or system: universal function approximation capability, resistance to noisy or missing data, accommodation of multiple nonlinear variables for unknown interactions, and good generalization capability (Coit, Jackson, and Smith 1998; Twomey and Smith 1998). For manufacturing processes where no satisfactory analytical model exists, or where a low-order empirical polynomial model is inappropriate, neural networks offer a good alternative approach.

Among the various neural network models, back propagation is the best general-purpose model and is probably the best at generalization (Mitchell 1997; Lawrence 1994). Back propagation is a supervised learning scheme by which a layered feed-forward




network is trained to become a pattern-matching engine. This research uses the commercial package BrainMaker (Lawrence and Fredrickson 1998) with a back-propagation-based, feed-forward model from California Scientific Software for NN modeling. For an introduction to neural networks, refer to Bishop (1995), Lawrence (1994), Swingler (1996), and Wasserman (1989).

This paper is an extension of Feng and Yu (2003) in applying neural networks for predictive modeling of the machining surface roughness parameters defined in ISO 13565 (ISO 1996). While Feng and Yu (2003) demonstrated the feasibility and goodness of using NN in predictive modeling of these new surface roughness parameters, Mr2, Rk, Rvk, Rpk, and Mr1, this paper uses the same dataset and a new dataset to conduct a comparative study of the best performances in using fivefold CV vs. threefold CV and in using one-hidden-layer (OHL) vs. two-hidden-layer (THL) neural networks. The following definition of notation is presented on the basis of ISO 13565-2 (ISO 1996). For detailed calculations, refer to the illustration in Figure 1 and the standard itself.

Rk = Depth of roughness core profile in micrometers (µm)

Mr1 = Material portion, in percent (%), determined for the intersection line that separates the protruding peaks from the roughness core profile

Mr2 = Material portion, in percent (%), determined for the intersection line that separates the deep valleys from the roughness core profile

Rpk = Reduced peak height in micrometers (µm), the average height of protruding peaks above the roughness core profile

Rvk = Reduced valley depth in micrometers (µm), the average depth of profile valleys projecting through the roughness core profile

Fivefold vs. Threefold Cross Validation

When the amount of data for training and testing is limited, the holdout method is used. This method reserves a certain amount of data for validation and uses the remainder for training (and sets part of that aside for testing, if required). In engineering practice, it is common to hold one-third of the data out for validation and use the remaining two-thirds for training and testing (Witten and Frank 2000). Balancing the data set in each fold is important so that the data in each of the training, testing, and validation sets are representative. If stratification is used in partitioning the data to balance the data in each set, it is called the stratified holdout.

Figure 1
Illustration of ISO 13565 Surface Roughness Parameters (courtesy of ISO 13565-2 from ISO 1996): (a) calculation of Rk, Mr1, and Mr2; (b) calculation of Rvk and Rpk. Both panels plot profile depth against material ratio Mr (%).

A more general way to mitigate any bias caused by the particular sample chosen for holdout is to repeat the whole process of train-test-validate several times with different random samples. In each of the iterations, a certain proportion, say two-thirds of the data, is randomly selected for training (90% of the two-thirds for training) and testing (10% of the two-thirds for testing), possibly with stratification, and the remainder used for validation. The error rates on the different iterations are averaged to yield an overall error rate. This is the repeated holdout or repeated cross-validation method of error rate estimation.

Figure 2
Illustration of (Repeated) Threefold Cross-Validation Scheme (folds 1, 2, and 3 rotate through the training, testing, and validation roles)

This process is illustrated in Figure 2, where the numbers 1, 2, and 3 indicate the data fold. In this train-test-validate procedure, the test set of data is used to optimize the generalization ability of the net, and the final cross-validation set is used as an ultimate test to evaluate the prediction error rates (Swingler 1996). This procedure has been used in Feng and Wang (2004, 2003), Feng, Yu, and Kusiak (2006), and Feng and Yu (2003).
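The rotation sketched in Figure 2 can be written down directly. The following is a minimal illustration, assuming a generic dataset, three equal folds, and the 90%/10% train/test split of the remainder described above; the fold construction and seed are arbitrary choices, not the paper's data divisions.

```python
# A sketch of the repeated threefold train-test-validate rotation (Figure 2).
import random

def threefold_rotation(data, seed=0):
    """Yield (train, test, validate) partitions, rotating the validation fold."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    k = 3
    folds = [shuffled[i::k] for i in range(k)]  # three roughly equal folds
    for v in range(k):
        rest = [x for i in range(k) if i != v for x in folds[i]]
        cut = int(0.9 * len(rest))              # 90% of the remainder trains,
        yield rest[:cut], rest[cut:], folds[v]  # 10% tests, one fold validates

data = list(range(30))
for train, test, validate in threefold_rotation(data):
    assert len(train) + len(test) + len(validate) == len(data)
```

Averaging the validation error over the three rotations gives the overall error rate described in the text.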

Witten and Frank (2000) indicated that the standard way of predicting the error rate of a learning technique given a single, fixed sample of data is to use stratified tenfold cross validation. In this method, the data are divided randomly into 10 parts, in each of which the class is represented in approximately the same proportions as in the full dataset. Each part is held out in turn and the learning scheme trained on the remaining nine-tenths; then its error rate is calculated on the holdout set. Thus, the learning procedure is executed a total of 10 times, on different training sets (each of which has a lot in common with the others). Finally, the 10 error estimates are averaged to yield an overall error estimate.

Witten and Frank (2000) suggested that extensive tests on numerous different data sets, with different learning techniques, have shown that 10 is about the right number of folds to get the best estimate of error, and some theoretical evidence is also available to back this up. Although these arguments are by no means conclusive, and debate continues to rage in machine learning and data mining circles about what is the best scheme for evaluation, tenfold cross validation has become the standard method in practical terms. Tests have also shown that the use of stratification improves results slightly. Thus, the standard evaluation technique in situations where only limited data are available is repeated, stratified tenfold cross validation.

However, Breiman and Spector (1992), Breiman (1992), and Zhang (1993) did not reveal any statistical advantages of tenfold CV over fivefold CV. For example, Breiman and Spector (1992) concluded that, when the predictors X are randomly selected, the v-fold cross validation and bootstrap error estimation perform almost the same, and they both perform better than the leave-one-out method in terms of the prediction error rates. Bootstrap is sampling with replacement, which is most useful when data sets are small. Refer to Efron and Tibshirani (1998) for more details about bootstrap. Furthermore, no evidence is available to support that the tenfold cross validation performs better than the fivefold cross validation. Zhang (1993) concluded that a twofold CV would lead to the worst prediction errors. Twomey and Smith (1998) and Ingrassia and Morlini (2005) compared the performance of different CV methods in neural networks modeling of sparse data or small datasets for different purposes.
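Bootstrap error estimation, mentioned above, can be sketched in a few lines. This is a toy illustration, not the paper's procedure: the "model" is just the mean response of each resample, and the dataset and error measure are made up for the example.

```python
# A sketch of bootstrap (sampling with replacement) out-of-bag error estimation.
import random

def bootstrap_error(data, fit, error, b=100, seed=0):
    """Average out-of-bag error over b bootstrap resamples."""
    rng = random.Random(seed)
    n = len(data)
    errors = []
    for _ in range(b):
        sample = [data[rng.randrange(n)] for _ in range(n)]  # resample with replacement
        held_out = [x for x in data if x not in sample]      # points the resample missed
        if held_out:
            errors.append(error(fit(sample), held_out))
    return sum(errors) / len(errors)

# Toy usage: the "model" is the mean response of the resample.
data = [(i, 2.0 * i) for i in range(20)]
fit = lambda s: sum(y for _, y in s) / len(s)
err = lambda m, held: sum(abs(y - m) for _, y in held) / len(held)
estimate = bootstrap_error(data, fit, err)
```

Each resample leaves out roughly a third of the points, and those held-out points play the role of a validation set.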

Early references for cross validation are Stone (1974, 1977), Allen (1974), and Snee (1977). See Hastie, Tibshirani, and Friedman (2001) and George (2000) for a recent review of different cross-validation methods. Additional work on comparing different CV methods can be found in Burman (1990, 1989), Shao (1993), Zhang (1993), Li (1987), and Efron (1983). Studies of cross-validation methods and error estimates applied to neural networks were reported in Tibshirani (1996), Zhu and Rohwer (1996), Burke (1993), and Geman, Bienenstock, and Doursat (1992), but none of them compared fivefold vs. threefold CV methods or OHL vs. THL nets.

One-Hidden-Layer vs. Two-Hidden-Layer Neural Networks

Determining the number of hidden layers and the number of neurons in each hidden layer is a considerable task. The number of hidden layers is usually determined first. The number of hidden layers required depends on the complexity of the relationship between the inputs and the outputs. Most


problems may require only one hidden layer. If the input/output relationship is linear, that is, it can be approximated by a straight-line graph, the network does not even need any hidden layer. It is unlikely that a practical problem would require more than two hidden layers. A network that includes one input layer and one output layer with no hidden layers is known as a linear perceptron, or just a perceptron. Perceptrons are only capable of modeling linear functions and are consequently rarely used (Swingler 1996, p. 61).

Cybenko (1989) stated that one hidden layer (OHL) is enough to classify input patterns into different groups. Experiments conducted by Bounds et al. (1988) revealed no obvious evidence that a network with two hidden layers (THL) performed better than one with a single hidden layer.

However, Chester (1990) argued that a THL network should perform better than an OHL network. The more layers added, the longer it takes to make corrections, and the longer each run through the training set will take. The rule of thumb is normally to start with one layer (Lawrence 1994, p. 201). If OHL nets do not train well, then try changing the number of neurons or the training/testing tolerance or both. Adding more layers should be the last option. In this research, neural network analysis was conducted based on experimental data from two of the authors' machining surface roughness studies. The models are developed with OHL and THL, and the comparisons between threefold and fivefold CV and between OHL and THL are made for these models.

Computational Study

Design of Computational Experiments

In designing the neural networks, the first step was to identify the type of networks and development software to be used, identify the input and output parameters, gather data, and sort the data into folds. In this research, the threefold and fivefold cross-validation techniques were applied to the turning surface roughness parameters defined in ISO 13565 (ISO 1996). ISO 13565 recommends the use of five surface roughness parameters, Mr2, Rk, Rvk, Rpk, and Mr1, instead of the single arithmetic average, Ra, in ultra-precision applications, such as for engine liners and fuel injectors. This makes not only the process planning much more challenging, but also the process modeling task much more difficult

because it becomes a multi-attributes decision-making problem. Some past studies about critical issues in implementing the new surface roughness standard are reviewed and reported in Feng, Wang, and Yu (2002) and Feng and Yu (2003).

In constructing any neural network, the selection of the hidden layers is determined first. In this research, one-hidden-layer and two-hidden-layer nets are used to compare the threefold and fivefold CV methods, respectively. For each of the threefold and fivefold cross-validation schemes, two different plans of the free parameters were used in training the neural networks. The first plan used OHL for network design with two levels of each of the following parameters: the number of hidden neurons, training tolerance, and testing tolerance, while the second plan used a THL net with the same two levels of the above parameters. As a result, a total of 24 neural networks were developed for the threefold cross validation, that is, 2 × 2 × 2 × 3 = 24, and a total of 40 neural networks for fivefold cross validation, that is, 2 × 2 × 2 × 5 = 40, in the OHL nets. For the THL networks, a total of 48 neural networks were constructed for the threefold cross validation, that is, 2 × 2 × 2 × 2 × 3 = 48, and a total of 80 neural networks for fivefold cross validation, that is, 2 × 2 × 2 × 2 × 5 = 80.
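The model counts above follow from enumerating two levels of each free parameter across the folds. A quick sketch, with illustrative placeholder level values (the counts, not the values, are what the paper reports):

```python
# Enumerating the free-parameter combinations that give 24/40/48/80 networks.
from itertools import product

# Two levels of each free parameter (placeholder values for illustration).
neurons, train_tol, test_tol = [5, 10], [0.15, 0.20], [0.20, 0.40]

ohl_3fold = list(product(neurons, train_tol, test_tol, range(3)))  # 2*2*2*3
ohl_5fold = list(product(neurons, train_tol, test_tol, range(5)))  # 2*2*2*5

# A THL net adds two levels of a second hidden-layer size.
layer2 = [5, 10]
thl_3fold = list(product(neurons, layer2, train_tol, test_tol, range(3)))
thl_5fold = list(product(neurons, layer2, train_tol, test_tol, range(5)))

assert (len(ohl_3fold), len(ohl_5fold)) == (24, 40)
assert (len(thl_3fold), len(thl_5fold)) == (48, 80)
```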

The hidden neuron size, h, can be determined by a number of approaches. One commonly used approach is to halve the sum of the input and output neurons (Lawrence 1994). For an input size i = 5 and output size o = 6 in the first case, the number of hidden neurons can be approximated by 5 or 6 (5 was selected). Another way of determining h is to use h = log2(n) (Mirchandani and Cao 1989), where n is the number of training patterns or training data. Feng, Yu, and Kusiak (2006) reviewed more methods for estimating the hidden neurons.
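The two sizing heuristics just mentioned can be sketched directly; the rounding choices (integer halving, ceiling, and a floor of one neuron) are assumptions for illustration:

```python
# Two hidden-neuron sizing heuristics from the text.
import math

def half_sum(n_inputs, n_outputs):
    """Lawrence (1994): roughly half the sum of input and output neurons."""
    return (n_inputs + n_outputs) // 2

def log2_patterns(n_train):
    """log2 of the number of training patterns, rounded up, at least 1."""
    return max(1, math.ceil(math.log2(n_train)))

h1 = half_sum(5, 6)      # first case: i = 5, o = 6 -> 5 (5 or 6 both plausible)
h2 = log2_patterns(64)   # 64 training patterns -> 6
```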

A back-propagation network (BPN) is sensitive to initial values of weights (Kolen and Pollack 1990). Properly selected initial weights through the selection of training tolerances can shorten learning time and result in stable weights. Initial weights that are too small increase the learning time, which may cause difficulties in converging to an optimal solution (Kusiak 2000, p. 365). If initial weights (training tolerances) are too large, the network may get unstable weights (Wasserman 1989). The software package BrainMaker does not give the option to change the weights. Instead, it provides the options


of changing the training and testing tolerances to achieve different weights. BrainMaker has a default training tolerance of 0.10 and a default testing tolerance of 0.40. For example, after some trial runs, the same two tolerances of 0.15 and 0.20 for training and of 0.20 and 0.40 for testing were used in the first case study in both the threefold CV and the fivefold CV of the OHL and THL neural networks to facilitate the comparison.

How to Compare: Hypothesis Tests and Prediction Errors

In this study, hypothesis tests are used first to qualify models. For data following normality, the F-test and t-test (or paired t-test for the same input) are used to test the hypotheses of equal variances and equal means, respectively, of the predictions and the corresponding experimental data. For data with unknown or nonnormal distributions, the counterpart nonparametric tests are used; that is, Levene's test is used to test the hypothesis of equal variances and the Mann-Whitney test is used to test the hypothesis of equal means/medians. Due to space limits, the Mann-Whitney test is abbreviated as MWT in the tables. For details about each hypothesis test, the reader is referred to Montgomery and Runger (2003).
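The qualification tests above can be run with SciPy. The sketch below uses synthetic stand-in arrays, not the paper's measurements; the sample size, noise level, and seed are assumptions. SciPy has no direct two-sample F-test function, so the F-test p-value is computed from the F distribution.

```python
# Sketch of the model-qualification tests: t-test/F-test and their
# nonparametric counterparts (Mann-Whitney, Levene), on synthetic data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
observed = rng.normal(5.0, 1.0, 30)               # stand-in roughness readings
predicted = observed + rng.normal(0.0, 0.3, 30)   # stand-in model predictions

# Parametric route: two-sample t-test (equal means) and F-test (equal variances).
t_stat, p_mean = stats.ttest_ind(observed, predicted)
f = observed.var(ddof=1) / predicted.var(ddof=1)
p_var = 2 * min(stats.f.sf(f, 29, 29), stats.f.cdf(f, 29, 29))  # two-sided

# Nonparametric counterparts: Mann-Whitney (MWT) and Levene's test.
u_stat, p_median = stats.mannwhitneyu(observed, predicted)
w_stat, p_levene = stats.levene(observed, predicted)
```

A model is retained as a candidate when none of the relevant p-values falls below the chosen significance level (0.05 in this paper).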

When more than one model is qualified as a good model, the mean absolute deviation (MAD) or mean absolute error (MAE) and the mean absolute percent error (MAPE) are used to select the best model. Mathematically,

MAD = (1/n) Σ_{i=1..n} |y_i − ŷ_i|   (1)

MAPE = (1/n) Σ_{i=1..n} |(y_i − ŷ_i)/y_i| × 100%   (2)

where y_i is the observed response, ŷ_i is the predicted response, and n is the number of data points used for validation. If a tie exists, then the relevant domain knowledge in the machining surface roughness study is used to break the tie. Refer to Feng, Yu, and Kusiak (2006) for more detailed discussion of the two topics briefed in this section. Feng and Wang (2004, 2003, 2002), Feng and Yu (2003), Feng, Yu, and Kusiak (2006), and Feng, Wang, and Yu (2002) used the above procedures in the selection of predictive regression or neural network models, or both.
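Equations (1) and (2) translate directly into code; a minimal sketch with made-up observed and predicted values:

```python
# Direct implementations of Eq. (1) (MAD) and Eq. (2) (MAPE).
def mad(y, y_hat):
    """Mean absolute deviation, Eq. (1)."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)

def mape(y, y_hat):
    """Mean absolute percent error, Eq. (2); observed values must be nonzero."""
    return 100.0 * sum(abs((a - b) / a) for a, b in zip(y, y_hat)) / len(y)

obs = [2.0, 4.0, 5.0]     # made-up observed responses
pred = [2.5, 3.0, 5.5]    # made-up predictions
# mad(obs, pred) -> 2/3; mape(obs, pred) -> 20.0
```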

Case Study 1: Predictive Modeling of Turning Surface Roughness Data

Table A1 in the Appendix displays the design of the screening and confirmation experiments. The screening experiment is a 2^5 full-factorial design with two replicates. The order of the 64 experiments was randomized, and they were performed on a production-type CNC turning center. The two materials under consideration are steel 8620 and aluminum 6061T. To quantify the material, the Rockwell B hardness measure HRB is used.

After analysis of the screening experiment, a follow-up confirmation experiment was conducted based on a 2^(5-2) fractional factorial design with two replicates. In the confirmation experiment, D = AB and E = AC. The confirmation experiment added two levels of feed, depth of cut, and speed while maintaining the material and cutter nose radius at the same two levels given in the screening experiment. Execution of these 16 confirmation experiments also followed a randomized order. Refer to Box, Hunter, and Hunter (2005), Montgomery (2005), and Wu and Hamada (1998) for more details about design and analysis of experiments. Each of the 80 samples in the experiments was measured three times, each about 120 degrees apart along the axis, by using the Mitutoyo surface profilometer SJ-301, and their averages were taken and used in the neural network modeling. Due to space limits, more details about the experimental design and data, data divisions, and the computational results are referred to Yu (2003).

Threefold vs. Fivefold in One-Hidden-Layer Net

In this study, the two hypothesis tests are used to qualify a model. It follows that if a model's predictions are statistically the same as the respective actual observations in terms of their mean and variance, it is a candidate for further evaluation on the basis of MAD and MAPE. Table 1 summarizes the free parameters of the two best models in OHL nets with threefold and fivefold CV. On the basis of the above procedure and the prediction error statistics in Tables 2 and 3, model III-8 from the threefold CV in Table 2 and model III-8 from the fivefold CV in Table 3 are the respective best NN models.

The model notation is as follows. The first Roman numeral stands for a specific fold. For instance, the numeral III means data from folds 2 and 3 are


Table 1
Summary of Free Parameters of Best Model in OHL Nets

Model             # of Neurons   Training Tolerance   Testing Tolerance
3FCV Model III-8  10             0.20                 0.40
5FCV Model III-8  11             0.15                 0.40

used in training/testing while data from fold 1 are used for validation. The second number stands for the combination number of the three free parameters. For example, the number 8 in model III-8 means each of the three parameters was set to its high level, that is, 10 neurons for hidden neuron size as opposed to 5, 0.20 for training tolerance as opposed to 0.15, and 0.40 for testing tolerance as opposed to 0.20. This is shown in Table 1.

In model selection, the following order of importance of the five new roughness parameters is used, based on the authors' past experience working with industry: Mr2, Rk, Rvk, Rpk, and Mr1, if there is a tie, because this is a multi-attributes decision-making problem. Feng et al. (2002) provides more background about this sequence and the associated weight in use for each parameter.

Hypothesis tests are conducted to compare the best models from threefold CV vs. fivefold CV of OHL networks, and a summary of their P values is presented in Table 4. Minitab (Minitab 2000) was used to perform the hypothesis tests and generate the P values. Table 4 shows that threefold CV and fivefold CV performed statistically the same in OHL networks based on comparison of the best-performing models, because 16 out of the 18 P values are greater than 0.05, except for the two in bold type. Among the two F-tests showing a statistical difference in Table 4, the standard deviation of absolute prediction errors is 9.480 from threefold CV as op-

Table 2
Prediction Error Statistics for 8 of the 24 NN Models of OHL (Threefold CV)
Data sets 2 and 3 for training and testing, and data set 1 for validation.

                 MAD                                               MAPE
Model         Rk(µm)  Rpk(µm)  Rvk(µm)  Mr1(%)  Mr2(%)  Ra(µm)    Rk     Rpk    Rvk    Mr1    Mr2   Ra
Model III-1   2.119   0.549    1.324    2.288   4.092   0.484    43.66  39.34   91.06  25.89  4.67  44.13
Model III-2   1.738   0.679    1.325    3.57    4.465   0.424    29.94  54.04   91.56  45.86  5.06  36.22
Model III-3   1.916   0.544    1.531    2.702   4.493   0.442    38.34  53.35  115.5   29.91  5.13  42.79
Model III-4   1.903   0.633    1.361    3.576   4.722   0.43     36.45  44.39   97.78  45.3   5.34  37.41
Model III-5   2.01    0.689    1.405    3.477   3.803   0.442    39.15  41.82   85.29  40.67  4.34  34.90
Model III-6   1.627   0.645    1.163    2.805   4.134   0.306    29.04  52.98   79.89  35.74  4.73  25.40
Model III-7   1.952   0.55     1.289    2.278   4.765   0.459    44.03  45.95   62.63  25.82  5.52  32.97
Model III-8   2.089   0.659    1.307    3.383   4.547   0.345    40.8   56.41   71.78  42.38  5.23  29.00

Table 2 continued
Prediction Error Statistics for 8 of the 24 NN Models of OHL (Threefold CV)
P values over the parameters Rk, Rpk, Rvk, Mr1, Mr2, and Ra; each model uses the t-test/F-test for the normally distributed parameters and the MWT/Levene's test for the others.

Model III-1: t-test 0.03, 0.024, 0.007, 0.015; F-test 0.815, 0.465, 0.192, 0.699; MWT 0.7035, 0.6907; Levene 0.149, 0.318
Model III-2: t-test 0.437, 0.009, 0.021, 0.123; F-test 0.773, 0.833, 0.355, 0.929; MWT 0.3966, 0.8221; Levene 0.154, 0.985
Model III-3: t-test 0.872, 0, 0.143, 0.033; F-test 0.793, 0.486, 0.762, 0.216; MWT 0.7164, 0.3326; Levene 0.085, 0.013
Model III-4: t-test 0.584, 0.006, 0.051, 0.001, 0.099; F-test 0.907, 0.899, 0.482, 0.521, 0.524; MWT 0.4999; Levene 0.404
Model III-5: t-test 0.766, 0.39, 0.227, 0.017; F-test 0.921, 0.582, 0.811, 0.802; MWT 0.7555, 0.1076; Levene 0.117, 0.807
Model III-6: t-test 0.116, 0.013, 0.869, 0.053, 0.14; F-test 0.813, 0.209, 0.874, 0.232, 0.949; MWT 0.6038; Levene 0.564
Model III-7: t-test 0.82, 0.04, 0.74, 0.17; F-test 0.52, 0.78, 0.12, 0.74; MWT 0.80, 0.70; Levene 0.08, 0.89
Model III-8: t-test 0.94, 0.45, 0.35, 0.84, 0.29; F-test 0.51, 0.99, 0.48, 0.78, 0.56; MWT 0.28; Levene 0.40


posed to 0.559 from fivefold CV for Rk, while the standard deviation of absolute percent error of predictions is 1.463 from threefold CV as opposed to 0.489 from fivefold CV for Rpk. That is, the fivefold CV resulted in smaller variances of errors for Rk and relative errors for Rpk in these two particular cases.

Threefold vs. Fivefold in Two-Hidden-Layer Net

Based on the above procedure and the prediction error statistics in Tables 5 and 6, model III-9 from threefold CV and model III-13 from fivefold CV are the best NN models, respectively. Table 7 provides a summary of the free parameters of the above two best models. Hypothesis tests are conducted to compare the best models from threefold CV vs. fivefold CV of THL networks, and a summary of their P values is presented in Table 8.

Table 8 shows that threefold CV and fivefold CV of THL networks are almost statistically the same based on the comparison of their best-performing models, because only one (in bold type) of the 36 P values is less than 0.05. In this exceptional case, as shown in bold in Table 8 for THL nets, the standard deviation of absolute prediction errors from the threefold CV, designated as model III-9, is 2.748, which is larger than 1.312 from the fivefold CV, designated as model III-13.

Comparison of Best Performing OHL and THL Neural Nets

Hypothesis tests are used to compare the performances of the best model from the OHL net and that from the THL net under the threefold and fivefold CV techniques, respectively. Table 9 shows the P values of

Table 3
Prediction Error Statistics for 8 of the 40 NN Models of OHL (Fivefold CV)
Data sets 1, 2, 4, and 5 for training and testing, and data set 3 for validation.

                 MAD                                               MAPE
Model         Rk(µm)  Rpk(µm)  Rvk(µm)  Mr1(%)  Mr2(%)  Ra(µm)    Rk     Rpk    Rvk    Mr1    Mr2   Ra
Model III-1   2.071   0.565    1.068    3.271   3.718   0.448    52.56  40.15   69.61  52.88  4.19  32.02
Model III-2   1.578   0.745    0.823    3.386   3.880   0.396    45.28  58.34   64.08  55.9   4.37  35.26
Model III-3   1.274   0.618    0.919    3.45    3.508   0.333    28.43  44.31   66.46  68.05  4.07  23.93
Model III-4   1.188   0.647    1.031    2.969   4.085   0.312    35.57  48.76   76.77  42.43  4.67  30.85
Model III-5   1.122   0.617    1.313    3.078   5.812   0.375    31.11  29.12  106.15  43.45  6.67  26.74
Model III-6   2.259   0.59     1.524    3.559   6.455   0.429    41.76  39.93  118.01  66.58  7.33  28.61
Model III-7   1.679   0.723    1.183    3.487   4.621   0.393    42.45  32.23   92.51  42.80  5.32  23.41
Model III-8   1.735   0.735    1.072    3.712   3.935   0.294    41.83  37.58   67.12  50.88  4.62  21.13

Table 3 continued
Prediction Error Statistics for 8 of the 40 NN Models of OHL (Fivefold CV)

P values for the t-test/Mann-Whitney test (means) and F-test/Levene's test (variances), with the nonparametric pair replacing the parametric pair where the data are not normal:

Model III-1: t-test 0.590 0.390 0.253 0.929 0.021 0.557 | F-test 0.112 0.207 0.001 0.120 0.072 0.450
Model III-2: t-test 0.513 0.757 0.013 0.847 0.065 0.625 | F-test 0.039 0.234 0.161 0.198 0.104 0.214
Model III-3: t-test 0.972 0.027 0.288 0.769 0.038, MWT 0.611 | F-test 0.152 0.409 0.134 0.087 0.954, Levene 0.767
Model III-4: t-test 0.565 0.562 0.000 0.628 0.183 0.558 | F-test 0.152 0.094 0.452 0.254 0.369 0.393
Model III-5: t-test 0.064 0.337 0.195, MWT 0.318 0.510 0.418 | F-test 0.436 0.456 0.484, Levene 0.330 0.508 0.470
Model III-6: t-test 0.557 0.628 0.132 0.565 0.027, MWT 0.4856 | F-test 0.324 0.667 0.631 0.356 0.553, Levene 0.754
Model III-7: t-test 0.915 0.494, MWT 0.836 0.194 0.418 0.6647 | F-test 0.457 0.860, Levene 0.273 0.968 0.437 0.796
Model III-8: t-test 0.44 0.08 0.71 0.63 0.59 0.82 | F-test 0.80 0.18 0.36 0.15 0.16 0.87



Table 4
Comparison of Best Models from Threefold and Fivefold CV (OHL)

P values for Rk, Rpk, Rvk, Mr1, Mr2, Ra; left: 2-sample t-test, right: F-test

Prediction:     0.16 0.41 0.76 0.90 0.71 0.75 | 0.34 0.90 0.54 0.71 0.30 0.34
Error:          0.60 0.73 0.47 0.71 0.50 0.53 | 0.01 0.26 0.93 0.16 0.45 0.91
Relative Error: 0.94 0.15 0.85 0.57 0.57 0.37 | 0.28 0.00 0.10 0.97 0.39 0.12
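The error statistics tabulated in Tables 3-6 are the mean absolute deviation (MAD, in the parameter's own units) and the mean absolute percentage error (MAPE, in percent) between measured and NN-predicted values on the validation set. A small sketch with hypothetical Rk values, where the formulas, not the data, are the point:

```python
# MAD and MAPE as used in Tables 3-6; the roughness values are illustrative.
import numpy as np

def mad(measured, predicted):
    """Mean absolute deviation, in the units of the parameter (e.g., μm or %)."""
    return float(np.mean(np.abs(measured - predicted)))

def mape(measured, predicted):
    """Mean absolute percentage error, in percent."""
    return float(100.0 * np.mean(np.abs((measured - predicted) / measured)))

rk_measured = np.array([5.1, 4.2, 6.3, 5.8])   # hypothetical Rk values (μm)
rk_predicted = np.array([4.8, 4.6, 5.9, 6.1])  # hypothetical NN predictions

print(f"MAD  = {mad(rk_measured, rk_predicted):.3f} μm")   # 0.350
print(f"MAPE = {mape(rk_measured, rk_predicted):.2f} %")
```

MAD preserves the physical scale of each parameter, while MAPE lets parameters with very different magnitudes (e.g., Rk in micrometers vs. Mr2 in percent) be compared on a common footing.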

hypothesis tests of equal means and variances for the best models from the OHL and THL nets under threefold CV, while Table 10 shows the P values of those under fivefold CV. Because the P values in the two tables are all greater than 0.05, no statistical evidence can be found to support the hypothesis that OHL and THL neural nets performed differently under the threefold and fivefold CV techniques, respectively. For the reader's information, Figure A1 in the Appendix pairs the prediction errors and relative errors of the four best models for each of the six surface roughness parameters under study. Note that each point on the horizontal axis represents a model number; the models differ because each was derived from different inputs.

Case Study 2: Predictive Modeling of Honing Surface Roughness Data

The factors and levels of the industrial experiment are provided in Table A2 in the Appendix. In compliance with the confidentiality agreement with the project sponsors, the process parameters provided in Table A2 have been scaled. A 2^(6-2) fractional factorial experimental design was used, leading to 16 different combinations of the six honing parameters. In Table A2, RS stands for roughing stone, FS for finishing stone, RT for roughing time, FT for finishing time, RP for roughing pressure, and FP for finishing pressure.

Table 5
Prediction Error Statistics for 16 of the 48 NN Models of THL (Threefold CV)

DATA SETS 2 and 3 for Training and Testing, and DATA SET 1 for Validation

MAD: Rk (μm), Rpk (μm), Rvk (μm), Mr1 (%), Mr2 (%), Ra (μm); MAPE (%): Rk, Rpk, Rvk, Mr1, Mr2, Ra

Model III-9:  1.670 0.730 1.310 3.170 3.820 0.370 | 36.88 52.46 84.81 37.06 4.36 33.04
Model III-10: 1.787 0.719 1.471 3.110 4.564 0.355 | 30.60 52.94 76.51 37.70 5.23 26.36
Model III-11: 1.837 0.564 1.327 2.769 3.026 0.260 | 35.75 38.79 67.51 28.88 3.46 19.93
Model III-12: 1.679 0.591 1.470 3.573 4.272 0.342 | 31.71 37.56 94.79 41.44 4.93 28.23
Model III-13: 1.757 0.716 1.198 2.981 3.846 0.423 | 36.40 55.14 88.34 36.66 4.36 41.06
Model III-14: 1.876 0.900 1.321 3.646 4.196 0.486 | 43.90 79.08 92.21 44.51 4.79 41.43
Model III-15: 1.756 0.740 1.468 3.816 4.943 0.434 | 36.13 60.92 77.71 43.86 5.60 29.99
Model III-16: 1.935 0.695 1.440 3.756 4.692 0.404 | 38.84 62.47 89.15 45.22 5.38 35.19

Table 5 continued
Prediction Error Statistics for 16 of the 48 NN Models of THL (Threefold CV)

P values for the t-test/Mann-Whitney test (means) and F-test/Levene's test (variances):

Model III-9:  t-test 0.110 0.380 0.310 0.480, MWT 0.810 0.440 | F-test 0.840 0.480 0.580 0.400, Levene 0.320 0.630
Model III-10: t-test 0.403 0.419 0.346 0.506, MWT 0.291 0.522 | F-test 0.071 0.386 0.080 0.941, Levene 0.058 0.480
Model III-11: t-test 0.444 0.763 0.078 0.600, MWT 0.691 0.904 | F-test 0.542 0.350 0.308 0.046, Levene 0.756 0.732
Model III-12: t-test 0.227 0.075 0.449 0.197, MWT 0.704 0.522 | F-test 0.454 0.441 0.372 0.747, Levene 0.284 0.454
Model III-13: t-test 0.000 0.264 0.128, MWT 0.986 0.959 0.307 | F-test 0.313 0.458 0.358, Levene 0.030 0.848 0.954
Model III-14: t-test 0.067 0.367 0.184 0.144, MWT 0.478 0.146 | F-test 0.250 0.913 0.959 0.488, Levene 0.023 0.261
Model III-15: t-test 0.360 0.834 0.049 0.042 0.633, MWT 0.986 | F-test 0.565 0.949 0.986 0.895 0.908, Levene 0.150
Model III-16: t-test 0.767 0.051 0.222 0.719 0.180, MWT 0.233 | F-test 0.288 0.471 0.367 0.998 0.630, Levene 0.223

Threefold vs. Fivefold CV in OHL and THL Neural Nets

The same procedures from the previous section are used to generate the best predictive models in each of the four categories. Tables 11 and 12 summarize the P values in comparing the best performing models from threefold CV (3FCV) and fivefold CV (5FCV) applied in OHL and THL neural nets, respectively. For data following the normal distribution, the t-test and F-test were used, while the nonparametric Mann-Whitney test and Levene's test were applied to data that do not follow normality. Except for the mean tests of error and relative error for parameter Rk in Table 11, the P values from the remaining 58 tests are all greater than 0.05, implying that virtually no statistical advantage is revealed by using 5FCV over 3FCV in either OHL or THL nets. This confirms the results from case study 1.

Comparing the Best Performing OHL and THL Nets in 3FCV and 5FCV

Table 13 provides a summary of the free parameters in the best neural models from the honing surface roughness data. Tables 14 and 15 summarize the P values in comparing the best performing models from the OHL and THL nets resulting from threefold CV (3FCV) and fivefold CV (5FCV), respectively. Similarly, for data following the normal distribution, the t-test and F-test were used, while the counterpart

Table 6
Prediction Error Statistics for 8 of the 80 NN Models of THL (Fivefold CV)

DATA SETS 1, 2, 4, and 5 for Training and Testing, and DATA SET 3 for Validation

MAD: Rk (μm), Rpk (μm), Rvk (μm), Mr1 (%), Mr2 (%), Ra (μm); MAPE (%): Rk, Rpk, Rvk, Mr1, Mr2, Ra

Model III-9:  1.42 0.66 0.99 3.25 4.79 0.43 | 33.58 49.10 84.47 62.20 5.60 36.50
Model III-10: 2.31 0.51 0.87 3.88 4.16 0.30 | 59.86 30.40 53.64 53.60 4.90 21.00
Model III-11: 1.79 0.55 1.08 3.08 5.35 0.23 | 46.37 31.30 73.20 51.40 6.30 17.70
Model III-12: 1.27 0.54 0.87 2.72 4.82 0.28 | 30.27 29.30 58.69 40.70 5.60 22.20
Model III-13: 1.12 0.83 0.98 3.48 3.52 0.38 | 33.56 55.16 68.09 42.89 4.05 29.09
Model III-14: 1.75 0.61 1.01 3.43 3.62 0.33 | 47.93 40.70 65.92 54.50 4.30 27.50
Model III-15: 1.90 0.71 1.12 2.84 4.87 0.32 | 47.79 54.40 78.41 40.30 5.70 28.60
Model III-16: 1.86 0.58 1.40 2.89 5.51 0.28 | 47.42 38.60 88.91 38.60 6.30 21.00

Table 6 continued
Prediction Error Statistics for 8 of the 80 NN Models of THL (Fivefold CV)

P values for the t-test/Mann-Whitney test (means) and F-test/Levene's test (variances):

Model III-9:  t-test 0.603 0.073 0.400 0.910 0.059, MWT 0.692 | F-test 0.391 0.705 0.237 0.724 0.834, Levene 0.770
Model III-10: t-test 0.886 0.426 0.188 0.552, MWT 0.194 0.925 | F-test 0.096 0.283 0.606 0.600, Levene 0.787 0.921
Model III-11: t-test 0.313 0.362 0.455 0.547 0.816 0.620 | F-test 0.920 0.371 0.124 0.198 0.619 0.774
Model III-12: t-test 0.869 0.386 0.927 0.335 0.919 0.256 | F-test 0.866 0.482 0.313 0.401 0.586 0.898
Model III-13: t-test 0.050 0.350 0.490 0.770, MWT 0.510 0.670 | F-test 0.310 0.130 0.400 0.510, Levene 0.130 0.220
Model III-14: t-test 0.090 0.455 0.607 0.446 0.330, MWT 0.865 | F-test 0.685 0.018 0.257 0.567 0.995, Levene 0.651
Model III-15: t-test 0.156 0.880 0.213 0.916 0.769, MWT 1.000 | F-test 0.742 0.305 0.174 0.519 0.615, Levene 0.943
Model III-16: t-test 0.208 0.345 0.096 0.131 0.928, MWT 0.585 | F-test 0.834 0.424 0.125 0.368 0.816, Levene 0.956



Table 7
Summary of Free Parameters of the Best Models in THL Nets

Model / # of Neurons in Layer 1 / # of Neurons in Layer 2 / Training Tolerance / Testing Tolerance
3FCV Model III-9:  5 / 5 / 0.15 / 0.40
5FCV Model III-13: 5 / 5 / 0.15 / 0.40

nonparametric Mann-Whitney test and Levene's test were used for data that are not normally distributed. Except for the variance test on the absolute error for Rk and on the absolute relative error for Mr2, as shown in Table 15, the P values from the remaining 58 tests are all greater than 0.05. This implies that virtually no statistical advantage of the THL nets over the OHL nets was revealed under either 3FCV or 5FCV. This confirms the observations made in case study 1.
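The test-selection rule applied throughout both case studies, parametric t-test and F-test for normal data, nonparametric Mann-Whitney and Levene's tests otherwise, can be sketched as follows. The Shapiro-Wilk check and the two-sided F-test construction are assumptions for illustration; the paper does not state which normality test it applied:

```python
# Choose between the parametric pair (t-test, F-test) and the nonparametric
# pair (Mann-Whitney, Levene) based on a normality check, as described above.
import numpy as np
from scipy import stats

def compare_errors(a, b, alpha=0.05):
    """Return (mean-test P value, variance-test P value) for two error samples."""
    normal = stats.shapiro(a).pvalue > alpha and stats.shapiro(b).pvalue > alpha
    if normal:
        p_mean = stats.ttest_ind(a, b, equal_var=False).pvalue
        # Two-sided F-test for equal variances, built from the variance ratio.
        f = np.var(a, ddof=1) / np.var(b, ddof=1)
        dfa, dfb = len(a) - 1, len(b) - 1
        p_var = 2 * min(stats.f.cdf(f, dfa, dfb), stats.f.sf(f, dfa, dfb))
    else:
        p_mean = stats.mannwhitneyu(a, b).pvalue
        p_var = stats.levene(a, b).pvalue
    return p_mean, p_var

rng = np.random.default_rng(1)
p_mean, p_var = compare_errors(rng.normal(size=20), rng.normal(size=20))
print(f"mean test P = {p_mean:.3f}, variance test P = {p_var:.3f}")
```

Both P values are then screened against the 0.05 criterion, exactly as in Tables 11-15.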

Conclusion

This paper compared the use of threefold and fivefold cross-validation techniques in both one-hidden-layer (OHL) and two-hidden-layer (THL) neural networks for predictive modeling of experimental, rather than random, data. Statistical hypothesis tests and prediction error evaluations were employed to select the best models. No significant statistical advantage was revealed in using fivefold CV over threefold CV in either the OHL or THL neural networks for experimental data on turning surface roughness parameters defined in the new ISO 13565 standard. Similarly, no significant statistical advantage was revealed in using THL nets over OHL nets under either the threefold or fivefold CV method for the same experimental data sets.

The proposed procedure for comparing competing v-fold CV methods and different numbers of hidden layers in neural nets is illustrated in Figure 3. Further studies

Table 8
Comparison of Best Models from Threefold and Fivefold CV (THL)

P values for Rk, Rpk, Rvk, Mr1, Mr2, Ra; left: 2-sample t-test, right: F-test

Prediction:     0.94 0.71 0.90 0.87 0.86 0.84 | 0.50 0.17 0.56 0.64 0.52 0.82
Error:          0.39 0.64 0.20 0.75 0.71 0.90 | 0.00 0.18 0.11 0.45 0.85 0.64
Relative Error: 0.83 0.84 0.55 0.61 0.74 0.72 | 0.86 0.89 0.24 0.72 0.77 0.88

Table 9
Comparison of Best Performing OHL and THL Nets in Threefold CV

P values for Rk, Rpk, Rvk, Mr1, Mr2, Ra; left: 2-sample t-test, right: F-test

Prediction:     0.21 0.82 0.47 0.99 0.59 0.89 | 0.84 0.51 0.85 0.99 0.77 0.79
Error:          0.58 0.68 1.00 0.77 0.32 0.71 | 0.81 1.00 0.42 0.42 0.97 0.86
Relative Error: 0.74 0.78 0.57 0.67 0.31 0.65 | 0.22 0.12 0.07 0.48 0.84 0.52

Table 10
Comparison of Best Performing OHL and THL Nets in Fivefold CV

P values for Rk, Rpk, Rvk, Mr1, Mr2, Ra; left: 2-sample t-test, right: F-test

Prediction:     0.74 0.52 0.28 0.79 0.30 0.83 | 0.12 0.51 0.91 0.93 0.57 0.41
Error:          0.22 0.73 0.78 0.84 0.68 0.37 | 0.62 0.85 0.05 0.98 0.40 0.52
Relative Error: 0.63 0.15 0.98 0.63 0.64 0.42 | 0.86 0.09 0.26 0.41 0.25 0.08

Table 11
P Values of Hypothesis Tests in Comparing 3FCV and 5FCV in OHL

P values for Rk, Rpk, Rvk, Mr1, Mr2; left: MWT/t-test, right: Levene's test/F-test

Prediction:     0.205 0.243 0.735 0.359 0.105 | 0.704 0.677 0.560 0.509 0.567
Error:          0.001 0.315 0.547 0.890 0.522 | 0.209 0.503 0.785 0.935 0.960
Relative Error: 0.003 0.116 0.309 0.890 0.474 | 0.017 0.961 0.973 0.530 0.951



Table 12
P Values of Hypothesis Tests in Comparing 3FCV and 5FCV in THL

P values for Rk, Rpk, Rvk, Mr1, Mr2; left: MWT/t-test, right: Levene's test/F-test

Prediction:     0.253 0.429 0.506 0.129 0.930 | 0.691 0.971 0.998 0.713 0.728
Error:          0.490 0.459 0.697 0.407 0.930 | 0.547 0.526 0.808 0.713 0.075
Relative Error: 0.233 0.233 0.309 0.589 0.414 | 0.143 0.450 0.547 0.887 0.049

will be directed to comparing the fivefold and threefold cross-validation techniques in the selection and validation of predictive regression models for data from designed experiments, and to more case studies of predictive neural network modeling on data with different structures from designed experiments.

Acknowledgments

This research was partially funded by the Graduate Research Assistant Sponsored Program (GRASP) award from the Graduate College of Bradley University and a Caterpillar Fellowship Award granted to Dr. Jack Feng. Dr. Feng's former graduate students, Prashant Kapse, Sam Xiao, Grace Hu, and Meraj Khan, measured the surface roughness data at various stages of the projects.

Table 13
Summary of Free Parameters of Best Neural Models from Honing Data

Model / # of Neurons in Layer 1 / # of Neurons in Layer 2 / Training Tolerance / Testing Tolerance
3FCV-OHL: 10 / - / 0.15 / 0.40
3FCV-THL: 10 / 3 / 0.10 / 0.30
5FCV-OHL: 3 / - / 0.15 / 0.30
5FCV-THL: 11 / 11 / 0.10 / 0.40

References

Allen, D.M. (1974). "The relationship between variable selection and data augmentation and a method for prediction." Technometrics (v16, n1), pp125-127.

Bishop, C.M. (1995). Neural Networks for Pattern Recognition. Oxford, UK: Clarendon Press.

Bounds, D.G.; Lloyd, P.J.; Mathew, B.; and Waddell, G. (1988). "A multilayer perceptron network for the diagnosis of low back pain." Proc. of 2nd IEEE Annual Int'l Conf. on Neural Networks, San Diego. Piscataway, NJ: IEEE Press, pp. II.481-II.489.

Box, G.E.P. and Draper, N.R. (1987). Empirical Model-Building and Response Surfaces. New York: John Wiley & Sons.

Box, G.E.P.; Hunter, J.S.; and Hunter, W.G. (2005). Statistics for Experimenters: Design, Innovation, and Discovery, 2nd ed. New York: John Wiley & Sons.

Breiman, L. (1992). "The little bootstrap and other methods for dimensionality selection in regression: X-fixed prediction error." Journal of the American Statistical Association (v87, n419), pp738-754.

Breiman, L. and Spector, P. (1992). "Submodel selection and evaluation in regression: the X-random case." Int'l Statistics Review (v60, n3), pp291-319.

Burke, L. (1993). "Assessing a neural net: validation procedures." PC AI (Mar./Apr.), pp20-24.

Burman, P. (1989). "A comparative study of ordinary cross-validation, v-fold cross-validation and the repeated learning-testing methods." Biometrika (v76, n4), pp503-514.

Burman, P. (1990). "Estimation of optimal transformations using v-fold cross validation and repeated learning-testing methods." Sankhya: The Indian Journal of Statistics (v52, Series A, Part 3), pp314-345.

Chester, D.L. (1990). "Why two hidden layers are better than one." Proc. of 4th IEEE Annual Int'l Conf. on Neural Networks, Washington, DC. Piscataway, NJ: IEEE Press, pp. I.265-I.268.

Coit, D.W.; Jackson, B.T.; and Smith, A.E. (1998). "Static neural network process models: considerations and case studies." Int'l Journal of Production Research (v36, n11), pp2953-2967.

Table 14
P Values of Hypothesis Tests in Comparing OHL and THL Nets in 3FCV

P values for Rk, Rpk, Rvk, Mr1, Mr2; left: MWT/t-test, right: Levene's test/F-test

Prediction:     0.931 0.093 0.640 0.079 0.426 | 0.673 0.936 0.976 0.698 0.523
Error:          0.088 0.368 0.522 0.250 0.592 | 0.867 0.952 0.791 0.592 0.533
Relative Error: 0.151 0.586 0.436 0.307 0.381 | 0.680 0.220 0.993 0.208 0.632

Table 15
P Values of Hypothesis Tests in Comparing OHL and THL Nets in 5FCV

P values for Rk, Rpk, Rvk, Mr1, Mr2; left: MWT/t-test, right: Levene's test/F-test

Prediction:     0.865 0.735 0.610 0.318 0.207 | 0.495 0.705 0.542 0.984 0.473
Error:          0.112 0.274 0.611 0.647 0.176 | 0.034 0.850 0.852 0.508 0.050
Relative Error: 0.250 0.486 0.807 0.985 0.171 | 0.295 0.968 0.789 0.431 0.045



Figure 3
Flowchart of Proposed Procedure

1. Select a CV method.
2. Divide the data into v folds.
3. Design the NN computational experiment.
4. Train and test the NN.
5. Validate the NN.
6. Perform the normality test of the data.
7. Perform the corresponding variance test.
8. Perform the corresponding mean test.
9. If the P values of both the variance and mean tests exceed the criterion, keep the model; otherwise delete the disqualified model.
10. If the experiment is not done, return to step 3.
11. Compute the prediction errors of the qualified models.
12. Select the best model by prediction errors; break the tie by process knowledge if a tie exists.
13. Use hypothesis tests to compare the best models from different CV methods and hidden layers.
14. Report the results.
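The loop at the heart of the Figure 3 procedure can be sketched with plain NumPy. An ordinary least-squares learner stands in for the paper's BrainMaker neural nets (an assumption for illustration), so only the v-fold bookkeeping — split, train on v-1 folds, test on the held-out fold, average the MAD — is the point; the data are synthetic:

```python
# Minimal v-fold cross-validation loop per Figure 3, with a least-squares
# linear model as a stand-in learner and synthetic data.
import numpy as np

def v_fold_mad(X, y, v, rng):
    """Mean MAD over held-out folds under v-fold CV."""
    idx = rng.permutation(len(y))      # shuffle, then split into v folds
    mads = []
    for test_idx in np.array_split(idx, v):
        train_idx = np.setdiff1d(idx, test_idx)
        # Stand-in learner: ordinary least squares (the paper trains NNs here).
        w, *_ = np.linalg.lstsq(X[train_idx], y[train_idx], rcond=None)
        mads.append(np.mean(np.abs(y[test_idx] - X[test_idx] @ w)))
    return float(np.mean(mads))

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=30)

for v in (3, 5):  # threefold vs. fivefold CV
    print(f"{v}-fold CV: mean test MAD = {v_fold_mad(X, y, v, rng):.3f}")
```

Threefold CV trains each candidate on roughly 67% of the data and fivefold on 80%, which is exactly the trade-off in data collection cost the paper examines.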

Cybenko, G. (1989). "Approximation by superpositions of a sigmoidal function." Mathematics of Control, Signals and Systems (v2, n4), pp303-314.

Efron, B. (1983). "Estimating the error rate of a prediction rule: improvement on cross-validation." Journal of the American Statistical Association (v78, n382), pp316-331.

Efron, B. and Tibshirani, R.J. (1998). An Introduction to the Bootstrap. London: Chapman & Hall.

Feng, C-X. and Wang, X-F. (2002). "Subset selection in predictive modeling of CMM digitization uncertainty." Journal of Manufacturing Systems (v21, n6), pp419-439.

Feng, C-X. and Wang, X-F. (2003). "Surface roughness predictive modeling: neural networks versus regression." IIE Trans. (v35, n1), pp11-27.

Feng, C-X. and Wang, X-F. (2004). "Data mining applied to predictive modeling of the knurling process." IIE Trans. (v36, n3), pp253-263.

Feng, C-X. and Yu, Z. (2003). "Neural networks modeling of turning surface roughness parameters defined by ISO 13565." Transactions of NAMRI/SME (v31). Dearborn, MI: Society of Manufacturing Engineers, pp. 467-474 (also published as Technical Paper No. MS03-202).

Feng, C-X.; Wang, X-F.; and Yu, Z. (2002). "Neural networks modeling of honing surface roughness parameters defined by ISO 13565." Journal of Manufacturing Systems (v21, n5), pp395-408.

Feng, C-X.; Yu, Z.; and Kusiak, A. (2006). "Selection and validation of predictive regression and neural networks models for data from designed experiments." IIE Trans. (v38, n1), pp13-24.

Geman, S.; Bienenstock, E.; and Doursat, R. (1992). "Neural networks and the bias/variance dilemma." Neural Computation (v4, n1), pp1-58.

George, E.I. (2000). "The variable selection problem." Journal of the American Statistical Association (v95, n452), pp1304-1307.

Gershenfeld, N. (1999). The Nature of Mathematical Modeling. Cambridge, UK: Cambridge Univ. Press.

Groth, R. (1998). Data Mining: A Hands-On Approach for Business Professionals. Upper Saddle River, NJ: Prentice-Hall.

Hastie, T.; Tibshirani, R.; and Friedman, J. (2001). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. New York: Springer.

Ingrassia, S. and Morlini, I. (2005). "Neural network modeling of small datasets." Technometrics (v47, n3), pp297-311.

ISO (1996). "Geometrical product specifications (GPS) - surface texture: profile method; surfaces having stratified functional properties - part 2: height characterization using the linear material ratio curve." ISO 13565-2, 1st ed. Geneva, Switzerland.

Kolen, J.F. and Pollack, J.B. (1990). "Backpropagation is sensitive to initial conditions." Complex Systems (v4, n3), pp269-280.

Kusiak, A. (2000). Computational Intelligence in Design and Manufacturing. New York: John Wiley & Sons.

Lawrence, J. (1994). Introduction to Neural Networks: Design, Theory, and Applications, 6th ed. Nevada City, CA: California Scientific Software.

Lawrence, J. and Fredrickson, J. (1998). BrainMaker User's Guide and Reference Manual, 7th ed. Nevada City, CA: California Scientific Software.

Li, K-C. (1987). "Asymptotic optimality for Cp, CL, cross-validation and generalized cross-validation: discrete index set." The Annals of Statistics (v15, n3), pp958-975.

Mirchandani, G. and Cao, W. (1989). "On hidden nodes for neural nets." IEEE Trans. on Circuits and Systems (v36, n5), pp661-664.

Minitab (2000). Minitab Release 13 User's Manual. State College, PA: Minitab Inc.

Mitchell, T.M. (1997). Machine Learning. New York: McGraw-Hill.

Montgomery, D.C. (2005). Design and Analysis of Experiments, 6th ed. New York: John Wiley & Sons.

Montgomery, D.C. and Runger, G.C. (2003). Applied Statistics and Probability for Engineers, 3rd ed. New York: John Wiley & Sons.

Shao, J. (1993). "Linear model selection by cross-validation." Journal of the American Statistical Association (v88, n422), pp486-494.

Snee, R.D. (1977). "Validation of regression models: methods and examples." Technometrics (v19, n4), pp415-428.



Stone, M. (1974). "Cross-validatory choice and assessment of statistical predictions (with discussion)." Journal of the Royal Statistical Society, Series B (v36, n2), pp111-147.

Stone, M. (1977). "An asymptotic equivalence of choice of model by cross-validation and Akaike's criterion." Journal of the Royal Statistical Society, Series B (v39, n1), pp44-47.

Swingler, K. (1996). Applying Neural Networks: A Practical Guide. San Francisco: Morgan Kaufmann Publishers.

Tibshirani, R. (1996). "A comparison of some error estimates for neural network models." Neural Computation (v8, n2), pp152-163.

Twomey, J.M. and Smith, A.E. (1998). "Bias and variance of validation methods for function approximation neural networks under conditions of sparse data." IEEE Trans. on Systems, Man, and Cybernetics, Part C (v28, n3), pp417-430.

Wasserman, P.D. (1989). Neural Computing: Theory and Practice. New York: Van Nostrand Reinhold.

Witten, I.H. and Frank, E. (2000). Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. San Francisco: Morgan Kaufmann Publishers.

Wu, C.F.J. and Hamada, M. (1998). Experiments: Planning, Analysis, and Parameter Design Optimization. New York: John Wiley & Sons.

Yu, Z-G. (2003). "Selection and validation of predictive regression and neural networks models for experimental data from machining surface roughness studies." MS thesis. Peoria, IL: Dept. of Industrial & Manufacturing Engineering & Technology, Bradley Univ.

Zhang, P. (1993). "Model selection via multifold cross-validation." Annals of Statistics (v21, n1), pp299-313.

Zhu, H. and Rohwer, R. (1996). "No free lunch for cross-validation." Neural Computation (v8, n6), pp1421-1426.

Authors' Biographies

Chang-Xue Jack Feng is a professor of industrial and manufacturing engineering at Bradley University. He received his PhD and MS degrees in industrial engineering, MS degree in manufacturing engineering, and BS degree in mechanical engineering. Prior to joining Bradley, Dr. Feng was with Penn State's Berks-Lehigh Valley College from 1995-1998. He was also secretary and treasurer of the SME Greater Reading (PA) Chapter. He has published more than 60 technical papers, three books, and three book chapters. He has supervised and/or funded more than 25 graduate students, postdoctoral associates, and visiting scholars since 1995.

Dr. Feng has applied computational tools, including statistics, optimization, computational neural networks, and fuzzy logic, in integrated product and process development, agile/lean manufacturing, and quality and precision engineering. His recent research focuses on data mining applied to production and health care systems and on logistics and assembly engineering. He is a senior member of ASQ, IIE, and SME and a member of ASA and INFORMS.

Zhiguang (Samuel) Yu is a project manager at Supply Chain Services International Inc. in Peoria, IL. He has an MS degree in manufacturing engineering from Bradley University and a BS degree in materials science and engineering from Tsinghua University, Beijing, China.

Unnati Kingi is a manufacturing engineer at CBT Companies Inc. - ESS (Peoria, IL). She has an MS degree in industrial engineering from Bradley University and a BS degree in industrial engineering from Osmania University in India.

Mirza Pervaiz Baig is a design engineer at CBT Companies Inc. - ESS (Peoria, IL). He has an MS degree in manufacturing engineering from Bradley University and a BS degree in mechanical engineering from NED University, Karachi, Pakistan.

Appendix

Table A1
Factors and Levels of Metal Shaft Turning Experiments

Factors: Hardness (A), Feed (B), Nose Radius (C), Depth of Cut (D), Cutting Speed (E)

Screening Experiment
Low (-1):  Steel 8620, HRB 86 | 0.102 mm/rev (0.004 in./rev) | 0.794 mm (0.0313 in.) | 0.508 mm (0.02 in.) | 80 m/min (1000 rpm)
High (+1): AL 6061T, HRB 52 | 0.254 mm/rev (0.010 in./rev) | 6.320 mm (0.2500 in.) | 1.016 mm (0.04 in.) | 120 m/min (1500 rpm)

Confirmation Experiment
Low (-1):  Steel 8620, HRB 86 | 0.051 mm/rev (0.002 in./rev) | 0.794 mm (0.0313 in.) | 0.762 mm (0.03 in.) | 100 m/min (1250 rpm)
High (+1): AL 6061T, HRB 52 | 0.152 mm/rev (0.006 in./rev) | 6.32 mm (0.25 in.) | 1.270 mm (0.05 in.) | 144 m/min (1800 rpm)

Table A2
Factors and Levels of Engine Cylinder Liner Honing Experiments

Factors (units): RS (grit size), FS (grit size), RT (seconds), FT (seconds), RP (MPa), FP (MPa)

Screening Experiment
Low level:  240 | 500 | 9 | 9 | 3.86 | 1.10
High level: 220 | 400 | 7 | 7 | 3.45 | 0.69

Confirmation Experiment
Low level:  260 | 500 | 9 | 9 | 3.86 | 1.10
High level: 240 | 400 | 7 | 7 | 3.45 | 0.69
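The 16-run 2^(6-2) fractional factorial design behind Table A2 can be generated by crossing four basic factors in full factorial and aliasing the remaining two to interactions. The generators E = ABC and F = BCD below are a common textbook choice and purely an assumption; the paper does not state which generators were used:

```python
# Generate a 16-run 2^(6-2) fractional factorial design in coded (-1/+1) units
# for the six honing factors RS, FS, RT, FT, RP, FP. Generators are assumed.
from itertools import product

runs = []
for a, b, c, d in product([-1, 1], repeat=4):  # full 2^4 for RS, FS, RT, FT
    e = a * b * c                               # RP aliased to RS*FS*RT (assumed)
    f = b * c * d                               # FP aliased to FS*RT*FT (assumed)
    runs.append((a, b, c, d, e, f))

print(len(runs))  # 16 combinations of the six honing parameters
```

Each coded row is then mapped to the physical low/high levels of Table A2, so the six-factor experiment needs only 16 runs instead of the 64 a full factorial would require.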



"I I i io

i: i -~ 2 .......................... ..~ . . . . . . . . . . . . . . . . . . . . . E ...................... 2 . . . . . . . . . . . . . . . . . . . . . . ~ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

~ e ~ r e~ ~ ova~t ~ P~m v r ~

2

o~

o

/

(a) Comparison of errors (left) and relative errors (right) of prediction for R,

3

2

1

O,S

0 lS ~ ~

2

1

0

K_ ~ : . ~ - ~ v \j: - v v V

, s l o ~ s a o ~

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . , J

(b) Comparison of errors (left) and relative errors (right) of predict ion for Rpt

,I

l~nr d I~/~ awl (ql.l II~

I - ' ~ ~'~"~I

:'

~ Ehror ~ R~,K ( ~ W t (4 IBnd M e d l ~ )

I ' ' ~ - ~ ~ ~ " ' ~ ' ~ 1

E

(c) Comparison of errors (left) and relative errors (right) of prediction for R,k

Figure AI Comparison of Errors and Relative Errors of Prediction for the Four Best Models and Six Roughness Parameters in Turning

106

Page 15: Threefold vs. fivefold cross validation in one-hidden-layer and two-hidden-layer predictive neural network modeling of machining surface roughness data

J o u r n a l o.]" Manu]~tcturing S y s t e m s Vol. 2 4 / N o . 2

2 0 0 5

: i ....................................................................................................................................................................................................

o ! .............................. ~. ............................. 20___ ............ --L ........................... ~ ........................... ~ ............................

(d) Comparison of errors (left) and relative errors (right) of prediction for M,~

8

/ o

-2 .L . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . i

o14 T

. . . . . . . j

olz

o l _ _ i

5 1 o 1 5 ~ a s

(e) Comparison of errors (left) and relative errors (right) of prediction for Mr2

I v o r d ~ O n m (4 l N ¢ ~ )

1 4 ...........................................................................................................................................................................................

1 !

A ., .~ i~ i

o e

o 1 o 1~ ~a a s

~a2

t " - . . . . . . . . . . I

o . _

o

(f) Comparison of errors (left) and relative errors (right) of prediction for R.

Figure AI continued Comparison of Errors and Relative Errors of Prediction for the Four Best Models and Six Roughness Parameters in Turning

1 0 7