Classification of Wine With Artificial Neural
Network
Celik Ozgur
Electrical and Electronics Engineering, Adana Science and Technology University, Adana
Abstract---This paper deals with the use of a
neural network for the classification of wine.
Classification plays an important role in human
activity. To classify a material, its physical
properties, contents, or other special information
can be used. We implemented a neural network
that can classify wines from three wineries by
thirteen attributes. In this work, a two-layer
feed-forward network with back-propagation (BP)
is used to classify the wine samples. This is an
illustration of a pattern recognition problem in
which the attributes are associated with different
types of wine. Our aim is to train neural networks
to determine which type a wine belongs to when
its attributes are given as input.

Keywords---classification, artificial neural
network, wine classification.
I. INTRODUCTION
Wine is one of the most valuable beverages in the
world and has a wide market all over the world. In
ancient times, the quality and origin of a wine were
determined by wine experts. Nowadays, various
advanced devices have been developed; thanks to these
devices, the type of a wine can be determined easily.
Our assignment is to train neural networks to predict
which type a wine belongs to when its other
attributes are given as input [1]. Wine is a complex product, and
its composition depends on many diverse factors
such as grape variety, edaphoclimatic conditions, and
enological practices [2] [4]. All of these have an
important influence on the quality of the wine, and
they are very important in the characterization and
differentiation of wines, with applications to the
detection of fraud [3] [4]. Several authors have published
works applying different multivariate
techniques to the differentiation and classification of
wines according to their geographical origin, obtaining
good results using different parameters and/or
characteristics such as phenolic compounds, volatile
and aroma compounds, colour parameters, metals,
trace elements, sensorial parameters, etc. [4-15].
Neural networks can solve some really interesting
problems, such as identifying wine type from chemical
properties, once they are trained. They are very good
at pattern recognition problems, and with enough
elements (called neurons) they can classify any data with
arbitrary accuracy [5]. There are many reasons why
neural networks deserve to be investigated:
they are data driven, they are self-adaptive, and they can
approximate any function, linear as well as non-linear
(which is quite important in this case, because the groups
often cannot be separated by linear functions). Neural
networks classify objects rather simply: they take data
as input, derive rules based on those data, and make
decisions [1]. The aim of this paper is to show how
neural networks can be used to solve the problem of wine
classification. The data I used are the results of a
chemical analysis of wines grown in the same region
in Italy, taken from
http://archive.ics.uci.edu/ml/datasets/Wine. The data set
contains 13 different properties used for comparison
and 3 different output classes. The attributes are:
alcohol, malic acid, ash, alcalinity of ash,
magnesium, total phenols, flavanoids,
nonflavanoid phenols, proanthocyanins, color
intensity, hue, OD280/OD315 of diluted wines, and
proline. The data set contains 178 instances, and each instance
has one of 3 possible classes: 1, 2, or 3 [5].
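As a small illustration (the variable names below are mine, not from the original work), the class labels 1-3 can be expanded into one-of-three target rows in MATLAB, matching the 178x3 output matrix used later in Section IV:

    labels  = [1; 3; 2; 1];             % stand-in for the 178 class labels from the UCI file
    targets = full(ind2vec(labels'))';  % one row per instance, a single 1 in the class column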
II. OVERVIEW OF ARTIFICIAL NEURAL NETWORK
Artificial neural networks (ANNs) have been
successfully employed in solving complex problems in
various fields of application including pattern
recognition, identification, classification, speech,
vision, and control systems.
The nodes can be seen as computational units. They
receive inputs and process them to obtain an output.
This processing might be very simple (such as
summing the inputs) or quite complex (a node might
contain another network...). The basic neuron structure
contains inputs, weights, a summing junction, an
activation function, a threshold, and an output.
Fig 1: Basic neuron structure: inputs x1, x2, ..., xn are weighted by wk1, wk2, ..., wkn, combined in a summing junction together with a threshold Øk, and passed through an activation function f(.) to produce the output yk.
The weights are initialized randomly; because of this, we
get different results on each trial. The weights are multiplied
by the inputs and added together in the summing junction,
and the output of the summing junction is then processed by the
activation function, which gives the output.
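As a minimal sketch of this computation in MATLAB (the numeric values are arbitrary placeholders, not taken from the paper):

    x = [0.3; 0.7; 0.1];     % inputs x1..xn
    w = [0.5; -0.2; 0.8];    % weights wk1..wkn (randomly initialized in practice)
    b = 0.1;                 % bias / threshold term
    v = w' * x + b;          % summing junction
    y = 1 / (1 + exp(-v));   % sigmoid activation function gives the output yk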
III. OVERVIEW OF CURRENT METHODS
ANN modeling has been used extensively in the last
decade for spectra modeling, for predicting analyte
concentrations, and for compound classification [4].
Artificial neural networks (ANNs) have supplied
effective solutions to complex problems, and a number
of methods have been proposed for applying them to a
problem and obtaining reasonable results. The method
is chosen according to the problem, so as to supply an
effective solution. Some of these methods are
explained below.
An algorithm called "PCA-CG", developed by Hugo
Guterman in 1990 and published in 1994 [16],
overcomes the difficulties of training large-scale ANN
models by calculating a set of non-random initial
connection weights from the training data. It also
estimates the optimal number (surprisingly small) of
neurons in the hidden layer. Using an algorithm that
enables escape from local minima, it was shown that
the PCA-CG algorithm is capable of very efficient
training [17]. Concurrently, an algorithm was
developed in 1990, and published in 1997, to estimate
the contribution of each input and hidden neuron to the
overall learning. Those with small contributions could
be discarded and the ANN models re-trained with the
reduced input set or a different number of hidden
neurons [20]. Combining these algorithms with
another algorithm that tries to steer clear of local
minima, real-life large-scale ANN models with
hundreds or thousands of inputs and outputs have been
easily trained during the last decade using the
Guterman-Boger algorithm set [18,19].
Gradient descent back-propagation: this algorithm is
one of the most popular training algorithms in the
domain of neural networks. It works by measuring the
output error, calculating the gradient of this error, and
adjusting the ANN weights (and biases) in the
descending gradient direction. Hence, this method is a
gradient descent local search procedure [21]. This
algorithm comes in different versions, such as standard
or incremental back-propagation (IBP), in which the
network weights are updated after presenting each pattern
from the learning data set, rather than once per iteration
[22]; batch back-propagation (BBP), in which the weight
update takes place once per iteration, after all learning
patterns have been processed through the network [23-24];
and quick propagation (QP), a heuristic modification of
the back-propagation algorithm that has proved much
faster than IBP for many problems. QP is also described
as a mixed learning heuristic without momentum, with the
learning rate optimized during training [25].
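As an illustrative sketch, a single incremental (per-pattern) update for one sigmoid output neuron can be written as follows; the variable names and values are assumptions for illustration, not taken from the cited works.

    eta = 0.2;                          % learning rate
    x = [0.3; 0.7; 0.1];                % one input pattern
    w = [0.5; -0.2; 0.8]; b = 0.1;      % current weights and bias
    t = 1;                              % target output for this pattern
    y = 1 / (1 + exp(-(w' * x + b)));   % forward pass through the sigmoid neuron
    e = t - y;                          % output error
    g = e * y * (1 - y);                % local gradient (error times sigmoid derivative)
    w = w + eta * g * x;                % adjust weights in the descending gradient direction
    b = b + eta * g;                    % adjust the bias the same way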
Levenberg-Marquardt back-propagation: the
Levenberg-Marquardt (LM) algorithm approximates
the Newton method and has also been used for ANN
training. The Newton method approximates the error
of the network with a second-order expression, in
contrast to the previous category, which follows a
first-order expression. LM is popular in the ANN
domain (it is even considered the first approach to try
on an unseen MLP training task) [24].
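For reference, the LM weight update commonly given in the literature (for example in Hagan and Menhaj [24]) can be written, in notation that is mine rather than the paper's, as

    \Delta \mathbf{w} = -\left(\mathbf{J}^{\mathsf{T}}\mathbf{J} + \mu\mathbf{I}\right)^{-1}\mathbf{J}^{\mathsf{T}}\mathbf{e}

where J is the Jacobian of the network errors with respect to the weights, e is the error vector, and mu is a damping factor: large mu makes the step approach gradient descent, while small mu makes it approach the Gauss-Newton step.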
Scaled conjugate gradient algorithm: the basic
back-propagation algorithm adjusts the weights in the
steepest descent direction (the most negative of the
gradients), which is the direction in which the
performance function decreases most rapidly. It turns
out that, although the function decreases most rapidly
along the negative of the gradient, this does not
necessarily produce the fastest convergence. In the
conjugate gradient algorithms, a search is performed
along conjugate directions, which generally produces
faster convergence than the steepest descent
direction [28].
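Each new search direction is built from the current gradient and the previous direction; one common form (the Fletcher-Reeves rule, shown here for illustration rather than the exact scaled variant used by SCG) is

    \mathbf{d}_{k+1} = -\mathbf{g}_{k+1} + \beta_k \mathbf{d}_k, \qquad \beta_k = \frac{\lVert\mathbf{g}_{k+1}\rVert^2}{\lVert\mathbf{g}_k\rVert^2}

where g_k is the gradient at step k; SCG additionally scales the step size so that no explicit line search is needed.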
Genetic algorithm: a genetic algorithm (GA) is a
stochastic general search method. It proceeds in an
iterative manner by generating new populations of
individuals from the old ones. The algorithm applies
stochastic operators such as selection, crossover, and
mutation to an initially random population in order to
compute a new population [26]. The search features of
the GA contrast with those of gradient descent and
LM in that it is not trajectory-driven, but
population-driven. The GA is expected to avoid local
optima more often by promoting exploration of the
search space, in opposition to the exploitative tendency
usually attributed to local search algorithms like
gradient descent or LM [27].
IV. ARTIFICIAL NEURAL NETWORK ANALYSES
An artificial neural network with a 178x13 input
matrix and a 178x3 output matrix was created with the
MATLAB Neural Networks Toolbox. The number of
inputs is 13 and the number of outputs is 3; the number
of hidden neurons is specified as 10. Some of the
system parameters are given as epochs = 100, learning
rate = 0.2, training function = back-propagation, and
transfer functions = sigmoid. 70% of the input data is
used to train the system, 15% for validation, and 15%
to test the results. I used a two-layer feed-forward
network with sigmoid hidden and output layers. Some
other training functions will also be used, and the
results will be compared with each other within the scope
of the project. We checked 'Use Bias Neurons'
and chose the sigmoid transfer function (because the
range of our data is 0-1; had it been -1 to 1, we would
have checked 'Tanh'). As the learning rule we chose
'Backpropagation with Gradient'. This learning rule
is used in all the networks we create, because
back-propagation is the most commonly used technique
and the one most suited to this type of problem. In this
method, the objects in the training set are given to the
network one by one in random order, and the weights
are updated each time in order to make the current
prediction error as small as possible. This process
continues until the weights converge [1].
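A minimal sketch of this setup is given below, assuming the standard Neural Network Toolbox pattern-recognition workflow; 'wine_dataset' is the toolbox's built-in copy of the same UCI wine data, and the exact script used in this work is not shown in the paper.

    [x, t] = wine_dataset;              % x: 13x178 attribute matrix, t: 3x178 class targets
    net = patternnet(10);               % two-layer feed-forward network, 10 hidden neurons
    net.divideParam.trainRatio = 0.70;  % 70% of the data for training
    net.divideParam.valRatio   = 0.15;  % 15% for validation
    net.divideParam.testRatio  = 0.15;  % 15% for testing
    net.trainFcn = 'trainlm';           % Levenberg-Marquardt; 'trainscg' for the SCG runs
    [net, tr] = train(net, x, t);       % train the network
    perf = perform(net, t, net(x));     % mean squared error, as reported in Table 1

Switching net.trainFcn between 'trainlm' and 'trainscg', and the hidden layer size between 10 and 20, gives the four configurations compared in Table 1.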
Figure 2: Two-layer feed-forward network with
sigmoid hidden and output layers
Fig 3: Implementation of the neural network
You can see the implementation of the neural network
in the flowchart above. The steps of the implementation
are given clearly. We followed these steps, and the
results were obtained as below.
a) Training attempt 1:
Learning Technique: Supervised Learning
Network: Multilayer feed-forward network
Learning Algorithm: Levenberg-Marquardt
Number of Neurons in Hidden Layer: 10
Performance Function: Mean Squared Error
Transfer Function: Sigmoid
Results:
performance = 0.0029
trainPerformance = 6.8896e-06
valPerformance = 0.0065
testPerformance = 0.0126
Figure 4: Confusion matrix with 10 neurons.
Figure 5: MSE vs. epochs for the first training attempt.
Figure 6: Regression plot.
b) Training attempt 2:
Learning Technique: Supervised Learning
Network: Multilayer feed-forward network
Learning Algorithm: Levenberg-Marquardt
Number of Neurons in Hidden Layer: 20
Performance Function: Mean Squared Error
Transfer Function: Sigmoid
Results:
performance = 0.0041
trainPerformance = 3.4021e-04
valPerformance = 0.0175
testPerformance = 0.0080
Figure 7: Confusion matrix with 20 neurons.
Figure 8: MSE vs. epochs for the second training attempt.
Figure 9: Regression plot.
c) Training attempt 3:
Learning Technique: Supervised Learning
Network: Multilayer feed-forward network
Learning Algorithm: Scaled Conjugate Gradient
Number of Neurons in Hidden Layer: 10
Performance Function: Mean Squared Error
Transfer Function: Sigmoid
Results:
performance = 0.0070
trainPerformance = 0.0031
valPerformance = 0.0049
testPerformance = 0.0266
Figure 10: Confusion matrix with 10 neurons.
Figure 11: MSE vs. epochs for the third training attempt.
Figure 12: Regression plot.
d) Training attempt 4:
Learning Technique: Supervised Learning
Network: Multilayer feed-forward network
Learning Algorithm: Scaled Conjugate Gradient
Number of Neurons in Hidden Layer: 20
Performance Function: Mean Squared Error
Transfer Function: Sigmoid
Results:
performance = 0.1121
trainPerformance = 0.1219
valPerformance = 0.0494
testPerformance = 0.1297
Figure 13: Confusion matrix with 20 neurons.
Figure 14: MSE vs. epochs for the fourth training attempt.
Figure 15: Regression plot.
                  Levenberg-Marquardt  Levenberg-Marquardt  Scaled Conjugate     Scaled Conjugate
                  (10 hidden neurons)  (20 hidden neurons)  Gradient             Gradient
                                                            (10 hidden neurons)  (20 hidden neurons)
performance       0.0029               0.0041               0.0070               0.1121
trainPerformance  6.8896e-06           3.4021e-04           0.0031               0.1219
valPerformance    0.0065               0.0175               0.0049               0.0494
testPerformance   0.0126               0.0080               0.0266               0.1297

Table 1: Results with different learning algorithms and different numbers of hidden neurons.
V. EXPECTED RESULTS
In this paper, the identification of three distinct
Italian wines has been attempted. The training
experiments demonstrated that the number of hidden
neurons can affect the results. The 178x13 input data
set is large enough to make a reliable comparison.
Analyzing the results, the network trained with the
Levenberg-Marquardt learning algorithm and 10 neurons
in the hidden layer performs better than the other
configurations. The weights are initialized randomly;
because of this, we get different results on each trial.
I can also say that the Levenberg-Marquardt learning
algorithm is faster, because we obtained results within
about 10 epochs. Some parameters, such as the transfer
function and the weights, still have to be adjusted. The
desired values were obtained by using the two-layer
feed-forward network with the Levenberg-Marquardt (LM)
method.
References
[1] A. Nikolic, Predicting the Class of Wine with
Neural Networks: an example of a multivariate data
type classification problem using Neuroph, Faculty of
Organisation Sciences, University of Belgrade.
[2] G. Troost, Tecnología del vino, Omega S.A.,
Barcelona, Spain, 1985.
[3] I.S. Arvanitoyannis, M.N. Katsota, E.P. Psarra,
E.H. Soufleros, S. Kallithraka, Trends Food Sci.
Technol. 10 (1999) 321.
[4] S. Pérez-Magariño, M. Ortega-Heras, M.L.
González-San José, Z. Boger, Comparative study of
artificial neural network and multivariate methods to
classify Spanish DO rosé wines, Talanta 62 (2004)
983-990.
[5] M. Stojković, Wine Classification Using Neural
Networks: an example of a multivariate data type
classification problem using the Neuroph framework,
Faculty of Organizational Sciences, University of
Belgrade.
[6] M. Forina, C. Armanino, M. Castino, M. Ubigli,
Vitis 25 (1986) 189.
[7] R.M. Tapias, P. Callao, M.S. Larrechi, J. Guash,
F.X. Rius, Connaiss. Vigne Vin 21 (1987) 43.
[8] M.L. González-Sanjosé, G. Santa-María, C.
Díez, J. Food Comp. Anal. 3 (1990) 54.
[9] I. Moret, G. Scarponi, P. Cescon, J. Agric. Food
Chem. 42 (1994) 1143.
[10] C.M. García-Jares, M.S. García-Martín, N.
Carro-Mariño, R. Cela-Torrijos, J. Sci. Food Agric.
69 (1995) 175.
[11] L. Almela, S. Javaloy, J.A. Fernández-López,
J.M. López-Roca, J. Sci. Food Agric. 70 (1996) 173.
[12] J.M. Baxter, M.E. Crews, J. Dennis, I. Goodall,
D. Anderson, Food Chem. 60 (1997) 443.
[13] M. Ortega-Heras, M.L. González-Sanjosé, S.
Beltrán, Quím. Anal. 18 (1999) 127.
[14] H.K. Sivertsen, B. Holen, F. Nicolaysen, E.
Risvk, J. Sci. Food Agric. 79 (1999) 107.
[15] S. Pérez-Magariño, M.L. González-SanJosé,
Food Sci. Technol. Int. 7 (2001) 237.
[16] H. Guterman, Neural, Parallel Sci. Comput. 2
(1994) 43.
[17] Z. Boger, Instrum. Soc. Am. Trans. 31 (1992) 25.
[18] Z. Boger, Inform. Sci. Appl. 101 (1997) 203.
[19] Z. Boger, in: Proceedings of the International
Joint Conference on Neural Networks, Honolulu,
Hawaii, 2002, pp. 2000–2005.
[20] Z. Boger, H. Guterman, in: Proceedings of the
IEEE International Conference on Systems Man and
Cybernetics, Orlando, FL, USA, 1997, pp. 3030–3035.
[21] D. Rumelhart, G. Hinton, R. Williams, Learning
representations by back-propagating errors, Nature
323 (1986) 533-536.
[22] J.A. Freeman, D.M. Skappura, Neural Networks:
Algorithms, Applications and Programming
Techniques, Addison-Wesley, Houston, 1991, pp. 12-105.
[23] M.T. Hagan, H.B. Demuth, M.H. Beale, Neural
Network Design, PWS, Boston, 1996.
[24] M.T. Hagan, M.B. Menhaj, Training feedforward
networks with the Marquardt algorithm, IEEE Trans.
Neural Netw. 5 (1994) 989-993.
[25] S.E. Fahlman, Faster-learning variations on
back-propagation: an empirical study, in: Proceedings
of the 1988 Connectionist Models Summer School, 1988.
[26] J.H. Holland, Adaptation in Natural and
Artificial Systems, The University of Michigan Press,
Michigan, 1975.
[27] A. Ghaffari, H. Abdollahi, M.R. Khoshayand,
I. Soltani Bozchalooi, A. Dadgar, M. Rafiee-Tehrani,
Performance comparison of neural network training
algorithms in modeling of bimodal drug delivery,
International Journal of Pharmaceutics 327 (2006)
126-138.
[28] H. Adeli, S.L. Hung, Machine Learning: Neural
Networks, Genetic Algorithms, and Fuzzy Systems,
John Wiley & Sons, New York, NY, USA, 1995.