Classification of Wine With Artificial Neural
Network
Celik Ozgur
Electrical and Electronics Engineering, Adana Science and Technology University, Adana
Abstract---This paper deals with the use of a
neural network for the classification of wine.
Classification plays an important role in human
activity. To classify a material, its physical
properties, contents, or other special information
can be used. We implemented a neural network
that can classify wines from three wineries by
thirteen attributes. In this work, a two-layer
feed-forward network with back-propagation (BP)
is used to classify the wine samples. This is an
illustration of a pattern recognition problem in
which the attributes are associated with different
types of wine. Our aim is to train neural networks
to determine which type a wine belongs to when
its attributes are given as input.

Keywords---classification, artificial neural
network, wine classification.
I. INTRODUCTION
Wine is one of the most valuable beverages in the
world and has a wide market all over the world. In
ancient times, the quality and origin of a wine were
determined by wine experts. Nowadays, various
advanced devices have been developed; thanks to these
devices, the type of a wine can be determined easily.
Our assignment is to train neural networks to predict
which type a wine belongs to when its other
attributes are given as input [1]. Wine is a complex product, and
its composition depends on many diverse factors
such as grape variety, edaphoclimatic conditions, and
enological practices [2] [4]. All of these have an
important influence on the quality of the wine, and
they are very important in the characterization and
differentiation of wines, with applications to the
detection of fraud [3] [4]. Several authors have published
works applying different multivariate
techniques to the differentiation and classification of
wines according to their geographical origin, obtaining
good results using different parameters and/or
characteristics such as phenolic compounds, volatile
and aroma compounds, colour parameters, metals,
trace elements, sensorial parameters, etc. [4-15].
Neural networks can solve some really interesting
problems, such as identifying wine type from chemical
properties, once they are trained. They are very good
at pattern recognition problems, and with enough
elements (called neurons) they can classify any data with
arbitrary accuracy [5]. There are many reasons why
neural networks deserve to be investigated:
they are data driven, they are self-adaptive, and they can
approximate any function, linear as well as non-linear
(which is quite important in this case, because the groups
often cannot be separated by linear functions). Neural
networks classify objects rather simply: they take data
as input, derive rules based on those data, and make
decisions [1]. The aim of this paper is to show how
neural networks can be used to solve the problem of wine
classification. The data I used are the results of a
chemical analysis of wines grown in the same region
in Italy, taken from
http://archive.ics.uci.edu/ml/datasets/Wine. The data set
contains 13 different properties used for comparison
and 3 different output classes. The attributes are:
alcohol, malic acid, ash, alcalinity of ash,
magnesium, total phenols, flavanoids,
nonflavanoid phenols, proanthocyanins, color
intensity, hue, OD280/OD315 of diluted wines, and
proline. The data set contains 178 instances, and each instance
has one of 3 possible classes: 1, 2, or 3 [5].
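As a small illustration (the variable names below are mine, not from the original work), the class labels 1-3 can be expanded into one-of-three target rows in MATLAB, matching the 178x3 output matrix used later in Section IV:

    labels  = [1; 3; 2; 1];             % stand-in for the 178 class labels from the UCI file
    targets = full(ind2vec(labels'))';  % one row per instance, a single 1 in the class column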
II. OVERVIEW OF ARTIFICIAL NEURAL NETWORK
Artificial neural networks (ANNs) have been
successfully employed in solving complex problems in
various fields of application including pattern
recognition, identification, classification, speech,
vision, and control systems.
The nodes can be seen as computational units. They
receive inputs and process them to obtain an output.
This processing might be very simple (such as
summing the inputs) or quite complex (a node might
contain another network...). The basic neuron structure
contains inputs, weights, a summing junction, an
activation function, a threshold, and an output.
Fig 1: Basic neuron structure: inputs x1, x2, ..., xn are weighted by wk1, wk2, ..., wkn, combined in a summing junction together with a threshold Øk, and passed through an activation function f(.) to produce the output yk.
The weights are initialized randomly; because of this, we
get different results on each trial. The weights are multiplied
by the inputs and added together in the summing junction,
and the output of the summing junction is then processed by the
activation function, which gives the output.
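As a minimal sketch of this computation in MATLAB (the numeric values are arbitrary placeholders, not taken from the paper):

    x = [0.3; 0.7; 0.1];     % inputs x1..xn
    w = [0.5; -0.2; 0.8];    % weights wk1..wkn (randomly initialized in practice)
    b = 0.1;                 % bias / threshold term
    v = w' * x + b;          % summing junction
    y = 1 / (1 + exp(-v));   % sigmoid activation function gives the output yk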
III. OVERVIEW OF CURRENT METHODS
ANN modeling has been used extensively in the last
decade for spectra modeling, for predicting analyte
concentrations, and for compound classification [4].
Artificial neural networks (ANNs) have supplied
effective solutions to complex problems, and a number
of methods have been proposed for applying them to a
problem and obtaining reasonable results. The method
is chosen according to the problem, so as to supply an
effective solution. Some of these methods are
explained below.
An algorithm called "PCA-CG", developed by Hugo
Guterman in 1990 and published in 1994 [16],
overcomes the difficulties of training large-scale ANN
models by calculating a set of non-random initial
connection weights from the training data. It also
estimates the optimal number (surprisingly small) of
neurons in the hidden layer. Using an algorithm that
enables escape from local minima, it was shown that
the PCA-CG algorithm is capable of very efficient
training [17]. Concurrently, an algorithm was
developed in 1990, and published in 1997, to estimate
the contribution of each input and hidden neuron to the
overall learning. Those with small contributions could
be discarded and the ANN models re-trained with the
reduced input set or a different number of hidden
neurons [20]. Combining these algorithms with
another algorithm that tries to steer clear of local
minima, real-life large-scale ANN models with
hundreds or thousands of inputs and outputs have been
easily trained during the last decade using the
Guterman-Boger algorithm set [18,19].
Gradient descent back-propagation: this algorithm is
one of the most popular training algorithms in the
domain of neural networks. It works by measuring the
output error, calculating the gradient of this error, and
adjusting the ANN weights (and biases) in the
descending gradient direction. Hence, this method is a
gradient descent local search procedure [21]. This
algorithm comes in different versions, such as standard
or incremental back-propagation (IBP), in which the
network weights are updated after presenting each pattern
from the learning data set, rather than once per iteration
[22]; batch back-propagation (BBP), in which the weight
update takes place once per iteration, after all learning
patterns have been processed through the network [23-24];
and quick propagation (QP), a heuristic modification of
the back-propagation algorithm that has proved much
faster than IBP for many problems. QP is also described
as a mixed learning heuristic without momentum, with the
learning rate optimized during training [25].
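As an illustrative sketch, a single incremental (per-pattern) update for one sigmoid output neuron can be written as follows; the variable names and values are assumptions for illustration, not taken from the cited works.

    eta = 0.2;                          % learning rate
    x = [0.3; 0.7; 0.1];                % one input pattern
    w = [0.5; -0.2; 0.8]; b = 0.1;      % current weights and bias
    t = 1;                              % target output for this pattern
    y = 1 / (1 + exp(-(w' * x + b)));   % forward pass through the sigmoid neuron
    e = t - y;                          % output error
    g = e * y * (1 - y);                % local gradient (error times sigmoid derivative)
    w = w + eta * g * x;                % adjust weights in the descending gradient direction
    b = b + eta * g;                    % adjust the bias the same way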
Levenberg-Marquardt back-propagation: the
Levenberg-Marquardt (LM) algorithm approximates
the Newton method and has also been used for ANN
training. The Newton method approximates the error
of the network with a second-order expression, in
contrast to the previous category, which follows a
first-order expression. LM is popular in the ANN
domain (it is even considered the first approach to try
on an unseen MLP training task) [24].
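For reference, the LM weight update commonly given in the literature (for example in Hagan and Menhaj [24]) can be written, in notation that is mine rather than the paper's, as

    \Delta \mathbf{w} = -\left(\mathbf{J}^{\mathsf{T}}\mathbf{J} + \mu\mathbf{I}\right)^{-1}\mathbf{J}^{\mathsf{T}}\mathbf{e}

where J is the Jacobian of the network errors with respect to the weights, e is the error vector, and mu is a damping factor: large mu makes the step approach gradient descent, while small mu makes it approach the Gauss-Newton step.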
Scaled conjugate gradient algorithm: the basic
back-propagation algorithm adjusts the weights in the
steepest descent direction (the most negative of the
gradients), which is the direction in which the
performance function decreases most rapidly. It turns
out that, although the function decreases most rapidly
along the negative of the gradient, this does not
necessarily produce the fastest convergence. In the
conjugate gradient algorithms, a search is performed
along conjugate directions, which generally produces
faster convergence than the steepest descent
direction [28].
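Each new search direction is built from the current gradient and the previous direction; one common form (the Fletcher-Reeves rule, shown here for illustration rather than the exact scaled variant used by SCG) is

    \mathbf{d}_{k+1} = -\mathbf{g}_{k+1} + \beta_k \mathbf{d}_k, \qquad \beta_k = \frac{\lVert\mathbf{g}_{k+1}\rVert^2}{\lVert\mathbf{g}_k\rVert^2}

where g_k is the gradient at step k; SCG additionally scales the step size so that no explicit line search is needed.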
Genetic algorithm: a genetic algorithm (GA) is a
stochastic general search method. It proceeds in an
iterative manner by generating new populations of
individuals from the old ones. The algorithm applies
stochastic operators such as selection, crossover, and
mutation to an initially random population in order to
compute a new population [26]. The search features of
the GA contrast with those of gradient descent and
LM in that it is not trajectory-driven, but
population-driven. The GA is expected to avoid local
optima more often by promoting exploration of the
search space, in opposition to the exploitative tendency
usually attributed to local search algorithms like
gradient descent or LM [27].
IV. ARTIFICIAL NEURAL NETWORK ANALYSES
An artificial neural network with a 178x13 input
matrix and a 178x3 output matrix was created with the
MATLAB Neural Networks Toolbox. The number of
inputs is 13 and the number of outputs is 3; the number
of hidden neurons is specified as 10. Some of the
system parameters are given as epochs = 100, learning
rate = 0.2, training function = back-propagation, and
transfer functions = sigmoid. 70% of the input data is
used to train the system, 15% for validation, and 15%
to test the results. I used a two-layer feed-forward
network with sigmoid hidden and output layers. Some
other training functions will also be used, and the
results will be compared with each other within the scope
of the project. We checked 'Use Bias Neurons'
and chose the sigmoid transfer function (because the
range of our data is 0-1; had it been -1 to 1, we would
have checked 'Tanh'). As the learning rule we chose
'Backpropagation with Gradient'. This learning rule
is used in all the networks we create, because
back-propagation is the most commonly used technique
and the one most suited to this type of problem. In this
method, the objects in the training set are given to the
network one by one in random order, and the weights
are updated each time in order to make the current
prediction error as small as possible. This process
continues until the weights converge [1].
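A minimal sketch of this setup is given below, assuming the standard Neural Network Toolbox pattern-recognition workflow; 'wine_dataset' is the toolbox's built-in copy of the same UCI wine data, and the exact script used in this work is not shown in the paper.

    [x, t] = wine_dataset;              % x: 13x178 attribute matrix, t: 3x178 class targets
    net = patternnet(10);               % two-layer feed-forward network, 10 hidden neurons
    net.divideParam.trainRatio = 0.70;  % 70% of the data for training
    net.divideParam.valRatio   = 0.15;  % 15% for validation
    net.divideParam.testRatio  = 0.15;  % 15% for testing
    net.trainFcn = 'trainlm';           % Levenberg-Marquardt; 'trainscg' for the SCG runs
    [net, tr] = train(net, x, t);       % train the network
    perf = perform(net, t, net(x));     % mean squared error, as reported in Table 1

Switching net.trainFcn between 'trainlm' and 'trainscg', and the hidden layer size between 10 and 20, gives the four configurations compared in Table 1.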
Figure 2: Two-layer feed-forward network with
sigmoid hidden and output layers
Fig 3: Implementation of the neural network
You can see the implementation of the neural network
in the flowchart above. The steps of the implementation
are given clearly. We followed these steps, and the
results were obtained as below.
a) Training attempt 1:
Learning Technique: Supervised Learning
Network: Multilayer feed-forward network
Learning Algorithm: Levenberg-Marquardt
Number of Neurons in Hidden Layer: 10
Performance Function: Mean Squared Error
Transfer Function: Sigmoid
Results:
performance = 0.0029
trainPerformance = 6.8896e-06
valPerformance = 0.0065
testPerformance = 0.0126
Figure 4: Confusion matrix with 10 neurons.
Figure 5: MSE vs. epochs for the first training attempt.
Figure 6: Regression plot.
b) Training attempt 2:
Learning Technique: Supervised Learning
Network: Multilayer feed-forward network
Learning Algorithm: Levenberg-Marquardt
Number of Neurons in Hidden Layer: 20
Performance Function: Mean Squared Error
Transfer Function: Sigmoid
Results:
performance = 0.0041
trainPerformance = 3.4021e-04
valPerformance = 0.0175
testPerformance = 0.0080
Figure 7: Confusion matrix with 20 neurons.
Figure 8: MSE vs. epochs for the second training attempt.
Figure 9: Regression plot.
c) Training attempt 3:
Learning Technique: Supervised Learning
Network: Multilayer feed-forward network
Learning Algorithm: Scaled Conjugate Gradient
Number of Neurons in Hidden Layer: 10
Performance Function: Mean Squared Error
Transfer Function: Sigmoid
Results:
performance = 0.0070
trainPerformance = 0.0031
valPerformance = 0.0049
testPerformance = 0.0266
Figure 10: Confusion matrix with 10 neurons.
Figure 11: MSE vs. epochs for the third training attempt.
Figure 12: Regression plot.
d) Training attempt 4:
Learning Technique: Supervised Learning
Network: Multilayer feed-forward network
Learning Algorithm: Scaled Conjugate Gradient
Number of Neurons in Hidden Layer: 20
Performance Function: Mean Squared Error
Transfer Function: Sigmoid
Results:
performance = 0.1121
trainPerformance = 0.1219
valPerformance = 0.0494
testPerformance = 0.1297
Figure 13: Confusion matrix with 20 neurons.
Figure 14: MSE vs. epochs for the fourth training attempt.
Figure 15: Regression plot.
                  Levenberg-Marquardt  Levenberg-Marquardt  Scaled Conjugate     Scaled Conjugate
                  (10 hidden neurons)  (20 hidden neurons)  Gradient             Gradient
                                                            (10 hidden neurons)  (20 hidden neurons)
performance       0.0029               0.0041               0.0070               0.1121
trainPerformance  6.8896e-06           3.4021e-04           0.0031               0.1219
valPerformance    0.0065               0.0175               0.0049               0.0494
testPerformance   0.0126               0.0080               0.0266               0.1297

Table 1: Results with different learning algorithms and different numbers of hidden neurons.
V. EXPECTED RESULTS
In this paper, the identification of three distinct
Italian wines has been attempted. The training
experiments demonstrated that the number of hidden
neurons can affect the results. The 178x13 input data
set is large enough to make a reliable comparison.
Analyzing the results, the network trained with the
Levenberg-Marquardt learning algorithm and 10 neurons
in the hidden layer performs better than the other
configurations. The weights are initialized randomly;
because of this, we get different results on each trial.
I can also say that the Levenberg-Marquardt learning
algorithm is faster, because we obtained results within
about 10 epochs. Some parameters, such as the transfer
function and the weights, still have to be adjusted. The
desired values were obtained by using the two-layer
feed-forward network with the Levenberg-Marquardt (LM)
method.
References
[1] A. Nikolic, Predicting the Class of Wine with
Neural Networks: an example of a multivariate data
type classification problem using Neuroph, Faculty of
Organisation Sciences, University of Belgrade.
[2] G. Troost, Tecnología del vino, Omega S.A.,
Barcelona, Spain, 1985.
[3] I.S. Arvanitoyannis, M.N. Katsota, E.P. Psarra,
E.H. Soufleros, S. Kallithraka, Trends Food Sci.
Technol. 10 (1999) 321.
[4] S. Pérez-Magariño, M. Ortega-Heras, M.L.
González-San José, Z. Boger, Comparative study of
artificial neural network and multivariate methods to
classify Spanish DO rosé wines, Talanta 62 (2004)
983-990.
[5] M. Stojković, Wine Classification Using Neural
Networks: an example of a multivariate data type
classification problem using the Neuroph framework,
Faculty of Organizational Sciences, University of
Belgrade.
[6] M. Forina, C. Armanino, M. Castino, M. Ubigli,
Vitis 25 (1986) 189.
[7] R.M. Tapias, P. Callao, M.S. Larrechi, J. Guash,
F.X. Rius, Connaiss. Vigne Vin 21 (1987) 43.
[8] M.L. González-Sanjosé, G. Santa-María, C.
Díez, J. Food Comp. Anal. 3 (1990) 54.
[9] I. Moret, G. Scarponi, P. Cescon, J. Agric. Food
Chem. 42 (1994) 1143.
[10] C.M. García-Jares, M.S. García-Martín, N.
Carro-Mariño, R. Cela-Torrijos, J. Sci. Food Agric.
69 (1995) 175.
[11] L. Almela, S. Javaloy, J.A. Fernández-López,
J.M. López-Roca, J. Sci. Food Agric. 70 (1996) 173.
[12] J.M. Baxter, M.E. Crews, J. Dennis, I. Goodall,
D. Anderson, Food Chem. 60 (1997) 443.
[13] M. Ortega-Heras, M.L. González-Sanjosé, S.
Beltrán, Quím. Anal. 18 (1999) 127.
[14] H.K. Sivertsen, B. Holen, F. Nicolaysen, E.
Risvk, J. Sci. Food Agric. 79 (1999) 107.
[15] S. Pérez-Magariño, M.L. González-SanJosé,
Food Sci. Technol. Int. 7 (2001) 237.
[16] H. Guterman, Neural, Parallel Sci. Comput. 2
(1994) 43.
[17] Z. Boger, Instrum. Soc. Am. Trans. 31 (1992) 25.
[18] Z. Boger, Inform. Sci. Appl. 101 (1997) 203.
[19] Z. Boger, in: Proceedings of the International
Joint Conference on Neural Networks, Honolulu,
Hawaii, 2002, pp. 2000–2005.
[20] Z. Boger, H. Guterman, in: Proceedings of the
IEEE International Conference on Systems Man and
Cybernetics, Orlando, FL, USA, 1997, pp. 3030–3035.
[21] D. Rumelhart, G. Hinton, R. Williams, Learning
representations by back-propagating errors, Nature
323 (1986) 533-536.
[22] J.A. Freeman, D.M. Skappura, Neural Networks:
Algorithms, Applications and Programming
Techniques, Addison-Wesley, Houston, 1991, pp. 12-105.
[23] M.T. Hagan, H.B. Demuth, M.H. Beale, Neural
Network Design, PWS, Boston, 1996.
[24] M.T. Hagan, M.B. Menhaj, Training feedforward
networks with the Marquardt algorithm, IEEE Trans.
Neural Netw. 5 (1994) 989-993.
[25] S.E. Fahlman, Faster-learning variations on
back-propagation: an empirical study, in: Proceedings
of the 1988 Connectionist Models Summer School, 1988.
[26] J.H. Holland, Adaptation in Natural and
Artificial Systems, The University of Michigan Press,
Michigan, 1975.
[27] A. Ghaffari, H. Abdollahi, M.R. Khoshayand,
I. Soltani Bozchalooi, A. Dadgar, M. Rafiee-Tehrani,
Performance comparison of neural network training
algorithms in modeling of bimodal drug delivery,
International Journal of Pharmaceutics 327 (2006)
126-138.
[28] H. Adeli, S.L. Hung, Machine Learning: Neural
Networks, Genetic Algorithms, and Fuzzy Systems,
John Wiley & Sons, New York, NY, USA, 1995.