

National Conference on Advances in Computing ( NCAC’13 ), 05-06 March 2013

Wine Fermentation: By Using Data Mining Technique

Abstract

Data mining techniques have recently been applied in the field of agriculture. In this paper, we consider the problem of discovering problematic wine fermentations at the early stages of the process by using sensor data. Problematic fermentations can cause losses to wine makers, because such fermentations may be too slow to provide the final product, or may even become stagnant.

Keywords: Wine fermentation, Clustering, Biclustering, Data mining.

 __________________________________________________________________________________________________

1. Introduction

Wine is widely produced around the world, and the industrial production of wine is an important business in many countries. For this reason, the study of the fermentation process, which transforms grape juice into an alcoholic beverage, is of increasing interest in the field of agriculture. Problematic fermentations may cause losses to the industry. If a fermentation process is slower than usual, for example, the final product takes longer to produce. Moreover, in the worst case, when the fermentation process gets stuck, part of the production could be completely spoiled.

Data mining is a field of operations research that analyzes large databases with the aim of acquiring novel knowledge. In recent years, data mining techniques have been applied specifically to agricultural problems in order to find important information about the problem under study. In the case of wine fermentations, a database of compound measurements, taken at different times during the fermentation process, can be exploited to extract information that helps predict problematic fermentations [1].

2. Wine fermentations

There exist different types of wine, which depend on different factors, and especially on the origin of the grapes employed in the production. A common point for all wines is the fermentation process, in which the sugar contained in the grapes is transformed into alcohol. This is a very delicate process. When wine is produced industrially, large quantities may be spoiled by a problematic fermentation, causing losses to the industry. To overcome this issue, a prediction of problematic wine fermentations can be attempted, so that an enologist can intervene in the process in time to guarantee a good fermentation.

In order to monitor wine fermentation processes, metabolites such as glucose, fructose, organic acids, glycerol and ethanol can be measured, and the data obtained during the fermentation process can be analyzed to obtain useful information. However, analyses are usually limited to data obtained within the first 3 days of fermentation, so that a possible problematic fermentation can be detected at the beginning of the process.

Fermentations can be divided into 3 classes: the first class contains normal fermentations, while the second and third classes contain the problematic ones. The second class contains slow fermentations, which can bring the wine to the end of production, but in an amount of time that is longer than usual. The third class contains stuck fermentations, i.e. fermentations that stop at a certain moment and are not able to give the final product.
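As a minimal sketch of the setup described above, the following Python fragment keeps only the measurements taken during the first 3 days of a fermentation and attaches one of the three class labels. The record layout, the field names and the sample values are illustrative assumptions, not data from the study.

```python
from dataclasses import dataclass
from enum import Enum


class FermentationClass(Enum):
    """The three classes of fermentation described in the text."""
    NORMAL = 1
    SLOW = 2
    STUCK = 3


@dataclass
class Measurement:
    hours: float   # time since the start of fermentation (assumed unit)
    compound: str  # e.g. "glucose", "ethanol"
    value: float   # measured quantity (invented numbers, for illustration only)


EARLY_WINDOW_HOURS = 3 * 24  # the first 3 days of fermentation


def early_measurements(measurements):
    """Keep only the measurements taken within the first 3 days."""
    return [m for m in measurements if m.hours <= EARLY_WINDOW_HOURS]


# Hypothetical record for one fermentation.
readings = [
    Measurement(12, "glucose", 95.0),
    Measurement(48, "ethanol", 20.0),
    Measurement(72, "glucose", 60.0),
    Measurement(120, "ethanol", 55.0),  # outside the early window, dropped
]
label = FermentationClass.SLOW  # known only for training fermentations

early = early_measurements(readings)
print(len(early), label.name)
```

Only the early readings would be passed on to the clustering or biclustering step; the label is what those techniques try to predict for new fermentations.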

School of Computer Sciences, North Maharashtra University, Jalgaon, India

Mr. R.D. Magare Mr. D.M. Rana Ms. V.R. Jadhav Mr. G.S. Ragde

Milind College Of Science Milind College Of Science Milind College Of Science Auranagabad [M.S]

Auranagabad [M.S],India Auranagabad [M.S],India Auranagabad [M.S],India

[email protected]  [email protected]  vab23jan2000rediffmail.com rajiv_magare@yaho


We present an analysis performed on datasets of wine fermentations with the aim of predicting problematic fermentations at the early stages of the process. Clustering and supervised biclustering techniques are employed to find solutions to this problem.

A clustering technique can define clusters that are related to normal or problematic fermentations by exploiting the inherent characteristics of the data. In this way, a group of clusters can be defined for each fermentation. Fermentations that share the same group most likely share the same kind of characteristics. Depending on the percentage of normal, slow and stuck fermentations contained in the groups of clusters found, a score can be assigned to any other fermentation that happens to be in the same group and for which a classification is not known. In these studies, the k-means algorithm was employed to find clusters of data points, where the number of clusters k was set arbitrarily.

In this work, we use the training set to select the features that allow correct classifications of the fermentations. To this aim, we search for consistent biclusterings of the training set, which associate subgroups of features with subgroups of samples of the dataset (each sample represents one fermentation) [3].

In order to obtain a consistent biclustering of the training set, some features are removed from the set. This is done by solving a combinatorial optimization problem. Once a consistent biclustering is found from a training set, the corresponding relationship between samples and features can be exploited for classification purposes. Given a testing set related to the same problem, the classification of its samples is, by definition, not known. However, the classification of its features is known, because it is exactly the same as that of the training set, and this information can therefore be exploited to reconstruct the classification of the samples of the testing set [1].
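The classification step described above can be sketched in Python. This is an illustration under stated assumptions, not the authors' implementation: each feature is assigned to the class in which its average value over the training samples is largest, each sample is assigned to the class whose features have the largest average value in that sample, and the small matrices are invented for the example. The feature-selection step (the combinatorial optimization problem) is not shown.

```python
def class_means(rows, classes, k):
    """For each row, average its entries separately over each of the k classes."""
    means = []
    for row in rows:
        sums = [0.0] * k
        counts = [0] * k
        for value, c in zip(row, classes):
            sums[c] += value
            counts[c] += 1
        # A class with no members can never be the best choice.
        means.append([s / n if n else float("-inf") for s, n in zip(sums, counts)])
    return means


def argmax_classes(means):
    """Pick, for each row of means, the class with the largest mean."""
    return [max(range(len(m)), key=m.__getitem__) for m in means]


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


def feature_classes(data, sample_classes, k):
    """Classify features (rows) from the known classes of the samples (columns)."""
    return argmax_classes(class_means(data, sample_classes, k))


def classify_samples(data, feat_classes, k):
    """Classify samples (columns) from the known classes of the features (rows)."""
    return argmax_classes(class_means(transpose(data), feat_classes, k))


# Invented training set: 4 features (rows) x 4 samples (columns), 2 classes.
train = [
    [9, 8, 1, 2],
    [7, 9, 2, 1],
    [1, 2, 8, 9],
    [2, 1, 9, 7],
]
sample_classes = [0, 0, 1, 1]

feat = feature_classes(train, sample_classes, k=2)
recovered = classify_samples(train, feat, k=2)
consistent = recovered == sample_classes  # the biclustering reproduces the labels

# Invented testing set with the same 4 features and 2 unlabeled samples.
test = [
    [8, 2],
    [9, 1],
    [1, 7],
    [2, 9],
]
predicted = classify_samples(test, feat, k=2)
print(feat, consistent, predicted)
```

If `recovered` differed from `sample_classes`, the biclustering would not be consistent, and features would have to be removed before the feature classes could be trusted to classify the testing set.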

3. Computing

Table 1 shows some experiments in which the combinatorial optimization problem has been solved in order to find α-consistent biclusterings of A. f(x) is the objective function of this optimization problem, which counts the selected features and must be maximized. As expected, f(x) decreases when the parameter α increases, because features subject to noise or to an error larger than α are supposed to be removed from the set. err is the number of misclassifications on the testing set when its samples are classified according to the α-consistent biclusterings found. The biclustering with the largest α value is able to predict correctly 4 out of 8 samples. The corresponding biclustering related to the testing set is in fact not consistent.
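To make the role of the consistency parameter concrete, the optimization problem can be written out as follows. This is a sketch based on the fractional 0-1 programming formulation of [3], under assumptions that are not stated in the text above: A = (a_ij) has m features (rows) and n samples (columns), f_ir is 1 when feature i is assigned to class r and 0 otherwise, x_i is 1 when feature i is kept and 0 when it is removed, S_r is the set of training samples in class r, and a single threshold α ≥ 0 is used for all samples.

```latex
\max_{x \in \{0,1\}^m} \; f(x) = \sum_{i=1}^{m} x_i
\quad \text{subject to} \quad
\frac{\sum_{i=1}^{m} a_{ij} f_{i\hat{r}} x_i}{\sum_{i=1}^{m} f_{i\hat{r}} x_i}
> \alpha +
\frac{\sum_{i=1}^{m} a_{ij} f_{i\xi} x_i}{\sum_{i=1}^{m} f_{i\xi} x_i}
\qquad \forall\, \hat{r},\ \forall\, \xi \neq \hat{r},\ \forall\, j \in S_{\hat{r}} .
```

Under this reading, α = 0 gives plain consistency, and raising α only tightens the constraints, so the optimal value of f(x) cannot increase. This agrees with the trend reported for the experiments.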

Future work will mainly proceed in the following two directions. First, larger datasets of wine fermentations need to be considered in order to obtain better results. The fact that the considered testing set contains information which is not included in the training set suggests that the training set does not contain all the information necessary for a correct definition of the biclusterings. Since industrial data are usually difficult to obtain, one possibility is to produce these data in the laboratory, where small quantities of wine are fermented in a controlled environment. Moreover, we also plan to work on the formalization of the strategy that we proposed in this paper for validating the obtained classifications [4].

4. Data Mining Process

The data mining process consists of three major steps. Of course, it all starts with a big pile of data. The first processing step is data preparation, often referred to as “scrubbing the data.” Data is selected, cleaned, and preprocessed under the guidance and knowledge of a domain expert. Preparing the data is the most time-consuming part of the data mining process. This step can be streamlined in part if the data is already in a database, data warehouse, or digital library, although mining data across different databases, for example, is still a challenge.

In the second step, once the data is collected and preprocessed, a data mining algorithm processes the prepared data, compressing and transforming it to make it easy to identify any latent valuable nuggets of information. This is the actual sifting process. Many techniques have been used to perform the common data mining activities of associations, clustering, classification, modeling, sequential patterns, and time series forecasting. These techniques range from statistics to rough sets to neural networks.

The third step is the data analysis phase, where the data mining output is evaluated to see if additional domain knowledge was discovered and to determine the relative importance of the facts generated by the mining algorithms.

In this final analysis step, the output is in some cases in a form that makes it very easy to discern the valuable nuggets of information from the trivial or uninteresting facts. When the relationships are represented in if-then rule form and the rules are recast into textual form, the valuable information is much easier to identify. In other cases, however, the results have to be analyzed either visually or through another level of tools to classify the nuggets according to the predicted value [2].
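The three steps can be sketched end to end in a few lines of Python. This is a toy illustration under stated assumptions: the field names `ethanol_day3` and `problematic` and all values are invented, and a single-threshold rule search stands in for a real data mining algorithm.

```python
def prepare(records):
    """Step 1, data preparation: drop records that have missing values."""
    return [r for r in records if None not in r.values()]


def mine(records, feature, target):
    """Step 2, mining: find the threshold on `feature` that best predicts `target`."""
    best_threshold, best_correct = None, -1
    for threshold in sorted({r[feature] for r in records}):
        # Count the records for which "feature <= threshold" matches the target.
        correct = sum((r[feature] <= threshold) == r[target] for r in records)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return {
        "feature": feature,
        "threshold": best_threshold,
        "target": target,
        "accuracy": best_correct / len(records),
    }


def analyze(rule):
    """Step 3, analysis: recast the mined pattern as a textual if-then rule."""
    return (f"IF {rule['feature']} <= {rule['threshold']} "
            f"THEN {rule['target']} (accuracy {rule['accuracy']:.2f})")


# Invented records; one has a missing measurement and is scrubbed in step 1.
records = [
    {"ethanol_day3": 10.0, "problematic": True},
    {"ethanol_day3": 12.0, "problematic": True},
    {"ethanol_day3": None, "problematic": False},
    {"ethanol_day3": 25.0, "problematic": False},
    {"ethanol_day3": 30.0, "problematic": False},
]

clean = prepare(records)
rule = mine(clean, "ethanol_day3", "problematic")
text = analyze(rule)
print(text)
```

The textual rule printed at the end is the kind of output that is easy to inspect directly; richer outputs would need the visual or tool-based analysis mentioned above.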

5. Conclusion

A data mining approach to the problem of predicting problematic wine fermentations has been discussed, in which the k-means algorithm was used. In new studies, clustering and biclustering techniques are employed to identify the compounds of wine that are most likely the cause of problematic fermentations.

6. References

1. Mucherino, A., Papajorgji, P., Pardalos, P.M., 2009. Data Mining in Agriculture, Springer Optimization and Its Applications.

2. Sumathi, S., Sivanandam, S.N. Introduction to Data Mining and its Applications.

3. Busygin, S., Prokopyev, O.A., Pardalos, P.M., 2005. Feature Selection for Consistent Biclustering via Fractional 0-1 Programming, Journal of Combinatorial Optimization 10, 7–21.

4. Urtubia, A., Perez-Correa, J.R., Soto, A., Pszczolkowski, P., 2007. Using Data Mining Techniques to Predict Industrial Wine Problem Fermentations, Food Control 18, 1512–1517.
