
National Conference on Advances in Computing ( NCAC’13 ), 05-06 March 2013

Wine Fermentation: By Using Data Mining Technique

Abstract

Data mining techniques have recently been applied in the field of agriculture. In this paper, we consider the problem of discovering problematic wine fermentations at the early stages of the process by using sensor data. Problematic fermentations can cause losses to wine makers, because such fermentations may be too slow to provide the final product, or may even become stuck.

Keywords: Wine fermentation, Clustering, Biclustering, Data mining.

__________________________________________________________________________________________________

1. Introduction

Wine is widely produced around the world, and its industrial production is an important business in many countries. For this reason, the study of the fermentation process, which transforms grape juice into the alcoholic beverage, is of increasing interest in the field of agriculture. Problematic fermentations may cause losses to industries. If a fermentation process is slower than usual, for example, the final product takes longer to produce. Moreover, in the worst case, when the fermentation process gets stuck, part of the production could be completely spoiled.

Data mining is a field of operations research that analyzes large databases with the aim of acquiring novel knowledge. In recent years, data mining techniques have been applied specifically to agricultural problems in order to find important information about the problem under study. In the case of wine fermentations, a database of compound measurements, taken at different times during the fermentation process, can be exploited for extracting information that can help predict problematic fermentations [1].

2. Wine fermentations

Wine is widely produced all over the world. There exist different types of wine, which depend on different factors, especially on the origin of the grapes employed in the production. A common point for all wines is the fermentation process, in which the sugar contained in the grapes is transformed into alcohol. This is a very delicate process. In industrial production, large quantities of wine may be spoiled by a problematic fermentation, causing losses to the industry. To overcome this issue, problematic wine fermentations could be predicted, so that an enologist can intervene in the process in time to guarantee a good fermentation.

In order to monitor wine fermentation processes, metabolites such as glucose, fructose, organic acids, glycerol and ethanol can be measured, and the data obtained during the fermentation can be analyzed to obtain useful information. However, analyses are usually limited to data obtained within the first 3 days of fermentation, so that a possible problematic fermentation can be detected at the beginning of the process. Fermentations can be divided into 3 classes: the first class contains normal fermentations, while the second and the third contain the problematic ones. In particular, the second class contains slow fermentations, which bring the wine to the end of the production but take longer than usual. Finally, the third class contains stuck fermentations, i.e. fermentations that stop at a certain moment and are not able to give the final product.

School of Computer Sciences, North Maharashtra University, Jalgaon, India

Mr. R.D. Magare, Milind College of Science, Aurangabad [M.S], India

Mr. D.M. Rana, Milind College of Science, Aurangabad [M.S], India

Ms. V.R. Jadhav, Milind College of Science, Aurangabad [M.S], India

Mr. G.S. Ragde, Aurangabad [M.S]

[email protected] [email protected] vab23jan2000rediffmail.com rajiv_magare@yaho









We present an analysis performed on datasets of wine fermentations with the aim of predicting problematic fermentations at the early stages of the process. Clustering and supervised biclustering techniques are employed for finding solutions to this problem.

A clustering technique might define clusters that are related to normal or problematic fermentations by exploiting the inherent characteristics of the data. For this reason, a group of clusters can be defined for each fermentation. Fermentations that share the same group most likely share the same kind of characteristics. Depending on the percentage of normal, slow and stuck fermentations contained in the found groups of clusters, a score can be assigned to any other fermentation that happens to be in the same group and for which a classification is not known. In these studies, the k-means algorithm was employed for finding clusters of data points, where the number of clusters k was arbitrarily set.

In this work, we use the training set to select the features that allow for correct classifications of the fermentations. To this aim, we search for consistent biclusterings of the training set, which associate subgroups of features with subgroups of samples of the dataset (each sample represents one fermentation) [3]. In order to obtain a consistent biclustering of the training set, some features are removed from the set. This is done by solving a combinatorial optimization problem. Once a consistent biclustering is found from a training set, the corresponding relationship between samples and features can be exploited for classification purposes. Given a testing set related to the same problem, the classification of its samples is, by definition, not known. However, the classification of its features is known, because it is exactly the same as that of the training set, and this information can therefore be exploited for reconstructing the classification of the samples of the testing set [1].
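The way a biclustering links features to samples can be illustrated with a minimal sketch. It assumes the centroid rule commonly used for consistent biclustering in [3]: a feature is assigned to the class whose samples give it the highest mean value, and a sample is assigned to the class whose features give it the highest mean value. The function names, the toy matrices and the two-class setting are hypothetical and serve only as an illustration; each class is assumed to contain at least one training sample.

```python
def classify_features(A, sample_classes, n_classes):
    """Assign each feature (row of A) to the class whose samples give it the highest mean."""
    feature_classes = []
    for row in A:
        means = []
        for r in range(n_classes):
            vals = [v for v, c in zip(row, sample_classes) if c == r]
            means.append(sum(vals) / len(vals))
        feature_classes.append(max(range(n_classes), key=means.__getitem__))
    return feature_classes


def classify_samples(A, feature_classes, n_classes):
    """Assign each sample (column of A) to the class whose features give it the highest mean."""
    predicted = []
    for j in range(len(A[0])):
        means = []
        for r in range(n_classes):
            vals = [A[i][j] for i, c in enumerate(feature_classes) if c == r]
            # A class without any feature can never be selected.
            means.append(sum(vals) / len(vals) if vals else float("-inf"))
        predicted.append(max(range(n_classes), key=means.__getitem__))
    return predicted


def is_consistent(A, sample_classes, n_classes):
    """A biclustering is consistent if the feature classes reproduce the sample classes."""
    feature_classes = classify_features(A, sample_classes, n_classes)
    return classify_samples(A, feature_classes, n_classes) == list(sample_classes)


# Hypothetical training matrix: rows are features, columns are fermentations.
train = [
    [5.0, 6.0, 1.0, 2.0],
    [4.0, 5.0, 2.0, 1.0],
    [1.0, 2.0, 6.0, 5.0],
    [2.0, 1.0, 5.0, 4.0],
]
train_classes = [0, 0, 1, 1]  # e.g. 0 = normal, 1 = problematic

feature_classes = classify_features(train, train_classes, 2)

# Hypothetical testing matrix with the same features and unknown sample classes.
test = [
    [5.5, 1.5],
    [4.5, 2.5],
    [1.0, 6.0],
    [2.0, 4.0],
]
predicted = classify_samples(test, feature_classes, 2)
print(feature_classes, predicted)
```

In this toy case the training biclustering is consistent, so the feature classes learned from the training set can be reused unchanged to classify the two testing samples.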

3. Computing

Table 1 shows some experiments in which the combinatorial optimization problem has been solved in order to find α-consistent biclusterings of A. f(x) is the objective function of this optimization problem, which counts the selected features and must be maximized. As expected, f(x) decreases when the parameter α increases, because features subject to a noise or to an error that is larger than α are supposed to be removed from the set. err is the number of misclassifications on the testing set when its samples are classified according to the found α-consistent biclusterings. The biclustering with the largest α value is able to correctly predict 4 out of 8 samples. The corresponding biclustering related to the testing set is in fact not consistent.
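For orientation, the feature selection problem can be written as a fractional 0-1 program in the style of [3]. The notation below is an assumption, since it is not spelled out in this paper: A = (a_ij) holds the value of feature i in sample j, x_i = 1 if feature i is kept, f_ir = 1 if feature i belongs to class r, S_r is the set of training samples in class r, and a single additive parameter is used for all samples.

```latex
\begin{aligned}
\max_{x \in \{0,1\}^m} \quad & f(x) = \sum_{i=1}^{m} x_i \\
\text{s.t.} \quad &
\frac{\sum_{i=1}^{m} a_{ij}\, f_{i\hat{r}}\, x_i}{\sum_{i=1}^{m} f_{i\hat{r}}\, x_i}
\;>\; \alpha +
\frac{\sum_{i=1}^{m} a_{ij}\, f_{i\xi}\, x_i}{\sum_{i=1}^{m} f_{i\xi}\, x_i},
\qquad \forall\, \hat{r} \neq \xi,\; j \in S_{\hat{r}}
\end{aligned}
```

Under these assumptions, setting the additive parameter to zero gives plain consistency, while larger values tighten every constraint, which matches the observation that fewer features can be kept as the parameter grows.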

Future work will mainly follow two directions. First, larger datasets of wine fermentations need to be considered for obtaining better results. The fact that the considered testing set contains information which is not included in the training set suggests that the training set does not contain all the information necessary for a correct definition of the biclusterings. Since industrial data are usually difficult to obtain, one possibility is to produce these data in a laboratory, where small quantities of wine are fermented in a controlled environment. Moreover, we also plan to work on the formalization of the strategy that we proposed in this paper for validating the obtained classifications [4].

4. Data Mining Process

The data mining process consists of three major steps. Of course, it all starts with a big pile of data. The first processing step is data preparation, often referred to as “scrubbing the data.” Data is selected, cleaned, and preprocessed under the guidance and knowledge of a domain expert. Preparing the data is the most time-consuming part of the data mining process. This step can be streamlined in part if the data is already in a database, data warehouse, or digital library, although mining data across different databases, for example, is still a challenge.

In the second step, once the data is collected and preprocessed, a data mining algorithm performs the actual sifting process: it processes the prepared data, compressing and transforming it to make it easy to identify any latent valuable nuggets of information. Many techniques have been used to perform the common data mining activities of associations, clustering, classification, modeling, sequential patterns, and time series forecasting. These techniques range from statistics to rough sets to neural networks.

The third phase is the data analysis phase, where the data mining output is evaluated to see if additional domain knowledge was discovered and to determine the relative importance of the facts generated by the mining algorithms.
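The three steps can be sketched on a toy fermentation dataset. This is only an illustration under stated assumptions: the measurements are invented, k-means is chosen as the mining step because it is the algorithm mentioned in Section 2, the centroids are seeded with the first k records, and the function names `prepare`, `kmeans` and `analyze` are hypothetical.

```python
from collections import Counter


def prepare(rows, classes):
    """Step 1: drop incomplete records, then min-max scale each feature to [0, 1]."""
    keep = [i for i, r in enumerate(rows) if None not in r]
    clean = [rows[i] for i in keep]
    cols = list(zip(*clean))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    scaled = [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(r, lo, hi)]
        for r in clean
    ]
    return scaled, [classes[i] for i in keep]


def kmeans(points, k, iterations=20):
    """Step 2: plain k-means; the first k points seed the centroids (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Move every non-empty centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def analyze(labels, classes):
    """Step 3: proportion of each known class inside each cluster."""
    summary = {}
    for c in set(labels):
        counts = Counter(cls for l, cls in zip(labels, classes) if l == c)
        total = sum(counts.values())
        summary[c] = {cls: n / total for cls, n in counts.items()}
    return summary


# Invented early measurements: [sugar, ethanol]; None marks a missing value.
rows = [[200.0, 2.0], [195.0, 2.5], [240.0, 0.5], [238.0, 0.4], [None, 1.0]]
classes = ["normal", "normal", "stuck", "stuck", "normal"]

X, y = prepare(rows, classes)
labels = kmeans(X, k=2)
summary = analyze(labels, y)
print(labels, summary)
```

The class proportions returned by the last step are the kind of information from which a score can be assigned to an unclassified fermentation that falls in the same cluster.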

The final step is the analysis of the data mining results or output. In some cases the output is in a form that makes it very easy to discern the valuable nuggets of information from the trivial or uninteresting facts. The relationships are represented in if-then rule form. With rules recast into textual form, the valuable information is much easier to identify. In other cases, however, the results have to be analyzed either visually or through another level of tools to classify the nuggets according to the predicted value [2].
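As a small illustration of recasting output into textual if-then rules, the sketch below turns per-cluster class proportions into readable statements. The input dictionary, the confidence threshold and the function name `rules_from_summary` are hypothetical and are not taken from [2].

```python
def rules_from_summary(summary, min_confidence=0.5):
    """Turn {cluster: {class: proportion}} into readable if-then rules."""
    rules = []
    for cluster in sorted(summary):
        # Keep only the dominant class of each cluster.
        cls, conf = max(summary[cluster].items(), key=lambda kv: kv[1])
        if conf >= min_confidence:
            rules.append(
                f"IF cluster = {cluster} THEN fermentation is {cls} (confidence {conf:.0%})"
            )
    return rules


# Hypothetical mining output: class proportions observed in three clusters.
summary = {
    0: {"normal": 0.9, "slow": 0.1},
    1: {"stuck": 0.6, "slow": 0.4},
    2: {"normal": 0.4, "slow": 0.3, "stuck": 0.3},
}

rules = rules_from_summary(summary)
for rule in rules:
    print(rule)
```

Cluster 2 produces no rule here because no class reaches the chosen threshold, which is the kind of uninteresting fact that the analysis step is meant to filter out.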

5. Conclusion

A data mining approach to this problem has been discussed in which the k-means algorithm was used. In new studies, clustering and biclustering techniques are employed for identifying the compounds of wine that are most likely the cause of problematic fermentations.

6. References

1. Mucherino, A., Papajorgji, P., Pardalos, P.M., 2009. Data Mining in Agriculture, Springer Optimization and Its Applications.

2. Sumathi, S., Sivanandam, S.N. Introduction to Data Mining and its Applications.

3. Busygin, S., Prokopyev, O.A., Pardalos, P.M., 2005. Feature Selection for Consistent Bi-clustering via Fractional 0-1 Programming, Journal of Combinatorial Optimization 10, 7-21.

4. Urtubia, A., Perez-Correa, J.R., Soto, A., Pszczolkowski, P., 2007. Using Data Mining Techniques to Predict Industrial Wine Problem Fermentations, Food Control 18, 1512–1517.

