National Conference on Advances in Computing ( NCAC’13 ), 05-06 March 2013
Wine Fermentation: By Using Data Mining Technique
Abstract
Data mining techniques have recently been applied in the field of agriculture. In this paper, we consider the problem of
discovering problematic wine fermentations at the early stages of the process by using sensor data.
Problematic fermentations can cause losses to wine makers, because such fermentations may be too slow to provide the
final product, or may even become stagnant.
Keywords: Wine fermentation, Clustering, Biclustering, Data mining.
__________________________________________________________________________________________________
1. Introduction
Wine is widely produced around the world.
Industrial production of wine is an important business
in many countries. For this reason, the study of the
fermentation process, which transforms grape
juice into the alcoholic beverage, is of increasing
interest in the field of agriculture. Problematic
fermentations may cause losses to industries.
If a fermentation process is slower than usual, for
example, the final product takes longer to produce.
Moreover, in the worst case, when the fermentation
process gets stuck, a part of the production could be
completely spoiled.
Data mining is a field of operations research
that analyzes large databases with the aim of acquiring
novel knowledge. In recent years, data mining
techniques have been applied specifically to
agricultural problems in order to find important
information about the problem under study. In the case
of wine fermentations, a database of compound
measurements, taken at different times during the
fermentation process, can be exploited for extracting
information that helps to predict problematic
fermentations [1].
2. Wine fermentations
Wine is widely produced all over the world.
There exist different types of wine, which depend on
several factors, and especially on the origin of the
grapes that are employed in the production. A common
point for all wines is the fermentation process, in which
the sugar contained in the grapes is transformed into
alcohol. This is a very delicate process. When
wine is produced industrially, large quantities
may be spoiled by a problematic
fermentation process, causing losses to the industry. In
order to overcome this issue, a prediction of the
problematic wine fermentations can be attempted, so
that an enologist can intervene in the process in time
to guarantee a good fermentation.

In order to monitor wine fermentation processes,
metabolites such as glucose, fructose,
organic acids, glycerol and ethanol can be measured,
and the data obtained during the fermentation process
can be analyzed in order to obtain useful information.
However, analyses are usually limited to data that are
obtained within the first 3 days of fermentation,
so that a possible problematic fermentation can be
detected at the beginning of the process.

Fermentations can be divided into 3 classes: the
first class contains normal fermentations, while the
second and the third contain the problematic ones.
In particular, the second class contains slow
fermentations, which can bring the
wine to the end of the production, but in an amount of
time which is longer than usual. Finally, the third class
contains stuck fermentations, i.e. fermentations that
stop at a certain moment and are not able to give
the final product.
School of Computer Sciences, North Maharashtra University, Jalgaon, India
Mr. R.D. Magare Mr. D.M. Rana Ms. V.R. Jadhav Mr. G.S. Ragde
Milind College Of Science Milind College Of Science Milind College Of Science Auranagabad [M.S]
Auranagabad [M.S],India Auranagabad [M.S],India Auranagabad [M.S],India
[email protected] [email protected] vab23jan2000rediffmail.com rajiv_magare@yaho
We present an analysis performed on
datasets of wine fermentations with the aim of
predicting problematic fermentations at the early stages
of the process. Clustering and supervised biclustering
techniques are employed for finding solutions to this
problem.
A clustering technique can define
clusters that are related to normal or problematic
fermentations by exploiting the inherent characteristics
of the data. A group of clusters can
therefore be defined for each fermentation.
Fermentations that share the same group most likely
share the same kind of characteristics. Depending on
the percentage of normal, slow and stuck fermentations
that are contained in the found groups of clusters, a
score can be assigned to any other fermentation that
happens to be in the same group and for which a
classification is not known. In such studies, the
k-means algorithm was employed for finding clusters
of data points, where the number of clusters k was
arbitrarily set. In this work, we consider the training set
for selecting the features that allow correct
classifications of the fermentations. To this aim,
we search for consistent biclusterings of the training set,
which are able to associate subgroups of features with
subgroups of samples of the dataset (each sample
represents one fermentation) [3].

In order to obtain a consistent biclustering of
the training set, some features are removed from the
set. This is done by solving a combinatorial
optimization problem. Once a consistent biclustering is
found from a training set, the corresponding
relationship between samples and features can be
exploited for classification purposes. Given a testing
set related to the same problem, the classification of its
samples is, by definition, not known.
However, the classification of its features is known,
because it is exactly the same as in the training set, and
this information can therefore be exploited for
reconstructing the classification of the samples of the
testing set [1].
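
The cluster-based scoring described above can be illustrated with a minimal Python sketch. It is not the implementation used in the cited studies: the centroids are assumed to have been computed beforehand (for example by k-means), and the function names, the two-compound measurements and all numeric values are hypothetical, chosen only to show how a group of clusters leads to a score.

```python
from collections import Counter, defaultdict

def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])))

def cluster_group(measurements, centroids):
    """Group of clusters visited by the measurements of one fermentation."""
    return frozenset(nearest(m, centroids) for m in measurements)

def group_profiles(labelled, centroids):
    """Map each group of clusters to the class proportions observed in it."""
    counts = defaultdict(Counter)
    for measurements, label in labelled:
        counts[cluster_group(measurements, centroids)][label] += 1
    return {group: {label: n / sum(c.values()) for label, n in c.items()}
            for group, c in counts.items()}

def score(measurements, profiles, centroids):
    """Class proportions of the group an unclassified fermentation falls in."""
    return profiles.get(cluster_group(measurements, centroids), {})

# Hypothetical centroids over two invented compound measurements.
centroids = [(10.0, 0.0), (5.0, 4.0), (9.0, 1.0)]

# Hypothetical fermentations with known class: (measurements, class).
labelled = [
    ([(10.1, 0.2), (5.2, 3.9)], "normal"),
    ([(9.8, 0.1), (4.9, 4.2)], "normal"),
    ([(10.0, 0.0), (9.1, 0.9)], "stuck"),
    ([(9.9, 0.3), (8.8, 1.2)], "slow"),
]

profiles = group_profiles(labelled, centroids)
unknown = [(10.2, 0.1), (9.0, 1.1)]
result = score(unknown, profiles, centroids)
print(result)
```

In this toy data the unclassified fermentation falls in the group that contains one stuck and one slow fermentation, so its score points to a problematic fermentation without separating the two problematic classes.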
3. Computing
Table 1 shows some experiments in which the
combinatorial optimization problem has been solved in
order to find α-consistent biclusterings of A. f(x) is
the objective function of this optimization problem,
which counts the selected features and must be
maximized. As expected, f(x) decreases when the
parameter α increases, because features subject to
noise or to an error that is larger than α are supposed
to be removed from the set. err is the number of
misclassifications on the testing set, when its samples
are classified according to the found α-consistent
biclusterings. The biclustering with the largest α
value is able to predict correctly 4 out of 8 samples.
The corresponding biclustering related to the testing set
is in fact not consistent.
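
To make the notion of consistency more concrete, the following Python sketch follows the centroid-based definition in [3]: each feature is assigned to the class in which its average value is highest, each sample is assigned to the class of features in which its average value is highest, and the biclustering is consistent when this second assignment reproduces the known sample classes. The sketch is only an illustration under stated assumptions: the function names and the small matrices are invented, the tolerance parameter is not modelled, and the feature selection step (the combinatorial optimization problem) is not included.

```python
def centroids(rows, classes, k):
    """For each row, the mean of its entries over the columns of each class."""
    result = []
    for row in rows:
        sums, counts = [0.0] * k, [0] * k
        for value, c in zip(row, classes):
            sums[c] += value
            counts[c] += 1
        # An empty class can never be selected as the best one.
        result.append([s / n if n else float("-inf") for s, n in zip(sums, counts)])
    return result

def best_class(centroid_rows):
    """Class with the largest centroid, for each row."""
    return [max(range(len(r)), key=r.__getitem__) for r in centroid_rows]

def classify_features(matrix, sample_classes, k):
    """Rows are features, columns are samples."""
    return best_class(centroids(matrix, sample_classes, k))

def classify_samples(matrix, feature_classes, k):
    """Classify the columns (samples) from the known classes of the rows."""
    by_sample = [list(col) for col in zip(*matrix)]
    return best_class(centroids(by_sample, feature_classes, k))

def is_consistent(matrix, sample_classes, k):
    feature_classes = classify_features(matrix, sample_classes, k)
    return classify_samples(matrix, feature_classes, k) == list(sample_classes)

# Hypothetical training set: 4 features (rows) x 4 samples (columns).
train = [[9, 8, 1, 2],
         [7, 9, 2, 1],
         [1, 2, 8, 9],
         [2, 1, 9, 7]]
train_classes = [0, 0, 1, 1]

feature_classes = classify_features(train, train_classes, 2)
consistent = is_consistent(train, train_classes, 2)

# Hypothetical testing set with the same features and 2 unclassified samples.
test = [[8, 1],
        [7, 3],
        [2, 9],
        [1, 8]]
predicted = classify_samples(test, feature_classes, 2)

# err: number of misclassifications, given the (hypothetical) true classes.
true_classes = [0, 1]
err = sum(p != t for p, t in zip(predicted, true_classes))
print(feature_classes, consistent, predicted, err)
```

The testing samples are classified using only the feature classes learned on the training set, which is the mechanism by which err in the experiments is obtained.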
Future work will mainly proceed in the
following two directions. First, larger datasets of wine
fermentations need to be considered for obtaining
better results. The fact that the considered testing set
contains information which is not included in the
training set suggests that the training set does not contain all
the information necessary for a correct definition of the
biclusterings. Since industrial data are usually difficult
to obtain, one possibility is to produce these data in
the laboratory, where small quantities of wine are
fermented in a controlled environment. Second, we
also plan to work on the formalization of the strategy
that we proposed in this paper for validating the
obtained classifications [4].
4. Data Mining Process
The data mining process consists of three major steps. Of
course, it all starts with a big pile of data. The first
processing step is data preparation, often referred to as
“scrubbing the data.” Data is selected, cleaned, and
preprocessed under the guidance and knowledge of a
domain expert. Preparing the data is the most
time-consuming part of the data mining process.
This step can be streamlined in part if the data is
already in a database, data warehouse, or digital
library, although mining data across different
databases, for example, is still a challenge. Second, a
data mining algorithm is used to process the prepared
data, compressing and transforming it to make it easy
to identify any latent valuable nuggets of information.
In this second step, once the data is
collected and preprocessed, the data mining algorithms
perform the actual sifting process. Many techniques
have been used to perform the common data mining
activities of associations, clustering, classification,
modeling, sequential patterns, and time series
forecasting. These techniques range from statistics to
rough sets to neural networks. The third phase is the
data analysis phase, where the data mining output is
evaluated to see if additional domain knowledge was
discovered and to determine the relative importance of
the facts generated by the mining algorithms.
The final step is the analysis of the data mining results
or output. In some cases the output is in a form that
makes it very easy to discern the valuable nuggets of
information from the trivial or uninteresting facts. The
relationships are represented in the form of if-then rules. With
rules recast into textual form, the valuable information
is much easier to identify. In other cases, however, the
results will have to be analyzed either visually or
through another level of tools to classify the nuggets
according to the predicted value [2].
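
The three steps can be sketched in a few lines of Python. This is only a toy illustration, not a method taken from [2]: the single "sugar" field, the record values and the threshold search are assumptions made here to show data preparation, mining, and the recasting of the result into an if-then rule.

```python
def prepare(records):
    """Step 1: drop incomplete records and convert the measurement to float."""
    clean = []
    for r in records:
        if r.get("sugar") is None or r.get("label") is None:
            continue
        clean.append({"sugar": float(r["sugar"]), "label": r["label"]})
    return clean

def mine(clean):
    """Step 2: find the sugar threshold that misclassifies the fewest records."""
    best = None
    for t in sorted({r["sugar"] for r in clean}):
        errors = sum((r["sugar"] > t) != (r["label"] == "problematic")
                     for r in clean)
        if best is None or errors < best[1]:
            best = (t, errors)
    return best

def analyse(threshold, errors, n):
    """Step 3: recast the mined relationship as a textual if-then rule."""
    return (f"IF sugar > {threshold} THEN problematic ELSE normal "
            f"({errors}/{n} errors)")

# Hypothetical raw records; one of them is incomplete.
records = [
    {"sugar": "12.0", "label": "normal"},
    {"sugar": "15.0", "label": "normal"},
    {"sugar": None, "label": "normal"},
    {"sugar": "30.0", "label": "problematic"},
    {"sugar": "28.5", "label": "problematic"},
]

clean = prepare(records)
threshold, errors = mine(clean)
rule = analyse(threshold, errors, len(clean))
print(rule)
```

A rule in this textual form is the kind of output that can be inspected directly by a domain expert in the analysis step.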
5. Conclusion
A data mining approach to this problem has been
discussed in which the k-means algorithm was used. In new
studies, clustering and biclustering techniques
are employed for identifying the compounds of wine
that are most likely the cause of problematic
fermentations.
6. References
1. Mucherino, A., Papajorgji, P., Pardalos, P.M., 2009. Data Mining in Agriculture, Springer Optimization and Its Applications.
2. Sumathi, S., Sivanandam, S.N. Introduction to Data Mining and its Applications.
3. Busygin, S., Prokopyev, O.A., Pardalos, P.M., 2005. Feature Selection for Consistent Biclustering via Fractional 0-1 Programming, Journal of Combinatorial Optimization 10, 7-21.
4. Urtubia, A., Perez-Correa, J.R., Soto, A., Pszczolkowski, P., 2007. Using Data Mining Techniques to Predict Industrial Wine Problem Fermentations, Food Control 18, 1512-1517.