Distributed Algorithm for Pattern Classification and Prediction of Big Data by Using Machine Learning Techniques
Chapter 2
LITERATURE SURVEY
In the literature, most authors have used different machine learning algorithms to analyse time-changing big data. In [12], the author put forth a neural-network-based algorithm to analyse customer behaviour using a social media data set. In [13], the author explored a way to find links between two users of social media such as Twitter or Facebook using a machine learning algorithm. In [14, 15, 16], the authors shed light on big data architectures, challenges, etc. In the work by Isvani Frías-Blanco and José del Campo-Ávila [40], a moving average method is suggested for online and non-parametric drift detection based on Hoeffding’s bounds. In “Short-Term Load Forecasting Based on Big Data Technologies” [5], Pei Zhang explains a decision tree framework to forecast short-term load, such as electricity load. In the paper “Decision Tree Induction Methods and their Applications to Big Data”, Petra Perner claims that decision tree induction is more suitable than traditional methods for big data mining. There is also a large body of work on social media data analysis. Naaman et al. [31] studied the characteristics of social activities and the patterns of communication on Twitter. Davidov et al. [32] used hashtags and other sentiment labels for sentiment analysis. Hannon et al. [33] built an effective and efficient followee recommender system, and Kwak et al. [34] proposed methods to recommend influential users. Burton et al. [35] compared Twitter use within and across organizations and geographic markets. Kim et al. [36] explored how to maximize the outcomes of social media marketing (SMM) through word-of-mouth (WOM) marketing by identifying the core group of users. Some work is also available on decision trees and their distributed implementation. In [27], the author defined a way to extract knowledge using decision tree and Naïve Bayes algorithms for the classification and generation of actionable knowledge for direct marketing. A distributed implementation of support vector machines is proposed in [21], and in [39] the author proposed a MapReduce implementation of the C4.5 decision tree algorithm.
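To make the MapReduce idea for C4.5 more concrete, the following is a minimal single-process Python sketch of one common way to distribute the split-selection step: each mapper counts (attribute, value, class) triples in its own partition, a reducer sums the partial counts, and the driver computes the C4.5 gain ratio from the merged counts only. It is not the implementation of [39]; the function names, the `CLASS_KEY` marker and the toy records are hypothetical, and no Hadoop API is used.

```python
from collections import Counter
from functools import reduce
from math import log2

CLASS_KEY = "__class__"  # hypothetical marker for per-class totals


def map_partition(records):
    """Map step: count (attribute, value, class) triples in one data partition."""
    counts = Counter()
    for attrs, label in records:
        counts[(CLASS_KEY, None, label)] += 1
        for attr, value in attrs.items():
            counts[(attr, value, label)] += 1
    return counts


def reduce_counts(a, b):
    """Reduce step: partial counts from different partitions are simply summed."""
    return a + b


def entropy(class_counts):
    counts = list(class_counts)
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)


def gain_ratio(counts, attr):
    """C4.5 gain ratio of `attr`, computed only from the merged counts."""
    class_totals = Counter()
    by_value = {}
    for (a, value, label), c in counts.items():
        if a == CLASS_KEY:
            class_totals[label] += c
        elif a == attr:
            by_value.setdefault(value, Counter())[label] += c
    n = sum(class_totals.values())
    conditional = sum(sum(d.values()) / n * entropy(d.values()) for d in by_value.values())
    info_gain = entropy(class_totals.values()) - conditional
    split_info = entropy(sum(d.values()) for d in by_value.values())
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data split across two "nodes": (attributes, class label).
partitions = [
    [({"outlook": "sunny", "windy": "no"}, "no"), ({"outlook": "rain", "windy": "no"}, "yes")],
    [({"outlook": "sunny", "windy": "yes"}, "no"), ({"outlook": "overcast", "windy": "yes"}, "yes")],
]
merged = reduce(reduce_counts, map(map_partition, partitions))
best = max(["outlook", "windy"], key=lambda a: gain_ratio(merged, a))
print(best)  # outlook
```

Because the merged counts are identical to those of a single pass over all records, the choice of split does not depend on how the data are partitioned; only the small count tables travel between nodes.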
The motivation for choosing unstructured data is shown in the graph below: day by day, the use of unstructured data is increasing relative to structured data.
Ph. D Thesis Computer Engineering 11
Figure 2.1: Motivational Graph for Unstructured Data Usage [Source: IDC’s Digital Universe Study]
Initially, decision tree learning was used in different forms to analyse big data. Hall et al. [10] defined an approach for forming learning rules from a large set of training data, in which a single decision system is generated from n large, independent subsets of the data. Patil et al., in contrast, use a hybrid approach that combines a genetic algorithm with a decision tree to create an optimized decision tree, thus improving the efficiency and performance of the computation. In the literature, many authors have tried to exploit machine learning algorithms for different kinds of structured and unstructured data, as summarized in Table 2.1. The table shows that most of the literature is available for image data, which is structured.
2.1 Literature Survey on Methodologies Used for Social Data Mining
Table 2.1: Comparison of Social Data Mining Techniques
Sr. No. | Title | Technique Used | Outcome
1 | On Distributed Fuzzy Decision Trees for Big Data (IEEE 2017) | A distributed fuzzy decision tree algorithm based on the MapReduce architecture is proposed | The implementation is based on the MLlib library
2 | Sentiment Analysis of Top Colleges in India Using Twitter Data (IEEE 2016) | Naïve Bayes, Support Vector Machine and an Artificial Neural Network model | Highlights a comparison between the results obtained with the Naïve Bayes, Support Vector Machine and Artificial Neural Network models
3 | Mining Social Media Data for Understanding Students’ Learning Experience (IEEE 2015) | SVM multi-label classifier | Focused on engineering students’ Twitter posts to understand issues and problems in their educational experiences
4 | Smart text-classification of user-generated data in educational social networks (IEEE 2014) | Partial-supervised learning for the Hierarchical Dirichlet Process (HDP) for text classification with inherent hierarchical structure in education | A more flexible way to guide model learning from unlabelled documents
5 | Network-Based Modelling and Intelligent Data Mining of Social Media for Improving Care (IEEE 2015) | Network-based approach for modelling users’ forum interactions, employing a network partitioning method based on optimizing a stability quality measure | Used to determine consumer opinion and identify influential users within the retrieved modules, using information derived from both word-frequency data and network-based properties
6 | A novel data-mining approach leveraging social media to monitor and respond to outcomes of diabetes drugs and treatment (IEEE 2013) | A novel data-mining method developed to gauge the experiences of patients with diabetes mellitus with medical devices and drugs | Rapid data collection, feedback, and analysis that would enable improved outcomes and solutions for public health
2.2 Literature Survey on Methodologies Used for Big Data Mining
• In [4], drift detection methods based on Hoeffding’s bounds are proposed for online and non-parametric data, together with a moving average method to identify drift. The author proposed methods to analyse the performance of a learning algorithm during data stream classification. To handle concept drift, two methods are used: moving averages, for detecting sudden changes, and weighted moving averages, for detecting slow changes. The main advantage of the proposed method is that it is independent of the learning algorithm and can be used with any classifier to track concept drift; the experiments used a Naive Bayes classifier and a Perceptron.
• In [5], Pei Zhang implemented a short-term load forecasting model based on big data technologies, proposing a decision tree framework to forecast short-term load such as electricity load.
• In [6], Petra Perner discussed decision tree induction methods and their applications to big data, explaining why decision tree induction is more suitable than traditional methods.
• The paper “Learning with Drift Detection” by J. Gama, P. Medas, G. Castillo and Pedro Rodrigues (2004) [55] proposed a technique that monitors the streaming data and the error of the classification algorithm. Concept drift is handled with time windows. Two approaches to classification under drift are distinguished: the first re-learns the model at regular time intervals regardless of whether a change has occurred; the second first detects a change in the data stream and then adapts the model. If the error rate increases significantly, concept drift is assumed to have occurred. Two registers, p_min and s_min, track the error-rate information and are used to define the warning level and the alarm level.
• The paper “Early Drift Detection Method” by M. Baena-García, J. del Campo-Ávila, R. Fidalgo, A. Bifet, R. Gavaldà and R. Morales-Bueno (2006) [57] proposed EDDM, a method for detecting concept drift that is suited to slow, gradual changes, even when those changes are very slow. Instead of the error rate, it uses the distance between two consecutive classification errors to detect drift. The method copes with noisy datasets and does not depend on how the classification algorithm is designed: it can be used with any classification algorithm, either as a wrapper around a batch learning algorithm or implemented inside an online algorithm.
• The paper “Learning from Time-Changing Data with Adaptive Windowing” by Albert Bifet and Ricard Gavaldà (2007) [60] presents a method for handling concept drift when learning from time-evolving data. It uses a sliding window whose size is not fixed: the window grows while the data are stationary, to achieve greater accuracy, and shrinks when drift occurs, to remove old data from the window. The authors also propose ADWIN2, a time- and memory-efficient variant, and combine it with a Naïve Bayes predictor to keep the classification results up to date. An advantage is that the window size automatically grows and shrinks according to the observed rate of change. The disadvantages are that ADWIN is costly, because it analyses all sub-windows of the current window to find a suitable cut, and that it deletes one element at a time when it detects drift in the data.
• The paper “Exponentially Weighted Moving Average Charts for Detecting Concept Drift” by G. J. Ross, N. M. Adams, D. Tasoulis and D. Hand (2012) [59] presents the EWMA method for monitoring the error rate of a classifier. It is a single-pass, computationally efficient algorithm that also allows the rate of false-positive drift detections to be controlled.
• The paper “Active Learning with Drifting Streaming Data” by Indrė Žliobaitė, Albert Bifet, Bernhard Pfahringer and Geoffrey Holmes (2014) [61] provides background on data stream classification and presents active learning strategies for processing dynamic data. The strategies distribute the labelling cost over time, control which instances are labelled so that the classifiers stay accurate, and detect drift. The authors state that their analysis shows the methods to be effective when the labelling budget is small. An advantage is that the strategy provides a basis for incremental active learning and also exploits prediction uncertainty.
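The adaptive-windowing idea described above can be sketched in a few lines. The Python class below is a simplified version in the spirit of basic ADWIN: it stores every element and tests every split point, which is exactly the cost attributed to ADWIN above and which ADWIN2 avoids with compressed buckets. A split is accepted as a cut when the two sub-window means differ by more than ε_cut = sqrt(ln(4n/δ) / (2m)), with m = 1/(1/n0 + 1/n1); this is our reading of the basic ADWIN test, and the class name, the δ = 0.002 default and the toy stream are assumptions.

```python
import math
from collections import deque


class SimpleAdwin:
    """Simplified adaptive window in the spirit of basic ADWIN.

    Keeps every element and checks every split point, so each check costs
    O(n) time and the window costs O(n) memory (ADWIN2 avoids this).
    """

    def __init__(self, delta=0.002):
        self.delta = delta      # assumed confidence parameter
        self.window = deque()   # oldest element on the left

    def _eps_cut(self, n0, n1):
        n = n0 + n1
        m = 1.0 / (1.0 / n0 + 1.0 / n1)   # harmonic-mean term of the two sizes
        delta_prime = self.delta / n       # spread delta over the split points
        return math.sqrt(math.log(4.0 / delta_prime) / (2.0 * m))

    def _has_cut(self):
        """True if some split into (older, newer) parts has significantly different means."""
        values = list(self.window)
        n, total, head = len(values), sum(values), 0.0
        for i in range(1, n):
            head += values[i - 1]
            mean_old = head / i
            mean_new = (total - head) / (n - i)
            if abs(mean_old - mean_new) > self._eps_cut(i, n - i):
                return True
        return False

    def update(self, x):
        """Add x; while some split shows a significant change, drop the oldest
        element. Returns True if the window shrank (drift detected)."""
        self.window.append(x)
        shrunk = False
        while len(self.window) >= 2 and self._has_cut():
            self.window.popleft()
            shrunk = True
        return shrunk


# Toy stream: the mean jumps from 0 to 1 after 300 items.
stream = [0] * 300 + [1] * 300
detector = SimpleAdwin()
drifts = [t for t, x in enumerate(stream) if detector.update(x)]
print(drifts[0], len(detector.window))
```

While the stream is stationary the window keeps growing; shortly after the jump the old zeros are discarded one element at a time, so the window ends up containing mostly post-change data.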
Table 2.2: Comparison of Big Data Mining Techniques
Sr. No. | Publication / Author | Paper Title | Algorithm / Technique | Pros | Cons
01 | F. Blanco, J. C. A., G. R. Jimenez, R. M. Bueno and Y. C. Mota (2015) | Online and Non-Parametric Drift Detection Methods Based on Hoeffding’s Bounds | HDDM | Accurately finds drifted data and updates the model. | Issues related to speed and accuracy.
02 | Yanhuang Jiang, Qiangli Zhao, Yutong Lu (2014) | Ensemble Based Data Stream Mining with Recalling and Forgetting Mechanisms | MAE | Ensemble pruning is used as a recalling mechanism to select useful component classifiers for each incoming data chunk. | More experiments are needed to optimize the values of the parameters in the MAE algorithm, such as memory capacity and forgetting factor.
03 | G. Ross, N. Adams, D. Tasoulis and D. Hand (2012) | Exponentially Weighted Moving Average Charts for Detecting Concept Drift | EWMA | Points from the data stream should be processed only once and discarded rather than stored in memory. | The time required to process each point should be small and constant over time.
04 | F. Cao, J. Liang, L. Bai, X. Zhao, C. Dang (2010) | A Framework for Clustering Categorical Time-Evolving Data | Clustering algorithm | Effective for large datasets: it not only accurately detects drifting concepts but also attains clustering results of better quality. | Compared with other algorithms, this algorithm needs fewer parameters, which is favourable for specific applications.
05 | L. L. Minku, A. P. White, X. Yao (2010) | The Impact of Diversity on Online Ensemble Learning in the Presence of Concept Drift | Diversity in ensemble learning | Diversity is used to reduce the initial increase in error caused by a drift. | An additional mechanism is required to recover from the drift and converge to the new concept.
06 | A. Bifet and R. Gavaldà, SIAM Int. Conf. Data Min. (2007) | Learning from Time-Changing Data with Adaptive Windowing | ADWIN | Uses sliding windows that grow automatically when the data are stationary, for greater accuracy, and shrink automatically when change is taking place, to discard stale data. | Inefficient in time and memory; expensive because it checks all large-enough sub-windows of the current window for a possible cut.
07 | M. Baena, J. del Campo, R. Fidalgo, A. Bifet, R. Gavaldà and R. Morales (2006) | Early Drift Detection Method | EDDM | Works with slow, gradual changes; uses the distance between classification errors to detect changes. | Does not provide rigorous performance guarantees.
08 | J. Gama, P. Medas, G. Castillo and P. Rodrigues (2004) | Learning with Drift Detection | DDM | Controls the online error rate of the learning algorithm; sudden changes are detected easily. | Does not deal with slow, gradual changes; does not provide rigorous performance guarantees.
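The DDM rule summarized in the last row of Table 2.2 can be sketched as follows, using the usual formulation of Gama et al.: on a stream of 0/1 prediction errors, p_i is the running error rate and s_i = sqrt(p_i(1 − p_i)/i); the registers p_min and s_min are updated whenever p_i + s_i reaches a new minimum; a warning is raised when p_i + s_i exceeds p_min + 2·s_min and drift is signalled when it exceeds p_min + 3·s_min. The 30-instance warm-up, the reset-after-drift policy, the class name and the toy error stream are assumptions of this sketch.

```python
import math


class DDM:
    """Drift Detection Method (after Gama et al., 2004) on a 0/1 error stream."""

    def __init__(self, warmup=30):
        self.warmup = warmup  # assumed number of instances before testing starts
        self.reset()

    def reset(self):
        self.n = 0
        self.p = 0.0            # running error rate p_i
        self.s = 0.0            # standard deviation s_i = sqrt(p_i (1 - p_i) / i)
        self.p_min = math.inf   # register p_min
        self.s_min = math.inf   # register s_min

    def update(self, error):
        """error = 1 if the current instance was misclassified, else 0.
        Returns "in-control", "warning" or "drift"."""
        self.n += 1
        self.p += (error - self.p) / self.n          # incremental mean of errors
        self.s = math.sqrt(self.p * (1.0 - self.p) / self.n)
        if self.n < self.warmup:
            return "in-control"
        if self.p + self.s < self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, self.s  # new minimum of p_i + s_i
        if self.p + self.s > self.p_min + 3.0 * self.s_min:
            self.reset()                             # alarm level: start a new concept
            return "drift"
        if self.p + self.s > self.p_min + 2.0 * self.s_min:
            return "warning"
        return "in-control"


# Toy error stream: 10% errors for 1000 instances, then 50% errors.
errors = [1 if i % 10 == 9 else 0 for i in range(1000)] + [i % 2 for i in range(1000, 2000)]
ddm = DDM()
drifts = [i for i, e in enumerate(errors) if ddm.update(e) == "drift"]
print(drifts)
```

Because the test compares the current p_i + s_i with the best level seen so far, a sudden jump in the error rate is flagged within a few dozen instances, whereas a very slow rise can hide inside the confidence band, which matches the cons listed for DDM in the table.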
2.3 Literature Survey on Different Machine Learning (ML) Techniques Used for Sentiment Analysis
In paper [68], the authors developed a workflow that integrates qualitative analysis with large-scale data mining techniques, focusing on engineering students’ Twitter posts to understand issues and problems in their educational experiences. The authors first conducted a qualitative analysis of samples taken from about 25,000 tweets related to engineering students’ college life and found that engineering students encounter problems such as a heavy study load, lack of social engagement, and sleep deprivation. Based on these results, they implemented a multi-label classification algorithm to classify tweets reflecting students’ problems.
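The multi-label step in [68] can be illustrated with a binary-relevance sketch: one independent binary classifier per label, so a single tweet may receive several problem labels at once. Naive Bayes with add-one smoothing is used here only because it fits in a few lines (Table 2.1 lists an SVM multi-label classifier for this work); the toy tweets, the label names and the class names are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


class BinaryNB:
    """Multinomial Naive Bayes for one label (present vs. absent), Laplace-smoothed."""

    def fit(self, docs, targets):
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = Counter(targets)
        for doc, y in zip(docs, targets):
            self.word_counts[y].update(tokenize(doc))
        self.vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        return self

    def log_score(self, doc, y):
        """log P(y) + sum of log P(word | y), both with add-one smoothing."""
        n_docs = sum(self.doc_counts.values())
        score = math.log((self.doc_counts[y] + 1) / (n_docs + 2))
        total = sum(self.word_counts[y].values())
        for w in tokenize(doc):
            score += math.log((self.word_counts[y][w] + 1) / (total + len(self.vocab)))
        return score

    def predict(self, doc):
        return self.log_score(doc, True) > self.log_score(doc, False)


class BinaryRelevance:
    """Multi-label classifier: one independent binary classifier per label."""

    def fit(self, docs, label_sets, labels):
        self.models = {
            lab: BinaryNB().fit(docs, [lab in s for s in label_sets]) for lab in labels
        }
        return self

    def predict(self, doc):
        return {lab for lab, model in self.models.items() if model.predict(doc)}


# Hypothetical toy tweets and problem labels.
docs = ["homework exams homework", "sleep tired sleep", "homework tired sleep",
        "lonely friends party", "friends lonely", "exams homework"]
label_sets = [{"study_load"}, {"sleep_deprivation"}, {"study_load", "sleep_deprivation"},
              {"social_engagement"}, {"social_engagement"}, {"study_load"}]
labels = ["study_load", "sleep_deprivation", "social_engagement"]
model = BinaryRelevance().fit(docs, label_sets, labels)
print(sorted(model.predict("homework sleep")))  # ['sleep_deprivation', 'study_load']
```

Binary relevance ignores correlations between labels; its appeal is that each per-label model can be trained independently, which also makes it easy to parallelize.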
In paper [69], the authors proposed partially supervised learning for HDP, which enables HDP to use partially known knowledge to guide the model learning process. This allows HDP, which is designed for clustering problems, to tackle classification problems, while the partial supervision helps improve classification accuracy. They applied the proposed partially supervised learning for HDP to classify posts (micro-blogs) in an educational environment.
In paper [70], the authors propose a novel application of text categorization to identify relevant and irrelevant micro-blogging questions asked in a classroom. Several modelling approaches and several weighting and pre-processing configurations are studied for this application through extensive experiments.
In paper [71], the authors propose a two-step analysis framework that focuses on positive and negative sentiment, as well as the side effects of treatment, in users’ forum posts, and identifies user communities (modules) and influential users in order to ascertain user opinion of cancer treatment. They used a Self-Organizing Map to analyse word-frequency data derived from users’ forum posts, and introduced a novel network-based approach for modelling users’ forum interactions, employing a network partitioning method based on optimizing a stability quality measure.
In paper [72], the authors explored privacy concerns related to mining social media networks. Specifically, they looked at the issue in a crime-incident mining context, examining social media data ownership, legal protection of personal information, methods that may be used to anonymize users, and some ethical dilemmas in processing identifying information, especially for an application such as a crime incident reporting tool.
In paper [73], the authors developed a novel data-mining method to gauge the experiences of patients with diabetes mellitus with medical devices and drugs. Self-organizing maps were used to analyse forum posts numerically to better understand user opinion of medical devices and drugs. The end result is a compiled word list that correlates certain positive and negative word-cluster groups with medical drugs and devices. This method could open new avenues of research into rapid data collection, feedback, and analysis that would enable improved outcomes and solutions for public health.
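Papers [71] and [73] both rely on self-organizing maps (SOMs) to analyse word-frequency vectors derived from forum posts. The NumPy sketch below trains a small SOM: every unit on a grid holds a weight vector, and for each input the best-matching unit (BMU) and its grid neighbours are pulled towards the input, with a learning rate and a Gaussian neighbourhood that both shrink over time, so that similar posts tend to land on the same or neighbouring units. The 3×3 grid, the linear decay schedules, the function names and the toy vectors are assumptions, not the settings used in [71] or [73].

```python
import numpy as np


def best_matching_unit(weights, x):
    """Grid coordinates (row, col) of the unit whose weight vector is closest to x."""
    d = np.linalg.norm(weights - x, axis=-1)
    return np.unravel_index(np.argmin(d), d.shape)


def quantization_error(weights, data):
    """Mean distance from each input to its best-matching unit."""
    return float(np.mean([np.min(np.linalg.norm(weights - x, axis=-1)) for x in data]))


def train_som(data, rows=3, cols=3, epochs=200, lr0=0.5, sigma0=1.5, seed=0):
    rng = np.random.default_rng(seed)
    weights = rng.random((rows, cols, data.shape[1]))
    # (row, col) coordinates of every unit, used by the neighbourhood function
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)
    total_steps, step = epochs * len(data), 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                 # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 0.01    # shrinking neighbourhood radius
            bmu = np.array(best_matching_unit(weights, x))
            dist2 = np.sum((grid - bmu) ** 2, axis=-1)
            h = np.exp(-dist2 / (2.0 * sigma ** 2))[..., None]  # Gaussian neighbourhood
            weights += lr * h * (x - weights)
            step += 1
    return weights


# Hypothetical word-frequency vectors over ["good", "effective", "pain", "side_effect"].
posts = np.array([[3, 2, 0, 0], [2, 3, 0, 1], [0, 0, 3, 2], [0, 1, 2, 3]], dtype=float)
initial = np.random.default_rng(0).random((3, 3, 4))  # same start as train_som(seed=0)
weights = train_som(posts)
print(quantization_error(initial, posts), quantization_error(weights, posts))
```

After training, the positive-sounding and negative-sounding posts are mapped to different units, and the unit weights can be read back as prototypical word-frequency profiles, which is the kind of word-cluster grouping described for [73].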
In paper [74], the authors presented a scalable user-profiling solution, implemented using the Apache Hadoop framework, that extracts term- and concept-based user profiles from social media conversation data. The authors also discussed the challenges and presented an evaluation. As future work, they intend to extend the profiles with other data sources, both structured (e.g., transaction logs) and unstructured (e.g., mobile browsing logs), in order to verify and generate more robust profiles.
Table 2.3: Comparison of Sentiment Analysis Techniques
Sr. No. | Title | Technique Used | Outcome
01 | Mining Social Media Data for Understanding Students’ Learning Experience | SVM multi-label classifier | Focused on engineering students’ Twitter posts to understand issues and problems in their educational experiences
02 | Smart text-classification of user-generated data in educational social networks | Partial-supervised learning for the Hierarchical Dirichlet Process (HDP) for text classification with inherent hierarchical structure in education | A more flexible way to guide model learning from unlabelled documents
03 | Network-Based Modelling and Intelligent Data Mining of Social Media for Improving Care | Network-based approach for modelling users’ interactions, employing network partitioning based on optimizing a stability quality measure | Used to determine consumer opinion and identify influential users within the retrieved modules, using information derived from both word-frequency data and network-based properties
04 | A novel data-mining approach leveraging social media to monitor and respond to outcomes of diabetes drugs and treatment | A novel data-mining method developed to gauge the experiences of patients with diabetes mellitus with medical devices and drugs | Rapid data collection, feedback, and analysis that would enable improved outcomes and solutions for public health
2.4 Literature Survey on Different Machine Learning (ML) Techniques Used for Distributed Data Mining
In [50], “A novel algorithm for distributed data mining in HDFS”, the author explains an algorithm named Association Rule Mining based on Hadoop (ARMH), proposed to utilize clusters effectively and to mine frequent patterns from large databases. The Hadoop distributed framework helps manage the workload across the cluster, and ARMH was implemented in Hadoop using the MapReduce programming paradigm.
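The MapReduce pattern that ARMH builds on can be illustrated with the support-counting passes of Apriori: in every pass, each mapper counts the candidate itemsets in its own partition of transactions, a reducer sums the partial counts, itemsets below the minimum support are discarded, and the survivors generate the next level of candidates. This is a single-process Python sketch, not Hadoop code and not the ARMH algorithm of [50]; the function names, the `min_support` value and the toy baskets are assumptions.

```python
from collections import Counter
from functools import reduce
from itertools import combinations


def candidates(frequent_prev, k):
    """k-itemsets whose (k-1)-subsets are all frequent (Apriori pruning)."""
    items = sorted({i for s in frequent_prev for i in s})
    return [
        frozenset(c) for c in combinations(items, k)
        if all(frozenset(sub) in frequent_prev for sub in combinations(c, k - 1))
    ]


def map_count(partition, cands):
    """Mapper: local support count of each candidate in one data partition."""
    counts = Counter()
    for transaction in partition:
        t = set(transaction)
        for c in cands:
            if c <= t:
                counts[c] += 1
    return counts


def mapreduce_apriori(partitions, min_support):
    all_items = {i for p in partitions for t in p for i in t}
    cands = [frozenset([i]) for i in sorted(all_items)]   # first pass: single items
    frequent, k = {}, 1
    while cands:
        # reducer: sum the partial counts emitted by all mappers
        total = reduce(lambda a, b: a + b, (map_count(p, cands) for p in partitions), Counter())
        level = {c: n for c, n in total.items() if n >= min_support}
        frequent.update(level)
        k += 1
        cands = candidates(set(level), k)
    return frequent


# Hypothetical baskets stored on two nodes.
partitions = [
    [["bread", "milk"], ["bread", "butter"], ["milk", "butter", "bread"]],
    [["milk", "bread"], ["butter"], ["bread", "milk", "butter"]],
]
freq = mapreduce_apriori(partitions, min_support=3)
for itemset, support in sorted(freq.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

Only the candidate list and the count tables move between the driver and the nodes; the transactions themselves stay in their partitions, which is what lets this style of algorithm scale with the cluster.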
The paper [51] analysed the drawbacks of existing distributed data mining (DDM) systems and put forward a service-oriented architecture for DDM on the grid. The mining algorithms and distributed data sets in the proposed framework are abstracted as Web service resources (WS-Resources), which can cooperate dynamically to perform DDM as required. Finally, a grid based on a local area network was built with Globus Toolkit 4.0 Beta, and algorithm WS-Resources and dataset WS-Resources for data mining on the grid were developed.
In paper [52], “Distributed data mining: a survey”, the author surveyed state-of-the-art algorithms and applications in distributed data mining and discussed future research opportunities.
In paper [53], “Study of Distributed Data Mining”, the author describes distributed data mining algorithms, methods and trends for discovering knowledge from distributed data effectively and efficiently, and explains DDM based on multi-agent systems as well as parallel data mining techniques.
In paper [54], “Privacy-Preserving Distributed Data Mining Techniques: A Survey”, the author provides an extensive survey of privacy-preserving data mining methods and analyses their representative techniques. The survey mainly discusses distributed privacy-preservation techniques that provide secure solutions using primitive operations of cryptographic protocols, such as secure multi-party computation (SMPC), secret sharing schemes (SSS) and homomorphic encryption (HE).
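As a concrete instance of the secret-sharing primitive mentioned above, the sketch below computes a secure sum with additive secret sharing over a prime field: each site splits its private value into random shares that add up to the value modulo a prime, sends one share to every site, and only the total is ever reconstructed. The modulus, the three-site example and the semi-honest (honest-but-curious) setting are illustrative assumptions, not details taken from [54].

```python
import secrets

PRIME = 2**61 - 1  # assumed field modulus; must exceed any possible sum


def make_shares(value, n_parties, prime=PRIME):
    """Split value into n random shares that sum to value modulo prime."""
    shares = [secrets.randbelow(prime) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % prime)
    return shares


def secure_sum(private_values, prime=PRIME):
    n = len(private_values)
    # share_matrix[i][j] is the share that party i sends to party j
    share_matrix = [make_shares(v, n, prime) for v in private_values]
    # each party j adds up the shares it received; a single share reveals nothing
    partial_sums = [sum(share_matrix[i][j] for i in range(n)) % prime for j in range(n)]
    # publishing only the partial sums reveals only the total
    return sum(partial_sums) % prime


local_counts = [120, 45, 310]  # hypothetical per-site counts
print(secure_sum(local_counts))  # prints 475
```

In a distributed mining setting the same secure-sum step can aggregate, for example, local support counts of an itemset without any site disclosing its own count.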
2.5 Literature Survey on Different Machine Learning (ML) Techniques Used for Data Mining
Table 2.4: Comparison of Different ML Algorithms Available in the Literature
Sr. No. | Title | Year | Methodology | Application | Classification or Prediction | Advantages | Disadvantages
1 | Robust and Effective Component-based Banknote Recognition by SURF Features | 2011 | Speeded Up Robust Features (SURF) | Banknote recognition for blind or visually impaired people | Classification | 100% recognition rate on a challenging dataset; faster | -
2 | Automatic detection and classification of objects with optimized search space | 2014 | Support vector machine (SVM), neural network | Automatic selling of goods, vending machines | Classification | SVMs deliver a unique solution and gain stability and flexibility; accurate and robust classification results | A limitation of the SVM approach lies in the choice of the kernel, speed and size
3 | A MapReduce based distributed SVM algorithm for binary classification | 2013 | SVM in a cloud computing environment | - | Prediction | Used for training on big datasets | -
4 | Forgery Detection and Value Identification of Euro Banknotes | 2013 | Both hardware and software modules; proposed approach: (1) calibration, (2) training, (3) use | Detecting counterfeit Euro banknotes | Classification | Robust to changes in environmental lighting and to non-uniformity of the infrared light | -
5 | Banknote recognition using inductive learning | 2013 | RULES-3 inductive learning | Petrol station automats, parking automats, currency exchange machines | Classification | Saves memory space; decisions can be made in a short time; easy and cheap to develop | Sometimes frustrating; may reach false conclusions
6 | Employing multiple-kernel support vector machines for counterfeit banknote recognition | 2011 | Multiple-kernel support vector machine | Automatic goods-selling machines, vending machines, automatic monetary transaction machines | Classification | If more counterfeit-preventive features are added to the banknotes, the system can still distinguish between genuine and forged banknotes without any modification | The performance of SVMs largely depends on the choice of kernels
7 | ANN based currency recognition system using compressed gray scale and application for Sri Lankan currency notes | 2008 | SLCRec | ATM machines | Classification | Separates classes properly under varying image conditions; better robustness to noise | -
8 | Using hidden Markov model for paper currency recognition | 2013 | Hidden Markov model (HMM) | ATMs and vending machines | - | Accuracy and robustness | Uses size and color properties, which are the same for many countries
9 | Recognition of Indian currency based on LBP | 2012 | Local binary pattern (LBP) | ATMs and vending machines | Classification | Simplicity and high speed, high recognition rate, good performance under low noise, low computational complexity | Cannot detect counterfeit banknotes
10 | Recognition of Mexican banknotes via their color and texture features | 2012 | Local binary pattern (LBP) and RGB space, with an LVQ network as classifier | Countries where colors are employed to identify different denominations | Classification | High recognition performance, less processing time, invariant to image rotation | High cost; cannot detect counterfeit banknotes
11 | Support Vector Machine-Based Classification Scheme for Myoelectric Control Applied to Upper Limb | 2008 | Myoelectric control, support vector machine (SVM), MES | Hands-free human–machine interfaces for disabled people | Classification | Exceptional accuracy, robust performance and low computational load | Does not invalidate the achieved conclusions in the design of pattern-recognition-based myoelectric control
12 | Constructing L2-SVM-Based Fuzzy Classifiers in High-Dimensional Space With Automatic Model Selection and Fuzzy Rule Ranking | 2007 | L2-SVM | Image and video classification | Classification | Automatically chooses the number of fuzzy rules and identifies the important input features at the same time; more reasonable rule-ranking scheme | High-dimensional problem; does not select variables automatically
13 | Cutting Plane Training for Linear Support Vector Machines | 2011 | SVM | Text and hypertext categorization, handwriting recognition | Classification | Asymptotic time complexity scales more reasonably; reduces training time | -
14 | Euro banknote recognition system using a three-layered perceptron and RBF networks | 2003 | Three-layered perceptron (neural network) and radial basis function (RBF) network | ATM | Classification | Good performance in both accepting valid banknotes and rejecting invalid data; performance of the validation part without using IR images | The three-layered network becomes smaller than the RBF network by reducing redundant input neurons
15 | Recognition System for Pakistani Paper Currency | 2013 | Euclidean distance classifier, weighted Euclidean distance classifier, k-NN classifier | ATM machines, auto-seller machines, bank money counters | Classification | Low-cost machine working efficiently | -
16 | Automatic recognition of serial numbers in bank notes | 2014 | Feature extraction methods (gradient direction feature, Gabor feature and CNN trainable feature); classifiers (SVM, LDF, MQDF and CNN) | Counterfeit recognition of RMB (renminbi, the paper currency used in China) | Classification | Applies cascade schemes to the context of rejection, which could dramatically reduce the number of rejected samples while achieving 100% reliability; highest test accuracy | Only used to recognize serial numbers of RMB
17 | Location based Recommendation System | 2013 | A method to predict preferred restaurants based on weather and customer demographics such as age, mood, etc., using a Bayesian network | Social media | Prediction | Location data is used along with the user’s biographic data | The data were collected manually by tracking seven volunteers in the real world
18 | Analysis of location based social media data | 2013 | How the radius of gyration varies with city demographics such as population and household income | Social media | Classification | - | -
19 | Predicting customers’ purchase behaviour | 2013 | A multiclass classifier | Social media | Prediction | Reflects only the relationship between the user’s profile and the user’s purchase behaviour | Other affecting factors, such as comments about the product, were not considered
20 | Employing multiple kernel Support vector machines for counterfeit banknote recognition | Chi-Yuan Yeh, Wen-Pin Su, Shie-Jue Lee | Each banknote is divided into partitions, and the luminance histograms of the partitions are taken as the input of the system. Each partition is associated with its own kernels. A linearly weighted combination is adopted to combine multiple kernels into a combined matrix. The optimal weights of the kernel matrices in the combination are obtained through semi-definite programming (SDP) learning | Banking | Classification | Two strategies are adopted to reduce the time and space required by the SDP method: one assumes the non-negativity of the kernel weights, and the other sets the sum of the weights to unity | The proposed approach outperforms single-kernel SVMs, standard SVMs with SDP, and multiple-SVM classifiers
21 | Using Hidden Markov Models for paper currency recognition | Hamid Hassanpour, Payam M. Farahabadi | By employing HMM, the texture characteristics of paper currencies are modelled as a random process; a similarity measure is used for classification | Banking | Classification | The proposed algorithm can be used to distinguish paper currency from different countries | Only texture characteristics are considered
The motivational graph given below shows the need for research on social network big data.
Figure 2.2: Motivational Graph for Approach Selection
The graph shows that less work is available on parallel implementation and on social network data, and that fewer papers suggest ways to decrease the required processing time. Applying popular machine learning algorithms to large amounts of data has raised new challenges for ML practitioners. Traditional ML libraries do not support the processing of huge datasets well, so new approaches were needed. Parallelization using modern parallel computing frameworks such as MapReduce, CUDA or Dryad gained popularity and acceptance, resulting in new ML libraries developed on top of these frameworks.
Social Motivation
Predictive analysis of big data will help business analytics to understand market trends and customer behaviour, to gather feedback on different products and services, and to support friend recommendation and link prediction.
Technical Motivation
Very little work is available on distributed big data analysis that can train on and handle large data streams. A distributed implementation will help reduce the time required for classification and prediction [1]. Previous literature has not considered data clean-up and pre-processing techniques. Most work on social networks has considered bibliographic information [1, 13]; location-specific information has not been considered.
Educational Motivation
Prediction and classification of big data on social networks is a new area of research. It will help gather business intelligence from social media data. This study will help enhance knowledge in the distributed machine learning domain and will add value to existing machine learning algorithms so that they work efficiently on big data.
2.6 Summary
In this chapter, a literature survey of different machine learning techniques has been presented. Many authors have used different machine learning algorithms to analyse time-changing big data, and some have used machine learning algorithms for semantic analysis. For concept drift detection, different methods such as moving averages, Hoeffding-bound methods and windowing methods have been suggested. Some authors have used neural networks, whose accuracy is good but whose computation is slow; their complexity is high and their interpretability is low. For data mining, some authors have used support vector machines (SVM), but SVM results are hard to visualize, and the use of kernels adds further complexity. Some authors have used swarm intelligence for optimization and clustering.