a comparison study of support vector machines …relialab.org/upload/files/06_03_comparison svm...

Journal of

MechanicalScience andTechnology

Journal of Mechanical Science and Technology 21 (2007) 607~615

A Comparison Study of Support Vector Machines and HiddenMarkov Models in Machinery Condition Monitoring

Qiang Miao, Hong-Zhong Huang*, Xianfeng FanSchoolofMechatronics Engineering, University ofElectronic Science and Technology ofChina,

Chengdu, Sichuan, 610054, P R. China

(Manuscript Received August 31,2006; Revised March 16,2007; Accepted March 16, 2007)

Abstract

Condition classification is an important step in machinery fault detection, which is a problem of pattern recognition.Currently, there are a lot of techniques in this area and the purpose of this paper is to investigate two popularrecognition techniques, namely hidden Markov model and support vector machine. At the begiIming, we brieflyintroduced the procedure offeature extraction and the theoretical background ofthis paper. The comparison experimentwas conducted for gearbox fault detection and the analysis results from this work showed that support vector machinehas better classification performance in this area.

Keywords: Pattern recognition; Feature extraction; Hidden markov model; Support vector machine

1. Introduction

Condition monitoring and classification of machinery is a rapidly developing area of machine leamingand pattern recognition, especially with the application of artificial intelligence techniques such asArtificial Neural Networks (ANN) and Support Vector Machines (SVM). In the procedure of conditionmonitoring and classification, vibration data arecommonly chosen because of their properties of easycollection, abundant information, and relatively lowoverhead. However, raw vibration data carmot bedirectly used as input of classification system, sincethese signals are usually contaminated by noises.Signal processing and feature extraction are necessarysteps that can eliminate noises and enhance usefulinformation related to machine conditions. The basicidea of signal processing and feature extraction is totransform the original data from one domain (usually

'Corresponding author. Tel.. +86 2883206916, Fax.. +86 2883206828E-mail address:[email protected]

in time domain) to another space (usually in timefrequency domain) with some mathematical tools.Through this transform, important information can beintensified and identified.

In recent years, a lot of feature extraction methodshave been proposed in machinery condition monitoring problem utilizing vibration data. For example,Xu and Ge (2004) proposed an intelligent fault diagnosis system in monitoring stamping process usingwavelet packet, in which features are acquired fromtime marginal energy and frequency marginal energy.Tahk and Shin (2002) proposed a diagnosis algorithmto detect the defective rollers based on the frequencyanalysis of web tension signals, which is to use thecharacteristic features (root mean square, peak value,power, spectral density) of tension signals for theidentification of the faulty rollers and the diagnosis ofthe degree of fault in the rollers. In addition, theselection of time-domain signal statistics, such asmean, root mean square, variance, skewness, kurtosisand normalized higher order central moments, isanother way of feature extraction (Samanta, 2004) to

608 Hong-Zhong Huang et al. / Journal ofMechanical Science and Technology 21 (2007) 607-615

distinguish between normal and defective conditionsof pump. In this paper, we realized this step throughthe e>..1:raction of Lipschitz exponent function proposed in (Miao, 2005). During next section, a briefintroduction of our feature extraction method will begIVen.

The research presented in this paper is to comparethe classification performance of Hidden MarkovModels (HMM) and SVM in gear fault detection.HMM is based on statistical leaming theory and ispredominantly applied in current speech recognitionsystems (Huo et aI., 1995) because it can normalizethe time-variation of the speech signal and characterize the speech signal statistically and optimally.Moreover, application of HMM in condition monitoring is another popular research topic includingstamping process monitoring (Xu and Ge, 2004), toolwear monitoring (Erhmc et aI., 2001), and bearingfault detection (Ocak and Loparo, 2001), etc. SupportVector Machine, which is a relatively new techniquebased on statistical leaming theory, is gaining interests of researchers in the areas of machine leaming,computer vision and pattern recognition (Burges,1998; Yuan, 2006). The main difference betweenHMM and SVM is in the principle of risk minimization (RM) (Vapnik, 1999; Samanta, 2004). Incase of HMM, empirical risk minimization (ERM) isused, which is the simplest induction principlewhereas in SVM, structural risk minimization (SRM)is used as induction principle. The difference in RMleads to better generalization performance for SVMthanHMM.

In this paper, we investigate the classificationperformance of these two methods in machinerycondition monitoring. An example of gear fault detection is given to compare their performance fromthe aspect of classification accuracy. The rest of thepaper is organized as follows. Section 2 introducesthe procedure of wavelet transform and extraction ofLipschitz exponent function. Theoretical backgroundof SVM and HMM are described in Section 3. InSection 4, a comparison study of SVM and HMMbased classifiers is conducted in solving gearboxcondition monitoring problem. Conclusion from thisresearch is drawn in Section 5.

2. Feature extraction in condition monitoring

2.1 Diagram ofproposed condition monitoring system

Generally, a condition monitoring system of machi-

nery can be divided into three subsystems: signalprocessing unit, feature extraction unit and conditionclassification unit. In this paper, vibration data recorded from a gearbox case is used as the example forthe comparison study. Raw vibration data is processedwith wavelet transform and the output is a timefrequency representation of signal. In feature e>..1:raction unit, the Lipschitz exponent function is e>..1:ractedfrom time-frequency representation. Finally, a comparison between SVM based classification systemand HMM based one is conducted. Figure 1 is a flowchart of condition monitoring procedure in this research.

2.2 Extraction of lipschitz exponent function withwavelet

Wavelets have been gaining popularity as a choiceof multi-scale transform since their first appearance inthe early 1990s. Wavelets are mathematical functionsthat decompose a signal into its constituent partsusing a set of wavelet basis functions. The family ofbasis functions used for wavelet analysis is created byboth dilations (scaling) and translations (in timedomain) of a mother wavelet, thereby providing bothtime and frequency information about the signalbeing analyzed. The resulting coefficients from thewavelet transform of a time domain signal, such asthe acceleration response of a structure, can berepresented in a two-dimensional time-scale map.

Signals recorded from rotating machinery usuallyinclude stationary components coming from periodic

Fig. 1. The diagram of condition monitoring system.

Hong-Zhong Huang et al. / Journal ofMechanical Science and Technology 21 (2007) 607~615 609

Here a and bare bOlllldaries that define a smallinterval of x. Then at each modulus maximum (s, x) inthe cone defined by (4),

WhereB=A (HC').Using the above mentioned method, singularities in

a signal can be detected by examining the evolution

Eq. (2) guarantees that the wavelet with n+l vanishing moments is orthogonal to the polynomials oforder up to n. If the fllllction.f(x) is Lipschitz 0. at Xo,

n< 0. <n+l, then there exists a constantA such that forall points x in a neighbourhood of Xo and any scales(Mallat and Hwang, 1992),

(4)

(6)

(3)

(5)

(2)fxklf/(x)dx= O,iorO < k < n + 1

I Wf(s,x) I:'O:Bsa

,

1x- Xo I::; Cs,

which is equivalent to

The local Lipschitz 0. of .f(x) at Xo depends on thedecay of IW.f(s, x)1 at fine scales in the neighborhoodof Xo. The decay can be measured through the localmodulus maxima. Define modulus maxima as anypoint (so, xo) such that IW.f(so, x)1 is a local maxima atx= Xo. If there exists a scale So > 0, and a constant C,such that for XE (a, b) and s< so, all the modulusmaxima of W.f(s, x) belong to a cone defined by (Miaoand Makis, 2007)

to the scales. The wavelet method for estimating theLipschitz exponent is similar to that of the Fouriertransform. By examining the decay of time-scale mapat specific points in time across all scales (frequencies), the local Lipschitz regularity of the signalcan be determined.

In order to measure Lipschitz exponent 0., we mustimpose that the wavelet lfJ(x) has enough vanishingmoments. For example, a wavelet lfJ(x) is said to haven+1 vanishing moments, if and only if for all positiveintegers k < n+1, following condition can be satisfied,

We call Lipschitz regularity of .f(x) and xo, thesuperior bound of all values 0. such that .f(x) isLipschitz 0. at Xo. In case .f(x) is continuously differentiable at a point, it has Lipschitz 1 at this point. Ifthe Lipschitz regularity 0. of.f(x) at x = xo, satisfies n<0. <n+1, then.f(x) is n times differentiable at Xo but itsnth derivative is singular at Xo and 0. characterizes thissingularity. Here n is a positive integer.

To measure the Lipschitz 0., a classical method is tolook at the asymptotic decay of the amplitude offunction .f(x)'s Fourier transform. However, theasymptotic decay of a signal's frequency spectrumrelates directly to the uniform Lipschitz regularity. Itonly provides a measure of the minimum globalregularity of the fllllction and ca=ot localize theinformation along the temporal variable x (Loutridisand Trochidis, 2004), which means it is helpless inhandling non-stationary transient signals. On the otherhand, if the wavelet lfJ(x) has a compact support, thevalue of wavelet coefficient W.f(s, x) depends on thevalue of.f(x) in a neighborhood, of size proportional

.f(X) - Pix-Xo) I ::;AIx-xol a for ~-xol < ho. (1)

motion of system such as gear meshing. However, theoccurrence of machine failure may result in emergence of non-stationary transients masked by noiseand normal meshing components of rotating object. Asignal processing method that ignores the regular partof a signal and focuses on the transient part has morepotential in extracting diagnostic information. In thisresearch, these transient phenomena can be treated assingularities in signal and a local method with goodsignal to noise characteristics should be appliedbecause of their low energy content. Such a method isthe singularity analysis based on wavelet pioneeredby Mallat and Hwang (1992). According to this method, most of fault related information in a signal canbe captured in the local maxima of the wavelet transform modulus and characterized by Lipschitz exponent, which is a quantitative description of signal'slocal regularity.

In order to llllderstand the concept of Lipschitzexponent and some new methods derived from it, astrict definition is given here. Assume .f(x) to be a[mite energy fllllction, that is, .f(x)EL2(R). The definition of Lipschitz exponent is: .f(x) is said to beLipschitz 0. at xo, if and only if there exists twoconstants A and ho > 0 and a polynomial Pn(x) oforder n, such that

610 Hong-Zhong Huang et al. / Journal ofMechanical Science and Technology 21 (2007) 607~615

Fig. 2. An example of2-class classifier by SVM.

I

Subject to L A,Yi = 0, A, ~ 0i=l

I(x) =sgn {tA,YiXTXi +b} (i = 1, ... , l). (8)

Here sgn(*) is the sign fllllction and the Lagarangecoefficient Ai is the solution of the following quadraticprogramming problem (Yuan and Chu, 2006):

(7)

o

oo Pcsi ti ve a ass

HI.H

D={ (~'Yi)} (i = 1, .. .l),YiE {-I, +l}

H2 is called the margin. The closest positive class andnegative class points lying on these two hyperplanesare called the support vectors.

In this section, we briefly describe the SVM theoryand more details can be fOlllld in (Burges, 2004).Assume we have a data set:

where A = {Ai}. In real-world application, problemstypically involve data which can only be separatedusing a nonlinear decision surface. Optimization onthe input data in this case involves the use of a kernelbased transformation. By projecting the originalsample space into a high-dimensional eigenspacewith a kernel fllllction K(X, ~), the nonlinear problembecomes linearly separable and the SVM classifier

where ~ is input sample and Yi is output class. Thegoal of SVM is to define a hyperplane which dividesD, such that all the points in the same class are on thesame side of the hyperplane H while maximizing thedistance between the two classes and the separatinghyperplane. The optimal separating plane is a linearclassifier that can be defmed as the following decisionfllllction:

of the modulus maxima of the wavelet transformalong maxima line. Modulus maxima line is definedby co=ecting the adjacent modulus maxima points.An alternative to the extraction of the maxima line isto look at the decay of the wavelet modulus across thescales at each time point. Points where large changesoccur in the signal (such as singularities) will havelarge coefficients at certain scales, thus havingsignificant decay. The measure of this decay is theLipschitz exponent of the signal at a given time point.By examining the change of this exponent in time,singularities can be identified.

Based on the analysis above, Lipschitz exponentfunction Lp(x) can be derived as follows. Theresulting coefficients from wavelet transform ofvibration signal generate a two-dimensional timescale representation which takes a form of matrix.One dimension of the time-scale matrix represents adifferent scale (s), while the other one (x) denotes adifferent time point in the signal. Take each colur=,which represents the frequency spectrum of the signalat the corresponding time point x, and use LinearRegression method to approximately estimate a(Robertson et aI., 2003). Finally the Lipschitz exponent fllllction Lp(x) can be achieved which is afunction that describes the change of Lipschitz valuealong x. The choice of Linear Regression method is asimplification of (6) and similar technique is validatedby Robertson et al. (2003) in structural healthmonitoring.

3. Theoretical background of SVM and HMM

3.1 Condition classification based on SVM

Support vector machine (SVM) (Vapnik, 1999) is aclassifier that estimates decision surfaces directlyrather than modeling a probability distribution acrossthe training data. SVMs have demonstrated goodperformance on several classic pattern recognitionproblems including machinery condition monitoring(Burges, 1998; Yang et aI., 2005; Yuan and Chu,2006). Figure 2 is an example of the classification ofa series of points for two different classes (represented by triangles and squares, respectively). It is atypical 2-class (positive and negative classes) SVMproblem in which the points are perfectly separableusing a linear decision region. H is a separatinghyperplane. HI and H2 define two hyperplanes andthey are parallel to H. The distance between HI and


Vapnik (1999) provides us several kernel functionslike linear, polynomial, Gaussian and Laplacian radialbasis functions (REF). In this research, the GaussianREF kernel, given by Eq. (11) is used. Fig. 3. A simple example of HMM with 3 hidden states.

has the classifYing function as follows:

I(X) = sgn{tA,YiK(X,Xi)+b} (10)

NxJe 1

QI

!I

Q!!

Qi

(11)

where (J denotes the width of the Gaussian REFkernel parameter and can be determined by aniterative process to select an optimum value based onthe full feature set.

3.2 Condition classification based on HMM

In the last decade, HMM has attracted the attentionof many researchers in pattern recognition such ashandwriting, speech and signature verification. Thepower of an HMM lies in its ability to model thetemporal evolution of a signal via an underlyingMarkov process. Widespread use of HMMs can beattributed to the availability of efficient parameterestimation procedures that involve maximizing thelikelihood of the data given the model. This is aniterative method called expectation maximization(EM) algorithm. The EM algorithm provides aniterative framework for maximum likelihood estimation with good convergence property, though it doesnot guarantee finding the global optimality.

HMM has advantages that allow it to providesolutions by modeling and leaming by itself, even if itdoes not have exact knowledge about the problem tobe solved. HMMs are divided into continuous anddiscrete models according to their probability densitydistribution. In this paper, discrete HMMs are selected because they can represent any distribution asno assumptions are made regarding the underlyingdistribution.

HMM is represented by a graph structure thatconsists ofN nodes, called hidden states, and arcs thatrepresent transitions between nodes. Figure 3 is anexample of HMM model with N=3 hidden states. Theshadow nodes in the figure represent observationsfrom corresponding hidden states. The observationsymbol probability distribution models spatial charac-

teristics, and the state transition probability distribution models time evolution characteristics. HMMstates are not directly observable, but can be inferredthrough a sequence of observed symbols. To describethe HMM formally, the following model notationswill be used.

• set of hidden states:S = {Sj,S2, ... ,SN} ' where N is the number of

states in HMM,• state transition probability distribution:A = {aijf, where au = P[q'+l=~ Iq, = Sd, for 1:C::: i,

j:C:::N,• set of observation symbols:V = {vj ,v2 , ... ,VM }, where M is the number of

observation symbols per state,• observation symbol probability distribution:B = {b j (k)}, where bik) = P[Vk aU Iq, = .'lj], tor 1:C:::

j:C:::N, 1:C::: k:;,M.

• initial state probability distribution:Jr ={Jri }, where ffi = p[qj = Si]' for 1:C::: i:C:::N.

Where q, represents the hidden state at time t. AnHMM can be represented by the compact notationA = (A, B, Jr) . HMM modeling involves choosing

the number of hidden states, N, the number of discretesymbols, M, and the specification of three probabilitydistributions A, B, and Jr.

4. Comparison Study between SVM and HMM

4.1 Experimental set-up

Vibration data used in this paper are measured froma gearbox driven by an electrical motor underlaboratory environment. This is a single stage gearreduction unit mounted on a mechanical diagnostictest bed (MDTB, see Fig. 4). The MDTB is functionally a motor-drive train-generator test stand(Byington and Kozlowski, 2000). The gearbox is


driven at a set input speed using a 30 Hp, 1750 rpmAC (drive) motor, and the torque is applied by a 75Hp, 1750 rpm AC (absorption) motor. The maximumspeed and torque are 3500 rpm and 225 ft-lbsrespectively. In this test TIlll, the shaft speed is keptconstant at l750rpm. The variation of the torque isaccomplished by a vector unit capable of controllingthe current output of the absorption motor. TheMDTB is highly efficient because the electricalpower that is generated by the absorber is fed back tothe driver motor. The mechanical and electrical lossesare sustained by a small fraction of wall power. TheMDTB has the capability of testing single and doublereduction industrial gearboxes with ratios from about1.2: 1 to 6: 1. The general information of gearbox isshown in Table 1.

The experiment started with a brand new gearboxllllder 100% of rated workload. Mter sometime, itincreased to 200% of rated workload and ran lllltilsome criteria of gearbox health condition weresatisfied. However, the gear failure occured before thetestrig was stopped. During this procedure, vibrationdata were recorded occasionally with a period of 10seconds at a sampling rate of 20 kHz. Each time wecollected vibration data (10 seconds), it was saved asa data file and numbered consequently. Therefore,there are 148 data files, among which 12 of them (file1 to 12) are llllder 100% rated workload and normalconditon, 25 of them (file 13 to 37) llllder 200% rated

workload and normal conditon, and the remainingIII of them (file 38 to 148) under 200% ratedworkload and failure condition. Figure 5 is a graphicdemonstration of data history in this experiment.

Each data file has a length of 10 seconds (200,000sampling points), and one revolution of gear contains1052 sampling points, which is calculated based onmechanical specifications of the gearbox. Thus, eachdata file contains arolllld 190 periods of revolutionand can be divided into 10 bins (lOx 19=190), whichmeans we can extract 10 Lp fllllctions from each datafiles. These Lp fllllctions will be used as input featuresof classifiers.

As we introduced previously, there are 148 gearboxvibration data files, which can be divided into threesets of data. That is, 12 of them are llllder 100% ratedworkload and normal condition (Dataset A), 25 ofthem llllder 200% rated workload and normal condition (Dataset B), III of them llllder 200% workloadand failure condition (Dataset C). Since Dataset Band Care llllder same working environment anddifferent machine conditions, we select data from Band C for the training and testing of correspondingclassifiers without consideration of Dataset A. Wechoose 25 data files from Dataset Band 30 fromDataset C to form normal (Nl) and failure (PI)datasets, respectively. In addition, we select another30 data files from Dataset C and name it as F2.Therefore, there are three datasets, Nl, Fl and F2.The difference between F1 and F2 is that F1 isselected from data files that the gearbox is under earlyfailure stage (file number between 38 and 92) and F2is selected from data files that the gearbox is llllderheavy failure stage (file number between 93 and 148).Figure 6 is a graph to show the visual differenceof Lipschitz exponent fllllctions from these threedatasets.

Fig. 4. Mechanical diagnostic test bed. Start of experiment(file 1)

Occurrence offailure (file 38)

Table 1. Gearbox information in this experiment.

Fig. 5. Description of vibration data history.

Change of workload(file 13)

time axis

failure condition

workload200% of rated

workload

normal condi tion

100% of rat

DS3S0l50XX

DodgeAPG

R8600l

1750 rpm

1.533

2.388

46

30

GearboxID#

Make

Model

Rated Input Speed

Gear Ratio

Contact Ratio

Number ofTeeth (Driven gear)

Number of Teetb (Pinion gear)

Hong-Zhong Huang et al. / Journal ofMechanical Science and Technology 21 (2007) 607~615

f:~~do 100 200 300 400 500 600 700 eoo 900 1000

(oj Plot of Lp !Joel len under namal cendmen N1 Sompllng Pelnl. (Pelnl)

i:~ ;:: I I: ' do 100 200 300 400 500 600 700 eoo 900 1000

(b) Piel oILp lmetien l.I1der liglll failure eon<itien F1 Sompllng Pdnl. (Point)

l:~o 100 200 300 400 500 600 700 eoo 900 1000

(c) Plot ofLp flJIction under heavy failure coodtioo F2 Samplir.g PdnlS (Point)

Fig. 6. Plot of Lipschitz exponent functions in three datasets (Nl, Fl, F2).

613

observationsequence

P(Olnormal)

P(Olfailure)

Fig. 7. Structure of HMM based classification system.

4.2 Specification ofSVM and HMM based classifiersin this comparison

In SVM modeling, since this is a two-class problem,one SVM model is enough for classification. In thispaper, we used SVM Matlab toolbox developed byRakotomamonjy and Canu (2005). The GaussianREF is chosen as the kernel function and theparameter (J is 0.5, which is based on n-fold crossvalidation procedure. In this procedure, 4 data filesfrom Nl and F1 respectively are split into 4 partitionsand (J is selected from 0.1 to 0.9 with step size 0.1.Although the convergence of REF kernels is typicallyslower than that of polynomial kernels, REF kernelsoften deliver better performance and they are extremely popular (Shao and Chang, 2005). In the training step, we train the SVM based classifier withLipschitz exponent functions extracted from 4 datafiles ofNl and Fl respectively and the remaining are

used for testing.In HMM modeling, two HMM models are esta

blished, one for normal condition and the other forfailure condition. In this case, the hidden states ofeach HMM do not have certain physical meanings. Intesting procedure, compute the log-likelihood foreach model given the input observation (features) andmake decision based on the model that has larger loglikelihood. The number of hidden states in eachmodel is 5, which is based on the evaluation ofHMM-based system performance with different hidden states. Figure 7 shows the structure of HMMbased classification system.

4.3 Comparison results

This is a typical two-class condition classificationproblem because we only consider two machineryconditions, namely normal and failure. In vibration


Table 2. Perfonnance comparison of HMM and SVM basedclassifiers.

HMM based classifier SVM based classifier

Dataset Test Training Test Trainingsuccess time (s) success time (s)('Yo) ('Yo)

N1 85.71 12.371 85.71 4.783

F1 88.46 13.103 92.31 5.892

F2 76.67 83.33

signal processing, comblet and wavelet transform hasbeen applied and details can be fOlmd in (Miao, 2005).Since the signal in each data file is partitioned into 10bins, wavelet transform is applied for these 10 piecesof signals. In feature extraction, 10 Lipschitz exponent fllllctions Lp corresponding to the wavelet transform of 10 pieces of signals are extracted from onedata file. Therefore, there are 250, 300, 300 featurevectors in dataset Nl, F 1 and F2, respectively. In thetraining step, we use 40 feature vectors from 4 datafiles in Nl and F 1 datasets. In the testing step, all theremaining data are used for validation.

The performance of the classifiers is evaluatedusing the detection rate, or test success percentage.Table 2 is the analysis results from this study.Theoretically, the performance of SVM basedclassifier should be better than HMM based classifiersince the former one is based on the principle ofstructural risk minimization (Justino et aI., 2005). Thecomparison shows that SVM has better performancethan HMM in the testing dataset Fl (92.31% withSVM based classifier v.s. 88.46% with HMM basedclassifier) although the difference is not very distinct(since they have same detection rate in Nl dataset,85.71%). In addition, when we use Dataset F2 to testthe classifiers, the drop of performance of HMMbased classifier (from 88.46% to 76.67%) is greaterthan that of SVM (from 92.31 % to 83.33%). This isan interesting phenomenon and it demonstrates thatSVM has better generalization than HMM becausewith the aggravation of fault, the previously trainedclassifier may not be accurate enough. From thisaspect, another problem is proposed for the adaptivetraining of classifier. Furthermore, the computation(for a PC with Pentium IV processor of 1.7GHz and512Mb RAM) for training the classifiers are alsoshown, and SVM has better efficiency. However, itshould be mentioned that the difference incomputation time should not be very important if thetraining is done off-line.

5. Conclusions and future work

The paper presents a brief comparison of HMMand SVM in machinery condition monitoring. This isan investigation of pattern recognition techniques forthe design of an efficient and robust classificationsystem. Until now, based on the analysis results wecan see that SVM has better recognition performanceand generalization. In the future research, we willinvestigate the design of adaptive classifier because itis a realistic problem in machinery condition monitoring due to the deterioration of machine condition.

Acknowledgments

This research was partially supported by theNational Natural Science FOlllldation of China lllldercontract number 60672165. We are most grateful tothe Applied Research Laboratory at the Pe=sylvaniaState University and the Department of the Navy,Office of the Chief of Naval Research (ONR) forproviding the vibration data used to develop this work.We also appreciate very much the valuable andconstructive comments from the anonymous refereeswho greatly helped to improve this work.

References

Burges, C. 1. c., 1998, "A tutorial on SupportVector Machines for Pattern Recognition," DataMining and Knowledge Discovery, Vol. 2, pp. l2l~

167.Byington, C. S., Kozlowski, 1. D., 2000, "Transi

tional Data for Estimation of Gearbox RemainingUseful Life," Mechanical Diagnostic Test Bed Data,Applied Research Laboratory, The Pe=sylvaniaState University.

Ertllllc, H M., Loparo, K. A, Ocak, H, 2001,"Tool Wear Condition Monitoring in DrillingOperations Using Hidden Markov Models(HMM),"International Journal of Machine Tools & Manufacture, Vol. 41, pp. 1363~1384.

Huo, Q., Chan, c., Lee, C. H, 1995, "BaysianAdaptive Learning of the Parameters of HiddenMarkov Model for Speech Recognition," IEEETransaction on Speech and Audio Processing, Vol. 3,No.5, pp. 334~345.

Justino, E. 1. R., Bortolozzi, F, Sabourin, R., 2005,"A Comparison of SVM and HMM Classifiers in theOff-Line Signature Verification," Pattern RecognitionLetters, Vol.26, pp. 1377~1385.


Loutriis, S., Trochidis, A, 2004, "Classification ofGear Faults Using Hoelder Exponents," MechanicalSystems and Signal Processing, Vol. 18, pp. 1009~

1030.Mallat, S., Hwang, W L., 1992, "Singularity

Detection and Processing with Wavelets," IEEETransactions on Information Theory, Vol. 38, No.2,pp.6l7·...-643.

Miao, Q., Makis, V, 2007, "Condition Monitoringand Classification of Rotating Machinery UsingWavelets and Hidden Markov Models," MechanicalSystems and Signal Processing, Vol. 21, No.2, pp.840~855.

Miao, Q., 2005, "Application of Wavelets andHidden Markov Model in Condition-Based Maintenance," Ph.D Thesis, University of Toronto.

Ocak, H, Loparo, K. A, 2001, "A New BearingFault Detection and Diagnosis Scheme Based onHidden Markov Modeling of Vibration Signals,"IEEE ICASSP 2001, Vol. 5,pp. 3l4l~3l44.

Rakotomamonjy, A, Canu, S., 2005, http://asi.insarouen.fr/~arakotom/toolbox/index.html.

Robertson, A N., Farrar, C. R, Sohn, H, 2003,"Singularity Detection for Structural HealthMonitoring Using Holder Exponents," MechanicalSystems and Signal Processing, Vol. 17, No.6, pp.1163~1l84.

Samanta, B., 2004, "Gear Fault Detection Using

Artificial Neural Networks and Support Vector Machines with Genetic Algorithms," Mechanical Systemsand Signal Processing, Vol. 18, No.3, pp. 625~44.

Shao, Y, Chang, C. -H, 2005, "Wavelet Transformto Hybrid Support Vector Machine and HiddenMarkov Model for Speech Recognition," IEEEISCAS 2005, pp. 3833~3836.

Tahk, K. M., Shin, K. H, 2002, "A Study on theFault Diagnosis of Roller-Shape Using FrequencyAnalysis of Tension Signals and Artificial NeuralNetworks Based Approach in a Web Transport System," KSME International Journal, Vol. 16, No. 12,pp. l604~16l2.

Vapnik, V N., 1999, "An Overview of StatisticalLearning Theory," IEEE Transaction on NeuralNetworks, Vol. 10, No.5, pp. 988~999.

Xu, Y, Ge, M., 2004, "Hidden Markov ModelBase Process Monitoring System," Journal ofIntelligent Manufacturing, Vol. 15, pp. 337~350.

Yang, B. S., Han, T., Hwang, W W, 2005, "FaultDiagnosis of Rotating Machinery Based on Multiclass Support Vector Machines," Journal ofMechanical Science and Technology, Vol. 19, No.3, pp.846~859.

Yuan, S. F, Chu, F L., 2006, "Support VectorMachines-Based Fault Diagnosis for Turbo-PumpRotor," Mechanical Systems and Signal Processing,Vol. 20, No.4, pp. 939~952.

a comparison study of support vector machines …relialab.org/upload/files/06_03_comparison svm...

Documents