

Neurocomputing 72 (2008) 1–2

Editorial

Machine Learning for Signal Processing

This special issue on "Machine Learning for Signal Processing" features a selection of extended versions of papers that were originally presented at the 2006 IEEE International Workshop on Machine Learning for Signal Processing (MLSP 2006) in Maynooth, Ireland (September 6–8); the workshop was formerly called the IEEE International Workshop on Neural Networks for Signal Processing (NNSP). The authors were invited to contribute to this special issue on the basis of originality, technical quality, and relevance. There is also one contribution from the authors who won the 2006 MLSP data competition. The invited papers were subjected to a rigorous and anonymous peer-review process. The guest editors are convinced that this special issue provides the reader with interesting examples of how machine learning can tackle today's challenging signal processing problems.

The papers can be roughly grouped into the following categories: clustering and classification, Bayesian methods and generative modeling, signal separation, and applications.

1. Clustering and classification: Renjifo and co-workers address the computational burden that arises when classifiers, such as support vector machines (SVMs), are trained on large data sets. As a solution, the authors propose a new algorithm, called Incremental Asymmetric Proximal SVM, that performs a greedy search across the data to select the basis vectors of the classifier and then tunes the parameters automatically. Nelson and co-workers introduce a signal-theoretic method that limits the required training and validation of the SVM classifier to a finite kernel hyper-parameter search using the sinc kernel. The method is adapted to the max sequence kernel so that positive definiteness, and thus convergence, can be guaranteed. Jenssen and Eltoft introduce a new input-space analysis of the properties of sum-of-squared-error K-means clustering performed with a Mercer kernel. Their derivation extends the theory of traditional K-means from properties of mean vectors to information-theoretic properties of Parzen-window-based probability density estimation; a schematic sketch of this kernel K-means view follows below.
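The following is a minimal sketch, not the authors' code: kernel K-means with a Gaussian (Mercer) kernel, where the kernel width sigma, the function names, and the defaults are our assumptions. The commented term is the one that, for a Gaussian kernel, behaves as a Parzen-window density estimate of each cluster, which is the input-space view the paper develops.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    """Gram matrix of the Gaussian (RBF) kernel between rows of X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def kernel_kmeans(X, k=2, sigma=1.0, n_iter=20, seed=0):
    """Sum-of-squared-error K-means in the feature space induced by the kernel."""
    n = len(X)
    K = gaussian_kernel(X, X, sigma)
    labels = np.random.default_rng(seed).integers(k, size=n)
    for _ in range(n_iter):
        dist = np.full((n, k), np.inf)
        for c in range(k):
            mask = labels == c
            nc = mask.sum()
            if nc == 0:
                continue
            # ||phi(x) - m_c||^2 expanded via the kernel trick:
            #   K(x, x) - (2/nc) * sum_i K(x, x_i) + (1/nc^2) * sum_ij K(x_i, x_j).
            # For a Gaussian kernel, the middle sum is (up to a constant factor)
            # a Parzen-window estimate of cluster c's density at x.
            dist[:, c] = (K.diagonal()
                          - 2.0 * K[:, mask].sum(axis=1) / nc
                          + K[np.ix_(mask, mask)].sum() / nc ** 2)
        labels = dist.argmin(axis=1)
    return labels
```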

2. Bayesian methods and generative modeling: Harva and Raychaudhury introduce a Bayesian method for estimating the time delays between irregularly sampled signals. The posterior distribution of the delay is obtained partly by an exact marginalization of a specific type of Kalman filter and partly by Markov chain Monte Carlo (MCMC) modeling. Klami and Kaski study data fusion under the assumption that the source-specific variation in the data is irrelevant and that only the shared variation is relevant. To tackle issues such as overfitting and model-order selection, which come with addressing shared variation by maximizing a dependency measure, as in canonical correlation analysis (CCA; a minimal sketch follows below), the authors turn to probabilistic generative modeling, which in turn makes all the tools of Bayesian inference applicable.
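For readers less familiar with the dependency-maximization baseline, here is a minimal sketch of classical CCA via whitening and an SVD. It illustrates the starting point only, not Klami and Kaski's probabilistic model, and the regularization constant reg is an assumption added for numerical stability.

```python
import numpy as np

def cca_first_pair(X, Y, reg=1e-6):
    """Leading canonical directions and correlation for two views (rows = samples)."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = len(X)
    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n
    # Whiten each view with the inverse Cholesky factor, then take the SVD of
    # the whitened cross-covariance; its singular values are the canonical
    # correlations, i.e. the dependency measure being maximized.
    Wx = np.linalg.inv(np.linalg.cholesky(Cxx))
    Wy = np.linalg.inv(np.linalg.cholesky(Cyy))
    U, s, Vt = np.linalg.svd(Wx @ Cxy @ Wy.T)
    a = Wx.T @ U[:, 0]   # direction in view X
    b = Wy.T @ Vt[0]     # direction in view Y
    return a, b, s[0]    # s[0] is the first canonical correlation
```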

3. Signal separation: Lee and co-workers address the blind source separation (BSS) problem by exploiting the prior knowledge that the mixed sources are bounded. A customized contrast function is defined that relies on a simple endpoint estimator (a toy variant is sketched below). Almeida and Almeida address the nonlinear separation of mixtures of images that occur when a page is scanned or photographed and the background shows through. The authors developed significant improvements to nonlinear denoising source separation (DSS) so that one-shot processing, rather than an iterative one, becomes possible. Radfar and co-workers perform speaker-independent single-channel speech separation. They fit a generative model to the envelopes of the log spectra coming from different speakers, consider an expression relating this model to the density of the mixture and the signal-to-signal ratio (SSR), and, finally, estimate the model parameters, along with the SSR, that maximize the log-likelihood of the mixture density. Vincent and Plumbley investigate a generic inference method based on an approximate factorization of the joint product of independent distributions of small subsets of parameters. They evaluate this method on the task of multiple pitch estimation using different levels of factorization.
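To make the endpoint idea concrete, here is a toy sketch of our own, not Lee and co-workers' estimator: it separates a whitened two-channel mixture of bounded sources by a brute-force search for the rotation that minimizes the sum of output ranges, a contrast computed from simple min/max (endpoint) estimates.

```python
import numpy as np

def separate_bounded(X, n_angles=180):
    """X: 2 x T array holding two linear mixtures of bounded sources."""
    X = X - X.mean(axis=1, keepdims=True)
    # Whiten the mixtures so only a rotation remains to be found.
    d, E = np.linalg.eigh(np.cov(X))
    Z = (E / np.sqrt(d)).T @ X
    best_cost, best_Y = np.inf, None
    for theta in np.linspace(0.0, np.pi / 2, n_angles):
        c, s = np.cos(theta), np.sin(theta)
        Y = np.array([[c, s], [-s, c]]) @ Z
        # Endpoint-based contrast: the sum of the output ranges (max - min),
        # which is minimized when the bounded sources are unmixed.
        cost = (Y.max(axis=1) - Y.min(axis=1)).sum()
        if cost < best_cost:
            best_cost, best_Y = cost, Y
    return best_Y
```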

4. Applications: O'Grady and Pearlmutter develop an extension of convolutive non-negative matrix factorization (NMF) that includes a sparseness constraint, thanks to which auditory data can be parsimoniously represented (a simplified sketch follows below). In combination with a spectral magnitude transformation of speech signals, the developed method detects auditory objects that resemble speech phones. Jeong and co-workers apply the nonlinear extension of the minimum average correlation energy (MACE) filter, which relies on correntropy, to face recognition. The computational cost of the correntropy MACE (CMACE) filter is a critical issue in applications, which the authors address with a dimensionality reduction based on random projections. Redmond and co-workers describe a simple denoising technique, based on spatial averaging, to reduce the number of trials needed to raise the signal-to-noise level. They apply their technique to magnetoencephalography (MEG) data. Miller and co-workers consider ensemble classification when there is no common labeled data for designing the function that aggregates classifier decisions. Classifier combinations such as voting methods may perform poorly in this case. The authors propose several transductive methods, of which a constraint-based one seems to perform best. The new method is applied to biometric authentication.
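As a rough illustration of how a sparseness constraint enters a matrix factorization, the sketch below runs plain (non-convolutive) NMF with an L1 penalty on the activations, using standard multiplicative updates. It is a simplified stand-in for, not a reproduction of, O'Grady and Pearlmutter's convolutive algorithm, and the penalty weight lam is an assumption.

```python
import numpy as np

def sparse_nmf(V, r=10, lam=0.1, n_iter=200, seed=0):
    """Factor a non-negative F x T matrix V (e.g. a magnitude spectrogram)
    as V ~ W @ H while driving the activations H toward sparsity."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, r))
    H = rng.random((r, T))
    eps = 1e-9
    for _ in range(n_iter):
        # Multiplicative updates for ||V - WH||^2 + lam * sum(H):
        # the lam term in the denominator shrinks H, enforcing sparseness.
        H *= (W.T @ V) / (W.T @ W @ H + lam + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        # Normalize the columns of W so the L1 penalty cannot be evaded
        # by rescaling W up and H down.
        W /= W.sum(axis=0, keepdims=True) + eps
    return W, H
```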

The editors would like to thank all the authors for their excellent papers, and the anonymous reviewers for their comments and useful suggestions. Special thanks go to Dr. Tom Heskes for inviting us to edit this special issue, and to Vera Kamphuis from the Neurocomputing Editorial Office for her help in putting it all together.

Marc M. Van Hulle
K.U.Leuven, Belgium

E-mail address: [email protected]

Jan Larsen
Technical University of Denmark, Denmark