


Journal of VLSI Signal Processing 26, 15–23, 2000. © 2000 Kluwer Academic Publishers. Manufactured in The Netherlands.

Blind Source Separation for Non-Stationary Mixing

RICHARD EVERSON
Department of Computer Science, University of Exeter, Exeter, UK

STEPHEN ROBERTS
Department of Engineering Science, University of Oxford, Oxford, UK

Received March 25, 1999; Revised November 3, 1999

Abstract. Blind source separation attempts to recover independent sources which have been linearly mixed to produce observations. We consider blind source separation with non-stationary mixing, but stationary sources. The linear mixing of the independent sources is modelled as evolving according to a Markov process, and a method for tracking the mixing and simultaneously inferring the sources is presented. Observational noise is included in the model. The technique may be used for online filtering or retrospective smoothing. The tracking of mixtures of temporally correlated sources is examined, and sampling from within a sliding window is shown to be effective for destroying temporal correlations. The method is illustrated with numerical examples.

1. Introduction

Over the last decade in particular there has been much interest in methods of blind source separation (BSS) and deconvolution (see [1] for a review). One may think of blind source separation as the problem of identifying speakers (sources) in a room given only recordings from a number of microphones, each of which records a linear mixture of the sources, whose statistical characteristics are unknown. The casting of this problem (often referred to as Independent Component Analysis, or ICA) in a neuro-mimetic framework [2] has done much to simplify and popularise the technique. More recently still, the ICA solution has been shown to be the maximum-likelihood point of a latent-variable model [3–5].

Here we consider the blind source separation problem when the mixing of the sources is non-stationary. Pursuing the speakers-in-a-room analogy, we address the problem of identifying the speakers when they (or, equivalently, the microphones) are moving. The problem is cast in terms of a hidden state (the mixing proportions of the sources) which we track using dynamic methods similar to the Kalman filter.

We first briefly review classical ICA and describe a source model which permits the separation of light-tailed (platykurtic) sources as well as the heavy-tailed (leptokurtic) sources which the standard ICA model implicitly assumes. ICA with non-stationary mixing is described in terms of a hidden state model, and methods for estimating the sources and the mixing are described. Finally we address the non-stationary mixing problem when the sources are independent, but possess temporal correlations.

2. Stationary ICA

Classical ICA assumes that there are $M$ independent sources whose probability density functions are $p_m(s^m)$. Observations, $x_t \in \mathbb{R}^N$, are produced by the instantaneous linear mixing of the sources by $A$:

$$ x_t = A s_t \tag{1} $$


The mixing matrix, $A$, must have at least as many rows as columns ($N \ge M$), so that the dimension of each observation is at least as great as the number of sources. The aim of ICA methods is to recover the latent sources $s_t$ by finding $W$, the (pseudo-)inverse of $A$:

$$ \hat{s}_t = W x_t = W A s_t \tag{2} $$

The assumption that the sources are independent means that the joint probability density function (pdf) of the sources factorises into the product of marginal densities:

$$ p(s_t) = \prod_{m=1}^{M} p\left(s_t^m\right) \tag{3} $$

Using this factorisation, the (pseudo-)likelihood of the observation $x_t$ is [3–5]:

$$ \log \ell = -\log|\det A| + \sum_{m=1}^{M} \log p_m\left(\hat{s}_t^m\right) \tag{4} $$

The normalised log likelihood of a set of observations $t = 1, \ldots, T$ is therefore

$$ \log L = -\log|\det A| + \frac{1}{T} \sum_{t=1}^{T} \sum_{m=1}^{M} \log p_m\left(\hat{s}_t^m\right) \tag{5} $$

The optimum $A$ may then be found by maximisation of $\log L$ with respect to $A$, assuming some specific form for $p(s_t^m)$. Successive gradient ascent on $\log \ell$ leads to the Bell & Sejnowski stochastic learning rule for ICA [2], while batch learning is achieved by maximising $\log L$. Learning rates may be considerably enhanced by modifying the learning rule to make it covariant [3, 6].

Since the likelihood is unchanged if $A$ is post-multiplied by a diagonal matrix $D$ or a permutation matrix $P$, the original scale of the sources cannot be recovered. The separating matrix $W$ is therefore only the inverse of $A$ up to a diagonal scaling and permutation, that is:

$$ W A = P D \tag{6} $$

In order to maximise the likelihood some assumptions about the form of the source pdfs $p(s_t^m)$ must be made, even though they are a priori unknown. A common choice is $p(s_t^m) \propto 1/\cosh(s_t^m)$, which leads to a tanh nonlinearity in the learning rule. Although the source model is apparently fixed, scaling of the mixing matrix tunes the model to particular sources [7], and with a tanh nonlinearity leptokurtic (heavy-tailed) sources can be separated, although not platykurtic ones. Cardoso [8] has elucidated the conditions under which the true mixing matrix is a stable fixed point of the learning rule.
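Concretely, the covariant (natural-gradient) update can be written in a few lines. The sketch below is our own illustration, not the authors' implementation; the function name, learning rate and iteration count are arbitrary choices. It ascends Eq. (5) for the $p(s) \propto 1/\cosh(s)$ model, whose score function is $-\tanh(\cdot)$:

```python
import numpy as np

def ica_natural_gradient(X, n_iter=200, lr=0.01, seed=0):
    """Natural-gradient infomax ICA for observations X of shape (N, T)."""
    rng = np.random.default_rng(seed)
    N, T = X.shape
    W = np.eye(N) + 0.01 * rng.standard_normal((N, N))  # near-identity start
    for _ in range(n_iter):
        U = W @ X                    # current source estimates
        phi = -np.tanh(U)            # score of p(s) proportional to 1/cosh(s)
        W += lr * (np.eye(N) + (phi @ U.T) / T) @ W  # covariant update
    return W
```

Applied to observations $x_t = A s_t$, the recovered $W$ satisfies $WA \approx PD$ (Eq. (6)) rather than $WA = I$.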

2.1. Generalised Exponentials

By adopting a more flexible model for the source densities one might be able to separate a wider range of sources. Attias [9] has used mixtures of Gaussians to model the sources, which permits multi-modal sources, and Lee et al. [10] switch between sub- and super-Gaussian source models.

In order to be able to separate light-tailed sources we have used the generalised exponential density:

$$ p(s^m \mid \theta_m) = z \exp\left\{-\left|\frac{s^m - \mu_m}{w_m}\right|^{r_m}\right\} \tag{7} $$

where the normalising constant is

$$ z = \frac{r_m}{2 w_m \Gamma(1/r_m)} \tag{8} $$

and the density depends upon the parameters $\theta_m = \{\mu_m, w_m, r_m\}$. The location of the distribution is set by $\mu_m$, its width by $w_m$, and the weight of its tails is determined by $r_m$. Clearly $p$ is Gaussian when $r_m = 2$, Laplacian when $r_m = 1$, and the uniform distribution is approximated in the limit $r_m \to \infty$.

Rather than learn $\{\mu_m, w_m, r_m\}$ along with the elements of the separating matrix $W$, which magnifies the size of the search space, they may be calculated from the sequences $\{s_t^m\}$ ($t = 1, \ldots, T$) at any, and perhaps every, stage of learning. The location parameter is well estimated by the sample mean, and the maximum-likelihood estimates for $r_m$ and $w_m$ may be obtained by solving a one-dimensional equation [7].

We have used the generalised exponentials in a quasi-Newton (BFGS [11]) ICA algorithm. At each stage of the optimisation the parameters $\{\mu_m, w_m, r_m\}$ describing the distribution of the $m$th separated variable were found, permitting the calculation of $\log L$ and its gradient. This algorithm is able to separate a mixture of a Laplacian source, a Gaussian source and a uniformly distributed source. Algorithms using a static tanh nonlinearity are unable to separate this mixture. Further details are given in [7].
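As a concrete illustration of this fit, the sketch below (hypothetical code; `fit_gen_exp` and its internals are our names, and the use of SciPy is an assumption) profiles out $w_m$ analytically, so that only a one-dimensional search over $r_m$ remains, as the text describes:

```python
import numpy as np
from scipy.special import gammaln
from scipy.optimize import minimize_scalar

def fit_gen_exp(s):
    """ML fit of the generalised exponential (Eqs. 7-8); returns (mu, w, r)."""
    mu = s.mean()                       # location from the sample mean
    d = np.abs(s - mu) + 1e-12          # avoid log/power of exactly zero
    T = len(s)

    def neg_profile_loglik(log_r):
        r = np.exp(log_r)
        w = (r * np.mean(d ** r)) ** (1.0 / r)   # ML width given r
        # at the ML width, sum |(s - mu)/w|^r = T/r, so the profile
        # log-likelihood is T * (log z - 1/r) with z from Eq. (8)
        return -T * (np.log(r) - np.log(2 * w) - gammaln(1.0 / r) - 1.0 / r)

    res = minimize_scalar(neg_profile_loglik,
                          bounds=(np.log(0.3), np.log(20)), method="bounded")
    r = np.exp(res.x)
    w = (r * np.mean(d ** r)) ** (1.0 / r)
    return mu, w, r
```

Searching over $\log r$ within fixed bounds keeps the one-dimensional problem well behaved for both heavy- and light-tailed samples.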


Figure 1. Graphical model describing non-stationary BSS.

3. Non-stationary Blind Source Separation

Figure 1 shows the graphical model describing the conditional independence relations of the non-stationary BSS model. In common with static blind source separation, we adopt a generative model in which $M$ independent sources are linearly mixed at each instant. Unlike static BSS, however, the mixing matrix $A_t$ is allowed to vary with time. We also assume that the observation $x_t$ is contaminated by normally distributed noise $w_t \sim \mathcal{N}(0, R)$. Thus

$$ x_t = A_t s_t + w_t \tag{9} $$

The dynamics of $A_t$ are modelled by a first-order Markov process, in which the elements of $A_t$ diffuse from one observation time to the next. If we let $a_t = \mathrm{vec}(A_t)$ be the $NM$-dimensional vector obtained by stacking the columns of $A_t$, then $a_t$ evolves according to

$$ a_{t+1} = F a_t + v_t \tag{10} $$

where $v_t$ is zero-mean Gaussian noise with covariance $Q$, and $F$ is the state transition matrix; in the absence of a priori information we take $F$ to be the identity matrix. The state Eq. (10) and the statistics of $v_t$ together define the density $p(a_{t+1} \mid a_t)$.

A full specification of the state must include the parameter set $\theta = \{\theta_m\}$, $m = 1, \ldots, M$, which describes the independent source densities:

$$ p(s \mid \theta) = \prod_{m=1}^{M} p(s^m \mid \theta_m) \tag{11} $$

We model the source densities with generalised exponentials, as described in §2.1. Since the sources themselves are considered to be stationary, the parameters $\theta$ are taken to be static, but they must be learned as data are observed.

The problem is now to track $A_t$ and to learn $\theta$ as new observations $x_t$ become available. If $X_t$ denotes the collection of observations $\{x_1, \ldots, x_t\}$, then the goal of filtering methods is to deduce the probability density function of the state, $p(a_t \mid X_t)$. This pdf may be found recursively in two stages: prediction and correction. If $p(a_{t-1} \mid X_{t-1})$ is known, the state Eq. (10) and the Markov property that $a_t$ depends only on $a_{t-1}$ permit prediction of the state at time $t$:

$$ p(a_t \mid X_{t-1}) = \int p(a_t \mid a_{t-1}) \, p(a_{t-1} \mid X_{t-1}) \, da_{t-1} \tag{12} $$

The predictive density $p(a_t \mid X_{t-1})$ may be regarded as an estimate of $a_t$ prior to the observation of $x_t$. As the datum $x_t$ is observed, the prediction may be corrected via Bayes' rule:

$$ p(a_t \mid X_t) = Z^{-1} \, p(x_t \mid a_t) \, p(a_t \mid X_{t-1}) \tag{13} $$

where the likelihood of the observation given the mixing matrix, $p(x_t \mid a_t)$, is defined by the observation Eq. (9). The normalisation constant $Z$ is known as the innovations probability:

$$ Z = p(x_t \mid X_{t-1}) = \int p(x_t \mid a_t) \, p(a_t \mid X_{t-1}) \, da_t \tag{14} $$

The prediction (12) and correction/update (13) pair of equations may be used to step through the data online, alternately predicting the subsequent state and then correcting the estimate when a new datum arrives.

3.1. Prediction

Since the state equation is linear and Gaussian, the state transition density is

$$ p(a_t \mid a_{t-1}) = G(a_t - F a_{t-1},\, Q) \tag{15} $$

where $G(\cdot, \Sigma)$ denotes the Gaussian density function with mean zero and covariance matrix $\Sigma$.


We represent the prior density $p(a_{t-1} \mid X_{t-1})$ as a Gaussian:

$$ p(a_{t-1} \mid X_{t-1}) = G(a_{t-1} - \mu_{t-1},\, \Sigma_{t-1}) \tag{16} $$

Prediction is then straightforward:

$$ p(a_t \mid X_{t-1}) = G(a_t - F\mu_{t-1},\, Q + F\Sigma_{t-1}F^T) \tag{17} $$
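In code the prediction step is a one-liner. A minimal sketch (illustrative names, not the authors' code) propagating the Gaussian state estimate of $a_{t-1} = \mathrm{vec}(A_{t-1})$ through Eqs. (10) and (17):

```python
import numpy as np

def predict(mu, Sigma, F, Q):
    """Return the mean and covariance of p(a_t | X_{t-1}), Eq. (17)."""
    return F @ mu, Q + F @ Sigma @ F.T
```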

3.2. Correction

On the observation of a new datum $x_t$ the prediction (17) can be corrected. Since the observational noise is assumed to be Gaussian, its density is

$$ p(w_t) = G(w_t, R) \tag{18} $$

The pdf of observations $p(x_t \mid A_t)$ is given by

$$ p(x_t \mid A_t) = \int p(x_t \mid A_t, \theta, s_t) \, p(s_t \mid \theta) \, ds_t \tag{19} $$

and since the sources are assumed stationary,

$$ p(x_t \mid A_t) = \int p(x_t \mid A_t, s) \, p(s \mid \theta) \, ds = \int G(x_t - A_t s,\, R) \prod_{m=1}^{M} p_m(s^m) \, ds \tag{20} $$

We emphasise that it is in Eq. (20) that the independence of the sources is modelled, by writing the joint source density in factored form.

Laplace's approximation can be used to approximate the convolution (20) for any fixed $A_t$ when the observational noise is small; otherwise the integral can be evaluated by Monte Carlo integration. The corrected pdf $p(a_t \mid X_t)$ of Eq. (13) is then found by drawing samples of $A_t$ from the Gaussian of Eq. (17) and evaluating Eq. (20) for each sample. The mean and covariance of the corrected $p(a_t \mid X_t)$ are found from the samples, and the density is approximated once again by a Gaussian before the next prediction is made.
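This sampling-based correction is straightforward to sketch. The following is a minimal, hypothetical implementation (names such as `mc_likelihood` and `sample_sources` are ours, not the paper's): the likelihood of Eq. (20) is estimated by Monte Carlo over draws from the source prior, the predicted Gaussian of Eq. (17) is sampled, and the weighted samples give the corrected mean and covariance of Eq. (13):

```python
import numpy as np

def mc_likelihood(x, A, R, sample_sources, n_src=500, rng=None):
    """Monte Carlo estimate of p(x | A), Eq. (20)."""
    rng = rng or np.random.default_rng()
    S = sample_sources(n_src, rng)          # (n_src, M) draws from p(s | theta)
    resid = x[None, :] - S @ A.T            # x - A s for each draw
    Rinv = np.linalg.inv(R)
    quad = np.einsum('ij,jk,ik->i', resid, Rinv, resid)
    norm = np.sqrt(np.linalg.det(2 * np.pi * R))
    return np.exp(-0.5 * quad).mean() / norm

def correct(x, mu_pred, Sig_pred, R, shape, sample_sources,
            n_state=200, rng=None):
    """One correction step, Eq. (13): Gaussian re-approximation of p(a_t | X_t)."""
    rng = rng or np.random.default_rng()
    a = rng.multivariate_normal(mu_pred, Sig_pred, size=n_state)
    w = np.array([mc_likelihood(x, ai.reshape(shape, order='F'), R,
                                sample_sources, rng=rng) for ai in a])
    w /= w.sum()                            # normalised importance weights
    mu = w @ a
    d = a - mu
    return mu, (w[:, None] * d).T @ d       # weighted mean and covariance
```

For unit-variance Laplacian sources, for example, one might pass `sample_sources = lambda n, rng: rng.laplace(scale=1/np.sqrt(2), size=(n, M))`.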

Rather than representing the state densities as Gaussians at each stage, more flexibility may be obtained with particle filter techniques [12, 13]. In these methods the state density is represented by a collection of "particles," each with a probability mass. Each particle's probability is modified using the state and observation equations, after which a new independent sample is obtained using sampling importance resampling before proceeding to the next prediction/observation step. Though computationally more expensive than the Gaussian representation, these methods permit arbitrary observational noise distributions to be modelled, along with more complicated, possibly multi-modal, state densities. The application of particle filter methods to non-stationary ICA is described elsewhere [14].

3.3. Source Recovery

Rather than making strictly Bayesian estimates of the model parameters $\theta_m = \{r_m, w_m, \mu_m\}$, the maximum a posteriori (MAP) estimate of $A_t$ is used to estimate $s_t$, after which maximum-likelihood estimates of the parameters are found from the sequences $\{s_\tau^m\}_{\tau=1}^{t}$. Finding maximum-likelihood parameters is readily and robustly accomplished [7]. Each $s_t$ is found by maximising $\log p(s_t \mid x_t, A_t)$, which is equivalent to minimising

$$ (x_t - A_t^* s_t)^T R^{-1} (x_t - A_t^* s_t) + \sum_{m=1}^{M} \left|\frac{s_t^m}{w_m}\right|^{r_m} \tag{21} $$

where $A_t^*$ is the MAP estimate of $A_t$. The minimisation can be carried out with a pseudo-Newton method, for example. If the noise variance is small, $s_t \approx A_t^\dagger x_t$, where $A_t^\dagger = (A_t^T A_t)^{-1} A_t^T$ is the pseudo-inverse of $A_t$.
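The following sketch (illustrative, not the authors' code) performs this recovery, starting from the pseudo-inverse solution and refining it by minimising Eq. (21). Note that the penalty is non-smooth at zero for $r_m \le 1$, so a quasi-Newton method is only a pragmatic choice here:

```python
import numpy as np
from scipy.optimize import minimize

def recover_sources(x, A, R, w, r):
    """MAP estimate of s_t given the MAP mixing matrix A, Eq. (21)."""
    Rinv = np.linalg.inv(R)
    s0 = np.linalg.pinv(A) @ x          # small-noise initialisation

    def objective(s):
        e = x - A @ s
        return e @ Rinv @ e + np.sum(np.abs(s / w) ** r)

    return minimize(objective, s0, method="BFGS").x
```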

4. Illustration

Here we illustrate the method with two examples. In the first example a Laplacian source ($p(s) \propto e^{-|s|}$) and a source with uniform density are mixed with a mixing matrix whose components vary sinusoidally with time:

$$ A_t = \begin{bmatrix} \cos\omega t & \sin\omega t \\ -\sin\omega t & \cos\omega t \end{bmatrix} \tag{22} $$

Note, however, that the oscillation frequency doubles during the second half of the simulation, making it more difficult to track. Figure 2 shows the true mixing matrix and the tracking of it by non-stationary ICA.

Figure 2. Tracking a mixture of a Laplacian and a uniform source.

Like BSS for stationary mixing, this method cannot distinguish between a column of $A_t$ and a scaling of the column. In Fig. 2 the algorithm has "latched on" to the negative of the first column of $A_t$, which is shown dashed. We resolve the scaling ambiguity between the variance of the sources and the scale of the columns of $A_t$ by insisting that the variance of each source is unity; i.e., we ignore the estimated value of $w_m$ (Eq. (7)), instead setting $w_m = 1$ for all $m$ and allowing all the scale information to reside in the columns of $A_t$.

To provide an initial estimate of the mixing matrix and source parameters, static ICA was run on the first 100 samples. At times $t > 100$ the generalised exponential parameters were re-estimated every 10 observations. Figure 3 shows that the estimated source parameters converge to close to their correct values of 1 for the Laplacian source and "large" for the uniform source.

Figure 3. Online estimates of the generalised exponential parameters $r_m$ during the tracking shown in Fig. 2.

Estimates of the tracking error are provided by the covariance, $\Sigma_t$, of the state density (Eq. (16)). In this case the true $A_t$ lies within one standard deviation of the estimated $A_t$ almost all the time. We remark that it appears to be more difficult to track the columns associated with light-tailed sources than heavy-tailed sources. We note, furthermore, that the Gaussian case appears to be the most difficult. In Fig. 2, $A_{11}$ and $A_{21}$ mix the Laplacian source, and the uniform source is mixed by $A_{12}$ and $A_{22}$, which are tracked less well, especially during the second half of the simulation. We suspect that the difficulty in tracking columns associated with nearly Gaussian sources is due to the ambiguity between a Gaussian source and the observational noise, which is assumed to be Gaussian.

It is easy to envisage situations in which the mixing matrix might briefly become singular. For example, if the microphones are positioned so that each receives the same proportions of each speaker, the columns of $A_t$ are linearly dependent and $A_t$ is singular. In this situation $A_t$ cannot be inverted and source estimates (Eq. (21)) are very poor. To cope with this we monitor the condition number of $A_t$; when it is large, implying that $A_t$ is close to singular, the source estimates are discarded for the purposes of inferring the source model parameters, $\{r_m, w_m, \mu_m\}$.

In Fig. 4 we show non-stationary BSS applied to Laplacian and uniform sources mixed with the matrices

$$ A_t = \begin{bmatrix} \cos 2\omega t & \sin\omega t \\ -\sin 2\omega t & \cos\omega t \end{bmatrix} \tag{23} $$

Figure 4. Tracking through a singularity. The mixing matrix is singular at $t = 1000$.


where $\omega$ is chosen so that $A_{1000}$ is singular. Clearly the mixing matrix is tracked through the singularity, although not so closely as when $A_t$ is well conditioned. Figure 5 shows the condition number of the MAP $A_t$. The normalising constant $Z = p(x_t \mid X_{t-1})$ (Eq. (14)) is known as the innovations probability and measures the degree to which a new datum fits the dynamic model learned by the tracker. Discrete changes of state are signalled by low innovations probability. Figure 5 also shows the innovations probability for the mixing shown in Fig. 4: the presence of the singularity is clearly reflected.

Note also that the simulation shown in Fig. 4 was deliberately initialised fairly close to, but not exactly at, the true $A_1$. The "latching on" of the tracker to the correct mixing matrix in the first 100 observations is evident in the figure.

Figure 5. Top: Condition number of the MAP estimate of $A_t$. At $t = 1000$ the true mixing matrix is singular. Matrices with condition numbers greater than 10 were not used for estimating the source parameters. Bottom: Innovations probability $p(x_t \mid X_{t-1})$.

5. Smoothing

The filtering methods presented estimate the mixing matrix as $p(A_t \mid X_t)$. They are therefore strictly causal and can be used for online tracking. If the data are analysed retrospectively, future observations ($x_\tau$, $\tau > t$) may be used to refine the estimate of $A_t$. The Markov structure of the generative model permits the pdf $p(a_t \mid X_T)$ to be found from a forward pass through the data, followed by a backward sweep in which the influence of future observations on $a_t$ is evaluated. See, for example, [15] for a detailed exposition of forward-backward recursions.

In the forward pass the joint probability

$$ \alpha_t = p(a_t, x_1, \ldots, x_t) = \int \alpha_{t-1} \, p(a_t \mid a_{t-1}) \, p(x_t \mid a_t) \, da_{t-1} \tag{24} $$

is recursively evaluated. In the backward sweep the conditional probability

$$ \beta_t = p(x_{t+1}, \ldots, x_T \mid a_t) = \int \beta_{t+1} \, p(a_{t+1} \mid a_t) \, p(x_{t+1} \mid a_{t+1}) \, da_{t+1} \tag{25} $$

is found. Finally, the two are combined to produce a smoothed, non-causal estimate of the mixing matrix:

$$ p(a_t \mid x_1, \ldots, x_T) \propto \alpha_t \beta_t \tag{26} $$

If $\alpha_t$ and $\beta_t$ are each approximated by Gaussians, it is necessary to save only their means and covariance matrices.
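Since the product of two Gaussians is, up to normalisation, another Gaussian, forming the smoothed estimate of Eq. (26) from Gaussian $\alpha_t$ and $\beta_t$ reduces to summing precisions and precision-weighting the means. A minimal sketch with illustrative names:

```python
import numpy as np

def combine(mu_a, Sig_a, mu_b, Sig_b):
    """Gaussian proportional to N(mu_a, Sig_a) * N(mu_b, Sig_b), Eq. (26)."""
    Pa, Pb = np.linalg.inv(Sig_a), np.linalg.inv(Sig_b)  # precisions
    Sig = np.linalg.inv(Pa + Pb)
    mu = Sig @ (Pa @ mu_a + Pb @ mu_b)
    return mu, Sig
```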

Figure 6 illustrates tracking by both smoothing and causal filtering. As before, the elements of the mixing matrix vary sinusoidally with time, except for discontinuous jumps at $t = 600$ and $1200$. Both the filtering and forward-backward recursions track the mixing matrix; however, the smoothed estimate is less noisy and more accurate, particularly at the discontinuities. Note also that following the discontinuity at $t = 1200$ the negative of the first column of $A_t$ is tracked.

Figure 6. Top: Retrospective tracking with forward-backward recursions. Bottom: Online filtering of the same data. Dashed lines show the negative of the mixing matrix elements.

6. Temporal Correlations

The graphical model in Fig. 1 assumes that successive samples from each source are independent, so that the sources are stochastic. When temporal correlations in the sources are present, the model must be modified to include the conditional dependence of $s_t^m$ on $s_{t-1}^m$. In this case the hidden state comprises both $a_t$ and the states of the sources $s_t$, and predictions and corrections for the full state should be made. Since the sources are independent, predictions for each source and for $a_t$ may be made independently, and the system is a factorial hidden Markov model [15].

A number of source predictors have been implemented, including the Kalman filter, AR models and Gaussian mixture models. However, the fundamental indeterminacy of the source scales renders the combined tracker unstable. The instability arises because the change in observation from $x_t$ to $x_{t+1}$ cannot be unambiguously assigned to either a change in the mixing matrix or a change in the sources. Small errors in the prediction of the sources induce errors in the mixing matrix estimates, which in turn lead to errors in subsequent source predictions; these errors are then incorporated into the predictive model for the sources, and further (worse) errors in the prediction are made. This problem is not present in the stochastic case because the source model is much more tightly constrained.

Under the assumption that the sources evolve on a rapid timescale compared with the mixing matrix, the effect of temporal correlations in the sources may be removed by averaging over a sliding window. That is, the likelihood $p(x_t \mid A_t)$ used in the correction step (Eq. (13)) is replaced by

$$ \left\{ \prod_{\tau=-L}^{L} p(x_{t+\tau} \mid A_{t+\tau}) \right\}^{\frac{1}{2L+1}} \tag{27} $$



The length of the window, $2L + 1$, is chosen to be of the order of a typical timescale of the sources. Tracking using the averaged likelihood is computationally expensive because at each $t$ the $p(x_{t+\tau} \mid A_{t+\tau})$ must be evaluated for each $\tau$ in the sliding window. An alternative method of destroying the source temporal correlations is to replace the likelihood $p(x_t \mid A_t)$ with $p(x_{t+\tau} \mid A_{t+\tau})$, with $\tau$ chosen at random from within the sliding window ($-L \le \tau \le L$). This is no more expensive than using $p(x_t \mid A_t)$ and effectively destroys the source correlations.
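Both variants are straightforward to implement. A minimal sketch (illustrative names; we assume `likelihood(x, A)` evaluates Eq. (20) and that observations and predicted mixing matrices are indexable across the window):

```python
import numpy as np

def window_averaged_likelihood(t, L, x_seq, A_seq, likelihood):
    """Geometric mean of p(x_{t+tau} | A_{t+tau}) over the window, Eq. (27)."""
    logs = [np.log(likelihood(x_seq[t + tau], A_seq[t + tau]))
            for tau in range(-L, L + 1)]
    return float(np.exp(np.mean(logs)))

def window_sampled_likelihood(t, L, x_seq, A_seq, likelihood, rng=None):
    """Cheaper alternative: likelihood of one datum drawn from the window."""
    rng = rng or np.random.default_rng()
    tau = int(rng.integers(-L, L + 1))    # uniform over -L <= tau <= L
    return likelihood(x_seq[t + tau], A_seq[t + tau])
```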

Figure 7 illustrates the tracking of a mixing matrix with temporally correlated sources. The window length was $L = 50$. Tracking is not as accurate as in the stochastic case; however, the mixing matrix is followed and the sources are recovered well.

Figure 7. Tracking temporally correlated sources. Top: Elements of the mixing matrix during tracking. Dashed lines show the true mixing matrix elements. Bottom: Recovered sources and the true sources (dashed) for times 1000–1500.

7. Conclusion

We have presented a method for blind source separation when the mixing proportions are non-stationary. The method is strictly causal and can be used for online tracking (or "filtering"). If data are analysed retrospectively, forward-backward recursions may be used for smoothing rather than filtering. The mixing of temporally correlated sources may be tracked by averaging or sampling from within a sliding window.

In common with most tracking methods, the state noise covariance $Q$ and the observational noise covariance $R$ are parameters which must be set. Although we have not addressed the issue here, it is straightforward, though laborious, to obtain maximum-likelihood estimates for them using the EM method [15]. It would also be possible to estimate the state transition matrix $F$ in the same manner.

Although we have modelled the source densities here with generalised exponentials, which permit the separation of a wide range of sources, it is possible either to generalise or to restrict the source model. More complicated (possibly multi-modal) densities may be represented by a mixture of Gaussians. On the other hand, if all the sources are restricted to be Gaussian, the method becomes a tracking factor analyser. In the zero-noise limit the method performs non-stationary principal component analysis.

Acknowledgment

We gratefully acknowledge partial funding from British Aerospace plc.

References

1. T-W. Lee, M. Girolami, A.J. Bell, and T.J. Sejnowski, "A Unifying Information-theoretic Framework for Independent Component Analysis," International Journal on Mathematical and Computer Models, vol. 39, no. 11, 2000, pp. 1–21. Available from http://www.cnl.salk.edu/~tewon/Public/mcm.ps.gz.

2. A.J. Bell and T.J. Sejnowski, "An Information Maximization Approach to Blind Separation and Blind Deconvolution," Neural Computation, vol. 7, no. 6, 1995, pp. 1129–1159.

3. D.J.C. MacKay, "Maximum Likelihood and Covariant Algorithms for Independent Component Analysis," Technical report, University of Cambridge, December 1996. Available from http://wol.ra.phy.cam.ac.uk/mackay/.

4. J-F. Cardoso, "Infomax and Maximum Likelihood for Blind Separation," IEEE Sig. Proc. Letters, vol. 4, no. 4, 1997, pp. 112–114.

5. B. Pearlmutter and L. Parra, "A Context-Sensitive Generalization of ICA," in International Conference on Neural Information Processing, 1996.

6. S. Amari, A. Cichocki, and H. Yang, "A New Learning Algorithm for Blind Signal Separation," in Advances in Neural Information Processing Systems, vol. 8, D. Touretzky, M. Mozer, and M. Hasselmo (Eds.), Cambridge, MA: MIT Press, 1996, pp. 757–763.

7. R.M. Everson and S.J. Roberts, "ICA: A Flexible Non-linearity and Decorrelating Manifold Approach," Neural Computation, vol. 11, no. 8, 1999, pp. 1957–1983. Available from http://www.dcs.ex.ac.uk/academics/reverson.

8. J-F. Cardoso, "On the Stability of Source Separation Algorithms," in Neural Networks for Signal Processing VIII, T. Constantinides, S.-Y. Kung, M. Nifanjan, and E. Wilson (Eds.), IEEE Signal Processing Society, IEEE, 1998, pp. 13–22.

9. H. Attias, "Independent Factor Analysis," Neural Computation, vol. 11, no. 5, 1999, pp. 803–852.

10. T-W. Lee, M. Girolami, and T.J. Sejnowski, "Independent Component Analysis Using an Extended Infomax Algorithm for Mixed Sub-Gaussian and Super-Gaussian Sources," Neural Computation, vol. 11, 1999, pp. 417–441.

11. W.H. Press, S.A. Teukolsky, W.T. Vetterling, and B.P. Flannery, Numerical Recipes in C, 2nd edition, Cambridge: Cambridge University Press, 1992.

12. N. Gordon, D. Salmond, and A.F.M. Smith, "Novel Approach to Nonlinear/Non-Gaussian Bayesian State Estimation," IEE Proceedings-F, vol. 140, 1993, pp. 107–113.

13. M. Isard and A. Blake, "Contour Tracking by Stochastic Propagation of Conditional Density," in Proc. European Conf. Computer Vision, Cambridge, UK, 1996, pp. 343–356.

14. R.M. Everson and S.J. Roberts, "Particle Filters for Non-Stationary Independent Components Analysis," Technical Report TR99-6, Imperial College, 1999. Available from http://www.dcs.ex.ac.uk/academics/reverson.

15. Z. Ghahramani, "Learning Dynamic Bayesian Networks," in Adaptive Processing of Temporal Information, C.L. Giles and M. Gori (Eds.), Springer-Verlag, Lecture Notes in Artificial Intelligence, 1999.


Richard Everson graduated with a degree in physics from Cambridge University in 1983. He worked at Brown and Yale Universities on fluid mechanics and data analysis problems until moving to Rockefeller University, New York, to work on optical imaging and modelling of the visual cortex. After working at Imperial College, University of London, he moved to the Department of Computer Science at Exeter University in 1999. Current research interests include data analysis and pattern recognition; quantitative analysis of brain function, particularly from cortical optical imaging data and EEG; modelling of cortical architecture; Bayesian methods; and signal and image processing.
[email protected]

Stephen Roberts graduated from Oxford University with a degree in physics in 1987. After working in industry he returned to Oxford and obtained his DPhil in 1991. He was lecturer in Engineering Science at St. Hugh's College, Oxford, prior to his appointment as lecturer in the Department of Electrical & Electronic Engineering at Imperial College, University of London, in 1994. In 1999 he was appointed a University Lecturer in Information Engineering at Oxford. His research interests include data analysis, information theory, neural networks, scale space methods, Bayesian methods, image and signal processing, machine learning and artificial intelligence.
[email protected]