An introduction to a new data analysis tool:

Independent Component Analysis

Andreas Jung

Regensburg, March 18th 2002

Abstract

A common problem encountered in data analysis and signal processing is finding a suitable representation of multivariate data. For computational and conceptual simplicity, these representations are often sought as a linear transformation of the original data. Well-known linear transformations are, for example, principal component analysis and projection pursuit. A recently developed nonlinear method is independent component analysis (ICA), in which the components of the desired representation have minimal statistical dependence. Such a representation seems to capture the essential structure of the data in many applications.

In this paper, we will focus on the theory and methods of ICA in contrast to classical transformations, as well as on applications of this method to biomedical data such as electro-encephalography (EEG). To illustrate the algorithm, we will also visualize the unmixing process with a set of images.

Finally, we will give an outlook on possible future developments of ICA. The main aspects of my future research will be: using time-structure information from the data to enhance the convergence of the algorithm; determining the meaningfulness of the independent components; and treating non-stationary data, since most biomedical systems are in non-equilibrium.

1 Introduction

A central problem in data analysis, statistics and signal processing is finding a suitable representation of multivariate data by means of a suitable transformation. It is important for subsequent analysis of the data, whether it is pattern recognition, de-noising, visualization or anything else, that the data is represented in a manner that facilitates the analysis. Especially when biomedical data is analyzed, the representation of the data for analysis by the physicians must be as clear as possible and should present only the essential structures hidden in the data.

Since in real-world problems only continuous-valued parameters are measured, we will restrict ourselves in this paper to continuous-valued multidimensional variables. Let us denote by x = (x1, x2, . . . , xm)^T ∈ R^m an m-dimensional random variable; the problem is to find a transformation τ : x → y so that the n-dimensional transform y ∈ R^n, defined by

y = f(x), f : R^m → R^n (1)

has some desirable properties. (Please note that throughout this paper we will use the same notation for random variables and their realizations; the context should make the distinction clear.) Often a linear transformation is used to represent the observed variables, i.e.,

y = Wx (2)

where W is an (n × m)-matrix which has to be determined. Using linear transformations makes the problem computationally and conceptually simpler and facilitates the interpretation of the results. Several principles and methods have been developed to find a suitable representation; principal component analysis is just one among them. These methods define a principle that tells which transformation is optimal, in the sense of optimality of dimension reduction, "interestingness" of the resulting components, simplicity of the transformation matrix W, or any other application-oriented criterion.

Recently a new method has gained widespread attention: independent component analysis (ICA). As the name implies, the basic goal is to find a transformation so that the resulting components yi are as statistically independent from each other as possible. One typical application of this method is the blind source separation (BSS) problem, in which the observed variables x correspond to a realization of an m-dimensional discrete-time signal x(t), t = 1, 2, ..., as illustrated in figure 1.

Figure 1: An illustration of a blind source separation (BSS) problem. Due to some circumstances, only a mixture of some underlying source signals can be observed. The goal is to determine these source signals. This problem can be solved (under some restrictions) by independent component analysis (ICA).

These observed signals originate from a (non-)linear mixture of some underlying source signals si(t). Often one can assume that the source signals are statistically independent from each other, and thus independent component analysis can find a transformation so that the transformed signals yi(t) correspond to the original signals si(t). Such a recovery of the original sources is demonstrated in figure 2, where on the left-hand side the original source signals are plotted and on the right-hand side the signals recovered by independent component analysis. One can clearly see that the recovered signals are nearly identical to the original sources, except for a permutation and a scaling factor (sign), which cannot be determined by this method.
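To make the setting concrete, the mixing step of figure 1 can be reproduced in a few lines of code. The following sketch is purely illustrative; the signal shapes and the mixing matrix are invented here and are not the signals used in the figures:

```python
import numpy as np

rng = np.random.default_rng(0)

# Three independent source signals s_i(t), t = 1, ..., T
T = 1000
t = np.linspace(0, 8, T)
s1 = np.sin(2 * np.pi * t)               # sinusoid
s2 = np.sign(np.sin(3 * np.pi * t))      # square wave
s3 = rng.laplace(size=T)                 # noise-like source
S = np.vstack([s1, s2, s3])              # source matrix, shape (3, T)

# Unknown linear mixing: every observed channel is a weighted sum of the sources
A = rng.normal(size=(3, 3))              # mixing matrix (unknown in practice)
X = A @ S                                # observed mixtures, shape (3, T)

# Only X is available to the analyst; blind source separation means
# recovering S (up to scaling and permutation) from X alone.
print(X.shape)
```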

Figure 2: Comparison between the original source signals (right) used to construct the mixture shown in figure 1 and the signals recovered from the mixture (left) using independent component analysis. The unmixed signals are very close to the original signals, except for a scaling factor and a permutation.


Another prominent example of a BSS problem is the cocktail-party problem. A mixture of simultaneous speech signals is recorded by several microphones. The goal is to separate the original speech signals, which correspond to the voices of the speakers. Since the speech signals from different speakers are statistically independent, ICA can separate them with high quality, provided at least as many microphones are placed in the room as there are speakers. Other possible applications with multidimensional data sets, where ICA could help reveal the interesting structures, are high-dimensional time series of any kind, from biomedical applications, financial markets or any other scientific experiment, for example multi-satellite missions.

The technique of ICA was introduced for the first time in the early 1980s in the context of neural network modelling. In the mid-1990s some highly successful new algorithms were introduced by several research groups ([Bell and Sejnowski, 1995], [Lee et al., 1999], [Hyvarinen and Oja, 1997] and [Hyvarinen, 1999a]), together with impressive demonstrations on problems like the cocktail-party problem, but also on real-world problems like biomedical signal processing or the separation of audio signals in telecommunication.

In this paper, we will only give a short overview of the theory and methods of independent component analysis; for a more detailed survey we refer to the article [Hyvarinen, 1999b], which was the basis of this paper. A complete coverage of ICA and many more references can be found in the book [Hyvarinen et al., 2001].

2 Classical Transformations

Several principles have been developed in statistics and signal processing to find a suitable linear representation of some observed data. In this section we will discuss classical methods for determining the linear transformation in (2). Two well-known classical linear transformations are principal component analysis, as a second-order method, and projection pursuit, as a higher-order method. They will be discussed in the following. For simplicity, we assume all variables to be centered, which means that the mean is subtracted, so the transformed variable can be written as x = x0 − E{x0}, where x0 is the original non-centered variable and E{·} denotes the expectation.

2.1 Second-order methods - Principal Component Analysis

One of the most popular classes of methods for finding a linear transformation of some observed data are second-order methods. These methods only use the information contained in the covariance matrix of the data vector x and, of course, the mean of the data; but since the data can always easily be centered, the latter can be neglected.

The use of second-order techniques can be understood in the context of the classical assumption of Gaussianity. The distribution of a variable x which is normally or Gaussian distributed is completely determined by second-order information. Thus it is unnecessary to include any other information, for example from higher moments. This makes second-order methods very robust and computationally simple, since only classical matrix manipulations are used.

One might roughly characterize the second-order methods as ones that try to find a faithful representation of the data, in the sense of the mean-square error of the reconstruction. In contrast, most higher-order methods try to find a meaningful representation, which is of course a task-dependent property. But these methods seem to capture a meaningful representation in a wide variety of applications.

The most widely used second-order technique is principal component analysis (PCA). The basic idea is to find components s1, s2, ..., sn that explain the maximum amount of variance possible by n linearly transformed components. A good intuitive way to explain PCA is a recursive formulation of the method: in the first step, one looks for the projection onto the direction in which the variance of the projection is maximized. Let us denote this direction by w1; then

w1 = arg max_{||w||=1} E{(w^T x)^2}. (3)


Once this direction or first principal component (s1 = w1^T x) is found, only orthogonal directions are allowed for the next principal component. This iterative process is continued until all n principal components are found. In practice, the computation of the wi can be accomplished simply using the covariance matrix E{xx^T} = C: the wi are the eigenvectors of C that correspond to the n largest eigenvalues of C.
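A minimal sketch of this computation (my own illustration, not the author's code) makes the recipe explicit; the data matrix is assumed to hold one variable per row:

```python
import numpy as np

def pca(X, n_components):
    """Principal component analysis of a data matrix X with shape (m, T):
    m variables, T observations. Returns the projection directions w_i
    (rows of W) and the principal components s_i = w_i^T x."""
    X = X - X.mean(axis=1, keepdims=True)        # center the data
    C = np.cov(X)                                # covariance matrix E{x x^T}
    eigvals, eigvecs = np.linalg.eigh(C)         # eigendecomposition (ascending order)
    order = np.argsort(eigvals)[::-1]            # sort by decreasing variance
    W = eigvecs[:, order[:n_components]].T       # rows are the directions w_i
    return W, W @ X

# Example: reduce correlated 2-dimensional data to its first principal component
rng = np.random.default_rng(1)
X = rng.normal(size=(2, 500))
X[1] += 0.8 * X[0]                               # introduce correlation
W, S = pca(X, n_components=1)
print(W, S.shape)
```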

The basic goal of PCA is the optimal reduction of the dimension of the data. Indeed, it can be proven that the representation given by PCA is an optimal linear dimension-reduction technique in the sense of the mean-square error. A simple illustration of PCA can be found in figure 3, in which the first component of a two-dimensional data set is shown.

Figure 3: A principal component analysis (PCA) of a two-dimensional data set (x1, x2), where the line shows the direction of the first principal component. This gives an optimal (in the sense of mean-square error) dimension reduction from 2 to 1 dimensions.

However, such a transformation does not always produce the best result in the sense of recovering mixed sources in the blind source separation problem. As an example, we show the result of a PCA transformation of the mixed signals from figure 1. One can clearly see in figure 4 that the original signals were not recovered, since the method only searches for directions of maximum variance.

Figure 4: The result of a principal component analysis (PCA) for the blind source separation problem shown in figure 1: it yields a wrong reconstruction of the original sources.

Therefore higher-order methods are necessary to find more meaningful transformations, so that the representations reveal the desired information.

2.2 Higher-order methods - Projection pursuit

Higher-order methods use information about the distribution of x that is not contained in the covariance matrix. In order for this to be meaningful, the variables x are assumed to be non-Gaussian, since the information about the distribution of a (zero-mean) Gaussian variable is fully contained in the covariance matrix.

A technique developed in statistics for finding "interesting" projections of multidimensional data is the so-called projection pursuit. Such projections can be used for optimal visualization of clustering structures in the data. The reduction of dimension is also an important objective, since the aim of this transformation is to visualize only the interesting structures, normally in two or three dimensions.

Figure 5: To illustrate the problems of variance-based methods, one often shows this classical example. A two-dimensional data set (x1, x2) is clearly separated into two clusters. The first principal component - the direction of maximum variance - would be vertical, so the projection onto it would not separate the clusters. In contrast, a projection pursuit method would yield a horizontal direction and thus clearly separate the clusters.

An illustrative example of projection pursuit is shown in figure 5. The classical second-order method would yield an uninteresting and therefore wrong projection, but the projection pursuit method would find the direction interesting for visualization, which is the horizontal one. It has been argued that the non-Gaussian distributions are the most interesting ones. Therefore higher-order statistics must be analyzed and used by these methods. Independent component analysis (ICA) goes one step further and uses all higher moments of the distribution.

3 Independent component analysis

Independent component analysis is a method in which the components of the new representation have minimal statistical dependence. For the definition of statistical independence, let us first recall some basic definitions. Let us denote by y1, y2, ..., ym some random variables (for simplicity with zero mean) with the joint probability density f(y1, ..., ym). The variables are statistically independent if the density function can be factorized [Papoulis, 1991]

f(y1, ..., ym) = f1(y1)f2(y2)...fm(ym) (4)

where fi(yi) denotes the marginal probability density of yi. Statistical independence must be distinguished from uncorrelatedness, which means that

E{yi yj} − E{yi}E{yj} = 0, for i ≠ j. (5)

In general, statistical independence is a much stronger requirement than uncorrelatedness. For independence of the yi it must hold that

E{g1(yi) g2(yj)} − E{g1(yi)}E{g2(yj)} = 0, for i ≠ j, (6)


for any (measurable) functions g1 and g2 [Papoulis, 1991]. This is a much stricter condition than uncorrelatedness.
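A small numerical illustration of this difference (my own toy example): with y1 uniform on [−1, 1] and y2 = y1^2, the two variables are uncorrelated in the sense of (5), yet condition (6) fails for a suitable choice of g1 and g2, exposing their dependence:

```python
import numpy as np

rng = np.random.default_rng(0)
y1 = rng.uniform(-1.0, 1.0, size=100_000)
y2 = y1 ** 2                      # fully determined by y1, hence dependent

# Uncorrelatedness, equation (5): E{y1 y2} - E{y1}E{y2} is close to 0
print(np.mean(y1 * y2) - np.mean(y1) * np.mean(y2))

# Independence check via equation (6) with g1(u) = u^2, g2(u) = u:
g1, g2 = y1 ** 2, y2
print(np.mean(g1 * g2) - np.mean(g1) * np.mean(g2))   # clearly nonzero
```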

An intuitive example of independent component analysis can be given by a scatter plot of two signals x1, x2. Every point in the scatter plot is given by the pair (x1, x2). One can see the "cross-like" structure of the plot shown in figure 6, which corresponds to the (factorized) joint density distribution of two independent signals x1 and x2.

Figure 6: A scatter plot of two independent signals (x1, x2). The arrows represent the unit vectors e1, e2.

Assuming, for simplicity, a linear mixture of these signals, one obtains a transformed scatter plot as shown in figure 7. The procedure of recovering the original (independent) sources, as in the blind source separation problem, can now be illustrated.

Figure 7: Comparison between PCA and ICA: a mixture of the two independent signals (see figure 6) is shown in a scatter plot. The vectors represent the unmixing vectors for the PCA solution (left) and the ICA solution (right). It is obvious that the ICA solution recovers the transformed unit vectors a1, a2, while PCA only finds the orthogonal vectors u1, u2.

The vectors in the plots represent the unmixing vectors for the PCA solution (right plot) and the ICA solution (left plot). Obviously, principal component analysis does not find the correct sources, but only finds orthogonal directions, where u1 (the first principal component) represents the direction of maximum variance. So PCA can only recover a rotation, but not an arbitrary linear mixing. ICA, in contrast, is able to find the transformed unit vectors a1, a2. These directions correspond to the original sources we were looking for.


3.1 Definition of independent component analysis

In this section, we will define the problem of independent component analysis. As stated in the introduction, we are looking for a transformation τ : x → y so that the n-dimensional transform y ∈ R^n defined by y = f(x), f : R^m → R^n, has minimal mutual statistical dependence.

Although the general case of an arbitrary nonlinear function f(x) can be formulated, we shall only consider the linear case, which has proven to be a difficult enough task. Furthermore, only for the linear case can the identifiability of ICA be shown.

The first and most general definition is as follows:

Definition 1. (General definition) ICA of the random vector x consists of finding a linear transform y = Wx so that the components yi are as independent as possible, in the sense of maximizing some function F(y1, ..., ym) that measures statistical independence.

This is the most general definition, since no assumptions are made on the data or on the model which generated the data. Also, the function for measuring statistical independence must be defined; this will be done in a later section, using information-theoretical considerations. Furthermore, one cannot expect to find strictly independent components, since a linear transformation is, in general, not able to generate independent components. A different approach is taken by the following, more estimation-theoretically oriented definition:

Definition 2. (Noisy ICA model) ICA of a random vector x consists of estimating the following generative model for the data:

x = As + n (7)

where the latent variables (components) si in the vector s = (s1, ..., sn)^T are assumed to be independent. The matrix A is a constant (m × n) 'mixing' matrix and n an m-dimensional random noise vector.

This definition reduces the ICA problem to an ordinary estimation of latent variables. However, the problem of estimating an additional noise term makes the problem complex; therefore the great majority of ICA research neglects this term, and the following simpler definition can be formulated:

Definition 3. (Noise-free ICA model) ICA of a random vector x consists of estimating the following generative model for the data:

x = As (8)

where A and s are as in Definition 2.

In the following, we will concentrate only on the noise-free model, which can often be considered a good approximation. The justification for this approximation is that methods using the simpler model are more robust, especially in real-world problems.
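As a sketch of how the noise-free model (8) is estimated in practice (using the off-the-shelf FastICA implementation from scikit-learn rather than any algorithm detailed in this paper), one can generate data x = As from known sources, hand only x to the estimator, and verify that the sources are recovered up to scaling and permutation:

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)

# Independent, non-Gaussian sources (n = 3) and a random mixing matrix A
T = 2000
t = np.linspace(0, 8, T)
S = np.vstack([np.sin(2 * np.pi * t),
               np.sign(np.sin(3 * np.pi * t)),
               rng.laplace(size=T)])
A = rng.normal(size=(3, 3))
X = A @ S                                   # observed mixtures: x = A s

ica = FastICA(n_components=3, random_state=0)
S_est = ica.fit_transform(X.T).T            # estimated sources, one per row

# Correlation between true and estimated sources: each row and each column
# should contain exactly one entry close to +/-1, reflecting the scaling
# (sign) and permutation indeterminacies of the model.
C = np.corrcoef(S, S_est)[:3, 3:]
print(np.round(C, 2))
```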

3.2 Identifiability of the ICA model

The identifiability of the noise-free ICA model has been treated in [Comon, 1994]. By imposing the following fundamental restrictions, the identifiability of the model can be assured:

• statistical independence of the components s = (s1, ..., sn),

• all components si must be non-Gaussian (with the possible exception of one component),

• the number of observed signals m must be at least as large as the number of independent components n, i.e. m ≥ n,

• matrix A must be of full column rank.

Usually, one also assumes that x and s are centered, which is in practice no restriction, since the mean of the random variable x can always be subtracted. Note that if x and s are the result of a stochastic process, then they have to be strictly stationary!


A basic indeterminacy of the model is that the independent components and the columns of the mixing matrix A can only be estimated up to a scaling factor, since any constant factor multiplied with the independent components can be cancelled by dividing the columns of the mixing matrix by the same factor. Another indeterminacy is the ordering of the independent components. In contrast to PCA, where the components are ordered by their variances, ICA cannot determine the original ordering of s, since any permutation is a solution of the ICA model. However, a new ordering could be introduced by using a measure of non-Gaussianity or the norm of the columns of the mixing matrix.

3.3 ”The” ICA-Method

Having formulated in the previous sections the model and the identifiability of independent component analysis, we will discuss in this section what the actual ICA method looks like.

The estimation of the data represented by the model is usually performed by formulating an objective function and optimizing it, either minimizing or maximizing. This objective function is often called a contrast or cost function and acts as an index of how well the estimated data is represented by the model and how independent the estimated sources are. We can formulate the method as

ICA-Method = Contrast Function + Optimization Algorithm (9)

For measuring the independence of the sources, one usually uses a contrast function based on information-theoretical considerations – this will be discussed in the next paragraph – but other properties can also be used, for example geometric considerations [C.G.Puntonet et al., 1995], [Puntonet and Prieto, 1998], [Jung et al., 2001], [Theis et al., 2001]. For optimizing the function an optimization algorithm is necessary; many algorithms have been developed and some of them will be mentioned in the following.

Information theory In the mid-20th century, Shannon, as one of the founders of information theory, introduced a measure for the information content and for the bandwidth of channels. He defined the information of a signal x as a measure depending on its probability density f(x):

I(x) = −log(f(x)). (10)

The average of the information is called the ”Shannon Entropy”:

E{I(x)} = −∑_x f(x) log(f(x)) (11)

Derived from this entropy, one can define a distance measure, the so-called "Kullback-Leibler divergence", between two distributions f(x) and f(y):

K(f(x), f(y)) = ∑_{x,y} f(x) log( f(x) / f(y) ) (12)

The Kullback-Leibler divergence is equal to zero if and only if f(x) and f(y) are equal. Using this fact, a measure of independence can be formulated: the mutual information

M(x, y) = K(f(x, y), f(x)f(y)) = ∑_{x,y} f(x, y) log( f(x, y) / (f(x)f(y)) ) (13)

Two signals or random variables are independent if their joint probability density factorizes into their marginal densities, f(x, y) = f(x)f(y). Since the Kullback-Leibler divergence is zero if and only if its arguments are equal, the mutual information is zero if and only if the joint probability density of x and y factorizes into the marginal densities. These definitions are easily extendable to higher dimensions and to continuous random variables.
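As a numerical sketch (my own addition, not part of the original analysis), the mutual information (13) of two discretized signals can be estimated from a joint histogram:

```python
import numpy as np

def mutual_information(x, y, bins=32):
    """Estimate M(x, y) from a joint histogram (natural logarithm, in nats)."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()                       # joint probabilities f(x, y)
    px = pxy.sum(axis=1, keepdims=True)         # marginal f(x)
    py = pxy.sum(axis=0, keepdims=True)         # marginal f(y)
    nz = pxy > 0                                # skip empty bins (log(0))
    return np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz]))

rng = np.random.default_rng(0)
a = rng.normal(size=50_000)
b = rng.normal(size=50_000)
print(mutual_information(a, b))                 # independent signals: close to 0
print(mutual_information(a, a + 0.1 * b))       # strongly dependent: clearly > 0
```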


One should realize, however, that a contrast function such as the mutual information is a highly nonlinear function with many local minima. Optimizing such a function is a difficult task; the optimization algorithm must be chosen carefully, and there is no guarantee of finding the global minimum. In the next paragraph, we will mention some possible optimization algorithms one could use for this problem.

Optimization algorithm Algorithmically, minimizing a contrast function can be done with several methods; the first and most intuitive one is gradient descent. To find a minimum, one starts at an arbitrary starting point and descends along the gradient of the contrast function until a minimum is reached. An acceleration can be achieved by using the so-called conjugate gradient method, which is especially useful when the minimum lies in a narrow valley. Unfortunately, these methods do not necessarily lead to the global minimum: once the algorithm has reached a (local) minimum, it is stuck in this valley and cannot "jump" over the surrounding hills. A solution can be the combination of gradient descent methods with other approaches like simulated annealing or genetic algorithms.
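A bare-bones sketch of such a gradient descent on a generic contrast function, using numerical gradients and a fixed step size (purely illustrative, not one of the algorithms referenced above):

```python
import numpy as np

def gradient_descent(contrast, w0, step=0.01, eps=1e-6, n_iter=2000):
    """Minimize a scalar contrast function by plain gradient descent,
    approximating the gradient with central finite differences."""
    w = np.asarray(w0, dtype=float)
    for _ in range(n_iter):
        grad = np.zeros_like(w)
        for i in range(w.size):
            dw = np.zeros_like(w)
            dw[i] = eps
            grad[i] = (contrast(w + dw) - contrast(w - dw)) / (2 * eps)
        w -= step * grad
    return w

# A toy "contrast" with two minima per coordinate (at +1 and -1); which
# minimum is reached depends entirely on the starting point.
contrast = lambda w: np.sum((w**2 - 1.0)**2)
print(gradient_descent(contrast, w0=[0.5, -0.5]))   # -> approximately [ 1, -1]
print(gradient_descent(contrast, w0=[-2.0, 2.0]))   # -> approximately [-1,  1]
```

The two runs start in different basins and therefore end in different minima, which is exactly the local-minimum problem described above.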

3.4 A visual example of unmixing images

To give a better impression of how the ICA algorithm works, we will demonstrate it by visualizing the demixing process of three mixed black-and-white images.

Figure 8: The original sources/images consist of 250x250 = 62500 pixels with a resolution of 256 gray values. The first image is a photo taken from an Airbus A300 Zero-G, the second is an artificial image with the black text "ICA do it!" and the last image contains uniform random noise with gray values between 0 and 255.

The original images shown in figure 8 consist of 250 by 250 = 62500 pixels with a resolution of 256 gray values; let us denote them by s, where s is a matrix with 3 rows, each row corresponding to one black-and-white image, and 62500 columns. We have mixed these images using the mixing matrix

A := [ 0.5  −0.2   0.2
       0.1   0.4  −0.3
       0.6   0.3   0.8 ]    (14)

so that the mixed signals x are computed as x = As. Each row of x represents one mixed image; all three images are shown in figure 9.
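A sketch of this experiment (the file names and the use of scikit-learn's FastICA are my own assumptions, not the author's original setup):

```python
import numpy as np
from PIL import Image
from sklearn.decomposition import FastICA

# Load three 250x250 gray-scale images and flatten each to one row of S
files = ["airbus.png", "ica_text.png", "noise.png"]          # hypothetical file names
S = np.vstack([np.asarray(Image.open(f).convert("L"), dtype=float).ravel()
               for f in files])                               # shape (3, 62500)

# Mixing matrix from equation (14)
A = np.array([[0.5, -0.2,  0.2],
              [0.1,  0.4, -0.3],
              [0.6,  0.3,  0.8]])
X = A @ S                                                     # mixed images, one per row

# Unmix with FastICA; the recovered rows equal the originals only up to
# scaling (sign) and permutation
ica = FastICA(n_components=3, random_state=0)
S_est = ica.fit_transform(X.T).T
A_est = ica.mixing_
images_est = [row.reshape(250, 250) for row in S_est]
print(np.round(A_est, 4))
```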


Figure 9: Resulting signals/images after mixing the images from figure 8 with the mixing matrix A as described in the text. These images were presented to the FastICA algorithm.

When presenting these mixed images to an ICA algorithm, the algorithm tries to unmix them using only the property of independence of the original images/sources. All algorithms work iteratively, so we have the possibility to show the development of the unmixing process, as we have done in figure 10. In every step a new estimated mixing matrix Â is calculated, and by using the mutual information M(y) as a measure of independence, where y = Â^{-1} x are the unmixed signals, we can optimize the mixing matrix.

Figure 10: Visualization of the unmixing process of the FastICA algorithm. Every set of images corresponds to one iteration step. It is obvious that the algorithm converges to the original images, except for scaling and permutation. Note that the third iteration seems to be a better solution than the last; details of this phenomenon are given in the text.


One can clearly see the unmixing of the images towards the three original images. Except for scaling (sign) and permutation, which cannot be reconstructed by the algorithm, the unmixed images are very close to the original ones. But one can also notice that the images from the third iteration seem to be closer to the original images than the final unmixed ones. In particular, the Airbus image has a small fraction of the text image ("ICA do it!") overlaid. The result of the ICA algorithm is the following estimated mixing matrix Â; for comparison the original mixing matrix is also given:

Â := [  0.2000   0.2313   0.4997
       −0.2997  −0.3923   0.0981
        0.8004  −0.2648   0.5953 ],    A := [ 0.5  −0.2   0.2
                                              0.1   0.4  −0.3
                                              0.6   0.3   0.8 ].    (15)

The phenomenon of imperfect unmixing is easily explained and nicely points out the problems with independent component analysis. As a fundamental restriction for the identifiability of the ICA method, one assumes independent sources. A visual inspection of the images in figure 8 suggests independent signals/images, since having knowledge about one image will not give any information about the other two images. But when the correlation matrix is calculated, it shows that the off-diagonal elements are not zero,

correlation(s, s) = [  1.0000  −0.0700  −0.0021
                      −0.0700   1.0000   0.0049
                      −0.0021   0.0049   1.0000 ]    (16)

and therefore the sources are not independent! The restriction of having independent signals as sources is a very strict one and cannot always be assumed. When using images, one should take natural images rather than photos of man-made objects, since the latter more often contain horizontal and vertical edges, which give a correlation with the text image.
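The check in (16) amounts to a single call; the following snippet recomputes it under the same hypothetical file names as in the sketch above:

```python
import numpy as np
from PIL import Image

# Load the three source images (hypothetical file names) and compute the
# correlation matrix of the flattened pixel rows; nonzero off-diagonal
# entries show that the sources are correlated, hence not strictly independent.
files = ["airbus.png", "ica_text.png", "noise.png"]
S = np.vstack([np.asarray(Image.open(f).convert("L"), dtype=float).ravel()
               for f in files])
print(np.round(np.corrcoef(S), 4))
```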

The ICA algorithm used here correctly converges to a solution where the unmixed signals are independent and their correlation matrix is an identity matrix, but the expectation of having unmixed the original images cannot be fulfilled. Therefore one should always keep in mind the fundamental requirements of the ICA method and realize what one can expect from such a method!

4 Applications of ICA

In many signal processing areas, ICA helps to process high-dimensional multivariate data. Not only in audio processing, where speech recognition is an interesting application, but also in image processing, for example of satellite data, this method can help to explore the data.

But one major application of ICA is the analysis of biomedical data. Often one expects that the observed data from biological systems is a superposition of some underlying unknown sources. Exactly this separation into independent sources is possible with ICA. Some typical examples are fMRI (functional magnetic resonance imaging), where active regions in the brain are detected while a patient is working on a predefined problem, or EEG (electro-encephalography), where the aim is to detect abnormal neural behavior of the brain, originating for example from an unknown brain tumor.

In the following, we will give an example of an EEG analysis and show how ICA can improve the analysis by the physicians.

4.1 Electro-Encephalography (EEG)

In the Graduiertenkolleg "Nonlinearity and Nonequilibrium in condensed matter" we are working together with the group of Prof. Brawanski at the University Hospital in Regensburg and in cooperation with the group of Prof. Lang, where detailed experience with the analysis of EEG data by ICA methods has been gathered over the last years. Acknowledgements go to Dr. Schulmeyer, who was the contact person for the EEG data at the University Hospital.

A sketch of an analysis of an EEG from a patient without any abnormal neural behavior will be shown in the following, using this new nonlinear data analysis method.

Electro-encephalography (EEG) is a method in which electrical potentials are measured with electrodes on the surface of the head. The goal is to get a better understanding of the processes in the human brain, to diagnose brain diseases or to monitor the depth of anaesthesia. One main problem is the superposition of the signals of the brain itself and artifacts like eye-blinks, head movements or the heartbeat. Before any further analysis is possible, one has to extract and separate these signals with a data analysis tool.

To give a better impression of the signals measured during an EEG, we have plotted in figure 11 the signals from every electrode over a time period of 7 seconds.

Figure 11: An electro-encephalography (EEG) measurement of a patient recorded at the University Hospital Regensburg. The EEG channels are labelled by a shortcut which corresponds to a given electrode on the head of the patient. The plotted lines show the evolution of the electric potentials at each electrode over time. One clear artefact (an eye-blink) is visible in the "Fp2" channel; furthermore the alpha-waves of the brain (8 Hz) are visible in nearly all channels.

Obviously one can see a mixture of many different signals: the alpha-waves of the brain, with a main frequency of 8 Hz, are visible in nearly all channels; in contrast, the artefact of an eye-blink is only present in channel "Fp2" – this electrode is placed close to the right eye. These signals can now be separated by independent component analysis, if one assumes that the signals we are looking for are independent, i.e. mathematically speaking, that their joint probability density factorizes into the marginal probability densities.

In figure 12 the independent components, respectively the independent signals, are plotted for the same time period as in figure 11. Note that the independent components are not sorted by any criterion; this is in contrast to PCA, where the signals are sorted by their variances. As described in the caption, some signals can easily be identified by their characteristic waveform.
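A rough sketch of such an EEG decomposition (the array name, the surrogate data and the use of scikit-learn's FastICA are my assumptions; the original analysis pipeline is not described in this paper):

```python
import numpy as np
from sklearn.decomposition import FastICA

# eeg: recorded potentials with shape (n_channels, n_samples), e.g. (21, 7 * fs).
# A surrogate recording is generated here only so that the sketch runs end to end.
rng = np.random.default_rng(0)
eeg = rng.laplace(size=(21, 7 * 256))

ica = FastICA(n_components=21, random_state=0, max_iter=1000)
components = ica.fit_transform(eeg.T).T     # independent components, one per row
mixing = ica.mixing_                        # shape (21, 21): column j tells how
                                            # component j projects onto the electrodes

# An artifact component (e.g. an eye-blink) can be removed by zeroing its column
# of the mixing matrix and projecting back to the electrode space.
artifact = 0                                # hypothetical index of the artifact component
mixing_clean = mixing.copy()
mixing_clean[:, artifact] = 0.0
eeg_clean = mixing_clean @ components + ica.mean_[:, None]
print(eeg_clean.shape)
```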

Figure 12: Plotted are the independent components (ICs) of an ICA of the EEG from figure 11. One can clearly identify the eye-blink in IC #1, the heartbeat (ECG) is separated into IC #5 and a main alpha-wave is visible in IC #9. Furthermore, IC #19 can be identified as noise due to fast muscle movement and IC #20 seems to be the pulse from the heartbeat. All ICs can be localized by using the information from the mixing matrix, as shown in figure 13.

But not only the waveform can give information about the extracted signals; the mixing matrix also contains information about the origin and location of the signals. Each column of the mixing matrix holds the information about how the corresponding source signal was distributed onto the electrodes. This can now be projected onto the head with a density plot, using the recovered mixing matrix (see figure 13).

Now we can see that independent component #8 has no physical meaning, since its distribution over the electrodes is just random and no particular area can be distinguished. Therefore these density plots can further help the physicians to characterize and identify the signals. Still it is not yet possible to rank the independent components by a confidence level, reliability or meaningfulness of the components.
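A simplified sketch of this localization step (the electrode coordinates are rough approximations of 10-20 positions and the mixing matrix is a random stand-in for the estimated one; both are my own assumptions, and a scatter plot replaces the density plot of figure 13):

```python
import numpy as np
import matplotlib.pyplot as plt

# Rough 2D positions (x: left-right, y: back-front) for a few 10-20 electrodes;
# the coordinates are approximate and only serve to illustrate the idea.
positions = {"Fp1": (-0.3, 0.9), "Fp2": (0.3, 0.9),
             "F3": (-0.4, 0.5),  "F4": (0.4, 0.5),
             "C3": (-0.5, 0.0),  "C4": (0.5, 0.0),
             "P3": (-0.4, -0.5), "P4": (0.4, -0.5),
             "O1": (-0.3, -0.9), "O2": (0.3, -0.9)}
channels = list(positions)

# Column j of the (estimated) mixing matrix: contribution of IC #j to each electrode.
rng = np.random.default_rng(0)
mixing = rng.normal(size=(len(channels), len(channels)))   # placeholder for ica.mixing_
j = 0
weights = mixing[:, j]

xs, ys = zip(*positions.values())
plt.scatter(xs, ys, c=weights, s=600, cmap="RdBu_r")
for name, (x, y) in positions.items():
    plt.annotate(name, (x, y), ha="center", va="center", fontsize=8)
plt.gca().set_aspect("equal")
plt.colorbar(label=f"weight of IC #{j + 1} at each electrode")
plt.title("Scalp distribution of one mixing-matrix column")
plt.show()
```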

Figure 13: Using the information from the recovered mixing matrix, the origin of the independent components (ICs) can be localized and plotted using a density plot. Mixing vector #1 corresponds to an eye-blink, #5 to the heartbeat (ECG) recorded mostly at the ears, #9 to the ground activity of the brain, #19 to muscle noise near the left temple and #20 to the heartbeat pulse near the right temple. Note that mixing vector #8 seems to be an artifact of the ICA algorithm; its density plot allows no physical interpretation.

4.2 Neuro-monitoring data

Another biomedical application is the analysis of neuro-monitoring data from patients in the intensive-care unit. The goal is to identify brain diseases through a better understanding of which parameters and processes drive the system and of their influence on the brain. Furthermore, it would be of great interest to understand which measurement categories have the highest information content and how the measured signals are coupled. This seemed to be a good application for independent component analysis.

Dr. Rupert Faltermeier in the group of Prof. Brawanski is providing the data from patients treated in the intensive-care unit of the neurosurgery department. First analyses have shown a high non-stationarity of the data, which makes analysis with independent component analysis very difficult, since stationarity of the data is one basic requirement. Furthermore, the time series are too short for nonlinear time series analysis; therefore one should rather develop a model of the brain, fit its parameters and analyze the behavior of such a system.

5 Summary and Outlook

Independent component analysis is a new data analysis tool for processing multivariate data sets. In contrast to classical (linear) transformations, where only second-order statistics are used, the ICA method tries to find a representation in which the transformed components have minimal statistical dependence. Such a representation seems to capture the essential structure of the data in many applications.

Especially in biomedical applications, where only little is known about the origin of the signals, ICA extracts the main features of interest for the analysis, as shown for example for electro-encephalography (EEG). This method can therefore enhance the analysis by the physicians and give a better insight into the mechanisms and structure of the brain. With classical (linear) methods, one was not able to analyze these signals in such a clear way.

This (non-)linear data analysis method therefore contributes to the Graduiertenkolleg "Nonlinearity and Nonequilibrium in condensed matter" in the sense that the analysis of biomedical data, with respect to the (non-)linear and non-stationary nature of the signals from the brain, makes nonlinear methods necessary.

Our main cooperations in the Graduiertenkolleg are the contacts to the university hospital, especially with Dr. Rupert Faltermeier (former member of the group of Prof. Obermaier) and Dr. Schulmeyer in the group of Prof. Brawanski. On the mathematical aspects of ICA we work together with Fabian Theis and Stefan Bechtluft-Sachs (mathematical institute). In the application of the ICA method to fMRI data we work together with Tobias Westenhuber in the group of Prof. Lang.

My future research will focus on the development of new ICA methods that incorporate time-series analysis and dynamical aspects into multivariate data analysis tools. Until now, time-structure information has not been taken into account in the classical ICA algorithms; therefore we work together with Andreas Kaiser and Thomas Schreiber in the group of Holger Kantz (time series analysis) at the Max Planck Institute for the Physics of Complex Systems (MPI-PKS) in Dresden. Our aim is to enhance the separation quality of the independent components, especially for real-world data.

Furthermore, the meaningfulness of the independent components and the "real" number of independent sources are interesting questions in real-world problems. The question of how reliable the resulting components are is of great interest, and not only for the physicians.

To treat the non-stationary data from the neuro-monitoring, one probably has to develop a model for biomedical systems and fit its parameters, since ICA is not able to treat these data. A main advantage of having a model system is the possibility of studying the system behavior (chaotic behavior from nonlinear differential equations?) and probably using prediction methods to forecast the signals.

An example of a totally different application area of the ICA method is space science research. Multi-satellite missions produce data from experiments on many satellites, which have to be analyzed by data analysis tools. The Cluster mission of the European Space Agency (ESA) is a project with four autonomous satellites measuring the variation of the magnetic field around our Earth. Joachim Vogt at the International University in Bremen is working on the data analysis, and we are looking forward to using ICA for these problems. Another space science experiment is the gravitational wave detector GEO600 in Hannover. Many different channels from seismic instruments are recorded and analyzed for their influence on the experiment itself. The data analysis is mainly done by the Max Planck Institute for Gravitational Physics (Albert Einstein Institute) in Potsdam. It would also be interesting to know how well the ICA method could help to separate disturbances caused by seismic events from real gravitational waves.

One can see how versatile this new data analysis method is, not only for purely theoretical applications, but also for real-world problems like biomedical data analysis.

References

[Bell and Sejnowski, 1995] Bell, A. J. and Sejnowski, T. J. (1995). An information-maximisation approach to blind separation and blind deconvolution. Neural Computation, 7:1129–1159.

[C.G.Puntonet et al., 1995] Puntonet, C. G., Prieto, A., Jutten, C., Alvarez, M. R., and Ortega, J. (1995). Separation of sources: a geometry-based procedure for reconstruction of n-valued signals. Signal Processing, 46:267–284.

[Comon, 1994] Comon, P. (1994). Independent component analysis, a new concept? Signal Processing, 36:287–314.

[Hyvarinen, 1999a] Hyvarinen, A. (1999a). Fast and robust fixed-point algorithms for independent component analysis. IEEE Transactions on Neural Networks, 10(3):626–634.

[Hyvarinen, 1999b] Hyvarinen, A. (1999b). Survey on independent component analysis. Neural Computing Surveys, 2:94–128.

[Hyvarinen et al., 2001] Hyvarinen, A., Karhunen, J., and Oja, E. (2001). Independent Component Analysis. John Wiley & Sons, Inc.

[Hyvarinen and Oja, 1997] Hyvarinen, A. and Oja, E. (1997). A fast fixed-point algorithm for independent component analysis. Neural Computation, 9:1483–1492.

[Jung et al., 2001] Jung, A., Theis, F. J., Puntonet, C. G., and Lang, E. W. (2001). FastGeo - a histogram based approach to linear geometric ICA. ICA 2001 - Proceedings, San Diego.

[Lee et al., 1999] Lee, T.-W., Girolami, M., and Sejnowski, T. (1999). Independent component analysis using an extended infomax algorithm for mixed sub-Gaussian and super-Gaussian sources. Neural Computation, 11:417–441.

[Papoulis, 1991] Papoulis, A. (1991). Probability, Random Variables, and Stochastic Processes. McGraw-Hill, 3rd edition.

[Puntonet and Prieto, 1998] Puntonet, C. G. and Prieto, A. (1998). Neural net approach for blind separation of sources based on geometric properties. Neurocomputing, 18:141–164.

[Theis et al., 2001] Theis, F. J., Jung, A., and Lang, E. W. (2001). A theoretic model for linear geometric ICA. ICA 2001 - Proceedings, San Diego.