

IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 45, NO. 6, JUNE 1997 1619

Modular Learning Strategy for Signal Detection in a Nonstationary Environment

Simon Haykin, Fellow, IEEE, and Tarun Kumar Bhattacharya

Abstract: In this paper, we describe a novel modular learning strategy for the detection of a target signal of interest in a nonstationary environment, which is motivated by the information preservation rule. The strategy makes no assumptions on the environment. It incorporates three functional blocks:

1) time-frequency analysis,
2) feature extraction,
3) pattern classification,

the delineations of which are guided by the information preservation rule. The time-frequency analysis, which is implemented using the Wigner-Ville distribution (WVD), transforms the incoming received signal into a time-frequency image that accounts for the time-varying nature of the received signal’s spectral content. This image provides a common input to a pair of channels, one of which is adaptively matched to the interference acting alone, and the other is adaptively matched to the target signal plus interference. Each channel of the receiver consists of a principal components analyzer (for feature extraction) followed by a multilayer perceptron (for feature classification), which are implemented using self-organized and supervised forms of learning in feedforward neural networks, respectively.

Experimental results based on real-life radar data are presented to demonstrate the superior performance of the new detection strategy over a conventional detector using constant false-alarm rate (CFAR) processing. The data used in the experiment pertain to an ocean environment, representing radar returns from small ice targets buried in sea clutter; they were collected with an instrument-quality coherent radar and properly ground truthed.

Manuscript received August 11, 1995; revised May 27, 1996. This work was supported by the Natural Sciences and Engineering Research Council of Canada. The associate editor coordinating the review of this paper and approving it for publication was Dr. Shigeru Katagiri.

S. Haykin is with the Communications Research Laboratory, McMaster University, Hamilton, Ont., Canada L8S 4K1 (e-mail: haykin@synapse.mcmaster.ca).

T. K. Bhattacharya is with Raytheon Canada Limited, Advanced Systems Development, Waterloo, Ont., Canada N2J 1K6 (e-mail: [email protected]).

Publisher Item Identifier S 1053-587X(97)04232-3.

I. INTRODUCTION

THIS PAPER addresses the problem of detecting a target signal corrupted by some form of interference, which is made difficult by the unknown statistics and nonstationary nature of the environment responsible for generating the received signal.

The classical solution to the problem of detection is to use a matched filter receiver. Specifically, the matched filter is designed to maximize the signal-to-interference ratio (SIR) at the receiver output. The design is straightforward when the interference is modeled as additive white Gaussian noise (AWGN). However, when the interference is nonstationary, the matched filter must take on a time-varying character of its own, making the receiver design more difficult. The design becomes even more difficult when the statistics of the interference are unknown.

Conceptually, a matched filter is so called because it is matched to the target signal characteristics. In an information-theoretic context, it may thus be said that a matched filter permits the target signal to pass through it with the minimum loss of information. This interpretation of the matched filter for the special case of a stationary background may be viewed as a manifestation of the information preservation rule, which is a rule of thumb rooted in information theory [1].

The information preservation rule may be stated as follows [2]:

In designing a receiver for a signal-processing task (e.g., target detection), the information content of the received signal should be optimally preserved in a statistical sense and efficiently used in a computational sense until the receiver is ready for final decision making.

The detection strategy described in this paper is indeed motivated by the information preservation rule in its most general setting.

Specifically, the receiver structure is made modular, encompassing three distinct functional blocks:

* time-frequency analysis,
* feature extraction,
* pattern classification.

In this way, the detection problem is transformed into a pattern recognition problem. Time-frequency (t-f) analysis is a well-developed technique [3]-[5]. In particular, Cohen’s class of t-f distributions, which were described originally in [6] in the context of quantum mechanics, has been applied to a variety of signal-processing problems¹ [7]-[13]. Time-frequency analysis also plays a key role in the echo-location system of a bat [14], [15], which provides a source of motivation for the detection strategy described in this paper. As for feature extraction, it is commonly used as a preprocessing stage to pattern recognition [16]. The idea of extracting features from a t-f image for solving the detection problem is not new [17]-[19]; in the studies reported therein, however, prior knowledge of the target model was assumed. The novelty of the modular learning strategy for signal detection in a nonstationary background described in this paper lies in what follows:

* a principled approach to the integration of the three functional blocks mentioned above, with no assumptions made on the environment, and the preservation of information contained in the received signal as the primary design objective,
* the formulation of a two-channel receiver, with one channel adaptively matched to the interference acting alone and with the other channel adaptively matched to the target signal plus interference,
* the use of a “learning to learn” procedure as the basis for designing the two channels of the receiver.

¹ Cohen [3] categorizes the applications of time-frequency distributions into three broad areas:

* to use the distribution to reveal more information about a signal than would be possible with a standard tool,
* to exploit a particular property of the distribution, which clearly and robustly represents the time-frequency content of a signal pertaining to that property,
* to use the distribution as a “carrier” of the full information content of a signal without regard for whether the distribution represents the true time-frequency energy density of the signal.

Our interest in the application of a time-frequency distribution falls under the last category.

1053-587X/97$10.00 © 1997 IEEE

The net result is a receiver that learns from its environment through the use of representative examples of the received signal, thereby accounting simultaneously for the properties of nonlinearity, nonstationarity, and non-Gaussianity that characterize many real-life signals. As such, the new receiver is capable of achieving a detection performance that is superior to the classical approach. A case study is presented in the paper that validates this statement.

The paper is organized as follows. Sections II-IV address the issues involved in the above-mentioned functional blocks of t-f analysis, feature extraction, and pattern classification. Section V describes the rationale for the modular detection strategy, and Section VI is devoted to its algorithmic considerations. The discussion up to this point is of a generic nature. In Section VII, we present a case study involving a coherent radar environment; results are presented comparing the new detection strategy to classical ones. The paper concludes in Section VIII with some final remarks.

II. TIME-FREQUENCY ANALYSIS

Nonstationary signals have time-varying spectral properties, mandating the use of some form of joint t-f analysis. The technique employed for this purpose should do two things: 1) bring out the nonstationary behavior of the signal in a discernible fashion and 2) allow the separation of multiple components contained in the signal. A possible approach is through the use of t-f distributions. Ideally, the aim is to approximate the elusive t-f energy density function $E_x(t, f)$, which is defined as the energy contained in a signal $x(t)$ within an infinitesimally small neighborhood around time $t$ and frequency $f$. We say “approximate” because of the disjoint nature of time-frequency concentration. That is, a signal cannot be concentrated in both time and frequency simultaneously by virtue of the uncertainty principle [3]. A thorough treatment of t-f distributions is beyond the scope of the present paper; a detailed exposition of the subject is presented in [3]-[5]. However, since this signal processing step is highly critical to the whole detection scheme described herein, we present a brief overview of its relevant characteristics.

A. The Wigner-Ville Distribution

Time-frequency distributions are usually classified into linear and nonlinear methods. The latter category includes the important subclass of bilinear t-f representations (BTFR), the formulation of which has been developed into a general framework by Cohen [3]. Specifically, the t-f distribution $C_x(t, f)$ for a signal $x(t)$ is said to exhibit the bilinear property if for every $(t, f) \in \mathbb{R}^2$ there exists a linear operator $O(t, f)$ such that

At first glance, the importance of choosing this subclass as the basis of our detection strategy may not appear that obvious. However, if we recall that for our specific application, we are seeking a behavior akin to the t-f energy density function, although not necessarily in its exactly true form, then we must try to satisfy the t-f localized counterpart of the global energy conservation principle. That is, for any pair of signals $u(t)$ and $v(t)$ and an arbitrary pair of constants $\alpha$ and $\beta$, we should strive to satisfy the condition

$$ E_{\alpha u + \beta v} = |\alpha|^2 E_u + |\beta|^2 E_v + \alpha \beta^* E_{u,v} + \beta \alpha^* E_{v,u} \tag{2} $$

where $E$ is a measure of energy in the t-f domain, and the asterisk denotes complex conjugation. Correspondingly, in terms of the t-f distribution $C_x(t, f)$, we want

$$ C_{\alpha u + \beta v}(t, f) = |\alpha|^2 C_u(t, f) + |\beta|^2 C_v(t, f) + \alpha \beta^* C_{u,v}(t, f) + \beta \alpha^* C_{v,u}(t, f) \tag{3} $$

which is just a statement of bilinearity imposed on $C_x(t, f)$. The first two terms on the right-hand side of (3) represent the auto terms, and the remaining two terms represent the cross terms.
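As a brief worked check of why (2) is the natural energy condition, the sketch below assumes the usual global definitions $E_u = \int |u(t)|^2 \, dt$ and $E_{u,v} = \int u(t) v^*(t) \, dt$; these definitions are an assumption made for illustration and are not spelled out in the text. Expanding the squared magnitude term by term gives

$$ E_{\alpha u + \beta v} = \int \left( \alpha u + \beta v \right) \left( \alpha u + \beta v \right)^* dt = |\alpha|^2 E_u + |\beta|^2 E_v + \alpha \beta^* E_{u,v} + \beta \alpha^* E_{v,u} $$

which has the same four-term structure as (2) and (3): two auto terms weighted by $|\alpha|^2$ and $|\beta|^2$, and two cross terms that are complex conjugates of each other.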

For additional justification for the choice of a BTFR, we may refer to [20] and [21]. In these two papers, Hlawatsch has considered the general class of BTFR’s and formulated the condition for a BTFR to be called regular.² The BTFR is said to be singular when it is not regular. It is only when the BTFR is regular that it is possible to recover the original signal from its BTFR to within a phase constant [20]. This is an important property since our primary design objective is to preserve the information content of the signal. It is now obvious that it is only by choosing a regular BTFR that the loss of information as a result of the transformation is minimized.

² Let $C_x(t, f)$ denote a bilinear t-f representation (BTFR) of a signal $x(t)$. According to Hlawatsch [20], [21], $C_x(t, f)$ is defined in terms of a bilinear signal representation (BSR) operator $u_C(t, f; t_1, t_2)$ as follows.

The BTFR $C_x(t, f)$ is said to be regular if a bounded inverse operator $u_C^{-1}(t, f; t_1, t_2)$ exists such that

where $\delta(t)$ is the Dirac delta function.

A particular BTFR that is regular is the Wigner-Ville distribution (WVD). Given a signal $x(t)$, its WVD is defined by [3]-[5]

$$ W_x(t, f) = \int_{-\infty}^{\infty} x\!\left(t + \frac{\tau}{2}\right) x^*\!\left(t - \frac{\tau}{2}\right) e^{-j 2 \pi f \tau} \, d\tau \tag{4} $$

where the lag variable $\tau$ plays the role of a dummy variable. It is important to note that any other BTFR derived from the WVD by smoothing in the t-f plane is singular, in which case, there is information loss due to the smoothing operation. This certainly points to the optimum information-preserving property of the standard WVD.
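To make the WVD concrete, the following is a minimal NumPy sketch of one common discretization, not the authors' implementation. The function name `wigner_ville`, the truncation of the lag range at the record edges, and the use of an $n$-point FFT are all assumptions chosen for illustration. Because the lag advances in steps of two samples, a tone at $f_0$ cycles per sample appears at bin $2 f_0 n$.

```python
import numpy as np

def wigner_ville(x):
    """Discrete WVD sketch: rows are frequency bins, columns are time instants."""
    x = np.asarray(x, dtype=complex)
    n = len(x)
    w = np.zeros((n, n))
    for l in range(n):
        # Largest lag that keeps both l + k and l - k inside the record.
        kmax = min(l, n - 1 - l)
        k = np.arange(-kmax, kmax + 1)
        r = np.zeros(n, dtype=complex)
        # Instantaneous autocorrelation x[l + k] * conj(x[l - k]), stored modulo n.
        r[k % n] = x[l + k] * np.conj(x[l - k])
        # r is conjugate-symmetric in k, so its DFT is real up to rounding.
        w[:, l] = np.fft.fft(r).real
    return w

# Usage: a complex tone at 0.125 cycles/sample, 64 samples long.
n = 64
tone = np.exp(2j * np.pi * 0.125 * np.arange(n))
image = wigner_ville(tone)
peak_bin = int(np.argmax(image[:, n // 2]))  # expected near 2 * 0.125 * 64
```

The image is real valued even though the input is complex, which matches the property of the WVD cited in the text. Summing any column over frequency returns $n |x[l]|^2$, a discrete counterpart of the time marginal.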

Although there are other BTFR’s (e.g., the Rihaczek and Page distributions) that are also regular, the WVD amongst them all is the only one that satisfies two additional properties that are highly desirable from a signal detection viewpoint [17]: The WVD is real valued for any complex-valued input signal, and it exhibits the least amount of spread in the t-f plane.

For these two reasons and, most importantly, because of the optimum information-preserving property of the WVD, we have chosen it as the tool for performing the t-f analysis that represents the first step in our modular detection strategy. A criticism that is often leveled against the WVD is the generation of cross terms or, more precisely, cross WVD’s, due to the combined presence of two (or more) signal components. Recognizing that in an interference-dominated environment the cross terms arise only when a target signal is present, it can be argued that the presence of cross terms is, in fact, an asset. We say so because they provide another feature that can enhance the visibility of the target signal in the t-f image resulting from the application of the WVD. Indeed, cross terms contribute to the optimal information-preserving property of the WVD in their own distinct way.
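The cross-term argument can be illustrated numerically. The sketch below uses a hypothetical helper `cross_wvd` (a discretized cross WVD with the same edge handling assumed throughout this example) to check the bilinearity condition (3) for two complex tones and to locate the cross term, which falls midway between the two auto terms in frequency. The tone frequencies and weights are arbitrary illustrative choices.

```python
import numpy as np

def cross_wvd(u, v):
    """Discrete cross-WVD sketch of u and v (complex valued in general)."""
    u = np.asarray(u, dtype=complex)
    v = np.asarray(v, dtype=complex)
    n = len(u)
    w = np.zeros((n, n), dtype=complex)
    for l in range(n):
        kmax = min(l, n - 1 - l)          # stay inside the record
        k = np.arange(-kmax, kmax + 1)
        r = np.zeros(n, dtype=complex)
        r[k % n] = u[l + k] * np.conj(v[l - k])
        w[:, l] = np.fft.fft(r)
    return w

n = 64
t = np.arange(n)
u = np.exp(2j * np.pi * 0.0625 * t)       # auto term at bin 8
v = np.exp(2j * np.pi * 0.1875 * t)       # auto term at bin 24
alpha, beta = 1.0, 0.5

# Left-hand side of (3): WVD of the weighted sum.
s = alpha * u + beta * v
w_sum = cross_wvd(s, s)

# Right-hand side of (3): two auto terms plus two cross terms.
auto = abs(alpha) ** 2 * cross_wvd(u, u) + abs(beta) ** 2 * cross_wvd(v, v)
cross = (alpha * np.conj(beta) * cross_wvd(u, v)
         + beta * np.conj(alpha) * cross_wvd(v, u))
expected = auto + cross

cross_bin = int(np.argmax(np.abs(cross[:, n // 2])))  # midway between bins 8 and 24
```

The cross term only exists when both components are present, which is the sense in which the text treats it as an additional feature of the target-plus-interference case.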

III. FEATURE EXTRACTION

Unfortunately, the use of the WVD leads to a profound increase in the dimensionality of the resulting t-f space, due to the generation of redundant information. For the detection strategy to be computationally efficient, this redundant information would have to be removed. One way in which this objective can be accomplished is to assume a model for the target signal, which is equivalent to the extraction of its auto term from the WVD image [22]. However, the resulting solution is suboptimal since the original signal cannot be recovered from the auto term alone. The complete WVD is required to do that.

With information preservation as the primary design objective, the preferred method is therefore to use some optimum form of dimensionality reduction on the WVD image. In this paper, we have opted for the use of principal components analysis (PCA) [16]. Basically, this operation involves performing an eigendecomposition on the covariance matrix of a data vector (in our case, a vector obtained by scanning the WVD image on a column-by-column basis with each column representing a temporal slice), arranging the eigenvalues in decreasing order, and retaining only those eigenvectors that are associated with the dominant eigenvalues. From here on, these particular eigenvectors are referred to as dominant eigenvectors. Given such a set of eigenvectors, the WVD image and, therefore, the original received signal, can be reconstructed with a minimum mean-squared error. In other words, information loss brought on by the extraction of significant features from the WVD by using PCA is kept to a minimum, which is precisely the goal of the information preservation rule.
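The PCA step described above can be sketched with a direct eigendecomposition. This is an illustration only: the function name `pca_basis`, the explicit mean removal, and the synthetic stand-in for a WVD image are assumptions. The example also checks the standard result that the mean-squared reconstruction error equals the sum of the discarded eigenvalues.

```python
import numpy as np

def pca_basis(columns, p):
    """Return the mean, the p dominant eigenvectors, and all eigenvalues (descending).

    columns: M-by-L array; each column is one temporal slice of the image.
    """
    mean = columns.mean(axis=1, keepdims=True)
    centered = columns - mean
    cov = centered @ centered.T / columns.shape[1]   # M-by-M sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)           # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]                # rearrange to decreasing order
    return mean, eigvecs[:, order[:p]], eigvals[order]

# Synthetic stand-in for a WVD image: low-rank structure plus small noise.
rng = np.random.default_rng(0)
m, length, p = 16, 200, 3
image = (rng.standard_normal((m, 3)) @ rng.standard_normal((3, length))
         + 0.05 * rng.standard_normal((m, length)))

mean, basis, eigvals = pca_basis(image, p)
features = basis.T @ (image - mean)                  # p-by-L feature vectors
reconstruction = mean + basis @ features
mse = np.mean(np.sum((image - reconstruction) ** 2, axis=0))
```

Here `features` plays the role of the reduced-dimension input to the classifier, and `mse` quantifies what is lost by keeping only the dominant eigenvectors.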

Fig. 1 shows a block diagram of the modular detection scheme. It has two channels, each consisting of two functional blocks, one for feature extraction and the other for pattern classification. In this diagram, we show principal components analyzers for the feature extraction; other approaches to feature extraction are, of course, possible. The important point to note here is that PCA(0) in the left-hand channel, which is termed the interference channel, is trained by presenting it with WVD images of input data known to contain different realizations of the interference on its own. Once this training is completed, the free parameters of PCA(0) are fixed thereafter. We may, therefore, speak of PCA(0) as being adaptively “matched to the interference acting alone.” The training of PCA(1) in the right-hand channel, which is termed the target channel, follows a similar procedure, except for the fact that its training examples consist of WVD images of input data known to contain different realizations of the target signal plus interference. Thus, we may speak of PCA(1) as being adaptively “matched to the target signal plus interference.”

The design of both PCA’s is accomplished by using a self-organized learning procedure, the details of which are presented later in the paper.

IV. PATTERN CLASSIFICATION

The final task is that of pattern classification, which is required to distinguish between the following two hypotheses (classes) in an optimum statistical sense:

* the null hypothesis $H_0$, under which the received signal consists of interference only,
* the other hypothesis $H_1$, under which the received signal consists of a target signal plus interference.

In other words, we have a binary hypothesis testing problem on our hands.

In the modular learning strategy of Fig. 1, this problem is tackled as follows. The two sets of features, extracted by PCA(0) and PCA(1) from the WVD image of a received signal, are applied to a corresponding pair of multilayer perceptrons: MLP(0) and MLP(1). Their use is motivated by the fact that an MLP is able to construct arbitrary decision boundaries between the two classes of interest $H_0$ and $H_1$. The resulting “analog” outputs of the two MLP’s are then linearly combined to produce an overall output denoted by $z$. A final decision is made by comparing $z$ against a preset threshold denoted by $\lambda$. As indicated in Fig. 1, if the threshold is exceeded, a decision is made that a target signal is present (i.e., hypothesis $H_1$ is true); otherwise, a decision is made that there is no target signal present (i.e., hypothesis $H_0$ is true).

Fig. 1. Block diagram of the two-channel receiver. The complex-valued received signal vector x is applied to the Wigner-Ville distribution (WVD) computer, whose real-valued output W feeds the interference channel (matched to the WVD of interference) and the target channel, each consisting of feature extraction followed by pattern classification. The comparator decides that the target signal is present (hypothesis $H_1$) if $|z| > \lambda$ and that it is not present (hypothesis $H_0$) if $|z| < \lambda$.

In such a signal detection scenario, the following two decisions are of particular concern:

* Missed Detection: The receiver decides in favor of hypothesis $H_0$ when $H_1$ is true.
* False Alarm: The receiver decides in favor of hypothesis $H_1$ when $H_0$ is true.

Ordinarily, it is difficult to assign costs to these two wrong decisions. Accordingly, the customary practice is to follow the Neyman-Pearson criterion, in which the probability of detection (i.e., saying $H_1$ when $H_1$ is true) is maximized, subject to a constant probability of false alarm [23].
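One practical reading of the constant false-alarm constraint is to set the threshold from receiver outputs recorded under $H_0$. The sketch below is an assumption-laden illustration rather than the procedure used in the paper: the helper names are hypothetical, the outputs $z$ are synthetic Gaussian samples rather than receiver outputs, and the threshold is simply an empirical quantile of $|z|$ under interference alone.

```python
import numpy as np

def threshold_for_pfa(z_interference, pfa):
    """Threshold exceeded by roughly a fraction pfa of interference-only outputs."""
    return np.quantile(np.abs(z_interference), 1.0 - pfa)

def target_present(z, lam):
    """Comparator: declare H1 when |z| exceeds the threshold."""
    return np.abs(z) > lam

# Synthetic combiner outputs (illustrative stand-ins, not real data).
rng = np.random.default_rng(1)
z_h0 = rng.normal(0.0, 1.0, 10000)   # interference only
z_h1 = rng.normal(3.0, 1.0, 10000)   # target signal plus interference

lam = threshold_for_pfa(z_h0, pfa=0.01)
pfa_hat = target_present(z_h0, lam).mean()   # empirical false-alarm rate
pd_hat = target_present(z_h1, lam).mean()    # empirical detection rate
```

With the false-alarm rate pinned this way, competing receivers can be compared on detection rate alone, which is the comparison the Neyman-Pearson criterion calls for.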

The design of the two MLP’s and linear combiner is accomplished by using a supervised learning procedure, details of which are presented later in the paper.

V. RATIONALE FOR THE MODULAR DETECTION STRATEGY

In mathematical terms, the two-channel receiver of Fig. 1 (i.e., the combination of WVD computer, PCA’s, and MLP’s) maps the multidimensional input (data) space into a $2N$-dimensional output (target) space, where $N$ is the number of output nodes per channel. In the input space, there is a precise separation between the two classes of received signal $H_0$ and $H_1$, which is determined by monitoring the environment. However, in mapping the input space onto the output space, this precise separation between the classes $H_0$ and $H_1$ is learned in an imprecise fashion, with the result that decision (classification) errors are made at the final receiver output.

The exact details of the mapping from the input space to the output space depend, among other things, on the choice of $N$, which is the number of output nodes per channel, as will be explained next.

With binary hypothesis testing as the ultimate objective, an obvious choice for the number of output nodes per channel is $N = 2$. Under such a scheme, each channel of the receiver can only discern one of two possible outcomes. That is, given the received signal, each channel can only say that hypothesis $H_0$ or $H_1$ is true, depending on which particular output is higher. This “restricted” form of pattern classification results in a region of uncertainty, where the points representing classes $H_0$ and $H_1$ in the output space may overlap significantly.

In a real-life situation, however, we usually find that the target signal varies over a wide dynamic range. Specifically, a qualitative description of the received signal may fall under one of three likely categories:

* The received signal consists of a strong target signal plus interference.
* The received signal consists of a weak target signal plus interference.
* The received signal consists solely of interference.

This suggests the use of $N = 3$ output nodes per channel. This alternative scheme permits a “finer” form of pattern classification with some beneficial effects: a more compact clustering of the points representing classes $H_0$ and $H_1$ in the output space and a reduced overlap between them. This, in turn, means that the two-channel receiver of Fig. 1 with three output nodes per channel has the potential to outperform the same receiver with two output nodes per channel.³ The experimental results presented in Section VII bear out the validity of this statement. Indeed, it is for this reason that we have used $N = 3$ in Fig. 1.

Another question that needs to be addressed is why the receiver of Fig. 1 has two distinct channels in the first place. To answer this question, we note that in the traditional approach to radar target detection in a clutter-dominated environment, for example, we may use a “best” mismatched filter for clutter discrimination [25]. In such an approach involving a single channel in the receiver, the requirement for best performance in additive noise is traded for an improvement in performance in clutter by purposely mismatching the filter. We may avoid the need for this tradeoff in performance by using two nonlinear matched filters, as depicted in Fig. 1, with each filter being adaptively matched to the received signal that arises under one of two hypotheses: $H_0$ or $H_1$. In addition, the use of two different channels as described herein provides two independent assessments of the received signal, exploiting the fact that the WVD images of the received signal under hypotheses $H_0$ and $H_1$ look different even when the signal-to-interference ratio is relatively low. The simplest method of integrating the two channel outputs is through the use of linear combining, which is applicable to a wide class of optimization costs [26]. This is precisely what has been done in designing the modular learning strategy of Fig. 1.

³ In a loose sense, the arguments presented here, suggesting that the receiver of Fig. 1 with $N = 3$ can produce a better classification accuracy than the same receiver with $N = 2$, remind one of the issue of soft-decision coding versus hard-decision coding in digital communications [24]. In hard-decision coding, binary quantization is applied to the demodulator output, resulting in an irreversible loss of information in the receiver. To reduce this loss, multilevel quantization (as an approximation to soft-decision coding) is used.

In Fig. 1, we may go on and increase the number of output nodes per channel beyond $N = 3$ to provide a finer description of the target strength and, therefore, better pattern classification. However, it is considered that $N = 3$ is the best compromise between improved classification performance and increased computational complexity.

The training of the two MLP’s and linear combiner in Fig. 1 proceeds in a supervised manner. To do this, we may use one of two methods:

1) Each of the two MLP’s is trained separately, with hard decisions being made at their respective outputs. For example, in the case of three output nodes per channel, the MLP in the target channel is constrained to classify the received signal as containing a strong target signal, containing a weak target signal, or simply consisting of interference on its own. The “digital” outputs of the two MLP’s are then linearly combined to produce an overall output, where the final decision is made whether a target is present or not.
2) The two MLP’s and linear combiner are all trained simultaneously. The outputs of each MLP are now free to assume “analog” values within the limited range set by the activation functions of its output neurons (processing units) and in accordance with the training data. Under this second method, hard decision making is deferred to the final output of the receiver.

The attractive feature of the first method is that the decision boundaries between the different classes of received signal are well defined at the outputs of the two channels. Nevertheless, in the study reported in this paper, we opted for the second method for two important reasons:

1) Hard decisions are accompanied by an irreversible loss of information. In light of the information preservation rule saying that decision making should be deferred to the very final output of the receiver, it may therefore be argued that the second method preserves the information content of the received signal better than the first method.
2) The second method requires a single stage of supervised learning, which is perhaps computationally more efficient than the two different stages of supervised learning needed to implement the first method.

To summarize, the receiver of Fig. 1 “learns to learn” about its environment by proceeding as follows. First, the two PCA networks are individually trained on their respective examples of received signal, using a self-organized learning algorithm. Next, the two MLP’s and linear combiner are simultaneously trained on a fresh set of examples using a supervised learning algorithm.

VI. TRAINING ALGORITHMS

Computation of the WVD, which is assumed to be of size M-by-L, is based on its discrete version written as

$$\cdots, \qquad l = 1, 2, \ldots, L, \quad m = 1, 2, \ldots, M \tag{5}$$

where $x[l]$ denotes a sample of the received signal $x(t)$ at time $t = lT$. The sampling intervals along the time and frequency axes of the WVD image are denoted by $T$ and $F$, respectively. The WVD image is scanned on a column-by-column basis, with each column referring to a particular time instant. Let the M-by-1 vector $\mathbf{u}_l$ denote the $l$th column of the WVD image. By scanning the WVD image, we thus construct a time series of vectors $\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_L\}$, which constitutes one epoch of examples to be used for performing PCA on the WVD image. Note that even though the input $x[l]$ is originally complex valued, the vector $\mathbf{u}_l$ is real valued, which means that all subsequent processing performed on $\mathbf{u}_l$ can be done with real parameters.
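As an illustration only, the following sketch computes a discrete WVD image with the commonly used lag-product form, $2\sum_k x[l+k]\,x^*[l-k]$ followed by a DFT over the lag $k$ (the constant factor is dropped). This form is an assumption and may differ in scaling and windowing from the discrete version used in the paper. The function name `discrete_wvd` and its parameters are hypothetical. The sketch shows why each column $\mathbf{u}_l$ is real valued even for a complex input: the lag product is conjugate symmetric in $k$.

```python
import numpy as np

def discrete_wvd(x, n_freq=None):
    """Real-valued WVD image of shape (n_freq, len(x)); column l plays the role of u_l."""
    x = np.asarray(x, dtype=complex)
    n_time = len(x)
    n_freq = n_time if n_freq is None else n_freq
    image = np.empty((n_freq, n_time))
    for l in range(n_time):
        # Largest lag that keeps both x[l + k] and x[l - k] inside the record.
        k_max = min(l, n_time - 1 - l, n_freq // 2 - 1)
        k = np.arange(-k_max, k_max + 1)
        kernel = np.zeros(n_freq, dtype=complex)
        kernel[k % n_freq] = x[l + k] * np.conj(x[l - k])
        # The kernel is conjugate symmetric in k, so its DFT is real.
        image[:, l] = np.fft.fft(kernel).real
    return image

# A complex tone at 0.125 cycles/sample; the lag doubling maps it to bin 2 * 0.125 * 64 = 16.
n = np.arange(64)
tone = np.exp(2j * np.pi * 0.125 * n)
image = discrete_wvd(tone)
print(image.shape)                  # (64, 64)
print(int(np.argmax(image[:, 32]))) # 16
```

Scanning `image` column by column then yields the sequence of real vectors that is fed to the PCA network.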

The PCA itself is performed by means of the generalized Hebbian algorithm (GHA) [27]. Let $q_{ij}$ denote the synaptic weight of a linear neuron $i$ connected to a source node $j$, where $j = 1, 2, \ldots, M$, and $i = 1, 2, \ldots, p$, with $p < M$. Then, the synaptic weight $q_{ij}$ is updated according to the rule

$$q_{ij}(n+1) = q_{ij}(n) + \eta\, y_i(n) \left[ u_j(n) - \sum_{k=1}^{i} q_{kj}(n)\, y_k(n) \right] \tag{6}$$

where

$u_j(n)$   $j$th component of an input vector $\mathbf{u}(n)$,
$y_i(n)$   output of the $i$th neuron,
$\eta$   learning-rate parameter,
$n$   iteration step.

For a sufficiently small $\eta$ and large enough $n$, the algorithm converges to a steady-state condition, in which the M-by-1 synaptic weight vectors $\mathbf{q}_1, \mathbf{q}_2, \ldots, \mathbf{q}_p$ computed by the GHA converge to the eigenvectors associated with the $p$ largest eigenvalues of the covariance matrix of the input vector $\mathbf{u}$ [27].
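The convergence behavior can be illustrated with a minimal sketch of the GHA in its usual matrix form (Sanger's rule), $\Delta \mathbf{Q} = \eta\,(\mathbf{y}\mathbf{u}^T - \mathrm{LT}[\mathbf{y}\mathbf{y}^T]\,\mathbf{Q})$, where $\mathrm{LT}[\cdot]$ keeps the lower triangle. The function name `gha_train`, the learning rate, and the synthetic Gaussian data are assumptions made for the example; they are not the settings used in the study. Note that the array below stores one weight vector $\mathbf{q}_i$ per row.

```python
import numpy as np

def gha_train(columns, p, eta=0.002, n_epochs=1, seed=0):
    """Estimate the p dominant eigenvectors of the input covariance with the GHA.

    columns: array of shape (n_samples, M), one input vector u per row.
    Returns an array of shape (p, M); row i holds the weight vector q_i.
    """
    u_all = np.asarray(columns, dtype=float)
    rng = np.random.default_rng(seed)
    q = 0.01 * rng.standard_normal((p, u_all.shape[1]))  # small random start
    for _ in range(n_epochs):
        for u in u_all:
            y = q @ u  # outputs of the p linear neurons
            # Hebbian term minus the lower-triangular decorrelating term.
            q += eta * (np.outer(y, u) - np.tril(np.outer(y, y)) @ q)
    return q

# Synthetic zero-mean data whose covariance has eigenvectors along the coordinate axes.
rng = np.random.default_rng(1)
data = rng.standard_normal((20000, 4)) * np.array([3.0, 2.0, 1.0, 0.5])
q = gha_train(data, p=2)
alignment = [abs(q[i, i]) / np.linalg.norm(q[i]) for i in range(2)]
print(alignment)  # both values should be close to 1
```

In this toy setting the first two rows of `q` line up with the two largest-variance axes, which is the steady-state behavior described in the text.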

The GHA is used to design both PCA networks of the receiver. Specifically, for PCA(0) in the interference channel, a training set of $A_0$ epochs is constructed, with each epoch representing the discrete WVD image of a particular realization of the received signal for which the null hypothesis $H_0$ is known to be true. An image is picked at random from this set, and its column vectors $\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_L$ are fed into the GHA one at a time and in that order. When the processing of that image is completed, it is put back in the training set. The images are then shuffled to randomize their arrangement. Next, another image is picked at random from the set, and the procedure is repeated once more. This training process is continued until the PCA(0) network reaches a steady state with no further noticeable changes in its weights, at which point its training is terminated. The training of the PCA(1) network proceeds in exactly the same fashion, except that this time, the training set consists of $A_1$ epochs, with each epoch representing the discrete WVD image of a particular realization of the received signal for which the other hypothesis $H_1$ is known to be true.

Let the M-by-L matrix $\mathbf{W}$ denote the discrete WVD image of a received signal not seen previously by these two PCA networks. Then, with

$$\mathbf{Q} = [\mathbf{q}_1, \mathbf{q}_2, \ldots, \mathbf{q}_p] \tag{7}$$

the p-by-L matrix of outputs $\mathbf{Y}$ produced by the PCA network is defined by

$$\mathbf{Y} = \mathbf{Q}^T \mathbf{W} \tag{8}$$

where the superscript $T$ denotes transposition. Typically, the dimension $p$ is much smaller than $M$, thereby realizing the desired dimensionality reduction. In any event, the $(i, j)$th element of matrix $\mathbf{Y}$ represents the projection of the $j$th column of matrix $\mathbf{W}$ onto the $i$th dominant eigenvector $\mathbf{q}_i$. The matrix $\mathbf{Y}$ is applied by the PCA network to the 2-D input layer of the associated MLP.
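The dimensionality reduction in (8) amounts to a single matrix product. The sketch below uses random stand-ins for both the eigenvector matrix and the WVD image (a trained PCA network and a real image would supply these), so only the shapes and the element-wise interpretation are meaningful. The helper name `project_wvd` is hypothetical.

```python
import numpy as np

def project_wvd(q, w):
    """Return Y = Q^T W for an (M, p) basis Q and an (M, L) image W."""
    q = np.asarray(q, dtype=float)
    w = np.asarray(w, dtype=float)
    if q.shape[0] != w.shape[0]:
        raise ValueError("Q and W must have the same number of rows (M)")
    return q.T @ w

rng = np.random.default_rng(0)
M, L, p = 256, 256, 5
# Stand-in orthonormal columns; a trained PCA network would supply these.
Q, _ = np.linalg.qr(rng.standard_normal((M, p)))
W = rng.standard_normal((M, L))  # placeholder for a real WVD image
Y = project_wvd(Q, W)
print(Y.shape)  # (5, 256)
```

With M = 256 and p = 5, each 256-element column of the image is reduced to five numbers.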

As pointed out earlier, the two MLP’s (in the interference and target channels) and linear combiner are all trained simultaneously in a supervised manner. To do this training, we use a fresh set of examples of the received signal not seen previously by the two PCA networks. This new set is made up of $B_0$ examples of the received signal for which hypothesis $H_0$ is known to be true and $B_1$ examples of the received signal for which hypothesis $H_1$ is known to be true; each example is L time samples long. The supervised training of the two MLP’s and linear combiner, which are treated as one entity, is accomplished by means of the well-known back-propagation (BP) algorithm [28], [29]. The batch mode of the algorithm is employed to do the training, as described here. A batch of examples (representing a mixture of hypotheses $H_0$ and $H_1$) is picked from the training set completely at random and presented to the WVD computer one example at a time. For hypothesis $H_0$, the desired response at the receiver output is set equal to 0.1, and for hypothesis $H_1$, it is set equal to 0.9. On the presentation of each example, the error signal (i.e., the difference between the desired response and actual receiver output) is recorded. When the processing of the whole batch of examples is completed, adjustments to the synaptic weights of the two MLP’s and linear combiner are computed in accordance with the BP algorithm. Then, another batch of examples of the received signal, picked at random from the training set, is presented to the WVD computer, and the whole sequence of computations is repeated. This procedure is continued until the two MLP’s and linear combiner reach a steady state with no further noticeable changes to their synaptic weights.
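The batch-mode bookkeeping described above (record the error for every example, then adjust the weights once per batch, with desired responses of 0.1 and 0.9) can be sketched on a deliberately tiny model. The example below trains a single logistic neuron, which stands in for the two MLP’s and linear combiner; it is not the receiver's network, and the function name, learning rate, and toy data are assumptions.

```python
import math

def logistic(v):
    return 1.0 / (1.0 + math.exp(-v))

def train_batch_mode(examples, eta=1.0, epochs=5000):
    """examples: list of (feature, label) pairs, label 0 for H0 and 1 for H1."""
    w, b = 0.0, 0.0
    targets = {0: 0.1, 1: 0.9}  # desired responses for H0 and H1
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, label in examples:
            y = logistic(w * x + b)
            err = targets[label] - y      # error signal for this example
            delta = err * y * (1.0 - y)   # error times the logistic slope
            grad_w += delta * x
            grad_b += delta
        # Weights are adjusted only after the whole batch has been presented.
        w += eta * grad_w / len(examples)
        b += eta * grad_b / len(examples)
    return w, b

examples = [(-1.0, 0), (1.0, 1)]
w, b = train_batch_mode(examples)
print(round(logistic(-w + b), 3), round(logistic(w + b), 3))  # 0.1 0.9
```

Targets of 0.1 and 0.9, rather than 0 and 1, keep the logistic outputs away from saturation, so the weights settle at finite values.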

TABLE I

Radar parameters on transmission:
    radio frequency: 9.39 GHz
    pulse repetition frequency: 1 kHz
    pulse duration: 200 ns

Analog-to-digital conversion at the receiver:
    sampling rate: 30 MHz
    wordlength: 8 bits

Environmental variables:

Purpose     Significant wave    (heading not     Max. wave     Peak period
            height (m)          recoverable)     height (m)    (s)
Training    1.59                15               2.6           11.11
Training    1.57                8                2.4           11.11
Training    1.47                18               2.34          11.11
Training    1.46                20               2.27          11.11
Test        1.47                18               2.34          11.11
Test        1.47                21               2.5           11.35
Test        1.46                20               2.27          11.11
Test        2.38                23               3.24          7.69
Test        2.6                 20               3.71          9.09
Test        2.23                8                3.47          10.00
Test        2.41                20               3.42          8.00
Test        2.67                18               3.85          7.69
Test        2.42                12               3.61          8.00

Once the training of the entire receiver is completed, its parameters are all fixed, whereupon the receiver is ready for testing with different realizations of the received signal not seen previously by the receiver; that is, the test data are completely independent of both the input data used to train the PCA networks and those used to train the MLP’s. The test data consist of $C_0$ examples of the received signal for which hypothesis $H_0$ is known to be true and $C_1$ examples of the received signal for which hypothesis $H_1$ is known to be true. As before, each example is L time samples long. Then, using these two subsets of test data, we may calculate the conditional probability density functions $f(z \mid H_0)$ and $f(z \mid H_1)$, given that hypotheses $H_0$ and $H_1$ are true, respectively; the receiver output $z$ may assume positive as well as negative values. The probability of false alarm $P_{FA}$ is defined by

$$P_{FA} = \int_{\lambda}^{\infty} f(z \mid H_0)\, dz \tag{9}$$

where $\lambda$ is the threshold. Using (9), we may determine the value of $\lambda$ that results in a prescribed false alarm rate. Then, knowing $\lambda$, we may calculate the corresponding probability of detection $P_D$ using the definition

$$P_D = \int_{\lambda}^{\infty} f(z \mid H_1)\, dz \tag{10}$$

The probabilities $P_D$ and $P_{FA}$ thus provide a quantitative measure for the detection performance of the receiver.
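In practice, the two integrals are estimated from the recorded receiver outputs. The sketch below is one simple empirical procedure, not necessarily the one used in the study: it sets the threshold from the ordered $H_0$ outputs so that the fraction exceeding it does not surpass the prescribed false-alarm rate, and then counts the fraction of $H_1$ outputs above that threshold. The function names and the synthetic outputs are hypothetical.

```python
import math

def threshold_for_pfa(z_h0, pfa):
    """Threshold whose empirical false-alarm rate on z_h0 does not exceed pfa."""
    ordered = sorted(z_h0, reverse=True)
    # Number of H0 outputs that may lie strictly above the threshold.
    n_allowed = math.floor(pfa * len(ordered) + 1e-9)
    if n_allowed >= len(ordered):
        return float("-inf")
    return ordered[n_allowed]

def exceedance_rate(z, lam):
    """Fraction of outputs strictly above the threshold lam."""
    return sum(v > lam for v in z) / len(z)

# Synthetic receiver outputs, for illustration only.
z_h0 = [i / 1000 for i in range(1000)]        # outputs under H0
z_h1 = [0.5 + i / 1000 for i in range(1000)]  # outputs under H1
lam = threshold_for_pfa(z_h0, 0.01)
p_fa = exceedance_rate(z_h0, lam)
p_d = exceedance_rate(z_h1, lam)
print(lam, p_fa)
```

Sweeping the prescribed false-alarm rate and repeating the two steps traces out an empirical receiver operating characteristic.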

VII. CASE STUDY: RADAR TARGET DETECTION OF A SMALL TARGET IN SEA CLUTTER

The two-channel receiver of Fig. 1 defies a statistical analysis of detection performance along traditional lines due to its nonlinear nature. Therefore, to evaluate the practical merit of this new receiver, we performed a case study involving the detection of a growler floating in an ocean environment. A growler is a small piece of ice that is broken off an iceberg. The above-surface visible portion of it is about the size of a grand piano (i.e., a radar cross-section of about 1 m$^2$). However, recognizing that about 90% of the volume of ice lies below the water surface, a growler represents an object large enough to be hazardous to navigation in ice-infested waters, such as those encountered on the East Coast of Canada during the Spring and early Summer. The radar task at hand is that of detecting the radar echo from a growler in the presence of interference represented by sea clutter.

For the collection of radar data representative of this environment, an instrument-quality radar system called the IPIX radar was used. The IPIX radar [30] is a fully coherent, polarimetric, X-band radar system equipped with computer control and digital data acquisition capability. The present study is confined to the use of coherent data collected under the polarimetric condition of horizontal transmit and horizontal receive only. The radar was operating in a staring mode (i.e., pointing onto a patch of the ocean surface). A series of experiments using the IPIX radar were performed at a site located on the East Coast of Canada. The radar was mounted at a height above sea level that would be representative of a ship-mounted radar. Ground truthing of the data collected was maintained throughout the experiments, thereby providing knowledge of the conditions under which the various datasets were collected. A summary of the radar parameters and the environmental conditions under which the radar data were collected is presented in Table I.

Fig. 2. (a) WVD for a clearly visible growler. (b) WVD for a barely visible growler. (c) WVD for sea clutter.

This case study was chosen for the application at hand because both the target of interest (a growler) and the background interference (sea clutter) are known to exhibit nonstationarity, which would require the use of adaptivity. Moreover, the generation of sea clutter is governed by a nonlinear dynamical process, which would therefore require the use of nonlinear processing. Thus, the detection of a growler in sea clutter provides a suitable medium for testing the capability of the new detection strategy.

A. Experimental Results

To appreciate the importance of the WVD for the radar detection problem at hand, we present sample WVD images of real radar returns representing three different situations:

• strong radar returns from a large growler, which is shown in Fig. 2(a),
• relatively weak radar returns from a small growler, which is shown in Fig. 2(b),
• sea clutter alone, which is shown in Fig. 2(c).

Each figure also includes plots of the actual time series and its power spectrum, which are shown along the horizontal and vertical axes, respectively. From these figures, we see that the WVD image presents a much clearer picture about the presence or absence of a target than either the time series or the power spectrum viewed alone. In particular, the WVD images in Fig. 2(a) and (b) exhibit a zebra-like pattern alternating between black and white narrow stripes, which occupy an area located between the instantaneous frequency plot of the target (growler) centered around 0 Hz and that of the clutter. This pattern is indeed a manifestation of the cross WVD’s. The important point to note is that the presence of the zebra-like pattern is found to be 1) quite pronounced at relatively low target signal-to-clutter ratios and 2) relatively robust to variations in the target signal-to-clutter ratio. To verify this assertion, a computer simulation was performed in which the target signal (modeled as a chirp that is typical of the radar return from a growler) was artificially added to radar returns representing sea clutter alone. By varying the signal energy and, thus, the target signal-to-clutter ratio, the cross-terms were found to be the most distinguishing features of the WVD image, even at a signal-to-clutter ratio as low as -20 dB. This remarkable result confirms what we said earlier: The generation of cross WVD’s due to the presence of multicomponents in the received signal is helpful to the radar detection task.
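The scaling step in such a simulation can be sketched as follows. The example is only a stand-in for the experiment described above: complex Gaussian noise replaces the real sea clutter, and the chirp rate, sampling rate, and the helper name `add_target_at_scr` are hypothetical. It shows how a target of arbitrary amplitude is scaled so that the ratio of mean target power to mean clutter power equals a prescribed value in decibels.

```python
import numpy as np

def add_target_at_scr(clutter, target, scr_db):
    """Scale `target` to the requested signal-to-clutter ratio and add it to `clutter`."""
    clutter = np.asarray(clutter, dtype=complex)
    target = np.asarray(target, dtype=complex)
    p_clutter = np.mean(np.abs(clutter) ** 2)
    p_target = np.mean(np.abs(target) ** 2)
    gain = np.sqrt(p_clutter / p_target * 10.0 ** (scr_db / 10.0))
    return clutter + gain * target, gain

rng = np.random.default_rng(0)
n = 256
clutter = rng.standard_normal(n) + 1j * rng.standard_normal(n)  # stand-in for sea clutter
t = np.arange(n) / 1000.0                                       # assumed 1 kHz sampling
chirp = np.exp(1j * np.pi * 50.0 * t ** 2)                      # hypothetical chirp rate
mixed, gain = add_target_at_scr(clutter, chirp, scr_db=-20.0)
achieved = 10 * np.log10(np.mean(np.abs(gain * chirp) ** 2) / np.mean(np.abs(clutter) ** 2))
print(round(float(achieved), 6))  # -20.0
```

Repeating the call over a range of `scr_db` values and computing the WVD of each `mixed` record reproduces the kind of sweep described in the text.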

Fig. 3 shows a display of the two most dominant eigenvectors convolved with a sample WVD image containing both sea clutter and growler. Two interesting observations can be made from Fig. 3:

• The growler does not appear to produce a response in the clutter (interference) subspace, and the clutter is markedly suppressed in the projection onto the second dominant eigenvector in the growler (target) subspace.
• The two projections onto the growler (target) subspace retain the significant structure of the cross-terms, as evidenced by the presence of black and white (zebra-like) stripes.

These two observations clearly show that the two PCA networks do indeed pick out the relevant features, and the cross-terms generated due to the bilinear nature of the WVD are important manifestations of those features.

Earlier, we pointed out that the modular learning strategy of Fig. 1 works better with three output nodes per channel than two output nodes per channel. To validate this assertion experimentally, the analog outputs of the MLP’s in both channels (i.e., four outputs in one series of tests, and six outputs in another series of tests) were computed for an independent set of test data containing examples from both classes $H_0$ and $H_1$. Unfortunately, it is difficult to display the information contained in the 4-D and 6-D output spaces resulting from these tests. To get around this problem, principal components analysis was performed on the covariance matrix of the vector of MLP outputs, and only the eigenvectors associated with the first three dominant eigenvalues were retained.⁴ Thus, the vector of MLP outputs could be projected onto a 3-D space represented by these three dominant eigenvectors. The results obtained are shown in Figs. 4-9.
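This visualization step can be sketched with a direct eigendecomposition of the sample covariance matrix. The data below are synthetic six-dimensional vectors standing in for the MLP outputs, and the function name `project_onto_top3` is hypothetical; whether the original plots were produced from mean-removed outputs is not stated, so the sketch projects the raw vectors.

```python
import numpy as np

def project_onto_top3(outputs):
    """Project rows of `outputs` (n_examples, n_outputs) onto the three
    dominant eigenvectors of their sample covariance matrix."""
    x = np.asarray(outputs, dtype=float)
    cov = np.cov(x, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:3]   # indices of the three largest
    return x @ eigvecs[:, order]

rng = np.random.default_rng(0)
# Stand-in for six analog MLP outputs (three per channel) over 200 test examples.
outputs = rng.standard_normal((200, 6)) * np.array([3.0, 2.0, 1.5, 1.0, 0.5, 0.1])
proj = project_onto_top3(outputs)
print(proj.shape)  # (200, 3)
```

Pairs of columns of `proj` then give 2-D scatter plots of the kind labeled 2 : 1, 1 : 3, and 3 : 2 in the figures. The sign of each eigenvector is arbitrary, so an axis may appear mirrored.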

Figs. 4-6 were obtained for the case of a receiver using two output nodes per channel. In particular, Fig. 4(a) and (b) show the 2 : 1 projections (i.e., the projections on a plane defined by the second and first dominant eigenvectors) for two classes of input data: $H_0$ (clutter) and $H_1$ (growler plus clutter), respectively. Three distinct areas may be identified in Fig. 4(b) for which hypothesis $H_1$ is true:

• An area of uncertainty (shown shaded), where decision errors are likely to be made. In this region, the growler is barely visible such that the received signal is almost indistinguishable from a received signal consisting of clutter only (hypothesis $H_0$).
• An area to the right of the uncertainty region, where the target signal is strong and the probability of detection is therefore high.
• An area to the left of the uncertainty region, where the growler is simply invisible (being hidden behind an ocean wave) and nothing can be done to detect it; in such an event, the received signal is, for all practical purposes, the same as sea clutter.

⁴ The principal components analysis performed here has nothing to do with the design of the receiver. Rather, it was used here as a graphical tool to gain insight into its behavior.

Fig. 3. Convolution of the two most dominant eigenvectors of each channel of the receiver with a sample WVD of growler and clutter. For the time and frequency scales of each WVD image, see Fig. 2. (Panel labels: first and second principal vectors; growler subspace and clutter subspace.)

Fig. 5(a) and (b) show the 1 : 3 projections (i.e., the projections on a plane defined by the first and third dominant eigenvectors) for two sets of input data: $H_1$ (growler plus clutter) and $H_0$ (clutter), respectively. Once again, shading has been used in Fig. 5(a) to mark the area of uncertainty. However, in order to explain the separation between classes in the 3-D space (formed by the three dominant eigenvectors), a lighter shading is used to mark the area of uncertainty that was obtained in Fig. 4(b). The darker shade in Fig. 5(a) marks the region of uncertainty in the plane of this figure. It is evident that the region of uncertainty still contains a large number of samples that may lead to erroneous decisions. In other words, no significant separation between the two classes $H_0$ and $H_1$ has been achieved. This is also clearly seen in Fig. 6(a) and (b) pertaining to the 3 : 2 projections (i.e., the projections on a plane defined by the third and second dominant eigenvectors). The conclusion to be drawn from Figs. 4-6 is as follows.

Conclusion 1: The use of two output nodes per channel is unable to provide a significant separation between the points representing the $H_0$ and $H_1$ classes in the output space.

Carrying out this same analysis for the case of three output nodes per channel, we find that the experimental results are now dramatically different, as shown in Figs. 7-9. Examination of these figures reveals that we still have a region of uncertainty (shown shaded) where the two classes $H_0$ and $H_1$ overlap, but there are now fewer samples that may lead to erroneous decisions. The conclusion to be drawn from Figs. 7-9 is as follows.

Conclusion 2: The use of three output nodes per channel improves on the separation between the points representing the $H_0$ and $H_1$ classes in the output space.

Conclusions 1 and 2 drawn from the experimental results presented in Figs. 4-9 confirm the observations we made in Section VI concerning the effect that the number of output nodes per channel in the receiver of Fig. 1 has on the separability of classes $H_0$ and $H_1$ in the output space.

Fig. 4. Two output nodes/channel. (a) Clutter scatter plot for the 2 : 1 projection. (b) Growler plus clutter scatter plot for the 2 : 1 projection.

B. Network Connectivity and Training

The 2-D WVD plane has a time dimension L = 256 with spacing T = 1 ms and a frequency dimension M = 256 with spacing F = 4 Hz.

Each of the two PCA networks consists of a feedforward network with an input layer of M = 256 source nodes (fed from the WVD image of the received signal) and a single computation layer of p = 5 linear neurons. Both networks are fully connected, in that each neuron of either network is connected to all the source nodes of its respective input layer. The total number of connections/independent weights for each PCA network is 1280. Both networks were trained using the GHA. The training set for the PCA(0) network was made up of $A_0 = 2000$ epochs, representing hypothesis $H_0$. The training set for the PCA(1) network was made up of $A_1 = 500$ epochs, representing hypothesis $H_1$. The individual epochs of WVD images were generated using examples of the received signal, each being made up of 256 samples.
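Two of the sizes quoted in this subsection can be cross-checked with simple arithmetic. The first is the PCA weight count, M × p = 1280. The second concerns the first hidden layer of each MLP, described later in this subsection, which uses receptive fields of width R = 32 overlapping by S = 16 along each row of L = 256 source nodes. The sketch below assumes the fields tile a row from left to right with stride R − S and no padding; the paper does not state this layout explicitly, but it reproduces the 15 neurons per row quoted for the 5 × 15 array. The function name is hypothetical.

```python
def receptive_field_starts(row_length, field_width, overlap):
    """Start indices of overlapping receptive fields tiling one row (no padding)."""
    stride = field_width - overlap
    if stride <= 0:
        raise ValueError("overlap must be smaller than the field width")
    return list(range(0, row_length - field_width + 1, stride))

M, p, L = 256, 5, 256  # source nodes, principal components, time samples
R, S = 32, 16          # receptive-field width and overlap

pca_weights = M * p
starts = receptive_field_starts(L, R, S)
print(pca_weights)     # 1280
print(len(starts))     # 15
print(starts[-1] + R)  # 256
```

The last field ends exactly at the 256th source node, so under this assumed layout every source node in a row is covered.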

Fig. 6. Two output nodes/channel. (a) Growler plus clutter scatter plot for the 3 : 2 projection. (b) Clutter scatter plot for the 3 : 2 projection.

Fig. 7. Three output nodes/channel. (a) Clutter scatter plot for the 2 : 1 projection. (b) Growler plus clutter scatter plot for the 2 : 1 projection.

Fig. 10 shows the architectural details of the two MLP’s. The input layer of each MLP consists of an array of p × L source nodes fed from a compressed image with p = 5 and L = 256. The first hidden layer consists of an array of 5 × 15 neurons, as indicated in Fig. 10(a). The network architecture of this layer was constrained by incorporating the following concepts for improved training and perhaps better generalization performance [29], [31]:

1) Receptive field, which means that a neuron in each row of the first hidden layer is connected only to a certain number of source nodes (denoted by R) that lie in its local neighborhood in the corresponding row of the input layer,

2) Overlap of receptive fields, which means that the receptive fields of adjacent neurons in a particular row overlap by a certain number of source nodes (denoted by S),

3) Weight sharing, which means that the receptive fields of all the neurons in a particular row share the same set of synaptic weights.

For our present study, we chose R = 32 and S = 16, as indicated in Fig. 10(b). In addition, each MLP has a second hidden layer of 25 neurons and output layer of three neurons, both of which are fully connected. Fig. 10(c) summarizes the number of connections and number of independent weights in each MLP on a layer-by-layer basis. The neurons in both MLP’s are all nonlinear, using a sigmoidal activation function defined by the logistic function [29]

$$\varphi(v) = \frac{1}{1 + \exp(-v)} \tag{11}$$

where $v$ is the activation potential of the neuron. The activation potential $v$ includes a threshold, which is represented by an adjustable weight connected to an input fixed at $-1$, as indicated in Fig. 10(b). The two MLP’s and linear combiner, which are treated as one entity, were trained using the BP algorithm. A total of 10 156 examples of the received signal were used to do the training. They were made up as follows: $B_0 = 7150$ examples representing hypothesis $H_0$ and $B_1 = 3006$ examples representing hypothesis $H_1$. Each example of the received signal was 256 samples long. This training dataset was completely different from that used to train the PCA networks.

Fig. 8. Three output nodes/channel. (a) Growler plus clutter scatter plot for the 1 : 3 projection. (b) Clutter scatter plot for the 1 : 3 projection.

Fig. 9. Three output nodes/channel. (a) Growler plus clutter scatter plot for the 3 : 2 projection. (b) Clutter scatter plot for the 3 : 2 projection.

Fig. 10. Architectural details of each multilayer perceptron (MLP).

C. Detection Results

Fig. 11 presents a visual display of the detection statistics for three different receivers:

1) a noncoherent receiver,
2) a Doppler constant false-alarm rate (CFAR) receiver [32],
3) a neural network (NN) implementation of the two-channel receiver of Fig. 1 with three output nodes per channel.

The results of Fig. 11 were obtained for a long dwell time (approximately 35 s) along a range swath of 200 m and a range gate (resolution) of 5 m; the total number of radar samples represented here is $2.68 \times 10^6$. The test data used here were completely different from the data used to train the PCA networks and those used to train the MLP’s. The darkness of the display in Fig. 11 is a measure of the actual power of the receiver output before thresholding. All three parts of the figure have been normalized separately to remove any bias introduced by changes in dynamic ranges of the receivers. The noncoherent receiver, which is reliant on amplitude information only, has been included in Fig. 11 to emphasize the impact that the addition of Doppler (realized through the use of radar coherence) has on detection statistics in a visual sense. Another important point to note is that the Doppler CFAR and NN receivers paint the two classes $H_0$ and $H_1$ in dramatically different colors. In particular, the discrimination between the clutter background (class $H_0$) and growler (class $H_1$) is far more pronounced in the NN receiver than it is in the Doppler CFAR receiver. This is the direct result of the fact that the Doppler CFAR receiver is basically linear, whereas the NN receiver is highly nonlinear.

Fig. 11. Detection statistics for three different receivers.

To further emphasize the performance difference between these two receivers, Fig. 12 shows the postdetection results obtained by comparing the amplitude of each receiver output against a threshold. The threshold was set for a prescribed false alarm rate that is considered typical for the operation of a surveillance radar. (The noncoherent receiver is again included merely for comparison.) The color black in Fig. 12 signifies the presence of the growler (hypothesis $H_1$), and white signifies its absence (hypothesis $H_0$). With the radar operating in a dwelling mode, the growler should ideally be visible to the radar all of the time, that is, we should ideally see a continuous black strip extending all along the time axis. With this ideal picture in mind, we see a remarkable improvement in the behavior of the NN receiver in that it fills in the periods of “silence” frequently seen in the detection performance of the conventional Doppler CFAR receiver. This so-called silence is obviously caused by the partial obstruction of the growler (target) by an ocean wave in front of it or the dipping of the growler in a trough. The detection performance displayed in Fig. 12 is indeed quite remarkable. It shows that the NN receiver is able to perform well, even in a situation when the radar returns from the growler are weak. In other words, a “barely visible target is made visible in signal processing terms.” The other important observation is the occasional blanking of a signal from the growler (as seen, for example, in the middle of the plot); in such cases, there is no way any method would be able to detect the target since insofar as the radar is concerned, the target is just not there to be seen.

Fig. 12. Postdetection results for three different receivers.

To describe the detection performance of the receiver of Fig. 1 in traditional quantitative terms, we used a test dataset consisting of a total of 32 292 examples with each example consisting of 256 radar samples, which (as mentioned previously) were completely different from the data used to train the PCA networks and those used to train the MLP’s. The detection performance of the receiver is summarized in Table II. This table also includes the corresponding performances of three other receivers:

• neural network implementation of the receiver of Fig. 1, but with two output nodes per channel,
• neural network implementation of a single-channel receiver that operates by extracting the signal autoterm from the WVD image of received signal [22],
• Doppler CFAR receiver [32].

Table II shows that for the prescribed false-alarm rate of $10^{-3}$, the neural network implementation of the receiver of Fig. 1 with N = 3 output nodes per channel yields the best detection performance, namely, $P_D = 0.91$. It is also of interest to note that in [33], receiver operating characteristics are presented showing that 1) the receiver of Fig. 1 with N = 3 output nodes per channel consistently performs better than the same receiver with N = 2 output nodes per channel, and 2) the receiver with N = 3 outperforms the conventional CFAR receiver for $P_{FA} > 0.03$.
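To put these probabilities in terms of counts, the test-set sizes listed in Table II ($C_0 = 29\,800$ and $C_1 = 2492$) can be combined with the quoted rates. Because the tabulated probabilities are rounded to two decimal places, the counts below are approximate and are offered only as a rough illustration:

$$P_{FA}\, C_0 = 10^{-3} \times 29\,800 \approx 30 \ \text{false alarms}$$

$$P_D\, C_1 = 0.91 \times 2492 \approx 2268 \ \text{detections (two-channel receiver, } N = 3\text{)}$$

$$P_D\, C_1 = 0.70 \times 2492 \approx 1744 \ \text{detections (Doppler CFAR receiver)}$$

On this test set, the difference between the two receivers therefore corresponds to roughly 520 additional target examples detected at the same false-alarm rate.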

D. Robustness of the Detector

Table I tabulates some relevant radar and environmental parameters in the database that was used for the study. The database was made up of four training datasets and nine test datasets. The test datasets had not been previously seen by the receiver, either for self-organized training of the PCA networks or for supervised training of the MLP classifiers. The training datasets correspond to more-or-less similar environmental conditions; the point to note from this table is that their significant wave height is approximately 1.5 m. However, the test datasets pertain to wave heights varying from 1.5 to 2.6 m. Since the growler protrudes only about 1 m above the water line, the differences in wave heights are significant. Table I thus clearly points to the robustness of the neural network-based receiver. The robust behavior of the modular detection strategy is attributed directly to the adaptive nature of the receiver, which results from the combined use of self-organized learning and supervised learning for its design.

TABLE II

Test data: $C_0$ = 29,800 examples representing hypothesis $H_0$; $C_1$ = 2,492 examples representing hypothesis $H_1$
Length of each example: 256 samples
Probability of false alarm: $P_{FA} = 10^{-3}$

Receiver                                                                          $P_D$
Neural network implementation of receiver of Fig. 1 with 3 output nodes/channel   0.91
Neural network implementation of receiver of Fig. 1 with 2 output nodes/channel   0.89
Neural network implementation of single-channel receiver using signal
  auto-term of WVD image [22]                                                     0.87
Conventional Doppler constant false-alarm rate (CFAR) receiver [32]               0.70

VIII. CONCLUSIONS

In this paper, we have described a modular receiver structure for the detection of a target signal buried in a nonstationary background. The primary design objective is to fully exploit the information content of the received signal in a computationally efficient manner. To achieve this objective, the receiver integrates the following tools in a principled fashion:

* the WVD, acting as the “carrier” of the full information content of the received signal,

* two different channels, for which the WVD image of the received signal provides a common input. Each channel is made up of a principal components analyzer for dimensionality reduction on the WVD image and a multilayer perceptron for pattern classification. One channel is adaptively matched to the interference acting alone, and the other channel is adaptively matched to the target signal plus interference,

* linear combining to combine the analog outputs of the two channels into a single overall output, where the decision that a target signal is present or not is finally made.

Successful design of the new receiver rests on the premise that there is a sufficient number of real-life examples representative of the environment in which the receiver operates. Part of this database is used to train the receiver, and the remaining part is used to test it. In particular, the receiver undergoes a learning session that proceeds in two stages, where one is unsupervised and the other supervised. Accordingly, the synaptic weights (free parameters) of the receiver are adjusted in a systematic fashion whereby information contained in the examples about the environment is extracted and stored in those weights.

Operation of the new receiver was demonstrated using a case study. Specifically, experimental results were presented for the detection of a small piece of ice (growler) floating in an ocean environment, using a coherent radar. Highlights of the case study, based on the use of test data completely different from the training data, are as follows:

* The performance of the adaptive modular receiver is superior to that of a conventional Doppler CFAR receiver.

* The adaptive behavior of the new receiver permits a robust detection performance with respect to wide variations in environmental conditions.

* Implementation of the modular receiver with three output nodes per channel performs better than the same configuration using two output nodes per channel.

Some final comments are in order. In this paper, we have not sought an optimal solution to a practical detection problem that is made very difficult by the properties of nonlinearity, nonstationarity, and non-Gaussianity that characterize many real-life signals. Indeed, it may well be that such a solution is nonexistent. What we have done is to describe a novel modular detection strategy built around a neural network with a constrained architecture, which learns from its environment and thereby acquires the ability to outperform a traditional receiver.

ACKNOWLEDGMENT

The authors are grateful to B. Currie and V. Kezys for their help in collecting the radar datasets used in the study. They also wish to acknowledge the many helpful and critical inputs received from Dr. W. J. Williams, University of Michigan, Ann Arbor, and from anonymous reviewers, which have had a significant impact on shaping the paper in its present form.

REFERENCES

[1] A. J. Viterbi, “Wireless digital communications: A view based on three lessons learned,” IEEE Commun. Mag., vol. 29, pp. 33-36, Sept. 1991.

[2] S. Haykin, “Neural networks expand SP’s horizons,” IEEE Signal Processing Mag., vol. 13, Mar. 1996.

[3] L. Cohen, “Time-frequency distributions - A review,” Proc. IEEE, vol. 77, pp. 941-981, 1989.

[4] L. Cohen, Time-Frequency Analysis. Englewood Cliffs, NJ: Prentice-Hall, 1995.

[5] B. Boashash, “Time-frequency signal analysis,” in Advances in Spectrum Analysis and Array Processing, S. Haykin, Ed. Englewood Cliffs, NJ: Prentice-Hall, 1991, vol. 1, pp. 418-517.

[6] L. Cohen, “Generalized phase-space distribution functions,” J. Math. Phys., vol. 7, pp. 781-786, 1966.

[7] P. Flandrin, “Non-destructive evaluation in the time-frequency domain by means of the Wigner-Ville distribution,” in Signal Processing and Pattern Recognition in Nondestructive Evaluation of Materials, C. H. Chen, Ed. New York: Springer-Verlag, 1988, pp. 109-116.

[8] M. Amin and T. Schiavoni, “Time-varying [...] via multidimensional filter representation,” SPIE Advanced Algorithms Architectures Signal Processing, vol. 1152, pp. 437-448, 1989.

[9] L. B. White, “Time frequency filtering and synthesis from convex projections,” SPIE Advanced Signal Processing Algorithms, Architectures, Implementations, vol. 1348, pp. 158-169, 1990.

[10] B. D. Forrester, “Time-frequency analysis in machine fault detection,” in Time-Frequency Signal Analysis, B. Boashash, Ed. London, U.K.: Longman, 1992, pp. 406-423.

[11] W. J. Williams, H. P. Zaveri, and J. C. Sackellares, “Time frequency analysis of electrophysiology signals in epilepsy,” IEEE Eng. Med. Biol., pp. 131-143, Mar./Apr. 1995.

[12] H. P. Zaveri, W. J. Williams, L. D. Iasemidis, and J. C. Sackellares, “Time-frequency representation of electrocorticograms in temporal lobe epilepsy,” IEEE Trans. Biomed. Eng., vol. 39, pp. 502-509, 1992.

[13] R. A. Altes, “Signal processing for target recognition in biosonar,” Neural Networks, vol. 8, pp. 1275-1295, 1995.


[14] N. Suga, “Biosonar and neural computation in bats,” Sci. Amer., vol. 262, no. 6, pp. 60-68, 1990.

[15] J. A. Simmons et al., “Composition of biosonar images for target recognition by echolocation bats,” Neural Networks, vol. 8, pp. 1239-1261, 1995.

[16] K. Fukunaga, Statistical Pattern Recognition, 2nd ed. New York: Academic, 1990.

[17] P. Flandrin, “A time-frequency formulation of optimum detection,” IEEE Trans. Signal Processing, vol. 36, pp. 1377-1384, 1988.

[18] L. B. White and B. Boashash, “Time-frequency coherence - A theoretical basis for cross-spectral analysis of nonstationary signals,” in Proc. IASTED Int. Symp. Signal Processing Appl., Brisbane, Australia, 1987, pp. 18-23.

[19] S. S. Abeysekera and B. Boashash, “Methods of signal classification using the images produced by the Wigner-Ville distribution,” Pattern Recogn. Lett., vol. 12, pp. 717-729, 1991.

[20] F. Hlawatsch, “Regularity and unitarity of bilinear time-frequency signal representations,” IEEE Trans. Inform. Theory, vol. 38, pp. 82-94, Jan. 1992.

[21] F. Hlawatsch, “Bilinear time-frequency representations of signals: The shift-scale invariant class,” IEEE Trans. Signal Processing, vol. 42, pp. 357-366, Feb. 1994.

[22] T. K. Bhattacharya and S. Haykin, “Neural network-based radar detection for an ocean environment,” IEEE Trans. Aerospace Electron. Syst., to be published.

[23] H. L. Van Trees, Detection, Estimation, and Modulation Theory, Part I. New York: Wiley, 1968.

[24] A. J. Viterbi and J. K. Omura, Principles of Digital Communication. New York: McGraw-Hill, 1979.

[25] C. A. Stutt and L. J. Spafford, “A ‘best’ mismatched filter response for radar clutter discrimination,” IEEE Trans. Inform. Theory, vol. IT-14, pp. 280-287, 1968.

[26] M. P. Perrone and L. N. Cooper, “Learning what’s been learned: Supervised learning from multi-neural network systems,” in Proc. World Congr. Neural Networks, 1993, vol. 3, pp. 354-357.

[27] T. D. Sanger, “Optimal unsupervised learning in a single-layer linear feedforward neural network,” Neural Networks, vol. 1, pp. 459-473, 1989.

[28] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning representations by back-propagating errors,” Nature, vol. 323, pp. 533-536, 1986.

[29] S. Haykin, Neural Networks: A Comprehensive Foundation. New York: Macmillan, 1994.

[30] S. Haykin, C. Krasnor, T. Nohara, B. Currie, and D. Hamburger, “A coherent dual-polarized radar for studying the ocean environment,” IEEE Trans. Geosci. Remote Sensing, vol. 29, pp. 189-191, 1991.

[31] Y. LeCun et al., “Handwritten digit recognition with a back-propagation network,” in Advances in Neural Information Processing Systems, D. S. Touretzky, Ed. San Francisco, CA: Morgan Kaufmann, 1989, vol. 2, pp. 396-404.

[32] P. Weber and S. Haykin, “IPIX radar signal processing and detection,” CRL Rep. 319, Commun. Res. Lab., McMaster Univ., Hamilton, Ont., 1992.

[33] S. Haykin and T. K. Bhattacharya, “Wigner-Ville distribution: An important functional block for radar target detection in clutter,” in Conf. Rec. Twenty-Eighth Asilomar Conf. Signals Syst., Comput., Pacific Grove, CA, 1994, vol. 1, pp. 68-72.

Simon Haykin (F’82) received the B.Sc. degree (with first-class honors) in 1953, the Ph.D. degree in 1956, and the D.Sc. degree in 1967, all in electrical engineering, from the University of Birmingham, Birmingham, U.K.

He is the founding director of the Communications Research Laboratory and Professor of electrical and computer engineering at McMaster University, Hamilton, Ont., Canada. His research interests include nonlinear dynamics, neural networks, and adaptive filters and their applications. He is the editor of Adaptive and Learning Systems for Signal Processing, Communications, and Control, a new series of books for Wiley-Interscience.

Dr. Haykin was elected Fellow of the Royal Society of Canada in 1980. He was awarded the McNaughton Gold Medal of the IEEE (Region 7) in 1986.

Tarun Kumar Bhattacharya received the B.E. and Ph.D. degrees, both in electrical engineering, from the Indian Institute of Science, Bangalore, in 1984 and 1990, respectively.

From 1991 to 1994, he was at the Communications Research Laboratory, McMaster University, Hamilton, Ont., Canada, developing applications of neural networks for signal processing. Since 1994, he has been with the Advanced Systems Development Group at Raytheon Canada Limited, Waterloo, Ont., as a Senior Systems Engineer. He has been working in the field of signal processing and radars for the past 10 years. His research interests are in the general areas of neural networks, time-frequency signal analysis, adaptive signal processing, radar system design, and clutter modeling.