
Comparison of Information Theoretic Divergences for Sensor Management

Chun Yang*a, Ivan Kadarb, Erik Blaschc, Michael Bakichc

aSigtem Technology, Inc., San Mateo, CA 94402 bInterlink Systems Sciences, Inc., Lake Success, NY 11042

cAir Force Research Lab/RYAAX, WPAFB, OH 45433

ABSTRACT In this paper, we compare the information-theoretic metrics of the Kullback-Leibler (K-L) and Rényi (α) divergence formulations for sensor management. Information-theoretic metrics are well suited for sensor management as they afford comparisons between distributions resulting from different types of sensors under different actions. The difference in distributions can also be measured via entropy formulations to discern the communication channel capacity (i.e., the Shannon limit). In this paper, we formulate a sensor management scenario for target tracking and compare various metrics for performance evaluation as a function of the design parameter (α) so as to determine which measures might be appropriate for sensor management given the dynamics of the scenario and the design parameter. Keywords: Sensor management, Simultaneous tracking and ID, Performance evaluation, Information-theoretic measures, Kullback-Leibler divergence, Rényi α-divergence, Csiszár f-divergence, Entropy, Rényi α-entropy.

1. INTRODUCTION Layered sensing (LS) is a recent Air Force construct for an integrated-interacting-hierarchical system with dynamically controllable resolution [1, 2]. It is hierarchical in both resolution pyramid and assignment-tasking (i.e., resource management) senses for a multitude of sensors and sensor platforms. The fundamental salient attribute of LS is its purported capability to provide global coverage, persistent surveillance, timely detection, accurate localization and tracking, and the resolution required to achieve high confidence classification/identification of time critical and high value targets.

Tasks assigned to the sensing assets may include first identifying a region of interest/activity (i.e., detecting where a high-valued target is located) by space-based sensors and then tasking specific air or ground platforms via a resource manager. The assignment can be carried out either in a hierarchical resolution-capability order or in a specific order depending on the knowledge of the selected/available sensors' and platforms' capability and availability, the relative geometry between the platforms and targets, the a priori information about the target, weather, and phenomenology, and the mission requirements, so as to optimize specific task-dependent performance metrics within time constraints.

A large number of performance metrics have been proposed in the past using utility theory, information theory, and even simple geometry. The utility-theory-based metrics can be considered a superset of techniques, while the others can be viewed as a "utility metric or objective function" to be optimized in some sense, and thus a subset.

However, a significant issue with utility-theory-based metrics is that the metrics to be optimized are task-specific and not necessarily commensurate in dimensions and units. For example, in target detection, the metrics are based on maximization of the probability of detection (PD) and minimization of the probability of false alarm (PFA); in target classification, the objective is to maximize the probability of correct classification; and in target identification, the decision is based on the corresponding allegiance of the designated target. For target position location and tracking, the goal is to minimize the mean squared errors of the target state estimates.

The problem is alleviated when information-theoretic measures are used as the common denominator. The information-theory-based metrics include entropy, Fisher information, the Kullback-Leibler, Rényi (α), and Csiszár (f) divergences, and mutual information [3-8]. Indeed, these constructs have been applied to distributed resource management and communications management in decentralized target tracking and identity fusion [9, 10].

* [email protected]; Phone/Fax: (650) 312-1132; www.sigtem.com

Signal Processing, Sensor Fusion, and Target Recognition XX, edited by Ivan Kadar, Proc. of SPIE Vol. 8050, 80500C · © 2011 SPIE · CCC code: 0277-786X/11/$18 · doi: 10.1117/12.883745

Proc. of SPIE Vol. 8050 80500C-1


In this paper, we consider the information-theoretic metrics of the Kullback-Leibler (K-L) and Rényi (α) divergence formulations for sensor management. To this end, we formulate a sensor management scenario for target tracking and compare the above information-theoretic metrics for performance evaluation as a function of the design parameter (α) so as to determine which measures might be appropriate for sensor management given the dynamics of the scenario and the design parameter. The study leads to a novel approach that differs from prior approaches using the Rényi (α) divergence in that it can adaptively estimate the parameter α over time, as opposed to pre-setting it to a constant value independent of the changing underlying distributions.

The rest of the paper is organized as follows. In Section 2, various information-theoretic metrics are introduced. A tracking scenario is formulated in Section 3, with which the information-theoretic metrics of the Kullback-Leibler (K-L) and Rényi (α) divergences are compared. In Section 4, we outline a novel approach that adaptively estimates the parameter α over time according to the changing underlying distributions. Finally, the paper is concluded with a summary and future work.

2. INFORMATION-THEORETIC METRICS We start by introducing the most general divergence, the f-divergence (Csiszár divergence), which forms the basis of a family of divergences, and subsequently focus on the description and application of two members of the family: the K-L and α-divergences.

In probability theory, an f-divergence is a function Df(P||Q) that measures the difference between two probability distributions P and Q. Intuitively, the divergence is an average, weighted by the function f, of the odds ratio given by P and Q [Wikipedia: f-divergence]. The divergences were introduced and studied independently by Csiszár [5], Morimoto [11], and Ali and Silvey [6], and are thus also known as Csiszár f-divergences, Csiszár-Morimoto divergences, or Ali-Silvey distances.

Let P and Q be two probability distributions over a space Ω such that P is absolutely continuous with respect to Q. Then, for a convex function f such that f(1) = 0, the f-divergence of Q from P is defined as:

$$ D_f(P\,\|\,Q) = \int_\Omega f\!\left(\frac{dP}{dQ}\right) dQ = \int_\Omega f\!\left(\frac{p(x)}{q(x)}\right) q(x)\, d\mu(x) \qquad (1) $$

where p and q are probability densities that satisfy dP = pdμ and dQ = qdμ and μ is a reference distribution on Ω with respect to which P and Q are both absolutely continuous.
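The discrete counterpart of this definition can be computed directly. The following is a minimal Python sketch (the helper name f_divergence and the sample distributions are ours, not from the paper), illustrating that the K-L divergence arises from a particular choice of f:

```python
import math

def f_divergence(p, q, f):
    """Discrete Csiszar f-divergence D_f(P||Q) = sum_i q_i * f(p_i / q_i).

    Assumes absolute continuity: q_i > 0 wherever p_i > 0.
    """
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q) if qi > 0)

# Under the convention of Eq. (1), f(t) = t*log(t) yields the K-L divergence.
kl_f = lambda t: t * math.log(t) if t > 0 else 0.0

p = [0.5, 0.3, 0.2]   # hypothetical discrete distributions
q = [0.4, 0.4, 0.2]
d = f_divergence(p, q, kl_f)   # equals sum_i p_i log(p_i/q_i)
```

Since f is convex with f(1) = 0, Jensen's inequality guarantees D_f(P||Q) ≥ 0, with equality when P = Q.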

Many common divergences are special cases of the f-divergence for a particular choice of f. Examples include the K-L divergence, Hellinger distance, total variation distance, and χ²-divergence; Table 1 gives a partial list of common divergences between probability distributions and their f functions [4].

Table 1. Common Divergences and Their Choice of f Function [Wikipedia: f-divergence]

Divergence                   Corresponding f(t)
K-L divergence               t log t
Hellinger distance           (√t − 1)²
Total variation distance     |t − 1| / 2
χ²-divergence                (t − 1)²
α-divergence                 (t^α − αt − (1 − α)) / (α(α − 1))  (one common convention; variants differ by affine terms)


Of particular interest is the Kullback-Leibler (K-L) divergence (also known as information divergence, information gain, or relative entropy) [3], used in probability theory and information theory as a non-symmetric measure of the difference between two probability distributions P and Q. With f(t) = t log t from Table 1, the divergence from "P to Q" is defined as:

$$ D_{KL}(P\,\|\,Q) = \int p(x) \log\frac{p(x)}{q(x)}\, d\mu(x) \qquad (2) $$

where q > 0 and 0 log 0 is interpreted as 0. The logarithm is taken to base 2 if the information is measured in units of bits, to base e if measured in nats, or to base 10 if measured in hartleys. In communications, the K-L divergence measures the expected number of extra bits required to code samples from P when using a code based on Q rather than a code based on P.
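The extra-bits interpretation can be checked with a small numeric example (a Python sketch with hypothetical distributions; the helper names are ours): the expected ideal code length under Q, minus the entropy of P, equals the K-L divergence in bits.

```python
import math

def kl_bits(p, q):
    """Discrete K-L divergence of Eq. (2) with a base-2 logarithm (bits)."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.25, 0.25]   # true source distribution
q = [0.25, 0.25, 0.5]   # distribution the code was designed for

# Expected length of an ideal code built for Q, applied to samples from P:
cross_entropy = -sum(pi * math.log2(qi) for pi, qi in zip(p, q))
# Shortest achievable expected length (entropy of P):
entropy_p = -sum(pi * math.log2(pi) for pi in p)
# The coding penalty is exactly the K-L divergence:
# cross_entropy - entropy_p == kl_bits(p, q)
```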

As a generalization of the Kullback-Leibler divergence, the Rényi divergence of order α from a distribution P to a distribution Q (also known as the Rényi α-divergence) is defined as:

$$ D_\alpha(P\,\|\,Q) = \frac{1}{\alpha-1} \log \int p(x)^{\alpha}\, q(x)^{1-\alpha}\, d\mu(x) \qquad (3) $$

The Rényi α-divergence is closely related to the Csiszár f-divergence family shown in Table 1 (it is a monotone function of the α-divergence). Furthermore, as α → 1, the Rényi α-divergence approaches the K-L divergence in the limit [3]. The parameter α places different emphasis on the tails of the two distributions.

The discrete version of the Rényi α-divergence is given as:

$$ D_\alpha(P\,\|\,Q) = \frac{1}{\alpha-1} \log \sum_{i=1}^{n} p_i^{\alpha}\, q_i^{1-\alpha} = \frac{1}{\alpha-1} \log \sum_{i=1}^{n} p_i \left(\frac{p_i}{q_i}\right)^{\alpha-1} \qquad (4) $$

Table 2 lists some special cases of the discrete Rényi divergence for different α.

Table 2. Special Cases of the Discrete Rényi α-Divergence [Wikipedia: Rényi Entropy]

α        Divergence                          Meaning
α = 0    −log Σ_{i: p_i>0} q_i               minus the log probability under Q that p_i > 0
α = 1/2  −2 log Σ_i √(p_i q_i)               minus twice the logarithm of the Bhattacharyya coefficient
α → 1    Σ_i p_i log(p_i/q_i)                Kullback-Leibler divergence
α = 2    log Σ_i p_i²/q_i                    log of the expected ratio of the probabilities
α → ∞    log max_i (p_i/q_i)                 log of the maximum ratio of the probabilities

As an example, consider two Gaussian densities p0 and p1 with mean values μ0 and μ1 and covariance matrices Σ0 and Σ1, respectively. Then, the Rényi α-divergence is given by:

$$ D_\alpha(p_1\,\|\,p_0) = \frac{\alpha}{2}\, \Delta\boldsymbol{\mu}^{T} \big[\alpha\boldsymbol{\Sigma}_0 + (1-\alpha)\boldsymbol{\Sigma}_1\big]^{-1} \Delta\boldsymbol{\mu} + \frac{1}{2(1-\alpha)} \log \frac{\big|\alpha\boldsymbol{\Sigma}_0 + (1-\alpha)\boldsymbol{\Sigma}_1\big|}{|\boldsymbol{\Sigma}_1|^{1-\alpha}\, |\boldsymbol{\Sigma}_0|^{\alpha}} \qquad (5) $$

where Δμ = μ1 − μ0 and |·| stands for the determinant of a square matrix.


Similarly, the K-L divergence for two Gaussian densities is:

$$ D_{KL}(p_1\,\|\,p_0) = \frac{1}{2}\left[ \log\frac{|\boldsymbol{\Sigma}_0|}{|\boldsymbol{\Sigma}_1|} + \mathrm{Tr}\big(\boldsymbol{\Sigma}_0^{-1}\boldsymbol{\Sigma}_1\big) + \Delta\boldsymbol{\mu}^{T}\boldsymbol{\Sigma}_0^{-1}\Delta\boldsymbol{\mu} - d \right] \qquad (6) $$

where d is the dimension of mean vectors.
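The Gaussian closed forms (5) and (6) can be implemented and cross-checked in a few lines. The following Python sketch (function names and the example covariances are ours, not from the paper) also verifies numerically that the Rényi divergence approaches the K-L divergence as α approaches 1:

```python
import numpy as np

def renyi_gauss(mu1, S1, mu0, S0, alpha):
    """Renyi alpha-divergence D_alpha(p1||p0) between N(mu1,S1) and N(mu0,S0),
    Eq. (5). Valid when S_a = alpha*S0 + (1-alpha)*S1 is positive definite
    (always true for 0 < alpha < 1)."""
    dmu = mu1 - mu0
    Sa = alpha * S0 + (1 - alpha) * S1
    quad = 0.5 * alpha * dmu @ np.linalg.solve(Sa, dmu)
    _, ld_a = np.linalg.slogdet(Sa)           # log-determinants for stability
    _, ld_1 = np.linalg.slogdet(S1)
    _, ld_0 = np.linalg.slogdet(S0)
    return quad + (ld_a - (1 - alpha) * ld_1 - alpha * ld_0) / (2 * (1 - alpha))

def kl_gauss(mu1, S1, mu0, S0):
    """K-L divergence D_KL(p1||p0), Eq. (6)."""
    d = len(mu1)
    dmu = mu1 - mu0
    _, ld_1 = np.linalg.slogdet(S1)
    _, ld_0 = np.linalg.slogdet(S0)
    return 0.5 * (ld_0 - ld_1 + np.trace(np.linalg.solve(S0, S1))
                  + dmu @ np.linalg.solve(S0, dmu) - d)

# Example (hypothetical numbers): a measurement update that both shifts the
# mean and shrinks the covariance relative to the prior N(mu0, S0).
mu0, S0 = np.zeros(2), np.eye(2)
mu1, S1 = np.array([0.5, 0.0]), 0.5 * np.eye(2)
d_half = renyi_gauss(mu1, S1, mu0, S0, 0.5)
d_kl = kl_gauss(mu1, S1, mu0, S0)
```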

3. COMPARISON OF TWO GAUSSIAN DISTRIBUTIONS IN A 1D TARGET TRACKING SCENARIO Numerical examples are constructed to gain insight into information-theoretic metrics, with the ultimate goal of using them for resource management in target tracking. The key is to appreciate how the optimization based on information-theoretic metrics is carried out in the trade space for resource management in target tracking.

Consider a one-dimensional (1D) case where a target is moving along the x-axis. The prior is Gaussian with mean μ0 and variance Σ0, that is, x0 ~ N(μ0, Σ0). Assume that a sensor located at the origin of the x-axis has several sensing modes, each providing a linear measurement of the target as:

$$ z_i = h_i x + b_i + v_i \qquad (7) $$

where hi = 1, bi is a constant bias, the measurement error vi is Gaussian with zero mean and variance Σi, and i = 1, …, m denotes one of the m possible sensing actions.

Under this measurement model, each sensing action leads to a posterior error distribution xi ~ N(μi, Σi). We are interested in evaluating the divergence between N(μ0, Σ0) and N(μi, Σi) as a function of the design parameter α. This effectively reduces to the case of two Gaussian distributions.

We consider three cases. In Case 1, Δμ = μ1 − μ0 = 0 and the α- and K-L divergences respectively become:

$$ D_\alpha(p_1\,\|\,p_0) = \frac{1}{2(1-\alpha)} \log \frac{(1-\alpha)\Sigma_1 + \alpha\Sigma_0}{\Sigma_1^{1-\alpha}\, \Sigma_0^{\alpha}} = \frac{1}{2(1-\alpha)} \log \frac{(1-\alpha)(\Sigma_1/\Sigma_0) + \alpha}{(\Sigma_1/\Sigma_0)^{1-\alpha}} \qquad (8) $$

$$ D_{KL}(p_1\,\|\,p_0) = \frac{1}{2}\left[ \log\frac{\Sigma_0}{\Sigma_1} + \frac{\Sigma_1}{\Sigma_0} - 1 \right] \qquad (9) $$

From (8) and (9), it is clear that both D_α and D_KL are functions of the ratio Σ1/Σ0. However, they are not symmetric in p1 and p0, which is one reason the divergences are not distance measures. To ensure the positivity of the term (1 − α)Σ1 + αΣ0 in (8), it is important to restrict the range of α such that 0 < α < 1.
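The Case 1 expressions can be evaluated numerically with a short Python sketch (function names and sample ratios are ours; Σ denotes the 1D variance, and r = Σ1/Σ0). It illustrates the zero at r = 1, the asymmetric slopes on either side, and the α → 1 limit:

```python
import math

def d_alpha_var(r, alpha):
    """Eq. (8): Renyi divergence D_alpha(p1||p0) for 1D Gaussians with equal
    means and variance ratio r = Sigma1/Sigma0 (requires 0 < alpha < 1)."""
    return math.log(((1 - alpha) * r + alpha) / r**(1 - alpha)) / (2 * (1 - alpha))

def d_kl_var(r):
    """Eq. (9): K-L divergence D_KL(p1||p0) for the same case."""
    return 0.5 * (math.log(1 / r) + r - 1)

# Both divergences vanish at r = 1 and grow asymmetrically on either side;
# compare, e.g., d_kl_var(0.5) against d_kl_var(2.0).
```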

Fig. 1 shows D_α and D_KL as functions of the ratio Σ1/Σ0 (on a logarithmic scale) for a number of choices of α from 0 to 1. The α-divergence is zero at Σ1/Σ0 = 1 and has different rates of change for Σ1/Σ0 > 1 (larger slope) and Σ1/Σ0 < 1 (smaller slope).

The details for 0.4 < Σ1/Σ0 < 2.5 are shown in Fig. 2. D_KL is also zero at Σ1/Σ0 = 1. However, the α-divergence appears to take negative values over a certain region of Σ1/Σ0 > 1 when α > 1.

The left-hand side of D_α and D_KL, i.e., Σ1/Σ0 < 1, is the useful region for the divergences to serve as an optimization metric. In target tracking, consider p0 as the a priori density of the state, while p1 is the a posteriori density after a measurement update. Assuming that no measurement update increases the estimation error covariance, the divergence of p1 from p0 falls on the left-hand side (Σ1/Σ0 < 1), and maximizing the divergence of p1 from p0 is equivalent to reducing the covariance of p1 relative to p0. In other words, we gain more information in obtaining p1 by reducing the uncertainty of p0.

Alternatively, we can switch the roles of p1 and p0 in the formulations so as to use the region with higher slopes (more sensitive to dissimilarity between densities) as the operational region. In this sense, the best choice is α = 1, that is, the K-L divergence.


Fig. 1. Divergences as Functions of Σ1/Σ0    Fig. 2. Details of Fig. 1 around 0

The right-hand side of Fig. 1 or Fig. 2 (i.e., Σ1/Σ0 > 1), although showing a growing α-divergence, is actually not desirable, because in this region the uncertainty is increased rather than reduced. A sensor manager should be smart enough not to select such sensor configurations, which do occur in practice when several sensors have nearly co-linear lines of sight to a target. In other words, the α-divergence can effectively measure the dissimilarity between two distributions but cannot tell which one is "good" or "bad."

In Case 2, Σ1 = Σ0 = Σ and the α- and K-L divergences respectively become:

$$ D_\alpha(p_1\,\|\,p_0) = \frac{\alpha}{2}\, \Delta\mu^{T}\Sigma^{-1}\Delta\mu = \frac{\alpha}{2}\left(\frac{\Delta\mu}{\sigma}\right)^{2} \qquad (10) $$

$$ D_{KL}(p_1\,\|\,p_0) = \frac{1}{2}\, \Delta\mu^{T}\Sigma^{-1}\Delta\mu = \frac{1}{2}\left(\frac{\Delta\mu}{\sigma}\right)^{2} \qquad (11) $$

where σ² = Σ in the 1D case.

Fig. 3 shows D_α and D_KL as functions of the ratio Δμ/σ for a number of choices of α from 0 to 1. The details are shown in Fig. 4. The α-divergence attributed to mean errors is symmetric, and it is zero at Δμ = 0. Compared to Fig. 1, the divergence value is close to 1.0 for 3σ errors in mean, whereas it is about 0.4 (left) and 0.7 (right) for doubled standard deviations. This implies that mean errors may dominate variance errors when measuring the dissimilarity between two Gaussian distributions.

Again, consider target tracking where p0 is the a priori density of the state and p1 is the a posteriori density after a measurement update. Assuming that any measurement update pushes the state estimate μ1 toward the true value, maximizing the divergence of p1 from p0 is equivalent to moving the mean μ1 away from μ0 toward the true value.

In Case 3, Σ1 ≠ Σ0 and Δμ ≠ 0, and the α- and K-L divergences assume the general forms given in (5) and (6), respectively, which in 1D become:

$$ D_\alpha(p_1\,\|\,p_0) = \frac{\alpha}{2}\, \frac{\Delta\mu^{2}}{(1-\alpha)\Sigma_1 + \alpha\Sigma_0} + \frac{1}{2(1-\alpha)} \log \frac{(1-\alpha)\Sigma_1 + \alpha\Sigma_0}{\Sigma_1^{1-\alpha}\, \Sigma_0^{\alpha}} \qquad (12) $$

$$ D_{KL}(p_1\,\|\,p_0) = \frac{1}{2}\left[ \log\frac{\Sigma_0}{\Sigma_1} + \frac{\Sigma_1}{\Sigma_0} + \frac{\Delta\mu^{2}}{\Sigma_0} - 1 \right] \qquad (13) $$

Consider the variance ratio Σ1/Σ0 = 0.1 to 10 and the mean-error-to-deviation ratio Δμ/σ0 = −4 to 4, with α = 0.25, 0.5, 0.75, 0.9, and 0.9999. Fig. 5 shows the K-L divergence as a function of Σ1/Σ0 and Δμ/σ0, which is symmetric about Δμ/σ0 = 0 and asymmetric about Σ1/Σ0 = 1. The minimum of zero is reached at Δμ/σ0 = 0 and Σ1/Σ0 = 1.

[Figs. 1-2: divergence vs. log10(Σ1/Σ0) for two 1D Gaussians with μ1 = μ0; curves shown for α from 0 to 1, including α = 0.5 and the K-L divergence.]

Fig. 3. Divergences as Functions of Δμ/σ    Fig. 4. Details of Fig. 3 around 0

Figs. 6 through 10 show the α-divergence as a function of Σ1/Σ0 and Δμ/σ0 for α = 0.25, 0.5, 0.75, 0.9, and 0.9999, respectively. The surfaces are also symmetric about Δμ/σ0 = 0 but asymmetric about Σ1/Σ0 = 1. The minimum of zero is reached at Δμ/σ0 = 0 and Σ1/Σ0 = 1. As α approaches 1, the α-divergence surface in Fig. 10 approaches the K-L divergence surface in Fig. 5.

The effect of α on the α-divergence is evident from these figures. As α increases from a small value of 0.25 to a large value of 0.9999, the asymmetric slopes switch sides relative to Σ1/Σ0 = 1. Such an effect was not visible in the curves of Fig. 4 with Δμ/σ0 = 0. That is, away from Δμ/σ0 = 0, the large slopes appear on the side of Σ1/Σ0 < 1 for small α but on the side of Σ1/Σ0 > 1 for large α.

Since Σ1/Σ0 < 1 is the region where measurement updating is likely to occur, the use of a small α thus provides a large sensitivity (or selectivity) of the α-divergence for different sensors and updating configurations. This may explain why the α-divergence with α = 0.5 (see Fig. 7) sometimes outperforms the K-L divergence, which is equivalent to the α-divergence with α = 1 (see Fig. 5 vs. Fig. 10). More results can be found in [12].

From the above 3D plots, one can see that a first sensor producing a posterior distribution N(μ1, Σ0) with μ1 ≠ μ0 may generate the same α-divergence value as a second sensor producing a posterior distribution N(μ0, Σ1) with Σ1 < Σ0 relative to the same prior distribution N(μ0, Σ0). Clearly the second sensor is preferred from the point of view of uncertainty reduction. However, the α-divergence alone cannot discern the two. The first sensor may represent biased measurements due to multipath or clutter arising in that sensor's particular operating environment.
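This ambiguity is easy to reproduce numerically. The sketch below (a Python illustration with hypothetical numbers; the function name renyi_1d and the sensor labels are ours) evaluates Eq. (12) for a sensor that only shifts the mean and one that only shrinks the variance:

```python
import math

def renyi_1d(mu1, s1, mu0, s0, alpha):
    """Eq. (12): Renyi alpha-divergence D_alpha(p1||p0) for 1D Gaussians,
    where s1, s0 denote the variances and 0 < alpha < 1."""
    sa = alpha * s0 + (1 - alpha) * s1
    quad = alpha * (mu1 - mu0)**2 / (2 * sa)
    return quad + math.log(sa / (s1**(1 - alpha) * s0**alpha)) / (2 * (1 - alpha))

alpha = 0.5
prior = (0.0, 1.0)                              # N(mu0, Sigma0)
d_biased = renyi_1d(0.8, 1.0, *prior, alpha)    # sensor A: shifts the mean only
d_sharper = renyi_1d(0.0, 0.25, *prior, alpha)  # sensor B: shrinks the variance only
# d_biased and d_sharper come out comparable even though only sensor B
# reduces uncertainty -- the divergence value alone cannot rank them.
```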

4. INFORMATION-THEORETIC SENSOR MANAGEMENT ANALYSIS [13] Several papers have addressed the use of information-theoretic methods based on, e.g., entropy, mutual information, the K-L divergence, and the Rényi α-divergence for resource management (RM), with applications in target tracking, combined tracking and identification, and distributed and decentralized systems and networks. Hintz [14, 15] applied information theory to sensor management cueing. Schmaedeke and Kastella utilized information-theoretic methods for sensor management [16-19]. Blasch applied information theory to image registration for target classification [20] and subsequently to target detection and identification in a simultaneous target tracking and identification (STID) scenario [21]. Other areas where information-theoretic measures have been applied include measurement requests for tracking with UGS [22], ontology alignment [23], and image fusion [24, 25]. Methods developed with the α-divergence can enhance multiple areas in information fusion.

Research over the past few years has demonstrated the advantages of the α-divergence with respect to Kullback-Leibler (K-L) implementations for resource management (RM), but the selection of α was based in part on empirical methods. One paper by Kreucher, Kastella, and Hero [26], with applications to target tracking and sensor RM using Monte Carlo simulation, showed that over a range of α (at values of 0.1, 0.5, and 0.9999) there was no significant

[Figs. 3-4: divergence vs. Δμ/σ for two 1D Gaussians with Σ1 = Σ0; curves shown for α from 0 to 1, including α = 0.5 and the K-L divergence.]

Fig. 5. K-L Divergence vs. Σ1/Σ0 and Δμ/σ0; Fig. 6. α-Divergence vs. Σ1/Σ0 and Δμ/σ0 (α = 0.25); Fig. 7. α-Divergence vs. Σ1/Σ0 and Δμ/σ0 (α = 0.5); Fig. 8. α-Divergence vs. Σ1/Σ0 and Δμ/σ0 (α = 0.75); Fig. 9. α-Divergence vs. Σ1/Σ0 and Δμ/σ0 (α = 0.9); Fig. 10. α-Divergence vs. Σ1/Σ0 and Δμ/σ0 (α = 0.9999)


change in performance in specific problems. In addition, empirical comparison showed that α = 0.5 provided the best RM performance, illustrated by comparing target position and velocity RMS errors given similar densities over different values of α [26]. Using the above methods, they applied their techniques to STID [27] and sensor scheduling [28].

Four related papers [9, 29-31] applied the α-divergence to content-based image retrieval, geo-registration of imagery, and entropic spanning graphs, and proved, in the case of entropic spanning graphs by asymptotic analysis, that the value α = 0.5 emphasizes the tails of similar distributions and allows the maximum discrimination between two similar densities, which clearly depends on the application. Therefore, in all known applications of the α-divergence, α was assigned a constant value, including the application of the α-divergence to multi-target tracking and identification [26] noted above. Furthermore, it has also been implied [e.g., 26, 27] that the values of α are related to the tails of the distributions: if the distribution is long-tailed, the value of α should approach unity, since the distributions are likely dissimilar and clearly non-normal.

Fig. 11 shows a new idea of dynamic/adaptive α selection for kinematic tracker data. The central idea is to dynamically/adaptively select α over time, based on the changing underlying distribution parameters, to potentially fine-tune its application to resource management in tracking and identification functions [13]. The key is how to dynamically select α over time, given the time series of measurement data within the tracker, without incurring long time delays affecting the tracker and overall system operation. That is, one needs to store an m-sample length of data in batch (memory) to estimate/compute the necessary parameters, which introduces m units of time delay. In addition, all computations need to be performed within the m units of delay. The data go through the system sequentially, incurring an initial delay of m units.

As shown, the data values are an on-line stream of tracker data stored in batches (memory) of m-sample time length, which are continually refreshed every m-sample time interval. This process introduces an m-sample time delay, during which all indicated computations are performed; subsequent computations are repeatedly performed m-sample delays apart.

Thus, the mean, median, associated shift of location between distributions, covariance, and tail parameters of the distributions are estimated over m-sample intervals, as depicted in Fig. 11. The shift in location is tested by Mann-Whitney-Wilcoxon nonparametric statistics (MWWNS) [32] to assess a potential undetected change in the mean or median values between the distributions, while being independent of scale. The α value is dynamically/adaptively selected by using the distribution parameters as a key into the LUT in every m-sample interval. The LUT is based in part on the simulation results depicted in Section 3, coupled with a threshold setting on the kurtosis value to measure the tail of the distribution and to determine potential non-normality [13]. The selected α value is used in the expected value (conditional mean) of the α-divergence compute function.

Fig. 11. Dynamic/Adaptive α Selection Construct: Parameter Estimation Coupled with KBS LUT [13]
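The batch pipeline described above can be sketched in a few lines of Python. This is a minimal illustration only, not the authors' implementation: the kurtosis threshold, the two-entry LUT, the significance level, and the function name select_alpha are all hypothetical stand-ins.

```python
import numpy as np
from scipy import stats

def select_alpha(batch_prev, batch_curr, kurtosis_threshold=1.0,
                 alpha_lut=(0.5, 0.9999)):
    """Sketch of the dynamic/adaptive alpha-selection construct of Fig. 11.

    Over each m-sample batch: estimate distribution parameters, test for a
    shift in location with the Mann-Whitney(-Wilcoxon) statistic, and pick
    alpha from a lookup table. All thresholds and LUT values are hypothetical.
    """
    # Parameter estimation over the current m-sample batch.
    mean, median = np.mean(batch_curr), np.median(batch_curr)
    excess_kurtosis = stats.kurtosis(batch_curr)  # tail / non-normality indicator

    # Nonparametric, scale-independent test for a shift in location.
    _, p_value = stats.mannwhitneyu(batch_prev, batch_curr,
                                    alternative="two-sided")
    shifted = bool(p_value < 0.05)

    # Hypothetical LUT: heavy-tailed (likely non-normal, dissimilar) batches
    # push alpha toward unity; otherwise use a tail-emphasizing small alpha.
    alpha = alpha_lut[1] if excess_kurtosis > kurtosis_threshold else alpha_lut[0]
    return alpha, shifted

rng = np.random.default_rng(0)
alpha, shifted = select_alpha(rng.normal(0, 1, 200), rng.normal(0.1, 1, 200))
```

In a real tracker the selected α would then feed the α-divergence computation of Section 2, refreshed every m samples as described above.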


5. CONCLUSIONS In this paper, we compared the information-theoretic Kullback-Leibler (K-L) and Rényi (α) divergences for sensor management in target tracking. The divergence measures the dissimilarity between two distributions but cannot tell a "good" one from a "bad" one. This implies that all sensor configurations presented to the sensor manager for ranking according to the information-theoretic divergence need to be feasible, in the sense that every sensor configuration reduces the uncertainty rather than increases it, as happens in a poor geometry with nearly co-linear observations. In addition, both the difference in mean and the difference in variance affect the divergence, and the selection of α emphasizes one or the other. The findings are consistent with those in [10] that the selection of α shows a preference of size over orientation and vice versa. As part of our ongoing efforts, an approach to dynamic/adaptive selection of α over time, based on the changing underlying distribution parameters, was outlined for fine-tuning a resource management system. The α preference will be further studied for its uniqueness in sensor management solutions, robustness against measurement outliers (large-tail events), connection to other methodologies, and geometrical interpretation of uncertainty reduction. Ultimately, it will be applied to simultaneous tracking and identification (STID) with GMTI systems coupled with an HRR mode [33].

ACKNOWLEDGEMENT Research supported in part under Contract No. FA8650-08-C-1407, which is greatly appreciated.

REFERENCES

[1] Bryant, M., Johnson, P., Kent, B., Nowak, M. and Rogers, S., "Layered Sensing," AFRL Sensors Directorate, WPAFB, Dayton, OH, May (2008).
[2] Sciabica, J., "Cyberspace," Presentation at INFOTech 2007.
[3] Kullback, S. and Leibler, R.A., "On Information and Sufficiency," Annals of Mathematical Statistics, 22 (1), 79–86, (1951).
[4] Liese, F. and Vajda, I., "On Divergences and Information in Statistics and Information Theory," IEEE Transactions on Information Theory, 52 (10), 4394–4412, (2006).
[5] Csiszár, I., "Eine informationstheoretische Ungleichung und ihre Anwendung auf den Beweis der Ergodizität von Markoffschen Ketten" [An information-theoretic inequality and its application to the proof of the ergodicity of Markov chains], Magyar Tud. Akad. Mat. Kutató Int. Közl., (8), 85–108, (1963).
[6] Ali, S.M. and Silvey, S.D., "A General Class of Coefficients of Divergence of One Distribution from Another," Journal of the Royal Statistical Society, Series B, 28 (1), 131–142, (1966).
[7] Manyika, J.M. and Durrant-Whyte, H.F., "An Information-Theoretic Approach to Management in Decentralized Data Fusion," Proc. SPIE 1828, (1992).
[8] Manyika, J.M. and Durrant-Whyte, H.F., [Data Fusion and Sensor Management: A Decentralized Information-Theoretic Approach], Prentice Hall, New York, (1994).
[9] Hero, A.O., Ma, B., Michel, O. and Gorman, J.D., Alpha Divergence for Classification, Indexing and Retrieval (Revised 2), Communications and Signal Processing Laboratory Report CSPL-328, Department of EECS, University of Michigan, Ann Arbor, May 2001 (Revised Dec. 2002).
[10] Aughenbaugh, J.M. and La Cour, B.R., "Metric Selection for Information Theoretic Sensor Management," Proc. of 11th Int. Conf. on Information Fusion, Fusion 2008, (2008).
[11] Morimoto, T., "Markov Processes and the H-theorem," J. Phys. Soc. Jap., 18 (1), 328–331, (1963).
[12] Yang, C., Performance Monitoring and Prediction for Active Management of Distributed Sensors Fusion in Target Tracking, Status Report, Oct. (2010).
[13] Kadar, I., Information-Theoretic Sensor Management Analysis: Comparison of Information Theoretic Divergences for Sensor Management, Progress Report, March (2011).
[14] Hintz, K.J. and McVey, E.S., "Multi-Process Constrained Estimation," IEEE Trans. on Systems, Man, and Cybernetics, 21 (1), 237–244, (1991).
[15] Hintz, K.J., "A Measure of the Information Gain Attributable to Cueing," IEEE Trans. on Systems, Man, and Cybernetics, 21 (2), 434–442, (1991).
[16] Schmaedeke, W., "Information Based Sensor Management," Proc. SPIE 1955, (1993).

Proc. of SPIE Vol. 8050 80500C-9


[17] Schmaedeke, W. and Kastella, K.D., "Event-Averaged Maximum Likelihood Estimation and Information-Based Sensor Management," Proc. SPIE 2232, (1994).
[18] Kastella, K., "Discrimination Gain to Optimize Detection and Classification," IEEE Trans. on Systems, Man, and Cybernetics-A, 27 (1), 112–116, (1997).
[19] Schmaedeke, W. and Kastella, K., "Information Based Sensor Management and IMMKF," Proc. SPIE 3373, (1998).
[20] Blasch, E.P. and Bryant, M., "Information Assessment of SAR Data for ATR," Proceedings of IEEE National Aerospace and Electronics Conference, 414–419, (1998).
[21] Blasch, E., Derivation of a Belief Filter for Simultaneous HRR Tracking and Identification, Ph.D. Thesis, Wright State University, (1999).
[22] Blasch, E.P., Maupin, P. and Jousselme, A.-L., "Sensor-Based Allocation for Path Planning and Area Coverage Using UGSs," Proc. IEEE NAECON, (2010).
[23] Blasch, E.P., Dorion, É., Valin, P. and Bossé, E., "Ontology Alignment using Relative Entropy for Semantic Uncertainty Analysis," Proc. IEEE NAECON, (2010).
[24] Liu, Z., Blasch, E., Xue, Z., Langaniere, R. and Wu, W., "Objective Assessment of Multiresolution Image Fusion Algorithms for Context Enhancement in Night Vision: A Comparative Survey," to appear in IEEE Trans. on Pattern Analysis and Machine Intelligence, (2011).
[25] Blasch, E. and Liu, Z., "LANDSAT Satellite Image Fusion Metric Assessment," submitted to Proc. Int. Conf. on Information Fusion, (2011).
[26] Kreucher, C., Kastella, K. and Hero, A., "Information-based Sensor Management for Multitarget Tracking," Proc. SPIE 5204, 480–489, (2003).
[27] Kreucher, C., Hero, A., Kastella, K. and Shapo, B., "Information-Based Sensor Management for Simultaneous Multitarget Tracking and Identification," Proc. of the Thirteenth Annual Conference on Adaptive Sensor Array Processing (ASAP), (2005).
[28] Kreucher, C., Blatt, D., Hero, A. and Kastella, K., "Adaptive Multi-Modality Sensor Scheduling for Detection and Tracking of Smart Targets," Digital Signal Processing, 16 (5), 546–567, (2006).
[29] Hero, A.O., Ma, B., Michel, O. and Gorman, J.D., "Applications of Entropic Spanning Graphs," IEEE Signal Processing Magazine (Special Issue on Math. Imaging), 19 (5), 85–95, (2002).
[30] Hero, A.O., Costa, J. and Ma, B., Asymptotic Relations between Minimal Graphs and Alpha-Entropy, Technical Report CSPL-334, Communications and Signal Processing Laboratory, The University of Michigan, Ann Arbor, MI, Mar. (2003).
[31] Neemuchwala, H., Hero, A. and Carson, P., Image Matching Using Alpha-Entropy Measures and Entropic Graphs, Depts. of Biomedical Engineering, EECS, Statistics, and Radiology, The University of Michigan, Ann Arbor, MI, Aug. 26, (2004).
[32] http://en.wikipedia.org/wiki/Mann-Whitney_U
[33] Blasch, E., "Simultaneous Tracking and Identification for Persistent Surveillance," Plenary Talk in Proc. IEEE NAECON, Dayton, OH, July (2010).
