
  • Riemannian Optimization and the Computation of the Divergences and the Karcher Mean of

    Symmetric Positive Definite Matrices

    Xinru Yuan¹, Wen Huang², Kyle A. Gallivan¹, and P.-A. Absil³

    ¹Florida State University, ²Rice University, ³Université Catholique de Louvain

    May 7, 2018, SIAM Conference on Applied Linear Algebra

  • Symmetric Positive Definite (SPD) Matrix

    Definition

    A symmetric matrix A is called positive definite iff all its eigenvalues are positive.

    [Figure: a 2×2 SPD matrix visualized as an ellipse with semi-axes √λ_u·u and √λ_v·v, and a 3×3 SPD matrix as an ellipsoid with semi-axes √λ_u·u, √λ_v·v, and √λ_w·w, where u, v, w are the eigenvectors and λ_u, λ_v, λ_w the corresponding eigenvalues.]
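    The definition is easy to check numerically. A minimal NumPy sketch (ours, not from the slides; the helper name is_spd is our own):

    ```python
    import numpy as np

    def is_spd(A, tol=0.0):
        """A symmetric matrix is positive definite iff all eigenvalues are positive."""
        if not np.allclose(A, A.T):
            return False
        return np.linalg.eigvalsh(A).min() > tol

    A = np.array([[2.0, 1.0],
                  [1.0, 2.0]])                      # eigenvalues 1 and 3
    print(is_spd(A))                                # True
    print(is_spd(np.array([[1.0, 2.0],
                           [2.0, 1.0]])))           # False: eigenvalues are -1 and 3

    # The eigen-decomposition also gives the picture above: the unit ball
    # mapped by A^{1/2} is an ellipse with semi-axes sqrt(lam_i) * u_i.
    lam, U = np.linalg.eigh(A)
    print(np.sqrt(lam), U)
    ```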

  • Motivation of Averaging SPD Matrices

    Possible applications of SPD matrices

    - Diffusion tensors in medical imaging [CSV12, FJ07, RTM07]

    - Describing images and video [LWM13, SFD02, ASF+05, TPM06, HWSC15]

    Motivation of averaging SPD matrices

    - Aggregate several noisy measurements of the same object

    - Subtask in interpolation methods, segmentation, and clustering


  • Averaging Schemes: from Scalars to Matrices

    Let A_1, …, A_K be SPD matrices.

    Generalized arithmetic mean: (1/K) Σ_{i=1}^K A_i

    → Not appropriate in many practical applications: averaging swells the determinant, e.g. det A = 50 and det B = 50, but det((A+B)/2) = 267.56

    Generalized geometric mean: (A_1 ⋯ A_K)^{1/K}

    → Not appropriate due to non-commutativity
    → How to define a matrix geometric mean?
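    The determinant swelling is easy to reproduce. A small NumPy experiment (the random_spd helper and the resulting numbers are ours; the Minkowski determinant inequality guarantees det((A+B)/2) ≥ √(det A · det B)):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def random_spd(n):
        M = rng.standard_normal((n, n))
        return M @ M.T + n * np.eye(n)

    A, B = random_spd(3), random_spd(3)

    # The arithmetic midpoint "swells": det((A+B)/2) can greatly exceed
    # the geometric mean of the endpoint determinants.
    print(np.linalg.det(A), np.linalg.det(B))
    print(np.linalg.det((A + B) / 2))
    print(np.sqrt(np.linalg.det(A) * np.linalg.det(B)))  # what a geometric mean preserves
    ```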


  • Desired Properties of a Matrix Geometric Mean

    The desired properties are given in the ALM list¹, some of which are:

    G(A_{π(1)}, …, A_{π(K)}) = G(A_1, …, A_K), with π a permutation of (1, …, K)

    if A_1, …, A_K commute, then G(A_1, …, A_K) = (A_1 ⋯ A_K)^{1/K}

    G(A_1, …, A_K)^{−1} = G(A_1^{−1}, …, A_K^{−1})

    det(G(A_1, …, A_K)) = (det(A_1) ⋯ det(A_K))^{1/K}

    ¹T. Ando, C.-K. Li, and R. Mathias. Geometric means. Linear Algebra and Its Applications, 385:305–334, 2004
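    For K = 2 the matrix geometric mean has the closed form A # B = A^{1/2}(A^{−1/2} B A^{−1/2})^{1/2} A^{1/2}, and the determinant property above can be verified directly. A sketch (the helpers are ours):

    ```python
    import numpy as np

    def _sym_sqrt(S):
        """Principal square root of an SPD matrix via eigen-decomposition."""
        lam, U = np.linalg.eigh(S)
        return (U * np.sqrt(lam)) @ U.T

    rng = np.random.default_rng(1)

    def random_spd(n):
        M = rng.standard_normal((n, n))
        return M @ M.T + n * np.eye(n)

    A, B = random_spd(4), random_spd(4)
    Ah = _sym_sqrt(A)
    Ahi = np.linalg.inv(Ah)
    G = Ah @ _sym_sqrt(Ahi @ B @ Ahi) @ Ah      # two-matrix geometric mean A # B

    # Determinant property: det(G) = (det A * det B)^{1/2}
    print(np.linalg.det(G), np.sqrt(np.linalg.det(A) * np.linalg.det(B)))
    ```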

  • Geometric Mean of SPD Matrices

    A well-known mean on the manifold of SPD matrices is the Karcher mean [Kar77]:

      G(A_1, …, A_K) = arg min_{X ∈ S^n_{++}} (1/(2K)) Σ_{i=1}^K δ²(X, A_i),   (1)

    where δ(X, Y) = ‖log(X^{−1/2} Y X^{−1/2})‖_F is the geodesic distance under the affine-invariant metric

      g(η_X, ξ_X) = trace(η_X X^{−1} ξ_X X^{−1})

    The Karcher mean defined in (1) satisfies all the geometric properties in the ALM list [LL11]
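    The geodesic distance only needs the eigenvalues of X^{−1/2} Y X^{−1/2} (equivalently, of X^{−1} Y). A sketch of δ in NumPy (function names are ours):

    ```python
    import numpy as np

    def _sym_pow(S, p):
        """Matrix power of an SPD matrix via eigen-decomposition."""
        lam, U = np.linalg.eigh(S)
        return (U * lam**p) @ U.T

    def delta_R(X, Y):
        """Affine-invariant geodesic distance ||log(X^{-1/2} Y X^{-1/2})||_F,
        i.e. sqrt(sum_i log^2 lam_i) over the eigenvalues lam_i."""
        Xm = _sym_pow(X, -0.5)
        lam = np.linalg.eigvalsh(Xm @ Y @ Xm)
        return np.linalg.norm(np.log(lam))
    ```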

  • Algorithms

    G(A_1, …, A_K) = arg min_{X ∈ S^n_{++}} (1/(2K)) Σ_{i=1}^K δ²(X, A_i)

    Riemannian steepest descent [RA11, Ren13]

    Riemannian Barzilai-Borwein method [IP15]

    Riemannian Newton method [RA11]

    Richardson-like iteration [BI13]

    Riemannian steepest descent, conjugate gradient, BFGS, and trust-region Newton methods [JVV12]

    Limited-memory Riemannian BFGS method [YHAG16]
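    As a concrete instance of the simplest method above, the following sketch implements Riemannian steepest descent with the exponential retraction (our own illustration, with our own choices of stepsize and starting point, not the tuned implementations compared later). A fixed point of the iteration is exactly a zero of the Riemannian gradient of (1):

    ```python
    import numpy as np

    def _sym_fun(S, f):
        """Apply a scalar function to a symmetric matrix via eigen-decomposition."""
        lam, U = np.linalg.eigh(S)
        return (U * f(lam)) @ U.T

    def karcher_mean_rsd(As, alpha=1.0, iters=100, tol=1e-10):
        """Riemannian steepest descent for (1) under the affine-invariant
        metric; the stepsize alpha may need tuning for spread-out data."""
        X = sum(As) / len(As)                     # arithmetic mean as initial guess
        for _ in range(iters):
            Xh = _sym_fun(X, np.sqrt)
            Xhi = _sym_fun(X, lambda l: 1.0 / np.sqrt(l))
            # S = (1/K) sum_i log(X^{-1/2} A_i X^{-1/2}); its Frobenius norm
            # equals the Riemannian norm of grad f(X) under the metric above.
            S = sum(_sym_fun(Xhi @ A @ Xhi, np.log) for A in As) / len(As)
            if np.linalg.norm(S) < tol:
                break
            X = Xh @ _sym_fun(alpha * S, np.exp) @ Xh   # Exp_X(-alpha * grad f(X))
        return X
    ```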

  • Conditioning of the Objective Function

    [Figure: the hemstitching phenomenon for steepest descent, contrasting a well-conditioned Hessian with an ill-conditioned Hessian.]

    Small condition number ⇒ fast convergence

    Large condition number ⇒ slow convergence


  • Conditioning of the Karcher Mean Objective Function

    Riemannian metric: g_X(ξ, η) = trace(ξ X^{−1} η X^{−1})

    Euclidean metric: g_X(ξ, η) = trace(ξ η)

    Condition number κ of the Hessian at the minimizer µ:

    Hessian under the Riemannian metric:

    - κ(H_R) ≤ 1 + ln(max κ_i)/2, where κ_i = κ(µ^{−1/2} A_i µ^{−1/2})
    - κ(H_R) ≤ 20 if max κ_i = 10¹⁶

    Hessian under the Euclidean metric:

    - κ²(µ)/κ(H_R) ≤ κ(H_E) ≤ κ(H_R) κ²(µ)
    - κ(H_E) ≥ κ²(µ)/20

  • BFGS Quasi-Newton Algorithm: from Euclidean to Riemannian

    Euclidean BFGS:

    Update formula: x_{k+1} = x_k + α_k η_k  →  on a manifold, replaced by x_{k+1} = R_{x_k}(α_k η_k)

    Search direction: η_k = −B_k^{−1} grad f(x_k)

    B_k update:

      B_{k+1} = B_k − (B_k s_k s_k^T B_k) / (s_k^T B_k s_k) + (y_k y_k^T) / (y_k^T s_k),

    where s_k = x_{k+1} − x_k (replaced by R_{x_k}^{−1}(x_{k+1})) and y_k = grad f(x_{k+1}) − grad f(x_k), whose terms live on different tangent spaces
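    For reference, the Euclidean secant update is a one-liner; the Riemannian variants below replace the differences by retractions and transports (the helper name is ours):

    ```python
    import numpy as np

    def bfgs_update(B, s, y):
        """One classical BFGS secant update of the Hessian approximation B.
        The Riemannian version applies the same formula after transporting
        B, s_k, and y_k into the tangent space at the new iterate."""
        Bs = B @ s
        return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)
    ```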

  • Riemannian BFGS (RBFGS) Algorithm

    Update formula:

      x_{k+1} = R_{x_k}(α_k η_k) with η_k = −B_k^{−1} grad f(x_k)

    B_k update [HGA15]:

      B_{k+1} = B̃_k − (B̃_k s_k (B̃_k s_k)^♭) / ((B̃_k s_k)^♭ s_k) + (y_k y_k^♭) / (y_k^♭ s_k),

    where s_k = T_{α_k η_k} α_k η_k, y_k = β_k^{−1} grad f(x_{k+1}) − T_{α_k η_k} grad f(x_k), and B̃_k = T_{α_k η_k} ∘ B_k ∘ T_{α_k η_k}^{−1}.

    Stores and transports B_k^{−1} as a dense matrix

    Requires excessive computation time and storage space for large-scale problems

  • Limited-memory RBFGS (LRBFGS)

    Riemannian BFGS:

      B_{k+1} = B̃_k − (B̃_k s_k (B̃_k s_k)^♭) / ((B̃_k s_k)^♭ s_k) + (y_k y_k^♭) / (y_k^♭ s_k),

    where s_k = T_{α_k η_k} α_k η_k, y_k = β_k^{−1} grad f(x_{k+1}) − T_{α_k η_k} grad f(x_k), and B̃_k = T_{α_k η_k} ∘ B_k ∘ T_{α_k η_k}^{−1}

    Limited-memory Riemannian BFGS:

    - Stores only the m most recent pairs s_k and y_k
    - Transports those vectors to the new tangent space rather than the entire matrix B_k^{−1}
    - Computational and storage complexity depends upon m (see the two-loop sketch below)
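    In practice the limited-memory update is applied through the standard two-loop recursion. A Euclidean sketch (ours); the Riemannian version first transports the stored pairs into the current tangent space and evaluates the inner products in the metric:

    ```python
    import numpy as np

    def two_loop(grad, S, Y):
        """Apply the L-BFGS inverse-Hessian approximation, built from the m
        most recent (s, y) pairs (oldest first in S, Y), to the gradient.
        The search direction is then -two_loop(grad, S, Y)."""
        q = grad.copy()
        rhos = [1.0 / (y @ s) for s, y in zip(S, Y)]
        alphas = []
        for s, y, rho in reversed(list(zip(S, Y, rhos))):
            a = rho * (s @ q)
            alphas.append(a)
            q -= a * y
        # Initial Hessian scaling gamma_k = (s^T y)/(y^T y) from the newest pair.
        gamma = (S[-1] @ Y[-1]) / (Y[-1] @ Y[-1]) if S else 1.0
        r = gamma * q
        for (s, y, rho), a in zip(zip(S, Y, rhos), reversed(alphas)):
            b = rho * (y @ r)
            r += (a - b) * s
        return r
    ```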

  • Implementations

    Representations of tangent vectors: T_X S^n_{++} = {S ∈ R^{n×n} | S = S^T}

    - Extrinsic representation: n²-dimensional vector
    - Intrinsic representation: d-dimensional vector, where d = n(n+1)/2

    Retraction

    - Exponential mapping: Exp_X(ξ) = X^{1/2} exp(X^{−1/2} ξ X^{−1/2}) X^{1/2}
    - Second-order approximation retraction [JVV12]: R_X(ξ) = X + ξ + (1/2) ξ X^{−1} ξ

    Vector transport

    - Parallel translation: T_{p_η}(ξ) = Q ξ Q^T, with Q = X^{1/2} exp(X^{−1/2} η X^{−1/2} / 2) X^{−1/2}
    - Vector transport by parallelization [HAG15]: essentially an identity

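    A sketch of these maps in NumPy (ours). Note that the vech coordinates below are orthonormal for the Frobenius inner product; the talk's intrinsic representation instead uses a basis orthonormal with respect to the affine-invariant metric at X, which is what lets that metric reduce to a Euclidean dot product:

    ```python
    import numpy as np

    def _sym_fun(S, f):
        lam, U = np.linalg.eigh(S)
        return (U * f(lam)) @ U.T

    def exp_map(X, xi):
        """Exponential mapping Exp_X(xi) = X^{1/2} exp(X^{-1/2} xi X^{-1/2}) X^{1/2}."""
        Xh = _sym_fun(X, np.sqrt)
        Xhi = _sym_fun(X, lambda l: 1.0 / np.sqrt(l))
        return Xh @ _sym_fun(Xhi @ xi @ Xhi, np.exp) @ Xh

    def retraction(X, xi):
        """Cheaper second-order approximation R_X(xi) = X + xi + (1/2) xi X^{-1} xi."""
        return X + xi + 0.5 * xi @ np.linalg.solve(X, xi)

    def vech(S):
        """d = n(n+1)/2 coordinates of a symmetric matrix: the diagonal plus
        the sqrt(2)-scaled strict upper triangle, so the ordinary dot product
        of coordinates reproduces the Frobenius inner product trace(S1 S2)."""
        iu = np.triu_indices(S.shape[0], k=1)
        return np.concatenate([np.diag(S), np.sqrt(2) * S[iu]])
    ```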

  • Complexity Comparison for LRBFGS

    Per-iteration cost, extrinsic vs. intrinsic representation:

    Function and Riemannian gradient
    - Identical in both approaches (the f + ∇f cost)

    Retraction
    - Extrinsic: evaluate R_X(η)
    - Intrinsic: first recover η from its intrinsic coordinates η̃_d, then evaluate R_X(η); intrinsic cost = extrinsic cost + 2n³ + o(n³)

    Riemannian metric
    - Extrinsic: 6n³ + o(n³)
    - Intrinsic: reduces to the Euclidean metric, n² + o(n²)

    Vector transport
    - Extrinsic: 2m vector transports per iteration
    - Intrinsic: no explicit vector transport

    Total per-iteration complexity:
    - Extrinsic: f + ∇f + 27n³ + 12mn² + 2m × (vector transport cost)
    - Intrinsic: f + ∇f + 22n³/3 + 4mn²

  • Numerical Results: Comparison of Different Algorithms

    K = 100, size = 3×3, d = 6

    [Figure: evolution of the averaged distance dist(µ, X_t) between the current iterate and the exact Karcher mean, with respect to iterations and time. Top row: 1 ≤ κ(A_i) ≤ 200; bottom row: 10³ ≤ κ(A_i) ≤ 10⁷. Methods: RL iteration, RSD-QR, RBB, LRBFGS (m = 2 and 4), RBFGS.]

  • Numerical Results: Comparison of Different Algorithms

    K = 30, size = 100×100, d = 5050

    [Figure: evolution of the averaged distance dist(µ, X_t) between the current iterate and the exact Karcher mean, with respect to iterations and time. Top row: 1 ≤ κ(A_i) ≤ 20; bottom row: 10⁴ ≤ κ(A_i) ≤ 10⁷. Methods: RL iteration, RSD-QR, RBB, LRBFGS (m = 2 and 8), RBFGS.]

  • Numerical Results: Riemannian vs. Euclidean Metrics

    [Figure: evolution of the averaged distance dist(µ, X_t) between the current iterate and the exact Karcher mean, with respect to iterations and time. Top row: K = 100, n = 3, 1 ≤ κ(A_i) ≤ 10⁶; bottom row: K = 30, n = 100, 1 ≤ κ(A_i) ≤ 10⁵. Methods: RSD, LRBFGS (m = 0 and 2), and RBFGS, each under the Riemannian (Rie) and the Euclidean (Euc) metric.]

  • Outline

    Karcher mean computation on Sn++

    Divergence-based means on Sn++

    L1-norm median computation on Sn++

    Application: Structure tensor image denoising

    Summary


  • Motivations

    Karcher mean

    K(A_1, …, A_K) = arg min_{X ∈ S^n_{++}} (1/(2K)) Σ_{i=1}^K δ²(X, A_i),   (1)

    where δ(X, Y) = ‖log(X^{−1/2} Y X^{−1/2})‖_F

      pros: satisfies the desired geometric properties
      cons: high computational cost

    Use divergences as alternatives to the geodesic distance due to their computational and empirical benefits

    A divergence is like a distance except that it lacks

      the triangle inequality
      symmetry

  • LogDet α-divergence and Associated Mean

    The LogDet α-divergence mean is defined as

      G(A_1, …, A_K) = arg min_{X ∈ S^n_{++}} (1/(2K)) Σ_{i=1}^K δ²_{LD,α}(A_i, X),   (2)

    where the LogDet α-divergence on S^n_{++} is given by

      δ²_{LD,α}(X, Y) = (4/(1 − α²)) log [ det((1−α)/2 · X + (1+α)/2 · Y) / (det(X)^{(1−α)/2} det(Y)^{(1+α)/2}) ]

    The LogDet α-divergence is asymmetric in general, except for α = 0

    (2) defines the right mean. The left mean can be defined in a similar way.
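    The divergence itself is cheap: three log-determinants. A sketch using slogdet for numerical stability (the function name is ours):

    ```python
    import numpy as np

    def logdet_alpha_div(X, Y, alpha):
        """Squared LogDet alpha-divergence of SPD matrices X, Y, alpha in (-1, 1).
        Nonnegative by concavity of log det, and zero iff X = Y."""
        a, b = (1 - alpha) / 2, (1 + alpha) / 2
        _, ld_mix = np.linalg.slogdet(a * X + b * Y)
        _, ld_X = np.linalg.slogdet(X)
        _, ld_Y = np.linalg.slogdet(Y)
        return 4.0 / (1 - alpha**2) * (ld_mix - a * ld_X - b * ld_Y)
    ```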

  • Karcher Mean vs. LogDet α-divergence Mean

    Complexity comparison for problem-related operations:

      Mean             function   gradient   total
      LD α-div. mean   2Kn³/3     3Kn³       11Kn³/3
      Karcher mean     18Kn³      5Kn³       23Kn³

    Invariance properties:

      Mean             scaling   rotation   congruence   inversion
      LD α-div. mean   ✓         ✓          ✓            ✗
      Karcher mean     ✓         ✓          ✓            ✓

  • Numerical Experiment: Comparisons of Different Algorithms

    K = 100, n = 3, and 10 ≤ κ(A_i) ≤ 10⁶; fixed-point iteration² with stepsize (1 − α)/(2K)

    [Figure: evolution of dist(µ, X_t) with respect to iterations and time, for α = 0.9, α = 0.5, and α = 0. Methods: FP, RSD, RBB, LRBFGS, RBFGS, RNewton.]

    ²Z. Chebbi and M. Moakher. Means of Hermitian positive-definite matrices based on the log-determinant α-divergence function. Linear Algebra and its Applications, 436(7):1872–1889, 2012

  • Numerical Experiment: Comparisons of Different Algorithms

    K = 100, n = 3, and 10 ≤ κ(A_i) ≤ 10⁶

    [Figure: evolution of dist(µ, X_t) over the first 50 iterations and 0.01 s, for α = 0.9, α = 0.5, and α = 0. Methods: FP, RSD, RBB, LRBFGS, RBFGS, RNewton.]

  • Outline

    Karcher mean computation on Sn++

    Divergence-based means on Sn++

    L1-norm median computation on Sn++

    Application: Structure tensor image denoising

    Conclusion


  • Motivations

    The mean of a set of points is sensitive to outliers

    The median is robust to outliers

    Figure: The geometric mean and median in R² (legend: points, outliers, mean, median).

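    In the Euclidean setting of this figure, the geometric median can be computed with the classical Weiszfeld iteration. A small demonstration of mean vs. median robustness (the data and constants are ours):

    ```python
    import numpy as np

    rng = np.random.default_rng(2)
    pts = np.vstack([rng.normal([2, 2], 0.5, (50, 2)),   # inliers
                     rng.normal([9, 9], 0.3, (5, 2))])   # outliers

    mean = pts.mean(axis=0)

    # Weiszfeld iteration for the Euclidean geometric median.
    med = mean.copy()
    for _ in range(200):
        d = np.linalg.norm(pts - med, axis=1)
        d = np.maximum(d, 1e-12)          # guard against division by zero
        med = (pts / d[:, None]).sum(axis=0) / (1.0 / d).sum()

    print("mean:", mean)    # dragged toward the outliers
    print("median:", med)   # stays near the inlier cloud
    ```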

  • Riemannian Median of SPD Matrices

    The Riemannian median of a set of SPD matrices is defined as:

    M(A_1, …, A_K) = arg min_{X ∈ S^n_{++}} (1/(2K)) Σ_{i=1}^K δ(A_i, X),

    where δ(X, Y) is a distance or the square root of a divergence function.

    The cost function is non-smooth at X = A_i

    There may exist multiple local minima for some choices of δ

  • Numerical Experiment

    Original data (K = 50) with added outliers

    [Table: averaged Riemannian distance δ_R for the median and the mean as the number of outliers grows: 0, 1, 5, 10, 25, 50.]

  • Outline

    Karcher mean computation on Sn++

    Divergence-based means on Sn++

    Riemannian L1-norm median computation on Sn++

    Application: Structure tensor image denoising

    Conclusion


  • Application: Structure Tensor Image Denoising

    A structure tensor image is a spatially structured matrix field

      I : Ω ⊂ Z² → S^n_{++}

    Noisy tensor images are simulated by replacing pixel values with an outlier tensor with a given probability Pr

    Denoising is done by averaging the matrices in the neighborhood of each pixel

    Mean Riemannian Error:

      MRE = (1/#Ω) Σ_{(i,j)∈Ω} δ_R(I_{i,j}, Ĩ_{i,j})

    [Figures: the original tensor image and a noisy tensor image.]
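    Given the geodesic distance δ_R from earlier, the MRE is a simple average over the pixel grid. A sketch (ours, with the tensor field stored as a dict over Ω):

    ```python
    import numpy as np

    def _sym_pow(S, p):
        lam, U = np.linalg.eigh(S)
        return (U * lam**p) @ U.T

    def delta_R(X, Y):
        Xm = _sym_pow(X, -0.5)
        return np.linalg.norm(np.log(np.linalg.eigvalsh(Xm @ Y @ Xm)))

    def mre(I, I_tilde):
        """Mean Riemannian Error between the original and denoised tensor
        fields, each stored as {(i, j): SPD matrix} over the same grid."""
        assert I.keys() == I_tilde.keys()
        return sum(delta_R(I[p], I_tilde[p]) for p in I) / len(I)
    ```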

  • Structure Tensor Image Denoising: Pr = 0.02

    [Figure: denoising results for the different means and medians at Pr = 0.02.]

  • Structure Tensor Image Denoising: MRE and Time

    MRE comparison (Pr = 0.02):

      Mean:     A-m     LE-m    J-m     K-m     R-median   α-m     α-median
      MRE:      1.089   0.263   0.616   0.264   0.026      0.125   0.029

    Time comparison (Pr = 0.02):

      Mean:     A-m     LE-m    J-m     K-m     R-median   α-m     α-median
      Time (s): 0.05    0.40    0.30    7.12    16.53      3.62    11.76

  • Structure Tensor Image Denoising: Pr = 0.1

    [Figure: denoising results for the different means and medians at Pr = 0.1.]

  • Structure Tensor Image Denoising: MRE and Time

    MRE comparison (Pr = 0.1):

      Mean:     A-m     LE-m    J-m     K-m     R-median   α-m     α-median
      MRE:      3.959   0.947   2.071   0.948   0.078      0.407   0.055

    Time comparison (Pr = 0.1):

      Mean:     A-m     LE-m    J-m     K-m     R-median   α-m     α-median
      Time (s): 0.04    0.40    0.31    6.47    14.31      4.66    11.32

  • Summary

    Karcher mean for SPD matrices

    - Analyze the conditioning of the Hessian of the Karcher mean cost function
    - Apply a limited-memory Riemannian BFGS method, with efficient implementations, to computing the SPD Karcher mean
    - Recommend using LRBFGS as the default method for the SPD Karcher mean computation

    Other averaging techniques for SPD matrices

    - Investigate divergence-based means and Riemannian L1-norm medians on S^n_{++}
    - Use recent developments in Riemannian optimization to develop efficient and robust algorithms on S^n_{++}
    - Evaluate the performance of different averaging techniques in applications

  • Thank you!


  • References I

    Ognjen Arandjelovic, Gregory Shakhnarovich, John Fisher, Roberto Cipolla, and Trevor Darrell. Face recognition with image sets using manifold density divergence. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), volume 1, pages 581–588. IEEE, 2005.

    D. A. Bini and B. Iannazzo. Computing the Karcher mean of symmetric positive definite matrices. Linear Algebra and its Applications, 438(4):1700–1710, 2013.

    Guang Cheng, Hesamoddin Salehian, and Baba Vemuri. Efficient recursive algorithms for computing the mean diffusion tensor and applications to DTI segmentation. In Computer Vision – ECCV 2012, pages 390–401, 2012.

  • References II

    P. T. Fletcher and S. Joshi. Riemannian geometry for the statistical analysis of diffusion tensor data. Signal Processing, 87(2):250–262, 2007.

    Wen Huang, P.-A. Absil, and Kyle A. Gallivan. A Riemannian symmetric rank-one trust-region method. Mathematical Programming, 150(2):179–216, 2015.

    Wen Huang, Kyle A. Gallivan, and P.-A. Absil. A Broyden class of quasi-Newton methods for Riemannian optimization. SIAM Journal on Optimization, 25(3):1660–1685, 2015.

    Zhiwu Huang, Ruiping Wang, Shiguang Shan, and Xilin Chen. Face recognition on large-scale video in the wild with hybrid Euclidean-and-Riemannian metric learning. Pattern Recognition, 48(10):3113–3124, 2015.

  • References III

    Bruno Iannazzo and Margherita Porcelli. The Riemannian Barzilai-Borwein method with nonmonotone line-search and the Karcher mean computation. Optimization Online, December 2015.

    B. Jeuris, R. Vandebril, and B. Vandereycken. A survey and comparison of contemporary algorithms for computing the matrix geometric mean. Electronic Transactions on Numerical Analysis, 39:379–402, 2012.

    H. Karcher. Riemannian center of mass and mollifier smoothing. Communications on Pure and Applied Mathematics, 1977.

    J. Lawson and Y. Lim. Monotonic properties of the least squares mean. Mathematische Annalen, 351(2):267–279, 2011.

  • References IV

    Jiwen Lu, Gang Wang, and Pierre Moulin. Image set classification using holistic multiple order statistics features and localized multi-kernel metric learning. In Proceedings of the IEEE International Conference on Computer Vision, pages 329–336, 2013.

    Q. Rentmeesters and P.-A. Absil. Algorithm comparison for Karcher mean computation of rotation matrices and diffusion tensors. In 19th European Signal Processing Conference, pages 2229–2233, August 2011.

    Q. Rentmeesters. Algorithms for data fitting on some common homogeneous spaces. PhD thesis, Université catholique de Louvain, 2013.

  • References V

    Y. Rathi, A. Tannenbaum, and O. Michailovich. Segmenting images on the tensor manifold. In IEEE Conference on Computer Vision and Pattern Recognition, pages 1–8, June 2007.

    Gregory Shakhnarovich, John W. Fisher, and Trevor Darrell. Face recognition from long-term observations. In European Conference on Computer Vision, pages 851–865. Springer, 2002.

    Oncel Tuzel, Fatih Porikli, and Peter Meer. Region covariance: a fast descriptor for detection and classification. In European Conference on Computer Vision, pages 589–600. Springer, 2006.

    Xinru Yuan, Wen Huang, P.-A. Absil, and Kyle A. Gallivan. A Riemannian limited-memory BFGS algorithm for computing the matrix geometric mean. Procedia Computer Science, 80:2147–2157, 2016.