Hyperspectral video using CUDA (on-demand.gputechconf.com/gtc/2013/presentations/s3030...title...)

Hyperspectral video using CUDA

Robert Dunn, PhD Candidate

Outline

Hyperspectral Imaging

Our Research

Problems

Results

Future Work

Imaging Spectroscopy

[Figure: plot with horizontal axis from 0 to 250 and vertical axis from 0 to 1]

Linear Mixing Model

$$x = \sum_{i=1}^{M} a_i s_i + w = S a + w$$

x: Pixel Spectrum

S = [s_1, ..., s_M]: Endmembers

a: Abundances
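As a minimal sketch of the linear mixing model, the Python below forms a pixel spectrum x = S a + w from M endmember spectra and an abundance vector, with w taken here to be an optional additive noise term. The function name `mix_pixel`, the three-band spectra and the abundances are hypothetical values chosen for illustration, not data from the talk.

```python
def mix_pixel(endmembers, abundances, noise=None):
    """Return x = S a + w, with `endmembers` given as a list of M spectra."""
    n_bands = len(endmembers[0])
    if noise is None:
        noise = [0.0] * n_bands  # noise-free pixel by default
    pixel = []
    for band in range(n_bands):
        # Sum the abundance-weighted contribution of every endmember in this band.
        value = sum(a * s[band] for a, s in zip(abundances, endmembers))
        pixel.append(value + noise[band])
    return pixel


# Hypothetical endmember spectra (3 bands each) and abundances that sum to 1.
S = [[0.2, 0.4, 0.8],
     [0.6, 0.2, 0.0]]
a = [0.5, 0.5]
x = mix_pixel(S, a)
```

With equal abundances the mixed pixel is simply the band-wise average of the two endmember spectra.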

Dimension Reduction

Principal Components Analysis

1. Covariance Matrix

2. Eigendecomposition

3. Transformation
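The three steps above can be sketched on the CPU in plain Python. This is an illustration under stated assumptions rather than the implementation used in the talk: power iteration stands in for a full eigendecomposition and recovers only the leading principal component, and the two-band pixel values and all function names are hypothetical.

```python
import math


def covariance_matrix(pixels):
    """Step 1: band covariance; `pixels` is a list of spectra."""
    count, n_bands = len(pixels), len(pixels[0])
    mean = [sum(p[b] for p in pixels) / count for b in range(n_bands)]
    centred = [[p[b] - mean[b] for b in range(n_bands)] for p in pixels]
    cov = [[sum(c[i] * c[j] for c in centred) / (count - 1)
            for j in range(n_bands)] for i in range(n_bands)]
    return mean, cov


def leading_eigenvector(matrix, iterations=200):
    """Step 2 (simplified): power iteration for the dominant eigenvector."""
    size = len(matrix)
    vec = [1.0 / math.sqrt(size)] * size
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(size))
               for i in range(size)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]  # renormalise to unit length
    return vec


def project(pixels, mean, component):
    """Step 3: project mean-centred pixels onto one principal component."""
    return [sum((p[b] - mean[b]) * component[b] for b in range(len(mean)))
            for p in pixels]


# Hypothetical two-band pixels lying close to the line band2 = band1.
pixels = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.8], [5.0, 5.0]]
mean, cov = covariance_matrix(pixels)
pc1 = leading_eigenvector(cov)
scores = project(pixels, mean, pc1)
```

Because the sample bands are strongly correlated, the leading component points roughly along the diagonal, and the one-dimensional `scores` retain most of the variance.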

[Figure: two scatter plots with axes labelled "1 Intensity" and "2 Intensity"; one spans -3 to 3 on both axes, the other 0 to 10]

N-FINDR

[Figure: Data, Simplex 1, Simplex 2, Simplex 3]

Endmember Determination

Determinant

LU Decomposition

Semi-Parallelisable Search
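N-FINDR selects the set of pixels whose simplex has the largest volume, and that volume comes from a determinant. The sketch below computes the determinant by LU-style elimination with partial pivoting and converts it to a simplex volume. It is a CPU illustration only; the function names and the triangle used as input are hypothetical, and the search over candidate pixels is omitted.

```python
import math


def determinant_lu(matrix):
    """Determinant by LU decomposition (elimination with partial pivoting)."""
    a = [row[:] for row in matrix]  # work on a copy
    n = len(a)
    det = 1.0
    for col in range(n):
        # Pick the largest pivot in this column for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0  # singular: the candidate simplex is degenerate
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
    return det


def simplex_volume(vertices):
    """Volume of a simplex with M vertices in (M - 1) dimensions."""
    m = len(vertices)
    # Row of ones on top, then one row per dimension, one column per vertex.
    augmented = [[1.0] * m] + [[v[d] for v in vertices] for d in range(m - 1)]
    return abs(determinant_lu(augmented)) / math.factorial(m - 1)


# Hypothetical candidates in a 2-D reduced space: a right triangle of area 0.5.
triangle = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
volume = simplex_volume(triangle)
```

In a search, each trial replacement of one vertex needs its own determinant. Those evaluations are independent of one another, while accepting a replacement is a sequential decision.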

Abundance Mapping

$$\hat{a} = \arg\min_{a} \| x - S a \|^2$$
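One way to carry out this minimisation is unconstrained least squares through the normal equations (S^T S) a = S^T x. The sketch below assumes linearly independent endmembers and applies no non-negativity or sum-to-one constraint; the talk does not say which solver or constraints were used, and the function names and sample values are hypothetical.

```python
def solve_linear(matrix, rhs):
    """Solve matrix * a = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution on the upper-triangular system.
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - tail) / aug[row][row]
    return solution


def estimate_abundances(endmembers, pixel):
    """Unconstrained least squares: solve (S^T S) a = S^T x."""
    m = len(endmembers)
    gram = [[sum(p * q for p, q in zip(endmembers[i], endmembers[j]))
             for j in range(m)] for i in range(m)]
    proj = [sum(s * x for s, x in zip(endmembers[i], pixel)) for i in range(m)]
    return solve_linear(gram, proj)


# Hypothetical endmembers (one spectrum per row) and a noise-free mixed pixel.
S = [[0.2, 0.4, 0.8],
     [0.6, 0.2, 0.0]]
x = [0.3, 0.35, 0.6]  # equals 0.75 * S[0] + 0.25 * S[1]
a_hat = estimate_abundances(S, x)
```

For a noise-free pixel the estimate recovers the mixing weights; with noise it returns the abundances that minimise the squared residual.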

Current Research

Real-time terrestrial HSI

◦ Forensics

◦ Manufacturing

◦ Military

CUDA Simulations

◦ Visualisation Problems

◦ Algorithm Optimisation

Problem - syrk

[Figure: cublasSsyrk Error; horizontal axis 2 to 256 in powers of two, vertical axis 2.50E-05 to 3.70E-05]

[Figure: cublasSsyrk Utilisation; horizontal axis 2 to 256 in powers of two, vertical axis 0.00% to 100.00% in decades]

Syrk performance with ‘fat matrix’

$$Y = X X^T, \qquad y_{ij} = \sum_{n=1}^{p} x_{in} x_{jn}$$

p=200,000
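For reference, the symmetric rank-k product Y = X X^T can be written directly from the element formula. The sketch below is a plain CPU version for checking results, not a model of how cublasSsyrk is implemented. The function name and the small sample matrix are hypothetical. BLAS syrk routines fill only one triangle of Y; this version mirrors the result into both for convenience.

```python
def syrk_reference(x):
    """Return Y = X X^T for X given as n rows of p samples each."""
    n = len(x)
    y = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):  # Y is symmetric: compute the upper triangle only
            total = sum(a * b for a, b in zip(x[i], x[j]))
            y[i][j] = total
            y[j][i] = total  # mirror into the lower triangle
    return y


# Hypothetical 2 x 4 'fat' matrix: few rows (n), many columns (p).
X = [[1.0, 2.0, 3.0, 4.0],
     [0.0, 1.0, 0.0, 1.0]]
Y = syrk_reference(X)
```

Each output element is a dot product of length p, so with p=200,000 and a small number of rows the work is dominated by long reductions that produce very few outputs.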

Problem - syrk

[Figure: Sub-Block Algorithm Error; horizontal axis 2 to 256 in powers of two, vertical axis 0.00E+00 to 1.00E-07]

[Figure: Sub-Block vs cublasSsyrk; horizontal axis 2 to 256 in powers of two, vertical axis "Speed Improvement", 0.1 to 100]

$$y_{ij} = \sum_{n=1}^{k} x_{in} x_{jn} + \sum_{n=k+1}^{2k} x_{in} x_{jn} + \cdots$$

p=200,000
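The sub-block idea splits the long sum over n into partial sums over column blocks of width k and accumulates them. A minimal CPU sketch follows. The function name, the sample matrix and the block width are hypothetical, and the sketch says nothing about how the blocks were mapped onto GPU threads in the talk.

```python
def syrk_sub_block(x, block):
    """Accumulate Y = X X^T from partial products over column blocks."""
    n, p = len(x), len(x[0])
    y = [[0.0] * n for _ in range(n)]
    for start in range(0, p, block):
        stop = min(start + block, p)  # the last block may be narrower
        for i in range(n):
            for j in range(i, n):
                # Partial dot product over this block of columns only.
                partial = sum(x[i][c] * x[j][c] for c in range(start, stop))
                y[i][j] += partial
                if i != j:
                    y[j][i] += partial  # keep Y symmetric
    return y


# Hypothetical 2 x 5 matrix processed in blocks of two columns.
X = [[1.0, 2.0, 3.0, 4.0, 5.0],
     [0.0, 1.0, 0.0, 1.0, 2.0]]
Y_blocked = syrk_sub_block(X, block=2)
```

The result does not depend on the block width for these exactly representable values. In single precision, the grouping of partial sums changes the rounding behaviour, which is one plausible contributor to the error difference shown in the plots.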

Problem – Small matrix

GPU inefficient for small matrices.

Host-Device transfers inefficient.

◦ Synchronisation

Batch processing not suitable.

Problem – MAGMA/CULA

Advanced Linear Algebra

◦ EVD, LU, QR etc.

Hybrid CPU/GPU

◦ Designed for large matrices

Problem - Streaming

Streaming and Synchronisation

‘Assembling Blocks’ vs ‘Complete Kernel’

[Figure: two timelines, each with a CPU row and a GPU row]

Problem - Overheads

[Figure: CPU, Single Frame]

Problem – Testing

CUDA + C              MATLAB

Fast Performance      Poor Performance

Unproven Code         Proven Code

Developer Intensive   Simple Development

[Diagram: CUDA HSI, MEX, MATLAB, Monitoring Thread, Named Pipe]

Experimental Setup

Nvidia Tesla C2070 ‘Fermi’

Intel Xeon W3520 @ 2.67 GHz

12 GB System RAM

Simulated Data (Moffett Field)

512x512x190 Frame

Results

Future CUDA Optimisations

Kepler Architecture (GK110)

◦ Dynamic Parallelism

◦ Hyper-Q

“Complete Kernels”

Portable Platforms / Tegra

Streaming Variants (EVD, QR, LS, etc)

Conclusion

Working ‘real-time’ HSI video chain

Further optimisation required

Visualisation challenges

GK110 will help

Questions?

Robert Dunn

[email protected]

Dr. Mark Andrews

[email protected]
