lab project gpu programming ss12 - tum · gpu programming lab project optical flow and super...

45
SS12 GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 Oliver Dunkley 03631802

Upload: others

Post on 18-Jun-2020

18 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases

SS12

GPU ProgrammingLab Project

Optical Flow and Super Resolution

Ross Kidson 03627521Oliver Dunkley 03631802

Page 2: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases

Introduction

Contents○ Implementation environment, methods○ Optical Flow

■ example■ theory■ implementation■ results & performance

○ Super Resolution■ theory■ implementation■ results

Page 3: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases

Implementation

● nsight, ubuntu● two code bases● focused on benchmarking/comparing

memory types● python scripts & open office

Page 4: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases

Benchmarking methods

● GPU - NVidia Visual Profiler● CPU - custom 'benchmark' class to reproduce the same

data

Benchmark::instance()->addEvent(Benchmark::start, "add_flow_fields"); for(unsigned int p=0;p<nx_fine*ny_fine;p++){

_u1[p] += _u1lvl[p];_u2[p] += _u2lvl[p];

}Benchmark::instance()->addEvent(Benchmark::end, "add_flow_fields");

.

.

.

Benchmark::instance()->doBenchmark(); //Save collected data to file

Page 5: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases

Optical flow

"Pattern of apparent motion of objects, surfaces, and edges in a visual scene caused by the relative motion between an observer and the scene"

Usages● motion detection● object segmentation● time-to-collision ● motion compensated encoding● stereo disparity measurement

Page 6: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases

Optical flow: Example input street 1

Page 7: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases

Optical flow: Example input street 2

Page 8: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases

Optical flow: Example output

Page 9: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases

Optical flow

Page 10: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases

Optical flow Example input Tron 1

Page 11: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases

Optical flow Example input Tron 2

Page 12: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases

Optical flow Overlayed output

Page 13: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases

Optical flow: Formulation

Given two images I1 and I2 one computes a field vector u that matches image intensities:

Formulate as an energy function to minimize

Page 14: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases

Optical flow: Formulation

Apply taylor series expansion and robust penalty term:

This can be solved with Euler-Lagrange equations and reformulated into ax = b form, which can then be solved using SOR

Page 15: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases

resampleAreaParallel(u(x)k-1 )

backwardRegistrationBilinearFunctionTex(I2 , u(x)

setKernel(du(x),0)

sorflow_update_robustifications_warp_tex(u(x), du(x))

sorflow_update_righthandside_tex(u(x), b

sorflow_nonlinear_warp_sor_tex(b, ) du(x)

addKernel(u(x),du(x))

inner iteration (SOR) loop

outer iteration (update robustifications) loop

next pyramid level loop

coarse --> fineOutput: u(x)

1. Resize initial u(x) to current level

2. Warp the original image at this level by resized flow field u(x)

3. Set du to 0

4. Update robustification terms based on current u(x) and du(x)

5. Update right hand side of equation to solve - b term

6. Solve for du(x)

7. Update robustifications and b and resolve for du(x)

8. add du to u

9. Continue to next pyramid level

u(x)k

I2_warp

Page 16: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases

Optical Flow ResultsOuter Iterations

40

40

101

1

10

Inne

r Ite

ratio

ns

x145

x147

x163x162

x141

x137x104

x143

x156

Page 17: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases

Applying GPU Techniques

● registers

● constant memory

● shared memory

● texture memory

● global memory

● kernels, blocks, pitch, warp

Page 18: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases

Variables tested

Already in Texture Memory● Image gradients Ix, Iy, It● backwardRegistration - I2

Added to Texture Memory● ● u(x)

Added to Shared Memory● u(x)● du(x)

resampleAreaParallel(u(x)k-1 )

backwardRegistrationBilinearFunctionTex(I2 , u(x))

setKernel(du(x),0)

sorflow_update_robustifications_warp_tex(u(x), du(x))

sorflow_update_righthandside_tex(u(x), )

sorflow_nonlinear_warp_sor_tex(b, )

addKernel(u(x),du(x))

inner iteration (SOR) loop

outer iteration (update robustifications) loop

next pyramid level loop

Output: u(x)

Page 19: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases

Optical Flow Results

Page 20: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases

Optical Flow Results

Page 21: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases

Optical Flow Results

Page 22: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases

Optical Flow Results

Page 23: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases

GPU Techniques - Small Hack 1

if (x == 0) shared_mem[0][ty] = shared_mem[tx][ty]; //one at a timevs

if (x == 0){ shared_mem1[0][ty] = shared_mem1[tx][ty]; //1 to N in parallel shared_memN[0][ty] = shared_mem2[tx][ty]; }

Page 24: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases

Optical Flow Results

Page 25: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases

Optical Flow Results

Page 26: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases

Super Resolution: Formulation

Given a number of degraded observations of image I that have been transformed by linear operations , a set of linear equations can be obtained:

Page 27: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases

Super Resolution: Formulation

Energy function with regularity penalty term:

Page 28: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases

Super Resolution: General Method

Super Res image I

Lower resolution images In

I1 I2 I3 I4 I6 I7 I8 I9I

Page 29: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases

Super Resolution: General Method

1. Warp inital guess image I to I1

(BackwardRegistation)

I1

Iwarped

I2 I3 I4 I6 I7 I8 I9I

Page 30: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases

Super Resolution: General Method

2. Shrink warped image and perform gaussian blur

I1

Iwarped

I2 I3 I4 I6 I7 I8 I9I

Page 31: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases

Super Resolution: General Method

3. Subtract images

I1

Iwarped

(-)

Idiff

I2 I3 I4 I6 I7 I8 I9I

Page 32: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases

Super Resolution: General Method

4. Resample difference image back to larger resolution

I1

Idiff resampled

I2 I3 I4 I6 I7 I8 I9I

Page 33: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases

Super Resolution: General Method

5. Warp back to middle image.

(ForwardRegistration)

I1

Idiff resampled warped

I2 I3 I4 I6 I7 I8 I9I

Page 34: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases

Super Resolution: General Method

6. Repeat for all other images and add all difference images together

I1

I2 diff resampled warped

I3 diff resampled warped

I4 diff resampled warped

In diff resampled warped

I2 I3 I4 I6 I7 I8 I9I

Page 35: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases

Super Resolution: General Method

7. Repeat the entire process until image converges

I1

I2 diff resampled warped

I3 diff resampled warped

I4 diff resampled warped

In diff resampled warped

I2 I3 I4 I6 I7 I8 I9I

Page 36: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases

Image loop 1:Calculate diff images

For each image In:

backwardRegistrationBilinearValueTex(uor(x), u(x))

gaussBlurSeparateMirrorGpu() [const memory!]

resampleAreaParallelSeparate()

dualL1Difference()

Results in a vector of original sized difference images

- =

shrink

Page 37: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases

Image loop 2: Warp difference images forward

For each difference image

resampleAreaParallelSeparateAdjoined()

gaussBlurSeparateMirrorGpu()

forewardRegistrationBilinearAtomic()

addKernel(diff_image, temp_accumulator)

Results in an image of all summed differences together

Page 38: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases

calculate for # number of outer iterations

Putting it together (for one image)

Image loop 1: Calculate difference Images

Image loop 2: Morph and sum difference images

xi1, xi2dualTVHuber(uor)

Final Result

primal1N(xi1,xi2,difference_sum,u,uor) uor(x), u(x)

setKernel(differnce_sum,0)

Page 39: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases

Super Resolution Output &Cuda Tex2d(...) differences

differencemanual cuda tex2d(...)original

Page 40: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases

Super Resolution GPU Performance

22x Speed-up

Page 41: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases

Super Resolution Performance

Page 42: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases

GPU Techniques - Small Hacks 2

const int tx_1 = tx == 0 ? tx : tx - 1;vs

const int tx_1 = tx - 1 * (x > 0);

Page 43: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases

Debugging techniques: Compare Image class

● Automatically compare cpu to gpu images○ Compares pixel values○ Errors not always visible

● Outputs which failed, stats, display image differences● Facilitates code testing after optimizations/hacks● Problem with texture memory (low similarity thresholds)

ImageComparison* ic = ImageComparison::instance();

CPU:ic->addImage(_I1pyramid->level[rec_depth], "CI1",rec_depth, nx_fine, ny_fine,1, "CPU");ic->addImage(_I2pyramid->level[rec_depth], "CI2",rec_depth, nx_fine, ny_fine,1, "CPU");ic->dumpData("cpu_images"); //save data to disk....

GPU:ic->addImage(_I1pyramid->level[rec_depth], "CI1", rec_depth,nx_fine, ny_fine, pitch,"GPU");ic->addImage(_I2pyramid->level[rec_depth], "CI2", rec_depth,nx_fine, ny_fine, pitch,"GPU");....ImageComparison::instance()->compareImages("cpu_images");

Page 44: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases

Difficulties

● Diving into the code● Compiling on a local system● Debugging segaults (printf, __LINE__, __FILE__ helped...)

● Texture offsets● Incorrect in/out pitches● Forgetting to catchkernel (oops)● Evaluating massive amounts of data● Maintaining two code bases● Combining benchmark data

Page 45: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases

References:

Nils Papenberg, Andres Bruhn, Thomas Brox, Stephan Didas, and Joachim Weickert. 2006. Highly Accurate Optic Flow Computation with Theoretically Justified Warping. Int. J. Comput. Vision 67, 2 (April 2006), 141-158.

Markus Unger, Thomas Pock, Manuel Werlberger, and Horst Bischof. 2010. A convex approach for variational super-resolution. In Proceedings of the 32nd DAGM conference on Pattern recognition, Michael Goesele, Stefan Roth, Arjan Kuijper, Bernt Schiele, and Konrad Schindler (Eds.). Springer-Verlag, Berlin, Heidelberg, 313-322.

NVIDIA CUDA C Best Practices Guide DG-05603-001_v4.1 | January 2012