![Page 1: Accelerating 3D Facial Modeling Using ArrayFire, OpenCV and …on-demand.gputechconf.com/gtc/2014/presentations/S4426... · 2014-05-21 · Umar Arshad Subject: This session will discuss](https://reader034.vdocuments.mx/reader034/viewer/2022050100/5f3fec8bf1e1f427b909c483/html5/thumbnails/1.jpg)
Accelerating 3D Facial Modeling using ArrayFire, OpenCV and CUDA
Umar Arshad (@arshad_umar)
ArrayFire (@arrayfire)
ArrayFire
● World’s leading GPU experts
  ○ In the industry since 2007
  ○ NVIDIA Partner
● Deep experience working with thousands of customers
  ○ Analysis
  ○ Acceleration
  ○ Algorithm development
● GPU Training
  ○ Hands-on course with a CUDA engineer
  ○ Customized to meet your needs
Problem
● Client came to us with a slow application
  ○ Made use of OpenCV and OpenMP
  ○ 8 threads: 30+ seconds
  ○ One process
  ○ Developed on OS X
● Required a significant hardware investment
  ○ Increased maintenance
  ○ Not financially viable in production
  ○ Had Windows infrastructure
Improvements
● OpenCV - ArrayFire interop
● Rendering using GPUs
  ○ Partial CUDA-based estimation
  ○ OpenGL-based rendering
● Batching Operations
  ○ Combining data into a single operation
● Concurrent Processing
  ○ CPU: small variable-length data
  ○ GPU: large fixed-length data
Moving to ArrayFire
● OpenCV Mat to ArrayFire array
  ○ Row vs. Column Major
  ○ http://blog.accelereyes.com/blog/2012/09/19/image-processing-with-arrayfire-and-opencv/
● Similar Interface
  ○ Allowed for quick porting
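The row- versus column-major point is the main trap in the conversion: OpenCV's `cv::Mat` stores pixels row by row, while ArrayFire arrays are column-major. The following is a minimal sketch of the index remapping in plain C++, with no OpenCV or ArrayFire calls; the helper name `row_to_column_major` is hypothetical and not taken from the talk.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: reorder a rows x cols image from row-major storage
// (OpenCV's layout) to column-major storage (ArrayFire's layout).
std::vector<float> row_to_column_major(const std::vector<float>& src,
                                       std::size_t rows, std::size_t cols) {
    assert(src.size() == rows * cols);
    std::vector<float> dst(src.size());
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            // Row-major index: r * cols + c. Column-major index: c * rows + r.
            dst[c * rows + r] = src[r * cols + c];
        }
    }
    return dst;
}
```

A common alternative is to hand the row-major buffer over with the two dimensions swapped and transpose on the GPU, which avoids the CPU-side copy.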
Rendering
● Software rasterization
● Analysis of algorithm
  ○ Did not require an exact render
● ArrayFire-based estimate
  ○ Plot points
  ○ Dilate
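Because an exact render was not required, the estimate reduces to two steps: mark the projected points in a mask, then dilate the mask to close the gaps between them. The sketch below shows that idea on the CPU in plain C++. The function names, the binary mask type, and the 3x3 neighbourhood are assumptions made for illustration; the talk does not state which structuring element was used.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Mask = std::vector<unsigned char>;  // row-major, 1 = covered, 0 = empty

// Mark each projected point (x, y) in a width x height mask.
// Points that fall outside the image are skipped.
Mask plot_points(const std::vector<std::pair<int, int>>& points,
                 int width, int height) {
    assert(width > 0 && height > 0);
    Mask mask(static_cast<std::size_t>(width) * height, 0);
    for (const auto& p : points) {
        if (p.first >= 0 && p.first < width && p.second >= 0 && p.second < height) {
            mask[static_cast<std::size_t>(p.second) * width + p.first] = 1;
        }
    }
    return mask;
}

// 3x3 binary dilation: a pixel is set if it or any of its 8 neighbours is set.
Mask dilate3x3(const Mask& in, int width, int height) {
    assert(in.size() == static_cast<std::size_t>(width) * height);
    Mask out(in.size(), 0);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            unsigned char hit = 0;
            for (int dy = -1; dy <= 1; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    const int nx = x + dx, ny = y + dy;
                    if (nx >= 0 && nx < width && ny >= 0 && ny < height) {
                        hit |= in[static_cast<std::size_t>(ny) * width + nx];
                    }
                }
            }
            out[static_cast<std::size_t>(y) * width + x] = hit;
        }
    }
    return out;
}
```

Both steps are independent per pixel, which is what makes them a good fit for a GPU array library.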
Rendering
● Moved to OpenGL for some cases
  ○ Makes use of the hardware rasterizer
  ○ ArrayFire -> OpenGL interop using CUDA-OpenGL interop
  ○ See ArrayFire GitHub for a sample implementation
Batching
● Original code used OpenMP for parallelism
  ○ One frame per thread
  ○ Optimized for CPU
● One CPU thread + GPU
  ○ Parallelism on GPU vs. Parallelism on CPU
● Combined OpenMP threads
Batching
● Many small operations
  ○ Individually it didn’t make sense to port them to the GPU
● Increase dimensionality of the data
  ○ 2D -> 3D
  ○ GFOR and Strided Access
● Moved to single-threaded code
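The 2D -> 3D idea can be sketched as follows: instead of processing each frame's small matrix separately, the frames are stacked along a third dimension in one contiguous buffer, and a single operation walks the whole batch with a fixed stride per frame. This is a plain C++ illustration only. It does not use ArrayFire or GFOR, and the `Batch` type and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container: `frames` matrices of size rows x cols stored
// back to back in one buffer (column-major within each frame).
struct Batch {
    std::size_t rows, cols, frames;
    std::vector<float> data;

    // Element (r, c) of frame f sits at a fixed stride of rows * cols per frame.
    std::size_t index(std::size_t r, std::size_t c, std::size_t f) const {
        return r + c * rows + f * rows * cols;
    }
};

// Stack equally sized 2D frames into a single 3D batch.
Batch stack(const std::vector<std::vector<float>>& frames2d,
            std::size_t rows, std::size_t cols) {
    Batch b{rows, cols, frames2d.size(), {}};
    b.data.reserve(rows * cols * frames2d.size());
    for (const auto& f : frames2d) {
        assert(f.size() == rows * cols);
        b.data.insert(b.data.end(), f.begin(), f.end());
    }
    return b;
}

// One pass over the whole batch: sum every frame. On a GPU, a single large
// operation like this would stand in for many small per-frame launches.
std::vector<float> frame_sums(const Batch& b) {
    std::vector<float> sums(b.frames, 0.0f);
    for (std::size_t f = 0; f < b.frames; ++f)
        for (std::size_t c = 0; c < b.cols; ++c)
            for (std::size_t r = 0; r < b.rows; ++r)
                sums[f] += b.data[b.index(r, c, f)];
    return sums;
}
```

With the data laid out this way, the per-frame OpenMP threads are no longer needed, since one host thread can issue the batched operation.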
Batching
● Call custom CUDA kernels
  ○ Special indexing
● Specialized Matrix Multiply
  ○ ssyrk vs. gemm
  ○ 2x faster
  ○ Concurrent execution using streams
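float *bound = boundary.device<float>();  // raw device pointer to the ArrayFire array's data
kernel<<<blocks, threads>>>(bound, boundary.elements());  // launch config: grid size first, then block size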
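The ssyrk vs. gemm point rests on symmetry: a product of the form A * A^T is symmetric, so only one triangle needs to be computed. The sketch below is a naive CPU illustration in plain C++, not the cuBLAS routines themselves. The function names are hypothetical, and each function returns a multiply-add count so the difference in work is visible.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A is n x k, row-major. Both functions write an n x n row-major result
// into C and return the number of multiply-adds performed.

// General product C = A * A^T: all n * n entries are computed.
std::size_t gemm_aat(const std::vector<float>& A, std::size_t n, std::size_t k,
                     std::vector<float>& C) {
    assert(A.size() == n * k);
    C.assign(n * n, 0.0f);
    std::size_t ops = 0;
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            for (std::size_t p = 0; p < k; ++p, ++ops)
                C[i * n + j] += A[i * k + p] * A[j * k + p];
    return ops;
}

// Symmetric rank-k update: only the lower triangle (j <= i) is computed.
// The upper triangle is left at zero here; BLAS ssyrk likewise references
// only one triangle of C.
std::size_t syrk_lower(const std::vector<float>& A, std::size_t n, std::size_t k,
                       std::vector<float>& C) {
    assert(A.size() == n * k);
    C.assign(n * n, 0.0f);
    std::size_t ops = 0;
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j <= i; ++j)
            for (std::size_t p = 0; p < k; ++p, ++ops)
                C[i * n + j] += A[i * k + p] * A[j * k + p];
    return ops;
}
```

The triangular version performs n(n+1)/2 dot products instead of n^2, which approaches half the work as n grows and is consistent with the roughly 2x figure on the slide.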
Batching
● Results
  ○ 90ms -> 28ms on a GTX 690
● Other Improvements
  ○ Overlapped pinned memory transfers
  ○ Generic to Specialized matrix multiply
  ○ Streams
Concurrent Computation
● Overlap CPU and GPU computation
  ○ CPU handles variable-length data sets one frame at a time
  ○ GPU handles fixed-length data sets, all frames concurrently
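// "parallel" creates the thread team; a bare "sections" outside a
// parallel region would run both blocks on a single thread.
#pragma omp parallel sections
{
    #pragma omp section
    {
        // GPU Code
    }
    #pragma omp section
    {
        // CPU Code
    }
}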
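A fuller version of the same pattern might look like the sketch below. The two worker functions are hypothetical CPU stand-ins: one represents the batched fixed-length work that the talk assigns to the GPU, the other the per-frame variable-length work kept on the CPU. Each section writes to a different member of the result, so no locking is needed. Compiled without OpenMP support, the pragmas are ignored and the sections simply run one after the other with the same result.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Stand-in for the GPU path: one batched pass over fixed-length data
// covering all frames at once.
float process_fixed_batch(const std::vector<float>& batch) {
    return std::accumulate(batch.begin(), batch.end(), 0.0f);
}

// Stand-in for the CPU path: variable-length data, one frame at a time.
std::size_t process_variable_frames(const std::vector<std::vector<int>>& frames) {
    std::size_t total = 0;
    for (const auto& frame : frames) total += frame.size();
    return total;
}

struct Results {
    float fixed;
    std::size_t variable;
};

Results run_concurrently(const std::vector<float>& batch,
                         const std::vector<std::vector<int>>& frames) {
    assert(!frames.empty());  // precondition of this sketch: at least one frame
    Results r{0.0f, 0};
    // The two sections touch disjoint members of r; the implicit barrier at
    // the end of the construct guarantees both are done before returning.
    #pragma omp parallel sections
    {
        #pragma omp section
        { r.fixed = process_fixed_batch(batch); }
        #pragma omp section
        { r.variable = process_variable_frames(frames); }
    }
    return r;
}
```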
Results
● 1 Process (5 threads): 8 seconds
● 6 Processes (2 threads): 22 seconds
Q & A