Overview of Nvidia GPUs and CUDA
Overview of Nvidia GeForce 6 Series Architecture and More
Prepared by: Dustin Balise
Overall System Architecture
Pharr, M. and Fernando, R. (2005). GPU Gems 2: Programming Techniques for High-Performance Graphics and General-Purpose Computation (GPU Gems). Addison-Wesley Professional.
Block Diagram
Pharr, M. and Fernando, R. (2005). GPU Gems 2: Programming Techniques for High-Performance Graphics and General-Purpose Computation (GPU Gems). Addison-Wesley Professional.
Memory Hierarchy
Pharr, M. and Fernando, R. (2005). GPU Gems 2: Programming Techniques for High-Performance Graphics and General-Purpose Computation (GPU Gems). Addison-Wesley Professional.
Graphics Pipeline
- Programmable vertex engine
- Programmable fragment engine
- Texture load/filter engine
- Depth-compare and blending data write engine
Pharr, M. and Fernando, R. (2005). GPU Gems 2: Programming Techniques for High-Performance Graphics and General-Purpose Computation (GPU Gems). Addison-Wesley Professional.
Graphics Pipeline for Non-Graphics Operations
- Vertex and fragment processors are highly computationally capable
- Texture unit used as a random-access data fetch unit
35 GB/sec
Pharr, M. and Fernando, R. (2005). GPU Gems 2: Programming Techniques for High-Performance Graphics and General-Purpose Computation (GPU Gems). Addison-Wesley Professional.
CPU-GPU Analogies
- GPU Textures = CPU Arrays
- GPU Fragment Programs = CPU “Inner Loops”
- Render-to-Texture = Feedback
- Geometry Rasterization = Computation Invocation
- Texture Coordinates = Computational Domain
- Vertex Coordinates = Computational Range
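The analogies above can be sketched on the CPU side. The following is only an illustrative model, not GPU code: a nested list stands in for a texture, a plain function stands in for a fragment program, and the names `run_pass`, `blur`, and `iterate` are hypothetical ones chosen for this example.

```python
def run_pass(texture, fragment_program):
    """One 'render pass': call the fragment program once per output texel."""
    height, width = len(texture), len(texture[0])
    output = [[0.0] * width for _ in range(height)]
    # Rasterizing geometry that covers the output is what invokes the
    # computation: every (x, y) in the output range gets one call.
    for y in range(height):
        for x in range(width):
            # The body of this inner loop is the "fragment program".
            output[y][x] = fragment_program(texture, x, y)
    return output


def blur(texture, x, y):
    """Hypothetical fragment program: average a texel with its horizontal
    neighbours, clamping reads at the texture edge."""
    width = len(texture[0])
    left = texture[y][max(x - 1, 0)]
    right = texture[y][min(x + 1, width - 1)]
    return (left + texture[y][x] + right) / 3.0


def iterate(texture, fragment_program, passes):
    """Render-to-texture feedback: each pass reads the previous pass's output."""
    for _ in range(passes):
        texture = run_pass(texture, fragment_program)
    return texture
```

Calling `iterate(texture, blur, 2)` feeds the output of the first pass back in as the input of the second, which is the role render-to-texture plays on the GPU.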
Performance
- 425 MHz graphics clock
- 550 MHz memory clock
- Vertex Processor
  - 6 four-wide fp32 vector MADs per clock cycle
  - One scalar multifunction operation (such as sine or reciprocal square root) per clock cycle
- Fragment Processor
  - 16 four-wide fp32 vector MADs per clock cycle
  - 16 four-wide fp32 multiplies per clock cycle
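As a rough worked example, the per-clock figures can be turned into peak arithmetic rates. This sketch assumes that a multiply-add (MAD) counts as two floating-point operations, a multiply as one, and that both processors run at the 425 MHz graphics clock quoted on the slide; the function name `peak_flops` is hypothetical.

```python
GRAPHICS_CLOCK_HZ = 425_000_000  # 425 MHz graphics clock, from the slide


def peak_flops(units, vector_width, flops_per_op, clock_hz=GRAPHICS_CLOCK_HZ):
    """Peak floating-point operations per second for a set of vector units.

    Assumption: every unit issues one operation on every clock cycle.
    """
    return units * vector_width * flops_per_op * clock_hz


# Vertex processor: 6 four-wide MADs per clock (MAD assumed to be 2 flops).
vertex_mad = peak_flops(units=6, vector_width=4, flops_per_op=2)

# Fragment processor: 16 four-wide MADs plus 16 four-wide multiplies per clock.
fragment_mad = peak_flops(units=16, vector_width=4, flops_per_op=2)
fragment_mul = peak_flops(units=16, vector_width=4, flops_per_op=1)
fragment_total = fragment_mad + fragment_mul
```

Under these assumptions the vertex MADs come to about 20.4 GFLOPS and the fragment processor to about 81.6 GFLOPS. These are theoretical peaks that ignore the scalar multifunction unit and any memory stalls.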
Branching
- Fragment Processor works on many fragments at the same time
  - Fragments in a group may take different branches
  - The Fragment Processor then needs to execute both branches
- 6-cycle overhead for if-else-endif control structures
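The cost of divergence can be illustrated with a deliberately simplified model. It assumes a group of fragments pays for a branch whenever at least one fragment in the group takes it, plus the fixed 6-cycle overhead from the slide. The function name `group_cost` and the per-branch cycle counts used with it are hypothetical.

```python
IF_ELSE_OVERHEAD_CYCLES = 6  # if-else-endif overhead, from the slide


def group_cost(conditions, then_cycles, else_cycles,
               overhead=IF_ELSE_OVERHEAD_CYCLES):
    """Cycles spent by one group of fragments on an if-else-endif.

    conditions: one boolean per fragment in the group (True = 'then' side).
    Simplifying assumption: a branch is paid for in full if any fragment
    in the group takes it.
    """
    cost = overhead
    if any(conditions):        # at least one fragment takes the 'then' side
        cost += then_cycles
    if not all(conditions):    # at least one fragment takes the 'else' side
        cost += else_cycles
    return cost
```

With a 10-cycle `then` side and a 20-cycle `else` side, a coherent group costs 16 or 26 cycles in this model, while a group containing even a single divergent fragment costs 36.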
That was in 2005…
- GeForce 8 series
  - 450-675 MHz core clock speeds
  - 400-1080 MHz memory clock speeds
  - 256-768 MB of memory
  - 6.4-103.7 GB/s memory bandwidth
  - Costs range from about $150-$700
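The two ends of the bandwidth range follow from the memory clocks once a bus width is known. The sketch below assumes double-data-rate memory (two transfers per clock), a 64-bit bus at the low end, and a 384-bit bus at the high end. The bus widths do not appear on the slide and are assumptions made for this example, as is the function name.

```python
def memory_bandwidth_gb_s(memory_clock_mhz, bus_width_bits, transfers_per_clock=2):
    """Peak memory bandwidth in GB/s (1 GB = 1e9 bytes).

    transfers_per_clock=2 assumes double-data-rate memory.
    """
    bytes_per_transfer = bus_width_bits / 8
    return memory_clock_mhz * 1e6 * transfers_per_clock * bytes_per_transfer / 1e9


low_end = memory_bandwidth_gb_s(400, 64)     # assumed 64-bit bus
high_end = memory_bandwidth_gb_s(1080, 384)  # assumed 384-bit bus
```

Under these assumptions `low_end` is 6.4 GB/s and `high_end` is 103.68 GB/s, which matches the 6.4-103.7 GB/s range on the slide after rounding.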
Diagram of High End Nvidia GPU
Nguyen, H. (2007). GPU Gems 3. Addison-Wesley Professional.
HPC Solutions
- Tesla C870
  - 128 multi-threaded processors per GPU
  - Full integer and floating point operations
  - C-language development environment and a suite of developer tools (CUDA)
  - 1.5 GB of dedicated GDDR3 memory
  - Over 500 gigaflops of peak floating point performance
  - 76.8 GB/s memory bandwidth
  - Parallel data cache
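The "over 500 gigaflops" figure can be sanity-checked from the processor count. The sketch below assumes a 1.35 GHz processor clock and three floating-point operations per processor per clock (a multiply-add plus a multiply). Neither number appears on the slide, so both should be read as assumptions for this example, as should the function name.

```python
def peak_gflops(processors, clock_mhz, flops_per_clock):
    """Theoretical peak in GFLOPS, assuming every processor issues
    flops_per_clock floating-point operations on every cycle."""
    return processors * clock_mhz * flops_per_clock / 1000


# 128 processors (from the slide); clock and flops per clock are assumed.
tesla_c870_peak = peak_gflops(processors=128, clock_mhz=1350, flops_per_clock=3)
```

Under these assumptions the peak comes to 518.4 GFLOPS, consistent with the slide's claim.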
CUDA
- Nvidia SDK for general-purpose computing on GPUs (GPGPU)
- Compatible with Nvidia 8 series, Quadro FX 4600/5600, and Tesla GPUs
- Runs on Linux and Windows
CUDA Source Files
- Host Code
  - Runs on a generic x86 processor
  - C and C++ source files
- Device Code
  - Runs on the GPU
  - “C like” source file
  - Basically GPU functions
CUDA Compiler (nvcc)
- Separates device functions from host code
- Passes host code to the platform compiler (e.g. gcc, g++ …)
- Embeds compiled GPU functions as load images in the host object file
- Linking stage provides support for remote SIMD procedure calling and explicit GPU manipulation
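To make the host/device split concrete, the sketch below keeps a small CUDA source file as a string and builds the basic `nvcc <source> -o <output>` command for it. The file name `scale.cu`, the kernel `scale`, and the helpers `nvcc_command` and `build` are hypothetical names chosen for this example. The build step is skipped when `nvcc` is not on the PATH.

```python
import shutil
import subprocess
from pathlib import Path

# A single .cu file mixing device code and host code (illustrative only).
CUDA_SOURCE = r'''
#include <cstdio>

// Device code: runs on the GPU, one thread per array element.
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Host code: runs on the CPU; nvcc hands it to the platform compiler.
int main() {
    const int n = 256;
    float host[n];
    for (int i = 0; i < n; ++i) host[i] = 1.0f;

    float *dev = 0;
    cudaMalloc((void **)&dev, n * sizeof(float));
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);
    scale<<<(n + 127) / 128, 128>>>(dev, 2.0f, n);  // launch the GPU function
    cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    printf("host[0] = %f\n", host[0]);
    return 0;
}
'''


def nvcc_command(source, output):
    """Basic nvcc invocation: compile device and host code, then link."""
    return ["nvcc", str(source), "-o", str(output)]


def build(workdir):
    """Write the source file and compile it if nvcc is available.

    Returns the path of the executable, or None when nvcc is not installed.
    """
    workdir = Path(workdir)
    source = workdir / "scale.cu"
    source.write_text(CUDA_SOURCE)
    if shutil.which("nvcc") is None:
        return None
    subprocess.run(nvcc_command(source, workdir / "scale"), check=True)
    return workdir / "scale"
```

In the embedded source, the `__global__` function is the device code that nvcc separates out and compiles for the GPU, while `main` is the host code that is passed to the platform compiler.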
Bibliography
Nguyen, H. (2007). GPU Gems 3. Addison-Wesley Professional.
Nvidia Corporation (2007). The CUDA Compiler Driver NVCC.
Pharr, M. and Fernando, R. (2005). GPU Gems 2: Programming Techniques for High-Performance Graphics and General-Purpose Computation (GPU Gems). Addison-Wesley Professional.