Page 1: GPU Architecture - Rochester Institute of Technologymeseec.ce.rit.edu/551-projects/fall2016/2-3.pdf · 3dfx Voodoo (1996) NVIDIA’s GeForce 256 (1999) ... + Fermi architecture: More

GPU Architecture Chris Vuong Long Pham

Agenda

1. What is a GPU?
   a. Dedicated vs Integrated GPUs
   b. GPU structure vs CPU
2. How does a GPU work?
3. History & Evolution of GPUs
   a. Background
   b. 1980s
   c. 1990s
   d. 2000s
   e. 2010s and beyond
4. OpenGL vs DirectX
5. Recent and Future Trends

1. What is a GPU?

- A graphics processing unit.
- Accelerates the creation of images.
- Used in embedded systems, mobile phones, desktops, workstations, and game consoles.

a. Dedicated Card vs Integrated Card

Dedicated card:

- Interfaces with the motherboard through an expansion slot such as PCIe or AGP
- Easily replaceable or upgradeable
- Has its own RAM
- Produces much more heat than integrated graphics processors (IGPs)

Multiprocessor Structure:

- N multiprocessors with M cores each
- SIMD (Single Instruction, Multiple Data): cores share an instruction unit with the other cores in the same multiprocessor
- Shared memory, constant cache, and texture cache
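
The multiprocessor structure can be illustrated with a minimal Python sketch. It assumes a hypothetical device with N = 2 multiprocessors and M = 4 cores each; the names `run_simd`, `N_MULTIPROCESSORS`, and `M_CORES` are made up for illustration and are not part of any GPU API. Real hardware runs the cores in parallel, whereas this loop only models the shared instruction stream.

```python
from typing import Callable, List

N_MULTIPROCESSORS = 2  # assumed value of N, for illustration only
M_CORES = 4            # assumed value of M, for illustration only


def run_simd(data: List[float], instruction: Callable[[float], float]) -> List[float]:
    """Apply one instruction to every element, one device-wide group at a time."""
    out: List[float] = []
    group = N_MULTIPROCESSORS * M_CORES  # elements the whole device handles per step
    for start in range(0, len(data), group):
        step = data[start:start + group]
        # Split the step across multiprocessors; each gets up to M_CORES elements.
        for mp in range(N_MULTIPROCESSORS):
            lanes = step[mp * M_CORES:(mp + 1) * M_CORES]
            # The shared instruction unit issues ONE instruction; every core
            # (lane) in this multiprocessor executes it on its own data element.
            out.extend(instruction(x) for x in lanes)
    return out


result = run_simd([float(i) for i in range(10)], lambda x: 2.0 * x + 1.0)
print(result)
```

The last group holds only two elements, so most lanes sit idle in that step; this is one reason SIMD hardware favors large, regular workloads.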

How is a pixel drawn on the screen?

Example: 1 million triangles * 100 pixels per triangle * 10 lights * 4 cycles per light computation = 4 billion cycles
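
The arithmetic in the example can be checked with a short Python sketch. The function name `shading_cycles` and the 1 GHz clock are assumptions added for illustration; the slide gives only the four factors and the total.

```python
def shading_cycles(triangles: int, pixels_per_triangle: int,
                   lights: int, cycles_per_light: int) -> int:
    """Total cycles needed to light every pixel of every triangle."""
    return triangles * pixels_per_triangle * lights * cycles_per_light


total = shading_cycles(1_000_000, 100, 10, 4)
print(f"{total:,} cycles")  # 4,000,000,000 cycles

# Assumption (not from the slide): a single core clocked at 1 GHz doing
# this work sequentially, one cycle at a time.
ASSUMED_CLOCK_HZ = 1_000_000_000
seconds_on_one_core = total / ASSUMED_CLOCK_HZ
print(f"{seconds_on_one_core} s on one assumed 1 GHz core")
```

Under that assumed clock, one sequential core would need about four seconds for this workload, which is why the work is spread across many cores.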

3. History & Evolution of GPUs

a) Background Information
b) 1980s
c) 1990s
d) 2000s
e) 2010s and beyond
f) Trends

a) Background Information

- Graphics pipeline: the stages through which graphics data passes
  + Usually split between CPU software and GPU cores
  + Transforms 3D coordinates into 2D pixel space
  + Stages in between: Geometry, Rendering
- Adopted by major GPU manufacturers such as NVIDIA and ATI
- Early GPUs performed only the Rendering stage of the pipeline
- Later GPUs took over more of the pipeline's tasks
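
The step from 3D coordinates to 2D pixel space can be sketched in Python as follows. This is a simplified illustration assuming a pinhole camera at the origin looking down the +z axis; the function name `project_to_pixel`, the 640x480 screen, and the focal length are hypothetical choices, not details from the slides.

```python
from typing import Tuple


def project_to_pixel(point: Tuple[float, float, float],
                     width: int = 640, height: int = 480,
                     focal: float = 1.0) -> Tuple[int, int]:
    """Map a 3D camera-space point to integer pixel coordinates."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    # Geometry: the perspective divide shrinks distant points toward the centre.
    ndc_x = focal * x / z
    ndc_y = focal * y / z
    # Viewport mapping: [-1, 1] becomes [0, width - 1] and [0, height - 1].
    # Screen y grows downward, so the y axis is flipped.
    px = int((ndc_x + 1.0) * 0.5 * (width - 1))
    py = int((1.0 - ndc_y) * 0.5 * (height - 1))
    return px, py


print(project_to_pixel((0.0, 0.0, 5.0)))   # centre of the screen
print(project_to_pixel((1.0, 1.0, 1.0)))   # top-right corner
```

A real pipeline would also clip points outside the view volume and rasterize whole triangles rather than single points; both are omitted here.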

Early GPU Pipeline

b) 1980s

- GPUs were “integrated frame buffers”
- IBM Professional Graphics Controller (PGA)
  + One of the first 2D/3D video cards for the PC
  + Despite its mass-market failure, it became pivotal in GPU evolution
- More features were added to early GPUs by 1987
- Emergence of Silicon Graphics Inc. (SGI)
  + Created the graphics API that led to OpenGL (released in 1992)

c) 1990s

- Generation 0:

+ SGI’s RealityEngine

+ Cheap Hardware & Games Combo

+ Performance improvements

- Generation I:

+ 3dfx Voodoo (1996)

c) 1990s (continued)

- Generation II: Breakthroughs in the field
  + Cards released in this generation could perform the entire pipeline
  + Used the Accelerated Graphics Port (AGP) in place of PCI
  + New graphics features
  + Propelled the computer gaming and GPU hardware markets
  + Still had room for performance improvements (fixed-function pipeline)

3dfx Voodoo (1996)
- 1 million transistors
- 4 MB of 64-bit DRAM
- Core clock 50 MHz

NVIDIA’s GeForce 256 (1999)
- 23 million transistors
- 32 MB of 128-bit DRAM
- Core clock 120 MHz

d) 2000s

- Generation III:
  + GeForce 3, Radeon 8500: First GPUs with a programmable pipeline
  + Still limited in programmability
- Generation IV:
  + 2002 - GeForce FX, Radeon 9700: Fully programmable
- Generation V:
  + GeForce 6, Radeon X800

Improved GPU Pipeline

d) 2000s (continued)

- Generation VI:
  + GeForce 8 series (notably the GeForce 8800): Unified shaders
  + SM (Streaming Multiprocessor): handles vertex, pixel, and geometry computation
- Generation VII:
  + Fermi architecture: More programmable
  + GPGPU (General Purpose GPU)

Parallelism in CPUs vs GPUs

CPUs

- Task parallelism
- Multiple tasks map to multiple threads
- Tasks run different instructions
- 10s of relatively heavyweight threads run on 10s of cores
- Each thread is managed and scheduled explicitly
- Each thread has to be individually programmed

GPUs

- Data parallelism
- SIMD model
- Same instruction on different data
- 10,000s of lightweight threads on 100s of cores
- Threads are managed and scheduled by hardware
- Programming is done for batches of threads (e.g., one pixel shader per group of pixels, or per draw call)
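
The contrast between the two programming styles can be sketched in Python. This is only an analogy running on the CPU: `load_level`, `mix_audio`, `darken`, and `draw_call` are hypothetical names, and the list comprehension stands in for the hardware scheduling that a GPU would perform.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (r, g, b), each 0-255


def load_level() -> str:
    return "level loaded"


def mix_audio() -> str:
    return "audio mixed"


def cpu_style() -> List[str]:
    # Task parallelism: each thread runs a DIFFERENT function, and the
    # programmer submits and collects each task explicitly.
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(load_level), pool.submit(mix_audio)]
        return [f.result() for f in futures]


def darken(pixel: Pixel) -> Pixel:
    """Hypothetical pixel shader: halve every colour channel."""
    r, g, b = pixel
    return (r // 2, g // 2, b // 2)


def draw_call(pixels: List[Pixel]) -> List[Pixel]:
    # Data parallelism: ONE program is written for the whole batch and
    # conceptually runs once per pixel, each instance on its own data.
    return [darken(p) for p in pixels]


print(cpu_style())
print(draw_call([(255, 128, 0), (10, 20, 30), (0, 0, 0)]))
```

In the CPU style the number of threads follows the number of distinct tasks; in the GPU style it follows the size of the data.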

Why Unify?

e) 2010s and beyond

- GPUs consist of highly parallel, programmable cores
  + Essentially multi-core, general-purpose CPUs
- Products that exemplify this:
  + NVIDIA’s Fermi-based GTX 580
  + AMD’s Fusion (CPU + GPU = APU)
  + Intel’s Larrabee, and Sandy Bridge CPUs with integrated GPUs

4. OpenGL vs DirectX

- Both APIs rely on the traditional graphics pipeline.
- DirectX is more than a graphics API (which is all OpenGL is): it also includes tools for sound, music, input, networking, and multimedia.
- DirectX is exclusive to the Windows platform, whereas OpenGL is cross-platform.
- OpenGL is often claimed to be faster because of a smoother, more efficient pipeline, although actual performance depends on the drivers and the workload.

5. Recent and Future Trends

- Moore’s Law applies to GPU transistor counts as well
- Growth in the number of transistors has slowed recently due to manufacturing constraints
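
As a rough worked example of transistor growth, the Python sketch below estimates a doubling time from the two transistor counts quoted earlier in this deck (3dfx Voodoo, 1996, and GeForce 256, 1999). The helper name `doubling_time_years` is made up for illustration, and two data points from different product lines are far too few to establish a trend.

```python
import math


def doubling_time_years(count_start: float, count_end: float, years: float) -> float:
    """Years per doubling, assuming steady exponential growth between two points."""
    return years / math.log2(count_end / count_start)


voodoo_1996 = 1_000_000        # transistors, from the 1990s slide
geforce256_1999 = 23_000_000   # transistors, from the 1990s slide

t = doubling_time_years(voodoo_1996, geforce256_1999, 1999 - 1996)
print(f"{t:.2f} years per doubling")  # about 0.66
```

That is considerably faster than the doubling roughly every two years usually associated with Moore's Law, so it should be read as a snapshot of the late 1990s rather than a sustained rate.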

5. Recent and Future Trends (continued)

- Unified shader architecture (centered on a flexible processor core).
- Massively parallel stream processing.
- Greater programmability.
