Mali-400 MP: A Scalable GPU for Mobile Devices Tom Olson Director, Graphics Research, ARM

Mali-400 MP: A Scalable GPU for Mobile Devices

Tom Olson

Director, Graphics Research, ARM

Outline

ARM and Mobile Graphics

Design Constraints for Mobile GPUs

Mali Architecture Overview

Multicore Scaling in Mali-400 MP

Results

About ARM

World’s leading supplier of semiconductor IP

Processor Architectures and Implementations

Related IP: buses, caches, debug & trace, physical IP

Software tools and infrastructure

Business Model

License fees

Per-chip royalties

Graphics at ARM

Acquired Falanx in 2006

ARM Mali is now the world’s most widely licensed GPU family

Challenges for Mobile GPUs

Size

Power

Memory Bandwidth

More Challenges

Graphics is going into “anything that has a screen”

Mobile

Navigation

Set Top Box/DTV

Automotive

Video telephony

Cameras

Printers

Huge range of form factors, screen sizes, power budgets, and performance requirements

In some applications, a huge difference between peak and average performance requirements

Solution: Scalability

Address a wide variety of performance points and applications with a single IP and a single software stack.

Need static scalability to adapt to different peak requirements in different platforms / markets

Need dynamic scalability to reduce power when peak performance isn’t needed

Options for Scalability

Fine-grained: Multiple pipes, wide SIMD, etc.

Proven approach, efficient and effective

But, adding pipes / lanes is invasive

Hard for IP licensees to do on their own

And, hard to partition to provide dynamic scalability

Coarse-grained: Multicore

Easy for licensees to select desired performance

Putting cores on separate power islands allows dynamic scaling
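As a rough illustration of dynamic scaling with per-core power islands, the sketch below picks the smallest number of cores to keep powered for a given pixel-rate demand. The policy, the function name `cores_to_power`, and the constant name are hypothetical and not taken from the slides; the 275 Mpix/s per-core figure is the one quoted later in the deck for one pixel processor at 275 MHz.

```python
import math

# Assumption: peak rate of one pixel processor, as quoted later in the deck
# (one pixel per clock at 275 MHz).
PER_CORE_MPIX_PER_SEC = 275.0


def cores_to_power(required_mpix_per_sec: float, cores_available: int) -> int:
    """Return the smallest core count whose combined peak rate meets the demand.

    Hypothetical policy for illustration only. The result is clamped to
    the range [1, cores_available], so at least one core always stays on.
    """
    if cores_available < 1:
        raise ValueError("need at least one core")
    needed = math.ceil(required_mpix_per_sec / PER_CORE_MPIX_PER_SEC)
    return max(1, min(cores_available, needed))
```

A real power manager would also have to weigh wake-up latency and actual shader load, which this sketch ignores.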

Mali-400 MP Top-Level Architecture

Scalable pixel performance

1-4 rasterizer cores

32K-128K L2 cache

[Block diagram: Mali-400 MP top level. Geometry Processor, Pixel Processors #1-#4, Mali MMUs, Mali L2. Interfaces: AXI, Asynch APB, CLKs, RESETs, IRQs, IDLEs]

Mali Tile-Based Rendering

Reduces off-chip framebuffer bandwidth

Rasterize into on-chip 16x16 tile buffer

Z, stencil, MSAA samples never go off-chip

Tradeoff against increased geometry bandwidth

Details: see Real-Time Rendering, 3rd ed.

[Figure: grid of screen tiles, each labeled 0, 1, or 0,1]
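The geometry side of a tile-based renderer has to record which primitives touch which tile. The sketch below is one simple way to do that, using a conservative bounding-box test; it is an illustration under that assumption, not a description of how the Mali hardware builds its polygon lists. The function name `bin_triangles` and the data layout are hypothetical. Only the 16x16 tile size comes from the slide.

```python
import math

TILE = 16  # tile edge in pixels, per the slide


def bin_triangles(triangles, width, height, tile=TILE):
    """Map each tile (tx, ty) to the indices of triangles that may touch it.

    Each triangle is a sequence of three (x, y) screen-space vertices.
    The test uses the triangle's bounding box, so it is conservative:
    a tile can be listed even if the triangle only comes near it.
    """
    tiles_x = (width + tile - 1) // tile
    tiles_y = (height + tile - 1) // tile
    bins = {}
    for index, tri in enumerate(triangles):
        xs = [x for x, _ in tri]
        ys = [y for _, y in tri]
        # Convert the bounding box to tile coordinates and clamp to the screen.
        tx0 = max(0, math.floor(min(xs) / tile))
        tx1 = min(tiles_x - 1, math.floor(max(xs) / tile))
        ty0 = max(0, math.floor(min(ys) / tile))
        ty1 = min(tiles_y - 1, math.floor(max(ys) / tile))
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                bins.setdefault((tx, ty), []).append(index)
    return bins
```

With two overlapping triangles, some tiles list only triangle 0, some only triangle 1, and some both. Each per-tile list is read back later when that tile is rasterized, which is the extra geometry bandwidth the slide mentions as the tradeoff.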

Mali-400 MP Geometry Processor

Vertex Shader

Single-threaded, deeply pipelined

[Block diagram: Vertex Loader, Vertex Shader, Vertex Storer, Polygon List Builder Unit, Control Registers / Performance Counters, System bus interface]

Mali-400 MP Pixel Processor

Fragment Shader

128-thread barrel processor

Fully general control flow

VLIW ISA, tuned for graphics

One texture sample per clock

Key Features

16x16 on-chip tile buffer

Renders one pixel per clock: 275M pix/sec @ 275 MHz

No penalty for 4x MSAA

No penalty for blending

4x or 16x MSAA resolve on output

OpenGL ES 2.0 states handled without state dependent shaders

[Block diagram: Polygon List Reader, Triangle setup, Rasterization, Thread control, Fragment Shader, Blending, Color / Depth / Stencil Buffers, Control Registers / Performance Counters, System bus interface]
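The pixel-rate figure on this slide is plain arithmetic: one pixel per clock at 275 MHz gives 275 Mpix/s per pixel processor, and the 1.1 Gpix/s quoted in the summary is that figure times four cores. A minimal sketch, with a hypothetical function name:

```python
def peak_pixel_rate(clock_hz: int, cores: int = 1, pixels_per_clock: int = 1) -> int:
    """Peak pixels per second, assuming every core retires pixels every clock."""
    return clock_hz * pixels_per_clock * cores


one_core = peak_pixel_rate(275_000_000)             # 275 Mpix/s
four_cores = peak_pixel_rate(275_000_000, cores=4)  # 1.1 Gpix/s
```

These are peak rates. Sustained throughput depends on shader length and memory behaviour, which this calculation does not model.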

Going Multi-Core

All cores work in parallel on separate tasks

Each core processes one tile at a time until completion – no communication between cores

Tiles assigned statically to cores in a swizzled order

Tile processing order maximizes L2 hit rate for polygon descriptors, textures

[Block diagram: Mali-400 MP top level. Vertex Processor, Fragment Processors #1-#4, Mali MMU, Mali L2. Interfaces: AXI, Asynch APB, CLKs, RESETs, IRQs, IDLEs]
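The slides do not say which swizzle pattern is used. As a stand-in, the sketch below orders tiles along a Morton (Z-order) curve and deals them out to cores round-robin. Both choices are assumptions made for illustration, and the names `morton_key` and `assign_tiles` are hypothetical.

```python
def morton_key(tx: int, ty: int, bits: int = 16) -> int:
    """Interleave the bits of tx and ty to get a Z-order index."""
    key = 0
    for b in range(bits):
        key |= ((tx >> b) & 1) << (2 * b)
        key |= ((ty >> b) & 1) << (2 * b + 1)
    return key


def assign_tiles(tiles_x: int, tiles_y: int, num_cores: int):
    """Statically assign tiles to cores: Morton order, then round-robin.

    Returns one list of (tx, ty) tiles per core, in processing order.
    """
    order = sorted(
        ((tx, ty) for ty in range(tiles_y) for tx in range(tiles_x)),
        key=lambda t: morton_key(*t),
    )
    return [order[core::num_cores] for core in range(num_cores)]
```

With four cores on this ordering, the tiles being processed at any step form a 2x2 block, so neighbouring cores tend to fetch the same polygon descriptors and texels at about the same time. That is one plausible way a static order could help the shared L2 hit rate.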

Does it work?

Test content

Industry standard benchmarks for GL ES 1.1, 2.0

Thanks Rightware!

Relative Performance

Single frames in RTL simulation

Artificial memory model (fixed static latency)

1-4 cores, 32KB L2 cache

1920x1080, 32bpp, 4xMSAA

[Charts: normalized frame rate* for MC1, MC2, MC3, MC4. Two panels: OpenGL ES 1.1 and OpenGL ES 2.0]

*Single core frame rate = 1.0
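For a sense of scale, the test resolution can be converted into a tile count. The sketch below assumes the 16x16 tile size stated earlier in the deck and an even round-robin split across cores. The function names are hypothetical, and the even split is an assumption rather than something the slides state.

```python
def tile_grid(width: int, height: int, tile: int = 16):
    """Number of tiles across and down, rounding partial tiles up."""
    tiles_x = -(-width // tile)   # ceiling division
    tiles_y = -(-height // tile)
    return tiles_x, tiles_y


def tiles_per_core(total_tiles: int, cores: int):
    """Tile counts per core when tiles are dealt out round-robin."""
    base, extra = divmod(total_tiles, cores)
    return [base + (1 if i < extra else 0) for i in range(cores)]


tx, ty = tile_grid(1920, 1080)    # 120 x 68 tiles (the last row is half covered)
total = tx * ty                   # 8160 tiles per frame
split = tiles_per_core(total, 4)  # 2040 tiles per core with four cores
```

With thousands of independent tiles per frame, each core gets a large share of the work under a static assignment.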

Impact on Bandwidth

Memory bandwidth per frame is stable

L2 cache is working to prevent redundant memory accesses

[Chart: normalized bandwidth* for MC1, MC2, MC3, MC4, OpenGL ES 1.1]

*Single core bandwidth per frame = 1.0

Summary

Scalability is an essential feature for mobile GPUs

Static and Dynamic

Multi-core architectures are a good fit to the need

Easy to configure for different performance requirements

Easy to power-gate for dynamic scaling

Mali-400 MP shows that the concept works

Pixel rate scalable from 275 Mpix/s to 1.1 Gpix/s

Linear speedup demonstrated on common mobile benchmarks

Single driver for all configurations – transparent to applications

Power and bandwidth efficient

Thanks to…

The HPG 2010 Hot3D organizers

Remi Pedersen, Mashhuda Glencross, and the Mali team

Our Silicon Partners