performance monitoring & queries on intel gpus · opengl performance queries we can’t extract...

24
1 Performance Monitoring & Queries on Intel GPUs Lionel Landwerlin 27 September 2018

Upload: others

Post on 05-Jun-2020

15 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Performance Monitoring & Queries on Intel GPUs · OpenGL performance queries We can’t extract all the performance counters in one pass. Counters are grouped in query IDs : Render

1

Performance Monitoring & Querieson Intel GPUs

Lionel Landwerlin27 September 2018

Page 2: Performance Monitoring & Queries on Intel GPUs · OpenGL performance queries We can’t extract all the performance counters in one pass. Counters are grouped in query IDs : Render

Hardware overviewi915 interfaceUserspace tools

Page 3: Performance Monitoring & Queries on Intel GPUs · OpenGL performance queries We can’t extract all the performance counters in one pass. Counters are grouped in query IDs : Render

3

Hardware overview

Geom/FF GA

EU EU

EU EU

EUEU

EU EU

SP

L3

EU EU

EU EU

EUEU

EU EU

SP

L3

EU EU

EU EU

EUEU

EU EU

SP

L3

VF

DS GS

HS

VFE

TE BLT GAM

Media/FF

VE VD SFCGTI

https://01.org/sites/default/files/documentation/intel-gfx-prm-osrc-kbl-vol04-configurations.pdf

Page 4: Performance Monitoring & Queries on Intel GPUs · OpenGL performance queries We can’t extract all the performance counters in one pass. Counters are grouped in query IDs : Render

4

Hardware overview

Geom/FF GA

EU EU

EU EU

EUEU

EU EU

SP

L3

EU EU

EU EU

EUEU

EU EU

SP

L3

EU EU

EU EU

EUEU

EU EU

SP

L3

VF

DS GS

HS

VFE

TE BLT GAM

Media/FF

VE VD SFCGTI

https://01.org/sites/default/files/documentation/intel-gfx-prm-osrc-kbl-vol04-configurations.pdf

OA unit

Page 5: Performance Monitoring & Queries on Intel GPUs · OpenGL performance queries We can’t extract all the performance counters in one pass. Counters are grouped in query IDs : Render

5

Hardware overview

OA unit :● Writes snapshots of multiple registers to memory on :

○ context switch○ programmed timer○ frequency changes○ request from command streamer (only on 3D engine)

● Snapshots written to :○ OA buffer (circular buffer up to 16Mb)○ application address space

https://01.org/sites/default/files/documentation/intel-gfx-prm-osrc-kbl-vol14-observability.pdf

Page 6: Performance Monitoring & Queries on Intel GPUs · OpenGL performance queries We can’t extract all the performance counters in one pass. Counters are grouped in query IDs : Render

6

Hardware overview

Geom/FF GA

EU EU

EU EU

EUEU

EU EU

SP

L3

EU EU

EU EU

EUEU

EU EU

SP

L3

EU EU

EU EU

EUEU

EU EU

SP

L3

VF

DS GS

HS

VFE

TE BLT GAM

: direct connections

Media/FF

VE VD SFCGTI

https://01.org/sites/default/files/documentation/intel-gfx-prm-osrc-kbl-vol04-configurations.pdf

OA unit

Page 7: Performance Monitoring & Queries on Intel GPUs · OpenGL performance queries We can’t extract all the performance counters in one pass. Counters are grouped in query IDs : Render

7

Hardware overview

● Direct connections examples :○ Vertex Shader Threads Dispatched○ Hull Shader Threads Dispatched○ Pixel Shader Threads Dispatched○ 2x2s Rasterized Pixels○ 2x2s Killed in PS (discard in fragment shader)○ 2x2s Written To Render Target○ Blended 2x2s Written to Render Target○ 2x2s Requested from Sampler○ Sampler L1 Cache Misses○ Flexible EU counters○ …

Mostly 3D counters

https://01.org/sites/default/files/documentation/intel-gfx-prm-osrc-kbl-vol14-observability.pdf

Page 8: Performance Monitoring & Queries on Intel GPUs · OpenGL performance queries We can’t extract all the performance counters in one pass. Counters are grouped in query IDs : Render

8

Introduction

Geom/FF GA

EU EU

EU EU

EUEU

EU EU

SP

L3

EU EU

EU EU

EUEU

EU EU

SP

L3

EU EU

EU EU

EUEU

EU EU

SP

L3

VF

DS GS

HS

VFE

TE BLT GAM

: OA nodes : direct connections : indirect connections

Media/FF

VE VD SFCGTI

https://01.org/sites/default/files/documentation/intel-gfx-prm-osrc-kbl-vol04-configurations.pdf

OA unit

Page 9: Performance Monitoring & Queries on Intel GPUs · OpenGL performance queries We can’t extract all the performance counters in one pass. Counters are grouped in query IDs : Render

9

Hardware overview

● Indirect connections examples :○ GTI Depth Throughput○ Sampler 0/1 Busy○ L3 Cache Misses○ Early Depth Bottleneck○ Hi-Depth Cache Misses○ Multisampling Color Cache misses○ Stencil Cache misses○ …

● HW programming needed to get specific information

https://01.org/sites/default/files/documentation/intel-gfx-prm-osrc-kbl-vol14-observability.pdf

Page 10: Performance Monitoring & Queries on Intel GPUs · OpenGL performance queries We can’t extract all the performance counters in one pass. Counters are grouped in query IDs : Render

10

OA reports

A counters B counters C counters

● Headers : timestamp + context ID + reason● A counters : 32 (40 bits) + 4 (32 bits)

○ Mostly 3D counters● B counters : 8 (32 bits)● C counters : 8 (32 bits)

256 bytes (Broadwell and above)

Hea

ders

https://01.org/sites/default/files/documentation/intel-gfx-prm-osrc-kbl-vol14-observability.pdf

Page 11: Performance Monitoring & Queries on Intel GPUs · OpenGL performance queries We can’t extract all the performance counters in one pass. Counters are grouped in query IDs : Render

11

i915 Interface

Exclusive access to the OA unit because of B/C counters programming.

2 ways to use the i915 API :● Query mode :

○ Have snapshots filtered by context ID○ Use in addition to the MI_REPORT_PERF_COUNT instruction

● Monitoring mode :○ All snapshots available (privileged access)

Page 12: Performance Monitoring & Queries on Intel GPUs · OpenGL performance queries We can’t extract all the performance counters in one pass. Counters are grouped in query IDs : Render

12

i915 Interface

DRM Render Node /

masterFD

DRM_IOCTL_I915_PERF_OPEN● sampling period● configuration id● context id (optional)

i915/perfFD

Kernel

Userspace

read()poll()close()ioctl() enable/disable

Page 13: Performance Monitoring & Queries on Intel GPUs · OpenGL performance queries We can’t extract all the performance counters in one pass. Counters are grouped in query IDs : Render

13

i915 Interface

i915/perfFDGPU

Snapshot

Snapshot

Snapshot

Snapshot

Snapshot

Snapshot

Snapshot

HW Memory Kernel

Header

Snapshot

Header

Snapshot

Header

Snapshot

Userspace

Page 14: Performance Monitoring & Queries on Intel GPUs · OpenGL performance queries We can’t extract all the performance counters in one pass. Counters are grouped in query IDs : Render

14

Userspace

● Metrics Discovery (used by Graphics Performance Analyzers / VTUNE)○ https://github.com/intel/metrics-discovery

● GL_INTEL_performance_query extension○ https://www.khronos.org/registry/OpenGL/extensions/INTEL/INTEL_performance_query.txt

● GPUTop○ https://github.com/rib/gputop

Page 15: Performance Monitoring & Queries on Intel GPUs · OpenGL performance queries We can’t extract all the performance counters in one pass. Counters are grouped in query IDs : Render

15

OpenGL performance queries

We can’t extract all the performance counters in one pass.

Counters are grouped in query IDs :

● Render Metrics Basic● Compute Metrics Basic● Render Metrics for 3D Pipeline Profile● Memory Reads Distribution● Memory Writes Distribution● Compute Metrics Extended ● Compute Metrics L3 Cache ● Metric set HDCAndSF● Metric set L3_1

● Metric set L3_2● Metric set L3_3● Metric set RasterizerAndPixelBackend● Metric set Sampler● Metric set TDL_1● Metric set TDL_2● Compute Metrics Extra ● Media Vme Pipe ● Gpu Rings Busyness

Page 16: Performance Monitoring & Queries on Intel GPUs · OpenGL performance queries We can’t extract all the performance counters in one pass. Counters are grouped in query IDs : Render

16

OpenGL performance queries

GL_INTEL_performance_query :

● List query IDs :○ glGetFirstPerfQueryIdINTEL() / glGetNextPerfQueryIdINTEL()

● List counters for a given query ID :○ glGetPerfCounterInfoINTEL()

● Query data :○ glCreatePerfQueryINTEL() / glBeginPerfQueryINTEL() / glEndPerfQueryINTEL()

● Get data :○ glGetPerfQueryDataINTEL()

Page 17: Performance Monitoring & Queries on Intel GPUs · OpenGL performance queries We can’t extract all the performance counters in one pass. Counters are grouped in query IDs : Render

17

OpenGL performance queries

glUseProgram()

… (more pipeline setup)

glBindBuffer()

glClear()

glBeginPerfQueryINTEL()

glEndPerfQueryINTEL()

glDrawArrays()

glDrawArrays()

A counters B counters C counters

Hea

ders

A counters B counters C counters

Hea

ders

A counters values B countersvalues

C countersvalues

glGetPerfQueryDataINTEL()

Application Driver

Page 18: Performance Monitoring & Queries on Intel GPUs · OpenGL performance queries We can’t extract all the performance counters in one pass. Counters are grouped in query IDs : Render

18

OpenGL performance queries

https://github.com/janesma/apitrace

Page 19: Performance Monitoring & Queries on Intel GPUs · OpenGL performance queries We can’t extract all the performance counters in one pass. Counters are grouped in query IDs : Render

19

GPUTop

● Client/Server model :○ Server runs on the target system to monitor○ Clients connects to the server and process the extracted data

● 2 clients :○ Command line tool :

■ records accumulated samples in CSV format■ track an application’s usage

○ User interface :■ Observe global usage■ Draw timelines

Page 20: Performance Monitoring & Queries on Intel GPUs · OpenGL performance queries We can’t extract all the performance counters in one pass. Counters are grouped in query IDs : Render

20

GPUTop

Server :$ sudo gputop

Global monitoring :$ gputop-wrapper -m RenderBasic -c AvgGpuCoreFrequency,RasterizedPixels,Sampler0Busy

Application monitoring :$ gputop-wrapper -m RenderBasic -c AvgGpuCoreFrequency,RasterizedPixels,Sampler0Busy -- glxgears

Output : AvgGpuCoreFrequency, RasterizedPixels, Sampler0Busy (Hz), (pixels), (%) 295.3 MHz, 145.6 M pixels, 6.44 % 295.6 MHz, 119.5 M pixels, 4.84 % 295.8 MHz, 169.4 M pixels, 7.02 % 295.6 MHz, 97.31 M pixels, 3.97 % 295.6 MHz, 120.1 M pixels, 4.87 %

Page 21: Performance Monitoring & Queries on Intel GPUs · OpenGL performance queries We can’t extract all the performance counters in one pass. Counters are grouped in query IDs : Render

21

GPUTop

Page 22: Performance Monitoring & Queries on Intel GPUs · OpenGL performance queries We can’t extract all the performance counters in one pass. Counters are grouped in query IDs : Render

22

GPUTop - timelines

Page 23: Performance Monitoring & Queries on Intel GPUs · OpenGL performance queries We can’t extract all the performance counters in one pass. Counters are grouped in query IDs : Render

23

GPUTop - high frequency sampling

Page 24: Performance Monitoring & Queries on Intel GPUs · OpenGL performance queries We can’t extract all the performance counters in one pass. Counters are grouped in query IDs : Render

Give performance queries a try :https://github.com/janesma/apitrace

Give GPUTop a try (kernel 4.14 recommended) :https://github.com/rib/gputop

http://gputop.com

Questions?