
Page 1

HPArch Research Group, Georgia Institute of Technology (comparch.gatech.edu/hparch/tutorial_slides/hpca...)

Page 2

| Part 2: Overview of MacSim (introduction, for users who treat the simulator as a black box)

| Part 3: Details of MacSim (for computer architecture researchers)

| Part 4: MacSim-SST running case studies, Ocelot-MacSim case studies, research using Ocelot, research using MacSim

MacSim Tutorial (In HPCA-18, 2012)

Page 3

| Heterogeneous architecture simulator (x86+PTX)

| Developed at Georgia Tech

| Trace-driven simulator: an internal module generates RISC-style micro-ops; x86 traces are generated with Pin, PTX traces with GPUOcelot

| Cycle-level simulator: cores, caches, and memory systems are modeled

| Supports various kinds of simulation: single- and multi-threaded applications, multi-program workloads, and heterogeneous (CPU+GPU) workloads

Page 4

| Flexible design to support various platforms

| Integration with a parallel simulator (SST) to support high-performance computing systems

| Target systems range from mobile to exascale computing systems

Page 5

[Figure: MacSim tool flow. x86 binaries → PIN trace generator; CUDA code (.cu) → NVCC (compiler) → PTX code → GPUOcelot trace generator (Prof. Yalamanchili, Georgia Tech); OpenGL code → PIN (API generator) → Attila (OpenGL emulator), marked as ongoing work. The trace generators provide instruction and thread information to the heterogeneous architecture timing & power simulator.]

Page 6

| Getting MacSim: the stable version is on the Google Code project (http://macsim.googlecode.com/files/macsim-1.0.tar.gz); the latest code is in the SVN repository

| Directions are explained in http://code.google.com/p/macsim/wiki/GettingMacsim

| How to build: see http://code.google.com/p/macsim/wiki/BuildingMacsim, Chapter 2 of the manual, or the README file in the simulator directory

Page 7

| MacSim package: IRIS (NoC simulator from Prof. Yalamanchili’s group) is included.

| CPU trace generator: download PIN separately. The trace generator tool is in the MacSim package.

| GPU trace generator: download Ocelot separately. The trace generator is in Ocelot’s package.

| SST patch: SST needs to be downloaded separately.

| Energy Introspector (EI, from Prof. Yalamanchili’s group): a power model based on McPAT and HotSpot. Because of a McPAT license issue, EI currently cannot be distributed, but we will resolve this issue soon.

Page 9

| Once the build succeeds, the binary is created in macsim-top/trunk/bin/macsim

| Screenshot of a simulation

| Next: how do we configure the simulation models?

Page 10

| Knob variables can be set up in three ways: the default value in the source code, params.in, and the command line

[Figure: several cores of three different types (core type 1, core type 2, core type 3) sharing memory]

Page 11

num_sim_cores 4               // 4 cores
num_sim_small_cores 0
num_sim_medium_cores 0
num_sim_large_cores 4
max_threads_per_large_core 2
large_core_type x86
repeat_trace 1

| Configuration: 4 cores, 2-way SMT

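The same knob (num_sim_cores) in each of the three places:

Default value (.def file): param<NUM_SIM_CORES, num_sim_cores, int, 4>
params.in: num_sim_cores 4
Command line: ./macsim --num_sim_cores=4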
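The three configuration sources can be illustrated with a minimal sketch. It assumes that later sources override earlier ones (default, then params.in, then command line, following the order on the slide), that params.in holds one "name value" pair per line with optional "//" comments, and that command-line knobs use the "--name=value" form from the example. KnobTable, parse_params, parse_args, and resolve are hypothetical names, not MacSim's knob classes.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical knob table: knob name -> value as text.
using KnobTable = std::map<std::string, std::string>;

// Parse params.in-style text (assumed format): "name value" per line,
// anything after "//" is a comment, lines without a value are skipped.
KnobTable parse_params(const std::string& text) {
  KnobTable knobs;
  std::istringstream lines(text);
  std::string line;
  while (std::getline(lines, line)) {
    std::string::size_type comment = line.find("//");
    if (comment != std::string::npos) line.erase(comment);
    std::istringstream fields(line);
    std::string name, value;
    if (fields >> name >> value) knobs[name] = value;
  }
  return knobs;
}

// Parse "--name=value" arguments (assumed syntax); other arguments are ignored.
KnobTable parse_args(const std::vector<std::string>& args) {
  KnobTable knobs;
  for (const std::string& arg : args) {
    if (arg.rfind("--", 0) != 0) continue;  // must start with "--"
    std::string::size_type eq = arg.find('=');
    if (eq == std::string::npos) continue;
    knobs[arg.substr(2, eq - 2)] = arg.substr(eq + 1);
  }
  return knobs;
}

// Assumed precedence: defaults < params.in < command line.
KnobTable resolve(KnobTable defaults, const KnobTable& file,
                  const KnobTable& cmdline) {
  for (const auto& [name, value] : file) defaults[name] = value;
  for (const auto& [name, value] : cmdline) defaults[name] = value;
  return defaults;
}
```

With a default of 4, a params.in entry of 8, and a command-line value of 4, the resolved num_sim_cores is the command-line value, while knobs set only in params.in (such as repeat_trace) keep their file value.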

Page 12

| To configure a CPU+GPU architecture, set the number of cores and the core types accordingly

num_sim_cores 8               // 4 CPUs + 4 GPUs
num_sim_small_cores 4         // 4 GPU cores
num_sim_medium_cores 0
num_sim_large_cores 4         // 4 CPUs
core_type ptx                 // type of the small cores
large_core_type x86
cpu_frequency 3
gpu_frequency 1.5
repeat_trace 1

| Usually, small cores are used for the GPU and large cores for the CPU

| Each GPU core internally has multiple processing elements (N-wide SIMD)
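In the CPU+GPU example the total core count equals the sum of the small, medium, and large counts (8 = 4 + 0 + 4). A minimal sanity check, assuming that invariant must always hold and that small cores model the GPU as in the example, could look like this; CoreConfig, core_counts_consistent, and gpu_core_count are hypothetical helpers, not part of MacSim.

```cpp
#include <cassert>

// Hypothetical subset of the core-count knobs shown above.
struct CoreConfig {
  int num_sim_cores;
  int num_sim_small_cores;
  int num_sim_medium_cores;
  int num_sim_large_cores;
};

// Assumed invariant: the total equals the sum of the per-type counts.
bool core_counts_consistent(const CoreConfig& c) {
  return c.num_sim_cores ==
         c.num_sim_small_cores + c.num_sim_medium_cores + c.num_sim_large_cores;
}

// Number of GPU cores when, as in the example, small cores model the GPU.
int gpu_core_count(const CoreConfig& c) {
  assert(core_counts_consistent(c));
  return c.num_sim_small_cores;
}
```

A check like this catches a params.in that raises num_sim_cores without adding the matching cores of some type.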

Page 13

| Multiple applications: set up in trace_file_list

[Figure: MCF, GCC, MM (thread 1 and thread 2), and Blackscholes running together]

4                                 <-- number of applications
/sample/mcf/trace.txt             <-- appl 1
/sample/gcc/trace.txt             <-- appl 2
/sample/mm/trace.txt              <-- appl 3
/sample/blackscholes/trace.txt    <-- appl 4
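The trace_file_list layout (a count, then one trace path per application) can be read with a short parser. This is a minimal sketch, assuming the "<--" annotations above are slide commentary rather than part of the real file and that paths contain no spaces; parse_trace_file_list is a hypothetical name.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Read the application count, then that many whitespace-separated trace paths.
std::vector<std::string> parse_trace_file_list(const std::string& text) {
  std::istringstream in(text);
  int count = 0;
  if (!(in >> count) || count < 0)
    throw std::runtime_error("missing or invalid application count");
  std::vector<std::string> paths;
  std::string path;
  while (static_cast<int>(paths.size()) < count && in >> path)
    paths.push_back(path);
  if (static_cast<int>(paths.size()) != count)
    throw std::runtime_error("fewer trace paths than declared");
  return paths;
}
```

Checking the count against the number of paths read turns a truncated list into an explicit error instead of a silently shorter workload.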

Page 14

| Execution time differs across applications.

| An option repeats shorter traces until the longest trace ends.

| Is this the right way to simulate?

[Figure: Program 1 (mcf) runs once while Program 2 (gcc) runs 4 times and Program 3 (bfs) runs 5 times]
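The repetition in the figure can be estimated with ceiling division. This is a minimal sketch, assuming each trace has a fixed standalone length in cycles; in a real multi-program run the applications interfere, so the actual counts differ. runs_until_longest_ends is a hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Number of times each trace starts before the longest one finishes:
// ceil(longest / length_i); the last run of a shorter trace may be cut off.
std::vector<std::uint64_t> runs_until_longest_ends(
    const std::vector<std::uint64_t>& cycles) {
  assert(!cycles.empty());
  const std::uint64_t longest = *std::max_element(cycles.begin(), cycles.end());
  std::vector<std::uint64_t> runs;
  for (std::uint64_t c : cycles) {
    assert(c > 0);
    runs.push_back((longest + c - 1) / c);  // ceiling division
  }
  return runs;
}
```

For lengths of 1000, 250, and 200 cycles this gives 1, 4, and 5 runs, matching the mcf/gcc/bfs picture.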

Page 15

| Sample configuration files in macsim-top/trunk/params

File name              Contents
params_8800gt          GeForce 8800 GT (G80)
params_gtx280          GeForce GTX 280 (GT200)
params_gtx465          NVIDIA GeForce GTX 465 (Fermi)
params_x86             Intel’s Sandy Bridge (CPU part only)
params_hetero_4c_4g    Intel’s Sandy Bridge (CPU + GPU)

Page 16

| Thread spawn is modeled.

| Locks are not modeled.

[Figure: a main thread spawns threads onto cores, followed by a barrier; a host thread performs a GPU kernel invocation onto cores]

Page 17

| It will be covered in Part-III

| The trace generator generates thread execution information automatically.

| Users do not need to worry about this.

Page 18

| x86 instructions are mapped to uops.

| PTX instructions are mapped to uops (almost 1-to-1 mapping).

| Pipeline stages

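[Figure: Pin (with XED) produces macro instructions with decoded information from Pin’s XED; inside MacSim, the trace decoder turns them into uops for the timing/power simulator, whose pipeline stages are Front-end → DEC/Rename → Sch. → Exe → Mem → Retire]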
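The mapping from x86 macro instructions to RISC-style uops can be pictured as "cracking" one instruction into load, compute, and store steps. This is a minimal sketch under assumed, simplified fields; MacroInst, UopType, and crack are hypothetical and far coarser than the XED information MacSim's trace decoder actually uses.

```cpp
#include <cassert>
#include <vector>

// Hypothetical uop kinds.
enum class UopType { Load, Alu, Store };

// Simplified stand-in for the decoded macro-instruction information.
struct MacroInst {
  bool reads_memory;   // has a memory source operand
  bool has_compute;    // performs an arithmetic/logic operation
  bool writes_memory;  // has a memory destination operand
};

// RISC-style cracking in program order: load, then compute, then store.
std::vector<UopType> crack(const MacroInst& inst) {
  std::vector<UopType> uops;
  if (inst.reads_memory) uops.push_back(UopType::Load);
  if (inst.has_compute) uops.push_back(UopType::Alu);
  if (inst.writes_memory) uops.push_back(UopType::Store);
  return uops;
}
```

An x86 read-modify-write such as an add to a memory operand becomes three uops, while a register-only operation (like most PTX instructions) stays a single uop.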

Page 19

| Front-end, DEC/Rename: just simple FIFO queues.
fetch_latency 5          // front-end depth
alloc_latency 5          // decode/allocation depth
width                    // pipeline width (same width for all pipeline stages)
bp_dir_mech gshare
bp_hist_length 14        // branch history length

| Rename: creates RAW dependencies (using a map structure)
rob_size 96              // ROB size

| Scheduler: in-order or out-of-order (ooo)
schedule io, ooo         // instruction scheduling policy
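Since bp_dir_mech selects gshare, here is a textbook gshare direction predictor for reference: 2-bit saturating counters indexed by the PC XORed with the global branch history, with hist_length playing the role of bp_hist_length. It is a generic sketch, not MacSim's implementation; dropping the low two PC bits and the weakly-not-taken initial state are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

class Gshare {
 public:
  // hist_length must be in [1, 31]; the table has 2^hist_length counters.
  explicit Gshare(unsigned hist_length)
      : mask_((1u << hist_length) - 1),
        counters_(std::size_t{1} << hist_length, 1) {  // weakly not-taken
    assert(hist_length > 0 && hist_length < 32);
  }

  // Counter values 2 and 3 predict taken.
  bool predict(std::uint64_t pc) const { return counters_[index(pc)] >= 2; }

  // Train the indexed counter, then shift the outcome into the history.
  void update(std::uint64_t pc, bool taken) {
    std::uint8_t& c = counters_[index(pc)];
    if (taken && c < 3) ++c;
    if (!taken && c > 0) --c;
    history_ = ((history_ << 1) | (taken ? 1u : 0u)) & mask_;
  }

 private:
  std::size_t index(std::uint64_t pc) const {
    return static_cast<std::size_t>(((pc >> 2) ^ history_) & mask_);
  }

  std::uint32_t mask_;
  std::uint32_t history_ = 0;
  std::vector<std::uint8_t> counters_;
};
```

With bp_hist_length 14 this sketch uses a 16K-entry counter table; a branch that is always taken is predicted taken once the history has filled with ones and its counter has warmed up.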

Page 20

| Execution latency: fixed per-uop latency (macsim-top/def/uop_latency_[x86,ptx].def); variable latency for cache/memory accesses

| Instruction scheduling rates
isched_rate 4    // # of integer inst. that can be executed per cycle
msched_rate 2    // # of memory inst. that can be executed per cycle
fsched_rate 2    // # of FP inst. that can be executed per cycle
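The three scheduling rates act as per-cycle issue budgets, one per uop class. This is a minimal sketch, assuming ready uops are examined oldest first and that a class whose budget is used up does not block the other classes; UopClass, SchedRates, and issue_one_cycle are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class UopClass { Int, Mem, Fp };

// Per-cycle issue limits, defaulting to the values in the example above.
struct SchedRates {
  int isched_rate = 4;
  int msched_rate = 2;
  int fsched_rate = 2;
};

// Returns the indices (oldest first) of the ready uops issued this cycle.
std::vector<std::size_t> issue_one_cycle(const std::vector<UopClass>& ready,
                                         const SchedRates& r) {
  int left_int = r.isched_rate, left_mem = r.msched_rate, left_fp = r.fsched_rate;
  std::vector<std::size_t> issued;
  for (std::size_t i = 0; i < ready.size(); ++i) {
    int* slots = ready[i] == UopClass::Int ? &left_int
               : ready[i] == UopClass::Mem ? &left_mem
                                           : &left_fp;
    if (*slots > 0) {  // this class still has a free slot this cycle
      --*slots;
      issued.push_back(i);
    }
  }
  return issued;
}
```

With three ready memory uops, five integer uops, and one FP uop, only two memory and four integer uops issue, plus the FP uop, so seven in total.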

Page 21

| Cache configuration: # of sets, associativity, line size, # of banks, etc. (see the manual)

| Cache size = # of sets × assoc × line_size × # of tiles (# of tiles: L3 only)

| DRAM configuration: frequency, bus width, column/activate/precharge latency, # of memory controllers, # of banks, # of channels, row buffer size, DRAM scheduling policy. A simple but fast DRAM model that captures the key features.

| SST-MacSim is connected with DRAMSim2; users can use DRAMSim2 for detailed DRAM timing simulation.
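The cache-size formula and its inverse can be worked through in code. The numbers used below (64 sets, 8 ways, 64 B lines; a 4-tile L3) are illustrative values, not MacSim defaults, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// size = # of sets x assoc x line_size x # of tiles (tiles = 1 except for L3).
constexpr std::uint64_t cache_size_bytes(std::uint64_t sets, std::uint64_t assoc,
                                         std::uint64_t line_size,
                                         std::uint64_t tiles = 1) {
  return sets * assoc * line_size * tiles;
}

// Inverse: sets needed for a target capacity (assumes it divides evenly).
constexpr std::uint64_t sets_for(std::uint64_t size_bytes, std::uint64_t assoc,
                                 std::uint64_t line_size,
                                 std::uint64_t tiles = 1) {
  assert(assoc * line_size * tiles != 0);
  return size_bytes / (assoc * line_size * tiles);
}
```

For example, 64 sets × 8 ways × 64 B = 32 KiB, and 1024 sets × 16 ways × 64 B × 4 tiles = 4 MiB.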

Page 22

| Statistics: simulation outputs are written to *.stat.out; the files in macsim/trunk/def hold the stat definitions (more details in Part-III)

| Important stats: IPC = INST_COUNT_TOT / CYC_COUNT_TOT; CPI = CYC_COUNT_TOT / INST_COUNT_TOT

| Per-core stats: IPC for core 0 = INST_COUNT_CORE_0 / CYC_COUNT_CORE_0

| Multiple-application stats: *.stat.out.<application_id>, e.g., memory.stat.out.0, bp.stat.out.1. Each stat file contains stats only for the first run (repeated runs are ignored).
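The IPC and CPI formulas can be applied to a parsed stat file. This is a minimal sketch that assumes each line of a *.stat.out file begins with a stat name followed by its numeric value (extra columns are ignored, lines that do not match are skipped); check the real file layout before relying on it. parse_stats, ipc, cpi, and core_ipc are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

using Stats = std::map<std::string, double>;

// Assumed layout: "<STAT_NAME> <value> [more columns...]" per line.
Stats parse_stats(const std::string& text) {
  Stats stats;
  std::istringstream lines(text);
  std::string line;
  while (std::getline(lines, line)) {
    std::istringstream fields(line);
    std::string name;
    double value = 0;
    if (fields >> name >> value) stats[name] = value;
  }
  return stats;
}

// IPC = INST_COUNT_TOT / CYC_COUNT_TOT; CPI is the reciprocal.
double ipc(const Stats& s) { return s.at("INST_COUNT_TOT") / s.at("CYC_COUNT_TOT"); }
double cpi(const Stats& s) { return s.at("CYC_COUNT_TOT") / s.at("INST_COUNT_TOT"); }

// Per-core IPC from INST_COUNT_CORE_<n> / CYC_COUNT_CORE_<n>.
double core_ipc(const Stats& s, int core) {
  assert(core >= 0);
  const std::string id = std::to_string(core);
  return s.at("INST_COUNT_CORE_" + id) / s.at("CYC_COUNT_CORE_" + id);
}
```

Using `at` makes a missing stat name throw rather than silently read as zero.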

Page 23

| Memory systems: L[1-3]_HIT_CPU / L[1-3]_HIT_GPU, L[1-3]_MISS_CPU / L[1-3]_MISS_GPU

| Front-end: BP_ON_PATH_[CORRECT/MISPREDICT/MISFETCH]

| Instruction profiling: based on instruction category (inst.stat.out)

| More details regarding statistics are in the documentation

| We will provide a simple script to extract stat data
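Cache hit rates follow directly from the L[1-3]_HIT_* and L[1-3]_MISS_* counters. This is a minimal sketch, assuming the counters have already been read into a name-to-value map; hit_rate is a hypothetical helper, not the script mentioned above.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hit rate = hits / (hits + misses) for one level (1-3) and side ("CPU" or "GPU").
double hit_rate(const std::map<std::string, double>& stats, int level,
                const std::string& side) {
  assert(level >= 1 && level <= 3);
  const std::string lv = "L" + std::to_string(level);
  const double hits = stats.at(lv + "_HIT_" + side);
  const double misses = stats.at(lv + "_MISS_" + side);
  const double total = hits + misses;
  return total > 0 ? hits / total : 0.0;  // no accesses: report 0
}
```

Reporting 0 for a level with no accesses is a design choice here; a script could equally print "n/a".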

Page 25

| Multi-threading support is already there.

| Different ISAs: handled by using micro-ops.

| Warps?

One warp is treated as one thread. Each thread generates its own trace file, which includes active-bit information. The trace format will be explained in Part-III.

| Thread and block scheduling: block-level barriers, block-level scheduling/retirement. More details will be explained in Part-III.

| Different memory structures: memory systems

Page 26

| Include the memory access of each thread in a warp as a separate instruction in the trace

| In the trace, mark these accesses as coming from the same warp

[Figure: a SIMD load instruction with Addr 0 to Addr 7. Coalesced: one memory instruction with a 128B request. Uncoalesced: separate requests (a 64B request and two 32B requests). In the trace file, a coalesced access is a single TraceInst; an uncoalesced access is written as TraceInst_begin (start-of-memory-instruction marker), TraceMem1, TraceMem2, TraceMem3, TraceInst_end (end-of-memory-instruction marker).]
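Whether a warp's accesses coalesce can be approximated by counting the distinct aligned segments they touch. This is a simplified model chosen for illustration, not MacSim's or any GPU generation's exact coalescing rules; coalesce and the segment size parameter are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Group per-thread addresses by aligned segment; each distinct segment
// becomes one memory request. segment_bytes must be a power of two.
std::vector<std::uint64_t> coalesce(const std::vector<std::uint64_t>& addrs,
                                    std::uint64_t segment_bytes) {
  assert(segment_bytes > 0 && (segment_bytes & (segment_bytes - 1)) == 0);
  std::set<std::uint64_t> segments;
  for (std::uint64_t a : addrs) segments.insert(a & ~(segment_bytes - 1));
  // Sorted base addresses of the requests.
  return std::vector<std::uint64_t>(segments.begin(), segments.end());
}
```

Eight threads reading consecutive 4-byte words fall into one 128B segment (one request), whereas a 128-byte stride produces eight requests, which is the uncoalesced case that the TraceInst_begin/TraceInst_end records capture.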

Page 27

| During simulation, form a “parent” uop that holds all the individual memory accesses as its child uops

| The parent uop flows through the pipeline; only in the memory stage are the individual child uops issued to memory

The parent uop is ready for retirement when all its children have completed

[Figure: the trace file holds TraceInst_begin (start-of-memory-instruction marker), TraceMem1, TraceMem2, TraceMem3, …, TraceMemN, TraceInst_end (end-of-memory-instruction marker); MacSim turns these into a parent uop (Mem_type: ld, #children: 8) whose children uops carry addr0, addr1, addr2, addr3, addr4, addr5, …, addrN]
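The parent/child relationship can be sketched as a small data structure. This is a minimal sketch under assumed fields; ChildUop, ParentUop, and make_parent are hypothetical names and not taken from MacSim's source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct ChildUop {
  std::uint64_t addr;
  bool done = false;
};

struct ParentUop {
  bool is_load = true;             // Mem_type: ld
  std::vector<ChildUop> children;  // one per TraceMem record

  // Called when the memory system completes child i (children are issued
  // only in the memory stage).
  void complete_child(std::size_t i) { children.at(i).done = true; }

  // The parent may retire only after every child has completed.
  bool ready_to_retire() const {
    for (const ChildUop& c : children)
      if (!c.done) return false;
    return true;
  }
};

// Build a parent from the addresses between TraceInst_begin and TraceInst_end.
ParentUop make_parent(const std::vector<std::uint64_t>& addrs) {
  ParentUop p;
  for (std::uint64_t a : addrs) p.children.push_back(ChildUop{a});
  assert(p.children.size() == addrs.size());
  return p;
}
```

Keeping only the parent in the pipeline structures means the rest of the core sees one instruction, while the memory stage fans out to as many requests as the warp needs.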

Page 29

| IRIS (from Prof. Yalamanchili’s group): a flit-level interconnection network simulator with virtual channels, credit-based flow control, deadlock avoidance, … Part-IV will cover more.

| MacSim-SST: parallel simulation

[Figure: nodes connected through routers in a topology (ring, mesh, torus, ...)]
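Credit-based flow control, one of the IRIS features listed above, can be summarized with a per-virtual-channel credit counter. This is a textbook sketch, not IRIS's code; CreditCounter and its methods are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sender-side view of one link: one credit per free flit buffer slot in
// each downstream virtual channel.
class CreditCounter {
 public:
  CreditCounter(int num_vcs, int buffer_depth)
      : credits_(static_cast<std::size_t>(num_vcs), buffer_depth) {}

  bool can_send(int vc) const { return credits_.at(vc) > 0; }

  // Sending a flit consumes one downstream buffer slot.
  void on_send(int vc) {
    assert(can_send(vc));
    --credits_.at(vc);
  }

  // The downstream router returns a credit when it forwards a flit.
  void on_credit_return(int vc) { ++credits_.at(vc); }

 private:
  std::vector<int> credits_;
};
```

Because a flit is sent only when a credit is available, downstream buffers never overflow, and a full VC stalls only its own traffic while other VCs keep flowing.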

Page 30

[Figure: roadmap for 2012 ~ 2013: power/energy model; ARM architecture and mobile platform; OpenGL programs]