Combining Simulators and FPGAs “An Out-of-Body Experience”


Page 1: Combining Simulators and FPGAs  “An Out-of-Body Experience”

Computer Architecture Lab at

Combining Simulators and FPGAs “An Out-of-Body Experience”

Eric S. Chung, Brian Gold, James C. Hoe, Babak Falsafi

{echung, bgold, jhoe, babak}@ece.cmu.edu

SIMFLEX/PROTOFLEX

Page 2: Combining Simulators and FPGAs  “An Out-of-Body Experience”

June 22, 2006 Eric S. Chung / RAMP 2006 Summer Retreat 2

The RAMP full-system challenge

• RAMP vision for studying systems w/ FPGAs

– functional & cycle-accurate simulation

– scalability, speed, & flexibility on FPGAs

– full-system (run unmodified binaries & OS)

[Figure: full-system target with CPUs, memory, and devices on a PCI bus: Ethernet controller, graphics card, I/O MMU controller, disks, DMA controller, IRQ controller, terminal, SCSI controller]

‘Full-sys’ RAMP will incur large effort, yet not all behaviors are frequently used (e.g., I/O)

Page 3: Combining Simulators and FPGAs  “An Out-of-Body Experience”


Combining simulators & FPGAs

• Simulators already provide full-system support, so why not simulate infrequent behaviors (e.g., I/O devices)?

• Advantages

– avoids implementing infrequent behaviors, which lowers full-system FPGA development effort

– low impact on scalability & performance on FPGA

[Figure: target CPUs, memory, SCSI disk, and Ethernet divided between the FPGA and simulator hosts]

Page 4: Combining Simulators and FPGAs  “An Out-of-Body Experience”


Outline

• Motivation

• Migration

• Implementation status

• Conclusion

Page 5: Combining Simulators and FPGAs  “An Out-of-Body Experience”


Migration

• 3 ways to map a target object to a host: FPGA-only, simulation-only, migratable

• Migratable objects

– switch modes between FPGA & simulator hosts

– target behavior need not be 100% implemented in FPGA mode

e.g., impl. 80% of target behavior in FPGA, 100% in simulator

[Figure: “target objects” 1, 2, and 3 of a target design (e.g., a functional or timing CPU model) mapped onto the FPGA and simulator hosts]
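The three mappings can be pictured with a small sketch. This is only an illustration, not the authors' implementation: the names `Mapping`, `TargetObject`, `fpga_behaviors`, and `host_for` are hypothetical, and the rule "run on the FPGA whenever the behavior is implemented there" is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Mapping(Enum):
    """The three ways a target object can be mapped to a host."""
    FPGA_ONLY = "FPGA-only"
    SIM_ONLY = "simulation-only"
    MIGRATABLE = "migratable"


@dataclass
class TargetObject:
    name: str
    mapping: Mapping
    # Behaviors implemented in FPGA mode (hypothetical field; only
    # meaningful for migratable objects).
    fpga_behaviors: frozenset = frozenset()

    def host_for(self, behavior: str) -> str:
        """Return the host that should execute `behavior`."""
        if self.mapping is Mapping.FPGA_ONLY:
            return "fpga"
        if self.mapping is Mapping.SIM_ONLY:
            return "simulator"
        # Migratable: the simulator implements 100% of behaviors,
        # the FPGA only a subset, so fall back to the simulator.
        return "fpga" if behavior in self.fpga_behaviors else "simulator"


cpu = TargetObject("cpu", Mapping.MIGRATABLE, frozenset({"load", "add"}))
memory = TargetObject("memory", Mapping.FPGA_ONLY)
disk = TargetObject("scsi_disk", Mapping.SIM_ONLY)
```

Here `cpu.host_for("add")` selects the FPGA, while an unimplemented behavior such as `cpu.host_for("scsi_cmd")` selects the simulator.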

Page 6: Combining Simulators and FPGAs  “An Out-of-Body Experience”


Migration example

Target-to-host mappings:

• CPU = migratable

• Memory = FPGA-only

• Devices = SW-only

[Figure: timeline of an example CPU instruction stream (load, add, multiply, I/O SCSI cmd, add, sub, ...) across the FPGA (CPU, memory) and the simulator (CPU, memory, SCSI disk), with a CPU state transfer at the SCSI cmd]
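The instruction-stream example can be replayed in a few lines. This is a toy sketch under stated assumptions: the set `FPGA_SUPPORTED`, the function `run`, and the policy of migrating back to the FPGA as soon as the next instruction is supported there are all hypothetical choices made for illustration.

```python
# Instructions assumed to be implemented in the FPGA-hosted CPU (hypothetical).
FPGA_SUPPORTED = {"load", "add", "multiply", "sub"}


def run(stream, fpga_supported=FPGA_SUPPORTED):
    """Replay an instruction stream for a migratable CPU.

    Returns (trace, transfers): the host that executed each instruction
    and the number of CPU state transfers between hosts.
    """
    host = "fpga"  # assume the CPU starts out on the FPGA
    trace = []
    transfers = 0
    for instr in stream:
        wanted = "fpga" if instr in fpga_supported else "simulator"
        if wanted != host:
            transfers += 1  # CPU state (registers, TLBs) is copied here
            host = wanted
        trace.append((instr, host))
    return trace, transfers


stream = ["load", "add", "multiply", "io_scsi_cmd", "add", "sub"]
trace, transfers = run(stream)
```

With this stream only `io_scsi_cmd` runs on the simulator, so the CPU state is transferred twice: once out to the simulator and once back.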

Page 7: Combining Simulators and FPGAs  “An Out-of-Body Experience”


Advantages

• Lowers development effort

– avoid bring-up of infrequent behaviors

– migrate & validate ref. models from simulator

– tailor impl. to workload (avoid rarely used instrs, good for CISC x86)

• Fast & scalable

– perf-critical objects on FPGA (e.g., CPU, memory)

– scalable for MPs: add migratable CPUs

[Figure: multiple migratable CPUs and memory on the FPGA; CPUs, memory, and SCSI disk on the simulator]

Page 8: Combining Simulators and FPGAs  “An Out-of-Body Experience”


Subtleties

• Objects separated in simulator/FPGA interact

– examples: interrupts, DMA

– handle by forwarding messages between FPGA/simulator

– FPGA-only & SW-only mapped objects easy to locate

– migrated objects require tracking

[Figure: a DMA transfer forwarded between the SCSI disk on the simulator and memory on the FPGA]

Page 9: Combining Simulators and FPGAs  “An Out-of-Body Experience”


Subtleties

[Figure: an interrupt crossing between the simulator and the FPGA. Option 1: forwarded interrupt. Option 2: forced migration.]

Cross-host interactions are rare, so the impact on FPGA performance is low.
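Message forwarding with object tracking might look like the sketch below. It is an assumption-laden illustration rather than the actual mechanism: `Router`, `route`, and `migrate` are hypothetical names. Statically mapped objects are looked up in a fixed table, while migratable objects need a location entry that is updated on every migration.

```python
class Router:
    """Decides where a message must be delivered (illustrative only)."""

    def __init__(self, static_hosts, migratable_hosts):
        # FPGA-only and SW-only objects never move, so a fixed table suffices.
        self.static_hosts = dict(static_hosts)
        # Migratable objects require tracking of their current host.
        self.location = dict(migratable_hosts)

    def migrate(self, obj, new_host):
        """Record that a migratable object now lives on `new_host`."""
        if obj not in self.location:
            raise KeyError(f"{obj} is not migratable")
        self.location[obj] = new_host

    def route(self, src_host, dst_obj):
        """Return (destination host, whether the message crosses hosts)."""
        if dst_obj in self.static_hosts:
            dst_host = self.static_hosts[dst_obj]
        else:
            dst_host = self.location[dst_obj]
        return dst_host, dst_host != src_host


router = Router(
    static_hosts={"memory": "fpga", "scsi_disk": "simulator"},
    migratable_hosts={"cpu0": "fpga"},
)
dma = router.route("simulator", "memory")        # DMA from the SCSI disk
irq_before = router.route("simulator", "cpu0")   # forwarded interrupt
router.migrate("cpu0", "simulator")              # forced migration instead
irq_after = router.route("simulator", "cpu0")    # now delivered locally
```

The two interrupt options correspond to the last three lines: either the interrupt is forwarded to the FPGA, or the CPU is first migrated to the simulator so that delivery is local.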

Page 10: Combining Simulators and FPGAs  “An Out-of-Body Experience”


Subtleties cont.

• Migration cost

– migrating an object requires a state copy

e.g., a migratable CPU has registers & TLBs

– FPGA-to-simulator latency & sim. time limit # migrations/instr

• FPGA & simulator asynchrony

– simulated time “ticks” at different rates in FPGA & simulator

– must synchronize for deterministic replay & accurate device timing
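A back-of-the-envelope model shows why the migration rate matters. The formula and all numbers below are assumptions for illustration, not measurements from the talk: the average time per instruction is taken to be the FPGA time per instruction plus the migration rate times the cost of one migration (state copy, link latency, and simulator time).

```python
def effective_mips(fpga_mips, migration_cost_s, migrations_per_instr):
    """Throughput in MIPS once migration overhead is amortized.

    fpga_mips            -- throughput while running on the FPGA
    migration_cost_s     -- assumed total cost of one migration, in seconds
    migrations_per_instr -- assumed average migrations per instruction
    """
    time_per_instr = 1.0 / (fpga_mips * 1e6)
    overhead_per_instr = migrations_per_instr * migration_cost_s
    return 1.0 / (time_per_instr + overhead_per_instr) / 1e6


# Assumed numbers: 12 MIPS on the FPGA and a 1 ms migration.
rare = effective_mips(12.0, 1e-3, 1e-6)      # one migration per million instrs
frequent = effective_mips(12.0, 1e-3, 1e-3)  # one migration per thousand instrs
```

Under these assumptions, one migration per million instructions costs about 1% of throughput, while one per thousand instructions drops throughput to roughly 1 MIPS, which is why infrequent behaviors are the natural candidates for the simulator.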

Page 11: Combining Simulators and FPGAs  “An Out-of-Body Experience”


Outline

• Motivation

• Migration

• Implementation in progress

• Conclusion

Page 12: Combining Simulators and FPGAs  “An Out-of-Body Experience”


Implementation status

• Target system

– Sun Fire[tm] 3800 Server (up to 24-way)

– UltraSPARC III ISA

– Solaris 8

• Proof-of-concept software-to-software migration

– run 2 instances of Virtutech Simics

– migration designed & tested in 2 weeks

– can migrate on arbitrary behavior (e.g., ADD instruction)

Page 13: Combining Simulators and FPGAs  “An Out-of-Body Experience”


BlueSPARC core (in progress)

• In-order SPARC V9 core

– supports 144 out of 170 integer instr behaviors

– supports partial MMU w/ I- & D-TLBs

– goal: 99.999% of instrs & behaviors in target workloads

  • SPEC (mostly user-level), OLTP/DB2 (high TLB misses, 40% time in priv-mode)

– CPI ranges from 5 to 7 cycles

– synth: 15k LUTs on Virtex-II Pro 30, 85 MHz, 12 MIPS (worst-case)

– developed in Bluespec HDL, 6000 lines in 6 weeks

• Core validation

– run RTL in lockstep w/ Simics’s UltraSPARC simulation model

– workload validation w/ SPEC, OLTP/DB2, OpenSPARC verif. suite
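The quoted figures can be cross-checked with simple arithmetic, using only the numbers on the slide and the standard relation MIPS = clock (MHz) / CPI. The helper name `mips` is introduced here for illustration.

```python
def mips(clock_mhz, cpi):
    """Millions of instructions per second for a given clock and CPI."""
    return clock_mhz / cpi


worst_case = mips(85, 7)   # CPI of 7 at 85 MHz
best_case = mips(85, 5)    # CPI of 5 at 85 MHz
coverage = 144 / 170       # fraction of integer instr behaviors supported
```

At 85 MHz, a CPI of 7 gives about 12.1 MIPS, consistent with the quoted worst case of 12 MIPS, and a CPI of 5 gives 17 MIPS. The 144 supported behaviors are about 85% of the 170 integer instruction behaviors.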

Page 14: Combining Simulators and FPGAs  “An Out-of-Body Experience”


Migration on FPGA (in progress)

• PowerPC functions

– core & memory initialization from Simics checkpoints

– facilitates migration for BlueSPARC

– connects simulated devices to memory (e.g., SCSI DMA)

[Figure: Xilinx XUP Virtex-II Pro 30 board (BlueSPARC, PowerPC, DDR memory) connected over Ethernet through a migration & message interface to Virtutech Simics (Simics UltraSPARC, simulated target devices)]
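A migration message between the board and the simulator could be framed as in the sketch below. The wire format is entirely hypothetical: the talk does not describe it, so the message type code, the header layout, and the toy CPU state (pc, npc, and 8 registers, with no TLB contents) are assumptions chosen to keep the example short.

```python
import struct

MSG_MIGRATE = 1                      # hypothetical message type code
HEADER = struct.Struct(">BI")        # type (1 byte), payload length (4 bytes)
CPU_STATE = struct.Struct(">2Q8Q")   # pc, npc, then 8 registers (toy size)


def encode_migrate(pc, npc, regs):
    """Serialize a toy CPU state into a framed migration message."""
    if len(regs) != 8:
        raise ValueError("this sketch expects exactly 8 registers")
    payload = CPU_STATE.pack(pc, npc, *regs)
    return HEADER.pack(MSG_MIGRATE, len(payload)) + payload


def decode_migrate(frame):
    """Parse a framed migration message back into (pc, npc, regs)."""
    msg_type, length = HEADER.unpack_from(frame, 0)
    payload = frame[HEADER.size:HEADER.size + length]
    if msg_type != MSG_MIGRATE:
        raise ValueError("not a migration message")
    if len(payload) != length:
        raise ValueError("truncated message")
    pc, npc, *regs = CPU_STATE.unpack(payload)
    return pc, npc, regs


frame = encode_migrate(0x1000, 0x1004, list(range(8)))
state = decode_migrate(frame)
```

Big-endian fields are used because SPARC is big-endian, but that too is a design choice of this sketch. A real migration would also have to carry the remaining architectural state, such as the TLBs.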

Page 15: Combining Simulators and FPGAs  “An Out-of-Body Experience”


Conclusion

• Contributions

– virtualizes infrequent behaviors using simulation

– simplifies full-system FPGA emulator, still fast/scalable

– incremental validation from reference system

• Future work

– support migration in RDL?

– adding cores + scaling across multiple FPGAs

• We are ready for BEE2

• Thanks! Questions? [email protected]

• PROTOFLEX/SIMFLEX (http://www.ece.cmu.edu/~simflex)