Initial Observations of Hardware/Software Co-simulation using FPGA in Architecture Research

Page 1: Initial Observations of Hardware/Software Co-simulation using FPGA in Architecture Research

§ Georgia Institute of Technology, † Intel Corporation

Initial Observations of Hardware/Software Co-simulation using FPGA in Architecture Research

Taeweon Suh §, Hsien-Hsin S. Lee §, Shih-Lien Lu †, John Shen †

February 12, 2006

Page 2: Initial Observations of Hardware/Software Co-simulation using FPGA in Architecture Research

Georgia Tech, Intel - WARFP 2006

Hardware/Software Co-simulation

Software simulation
– Advantages: flexible, observable, easy to implement
– Disadvantage: intolerable simulation time

Hardware emulation
– Advantages: significant speedup, concurrent execution
– Disadvantages: much less flexible and observable; low-level design takes longer to implement and validate

Hardware/software co-simulation
– Try to retain the advantages of both approaches
– Basic idea
  – Implement time-consuming software functions in the FPGA
  – The remaining simulator interacts with the FPGA

Page 3: Initial Observations of Hardware/Software Co-simulation using FPGA in Architecture Research

Experiment Equipment

[Figure: experiment setup. Labels: Intel server system, Pentium-III, ACE FPGA board, logic analyzer, host PC, UART.]

Page 4: Initial Observations of Hardware/Software Co-simulation using FPGA in Architecture Research

Communication Method

Communication between Pentium-III and FPGA
– Use the FSB as the communication medium
– Allocate one page of memory for communication
– Send data to FPGA: write-through cache mode
– Receive data from FPGA: cache-to-cache transfer

[Figure: Pentium-III (MESI), memory controller with 2GB SDRAM, and FPGA (Virtex-II) attached to the front-side bus (FSB). Labels: "write" bus transaction; "read" bus transaction; "cache-to-cache transfer"; cache line "FLUSH".]
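As a rough illustration of the send and receive paths listed above, the following Python sketch models one shared page on a toy bus. It is not the authors' implementation: the names `ToyBus` and `ToyFpga`, the page address, and the doubling function are hypothetical, and real FSB signaling and MESI state handling are left out.

```python
PAGE_BASE = 0x1000_0000   # hypothetical physical address of the shared page
PAGE_SIZE = 4096          # one page of memory reserved for communication
LINE_SIZE = 32            # cache line size in bytes


class ToyFpga:
    """Snoops the bus and answers reads that fall inside the shared page."""

    def __init__(self):
        self.last_arg = 0

    def owns(self, addr):
        return PAGE_BASE <= addr < PAGE_BASE + PAGE_SIZE

    def snoop_write(self, addr, value):
        # Write-through mode: every store to the page appears as a bus write.
        if self.owns(addr):
            self.last_arg = value

    def snoop_read(self, addr):
        # Cache-to-cache transfer: the FPGA supplies a whole cache line.
        if not self.owns(addr):
            return None
        result = self.last_arg * 2   # stand-in for the offloaded function
        return result.to_bytes(LINE_SIZE, "little")


class ToyBus:
    """Forwards CPU reads and writes to memory and to the snooping FPGA."""

    def __init__(self, fpga):
        self.fpga = fpga
        self.memory = {}

    def write(self, addr, value):
        self.memory[addr] = value
        self.fpga.snoop_write(addr, value)

    def read(self, addr):
        line = self.fpga.snoop_read(addr)
        if line is not None:
            return int.from_bytes(line, "little")
        return self.memory.get(addr, 0)


bus = ToyBus(ToyFpga())
bus.write(PAGE_BASE, 21)       # send an argument to the FPGA
answer = bus.read(PAGE_BASE)   # receive the result from the FPGA
```

In this toy model the simulator side only issues ordinary loads and stores; everything specific to the FPGA sits behind the address range check.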

Page 5: Initial Observations of Hardware/Software Co-simulation using FPGA in Architecture Research

Hardware/Software Implementation

Hardware (FPGA) implementation
– State machines
  – Monitoring bus transactions on the FSB
  – Checking bus transaction types, i.e., read or write
  – Managing cache-to-cache transfers
– Implementation of software functions in the FPGA
– Debugging logic and statistics counters

Software implementation
– Linux device driver
  – The FPGA needs to know when to respond to FSB transactions
  – A specific physical address is needed for communication
  – Allocate one page of memory for FPGA access via the Linux device driver
– Simulator modification for accessing the FPGA
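The three state-machine duties named above (monitor the bus, check the transaction type, manage the transfer) can be sketched in software as follows. This is a minimal Python model under assumed names: the states, the transaction dictionary, and the page address are hypothetical and do not describe the actual FPGA logic.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()        # watching the bus for a new transaction
    CHECK_TYPE = auto()  # address matched the shared page; read or write?
    CAPTURE = auto()     # latch the data carried by a write
    SUPPLY = auto()      # drive data for a cache-to-cache transfer


def step(state, txn, page_base=0x1000_0000, page_size=4096):
    """Return (next_state, action) for one toy transaction.

    txn is None or a dict such as {"addr": int, "kind": "read" or "write"}.
    """
    if state is State.IDLE:
        if txn and page_base <= txn["addr"] < page_base + page_size:
            return State.CHECK_TYPE, None
        return State.IDLE, None
    if state is State.CHECK_TYPE:
        if txn["kind"] == "write":
            return State.CAPTURE, None
        return State.SUPPLY, None
    if state is State.CAPTURE:
        return State.IDLE, "latched"
    if state is State.SUPPLY:
        return State.IDLE, "supplied"
    raise ValueError(state)


def run(txn):
    """Drive the machine from IDLE until it returns to IDLE."""
    state, trace = State.IDLE, []
    while True:
        state, action = step(state, txn)
        trace.append(state)
        if state is State.IDLE:
            return action, trace
```

Calling `run({"addr": 0x1000_0040, "kind": "read"})` walks through CHECK_TYPE and SUPPLY before returning to IDLE, while a transaction outside the page never leaves IDLE.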

Page 6: Initial Observations of Hardware/Software Co-simulation using FPGA in Architecture Research

Example: SimpleScalar Co-simulation

Preliminary experiment for correctness checkup
– Implement a simple function (mem_access_latency) in the FPGA

Co-simulation results

Benchmark   Baseline (h:m:s)   Co-simulation (h:m:s)   Difference (h:m:s)
mcf         2:18:38            2:20:50                 + 0:02:12
bzip2       3:03:58            3:06:50                 + 0:02:52
crafty      2:56:38            2:59:28                 + 0:02:50
eon-cook    2:43:52            2:45:45                 + 0:01:53
gcc-166     3:45:30            3:48:56                 + 0:03:26
parser      3:34:57            3:37:27                 + 0:02:30
perl        2:42:30            2:45:50                 + 0:03:20
twolf       2:43:30            2:45:28                 + 0:01:58
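To put the differences in proportion, the short Python sketch below converts the timings listed above to seconds and computes the relative slowdown of each co-simulation run. The helper name `to_seconds` is introduced here for illustration; the timing pairs are copied from the slide in the order listed.

```python
def to_seconds(hms):
    """Convert an 'h:mm:ss' string to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


# (baseline, co-simulation) pairs copied from the slide, in the order listed.
RUNS = [
    ("2:18:38", "2:20:50"),
    ("3:03:58", "3:06:50"),
    ("2:56:38", "2:59:28"),
    ("2:43:52", "2:45:45"),
    ("3:45:30", "3:48:56"),
    ("3:34:57", "3:37:27"),
    ("2:42:30", "2:45:50"),
    ("2:43:30", "2:45:28"),
]

slowdowns = []
for base, cosim in RUNS:
    b, c = to_seconds(base), to_seconds(cosim)
    percent = 100 * (c - b) / b
    slowdowns.append(percent)
    print(f"{base} -> {cosim}: +{c - b} s ({percent:.2f}% slower)")
```

Every run comes out slower with the FPGA in the loop, by roughly 1 to 2 percent of the baseline time.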

Page 7: Initial Observations of Hardware/Software Co-simulation using FPGA in Architecture Research

Co-simulation Results Analysis

FSB access is expensive
– ~20 FSB cycles (≈ 160 CPU cycles) for each transfer
  – One cache line (32 bytes) needs to be transferred for a cache-to-cache transfer
  – P-III MESI requires main memory to be updated upon a cache-to-cache transfer

The "mem_access_latency" function is too simple
– Even software simulation takes at most a few dozen CPU cycles

Device driver overhead
– System overhead due to the device driver
– It requires one TLB entry, which would otherwise be used by the simulation

Time-consuming software routines and a reasonable FPGA access frequency are needed to benefit from hardware implementation
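The trade-off described above can be made concrete with a back-of-the-envelope cost model. The Python sketch below is only an illustration: the 20-cycle transfer cost and the 8:1 clock ratio come from the slide's "20 FSB cycles ≈ 160 CPU cycles", while the function names, the 36-cycle and 5000-cycle software costs, the transfer counts, and the 200-cycle FPGA time are assumed values. Device driver and TLB effects are ignored.

```python
FSB_CYCLES_PER_TRANSFER = 20   # from the slide
CPU_CYCLES_PER_FSB_CYCLE = 8   # implied by 20 FSB cycles ≈ 160 CPU cycles


def offload_cost(transfers, fpga_cpu_cycles=0):
    """CPU cycles spent per call when the function lives in the FPGA."""
    bus = transfers * FSB_CYCLES_PER_TRANSFER * CPU_CYCLES_PER_FSB_CYCLE
    return bus + fpga_cpu_cycles


def speedup(software_cycles, transfers, fpga_cpu_cycles=0):
    """Per-call speedup of offloading; values below 1.0 mean a slowdown."""
    return software_cycles / offload_cost(transfers, fpga_cpu_cycles)


# A simple function costing a few dozen cycles in software (36 is assumed).
simple = speedup(software_cycles=36, transfers=1)

# A hypothetical heavy routine: 5000 cycles in software, two transfers,
# and 200 CPU-cycle equivalents of FPGA compute time.
heavy = speedup(software_cycles=5000, transfers=2, fpga_cpu_cycles=200)
```

Under these assumptions the simple function runs several times slower when offloaded, because a single transfer already costs more than the whole software routine, whereas the heavy routine gains from the FPGA.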

Page 8: Initial Observations of Hardware/Software Co-simulation using FPGA in Architecture Research

On-going Work

SoftSDV co-simulation for multi-core research
– Implement distributed lowest-level caches and an interconnection network, such as a ring or mesh, in the FPGA

[Figure: eight nodes (CPU0 to CPU7), each with L1 and L2 caches, an L3 cache, and a ring interface (Ring I/F). Label: FPGA.]
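For a sense of what an interconnect model has to compute, the Python sketch below gives the minimum hop count between two nodes on a ring. It assumes a bidirectional ring whose eight nodes are numbered in ring order; neither assumption is stated on the slide, and the function name `ring_hops` is hypothetical.

```python
def ring_hops(src, dst, nodes=8):
    """Minimum hop count between two nodes on a bidirectional ring."""
    clockwise = (dst - src) % nodes
    return min(clockwise, nodes - clockwise)


# Hop counts from node 0 to every node on an assumed 8-node ring.
from_node0 = [ring_hops(0, dst) for dst in range(8)]
```

A timing model would scale such a hop count by a per-hop latency; that latency is a design parameter not given in the source.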

Page 9: Initial Observations of Hardware/Software Co-simulation using FPGA in Architecture Research

Conclusions

Proposed a new co-simulation methodology

Preliminary co-simulation using SimpleScalar proves the correctness of the methodology
– Hardware/software implementation
– Communication between P-III and FPGA via FSB
– Linux driver

Co-simulation results indicate
– Bus access (FSB) is expensive
– Linux driver overhead also needs to be overcome
– Time-consuming blocks need to be emulated

Multi-core co-simulation would benefit from FPGA
– Implement distributed low-level caches and an interconnection network, which would be complex enough to benefit from hardware modeling

Page 10: Initial Observations of Hardware/Software Co-simulation using FPGA in Architecture Research

Questions, Comments?

Thanks for your attention!

Page 11: Initial Observations of Hardware/Software Co-simulation using FPGA in Architecture Research

Backup Slides

Page 12: Initial Observations of Hardware/Software Co-simulation using FPGA in Architecture Research

Communication Details

All FSB signals are mapped to FPGA pins

Encoding software function arguments in the FSB address for the SimpleScalar example
– For a 4KB page,
  – Set its attribute to write-through mode
  – Lower 12 bits of the FSB address bus are free to use
  – High 24 bits are used for TLB translation

[Figure: Pentium-III (MESI) and Xilinx Virtex-II attached to the front-side bus (FSB).]
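The address-encoding idea above can be sketched in a few lines of Python: the page frame occupies the high bits and a 12-bit argument rides in the page-offset bits. The function names and the page address are hypothetical, and the sketch ignores any alignment constraints that real bus transactions impose on the low address bits.

```python
PAGE_SHIFT = 12                      # 4KB page -> 12 offset bits
OFFSET_MASK = (1 << PAGE_SHIFT) - 1  # 0xFFF


def encode(page_base, arg):
    """Pack a 12-bit argument into the page-offset bits of an address."""
    if page_base & OFFSET_MASK:
        raise ValueError("page_base must be 4KB-aligned")
    if not 0 <= arg <= OFFSET_MASK:
        raise ValueError("argument must fit in 12 bits")
    return page_base | arg


def decode(addr):
    """Split an address into (page frame number, 12-bit argument)."""
    return addr >> PAGE_SHIFT, addr & OFFSET_MASK


PAGE_BASE = 0x1234_5000              # hypothetical physical page address
addr = encode(PAGE_BASE, 0xABC)
frame, arg = decode(addr)
```

With this layout the FPGA can recover the argument by looking only at the low address bits of a transaction that targets the shared page, so a single access both selects the page and carries the argument.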