Extensible Networking Platform 1 Liquid Architecture
Cycle Accurate Performance Measurement
Richard Hough, Phillip Jones, Scott Friedman, Roger Chamberlain, Jason Fritts, John Lockwood, and Ron Cytron
[email protected]
http://liquid.arl.wustl.edu/
Funded by NSF Grant ITR-0313203
Outline
• Introduction
• Motivation
• Background
• Architecture
• Usage
• Results
• Future Work
• Related Work
• Conclusion
Introduction – What Are We Doing?
• Creating a module for capturing cycle-accurate profiles of hardware events during the runtime of programs on real systems
[Diagram: Statistics Module, with labels Program Runtime, Program Bottlenecks, Cache Hits, Memory Accesses, and ISA Decoding]
Background - FPX
• Designed and implemented on the FPX platform
• The FPX platform:
  – Is designed for developing pluggable network circuits
  – Contains a Virtex 2000E FPGA for design deployment
  – Possesses a smaller FPGA used as a network interface device
• Can potentially operate at gigabit line rates
Background - LEON2
• Developed by Gaisler Research
  – SPARC V8
  – Open-source VHDL
  – Widely used
    • European Space Agency, etc.
  – Second in popularity only to the MicroBlaze
Motivation – Why Not Use Software?
• Software profiling is:
  – Inaccurate
    • Many data points estimated
    • Time slices not absolute
    • Profiling affects results
  – Inefficient
    • Unreasonable for real-system deployment
  – Ineffective
    • Difficult to separate OS overhead
Motivation – Why Not Use Simulation?
• Simulation is:
  – Slow
    • A simple simulation could require 100X more time than running the program
  – Bound by the quality of the model
    • The model used may be inaccurate
    • Processors are often tweaked without updating the documentation [Larus]
Motivation – Why Use FPGAs?
• ASICs are expensive
  – FPGAs provide a good blend of cost and accuracy
• Software simulation of processors is incredibly slow
• Allows for easy prototyping
  – Test new caching methods, tweak the ISA, etc.
Motivation – Why Put StatsMod In An FPGA?
• The Statistics Module allows you to:
  – Pull event signals from anywhere
  – Evaluate both software and hardware optimizations
    • Tweak the architecture
    • Integrate hardware-accelerated modules into software solutions
    • Adjust the software algorithm
  – Gather repeatable and reliable results
Architecture – Naïve Solution
• Interested in 10 events and 10 methods
  – Naïve solution implements a counter for each possibility
    • 100 counters!
  – Not scalable for large systems
[Figure: Event 0 through Event 9 and Method 0 through Method 9]
Architecture – Our Solution
• Better approach
  – Associate counters with events and methods at run time
  – Covers the problem area, but uses less chip space
[Figure: Event 0 through Event 9 and Method 0 through Method 9]
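The trade-off between the two approaches can be sketched in a few lines of Python. This is an illustration only: the pool size of 8 and the idea of covering every pair over several profiling runs are assumptions made for the example, not figures or procedures from the slides.

```python
from itertools import product

EVENTS = [f"Event {i}" for i in range(10)]
METHODS = [f"Method {i}" for i in range(10)]


def naive_counter_count(events, methods):
    """Naive approach: one dedicated counter per (event, method) pair."""
    return len(events) * len(methods)


def plan_runs(events, methods, pool_size):
    """Split all (event, method) pairs into batches of at most pool_size.

    Each batch stands for one profiling run in which the small pool of
    counters is bound to that batch's pairs at run time (assumed workflow).
    """
    pairs = list(product(events, methods))
    return [pairs[i:i + pool_size] for i in range(0, len(pairs), pool_size)]


# pool_size=8 is a hypothetical value chosen for illustration.
runs = plan_runs(EVENTS, METHODS, pool_size=8)
print(naive_counter_count(EVENTS, METHODS))  # 100
print(len(runs))  # 13
```

With a small pool, every pairing is still reachable, at the cost of rebinding the counters between runs instead of dedicating hardware to each pairing.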
Architecture – An In Depth Look
[Block diagram. Labels: Mux Control Register, Event MUX, ARR MUX, Event Signals, PC-ARR Comparison Signals, Selected Event, Selected ARR, 32-Bit Counter, CLK, Timer, The Internet, Configuration UDP Packets, Statistic Result Packets]
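One way to read the block diagram is as a counter that advances on a clock cycle only when its selected event is active and the program counter lies inside its selected address range register (ARR). The Python model below is a sketch under that assumption; the class name, the address ranges, and the trace are all hypothetical, and the real module is hardware rather than software.

```python
from dataclasses import dataclass


@dataclass
class StatCounter:
    """Assumed model of one 32-bit counter and its two mux selections."""
    event_sel: int  # index picked by the Event MUX
    arr_sel: int    # index picked by the ARR MUX
    value: int = 0

    def clock(self, event_signals, pc_in_arr):
        # Count this cycle only if the selected event fired while the
        # program counter was inside the selected address range.
        if event_signals[self.event_sel] and pc_in_arr[self.arr_sel]:
            self.value = (self.value + 1) & 0xFFFFFFFF  # wrap at 32 bits


def pc_arr_comparison(pc, address_ranges):
    """One signal per address range register: is pc inside [low, high)?"""
    return [low <= pc < high for (low, high) in address_ranges]


# Hypothetical address ranges for two methods.
ARRS = [(0x1000, 0x1100), (0x2000, 0x2080)]

# Hypothetical trace: (pc, event_signals) for each clock cycle.
# Event 0 is "a cycle elapsed" (always high); event 1 is some other event.
TRACE = [
    (0x1000, [True, False]),
    (0x1004, [True, True]),
    (0x2000, [True, True]),
    (0x1008, [True, False]),
]

cycles_in_method0 = StatCounter(event_sel=0, arr_sel=0)
event1_in_method1 = StatCounter(event_sel=1, arr_sel=1)

for pc, events in TRACE:
    in_arr = pc_arr_comparison(pc, ARRS)
    cycles_in_method0.clock(events, in_arr)
    event1_in_method1.clock(events, in_arr)

print(cycles_in_method0.value)  # 3
print(event1_in_method1.value)  # 1
```

Selecting an always-high event turns the counter into a per-method runtime counter, which matches the runtime graphs in the Results slides.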
Architecture – Scalability
[Chart: Four Input LUT Usage. X axis: N, 0 to 140. Y axis: 4-input LUTs, 0 to 20,000. Series: Address Range Registers, Counters, Events, Naïve Approach.]
Results – What do we get?
• The next few slides contain data from the Linpack benchmark running on the FPGA
  – Linpack is an FPU-intensive benchmark
• While the following slides focus on runtime, it is important to remember that the graphs could in principle be of *any* event
Results
[Chart: Runtime Per Method. X axis: Method Name. Y axis: Clock Cycles, 0 to 350,000,000. One data label reads 323,686,726 clock cycles.]
Results
[Chart: Runtime per Time Slice. X axis: Time (s), 2 to 78. Y axis: Clock Cycles, 0 to 12,000,000. Series: run_benchmark(), matgen(), dgefa(), dgesl(), daxpy(), ddot(), dscal(), idamax().]
Results
[Chart: dscal() Runtime. X axis: Time (s), 8 to 88. Y axis: Clock Cycles, 0 to 250,000.]
Results
[Chart: dscal() Runtime. X axis: Time (s), 2 to 78. Y axis: Clock Cycles, 0 to 120,000.]
Future Work – Where can we go?
• As of a week ago, the StatsMod was successfully integrated into a Linux 2.6.11 OS running on LEON
  – Changes have been made to allow a clear separation between process IDs
    • OS, background tasks, threads
  – A device driver allows any program, including the program being profiled, to gather the statistics
Future Work – Where can we go?
• Programs could now potentially collect statistics on themselves and perform runtime introspection
  – Adjust operation to conserve power, memory accesses, etc.
  – Deeper integration could occur at the kernel level to affect scheduler decisions
    • Adds a new dimension for slicing resources
  – Network activity, device activity, page faults, etc.
Related Work
• SnoopP
  – Developed by Lesley Shannon and Paul Chow at the University of Toronto
  – Collects timing characteristics of programs running on a MicroBlaze processor
    • Focuses on clock cycles only
  – Integrated into the EDK
Conclusion
In closing, I would like to thank:
– Phillip Jones for his hard work and support
– Ron Cytron for his mentoring and persistence
– Scott Friedman for his work on the web interface
– The rest of the Liquid Architecture team
– And WISA for the invitation to present
Usage
1. Connect to a secure web server controlling the FPGA hardware
2. Upload the desired binary executable, associated map file, and desired programming bitfile
3. A Perl script parses the map file and provides a graphical interface for selecting the desired address ranges and events
4. Statistic results are tabulated at the end of the program’s execution
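The map-file step can be illustrated with a short Python sketch. The slides do not show the map file format or the Perl script, so everything here is an assumption: the input is taken to be `nm`-style lines (`address type name`), each function is assumed to end where the next code symbol begins, and the addresses and the `end_marker` symbol are made up. Only the function names come from the Linpack results.

```python
def parse_map(text):
    """Return {function_name: (start, end)} from nm-style symbol lines."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        # Keep text (code) symbols only; skip data and malformed lines.
        if len(parts) == 3 and parts[1] in ("T", "t"):
            symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    ranges = {}
    # Assumption: a function ends where the next code symbol starts.
    # The last symbol has no successor, so it gets no range.
    for (start, name), (next_start, _) in zip(symbols, symbols[1:]):
        ranges[name] = (start, next_start)
    return ranges


# Hypothetical map file contents; addresses are invented for illustration.
SAMPLE_MAP = """\
40001000 T matgen
40001200 T dgefa
40001800 D some_data
40001600 T dgesl
40001700 T end_marker
"""

ranges = parse_map(SAMPLE_MAP)
for name, (start, end) in ranges.items():
    print(f"{name}: 0x{start:08x} to 0x{end:08x}")
```

Each resulting range is the kind of value that would be loaded into an address range register before the program runs.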