Modeling Ion Channel Kinetics with High-Performance Computation
Allison Gehrke, Dept. of Computer Science and Engineering, University of Colorado Denver
![Page 1: Modeling Ion Channel Kinetics with High-Performance Computation](https://reader036.vdocuments.mx/reader036/viewer/2022062815/5681692d550346895de07047/html5/thumbnails/1.jpg)
Modeling Ion Channel Kinetics with High-Performance Computation
Allison Gehrke
Dept. of Computer Science and Engineering
University of Colorado Denver
![Page 2: Modeling Ion Channel Kinetics with High-Performance Computation](https://reader036.vdocuments.mx/reader036/viewer/2022062815/5681692d550346895de07047/html5/thumbnails/2.jpg)
Agenda
• Introduction
• Application Characterization, Profile, and Optimization
• Computing Framework
• Experimental Results and Analysis
• Conclusions
• Future Research
![Page 3: Modeling Ion Channel Kinetics with High-Performance Computation](https://reader036.vdocuments.mx/reader036/viewer/2022062815/5681692d550346895de07047/html5/thumbnails/3.jpg)
Introduction
• Target application: Kingen
  • Simulates ion channel activity (kinetics)
  • Optimizes kinetic model rate constants to fit biological data
• Ion Channel Kinetics
  • Transition states
  • Reaction rates
![Page 4: Modeling Ion Channel Kinetics with High-Performance Computation](https://reader036.vdocuments.mx/reader036/viewer/2022062815/5681692d550346895de07047/html5/thumbnails/4.jpg)
Computational Complexity
[Figure: run time in seconds (0 to 2000) versus number of chromosomes (1, 10, 20, 40, 100, 400, 1500) for the 8-core Xeon 5355 and the quad-core Q6600.]
![Page 5: Modeling Ion Channel Kinetics with High-Performance Computation](https://reader036.vdocuments.mx/reader036/viewer/2022062815/5681692d550346895de07047/html5/thumbnails/5.jpg)
AMPA Receptors
![Page 6: Modeling Ion Channel Kinetics with High-Performance Computation](https://reader036.vdocuments.mx/reader036/viewer/2022062815/5681692d550346895de07047/html5/thumbnails/6.jpg)
Kinetic Scheme
![Page 7: Modeling Ion Channel Kinetics with High-Performance Computation](https://reader036.vdocuments.mx/reader036/viewer/2022062815/5681692d550346895de07047/html5/thumbnails/7.jpg)
Introduction: Why study ion channel kinetics?
• Protein function
• Implement accurate mathematical models
• Neurodevelopment
• Sensory processing
• Learning/memory
• Pathological states
![Page 8: Modeling Ion Channel Kinetics with High-Performance Computation](https://reader036.vdocuments.mx/reader036/viewer/2022062815/5681692d550346895de07047/html5/thumbnails/8.jpg)
Modeling Ion Channel Kinetics with High-Performance Computation
• Introduction
• Application Characterization, Profile, and Optimization
• Computing Framework
• Experimental Results and Analysis
• Conclusions
• Future Research
![Page 9: Modeling Ion Channel Kinetics with High-Performance Computation](https://reader036.vdocuments.mx/reader036/viewer/2022062815/5681692d550346895de07047/html5/thumbnails/9.jpg)
Adapting Scientific Applications to Parallel Architectures
[Diagram labels: Profiling (System-Level, Application-Level; Intel VTune, Intel Pin); Optimization (Intel Compiler & SSE2); Parallel Architectures (CPU: Multicore, Intel TBB; GPU: NVIDIA CUDA).]
![Page 10: Modeling Ion Channel Kinetics with High-Performance Computation](https://reader036.vdocuments.mx/reader036/viewer/2022062815/5681692d550346895de07047/html5/thumbnails/10.jpg)
System Level: Thread Profile
[Figure: time in seconds (0 to 250) for each core (1 to 8), broken down into active time, wait time, spin time, and under-utilized time.]
• Fully utilized: 93%
• Under utilized: 4.8%
• Serial: 1.65%
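As a rough reading of the thread profile, the serial share can be plugged into Amdahl's law to bound the achievable speedup. The sketch below assumes, which the slides do not state, that the 1.65% serial figure can be treated as the non-parallelizable fraction of the run. The helper name `amdahl_speedup` is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Amdahl's law: upper bound on speedup with `cores` workers when a
// fraction `serial_fraction` of the run cannot be parallelized.
// Assumption: the profile's "Serial: 1.65%" is used as that fraction.
double amdahl_speedup(double serial_fraction, int cores) {
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores);
}
```

Under that assumption, `amdahl_speedup(0.0165, 8)` evaluates to roughly 7.2, so the measured serial share alone would not prevent near-linear scaling on eight cores.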
![Page 11: Modeling Ion Channel Kinetics with High-Performance Computation](https://reader036.vdocuments.mx/reader036/viewer/2022062815/5681692d550346895de07047/html5/thumbnails/11.jpg)
Hardware Performance Monitors
• Processor utilization drops
• Constant available memory
• Context switches/sec increases
• Privileged time increases
![Page 12: Modeling Ion Channel Kinetics with High-Performance Computation](https://reader036.vdocuments.mx/reader036/viewer/2022062815/5681692d550346895de07047/html5/thumbnails/12.jpg)
Adapting Scientific Applications to Parallel Architectures
[Roadmap diagram repeated: Profiling, Optimization, Parallel Architectures.]
![Page 13: Modeling Ion Channel Kinetics with High-Performance Computation](https://reader036.vdocuments.mx/reader036/viewer/2022062815/5681692d550346895de07047/html5/thumbnails/13.jpg)
Application Level Analysis
• Hotspots
• CPI
• FP Operations
![Page 14: Modeling Ion Channel Kinetics with High-Performance Computation](https://reader036.vdocuments.mx/reader036/viewer/2022062815/5681692d550346895de07047/html5/thumbnails/14.jpg)
Hotspots
| Function | Compiler 10.1 | Compiler 11.1 |
|---|---|---|
| calc_funcs_ampa | 59.51% | 30.45% |
| runAmpaLoop | 40.04% | 40.99% |
| calc_glut_conc | 0.45% | 2.16% |
| operator[] | 0% | 25.92% |
| get_delta | 0% | 0.48% |
![Page 15: Modeling Ion Channel Kinetics with High-Performance Computation](https://reader036.vdocuments.mx/reader036/viewer/2022062815/5681692d550346895de07047/html5/thumbnails/15.jpg)
FP Impacting Metrics

| Compiler | CPI | FP Assist | FP Instructions Ratio |
|---|---|---|---|
| v 10.1 | 3.464 | 0.85 | 0.13 |
| v 11.1 | 0.536 | 0.0011 | 0.0028 |

• CPI: 0.75 is good, 4 is poor; a high CPI indicates instructions require more cycles to execute than they should
• FP assist: 0.2 is low, 1 is high
• Upgrade: ~9.4x speedup
![Page 16: Modeling Ion Channel Kinetics with High-Performance Computation](https://reader036.vdocuments.mx/reader036/viewer/2022062815/5681692d550346895de07047/html5/thumbnails/16.jpg)
Post Compiler Upgrade
• Improved CPI and FP operations
• Hotspot analysis
  • Same three functions still “hot”
  • FP operations in AMPA function optimized with SIMD
  • STL vector operator
  • get function from a class object
  • Redundant calculations in hotspot region
![Page 17: Modeling Ion Channel Kinetics with High-Performance Computation](https://reader036.vdocuments.mx/reader036/viewer/2022062815/5681692d550346895de07047/html5/thumbnails/17.jpg)
Manual Tuning
• Reduced function overhead
• Used arrays instead of STL vectors
• Reduced redundancies
• Eliminated get function
• Eliminated STL vector operator[ ]
• ~2x speedup
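The kind of change listed above can be sketched as follows. This is an illustration only, not Kingen code: the `Model` class, the function names, and the decay expression are all hypothetical. Whether such a rewrite helps depends on the compiler and its flags.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Model {                       // hypothetical class, not from Kingen
    double delta;
    double get_delta() const { return delta; }
};

// Before: every iteration calls a getter, indexes an STL vector, and
// recomputes exp(-k * delta) although the value never changes.
double decay_sum_before(const std::vector<double>& state, const Model& m, double k) {
    double sum = 0.0;
    for (std::size_t i = 0; i < state.size(); ++i)
        sum += state[i] * std::exp(-k * m.get_delta());
    return sum;
}

// After: plain array access, with the getter result and the
// loop-invariant factor hoisted out of the loop.
double decay_sum_after(const double* state, std::size_t n, double delta, double k) {
    const double factor = std::exp(-k * delta);  // computed once
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        sum += state[i] * factor;
    return sum;
}
```

Both functions return the same value; the second simply does less work per iteration inside the hot loop.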
![Page 18: Modeling Ion Channel Kinetics with High-Performance Computation](https://reader036.vdocuments.mx/reader036/viewer/2022062815/5681692d550346895de07047/html5/thumbnails/18.jpg)
Application Analysis Conclusions
[Figure: speedup (0 to 10) for the compiler upgrade and for manual tuning.]

| Function | Share of time |
|---|---|
| runAmpaLoop | 91.83% |
| calc_glut_conc | 4.4% |
| ge | 0.02% |
| libm_sse2_exp | 0.02% |
| All others | 3.73% |
![Page 19: Modeling Ion Channel Kinetics with High-Performance Computation](https://reader036.vdocuments.mx/reader036/viewer/2022062815/5681692d550346895de07047/html5/thumbnails/19.jpg)
Observations
[Roadmap diagram repeated: Profiling, Optimization, Parallel Architectures.]
![Page 20: Modeling Ion Channel Kinetics with High-Performance Computation](https://reader036.vdocuments.mx/reader036/viewer/2022062815/5681692d550346895de07047/html5/thumbnails/20.jpg)
Computer Architecture Analysis
• DTLB miss ratios
• L1 cache miss rate
• L1 data cache miss performance impact
• L2 cache miss rate
• L2 modified lines eviction rate
• Instruction mix
![Page 21: Modeling Ion Channel Kinetics with High-Performance Computation](https://reader036.vdocuments.mx/reader036/viewer/2022062815/5681692d550346895de07047/html5/thumbnails/21.jpg)
Instruction Mix
[Figure: percentage of retired instructions (0 to 100) by category: FP, Other, Branch.]
![Page 22: Modeling Ion Channel Kinetics with High-Performance Computation](https://reader036.vdocuments.mx/reader036/viewer/2022062815/5681692d550346895de07047/html5/thumbnails/22.jpg)
Computer Architecture Analysis Results
• FP instructions dominate
• Small instruction footprint fits in L1 cache
• L2 handling typical workloads
• Strong GPU potential
![Page 23: Modeling Ion Channel Kinetics with High-Performance Computation](https://reader036.vdocuments.mx/reader036/viewer/2022062815/5681692d550346895de07047/html5/thumbnails/23.jpg)
Modeling Ion Channel Kinetics with High-Performance Computation
• Introduction
• Application Characterization, Profile, and Optimization
• Computing Framework
• Experimental Results and Analysis
• Conclusions
• Future Research
![Page 24: Modeling Ion Channel Kinetics with High-Performance Computation](https://reader036.vdocuments.mx/reader036/viewer/2022062815/5681692d550346895de07047/html5/thumbnails/24.jpg)
Computing Framework
• Multicore coarse-grain TBB implementation
• GPU acceleration in progress
• Distributed multicore in progress (192-core cluster)
![Page 25: Modeling Ion Channel Kinetics with High-Performance Computation](https://reader036.vdocuments.mx/reader036/viewer/2022062815/5681692d550346895de07047/html5/thumbnails/25.jpg)
TBB Implementation
• Template library that extends C++
• Includes algorithms for common parallel patterns and parallel interfaces
• Abstracts CPU resources
![Page 26: Modeling Ion Channel Kinetics with High-Performance Computation](https://reader036.vdocuments.mx/reader036/viewer/2022062815/5681692d550346895de07047/html5/thumbnails/26.jpg)
tbb::parallel_for
• Template function
• Loop iterations must be independent
• Iteration space broken into chunks
• TBB schedules the chunks across worker threads
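The chunking idea can be sketched without TBB. The example below is a simplified stand-in, not TBB's implementation: it uses one `std::thread` per chunk, whereas TBB maps chunks onto a pool of worker threads through its task scheduler. The name `chunked_for` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <thread>
#include <vector>

// Apply body(i) for every i in [begin, end), splitting the range into
// at most `num_chunks` contiguous chunks, one std::thread per chunk.
// Iterations must be independent, as with tbb::parallel_for.
template <typename Body>
void chunked_for(int begin, int end, int num_chunks, Body body) {
    const int total = end - begin;
    if (total <= 0 || num_chunks <= 0) return;
    const int chunk = (total + num_chunks - 1) / num_chunks;  // ceiling division
    std::vector<std::thread> workers;
    for (int lo = begin; lo < end; lo += chunk) {
        const int hi = std::min(lo + chunk, end);
        workers.emplace_back([lo, hi, &body] {
            for (int i = lo; i < hi; ++i) body(i);
        });
    }
    for (auto& w : workers) w.join();  // wait for every chunk to finish
}
```

A call such as `chunked_for(0, 10, 4, [&out](int i) { out[i] = i * i; })` is safe because each iteration writes a different element of `out`.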
![Page 27: Modeling Ion Channel Kinetics with High-Performance Computation](https://reader036.vdocuments.mx/reader036/viewer/2022062815/5681692d550346895de07047/html5/thumbnails/27.jpg)
tbb::parallel_for

Parallel call:
parallel_for(blocked_range<int>(0, GeneticAlgo::NUM_CHROMOS),
             ParallelChromosomeLoop(tauError, ec50PeakError, ec50SteadyError,
                                    desensError, DRecoverError, ar, thetaArray),
             auto_partitioner());

Serial loop being parallelized:
for (int i = 0; i < GeneticAlgo::NUM_CHROMOS; i++) {
    // call ampa macro 11 times
    // calculate error on the chromosome (rate constant set)
}
![Page 28: Modeling Ion Channel Kinetics with High-Performance Computation](https://reader036.vdocuments.mx/reader036/viewer/2022062815/5681692d550346895de07047/html5/thumbnails/28.jpg)
tbb::parallel_for: The Body Object
Need member fields for all local variables defined outside the original loop but used inside it
Usually constructor for the body object initializes member fields
Copy constructor invoked to create a separate copy for each worker thread
Body operator() should not modify the body so it must be declared as const
Recommend local copies in operator()
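These rules can be sketched with a small body class. Everything here is illustrative: `IndexRange` is a hypothetical stand-in for `tbb::blocked_range<int>` so the sketch compiles without TBB, and `SquareErrorBody` is not the `ParallelChromosomeLoop` class from the slides.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for tbb::blocked_range<int>: half-open [begin, end).
struct IndexRange {
    int lo, hi;
    int begin() const { return lo; }
    int end() const { return hi; }
};

// Body object in the style tbb::parallel_for expects.
class SquareErrorBody {
    const double* input_;   // data defined outside the original loop
    double* error_;         // output array; each iteration writes one slot
public:
    // The constructor initializes the member fields.
    SquareErrorBody(const double* input, double* error)
        : input_(input), error_(error) {}
    // The implicit copy constructor lets a scheduler copy the body.
    // operator() is const: it writes through error_, not to the body itself.
    void operator()(const IndexRange& r) const {
        const double* in = input_;   // local copies of the member fields
        double* err = error_;
        for (int i = r.begin(); i != r.end(); ++i)
            err[i] = in[i] * in[i];
    }
};
```

With TBB available, the same class shape would be passed as `tbb::parallel_for(tbb::blocked_range<int>(0, n), body)`, with `operator()` taking a `const tbb::blocked_range<int>&` instead of the stand-in type.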
![Page 29: Modeling Ion Channel Kinetics with High-Performance Computation](https://reader036.vdocuments.mx/reader036/viewer/2022062815/5681692d550346895de07047/html5/thumbnails/29.jpg)
Ampa Macro
• calc_bg_ampa: defines differential equations that describe AMPA kinetics based on rate constant set
• GA to solve the system of equations
• runAmpaLoop: Runge-Kutta method
![Page 31: Modeling Ion Channel Kinetics with High-Performance Computation](https://reader036.vdocuments.mx/reader036/viewer/2022062815/5681692d550346895de07047/html5/thumbnails/31.jpg)
Coarse-grained parallelism
[Diagram labels: Initialize Chromosomes; Serial Execution; Gen 0, Gen 1, ..., Gen N; within each generation, Ampa Macro followed by Calc Error for each of Chromo 0, Chromo 1 + r, ..., Chromo N; "Genetic Algo population has better fit on average"; Convergence.]
![Page 32: Modeling Ion Channel Kinetics with High-Performance Computation](https://reader036.vdocuments.mx/reader036/viewer/2022062815/5681692d550346895de07047/html5/thumbnails/32.jpg)
Genetic Algorithm Convergence
![Page 33: Modeling Ion Channel Kinetics with High-Performance Computation](https://reader036.vdocuments.mx/reader036/viewer/2022062815/5681692d550346895de07047/html5/thumbnails/33.jpg)
Runge-Kutta 4th Order Method (RK4)
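runAmpaLoop: numerical integration of the differential equations describing our kinetic scheme
RK4 formulas:
$$x(t + h) = x(t) + \frac{1}{6}\left(F_1 + 2F_2 + 2F_3 + F_4\right)$$
where
$$\begin{aligned}
F_1 &= h\,f(t,\ x) \\
F_2 &= h\,f\left(t + \tfrac{1}{2}h,\ x + \tfrac{1}{2}F_1\right) \\
F_3 &= h\,f\left(t + \tfrac{1}{2}h,\ x + \tfrac{1}{2}F_2\right) \\
F_4 &= h\,f(t + h,\ x + F_3)
\end{aligned}$$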
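The formulas translate directly into code. The sketch below handles a single scalar equation for brevity; the kinetic scheme itself is a system of equations, and this is not the `runAmpaLoop` implementation. The names `rk4_step` and `rk4_integrate` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// One classical RK4 step for dx/dt = f(t, x), following the formulas above.
double rk4_step(const std::function<double(double, double)>& f,
                double t, double x, double h) {
    const double F1 = h * f(t, x);
    const double F2 = h * f(t + 0.5 * h, x + 0.5 * F1);
    const double F3 = h * f(t + 0.5 * h, x + 0.5 * F2);
    const double F4 = h * f(t + h, x + F3);
    return x + (F1 + 2.0 * F2 + 2.0 * F3 + F4) / 6.0;
}

// Advance from (t0, x0) by `steps` steps of size h and return x.
double rk4_integrate(const std::function<double(double, double)>& f,
                     double t0, double x0, double h, int steps) {
    double t = t0, x = x0;
    for (int i = 0; i < steps; ++i) {
        x = rk4_step(f, t, x, h);
        t += h;
    }
    return x;
}
```

As a check, integrating dx/dt = -x from x(0) = 1 to t = 1 with h = 0.01 should land very close to exp(-1). Each step needs the result of the previous one, so the steps of a single integration run in sequence.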
![Page 34: Modeling Ion Channel Kinetics with High-Performance Computation](https://reader036.vdocuments.mx/reader036/viewer/2022062815/5681692d550346895de07047/html5/thumbnails/34.jpg)
RK4
• Hotspot is the function that computes RK4
• Need finer-grained parallelism to alleviate hotspot bottleneck
• How to parallelize RK4?
![Page 35: Modeling Ion Channel Kinetics with High-Performance Computation](https://reader036.vdocuments.mx/reader036/viewer/2022062815/5681692d550346895de07047/html5/thumbnails/35.jpg)
Modeling Ion Channel Kinetics with High-Performance Computation
• Introduction
• Application Characterization, Profile, and Optimization
• Computing Framework
• Experimental Results and Analysis
• Conclusions
• Future Research
![Page 36: Modeling Ion Channel Kinetics with High-Performance Computation](https://reader036.vdocuments.mx/reader036/viewer/2022062815/5681692d550346895de07047/html5/thumbnails/36.jpg)
Experimental Results and Analysis
• Hardware and software set-up
• Domain specific metrics?
• Parallel speed-up
• Verification
![Page 37: Modeling Ion Channel Kinetics with High-Performance Computation](https://reader036.vdocuments.mx/reader036/viewer/2022062815/5681692d550346895de07047/html5/thumbnails/37.jpg)
| Configuration | System 1 | System 2 | System 3 |
|---|---|---|---|
| CPU | Intel® Xeon™ CPU X5355 @ 2.66 GHz | Intel® Core™ 2 Quad CPU Q6600 @ 2.40 GHz | Intel® Core™ 2 Quad CPU Q6600 @ 2.40 GHz |
| Cores | 8 | 4 | 4 |
| Memory | 3 GB | 3 GB | 8 GB |
| OS | Windows XP Pro | Windows XP Pro | Fedora |
| Compiler | Intel C++ Compiler (11.1, 10.1) | Intel C++ Compiler (11.1, 10.1) | Intel C++ Compiler (11.1) |
| Intel TBB | Version 2.1 | Version 2.1 | Version 2.1 |
![Page 38: Modeling Ion Channel Kinetics with High-Performance Computation](https://reader036.vdocuments.mx/reader036/viewer/2022062815/5681692d550346895de07047/html5/thumbnails/38.jpg)
Computational Complexity
[Figure: run time in seconds (0 to 2000) versus number of chromosomes (1, 10, 20, 40, 100, 400, 1500) for the 8-core Xeon 5355 and the quad-core Q6600.]
![Page 39: Modeling Ion Channel Kinetics with High-Performance Computation](https://reader036.vdocuments.mx/reader036/viewer/2022062815/5681692d550346895de07047/html5/thumbnails/39.jpg)
Parallel Speedup
[Figure: speedup (0 to 14) versus number of cores (1, 2, 4, 8) for the quad-core Q6600 (64-bit Linux), the 8-core Xeon 5355 (Windows XP), and the quad-core Q6600 (32-bit Windows).]
• Baseline: 2 generations, after compiler upgrade, prior to manual tuning
• Generation number magnifies any performance improvement
![Page 40: Modeling Ion Channel Kinetics with High-Performance Computation](https://reader036.vdocuments.mx/reader036/viewer/2022062815/5681692d550346895de07047/html5/thumbnails/40.jpg)
Verification
MKL and custom Gaussian elimination routine get different results (sometimes)
Small variation in a given parameter changed error significantly
Non-deterministic
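One common source of such differences is floating-point rounding: the same sum evaluated in a different order can give a different result. The slides do not name the cause, so this is only an assumed illustration, with hypothetical function names. It also shows a tolerance-based comparison that verification code might use instead of exact equality.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Sum three values in two different orders. In exact arithmetic the
// results match; in IEEE double precision they need not.
double sum_left(double a, double b, double c)  { return (a + b) + c; }
double sum_right(double a, double b, double c) { return a + (b + c); }

// Compare with a relative tolerance instead of exact equality.
bool nearly_equal(double x, double y, double rel_tol) {
    const double scale = std::max(std::fabs(x), std::fabs(y));
    return std::fabs(x - y) <= rel_tol * scale;
}
```

For a = 1e17, b = -1e17, c = 1, the left-to-right sum keeps the 1 while the other order loses it, because 1 is below the spacing between doubles near 1e17. Two solvers that order their operations differently can diverge in the same way, and a sensitive error function can magnify the gap.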
![Page 41: Modeling Ion Channel Kinetics with High-Performance Computation](https://reader036.vdocuments.mx/reader036/viewer/2022062815/5681692d550346895de07047/html5/thumbnails/41.jpg)
Conclusions
• A process that uncovers key characteristics is important
• Kingen needs cores/threads, and lots of them
• Need the ability to automatically (or semi-automatically) identify opportunities for parallelism in code
• Better validation methods
![Page 42: Modeling Ion Channel Kinetics with High-Performance Computation](https://reader036.vdocuments.mx/reader036/viewer/2022062815/5681692d550346895de07047/html5/thumbnails/42.jpg)
Future Research
• 192-core cluster
• GPU acceleration
• Programmer-led optimization
• Verification
• Model validation
• Techniques to simplify porting to massively parallel architectures