MPI+PGAS Hybrid Programming (pire.cct.lsu.edu/documents/Tomko.pdf)
TRANSCRIPT
Slide 1
MPI+PGAS Hybrid Programming Karen Tomko [email protected] In collaboration with DK Panda and the Network-Based Computing Research Group at Ohio State University http://nowlab.cse.ohio-state.edu
Slide 2
Background: Systems and Programming Models
Slide 3
Drivers of Modern HPC Cluster Architectures
• Multi-core processors are ubiquitous
• InfiniBand is very popular in HPC clusters
• Accelerators/coprocessors are becoming common in high-end systems
• Pushing the envelope for Exascale computing

[Figure: building blocks and example systems]
• Multi-core processors
• High-performance interconnects - InfiniBand: <1 usec latency, >100 Gbps bandwidth
• Accelerators / coprocessors: high compute density, high performance/watt, >1 TFlop DP on a chip
• Example systems: Tianhe-2 (1), Titan (2), Stampede (7), Piz Daint (CSCS) (6)
Slide 4
Parallel Programming Models Overview
[Figure: three abstract machine models - a Shared Memory Model in which processes P1-P3 share one memory; a Distributed Memory Model, i.e. MPI (Message Passing Interface), in which each process has its own memory; and the Partitioned Global Address Space (PGAS) model, which overlays a logical shared memory on the distributed memories - Global Arrays, UPC, OpenSHMEM, CAF, ...]
• Programming models provide abstract machine models
• Models can be mapped onto different types of systems
– e.g. Distributed Shared Memory (DSM), MPI within a node, etc.
• Additionally, OpenMP can be used to parallelize computation within the node
• Each model has strengths and drawbacks - they suit different problems or applications
Slide 5
Partitioned Global Address Space (PGAS) Models
• Key features
- Simple shared memory abstractions
- Lightweight one-sided communication
- Easier to express irregular communication
• Different approaches to PGAS
- Languages
  • Unified Parallel C (UPC)
  • Co-Array Fortran (CAF)
  • others
- Libraries
  • OpenSHMEM
  • Global Arrays
Slide 6
MPI+PGAS for Exascale Architectures and Applications
• Hierarchical architectures with multiple address spaces
• (MPI + PGAS) Model
– MPI across address spaces
– PGAS within an address space
• MPI is good at moving data between address spaces
• Within an address space, MPI can interoperate with shared memory programming models
• Applications can have kernels with different communication patterns
• Can benefit from different models
• Re-writing complete applications can be a huge effort
• Port critical kernels to the desired model instead
Slide 7
Supporting Programming Models for Multi-Petaflop and Exaflop Systems: Challenges
[Figure: software stack, co-designed across layers]
• Application Kernels/Applications
• Programming Models: MPI, PGAS (UPC, Global Arrays, OpenSHMEM), CUDA, OpenACC, Cilk, Hadoop, MapReduce, etc.
• Middleware - Communication Library or Runtime for Programming Models: point-to-point communication (two-sided & one-sided), collective communication, synchronization & locks, I/O & file systems, fault tolerance
• Hardware: Networking Technologies (InfiniBand, 40/100GigE, Aries, BlueGene), Multi/Many-core Architectures, Accelerators (NVIDIA and MIC)
Slide 8
Can High-Performance Interconnects, Protocols and Accelerators Benefit from PGAS and Hybrid MPI+PGAS Models?
• MPI designs have been able to take advantage of high-performance interconnects, protocols and accelerators
• Can PGAS and hybrid MPI+PGAS models take advantage of these technologies?
• What are the challenges?
• Where do the bottlenecks lie?
• Can these bottlenecks be alleviated with new designs (similar to the designs adopted for MPI)?
Slide 9
PGAS Programming Models – OpenSHMEM Library
Slide 10
SHMEM
• SHMEM: Symmetric Hierarchical MEMory library
• One-sided communications library - has been around for a while
• Similar to MPI: processes are called PEs, data movement is explicit through library calls
• Provides globally addressable memory using symmetric memory objects (more in later slides)
• Library routines for
– Symmetric object creation and management
– One-sided data movement
– Atomics
– Collectives
– Synchronization
Slide 11
OpenSHMEM
• SHMEM implementations - Cray SHMEM, SGI SHMEM, Quadrics SHMEM, HP SHMEM, GSHMEM
• Subtle differences in API across versions - example:

                   SGI SHMEM        Quadrics SHMEM    Cray SHMEM
  Initialization   start_pes(0)     shmem_init        start_pes
  Process ID       _my_pe           my_pe             shmem_my_pe

• These differences made application codes non-portable
• OpenSHMEM is an effort to address this:
"A new, open specification to consolidate the various extant SHMEM versions into a widely accepted standard." - OpenSHMEM Specification v1.0
• Led by the University of Houston and Oak Ridge National Lab
• SGI SHMEM is the baseline
Slide 12
The OpenSHMEM Memory Model
• Symmetric data objects
– Global variables
– Objects allocated using the collective shmalloc, shmemalign, shrealloc routines
• Globally addressable - objects have the same
– Type
– Size
– Virtual address (or offset) at all PEs
– The address of a remote object can be calculated from the info of the local object

[Figure: the virtual address spaces of PE 0 and PE 1, each holding the symmetric objects a (global) and b (shmalloc'ed) at matching locations]
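Not from the slides: a minimal C sketch, assuming a 1.0-era OpenSHMEM library and at least two PEs, showing the two kinds of symmetric objects described above (a global variable and a shmalloc'ed buffer); all names are illustrative.

```c
#include <stdio.h>
#include <shmem.h>

int a;                                   /* global variable: symmetric on every PE */

int main(void)
{
    start_pes(0);                        /* initialize OpenSHMEM */

    /* collective allocation: every PE gets a 'b' at the same offset */
    int *b = (int *) shmalloc(4 * sizeof(int));

    a    = _my_pe();                     /* local stores to the symmetric objects */
    b[0] = 10 * _my_pe();
    shmem_barrier_all();                 /* everyone has written before anyone reads */

    /* since 'a' and 'b' are symmetric, their local addresses can name
       the corresponding remote objects in one-sided calls */
    if (_my_pe() == 0 && _num_pes() > 1)
        printf("PE 1: a=%d b[0]=%d\n", shmem_int_g(&a, 1), shmem_int_g(&b[0], 1));

    shfree(b);
    return 0;
}
```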
Slide 13
Data Movement: Basic
• Put and Get - single element
- void shmem_TYPE_p (TYPE *ptr, TYPE value, int PE)
- TYPE shmem_TYPE_g (TYPE *ptr, int PE)
- TYPE can be short, int, long, float, double, longlong, longdouble

int *b; int c;
b = (int *) shmalloc (sizeof(int));
if (_my_pe() == 0) {
    c = shmem_int_g (b, 1);
}

[Figure: PE 0 reads the symmetric variable b from PE 1]
Slide 14
Data Movement: Contiguous
• Block Put and Get - contiguous
- void shmem_TYPE_put (TYPE *target, const TYPE *source, size_t nelems, int pe)
- TYPE can be char, short, int, long, float, double, longlong, longdouble
- shmem_putSIZE - elements of SIZE: 32/64/128
- shmem_putmem - bytes
- Similar get operations

int *b;
b = (int *) shmalloc (10 * sizeof(int));
if (_my_pe() == 0) {
    shmem_int_put (b, b, 5, 1);
}

[Figure: PE 0 writes the first 5 elements of its b into b on PE 1]
Slide 15
Data Movement: Non-contiguous
• Strided Put and Get
- void shmem_TYPE_iput (TYPE *target, const TYPE *source, ptrdiff_t tst, ptrdiff_t sst, size_t nelems, int pe)
- sst is the stride at the source, tst is the stride at the target
- TYPE can be char, short, int, long, float, double, longlong, longdouble
- Similar get operations

Example: target stride 1, source stride 6, number of elements 6:
shmem_int_iput(t, t, 1, 6, 6, 1)
[Figure: PE 0 puts every 6th element of its array t into consecutive elements of t on PE 1]
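A small sketch (illustrative only, assuming at least two PEs) matching the strided example above: target stride 1, source stride 6, six elements, from PE 0 into PE 1.

```c
#include <shmem.h>

int main(void)
{
    start_pes(0);

    int *t = (int *) shmalloc(36 * sizeof(int));   /* symmetric array on every PE */
    for (int i = 0; i < 36; i++)
        t[i] = (_my_pe() == 0) ? i : -1;
    shmem_barrier_all();

    if (_my_pe() == 0 && _num_pes() > 1)
        /* copy t[0], t[6], ..., t[30] from PE 0 into t[0..5] on PE 1:
           target stride 1, source stride 6, 6 elements, destination PE 1 */
        shmem_int_iput(t, t, 1, 6, 6, 1);

    shmem_barrier_all();                           /* also completes the put remotely */
    shfree(t);
    return 0;
}
```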
Slide 16
Data Movement - Completion
• When Put operations return
- Data has been copied out of the source buffer object
- Not necessarily written to the target buffer object
- Additional synchronization is needed to ensure remote completion
• When Get operations return
- Data has been copied into the local target buffer
- Ready to be used
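A sketch of the completion rules above (assumptions: two or more PEs, illustrative names): a put may return before the data reaches the target, so the source calls shmem_quiet, or relies on a barrier, before anyone depends on the remote value.

```c
#include <shmem.h>

int dest;                                   /* symmetric target variable */

int main(void)
{
    start_pes(0);
    dest = 0;
    shmem_barrier_all();

    if (_my_pe() == 0 && _num_pes() > 1) {
        int src = 42;
        shmem_int_put(&dest, &src, 1, 1);   /* returns once 'src' may be reused ...  */
        src = 0;                            /* ... but 'dest' on PE 1 may not be set */
        shmem_quiet();                      /* now the put is complete at the target */
    }
    shmem_barrier_all();                    /* the barrier also implies completion   */
    /* every PE may now read its 'dest' (42 on PE 1, 0 elsewhere) */
    return 0;
}
```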
Slide 17
Collective Synchronization
• Barrier ensures completion of all previous operations
• Global Barrier
- void shmem_barrier_all()
- Does not return until called by all PEs
• Group Barrier
- Involves only an "ACTIVE SET" of PEs
- Does not return until called by all PEs in the "ACTIVE SET"
- void shmem_barrier (int PE_start,     /* first PE in the set */
                      int logPE_stride, /* log2 of the stride between consecutive PEs */
                      int PE_size,      /* size of the set */
                      long *pSync       /* symmetric work array */);
- pSync allows for overlapping collective communication
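A sketch of the group barrier above, run over the active set of even-numbered PEs. It assumes the OpenSHMEM 1.0 constants _SHMEM_BARRIER_SYNC_SIZE and _SHMEM_SYNC_VALUE; newer specs spell them without the leading underscore.

```c
#include <shmem.h>

long pSync[_SHMEM_BARRIER_SYNC_SIZE];          /* symmetric work array */

int main(void)
{
    start_pes(0);

    for (int i = 0; i < _SHMEM_BARRIER_SYNC_SIZE; i++)
        pSync[i] = _SHMEM_SYNC_VALUE;          /* required initial value */
    shmem_barrier_all();                       /* all PEs have initialized pSync */

    /* active set = {0, 2, 4, ...}: PE_start 0, logPE_stride 1 (stride 2) */
    if (_my_pe() % 2 == 0)
        shmem_barrier(0, 1, (_num_pes() + 1) / 2, pSync);

    return 0;
}
```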
Slide 18
One-sided Synchronization
• Fence
- void shmem_fence (void)
- Enforces ordering on Put operations issued by a PE to each destination PE
- Does not ensure ordering between Put operations to multiple PEs
• Quiet
- void shmem_quiet (void)
- Ensures remote completion of Put operations to all PEs
• Other point-to-point synchronization
- shmem_wait and shmem_wait_until - poll on a local variable
Slide 19
Collective Operations and Atomics
• Broadcast - one-to-all
• Collect - allgather
• Reduction - allreduce (and, or, xor; max, min; sum, product)
• Work on an active set - start, stride, count

• Unconditional - Swap operation
- long shmem_swap (long *target, long value, int pe)
- TYPE shmem_TYPE_swap (TYPE *target, TYPE value, int pe)
- TYPE can be int, long, longlong, float, double
• Conditional - Compare-and-Swap operation
• Arithmetic - Fetch & Add, Fetch & Increment, Add, Increment
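A sketch of the atomics listed above, using SGI-style names as in these slides: every PE takes a ticket from PE 0's counter with fetch-and-add, and compare-and-swap elects a single winner; names are illustrative.

```c
#include <stdio.h>
#include <shmem.h>

int counter = 0;                                    /* symmetric; only PE 0's copy is used */
int winner  = -1;

int main(void)
{
    start_pes(0);
    shmem_barrier_all();

    /* atomically fetch PE 0's counter and add 1: a unique ticket per PE */
    int ticket = shmem_int_fadd(&counter, 1, 0);

    /* exactly one PE swaps -1 -> its id on PE 0; the others see the old value */
    int prev = shmem_int_cswap(&winner, -1, _my_pe(), 0);

    shmem_barrier_all();
    printf("PE %d: ticket %d, cswap returned %d\n", _my_pe(), ticket, prev);
    return 0;
}
```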
Slide 20
Remote Pointer Operations
• void *shmem_ptr (void *target, int pe)
- Allows direct loads/stores on remote memory
- Useful when PEs are running on the same node
- Not supported in all implementations
- Returns NULL if the memory is not accessible for loads/stores
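A sketch of the usual shmem_ptr idiom: attempt a direct load on a neighboring PE and fall back to a one-sided get when the pointer is NULL (e.g. the peer is on another node or the feature is unsupported); names are illustrative.

```c
#include <shmem.h>

int value;                                         /* symmetric */

int main(void)
{
    start_pes(0);
    value = 10 * _my_pe();
    shmem_barrier_all();

    int peer = (_my_pe() + 1) % _num_pes();
    int remote;

    int *p = (int *) shmem_ptr(&value, peer);
    if (p != NULL)
        remote = *p;                               /* direct load on intra-node memory */
    else
        remote = shmem_int_g(&value, peer);        /* fall back to a regular get       */

    (void) remote;
    return 0;
}
```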
Slide 21
A Sample Code: Circular Shift

#include <shmem.h>

int aaa, bbb;

int main (int argc, char *argv[]) {
    int target_pe;
    start_pes(0);
    target_pe = (_my_pe() + 1) % _num_pes();
    bbb = _my_pe() + 1;
    shmem_barrier_all();
    shmem_int_get (&aaa, &bbb, 1, target_pe);
    shmem_barrier_all();
    return 0;
}

[Figure: with 5 PEs, bbb holds 1..5 on PE0..PE4; after the get, aaa on each PE holds the bbb value of its right neighbor, i.e. 2, 3, 4, 5, 1]
Slide 22
The MVAPICH2-X Hybrid MPI-PGAS Runtime
Slide 23
Maturity of Runtimes and Application Requirements
• MPI has been the most popular model for a long time
- Available on every major machine
- Portability, performance and scaling
- Most parallel HPC code is designed using MPI
- Simplicity - structured and iterative communication patterns
• PGAS models
- Increasing interest in the community
- Simple shared memory abstractions and one-sided communication
- Easier to express irregular communication
• Need for hybrid MPI + PGAS
- Applications can have kernels with different communication characteristics
- Porting only part of the application reduces programming effort
Slide 24
Hybrid (MPI+PGAS) Programming
• Application sub-kernels can be re-written in MPI/PGAS based on communication characteristics
• Benefits:
– Best of the Distributed Computing Model
– Best of the Shared Memory Computing Model
• Exascale Roadmap*:
– "Hybrid Programming is a practical way to program exascale systems"

[Figure: an HPC application with kernels 1..N, all originally MPI; kernels 2 and N are re-written in PGAS]

* The International Exascale Software Roadmap, Dongarra, J., Beckman, P. et al., Volume 25, Number 1, 2011, International Journal of High Performance Computing Applications, ISSN 1094-3420
Slide 25
Simple MPI + OpenSHMEM Hybrid Example

int sum = 0;   /* symmetric global */

int main(int c, char *argv[]) {
    int rank, size;
    /* SHMEM init */
    start_pes(0);
    rank = _my_pe();
    size = _num_pes();
    /* fetch-and-add at root */
    shmem_int_fadd(&sum, rank, 0);
    /* MPI barrier */
    MPI_Barrier(MPI_COMM_WORLD);
    /* root broadcasts sum */
    MPI_Bcast(&sum, 1, MPI_INT, 0, MPI_COMM_WORLD);
    fprintf(stderr, "(%d): Sum: %d\n", rank, sum);
    shmem_barrier_all();
    return 0;
}

• OpenSHMEM atomic fetch-add
• MPI_Bcast for broadcasting the result
Slide 26
Current Approaches for Hybrid Programming
• Layering one programming model over another
– Poor performance due to semantics mismatch
– MPI-3 RMA tries to address this
• Separate runtime for each programming model
– Needs more network and memory resources
– Might lead to deadlock!

[Figure: Hybrid (OpenSHMEM + MPI) applications issue OpenSHMEM calls to an OpenSHMEM runtime and MPI calls to an MPI runtime, each sitting separately on top of InfiniBand, RoCE, iWARP]
Slide 27
The Need for a Unified Runtime
• Deadlock when a message is sitting in one runtime, but the application calls the other runtime
• The prescription to avoid this is to barrier in one model (either OpenSHMEM or MPI) before entering the other
• Or the runtimes require dedicated progress threads
• Bad performance!!
• Similar issues for MPI + UPC applications over individual runtimes

P0:  shmem_int_fadd (data at P1);      P1:  /* local computation */
     /* operate on data */                  MPI_Barrier(comm);
     MPI_Barrier(comm);

[Figure: each process runs separate OpenSHMEM and MPI runtimes; the fetch-add's active message is stuck in P1's OpenSHMEM runtime while P1 waits in the MPI barrier]
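A hedged fragment (not the slide's code) showing the workaround described above for separate runtimes: drain all OpenSHMEM traffic with shmem_barrier_all before any process enters the MPI runtime. With the unified MVAPICH2-X runtime this extra barrier is unnecessary.

```c
#include <mpi.h>
#include <shmem.h>

int data;                               /* symmetric counter, as in the slide's scenario */

/* hypothetical phase boundary in a hybrid code */
void phase_boundary(int rank)
{
    if (rank == 0)
        shmem_int_fadd(&data, 1, 1);    /* one-sided update targeting PE 1 */

    /* With two independent runtimes, finish OpenSHMEM communication first ... */
    shmem_barrier_all();

    /* ... so no active message is left stranded when everyone blocks in MPI.  */
    MPI_Barrier(MPI_COMM_WORLD);
}
```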
Slide 28
Unified Runtime for Hybrid MPI + OpenSHMEM Applications
• Goal: provide high performance and scalability for
– MPI applications
– PGAS applications
– Hybrid MPI+PGAS applications
• Resulting runtime
– Optimal network resource usage
– No deadlock because of a single runtime
– Better performance

[Figure: left - hybrid (OpenSHMEM + MPI) applications over separate OpenSHMEM and MPI runtimes; right - MPI, OpenSHMEM, and hybrid (MPI + OpenSHMEM) applications issue OpenSHMEM calls and MPI calls to the single MVAPICH2-X runtime over InfiniBand, RoCE, iWARP]

J. Jose, K. Kandalla, M. Luo and D. K. Panda, Supporting Hybrid MPI and OpenSHMEM over InfiniBand: Design and Performance Evaluation, Int'l Conference on Parallel Processing (ICPP '12), September 2012.
Slide 29
OpenSHMEM Reference Implementation Framework
[Figure: layered framework - OpenSHMEM applications; the OpenSHMEM API (data movement, collectives, atomics, memory management, ...); a minimal set of internal APIs (communication API, symmetric memory management API, ...); the GASNet runtime / ARMCI / MVAPICH2-X runtime / ...; network layer: IB, RoCE, iWARP, ...]

Reference: OpenSHMEM: An Effort to Unify SHMEM API Library Development, Supercomputing 2010
Slide 30
OpenSHMEM Design in MVAPICH2-X
• OpenSHMEM stack based on the OpenSHMEM reference implementation
• OpenSHMEM communication over the MVAPICH2-X runtime
– Uses active messages, atomic and one-sided operations, and a remote registration cache

[Figure: the OpenSHMEM API (data movement, collectives, atomics, memory management) and the minimal internal API (communication API, symmetric memory management API) sit on the MVAPICH2-X runtime (active messages, one-sided operations, remote atomic ops, enhanced registration cache), which runs over InfiniBand, RoCE, iWARP]

J. Jose, K. Kandalla, M. Luo and D. K. Panda, Supporting Hybrid MPI and OpenSHMEM over InfiniBand: Design and Performance Evaluation, Int'l Conference on Parallel Processing (ICPP '12), September 2012.
Slide 31
Implementations for InfiniBand Clusters
• Reference Implementation
– University of Houston
– Based on the GASNet runtime
• MVAPICH2-X - The Ohio State University
– Uses the upper layer of the reference implementation
– Derives the runtime from the widely used MVAPICH2 MPI library
– Available for download: http://mvapich.cse.ohio-state.edu/download/mvapich2x
• OMPI-SHMEM
– Based on the Open MPI runtime
– Available in Open MPI 1.7.5
• ScalableSHMEM
– Mellanox Technologies
Slide 32
Support for OpenSHMEM Operations in OSU Micro-Benchmarks (OMB)
• Point-to-point operations
– osu_oshm_put – Put latency
– osu_oshm_get – Get latency
– osu_oshm_put_mr – Put message rate
– osu_oshm_atomics – Atomics latency
• Collective operations
– osu_oshm_collect – Collect latency
– osu_oshm_broadcast – Broadcast latency
– osu_oshm_reduce – Reduce latency
– osu_oshm_barrier – Barrier latency
• OMB is publicly available from:
– http://mvapich.cse.ohio-state.edu/benchmarks/
Slide 33
OpenSHMEM Data Movement in MVAPICH2-X
• Data transfer routines (put/get)
– Implemented using RDMA transfers
– Strided operations require multiple RDMA transfers
– IB requires remote registration information for RDMA - expensive
• Remote registration cache
– The registration request is sent over an "Active Message"
– The remote process registers and responds with the key
– The key is cached on both the local and remote sides
– Hides registration costs

[Figure: Px sends a registration request to Py; Py registers the buffer and caches the key; Px caches the returned key]
Slide 34
OpenSHMEM Data Movement: Performance
[Figure: shmem_putmem and shmem_getmem latency (us) versus message size (1 byte to 2 KB) for UH-SHMEM, MV2X-SHMEM, Scalable-SHMEM and OMPI-SHMEM]
• OSU OpenSHMEM micro-benchmarks - http://mvapich.cse.ohio-state.edu/benchmarks/
• Slightly better performance for putmem and getmem with MVAPICH2-X
Slide 35
Atomic Operations in MVAPICH2-X
• Atomic operations
– Take advantage of IB network atomics
– IB offers atomics for
• compare-and-swap
• fetch-and-add
• limited to 64-bit types
– Other operations and types are implemented using "Active Messages"
– Better performance for 64-bit types, e.g. long
Slide 36
OpenSHMEM Atomic Operations: Performance
[Figure: latency (us) of fadd, finc, add, inc, cswap and swap for UH-SHMEM, MV2X-SHMEM, Scalable-SHMEM and OMPI-SHMEM]
• OSU OpenSHMEM micro-benchmarks (OMB v4.1)
• MV2-X SHMEM performs up to 40% better compared to UH-SHMEM
Slide 37
Collective Communication in MVAPICH2-X
• Significant effort has gone into optimizing MPI collectives to the hilt
• MVAPICH2-X derives from the MVAPICH2 MPI runtime
• Implements OpenSHMEM collectives using the infrastructure for MPI collectives
– MPI collectives operate on "communicators" - rigid compared to an active set
– Communicator creation is collective and involves overheads - MPI-3 introduces group-based communicator creation
– A lightweight, low-overhead translation layer is built on this
– A communicator cache hides the overheads of creation
• Collect over MPI_Gather, Broadcast over MPI_Bcast, reduction operations over MPI_Reduce

J. Jose, K. Kandalla, S. Potluri, J. Zhang and D. K. Panda, Optimizing Collective Communication in OpenSHMEM, PGAS '13
Slide 38
Collective Communication: Performance
[Figure: latency (us) of Reduce, Broadcast and Collect at 1,024 processes versus message size, and Barrier latency versus number of processes (128 to 2048), for MV2X-SHMEM, Scalable-SHMEM and OMPI-SHMEM; MV2X-SHMEM shows improvements of up to 180X, 18X, 12X and 3X across the four benchmarks]
Slide 39
Intra-node Design Space for OpenSHMEM
• Communication channels (truly one-sided): shared memory, CMA, LiMIC, network loopback
• Data: global symmetric variables and allocated symmetric objects
• LiMIC: kernel module developed at OSU for single-copy IPC
• CMA: Cross Memory Attach - a Linux 3.2 kernel feature for single-copy IPC

[Figure: a two-socket node (cores C0-C15, memory attached to each socket) on which these channels operate]

S. Potluri, K. Kandalla, D. Bureddy, M. Li and D. K. Panda, Efficient Intranode Designs for OpenSHMEM on Multi-core Clusters, PGAS 2012
Slide 40
OpenSHMEM Application Performance
[Figure: execution time (s) versus system size (256 to 4K processes) for the DAXPY and 2D Heat Transfer Modeling kernels, comparing UH-SHMEM (Linear), UH-SHMEM (Tree), MV2X-SHMEM, Scalable-SHMEM and OMPI-SHMEM]
• DAXPY kernel with an 8K input matrix
– 12X improved performance at 4K processes
• Heat Transfer kernel (32K x 32K)
– 45% improved performance at 4K processes
Slide 41
A Hybrid MPI+PGAS Case Study: Graph 500
Slide 42
Incremental Approach to Exploit One-sided Operations
• Identify the communication critical section (mpiP, HPCToolkit)
• Allocate memory in the shared address space
• Convert MPI Send/Recvs to assignment operations or one-sided operations
– Non-blocking operations can be utilized
– Coalescing for reducing the number of network operations
• Introduce synchronization operations for data consistency
– After Put operations or before Get operations
• Load balance through a global view of data
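An illustrative before/after fragment (not from the talk) for the conversion step above: a two-sided halo exchange replaced by a one-sided put into a symmetric buffer, with a barrier supplying the data-consistency synchronization after the put.

```c
#include <mpi.h>
#include <shmem.h>

double halo;                             /* symmetric receive buffer (hypothetical) */

/* before: explicit two-sided exchange with the neighbors */
void exchange_mpi(double my_value, int left, int right, double *recv)
{
    MPI_Sendrecv(&my_value, 1, MPI_DOUBLE, right, 0,
                 recv,      1, MPI_DOUBLE, left,  0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}

/* after: write directly into the right neighbor's symmetric buffer */
void exchange_shmem(double my_value, int right)
{
    shmem_double_put(&halo, &my_value, 1, right);
    shmem_barrier_all();                 /* synchronization after the put */
    /* 'halo' now holds the left neighbor's value on every PE */
}
```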
Slide 43
Graph500 Benchmark - The Algorithm
• Breadth First Search (BFS) traversal
• Uses a 'Level Synchronized BFS Traversal Algorithm'
– Each process maintains a 'CurrQueue' and a 'NewQueue'
– Vertices in CurrQueue are traversed and newly discovered vertices are sent to their owner processes
– The owner process receives the edge information
• If the vertex has not been visited, it updates the parent information and adds the vertex to NewQueue
– The queues are swapped at the end of each level
– Initially the 'root' vertex is added to CurrQueue
– The traversal terminates when the queues are empty
Slide 44
MPI-based Graph500 Benchmark
• MPI_Isend/MPI_Test - MPI_Irecv for transferring vertices
• Implicit barrier using zero-length messages
• MPI_Allreduce to count the number of NewQueue elements
• Major bottlenecks:
– Overhead in the send-recv communication model
• More CPU cycles consumed, despite using non-blocking operations
• Most of the time is spent in MPI_Test
– Implicit linear barrier
• The linear barrier causes significant overheads
Slide 45
Hybrid Graph500 Design
• Communication and coordination using one-sided routines and fetch-add atomic operations
– Every process keeps a receive buffer
– Synchronization using atomic fetch-add routines
• Level synchronization using a non-blocking barrier
– Enables more computation/communication overlap
• Load balancing utilizing OpenSHMEM shmem_ptr
– Adjacent processes can share work by reading shared memory

J. Jose, S. Potluri, K. Tomko and D. K. Panda, Designing Scalable Graph500 Benchmark with Hybrid MPI+OpenSHMEM Programming Models, International Supercomputing Conference (ISC '13), June 2013
Slide 46
Pseudo Code for Both MPI and Hybrid Versions

Algorithm 1: EXISTING MPI SEND/RECV
while true do
    while CurrQueue != NULL do
        for vertex u in CurrQueue do
            HandleReceive()
            u ← Dequeue(CurrQueue)
            Send(u, v) to owner
        end
    end
    Send empty messages to all others
    while all_done != N − 1 do
        HandleReceive()
    end
end
// Procedure: HandleReceive
if rcv_count = 0 then
    all_done ← all_done + 1
else
    update(NewQueue, v)

Algorithm 2: HYBRID VERSION
while true do
    while CurrQueue != NULL do
        for vertex u in CurrQueue do
            u ← Dequeue(CurrQueue)
            for adjacent vertices of u do
                shmem_fadd(owner, size, recv_index)
                shmem_put(owner, size, recv_buf)
            end
        end
    end
    if recv_buf[size] = done then
        Set ← 1
    end
end
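A hedged C-style sketch of the hybrid pattern in Algorithm 2: shmem_int_fadd reserves slots in the owner's symmetric receive buffer and shmem_long_put writes the vertex pair there. Buffer names and sizes are illustrative, not the benchmark's actual code.

```c
#include <shmem.h>

#define RECV_BUF_SIZE 4096

long recv_buf[RECV_BUF_SIZE];            /* symmetric receive buffer on every PE */
int  recv_index = 0;                     /* symmetric fill index                 */

/* send a discovered edge (u, v) to the PE that owns vertex v */
static void send_vertex(long u, long v, int owner)
{
    /* atomically reserve two slots in the owner's buffer (no overflow check here) */
    int slot = shmem_int_fadd(&recv_index, 2, owner);

    /* one-sided write of the (u, v) pair into the reserved slots */
    long pair[2] = { u, v };
    shmem_long_put(&recv_buf[slot], pair, 2, owner);
}
```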
Slide 47
Graph500 - BFS Traversal Time
[Figure: left - BFS traversal time (s) versus number of processes (4K, 8K, 16K), where the hybrid design is 7.6X and 13X faster; right - strong scaling in traversed edges per second versus number of processes (1024 to 8192) for MPI-Simple, MPI-CSC, MPI-CSR and Hybrid]
• The hybrid design performs better than the MPI implementations
• At 16,384 processes:
- 1.5X improvement over MPI-CSR
- 13X improvement over MPI-Simple (same communication characteristics)
• Strong scaling; Graph500 problem scale = 29, edge factor 16
Slide 48
Concluding Remarks
• Presented an overview of PGAS models and hybrid MPI+PGAS models
• Outlined research challenges in designing an efficient runtime for these models on clusters with InfiniBand
• Demonstrated the benefits of hybrid MPI+PGAS models for an example application
• The hybrid MPI+PGAS model is an emerging paradigm which can lead to high-performance and scalable implementations of applications on exascale computing systems
• MVAPICH2-X: http://mvapich.cse.ohio-state.edu/overview/
Slide 49
Network-Based Computing Research Group Personnel
Current Students – A. Awan (Ph.D.) – A. Bhat (M.S.) – S. Chakraborthy (Ph.D.) – C.-H. Chu (Ph.D.) – N. Islam (Ph.D.)
Past Students – P. Balaji (Ph.D.) – D. Buntinas (Ph.D.) – S. Bhagvat (M.S.) – L. Chai (Ph.D.) – B. Chandrasekharan (M.S.) – N. Dandapanthula (M.S.)
– V. Dhanraj (M.S.) – T. Gangadharappa (M.S.) – K. Gopalakrishnan (M.S.)
– G. Santhanaraman (Ph.D.) – A. Singh (Ph.D.) – J. Sridhar (M.S.)
– S. Sur (Ph.D.)
– H. Subramoni (Ph.D.)
– K. Vaidyanathan (Ph.D.)
– A. Vishnu (Ph.D.)
– J. Wu (Ph.D.)
– W. Yu (Ph.D.)
Past Research Scientist
– S. Sur
Current Post-Doc – J. Lin
Current Programmer – J. Perkins
Past Post-Docs – H. Wang
– X. Besseron – H.-W. Jin
– M. Luo
– W. Huang (Ph.D.) – W. Jiang (M.S.) – J. Jose (Ph.D.) – S. Kini (M.S.) – M. Koop (Ph.D.) – R. Kumar (M.S.) – S. Krishnamoorthy (M.S.) – K. Kandalla (Ph.D.) – P. Lai (M.S.) – J. Liu (Ph.D.)
– M. Luo (Ph.D.) – A. Mamidala (Ph.D.) – G. Marsh (M.S.) – V. Meshram (M.S.) – S. Naravula (Ph.D.) – R. Noronha (Ph.D.) – X. Ouyang (Ph.D.) – S. Pai (M.S.) – S. Potluri (Ph.D.) – R. Rajachandrasekar (Ph.D.)
– M. Li (Ph.D.) – M. Rahman (Ph.D.) – D. Shankar (Ph.D.) – A. Venkatesh (Ph.D.) – J. Zhang (Ph.D.)
– E. Mancini – S. Marcarelli – J. Vienne
Current Senior Research Associates – K. Hamidouche – X. Lu
Past Programmers – D. Bureddy
– H. Subramoni
Current Research Specialist – M. Arnold
Slide 50
Questions Karen Tomko Interim Director of Research Scientific Applications Manager Ohio Supercomputer Center [email protected]
1224 Kinnear Road Columbus, OH 43212 Phone: (614) 292-2846
ohiosupercomputerctr ohiosupercomputercenter