TRANSCRIPT
rCUDA 4: GPGPU as a service in HPC clusters
Rafael Mayo, Antonio J. Peña
Universitat Jaume I, Castelló (Spain)
Joint research effort
Outline
GPU computing
GPU computing scenarios
The wish list
Introduction to rCUDA
rCUDA functionality
rCUDA v3 (stable)
rCUDA v4 (alpha)
Current status
HPC Advisory Council Spain Conference 2012 2/72
GPU computing: covers the technological issues involved in using the GPU's computational power to execute general-purpose code
GPU computing has experienced remarkable growth in the last years
GPU computing
The basic building block is a node with one or more GPUs
[Figure: node diagram, a CPU with its main memory connected through PCI-e to one or more GPUs (each with its own memory) and to a network interface]
GPU computing
From the programming point of view, we have a set of nodes, each one with:
one or more CPUs (with several cores per CPU)
one or more GPUs (1-4)
plus an interconnection network
[Figure: cluster diagram, several nodes, each with a CPU, main memory, PCI-e-attached GPUs with their own memory, and a network interface, joined by an interconnection network]
GPU computing
Development tools have been introduced in order to ease the programming of GPUs. There are two main approaches in GPU computing development environments:
CUDA: NVIDIA proprietary
OpenCL: open standard
GPU computing
Basically, CUDA and OpenCL have the same working scheme:
Compilation: separate the CPU code from the GPU code (the GPU kernel)
Execution, with data transfers between the CPU and GPU memory spaces:
1. Before GPU kernel execution: copy input data from the CPU memory space to the GPU memory space
2. Computation: kernel execution
3. After GPU kernel execution: copy results from the GPU memory space back to the CPU memory space
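As an illustration (not part of the slides), the three execution steps map directly onto CUDA Runtime API calls; a minimal sketch:

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// Trivial kernel: add 1.0f to every element.
__global__ void add_one(float *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] += 1.0f;
}

int main(void) {
    const int n = 1024;
    float h[1024];
    for (int i = 0; i < n; ++i) h[i] = (float)i;

    float *d;
    cudaMalloc((void **)&d, n * sizeof(float));                  // GPU memory space
    cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice); // step 1
    add_one<<<(n + 255) / 256, 256>>>(d, n);                     // step 2
    cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost); // step 3
    cudaFree(d);

    printf("%f\n", h[10]);  // expected: 11.0
    return 0;
}
```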
GPU computing
Time spent on data transfers may not be negligible
[Chart: influence of data transfers for SGEMM, percentage of time devoted to data transfers (0-100%) vs. matrix size (0-18000), for pinned and non-pinned memory]
GPU computing scenarios
For the right kind of code, the use of GPUs brings huge benefits in terms of performance and energy
There must be data parallelism in the code: this is the only way to take benefit from the hundreds of processors in a GPU
Different scenarios from the point of view of the application:
Low level of data parallelism
High level of data parallelism
Moderate level of data parallelism
Applications for multi-GPU computing
GPU computing scenarios
Low level of data parallelism. Regarding GPU computing?
No GPU is needed; just proceed with the traditional HPC strategies
High level of data parallelism. Regarding GPU computing?
Add one or more GPUs to every node in the system and rewrite the applications to use them
GPU computing scenarios
Moderate level of data parallelism: the application presents a degree of data parallelism around 40%-80%. Regarding GPU computing?
The GPUs in the system are used only for some parts of the application, remaining idle the rest of the time and, thus, wasting resources and energy
GPU computing scenarios
Applications for multi-GPU computing: an application can use a large number of GPUs in parallel. Regarding GPU computing?
The code running in a node can only access the GPUs in that node, but it would run faster if it could have access to more GPUs
The wish list
Current trend in GPU deployment: each node has one or more GPUs
CONCERNS:
For applications with moderate levels of data parallelism, the GPUs in the cluster may be idle for long periods of time
Multi-GPU applications cannot make use of the tremendous GPU resources available across the cluster
Deployment and energy costs!
The wish list
A way of addressing the first concern, the energy concern, is by reducing the number of GPUs present in the cluster
NEW CONCERN:
A lower number of GPUs noticeably increases the difficulty of efficiently scheduling jobs (considering global CPU and GPU requirements)
The wish list
A way of addressing the new scheduling problem is by reducing the number of GPUs present in the cluster and sharing the remaining ones among the CPU nodes
Additionally, by appropriately sharing GPUs, multi-GPU computing also becomes feasible
This would increase GPU utilization and lower power consumption, at the same time that initial acquisition costs are reduced
The wish list
Efficiently sharing GPUs across a cluster can be achieved by leveraging GPU virtualization
As far as we know, rCUDA is the only remote virtualization solution with CUDA 4 support
Other frameworks: gVirtuS, vCUDA, GViM, VGPU, GridCuda
The wish list
Going even beyond: consolidating GPUs into dedicated servers (with no CPU power) and allowing GPU task migration
TRUE GREEN GPU COMPUTING
The wish list
GPUs are disaggregated from CPUs and global schedulers are enhanced
Introduction to rCUDA
A framework enabling a CUDA-based application running in one node to access GPUs in other nodes
It is useful when you have:
Moderate level of data parallelism
Applications for multi-GPU computing
Introduction to rCUDA
Moderate level of data parallelism
[Figure: cluster with GPUs in every node]
Adding GPUs at each node makes some GPUs remain idle for long periods. This is a waste of money and energy
Introduction to rCUDA
Moderate level of data parallelism
[Figure: cluster where only some nodes have GPUs]
Add only the GPUs that are needed, considering the application requirements and their level of data parallelism, and
Introduction to rCUDA
Moderate level of data parallelism
[Figure: same cluster, with logical connections making every GPU reachable from every node]
Add only the GPUs that are needed, considering the application requirements and their level of data parallelism, and make all of them accessible from every node
Introduction to rCUDA
Applications for multi-GPU computing
[Figure: cluster with GPUs in every node]
From a given CPU it is only possible to access the corresponding GPUs in that very same node
Introduction to rCUDA
Applications for multi-GPU computing
[Figure: logical connections from every node to every GPU]
Make all GPUs accessible from every node
Introduction to rCUDA
Applications for multi-GPU computing
[Figure: logical connections, a single CPU attached to many remote GPUs]
Make all GPUs accessible from every node and enable the access from a CPU to as many GPUs as required
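A multi-GPU code does not need to change to exploit this: it simply enumerates whatever devices the runtime exposes. A minimal sketch using standard CUDA Runtime calls (under rCUDA, some of the listed devices would physically live in other nodes):

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

int main(void) {
    int ndev = 0;
    cudaGetDeviceCount(&ndev);       // under rCUDA: local + remote GPUs
    for (int i = 0; i < ndev; ++i) {
        cudaDeviceProp p;
        cudaGetDeviceProperties(&p, i);
        cudaSetDevice(i);            // subsequent calls target device i
        printf("device %d: %s\n", i, p.name);
    }
    return 0;
}
```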
rCUDA functionality
[Diagram, built up over several slides: on the client side, the CUDA application links against the rCUDA library, which takes the place of the CUDA runtime; on the server side, the rCUDA daemon uses the CUDA driver + runtime and the CUDA libraries to serve requests on the real GPU. Client and server communicate through their network devices.]
rCUDA functionality
rCUDA uses a proprietary communication protocol
Example:
1) Initialization
2) Memory allocation on the remote GPU
3) CPU-to-GPU memory transfer of the input data
4) Kernel execution
5) GPU-to-CPU memory transfer of the results
6) GPU memory release
7) Communication channel closing and server process finalization
rCUDA v3 (stable)
Current stable release
Uses the TCP/IP stack
Runs over all TCP/IP networks
Presents some non-negligible overhead due to the use of TCP/IP
rCUDA v3 (stable)
[Chart: execution time (s) for matrix-matrix multiplication (GEMM) vs. matrix dimension (512-16384), on a Tesla C1060 with an Intel Xeon E5410 (8 cores) over Gigabit Ethernet, comparing CPU, rCUDA, and native CUDA runs; part of the time is devoted to kernel execution and part to data transfers]
J. Duato, F. D. Igual, R. Mayo, A. J. Peña, E. S. Quintana-Ortí, and F. Silla, “An efficient implementation of GPU virtualization in high performance clusters”, in Euro-Par 2009, Parallel Processing – Workshops, 6043, pp. 385-394, Lecture Notes in Computer Science, Springer-Verlag, 2010.
rCUDA v4 (alpha)
Low-level InfiniBand support
InfiniBand is the most used HPC network (Top500 June list): low latency and high bandwidth
rCUDA v4 (alpha)
Same user-level functionality: CUDA Runtime
Use of IB Verbs:
All TCP/IP stack overhead is avoided
Bandwidth from the client to/from the remote GPU is near the peak InfiniBand network bandwidth
Use of GPUDirect:
Reduces the number of intra-node data movements
Use of pipelined transfers:
Overlaps intra-node data movements and network transfers
GPUDirect RDMA support is coming
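The pipelining idea can be sketched with standard CUDA streams and pinned buffers: while chunk i travels over PCIe, chunk i+1 is being received from the network. `recv_chunk` is a hypothetical stand-in for the InfiniBand receive path, and the sketch assumes `total` is a multiple of `chunk`:

```cuda
#include <cuda_runtime.h>

/* Hypothetical network receive (e.g., an IB Verbs read into host memory). */
extern void recv_chunk(void *dst, size_t bytes);

/* Double-buffered, pipelined host-to-GPU upload of `total` bytes. */
void pipelined_upload(char *d_dst, size_t total, size_t chunk) {
    char *h_buf[2];
    cudaStream_t s;
    cudaStreamCreate(&s);
    cudaMallocHost((void **)&h_buf[0], chunk);  /* pinned: enables async copy */
    cudaMallocHost((void **)&h_buf[1], chunk);

    recv_chunk(h_buf[0], chunk);                /* prime the pipeline */
    for (size_t off = 0, i = 0; off < total; off += chunk, ++i) {
        cudaMemcpyAsync(d_dst + off, h_buf[i % 2], chunk,
                        cudaMemcpyHostToDevice, s);   /* PCIe copy of chunk i */
        if (off + chunk < total)
            recv_chunk(h_buf[(i + 1) % 2], chunk);    /* overlap: recv i+1 */
        cudaStreamSynchronize(s);               /* buffer reusable next round */
    }
    cudaFreeHost(h_buf[0]);
    cudaFreeHost(h_buf[1]);
    cudaStreamDestroy(s);
}
```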
rCUDA v4 (alpha)
Remote GPU Transfers (GPUDirect) [diagram animated across several slides]
rCUDA v4 (alpha)
Remote GPU Transfers (GPUDirect RDMA): CUDA 5, upcoming [diagram animated across several slides]
rCUDA v4 (alpha)
[Chart: bandwidth (MB/s) for a 4096 x 4096 single-precision matrix over QDR InfiniBand, comparing rCUDA GigaE, rCUDA IPoIB, rCUDA IBVerbs, and native CUDA; the IB peak bandwidth is 2900 MB/s]
J. Duato, A. J. Peña, F. Silla, R. Mayo, E. S. Quintana-Ortí, “rCUDA InfiniBand performance”, Poster, International Supercomputing Conference, 2011.
rCUDA v4 (alpha)
Execution time for a matrix-matrix product
[Chart: time (s) vs. matrix dimension (100-14600), series Tcpu, Tgpu, TrCUDA; Tesla C2050, Intel Xeon E5645, QDR InfiniBand]
rCUDA v4 (alpha)
Overhead time for a matrix-matrix product
[Chart: % overhead vs. matrix dimension (100-14600), series "% overhead gpu" and "% overhead rcuda", y axis from 0 to 1; Tesla C2050, Intel Xeon E5645, QDR InfiniBand]
rCUDA v4 (alpha)
[Chart: execution time for the LAMMPS application, in.eam input script scaled by a factor of 5 in the three dimensions; Tesla C2050, Intel Xeon E5520, QDR InfiniBand]
C. Reaño, A. J. Peña, F. Silla, J. Duato, R. Mayo, and E. S. Quintana-Ortí, “CU2rCU: towards the complete rCUDA remote GPU virtualization and sharing solution”, in Proceedings of the 2012 International Conference on High Performance Computing (HiPC 2012), Pune, India, Dec. 2012.
rCUDA v4 (alpha)
Multi-thread capabilities (thread-safe) [diagram]
Multi-server capabilities [diagram]
Multi-thread and multi-server capabilities [diagram]
rCUDA v4 (alpha)
GPUDirect 2.0 emulation is coming
rCUDA v4 (alpha)
Multi-thread and multi-server performance
[Chart]
C. Reaño et al., “CU2rCU: towards the complete rCUDA remote GPU virtualization and sharing solution”, HiPC 2012.
rCUDA v4 (alpha)
rCUDA does not support the CUDA extensions to C
In order to execute a program with rCUDA, the CUDA extensions included in its code must be “unextended” to the plain C API
NVCC inserts calls to undocumented CUDA functions
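As an illustration, a kernel launch written with the CUDA C extensions can be rewritten using the plain (CUDA 4-era) runtime execution-control API. The actual translation performed by CU2rCU is more involved; this is only a sketch:

```cuda
#include <cuda_runtime.h>

// CUDA C extension: what the programmer writes.
//   add_one<<<grid, block>>>(d_data, n);

__global__ void add_one(float *d_data, int n);

// Plain C API equivalent (cudaConfigureCall / cudaSetupArgument /
// cudaLaunch, deprecated in later CUDA versions):
void launch_add_one(float *d_data, int n, dim3 grid, dim3 block) {
    cudaConfigureCall(grid, block, 0, 0);            // geometry, shared mem, stream
    cudaSetupArgument(&d_data, sizeof(d_data), 0);   // pointer arg at offset 0
    cudaSetupArgument(&n, sizeof(n), sizeof(d_data)); // int arg at offset 8
    cudaLaunch((const char *)add_one);               // launch by kernel symbol
}
```

Note the arguments are marshalled by offset into the kernel's parameter buffer, which is exactly the kind of call sequence NVCC emits behind the triple-angle-bracket syntax.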
rCUDA v4 (alpha)
CU2rCU tool
rCUDA v4 (alpha)
C. Reaño et al., “CU2rCU: towards the complete rCUDA remote GPU virtualization and sharing solution”, HiPC 2012.
Current Status
How compatible with CUDA is rCUDA?
Current Status
CUDA Runtime API functions implemented by rCUDA
rCUDA does not provide support for graphic functions
Current Status
NVIDIA CUDA C SDK code samples included in the rCUDA SDK
SDK samples using graphic functions or driver API functions are not supported
Current Status
rCUDA v3.2 stable release:
Non-pipelined TCP/IP transfers
CUDA 4.2 support, only excluding graphics interoperability
Not thread-safe
Single rCUDA server per application
Only one GPU module per application allowed
CUDA C extensions unsupported
rCUDA 3.0a experimental build for MS Windows
Current Status
rCUDA v4 alpha features (experimental):
CUDA 4.2 (excluding graphics interoperability)
TCP + low-level InfiniBand communications
GPU & network pipelined transfers (GPUDirect)
Multithread support for applications (thread-safe)
Multiserver capabilities
Support for CUDA C extensions (CU2rCU)
Multiple GPU modules allowed
MPI integration (MVAPICH + OpenMPI)
Current Status
What's coming?
CUDA 5
Inter-node GPUDirect 2 emulation
GPUDirect RDMA
rCUDA integration into SLURM
Full Runtime API support: graphics interoperability
rCUDA 4 for MS Windows
…
http://www.rcuda.net
Free binary distribution (rCUDA v3.2 + rCUDA v4.0a) for:
o Fedora
o Scientific Linux (RHEL compatible)
o Ubuntu
o OpenSuse
o MS Windows
rCUDA Team: Antonio Peña, Carlos Reaño, Federico Silla, José Duato, Adrián Castelló, Rafael Mayo, Enrique S. Quintana-Ortí
Parallel Architectures Group
High Performance Computing and Architectures Group
Thanks to Mellanox and AIC for their kind support of this work