large-scale ultrasound simulations using the hybrid … · 2015. 5. 7. · gcs supercomputer...

21
Large-scale Ultrasound Simulations Using the Hybrid OpenMP/MPI Decomposition Jiri Jaros*, Vojtech Nikl*, Bradley E. Treeby *Department of Compute Systems, Brno University of Technology Department of Medical Physics, University College London

Upload: others

Post on 09-Oct-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Large-scale Ultrasound Simulations Using the Hybrid … · 2015. 5. 7. · GCS Supercomputer SuperMUC at Leibniz Supercomputing Centre (LRZ, ). Title: Slide 1 Author: Jiri Jaros Created

Large-scale Ultrasound Simulations Using

the Hybrid OpenMP/MPI Decomposition

Jiri Jaros*, Vojtech Nikl*, Bradley E. Treeby†

*Department of Compute Systems, Brno University of Technology

† Department of Medical Physics, University College London

Page 2: Large-scale Ultrasound Simulations Using the Hybrid … · 2015. 5. 7. · GCS Supercomputer SuperMUC at Leibniz Supercomputing Centre (LRZ, ). Title: Slide 1 Author: Jiri Jaros Created

2

Outline

• Ultrasound simulations in soft tissues – What is ultrasound

– Why do we need ultrasound simulations

– Factors for the ultrasound simulations

– What is the challenge

• k-Wave toolbox – Acoustic model

– Spectral methods

• Large-scale ultrasound simulations – 1D domain decomposition with pure-MPI

– 2D hybrid decomposition with OpenMP/MPI

• Achieved results – FFTW scaling

– Strong scaling

• Conclusions and open questions

Jiri Jaros: Large-scale Ultrasound Simulations…

Page 3: Large-scale Ultrasound Simulations Using the Hybrid … · 2015. 5. 7. · GCS Supercomputer SuperMUC at Leibniz Supercomputing Centre (LRZ, ). Title: Slide 1 Author: Jiri Jaros Created

Longitudinal (compressional) acoustic waves …

… with a frequency above 20 kHz

3

What Is Ultrasound?

Jiri Jaros: Large-scale Ultrasound Simulations…

Page 4: Large-scale Ultrasound Simulations Using the Hybrid … · 2015. 5. 7. · GCS Supercomputer SuperMUC at Leibniz Supercomputing Centre (LRZ, ). Title: Slide 1 Author: Jiri Jaros Created

4

Ultrasound Simulation

• Photoacoustic imaging

• Aberration correction

• Training ultrasonographers

• Ultrasound transducer design

• Treatment planning (HIFU)

Jiri Jaros: Large-scale Ultrasound Simulations…

Page 5: Large-scale Ultrasound Simulations Using the Hybrid … · 2015. 5. 7. · GCS Supercomputer SuperMUC at Leibniz Supercomputing Centre (LRZ, ). Title: Slide 1 Author: Jiri Jaros Created

5

HIFU Treatment Planning

Jiri Jaros: Large-scale Ultrasound Simulations…

Scan

• CT or MR scan of a patient

Parameter setting

• Scan segmentation (bones, fat, skin, …)

• Medium parameters (density, sound speed)

Simulation

• Ultrasound propagation simulation

• Dosage, focus position, aberration correction

Operation

• Application of the ultrasound treatment

Page 6: Large-scale Ultrasound Simulations Using the Hybrid … · 2015. 5. 7. · GCS Supercomputer SuperMUC at Leibniz Supercomputing Centre (LRZ, ). Title: Slide 1 Author: Jiri Jaros Created

6

Factors for Ultrasound Simulation

• Nonlinear wave propagation

– Production of harmonics

– Energy dependent

• Heterogeneous medium

– Dispersion

– Reflection

• Absorbing medium

– Frequency dependent

– Medium dependent

Jiri Jaros: Large-scale Ultrasound Simulations…

Page 7: Large-scale Ultrasound Simulations Using the Hybrid … · 2015. 5. 7. · GCS Supercomputer SuperMUC at Leibniz Supercomputing Centre (LRZ, ). Title: Slide 1 Author: Jiri Jaros Created

7

How Big Simulations Do We Need?

Speed of sound in water ≈

1500 m/s

At 1 MHz, 20 cm ≈ 133 λ

At 10 MHz, 20 cm ≈ 1333 λ

At 15 grid points per

wavelength, each matrix is

30 TB!

Jiri Jaros: Large-scale Ultrasound Simulations…

Modeling Scenario Source Freq

[MHz] Source Type

Nonlinear

Harmonics

Max Freq

[MHz]

Domain Size

[mm]

Domain Size

[Wavelengths]

X Y Z X Y Z

Diagnostic Ultrasound:

Abdominal Curvilinear Transducer 3 Tone Burst 5 18 150 80 25 1800 960 300

Diagnostic Ultrasound:

Linear Transducer 10 Tone Burst 5 60 50 80 30 2000 3200 1200

Transrectal Prostate HIFU

Minimal Cavitation 4 CW 15 64 80 60 20 3413 2560 853

MR-Guided HIFU

Minimal Cavitation 1.5 CW 10 15 250 250 150 2500 2500 1500

Histotripsy

Intense Cavitation 1 CW 50 50 250 250 150 8333 8333 5000

Page 8: Large-scale Ultrasound Simulations Using the Hybrid … · 2015. 5. 7. · GCS Supercomputer SuperMUC at Leibniz Supercomputing Centre (LRZ, ). Title: Slide 1 Author: Jiri Jaros Created

8

• k-Wave Toolbox (http://www.k-wave.org)

– 3,385 registered users

• Full-wave 3D acoustic model – including nonlinearity

– heterogeneities

– power law absorption

• Solves coupled first-order equations

Acoustic Model for Soft Tissues

Jiri Jaros: Large-scale Ultrasound Simulations…

momentum conservation

mass conservation

pressure-density relation

absorption term

Page 9: Large-scale Ultrasound Simulations Using the Hybrid … · 2015. 5. 7. · GCS Supercomputer SuperMUC at Leibniz Supercomputing Centre (LRZ, ). Title: Slide 1 Author: Jiri Jaros Created

9

• Technique – Medium properties generated by Matlab scripts from a medical scan.

– Input signal is injected by a transducer.

– Sensor data is collected in the form of raw time series or aggregated acoustics values.

– Post processing and visualization handled by Matlab.

• Operations executed in every time step – 6 forward 3D FFTs

– 8 inverse 3D FFTs

– 3+3 forward and inverse 1D FFTs in the case of non-staggered velocity

– About 100 element wise matrix operations (multiplication, addition,…)

• Global data set – 14 +3 (scratch) + 3 (unstaggering) real 3D matrices

– 3+3 complex 3D matrices

– 6 real 1D vectors

– 6 complex 1D vectors

– Sensor mask, source mask, source input

– <0 , 20> real buffers for aggregated quantities (max, min, rms, max_all, min_all)

k-space Pseudospectral Method in C++

Jiri Jaros: Large-scale Ultrasound Simulations…

Page 10: Large-scale Ultrasound Simulations Using the Hybrid … · 2015. 5. 7. · GCS Supercomputer SuperMUC at Leibniz Supercomputing Centre (LRZ, ). Title: Slide 1 Author: Jiri Jaros Created

10

• Implementation language – C/C++ and MPI parallelization

– MPI-FFTW library– efficient way to calculate distributed 3D FFTs

– HDF5 library – hierarchical data format for parallel I/O

• Data decomposition – Data decomposed along the Z dimension

– Data distributed when read using parallel I/O

– Frequency domain operations work on transposed data to reduce the number of global communications (3D transpositions).

K-Wave++ Toolbox

Distributed 1D decomposition

Jiri Jaros: Large-scale Ultrasound Simulations…

Page 11: Large-scale Ultrasound Simulations Using the Hybrid … · 2015. 5. 7. · GCS Supercomputer SuperMUC at Leibniz Supercomputing Centre (LRZ, ). Title: Slide 1 Author: Jiri Jaros Created

11

K-Wave++ Toolbox

Strong Scaling (1D decomposition)

Jiri Jaros: Large-scale Ultrasound Simulations…

0,01

0,1

1

10

100

SEQ 8 cores(1 node)

16 cores(2 nodes)

32 cores(4 nodes)

64 cores(8 nodes)

128 cores(16 nodes)

256 cores(32 nodes)

512 cores(64 nodes)

1024 cores(128 nodes)

Tim

e p

er t

imes

tep

[s]

Strong Scaling of Ultrasound Simulations Problem size remains constant as the number of cores is increased

128x128x128 256x128x128 256x256x128 256x256x256 512x256x256

512x512x256 512x512x512 1024x512x512 1024x1024x512 1024x1024x1024

2048x1024x1024 2048x2048x1024 2048x2048x2048 4096x2048x2048

Page 12: Large-scale Ultrasound Simulations Using the Hybrid … · 2015. 5. 7. · GCS Supercomputer SuperMUC at Leibniz Supercomputing Centre (LRZ, ). Title: Slide 1 Author: Jiri Jaros Created

• 1D decomposition

– The number of cores is limited by the largest dimension

– It makes some simulation run for too long

– It makes some simulation not fit into memory

+ It requires less communication

• 2D decomposition

+ The number of codes is limited by a product of two largest dimensions

+ It is enough for anything we could think of running

– It requires more communication and smaller messages

• Example

– 40963 matrix -> ~256GB in single precision

– Say 32 matrices -> 8TB RAM in total

– Max 4096 cores -> 2GB per core (Anselm – 2GB, Fermi – 1GB)

Jiri Jaros: Large-scale Ultrasound Simulations… 12

Scalability Problem

Page 13: Large-scale Ultrasound Simulations Using the Hybrid … · 2015. 5. 7. · GCS Supercomputer SuperMUC at Leibniz Supercomputing Centre (LRZ, ). Title: Slide 1 Author: Jiri Jaros Created

Jiri Jaros: Large-scale Ultrasound Simulations… 13

+ high core count limit

+ only 1 MPI global transposition

+ lower number of larger MPI messages

2D Hybrid Domain Decompostion

Page 14: Large-scale Ultrasound Simulations Using the Hybrid … · 2015. 5. 7. · GCS Supercomputer SuperMUC at Leibniz Supercomputing Centre (LRZ, ). Title: Slide 1 Author: Jiri Jaros Created

Jiri Jaros: Large-scale Ultrasound Simulations… 14

FFT libraries strong scaling

Anselm supercomputer (10243)

Page 15: Large-scale Ultrasound Simulations Using the Hybrid … · 2015. 5. 7. · GCS Supercomputer SuperMUC at Leibniz Supercomputing Centre (LRZ, ). Title: Slide 1 Author: Jiri Jaros Created

Jiri Jaros: Large-scale Ultrasound Simulations… 15

FFT libraries strong scaling

Fermi supercomputer (10243)

Page 16: Large-scale Ultrasound Simulations Using the Hybrid … · 2015. 5. 7. · GCS Supercomputer SuperMUC at Leibniz Supercomputing Centre (LRZ, ). Title: Slide 1 Author: Jiri Jaros Created

Jiri Jaros: Large-scale Ultrasound Simulations… 16

Time distribution of hybrid FFT

Page 17: Large-scale Ultrasound Simulations Using the Hybrid … · 2015. 5. 7. · GCS Supercomputer SuperMUC at Leibniz Supercomputing Centre (LRZ, ). Title: Slide 1 Author: Jiri Jaros Created

Jiri Jaros: Large-scale Ultrasound Simulations… 17

Simulation scaling (Anselm)

4

8

16

32

64

128

256

512

1024

2048

4096

8192

16384

32768

65536

16 32 64 128 256 512 1024 2048

Tim

e p

er

tim

est

ep [

ms]

Cores

128 - Pure

128 - Socket

128 - Node

256 - Pure

256 - Socket

256 - Node

512 - Pure

512 - Socket

512 - Node

1024 - Pure

1024 - Socket

1024 - Node

Page 18: Large-scale Ultrasound Simulations Using the Hybrid … · 2015. 5. 7. · GCS Supercomputer SuperMUC at Leibniz Supercomputing Centre (LRZ, ). Title: Slide 1 Author: Jiri Jaros Created

Jiri Jaros: Large-scale Ultrasound Simulations… 18

Simulation scaling (SuperMUC)

4

8

16

32

64

128

256

512

1024

2048

4096

8192

128 256 512 1024 2048 4096 8192

Tim

e p

er

tim

est

ep [

ms]

Cores

128 - Pure

128 - Socket

128 - Node

256 - Pure

256 - Socket

256 - Node

512 - Pure

512 - Socket

512 - Node

1024 - Pure

1024 - Socket

1024 - Node

Page 19: Large-scale Ultrasound Simulations Using the Hybrid … · 2015. 5. 7. · GCS Supercomputer SuperMUC at Leibniz Supercomputing Centre (LRZ, ). Title: Slide 1 Author: Jiri Jaros Created

Jiri Jaros: Large-scale Ultrasound Simulations… 19

Memory scaling (SuperMUC)

1

2

4

8

16

32

64

128

256

512

1024

128 256 512 1024 2048 4096 8192

Me

mo

ry p

er

core

[M

B]

Cores

128 - Pure

128 - Socket

128 - Node

256 - Pure

256 - Socket

256 - Node

512 - Pure

512 - Socket

512 - Node

1024 - Pure

1024 - Socket

1024 - Node

Page 20: Large-scale Ultrasound Simulations Using the Hybrid … · 2015. 5. 7. · GCS Supercomputer SuperMUC at Leibniz Supercomputing Centre (LRZ, ). Title: Slide 1 Author: Jiri Jaros Created

20

Conclusions

• Clinical relevant results – To get clinically relevant

simulation we need grid sizes of 40963 to 81923 at least for 50k simulation timesteps

• Two different Implementations

– 1D domain decomposition gives better results for small core counts

– 2D domain decomposition works well on Anselm, however there is a room for improvement on SuperMUC

– Memory scaling enables us to run much bigger simulations

• Future work – Communication and

synchronization reduction via overlapping

Jiri Jaros: Large-scale Ultrasound Simulations…

Page 21: Large-scale Ultrasound Simulations Using the Hybrid … · 2015. 5. 7. · GCS Supercomputer SuperMUC at Leibniz Supercomputing Centre (LRZ, ). Title: Slide 1 Author: Jiri Jaros Created

21

Our work has been supported by following institutions and grants

Questions and Comments

Jiri Jaros: Large-scale Ultrasound Simulations…

The project is financed from the SoMoPro II programme. The research leading to this invention has acquired a financial grant from the People Programme (Marie

Curie action) of the Seventh Framework Programme of EU according to the REA Grant Agreement No. 291782. The research is further co-financed by the South-

Moravian Region. This work reflects only the author’s view and the European Union is not liable for any use that may be made of the information contained

therein.

This work was also supported by the research project "Architecture of parallel and embedded computer systems", Brno University of Technology, FIT-S-14-2297,

2014-2016.

This work was supported by the IT4Innovations Centre of Excellence project (CZ.1.05/1.1.00/02.0070), funded by the European Regional Development Fund and

the national budget of the Czech Republic via the Research and Development for Innovations Operational Programme, as well as Czech Ministry of Education,

Youth and Sports via the project Large Research, Development and Innovations Infrastructures (LM2011033).

We acknowledge CINECA and PRACE Summer of HPC project for the availability of high performance computing resources.

The authors gratefully acknowledge the Gauss Centre for Supercomputing e.V. (www.gauss-centre.eu) for funding this project by providing computing time on the

GCS Supercomputer SuperMUC at Leibniz Supercomputing Centre (LRZ, www.lrz.de).