mapping biomedical applications onto gpu platforms joseph jaja ... · pdf file•...

26
Mapping Biomedical Applications onto GPU Platforms Joseph JaJa University of Maryland

Upload: lamliem

Post on 23-Feb-2018

215 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Mapping Biomedical Applications onto GPU Platforms Joseph JaJa ... · PDF file• Collaboration between the University of Maryland (Varshney and JaJa) and the University of Maryland

Mapping Biomedical Applications onto GPU Platforms

Joseph JaJa University of Maryland

Presenter
Presentation Notes
The title of our paper is high performance FFT based Poisson Solver on a CPU-GPU heterogeneous platform.
Page 2: Mapping Biomedical Applications onto GPU Platforms Joseph JaJa ... · PDF file• Collaboration between the University of Maryland (Varshney and JaJa) and the University of Maryland

• Collaboration between GWU (Balaras), UMD (Solares, Wu), and University of Chicago (Dubey).

• Goal: Development of high performance algorithms applicable to fluid-structure interactions in viscous incompressible flows.

• Example application: interactions between the red blood cells and plasma

• Critical Components: Poisson equation solver combined with a multigrid algorithm. Multi-dimensional FFTs and several types of matrix computations

Fluid-Structure Interactions

2

Page 3: Mapping Biomedical Applications onto GPU Platforms Joseph JaJa ... · PDF file• Collaboration between the University of Maryland (Varshney and JaJa) and the University of Maryland

• Collaboration between the University of Maryland (Varshney and JaJa) and the University of Maryland at Baltimore (Gullapalli, Herskovits, etc.)

• Understanding of brain connectivity differences between subjects with brain disorders and normal subjects using diffusion MRI.

• Dynamics of functional brain connectivity using resting state fMRI, for subjects with moderate TBI.

Data-Driven Understanding of Brain Disorders

3

Page 4: Mapping Biomedical Applications onto GPU Platforms Joseph JaJa ... · PDF file• Collaboration between the University of Maryland (Varshney and JaJa) and the University of Maryland

Connectivity Matrix

4

• Diffusion MRI images with 64 diffusion frames with resolution 128×128×52.

• Probabilistic Tractography • Number of entries in the

sparse connectivity matrix: 100,000,000-200,000,000.

• Number of voxels in ROI: 100,000-200,000.

Page 5: Mapping Biomedical Applications onto GPU Platforms Joseph JaJa ... · PDF file• Collaboration between the University of Maryland (Varshney and JaJa) and the University of Maryland

Inflammatory Responses and Wound Healing in Vocal Fold

N. Seekhao Collaborators: N. Li, C. Shung, L. Mongeau

(McGill U.)

Biomechanical Stress

Mucosal Damage

Cell Recruitment

Cell Function

Presenter
Presentation Notes
Slide 1
Page 6: Mapping Biomedical Applications onto GPU Platforms Joseph JaJa ... · PDF file• Collaboration between the University of Maryland (Varshney and JaJa) and the University of Maryland

Inflammatory Responses & Wound Healing in Vocal Fold

Biomechanical Stress

Mucosal Damage

Cell Recruitment

Cell Function

Force applied on tissue. Talking, shouting etc.

Image from : http://2.bp.blogspot.com/-DI0yRAeRKjA/TrDdREMzr_I/AAAAAAAAH6o/QFgZ7xFFRjg/s320/shout png

Presenter
Presentation Notes
Slide 1
Page 7: Mapping Biomedical Applications onto GPU Platforms Joseph JaJa ... · PDF file• Collaboration between the University of Maryland (Varshney and JaJa) and the University of Maryland

Inflammatory Responses & Wound Healing in Vocal Fold

Biomechanical Stress

Mucosal Damage

Cell Recruitment

Cell Function

Damage in the tissue of the vocal fold

Image from : https://wiki.uiowa.edu/download/attachments/39001206/nodules%20op%205 png?api=v2

Presenter
Presentation Notes
Slide 1
Page 8: Mapping Biomedical Applications onto GPU Platforms Joseph JaJa ... · PDF file• Collaboration between the University of Maryland (Varshney and JaJa) and the University of Maryland

Inflammatory Responses & Wound Healing in Vocal Fold

Biomechanical Stress

Mucosal Damage

Cell Recruitment

Cell Function

Attracting cells such as platelets, neutrophils, and macrophages to the wound site

Image from : http://www.biospectrumasia.com/IMG/362/44362/atherosclerotic-lesions-generates-robust-t-cell-anti-inflammatory-response-262x174 jpg

Presenter
Presentation Notes
Slide 1
Page 9: Mapping Biomedical Applications onto GPU Platforms Joseph JaJa ... · PDF file• Collaboration between the University of Maryland (Varshney and JaJa) and the University of Maryland

Inflammatory Responses & Wound Healing in Vocal Fold

Biomechanical Stress

Mucosal Damage

Cell Recruitment

Cell Function

Each cell perform its duty. One or more of the following: • Secrete chemical (IL-1,

MMP-8 etc.) to attract, excite or inhibit other cells

• Deposit ECM protein (collagen, elastin etc.) to heal damaged tissue

• Clean up cell debris

Presenter
Presentation Notes
Slide 1
Page 10: Mapping Biomedical Applications onto GPU Platforms Joseph JaJa ... · PDF file• Collaboration between the University of Maryland (Varshney and JaJa) and the University of Maryland

Agent-Based Modeling (ABM)

1. Bottom-up, rule-based, discrete-event and discrete-time computational model

2. Initial “World” and a collection of “agents.” 3. Interactions between agents and the world.

– Agents migrate to area of injury – Remove dead cells and tissue debris – Remodel ECM to heal damaged tissue

4. Stochastic moves 5. Emergent behavior

Presenter
Presentation Notes
Emergent behavior – gives you an outcome, also you don’t need to know everything about the system Spatial space – useful for intracellular pathways
Page 11: Mapping Biomedical Applications onto GPU Platforms Joseph JaJa ... · PDF file• Collaboration between the University of Maryland (Varshney and JaJa) and the University of Maryland

ABMs of Vocal Fold Wound Healing Process Tissue area of

interest (ABMs term: World)

Slices of tissue (ABMs term: Patches)

Presenter
Presentation Notes
Slide 2
Page 12: Mapping Biomedical Applications onto GPU Platforms Joseph JaJa ... · PDF file• Collaboration between the University of Maryland (Varshney and JaJa) and the University of Maryland

Tissue area of interest

(ABMs term: World)

Slices of tissue (ABMs term: Patches)

IL-1 MMP-8

ABMs of Vocal Fold Wound Healing Process

Presenter
Presentation Notes
Slide 2
Page 13: Mapping Biomedical Applications onto GPU Platforms Joseph JaJa ... · PDF file• Collaboration between the University of Maryland (Varshney and JaJa) and the University of Maryland

Tissue area of interest

(ABMs term: World)

Slices of tissue (ABMs term: Patches)

Components of tissue (ECM) such

as Collagen, Elastin, Hyaluronic

Acid

IL-1 MMP-8

ABMs of Vocal Fold Wound Healing Process

Presenter
Presentation Notes
Slide 2
Page 14: Mapping Biomedical Applications onto GPU Platforms Joseph JaJa ... · PDF file• Collaboration between the University of Maryland (Varshney and JaJa) and the University of Maryland

Tissue area of interest

(ABMs term: World)

Slices of tissue (ABMs term: Patches)

Components of tissue (ECM) such

as Collagen, Elastin, Hyaluronic

Acid

IL-1 MMP-8

Chemical Levels (ABMs term: Patches Attributes)

ABMs of Vocal Fold Wound Healing Process

Presenter
Presentation Notes
Slide 2
Page 15: Mapping Biomedical Applications onto GPU Platforms Joseph JaJa ... · PDF file• Collaboration between the University of Maryland (Varshney and JaJa) and the University of Maryland

Fibroblast (Cell) (ABMs term: Agents)

Neutrophil (Cell) (ABMs term: Agents)

Macrophage (Cell) (ABMs term: Agents)

ABMs of Vocal Fold Wound Healing Process

Presenter
Presentation Notes
Slide 2
Page 16: Mapping Biomedical Applications onto GPU Platforms Joseph JaJa ... · PDF file• Collaboration between the University of Maryland (Varshney and JaJa) and the University of Maryland

Fibroblast (Cell) (ABMs term: Agents)

Neutrophil (Cell) (ABMs term: Agents)

Macrophage (Cell) (ABMs term: Agents)

ABMs of Vocal Fold Wound Healing Process

Presenter
Presentation Notes
Slide 2
Page 17: Mapping Biomedical Applications onto GPU Platforms Joseph JaJa ... · PDF file• Collaboration between the University of Maryland (Varshney and JaJa) and the University of Maryland

Fibroblast (Cell) (ABMs term: Agents)

Neutrophil (Cell) (ABMs term: Agents)

Macrophage (Cell) (ABMs term: Agents)

ABMs of Vocal Fold Wound Healing Process

Presenter
Presentation Notes
Slide 2
Page 18: Mapping Biomedical Applications onto GPU Platforms Joseph JaJa ... · PDF file• Collaboration between the University of Maryland (Varshney and JaJa) and the University of Maryland

Problem Scale

* Number of cells increase throughout the simulation due to proliferation. Current model shows a doubling of number of cells after the end of “5-day” simulation.

Presenter
Presentation Notes
Slide 3
Page 19: Mapping Biomedical Applications onto GPU Platforms Joseph JaJa ... · PDF file• Collaboration between the University of Maryland (Varshney and JaJa) and the University of Maryland

• Computationally demanding applications with irregular memory access patterns and involving large data sizes that cannot fit on the GPU memory

• Need to use heterogeneous platforms involving multicore CPU with one or more many-core GPUs.

• Performance Goal: try to achieve the same performance rate or throughput as in the case when the data fits on the GPU.

Characteristic Features of Applications

19

Presenter
Presentation Notes
The problem we are trying to solve is of size that’s larger than the capacity of a single GPU memory size. Major components of our Poisson Solver are 2D or 3D FFTs which are normally considered as typical memory bounded applications. On the other hand, our target platform is CPU-GPU heterogeneous platform: communication between CPU and GPU memory is done by PCIe bus. Three types of memory are involved and they have rather different bandwidth, the difference is more than one magnitude. In addition, the multithreading level is also imbalanced. On our evaluation platform, the CPU is a 8-core Xeon 5560 and a typical CUDA GPU could have hundreds of processor cores. ----- Meeting Notes (5/20/13 20:28) ----- Another aspect is the multithreading is imbalanced. The CPU typically has a limited number of quite powerful processors and on the other hand, the GPU can easily have hundreds of simpler processor cores.
Page 20: Mapping Biomedical Applications onto GPU Platforms Joseph JaJa ... · PDF file• Collaboration between the University of Maryland (Varshney and JaJa) and the University of Maryland

Heterogeneous Platforms

CPU Mem (128GB)

Dual-socket Multi-core

CPU -16 cores

GPU GDDR5 Mem (5GB)

Massively Parallel

GPU K20

I/O I/O 5.7GB/s

73GB/s 208GB/s

Presenter
Presentation Notes
Here’s an programming abstraction of a heterogeneous platform.
Page 21: Mapping Biomedical Applications onto GPU Platforms Joseph JaJa ... · PDF file• Collaboration between the University of Maryland (Varshney and JaJa) and the University of Maryland

Dense Matrix Multiplication

• DGEMM: where the matrices are of dimensions:

• Why?

– An important kernel for many problems – Optimization ideas can be used in other problems – Perhaps the most-studied algorithm in high

performance computing

• Can we solve very large DGEMM with the same performance throughput as small DGEMM?

nmCnkBkmA ××× :,:,:

Presenter
Presentation Notes
An important kernel in many problems Appears in many linear algebra algorithms Bottleneck for dense linear algebra, including Top500 One of the 7 dwarfs / 13 motifs of parallel computing Closely related to other algorithms, e.g., transitive closure on a graph using Floyd-Warshall Optimization ideas can be used in other problems The best case for optimization payoffs The most-studied algorithm in high performance computing
Page 22: Mapping Biomedical Applications onto GPU Platforms Joseph JaJa ... · PDF file• Collaboration between the University of Maryland (Varshney and JaJa) and the University of Maryland

Block Matrix Multiplication

• Decompose into blocks

A B C

Presenter
Presentation Notes
Blocking is a common strategy for most optimized DGEMM implementations It involves decomposing the matrices into blocks to fit into one or more levels of caches on a given CPU architecture. The basic idea is to bring frequent data closer into the processors. By reusing the data staged in the faster cache it keeps the processor busy and yields good performance. This is an example of blocked matrix multiplication. Here are matrix A, B and C respectively. Say we blocked the output matrix C into 4x4 blocks. We first think about block C(0,0). To compute C(0,0), we would need the blue blocks of matrix A and those of matrix B. For this example, But the question is how big should the sub-blocks be?
Page 23: Mapping Biomedical Applications onto GPU Platforms Joseph JaJa ... · PDF file• Collaboration between the University of Maryland (Varshney and JaJa) and the University of Maryland

Multiple CUDA Stream Scheduling

Page 24: Mapping Biomedical Applications onto GPU Platforms Joseph JaJa ... · PDF file• Collaboration between the University of Maryland (Varshney and JaJa) and the University of Maryland

Performance Evaluation

Page 25: Mapping Biomedical Applications onto GPU Platforms Joseph JaJa ... · PDF file• Collaboration between the University of Maryland (Varshney and JaJa) and the University of Maryland

Performance Evaluation

Page 26: Mapping Biomedical Applications onto GPU Platforms Joseph JaJa ... · PDF file• Collaboration between the University of Maryland (Varshney and JaJa) and the University of Maryland

• Many biomedical applications can make effective use of heterogeneous platforms.

• But a significant amount of work is required to organize the computation into multi-stream of data transfers and kernel executions with no or very small stall time.

• Portability of high performance code remains a problem.

Conclusion

26