Last time: Runtime infrastructure for hybrid (GPU-based) platforms Task scheduling


Post on 04-Jan-2016


  • Last time: Runtime infrastructure for hybrid (GPU-based) platforms
    Task scheduling
    Extracting performance models at runtime
    Memory management
    Asymmetric Distributed Shared Memory

    StarPU: a Runtime System for Scheduling Tasks over Accelerator-Based Multicore Machines, Cédric Augonnet, Samuel Thibault, and Raymond Namyst. TR-7240, INRIA, March 2010. [link]
    An Asymmetric Distributed Shared Memory Model for Heterogeneous Parallel Systems, Isaac Gelado, Javier Cabezas, John Stone, Sanjay Patel, Nacho Navarro, Wen-mei Hwu, ASPLOS'10 [pdf]

  • Today: Bridging runtime and language support
    Virtualizing GPUs

    Achieving a Single Compute Device Image in OpenCL for Multiple GPUs, Jungwon Kim, Honggyu Kim, Joo Hwan Lee, Jaejin Lee, PPoPP'11 [pdf]
    Supporting GPU Sharing in Cloud Environments with a Transparent Runtime Consolidation Framework, Vignesh T. Ravi et al., HPDC 2011 (best paper!)

  • Context: clouds are shifting to support HPC applications

    Initially, tightly coupled applications were not well suited for the cloud; today:
    Chinese cloud with 40 Gbps InfiniBand
    Amazon HPC instances
    GPU instances: Amazon, Nimbix

  • Challenge: make GPUs a shared resource in the cloud.

    Why do this?
    GPUs are costly resources
    Multiple VMs on a node may share a single GPU
    Increase utilization:
    app level: some apps might not use GPUs much
    kernel level: some kernels can be collocated
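To make the utilization argument concrete, here is a minimal sketch of the collocation decision. All numbers and names are illustrative (the real framework works at the CUDA API level, not with declared SM counts): two kernels from different VMs can space-share one GPU whenever their combined demand for streaming multiprocessors (SMs) fits on the device.

```python
# Toy model of GPU sharing: each kernel declares how many streaming
# multiprocessors (SMs) it needs; the GPU has a fixed number of SMs.
# The SM counts below are hypothetical, not taken from the paper.

GPU_SMS = 14  # e.g., a Fermi-class GPU

def can_space_share(demands):
    """Kernels can be space-shared if their combined SM demand fits."""
    return sum(demands) <= GPU_SMS

# Two under-utilizing kernels (from two different VMs) fit together:
print(can_space_share([6, 7]))   # True: 13 <= 14 SMs
# A kernel that already saturates the GPU leaves no room:
print(can_space_share([14, 4]))  # False
```

This is exactly the app-level and kernel-level under-utilization the slide points at: each kernel alone leaves SMs idle, so sharing recovers them.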

  • Two streams:
    1. How?
    2. Evaluate: opportunities, gains, overheads

  • 1. The How?

    Preamble: Concurrent kernels are supported by today's GPUs
    Each kernel can execute a different task
    Tasks can be mapped to different streaming multiprocessors (using the thread-block configuration)
    Problem: concurrent execution is limited to the set of kernels invoked within a single process context

    Past virtualization solutions: API rerouting / intercept library
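The interception idea can be sketched as follows. This is a conceptual model, not the paper's implementation, and all class and method names are made up: an intercept library inside each application redirects its kernel launches to one shared backend that owns the single real GPU context, so kernels from different processes can be issued concurrently, which per-process CUDA contexts would otherwise prevent.

```python
# Conceptual sketch of transparent runtime consolidation via API
# interception. Names are illustrative, not the framework's real API.
from queue import Queue

class BackendContext:
    """Owns the one real GPU context; consolidates all launches."""
    def __init__(self):
        self.pending = Queue()
        self.launched = []

    def submit(self, app_id, kernel_name):
        self.pending.put((app_id, kernel_name))

    def drain(self):
        # A real runtime would issue these into CUDA streams for
        # concurrent execution; here we just record the merged order.
        while not self.pending.empty():
            self.launched.append(self.pending.get())

class InterceptedApp:
    """Stand-in for an LD_PRELOAD-style intercept library."""
    def __init__(self, app_id, backend):
        self.app_id, self.backend = app_id, backend

    def launch_kernel(self, name):
        # What the app believes is a direct kernel launch is rerouted:
        self.backend.submit(self.app_id, name)

backend = BackendContext()
vm0 = InterceptedApp("vm0", backend)
vm1 = InterceptedApp("vm1", backend)
vm0.launch_kernel("matmul")
vm1.launch_kernel("stencil")
backend.drain()
print(backend.launched)  # both apps' kernels in one shared context
```

The key point the sketch shows: rerouting moves all launches into one context, which is the precondition for the concurrent-kernel support described above.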


  • 1. The How?

    Architecture

  • 2. Evaluation: The opportunity

    Key assumption: under-utilization of GPUs

    Sharing:
    Space-sharing: kernels occupy different SMs
    Time-sharing: kernels time-share the same SMs (benefiting from hardware support for context switches)
    Note: resource conflicts may prevent this
    Molding: change the kernel configuration (different number of thread blocks / threads per block) to improve collocation

  • 2. Evaluation: The gains

  • 2. Evaluation: The overheads

  • Discussion:
    Limitations
    Hardware support

  • OpenCL vs. CUDA:
    http://ft.ornl.gov/doku/shoc/level1
    http://ft.ornl.gov/pubs-archive/shoc.pdf
