Software Support for Advanced Computing Platforms
Ananth Grama
Professor, Computer Sciences and Coordinated Systems Lab., Purdue University
[email protected]
http://www.cs.purdue.edu/pdsl
Building Applications for Next Generation Computing Platforms
• Emerging trends point to two disruptive technologies:
– Architecture innovations from the desktop to scalable systems
– Embedded intelligence and ubiquitous processing
• How do we program these platforms efficiently?
Very little of what we have learned over three decades of parallel programming directly applies here.
Evolution of Microprocessor Architectures
• Chip-Multiprocessor Architectures
• Scalable Multicore Platforms
• Heterogeneous Multicore Processors
• Transactional Memory
Multicore Architectures -- An Overview
• The Myth:
– Multicore processors are designed for speed.
• The Reality:
Multicore processors are motivated by power considerations:
– Power is proportional to clock speed
– Power is quadratic in Vdd
– Vdd can be reduced as clock speed is reduced
– Computation speed is generally sublinear in clock speed
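The power argument above can be sketched numerically. This is a rough model, not a measurement: it assumes dynamic power scales as Vdd² times clock speed, that Vdd can be lowered in proportion to clock speed, and that the workload is perfectly parallel. The function names are illustrative only.

```python
def relative_dynamic_power(freq_scale: float, cores: int = 1) -> float:
    """Dynamic power relative to one core running at full clock speed.

    Assumes power is proportional to Vdd^2 * f and that Vdd tracks f
    (an idealization: voltage floors and leakage are ignored).
    """
    vdd_scale = freq_scale  # assumption: Vdd is reduced in step with clock speed
    return cores * (vdd_scale ** 2) * freq_scale


def relative_throughput(freq_scale: float, cores: int = 1) -> float:
    """Best-case throughput: perfectly parallel work, linear in clock speed."""
    return cores * freq_scale


# One core at full speed versus two cores at half speed.
print(relative_dynamic_power(1.0, cores=1), relative_throughput(1.0, cores=1))
print(relative_dynamic_power(0.5, cores=2), relative_throughput(0.5, cores=2))
```

Under these assumptions, two half-speed cores match the nominal throughput of one full-speed core at a quarter of the dynamic power. Since computation speed is generally sublinear in clock speed, the real trade-off can be even more favorable to the slower cores.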
Multicore Architectures -- An Overview
• Collocate multiple processor cores on a single chip (a special class of chip-multiprocessors)
• Programming model is typically thread-based
• Many microprocessors are hardware compatible with existing motherboards (memory performance?)
• Memory systems vary widely across various vendors (AMD vs. Intel vs. IBM PowerPC/Cell)
Multicore Architectures -- Trends
• Current generation typically at dual- or quad-core
• Desktop and mobile dual-core variants are available
• Scalable multicore: AMD and Intel both plan up to 16 cores in the next two years and up to 64 cores in the medium term.
• Heterogeneous multicore: some of the most commonly used processors today are heterogeneous multicore (network routers, ARM/TI DSPs in cell phones).
Memory System Architecture
• Trading off latency and bandwidth (the Cell solution)
• Programmable caches
• Transactional Memory
Transactional Memory Overview
• Addresses problems of correctness of parallel programs as well as performance.
• Requires hardware support.
• Mitigates many of the problems associated with locks – composability, granularity, mixing correctness and performance.
Transactional Memory Overview
Thread 1:
begin_transaction
  x = x + 1
  y = y + x
  if (x < 10)
    z = x;
  else
    z = y;
end_transaction
Thread 2:
begin_transaction
  x = x - 1
  y = y - x
  if (x > 10)
    z = x;
  else
    z = y;
end_transaction
Each thread sees either all or none of the other thread's updates.
Basic mechanisms: isolation (conflict detection), versioning (maintain versions), and atomicity (commit or rollback).
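The three mechanisms can be illustrated with a small software emulation. This is a sketch only, not the hardware mechanism the slides refer to: `TVar`, `Transaction`, and `atomically` are hypothetical names, writes are buffered per transaction (versioning), read versions are checked at commit time (conflict detection), and a failed check discards the buffer and retries (rollback).

```python
import threading

_commit_lock = threading.Lock()  # serializes commits only, not transaction bodies


class Conflict(Exception):
    """Raised when a value read by the transaction changed before commit."""


class TVar:
    """A shared variable carrying a version number for conflict detection."""
    def __init__(self, value):
        self.value = value
        self.version = 0


class Transaction:
    def __init__(self):
        self.reads = {}   # TVar -> version seen at first read
        self.writes = {}  # TVar -> buffered new value (not yet visible)

    def read(self, tvar):
        if tvar in self.writes:          # see our own buffered update
            return self.writes[tvar]
        with _commit_lock:
            value, version = tvar.value, tvar.version
        self.reads.setdefault(tvar, version)
        return value

    def write(self, tvar, value):
        self.writes[tvar] = value

    def commit(self):
        with _commit_lock:
            for tvar, seen in self.reads.items():
                if tvar.version != seen:  # someone committed in between
                    raise Conflict()
            for tvar, value in self.writes.items():
                tvar.value = value
                tvar.version += 1


def atomically(body):
    """Run body(tx) until it commits; a conflict drops the buffer and retries."""
    while True:
        tx = Transaction()
        try:
            body(tx)
            tx.commit()
            return
        except Conflict:
            continue


x, y, z = TVar(0), TVar(0), TVar(0)

def thread1(tx):
    tx.write(x, tx.read(x) + 1)
    tx.write(y, tx.read(y) + tx.read(x))
    tx.write(z, tx.read(x) if tx.read(x) < 10 else tx.read(y))

def thread2(tx):
    tx.write(x, tx.read(x) - 1)
    tx.write(y, tx.read(y) - tx.read(x))
    tx.write(z, tx.read(x) if tx.read(x) > 10 else tx.read(y))

workers = [threading.Thread(target=atomically, args=(body,))
           for body in (thread1, thread2)]
for w in workers:
    w.start()
for w in workers:
    w.join()
final = (x.value, y.value, z.value)
print(final)
```

Starting from x = y = z = 0, the final state matches one of the two serial orders: (0, 1, 1) if Thread 1 commits first, or (0, 1, 0) if Thread 2 does. No interleaving of partial updates is observable.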
Implications for Application Development and Performance
• Fundamental changes in the entire application stack
– Programming paradigms (models of concurrency)
– Software support (compilers, OS)
– Library support (application kernels)
– Runtime systems and performance monitoring (performance bottlenecks and alleviation)
– Analysis techniques (scaling to the extreme)
Ongoing work at Purdue / collaborators – A Birds-eye View
(Collaborators: Intel -- Compilers, Libraries, UMN -- Analysis Techniques, EPFL -- Programming Paradigms)
Programming Models: What are appropriate concurrency abstractions?
– When is communication good?
– How do we deal with the spectrum of coherence models seamlessly?
– How do we use transactions in real programs? (I/O and networks are not transactional.)
Programming Models: The Mediera Environment
– Define domains of identical coherence models.
– Build slack into concurrency.
– View other cores as intelligent caches.
– Use an LRU-type strategy to swap out threads across cores.
– Support for algorithmic asynchrony.
A number of important issues need to be resolved relating to mixed models -- messaging overhead associated with swapped out threads, resource bounds, livelock, priority inversion.
Library Support
• Building optimized multicore libraries for important computational kernels (sparse algebra, quantum scale – MD methods) / Intel MKE.
• Novel algorithms for memory-constrained platforms (excess FLOPS, instead of excess memory accesses).
• Demonstrated application performance (model reduction, nano-scale modeling).
• Comprehensive benchmarking of platforms (DARPA/HPCS pilot study) with a view to identifying performance bottlenecks and desirable application characteristics.
Analysis Techniques
How do we analyze programs over a large number of cores?
• Isoefficiency metric
– Scaling problem size with number of cores to maintain performance.
• Memory constrained scaling
– Quantifying drop in performance with increase in number of cores while operating at peak memory
• Impact of limited bandwidth
– Increasing number of cores implies lower bandwidth at each core
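The isoefficiency idea can be made concrete with a textbook-style cost model. The model here is an assumption chosen for illustration, not taken from the slides: adding n numbers on p cores takes n/p + 2·log2(p) time units, so the total overhead is 2·p·log2(p). Growing n in proportion to that overhead keeps efficiency constant, while a fixed n lets it decay.

```python
import math


def efficiency(n: int, p: int) -> float:
    """Parallel efficiency of adding n numbers on p >= 2 cores.

    Assumed cost model: serial time n, parallel time n/p + 2*log2(p),
    hence total overhead 2*p*log2(p) and efficiency n / (n + overhead).
    """
    overhead = 2 * p * math.log2(p)
    return n / (n + overhead)


def isoefficient_size(p: int, k: float = 8.0) -> int:
    """Problem size grown as k * p * log2(p) so that it tracks the overhead."""
    return round(k * p * math.log2(p))


for p in (4, 16, 64):
    fixed = efficiency(64, p)                      # problem size held at n = 64
    scaled = efficiency(isoefficient_size(p), p)   # problem size scaled with p
    print(p, round(fixed, 3), round(scaled, 3))
```

With k = 8 the scaled efficiency stays at 0.8 for every core count, whereas the fixed-size efficiency drops from 0.8 at 4 cores to under 0.1 at 64 cores.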
Technical Objective
To develop the next generation software environment for scalable chip-multiprocessor systems, along with library support and validating applications.
Software Environments for Embedded Systems
Setting of calibration tests
Programming Scalable Systems
• The traditional approach to distributed programming involves writing “network-enabled” programs for each node
– The program encodes distributed system behavior using complex messaging between nodes
– This paradigm raises several issues and limitations:
  • Program development is time consuming
  • Programs are error prone and difficult to debug
  • Lack of a distributed behavior specification, which precludes verification
  • Limitations with respect to scalability, heterogeneity and performance
Programming Scalable Systems
• Macroprogramming entails direct specification of the distributed system behavior in contrast to programming individual nodes
• Provides:
– Seamless support for heterogeneity
  • Uniform programming platform
  • Node capability-aware abstractions
  • Performance scaling
– Separating the application from system-level details
– Scalability and adaptability with network & load dynamics
– Validation of behavioral specification
Technical Objective
To develop a second generation operating system suite that facilitates rapid macroprogramming of efficient self-organized distributed applications for scalable embedded systems
Ongoing Work: The CosmOS System Suite for Embedded Environments
• CosmOS Components:
– Programming model, compilation techniques
– Device independent node operating system interfaces and implementations
– Network operating system
CosmOS Programming Model
• Macroprogram consists of:
– Distributed system behavioral specification
– Constraints associated with mapping behavioral specification to physical system
• Behavioral Specification
– Functional Components (FCs)
  • Represents a specific data processing function
  • Typed input and output interface
– Interaction Assignment (IA)
  • Directed graph that specifies data flow through FCs
  • Data source and sinks are (logical) device ports
CosmOS Program Validation
• Statically type-checked interaction assignment
– The output of a component can be connected to the input of another only if their types match
• Functional components represent a deterministic data processing function
– The output sequence depends only on the inputs to the FC
• Correctness
– Given input at each source in the IA, the outputs at sinks are deterministically known
CosmOS Functional Components
• Elementary unit of execution
– Isolated from the state of the system and other FCs
– Uses only stack variables and statically assigned state memory
– Asynchronous execution: data flow and control flow handled by CosmOS
• Static memory
– Prevents non-deterministic behavior due to malloc failures
– Leads to a lean memory management system in the OS
• Reusable components
– The only interaction is via typed interfaces
• Dynamically loadable components
– Runtime updates possible
[Figure: an Average FC with raw_t and avg_t inputs and an avg_t output]
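A functional component of this kind might look roughly as follows. This is a hypothetical sketch, not the CosmOS FC interface: the class name, the `handle` method, and the windowing behavior are assumptions, and the avg_t feedback input is left out for brevity. The point is that all state is fixed in size and set up once, and the output depends only on the input sequence.

```python
class AverageFC:
    """Illustrative averaging component with fixed, preallocated state."""

    IN_TYPES = ("raw_t",)    # the avg_t input is omitted in this sketch
    OUT_TYPES = ("avg_t",)

    def __init__(self, window: int):
        self.window = window  # number of samples per emitted average
        self.total = 0        # state memory: two counters, never resized
        self.count = 0

    def handle(self, sample: int):
        """Consume one raw_t sample; return an avg_t every `window` samples, else None."""
        self.total += sample
        self.count += 1
        if self.count < self.window:
            return None
        average = self.total / self.window
        self.total = 0
        self.count = 0
        return average


fc = AverageFC(window=3)
outputs = [fc.handle(v) for v in [3, 6, 9, 1, 2, 3]]
print(outputs)
```

Because the component touches nothing outside its own counters, two instances fed the same sequence always produce the same outputs, which is the determinism property the validation argument relies on.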
CosmOS Program Specification
• Sections:
– Enumerations
– Declarations
– Mapping constraints
– IA Description
CosmOS Program: An Example
%photo : device = PHOTO_SENSOR, out [ raw_t ];
%fs : device = FILE_DUMP, in [ * ];
%avg : { fcid = FCID_AVG, in [ raw_t, avg_t ], out [ avg_t ] };
%thresh : { fcid = FCID_THRESH, in [ raw_t ], out [ raw_t ] };
@ snode = CAP_PHOTO_SENSOR : photo, thresh;
@ fast_m = CAP_FAST_CPU : avg;
@ server = CAP_FS | CAP_UNIQUE_SERVER : avg, fs;
start_ia
timer(100) photo(1);
photo(1) thresh(2,0,500);
thresh(2,0) avg(3,0,10), avg(4,0,100);
avg(3,0) fs(5) | avg(3,1);
avg(4,0) fs(6) | avg(4,1);
end_ia
[Figure: data-flow graph of the example, with nodes T(t), P(), Threshold(500), Average(10), Average(100) and FS, and edges labeled raw_t and avg_t]
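The static type check on the interaction assignment can be sketched against the declarations in the example. Several things here are assumptions: each IA statement is read as connecting the output on its left to the inputs on its right, the second number in a call such as avg(3,1) is taken to be a port index, `*` is treated as accepting any type, and the built-in timer is skipped because the program does not declare it. `DECLS`, `EDGES`, and `type_errors` are illustrative names, not part of CosmOS.

```python
# Input and output types copied from the declarations in the example program.
DECLS = {
    "photo":  {"in": [],                 "out": ["raw_t"]},
    "fs":     {"in": ["*"],              "out": []},
    "avg":    {"in": ["raw_t", "avg_t"], "out": ["avg_t"]},
    "thresh": {"in": ["raw_t"],          "out": ["raw_t"]},
}

# (source, output port, destination, input port), under the reading stated above.
EDGES = [
    ("photo", 0, "thresh", 0),
    ("thresh", 0, "avg", 0),
    ("avg", 0, "fs", 0),
    ("avg", 0, "avg", 1),   # feedback of the average into the second input
]


def type_errors(decls, edges):
    """Return a description of every edge whose output and input types differ."""
    errors = []
    for src, out_port, dst, in_port in edges:
        out_type = decls[src]["out"][out_port]
        in_type = decls[dst]["in"][in_port]
        if in_type != "*" and in_type != out_type:
            errors.append(
                f"{src}.out[{out_port}] ({out_type}) -> "
                f"{dst}.in[{in_port}] ({in_type})"
            )
    return errors


print(type_errors(DECLS, EDGES))                      # well-typed graph
print(type_errors(DECLS, [("avg", 0, "thresh", 0)]))  # avg_t into a raw_t input
```

The example graph passes the check, while wiring an avg_t output into the raw_t input of thresh is reported as a mismatch before anything is deployed.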
CosmOS: Runtime System
[Figure: the example data-flow graph in the runtime system, with nodes T(t), P(), Threshold(500), Average(10), Average(100) and FS, and edges labeled raw_t and avg_t]
CosmOS: Runtime System
• Provides a low-footprint execution environment for CosmOS programs
• Key components
– Data flow and control flow
– Locking and concurrency
– Load conditioning
– Routing primitives
CosmOS Node Operating System
[Figure: node operating system layers. Updateable user space: App FCs and Services. Static OS kernel: Platform Independent Kernel, HW Drivers, Hardware Abstraction Layer.]
CosmOS: Current Status
• Fully functional implementations for Mica2 and POSIX (on Linux)
• Mica2:
– Non-preemptive function pointer scheduler
– Dynamic memory management
• POSIX:
– Multi-threading using POSIX threads and underlying scheduler
– The OS exists as library calls and a single management thread
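A non-preemptive function pointer scheduler of the kind listed for Mica2 can be sketched as a queue of callables that each run to completion. The slides do not describe the actual CosmOS scheduler, so this only illustrates the general technique; `RunToCompletionScheduler`, `post`, and `run` are hypothetical names.

```python
from collections import deque


class RunToCompletionScheduler:
    """FIFO queue of (function, arguments); a task is never interrupted."""

    def __init__(self):
        self._queue = deque()

    def post(self, func, *args):
        """Enqueue a task; it runs only after earlier tasks have finished."""
        self._queue.append((func, args))

    def run(self):
        """Execute queued tasks in order until the queue is empty."""
        while self._queue:
            func, args = self._queue.popleft()
            func(*args)  # may post follow-up tasks, which go to the back


log = []
sched = RunToCompletionScheduler()

def sample(n):
    log.append(f"sample {n}")
    if n < 2:
        sched.post(sample, n + 1)  # reschedule itself instead of looping

def report():
    log.append("report")

sched.post(sample, 0)
sched.post(report)
sched.run()
print(log)
```

Since no task is preempted, tasks need no locks among themselves, at the cost that a long-running task delays everything queued behind it. That is why the sampling task above reposts itself rather than looping.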
CosmOS: Current Status
• Comprehensively evaluated and validated
• Alpha releases can be freely downloaded from:
http://www.cs.purdue.edu/~awan/cosmos/
CosmOS Validation
[Figure: deployment diagram. Labels: ECN Net, Internet, 802.11b Peer-to-Peer, FM 433MHz, laser attached via serial port to Stargate computers, MICA2 motes with ADXL 202]
Currently, laser readings can be viewed from anywhere over the Internet (conditioned on firewall settings).
Pilot deployment at BOWEN labs
CosmOS: Ongoing Work
• Semantics of the CosmOS Programming Model
• GUI for Interaction Assignment
• Library of modules
• Large-scale deployment and scalability studies
• Application-specific optimizations.
Thank you!
For papers and talks on these topics, please visit:
http://www.cs.purdue.edu/pdsl