![Page 1: View-Oriented Parallel Programming for multi-core systems](https://reader031.vdocuments.mx/reader031/viewer/2022032106/56812a47550346895d8d8742/html5/thumbnails/1.jpg)
View-Oriented Parallel Programming for multi-core systems
Dr Zhiyi Huang
Univ of Otago
![Page 2: View-Oriented Parallel Programming for multi-core systems](https://reader031.vdocuments.mx/reader031/viewer/2022032106/56812a47550346895d8d8742/html5/thumbnails/2.jpg)
An age of CMT
• CMT offers us the power of parallel computing
• Harnessing that power relies on good parallel applications and competent parallel programmers
• A sound parallel programming methodology is the key
![Page 3: View-Oriented Parallel Programming for multi-core systems](https://reader031.vdocuments.mx/reader031/viewer/2022032106/56812a47550346895d8d8742/html5/thumbnails/3.jpg)
Two camps
• Message passing vs. shared memory
– Message passing style is complex
– Communication with shared memory is simple and easy, but…
![Page 4: View-Oriented Parallel Programming for multi-core systems](https://reader031.vdocuments.mx/reader031/viewer/2022032106/56812a47550346895d8d8742/html5/thumbnails/4.jpg)
Problems for SM-based PP (1)
• Data race conditions are a pain
– Data race: there are concurrent accesses to the same memory location, and at least one of them is a write access
– Debugging a data race condition is difficult because a parallel execution is normally not repeatable
![Page 5: View-Oriented Parallel Programming for multi-core systems](https://reader031.vdocuments.mx/reader031/viewer/2022032106/56812a47550346895d8d8742/html5/thumbnails/5.jpg)
Problems for SM-based PP (2)
• Deadlock is another pain
– Mutual exclusion primitives such as locks are required to prevent data races, but
– they may result in deadlock, a situation where multiple threads/processes wait for each other while competing for locks
– Mutual exclusion complicates the mental model of parallel programming
![Page 6: View-Oriented Parallel Programming for multi-core systems](https://reader031.vdocuments.mx/reader031/viewer/2022032106/56812a47550346895d8d8742/html5/thumbnails/6.jpg)
Problems for SM-based PP (3)
• Poor portability is yet another pain
– Parallel applications are system dependent
– Mutual exclusion primitives such as locks are not standardized
– Synchronization primitives such as barriers are not standardized
– Shared memory allocation is not standardized
![Page 7: View-Oriented Parallel Programming for multi-core systems](https://reader031.vdocuments.mx/reader031/viewer/2022032106/56812a47550346895d8d8742/html5/thumbnails/7.jpg)
Solutions?
• A parallel programming style with the following features
– Data race free
– Mutual exclusion free
– Deadlock free
– Portable to any system with shared memory
![Page 8: View-Oriented Parallel Programming for multi-core systems](https://reader031.vdocuments.mx/reader031/viewer/2022032106/56812a47550346895d8d8742/html5/thumbnails/8.jpg)
View-Oriented Parallel Programming
![Page 9: View-Oriented Parallel Programming for multi-core systems](https://reader031.vdocuments.mx/reader031/viewer/2022032106/56812a47550346895d8d8742/html5/thumbnails/9.jpg)
What is a view?
• Suppose M is the set of data objects in shared memory
• A view is a group of data objects from the shared memory: $\forall V,\ V \subseteq M$
• Views must not overlap each other: $\forall V_i, V_j,\ i \neq j,\ V_i \cap V_j = \emptyset$
• Suppose there are n views in shared memory
– $\bigcup_{i=1}^{n} V_i = M$
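The two conditions on views (no overlap, and together they cover M) can be checked mechanically. The following is a minimal sketch, assuming the shared memory is modelled as object indices 0..m-1 and each view as a contiguous index range. The names `view_range` and `views_partition` are hypothetical and are not part of VOPP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model: a view is a half-open range [begin, end) of object indices. */
typedef struct {
    size_t begin;
    size_t end;
} view_range;

/* Returns true when the n views are pairwise disjoint and together
 * cover every object index in [0, m). */
bool views_partition(const view_range *views, size_t n, size_t m)
{
    assert(views != NULL || n == 0);

    /* owner_count[k] records whether some view already contains object k. */
    unsigned char *owner_count = calloc(m ? m : 1, 1);
    if (owner_count == NULL)
        return false;

    bool ok = true;
    for (size_t i = 0; i < n && ok; i++) {
        if (views[i].begin > views[i].end || views[i].end > m) {
            ok = false;                 /* view reaches outside M */
            break;
        }
        for (size_t k = views[i].begin; k < views[i].end; k++) {
            if (owner_count[k]++ != 0) {
                ok = false;             /* two views overlap on object k */
                break;
            }
        }
    }
    for (size_t k = 0; k < m && ok; k++) {
        if (owner_count[k] == 0)
            ok = false;                 /* object k belongs to no view */
    }
    free(owner_count);
    return ok;
}
```

For example, the ranges [0, 4) and [4, 10) partition a memory of 10 objects, whereas [0, 5) and [4, 10) overlap on object 4 and are rejected.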
![Page 10: View-Oriented Parallel Programming for multi-core systems](https://reader031.vdocuments.mx/reader031/viewer/2022032106/56812a47550346895d8d8742/html5/thumbnails/10.jpg)
VOPP Requirements
• The programmer should divide the shared data into a number of views according to the data flow of the parallel algorithm.
• A view should consist of data objects that are always processed as an atomic set in a program.
• Views can be created and destroyed anytime.
• Each view has a unique view identifier
![Page 11: View-Oriented Parallel Programming for multi-core systems](https://reader031.vdocuments.mx/reader031/viewer/2022032106/56812a47550346895d8d8742/html5/thumbnails/11.jpg)
VOPP Requirements (cont.)
• View primitives such as acquire_view and release_view must be used when a view is accessed.
    acquire_view(View_A);
    A = A + 1;
    release_view(View_A);
• acquire_Rview and release_Rview can be used when a view is only read by a processor.
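One way to picture these four primitives on a single shared-memory machine is as a thin layer over a reader/writer lock per view. This is only a sketch under that assumption, not how VODCA implements them. The fixed `MAX_VIEWS` bound, the `init_views` helper, and the use of integer view identifiers are choices made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

#define MAX_VIEWS 16    /* assumed upper bound, for illustration only */

/* One reader/writer lock per view identifier. */
static pthread_rwlock_t view_locks[MAX_VIEWS];

/* Must be called once before any view is acquired. */
void init_views(void)
{
    for (int i = 0; i < MAX_VIEWS; i++)
        pthread_rwlock_init(&view_locks[i], NULL);
}

/* Exclusive access: the caller may read and write the view. */
void acquire_view(int id)
{
    assert(id >= 0 && id < MAX_VIEWS);
    pthread_rwlock_wrlock(&view_locks[id]);
}

void release_view(int id)
{
    assert(id >= 0 && id < MAX_VIEWS);
    pthread_rwlock_unlock(&view_locks[id]);
}

/* Shared access: several readers may hold the view at the same time. */
void acquire_Rview(int id)
{
    assert(id >= 0 && id < MAX_VIEWS);
    pthread_rwlock_rdlock(&view_locks[id]);
}

void release_Rview(int id)
{
    assert(id >= 0 && id < MAX_VIEWS);
    pthread_rwlock_unlock(&view_locks[id]);
}
```

With this sketch, an update is written as `acquire_view(0); A = A + 1; release_view(0);` and a read-only section as `acquire_Rview(0); ... release_Rview(0);`, after a single call to `init_views()`.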
![Page 12: View-Oriented Parallel Programming for multi-core systems](https://reader031.vdocuments.mx/reader031/viewer/2022032106/56812a47550346895d8d8742/html5/thumbnails/12.jpg)
VOPP Requirements (cont.)
• When a process/thread accesses multiple views at the same time, only one acquiring primitive is used.
    acquire_3_views(V_A, V_B, V_C);
    C = A + B;
    release_views();
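Because all the views are named in one call, the system is free to choose the order in which it takes them. A well-known way to avoid deadlock in that situation is to always lock in ascending identifier order. The sketch below assumes that technique and one mutex per view; `acquire_views`, `init_view_locks`, `MAX_VIEWS` and `MAX_HELD` are hypothetical names, and the slides do not say that VODCA works this way.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define MAX_VIEWS 16    /* assumed bounds, for illustration only */
#define MAX_HELD  8

static pthread_mutex_t view_mutex[MAX_VIEWS];

/* Views held by the calling thread, kept in ascending order. */
static _Thread_local int held_ids[MAX_HELD];
static _Thread_local size_t held_count;

/* Must be called once before any view is acquired. */
void init_view_locks(void)
{
    for (int i = 0; i < MAX_VIEWS; i++)
        pthread_mutex_init(&view_mutex[i], NULL);
}

static int compare_ids(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Acquires all n views with a single call. Locks are always taken in
 * ascending identifier order, so two threads that ask for overlapping
 * sets can never wait on each other in a cycle. */
void acquire_views(const int *ids, size_t n)
{
    assert(held_count == 0);    /* only one acquiring primitive at a time */
    assert(n <= MAX_HELD);
    for (size_t i = 0; i < n; i++) {
        assert(ids[i] >= 0 && ids[i] < MAX_VIEWS);
        held_ids[i] = ids[i];
    }
    qsort(held_ids, n, sizeof held_ids[0], compare_ids);
    for (size_t i = 0; i < n; i++) {
        assert(i == 0 || held_ids[i] != held_ids[i - 1]);  /* no duplicates */
        pthread_mutex_lock(&view_mutex[held_ids[i]]);
    }
    held_count = n;
}

/* Releases every view taken by the last acquire_views call, in reverse order. */
void release_views(void)
{
    while (held_count > 0) {
        held_count--;
        pthread_mutex_unlock(&view_mutex[held_ids[held_count]]);
    }
}
```

A call such as `acquire_views((int[]){5, 1, 3}, 3)` locks views 1, 3 and 5 in that order regardless of how the caller listed them.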
![Page 13: View-Oriented Parallel Programming for multi-core systems](https://reader031.vdocuments.mx/reader031/viewer/2022032106/56812a47550346895d8d8742/html5/thumbnails/13.jpg)
Example
• A VOPP program for a producer/consumer problem
    if (prod_id == 0) {
        acquire_view(1);
        produce(x);
        release_view(1);
    }
    barrier(0);
    acquire_Rview(1);
    consume(x);
    release_Rview(1);
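The same producer/consumer pattern can be tried out on one machine with POSIX threads. In this sketch a reader/writer lock stands in for view 1 and a `pthread_barrier_t` for `barrier(0)`. The thread count `NPROCS`, the produced value 42, and the function `run_producer_consumer` are assumptions made for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

#define NPROCS 4    /* assumed number of threads */

static pthread_rwlock_t view1 = PTHREAD_RWLOCK_INITIALIZER;  /* stands in for view 1 */
static pthread_barrier_t barrier0;                            /* stands in for barrier(0) */
static int x;                   /* the data object inside view 1 */
static int consumed[NPROCS];    /* what each thread read from x */

static void *worker(void *arg)
{
    int prod_id = *(int *)arg;

    if (prod_id == 0) {
        pthread_rwlock_wrlock(&view1);      /* acquire_view(1) */
        x = 42;                             /* produce(x) */
        pthread_rwlock_unlock(&view1);      /* release_view(1) */
    }
    pthread_barrier_wait(&barrier0);        /* barrier(0) */

    pthread_rwlock_rdlock(&view1);          /* acquire_Rview(1) */
    consumed[prod_id] = x;                  /* consume(x) */
    pthread_rwlock_unlock(&view1);          /* release_Rview(1) */
    return NULL;
}

/* Runs NPROCS threads and returns the sum of the values they consumed,
 * or -1 if a thread could not be created. */
int run_producer_consumer(void)
{
    pthread_t threads[NPROCS];
    int ids[NPROCS];
    int started = 0;

    x = 0;
    pthread_barrier_init(&barrier0, NULL, NPROCS);
    for (int i = 0; i < NPROCS; i++) {
        ids[i] = i;
        if (pthread_create(&threads[i], NULL, worker, &ids[i]) != 0)
            return -1;  /* sketch only: threads already started are not cleaned up */
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);
    pthread_barrier_destroy(&barrier0);

    int sum = 0;
    for (int i = 0; i < NPROCS; i++)
        sum += consumed[i];
    return sum;
}
```

The barrier is what guarantees that every consumer reads the produced value: no thread can reach the read-only section until thread 0 has released the view.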
![Page 14: View-Oriented Parallel Programming for multi-core systems](https://reader031.vdocuments.mx/reader031/viewer/2022032106/56812a47550346895d8d8742/html5/thumbnails/14.jpg)
VOPP features
• No concern about data race conditions
– The programmer is only concerned with views, not mutual exclusion
– Mutual exclusion is implemented by the system, which also detects potential data races by checking view boundaries
• Deadlock free
– Mutual exclusion is implemented by the system and can be implemented data race free and deadlock free
• Portability?
– By standardization of the API
![Page 15: View-Oriented Parallel Programming for multi-core systems](https://reader031.vdocuments.mx/reader031/viewer/2022032106/56812a47550346895d8d8742/html5/thumbnails/15.jpg)
Requirements for the system
• Keep track of view locations
• Be able to check view boundaries
• Guarantee deadlock freedom when implementing mutual exclusion
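The first two requirements amount to keeping a table of where each view lives and testing every access against it. The following is a minimal sketch, assuming a flat table of contiguous views in a single address space. The `view_entry` layout and the `access_allowed` function are hypothetical and only illustrate the idea of a boundary check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry: where a view lives and whether the
 * current process/thread has acquired it. */
typedef struct {
    int id;
    uintptr_t base;     /* start address of the view */
    size_t size;        /* length of the view in bytes */
    bool acquired;
} view_entry;

/* Returns true when the access [addr, addr + len) lies entirely inside
 * one view and that view is currently acquired. An access that starts
 * in one view and runs past its end is rejected. */
bool access_allowed(const view_entry *table, size_t n,
                    const void *addr, size_t len)
{
    assert(table != NULL || n == 0);

    uintptr_t a = (uintptr_t)addr;
    for (size_t i = 0; i < n; i++) {
        uintptr_t base = table[i].base;
        if (a >= base && a - base < table[i].size) {
            /* The access starts inside this view; it must also end inside it. */
            return table[i].acquired && len <= table[i].size - (a - base);
        }
    }
    return false;       /* the address belongs to no view */
}
```

An access to an object in an acquired view passes; an access to a view that has not been acquired, or one that straddles two views, is reported as a potential data race.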
![Page 16: View-Oriented Parallel Programming for multi-core systems](https://reader031.vdocuments.mx/reader031/viewer/2022032106/56812a47550346895d8d8742/html5/thumbnails/16.jpg)
Advantages of VOPP
• Keep the convenience of shared memory programming
• Focus on data partitioning and data access instead of data races and mutual exclusion
– View primitives automatically achieve mutual exclusion
– View primitives are not an extra burden
• The programmer can fine-tune the parallel algorithm by careful view partitioning
![Page 17: View-Oriented Parallel Programming for multi-core systems](https://reader031.vdocuments.mx/reader031/viewer/2022032106/56812a47550346895d8d8742/html5/thumbnails/17.jpg)
Advantages of VOPP (cont.)
• Implementation independent
– View access can be based on mutual exclusion or Transactional Memory (TM)
– TM is a memory system that checks access conflicts
• Programming language independent
– Can be implemented as a user space library
• Performance advantage
– Cache pre-fetching when a view is acquired
– Can keep a view cached as long as it is not acquired by any other threads/processes
![Page 18: View-Oriented Parallel Programming for multi-core systems](https://reader031.vdocuments.mx/reader031/viewer/2022032106/56812a47550346895d8d8742/html5/thumbnails/18.jpg)
Philosophy of VOPP
• Shared memory is a critical resource that needs to be used with care
– If there is no need to use shared memory, don't use it
– A view should be justified before it is created
– Compatible with Throughput Computing, which encourages multiple independent threads running on a chip
![Page 19: View-Oriented Parallel Programming for multi-core systems](https://reader031.vdocuments.mx/reader031/viewer/2022032106/56812a47550346895d8d8742/html5/thumbnails/19.jpg)
VOPP vs. MPI
• Easier for programmers than MPI
– For problems like a task queue, programming with MPI is horrific.
• Can mimic any finely-tuned MPI program
– Shared message → view
– Send/recv → acquire_view
• Essential differences
– Views are location transparent
– More barriers in VOPP
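The mapping from message passing to views can be illustrated by treating one message slot as a view: sending means acquiring the view and writing the payload, receiving means acquiring it and reading the payload out. This is a single-machine sketch under that assumption. `msg_view`, `msg_try_send` and `msg_try_recv` are hypothetical names, and a mutex stands in for acquire_view/release_view.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <string.h>

#define MSG_CAPACITY 64     /* assumed slot size, for illustration only */

/* A one-slot message view. The mutex stands in for acquire_view/release_view. */
typedef struct {
    pthread_mutex_t lock;
    bool full;              /* true while an unread message is in the slot */
    size_t len;
    char data[MSG_CAPACITY];
} msg_view;

void msg_view_init(msg_view *v)
{
    assert(v != NULL);
    pthread_mutex_init(&v->lock, NULL);
    v->full = false;
    v->len = 0;
}

/* "Send": acquire the view and write the message into it.
 * Returns false if the slot is still occupied or the message is too long. */
bool msg_try_send(msg_view *v, const void *buf, size_t len)
{
    if (len > MSG_CAPACITY)
        return false;
    bool sent = false;
    pthread_mutex_lock(&v->lock);           /* acquire_view */
    if (!v->full) {
        memcpy(v->data, buf, len);
        v->len = len;
        v->full = true;
        sent = true;
    }
    pthread_mutex_unlock(&v->lock);         /* release_view */
    return sent;
}

/* "Recv": acquire the view and copy the message out.
 * Returns false if there is no message or the caller's buffer is too small. */
bool msg_try_recv(msg_view *v, void *buf, size_t cap, size_t *len)
{
    bool got = false;
    pthread_mutex_lock(&v->lock);           /* acquire_view */
    if (v->full && v->len <= cap) {
        memcpy(buf, v->data, v->len);
        *len = v->len;
        v->full = false;
        got = true;
    }
    pthread_mutex_unlock(&v->lock);         /* release_view */
    return got;
}
```

Neither side names the other's location, which reflects the location transparency listed above. In this sketch the receiver polls; a VOPP program would typically place a barrier between the send and the receive instead.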
![Page 20: View-Oriented Parallel Programming for multi-core systems](https://reader031.vdocuments.mx/reader031/viewer/2022032106/56812a47550346895d8d8742/html5/thumbnails/20.jpg)
Implementation
• VOPP is supported by our DSM system called VODCA
– DSM: a Distributed Shared Memory system provides a virtual shared memory on multi-computers
– VODCA: View-Oriented, Distributed, Cluster-based Approach to parallel computing
• VODCA version 1.0
– Will be released as open source software
– A library that runs in user space
– Its implementation will be published at DSM06
![Page 21: View-Oriented Parallel Programming for multi-core systems](https://reader031.vdocuments.mx/reader031/viewer/2022032106/56812a47550346895d8d8742/html5/thumbnails/21.jpg)
Experiment
• Use a cluster computer
– The cluster computer, at Tsinghua Univ., consists of 128 Itanium 2 running Linux 2.4, connected by InfiniBand. Each node has two 1.3 GHz processors and 4 Gbytes RAM. We ran two processes on each node.
• We used four applications, Integer Sort (IS), Gauss, Successive Over-Relaxation (SOR), and Neural Network (NN).
![Page 22: View-Oriented Parallel Programming for multi-core systems](https://reader031.vdocuments.mx/reader031/viewer/2022032106/56812a47550346895d8d8742/html5/thumbnails/22.jpg)
Related systems
• TreadMarks (TMK) is a state-of-the-art Distributed Shared Memory system based on traditional parallel programming.
• Message Passing Interface (MPI) is a standard for message passing-based parallel programming. We used LAM/MPI.
![Page 23: View-Oriented Parallel Programming for multi-core systems](https://reader031.vdocuments.mx/reader031/viewer/2022032106/56812a47550346895d8d8742/html5/thumbnails/23.jpg)
Performance of NN
[Chart: VODCA, TMK, and MPI compared at 2-p, 4-p, 8-p, 16-p, and 32-p; vertical axis from 0 to 35]
![Page 24: View-Oriented Parallel Programming for multi-core systems](https://reader031.vdocuments.mx/reader031/viewer/2022032106/56812a47550346895d8d8742/html5/thumbnails/24.jpg)
Performance of IS
[Chart: VODCA, TMK, and MPI compared at 2-p, 4-p, 8-p, 16-p, and 32-p; vertical axis from 0 to 25]
![Page 25: View-Oriented Parallel Programming for multi-core systems](https://reader031.vdocuments.mx/reader031/viewer/2022032106/56812a47550346895d8d8742/html5/thumbnails/25.jpg)
Performance of SOR
[Chart: VODCA, TMK, and MPI compared at 2-p, 4-p, 8-p, 16-p, and 32-p; vertical axis from 0 to 16]
![Page 26: View-Oriented Parallel Programming for multi-core systems](https://reader031.vdocuments.mx/reader031/viewer/2022032106/56812a47550346895d8d8742/html5/thumbnails/26.jpg)
Performance of Gauss
[Chart: VODCA, TMK, and MPI compared at 2-p, 4-p, 8-p, 16-p, and 32-p; vertical axis from 0 to 25]
![Page 27: View-Oriented Parallel Programming for multi-core systems](https://reader031.vdocuments.mx/reader031/viewer/2022032106/56812a47550346895d8d8742/html5/thumbnails/27.jpg)
Future work on VOPP
• API for multi-core systems
• Implementation on Niagara
• More benchmarks/applications, especially telecommunication applications
• Performance evaluation on CMT
• A view-based debugger for VOPP
![Page 28: View-Oriented Parallel Programming for multi-core systems](https://reader031.vdocuments.mx/reader031/viewer/2022032106/56812a47550346895d8d8742/html5/thumbnails/28.jpg)
Questions?