Recognizing Potential Parallelism
Introduction to Parallel Programming, Part 1
TRANSCRIPT
DISCLAIMER AND LEGAL INFORMATION
This course module is intended for single-user and academic use only.
Single users may utilize these course modules for personal use and individual training.
Individuals or institutions may use these modules in whole or in part in an academic environment, provided that they are members of the Intel Academic Community (http://software.intel.com/en-us/academic) and abide by its terms and conditions.
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL’S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL® PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. INTEL PRODUCTS ARE NOT INTENDED FOR USE IN MEDICAL, LIFE SAVING, OR LIFE SUSTAINING APPLICATIONS.
Intel may make changes to specifications and product descriptions at any time, without notice.
All products, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice.
Intel, processors, chipsets, and desktop boards may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized errata are available on request.
Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance.
Intel, Intel Inside, and the Intel logo are trademarks of Intel Corporation in the United States and other countries.
*Other names and brands may be claimed as the property of others.
Copyright © 2008 Intel Corporation.
What Is Parallel Computing?
Attempt to speed solution of a particular task by
1. Dividing task into sub-tasks
2. Executing sub-tasks simultaneously on multiple processors
Successful attempts require both
1. Understanding of where parallelism can be effective
2. Knowledge of how to design and implement good solutions
Clock Speeds Have Flattened Out
Problems caused by higher speeds:
• Excessive power consumption
• Heat dissipation
• Current leakage
Power consumption critical for mobile devices
Mobile computing platforms increasingly important
• Retail laptop sales now exceed desktop sales
• Laptops may be 35% of PC market in 2007
Multi-core Architectures
Potential performance = CPU speed × number of CPUs
Strategy:
• Limit CPU speed and sophistication
• Put multiple CPUs (“cores”) on a single chip
[Figure: potential performance as Speed × CPUs. One CPU at speed 4 and two CPUs at speed 2 both yield potential performance 4 — the potential performance is the same.]
Concurrency vs. Parallelism
• Concurrency: two or more threads are in progress at the same time
• Parallelism: two or more threads are executing at the same time
  • Multiple cores needed
[Figure: under concurrency, Thread 1 and Thread 2 interleave on a single core; under parallelism, Thread 1 and Thread 2 execute simultaneously on separate cores.]
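The distinction above can be sketched in code. This example is not from the slides; it uses Python's standard `concurrent.futures` module, where threads give concurrency (both tasks are in progress together) while, for CPU-bound work in CPython, true parallelism would additionally require separate cores and processes because of the global interpreter lock.

```python
# Sketch (assumption: Python chosen for illustration; the slides show no code).
# Two tasks submitted to a thread pool are "in progress at the same time"
# (concurrency) even on a single core; whether they also EXECUTE at the
# same time (parallelism) depends on having multiple cores available.
from concurrent.futures import ThreadPoolExecutor

def count_down(n):
    # A small CPU-bound stand-in task.
    while n > 0:
        n -= 1
    return "done"

with ThreadPoolExecutor(max_workers=2) as pool:
    # Both futures exist and run concurrently before either result is read.
    futures = [pool.submit(count_down, 100_000),
               pool.submit(count_down, 100_000)]
    results = [f.result() for f in futures]
print(results)  # ['done', 'done']
```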
Improving Performance
Use parallelism in order to improve turnaround or throughput
Examples
• Automobile assembly line
  • Each worker does an assigned function
• Searching for pieces of Skylab
  • Divide up the area to be searched
• US Postal Service
  • Post office branches, mail sorters, delivery
Turnaround
Complete a single task in the smallest amount of time
Example: Setting a dinner table
• One to put down plates
• One to fold and place napkins
• One to place utensils
• One to place glasses
Throughput
Complete more tasks in a fixed amount of time
Example: Setting up banquet tables
• Multiple waiters each do separate tables
• Specialized waiters for plates, glasses, utensils, etc.
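The turnaround/throughput distinction can be made concrete with a small sketch (not from the slides; Python and the list-summing tasks are illustrative assumptions): for turnaround, all workers cooperate on one task so it finishes sooner; for throughput, each worker takes a whole task so more tasks finish per unit time.

```python
# Sketch: the same worker pool used two ways (hypothetical example).
from concurrent.futures import ThreadPoolExecutor

def turnaround(task, workers=4):
    # One task (a list to sum) is split into chunks, one per worker,
    # to minimize the time until THIS task completes.
    step = (len(task) + workers - 1) // workers
    chunks = [task[i:i + step] for i in range(0, len(task), step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum, chunks))

def throughput(tasks, workers=4):
    # Each worker sums an entire task, so MANY tasks complete
    # in a fixed amount of time.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(sum, tasks))

print(turnaround(list(range(8))))            # 28
print(throughput([[1, 2], [3, 4], [5, 6]]))  # [3, 7, 11]
```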
Methodology
Study problem, sequential program, or code segment
Look for opportunities for parallelism
Try to keep all processors busy doing useful work
Domain Decomposition
First, decide how data elements should be divided among processors
Second, decide which tasks each processor should be doing
Example: Vector addition
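The vector-addition example above can be sketched as follows (not from the slides; Python threads stand in for processors): the data are divided first, then every worker performs the same computation on its own slice.

```python
# Sketch of domain decomposition (illustrative; slides contain no code).
from concurrent.futures import ThreadPoolExecutor

def vector_add(a, b, workers=2):
    n = len(a)
    c = [0] * n
    # Step 1: divide the data -- each worker owns a contiguous slice.
    bounds = [(w * n // workers, (w + 1) * n // workers)
              for w in range(workers)]

    # Step 2: every worker runs the SAME task on its own slice;
    # elements are independent, so no coordination is needed.
    def add_slice(lo_hi):
        lo, hi = lo_hi
        for i in range(lo, hi):
            c[i] = a[i] + b[i]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(add_slice, bounds))
    return c

print(vector_add([1, 2, 3, 4], [10, 20, 30, 40]))  # [11, 22, 33, 44]
```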
Domain Decomposition
Large data sets whose elements can be computed independently
• Divide data and associated computation among threads
Example: Grading test papers
• Multiple graders with the same key
What if different keys are needed?
Task (Functional) Decomposition
First, divide tasks among processors
Second, decide which data elements are going to be accessed (read and/or written) by which processors
Example: Event-handler for GUI
Task Decomposition
Divide computation based on natural set of independent tasks
• Assign data for each task as needed
Example: Paint-by-Numbers
• Painting a single color is a single task
• Number of tasks = number of colors
• Two artists: one does even-numbered colors, the other odd
[Figure: a paint-by-numbers picture whose regions are labeled with color numbers 1 through 11.]
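The paint-by-numbers slide can be sketched in code (an illustrative assumption, not from the slides — the tiny `canvas` data and helper names are hypothetical): the computation is divided by task (one color = one task), each task then touches whatever data it needs, and the two-artist even/odd split is simply an assignment of tasks to workers.

```python
# Sketch of task decomposition (hypothetical canvas data).
from concurrent.futures import ThreadPoolExecutor

# Cell -> color number for a tiny "canvas".
canvas = {(0, 0): 1, (0, 1): 2, (1, 0): 2, (1, 1): 3}
painted = {}

def paint_color(color):
    # One task: paint every cell that carries this color number.
    # The task decides which data it accesses, not the other way round.
    for cell, c in canvas.items():
        if c == color:
            painted[cell] = color

colors = sorted(set(canvas.values()))
evens = [c for c in colors if c % 2 == 0]  # artist 1's tasks
odds = [c for c in colors if c % 2 == 1]   # artist 2's tasks

with ThreadPoolExecutor(max_workers=2) as pool:
    pool.submit(lambda: [paint_color(c) for c in evens])
    pool.submit(lambda: [paint_color(c) for c in odds])
# Each cell was written by exactly one task, so the result is complete.
print(painted == canvas)  # True
```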
Recognizing Sequential Processes
Time is inherently sequential
• Dynamics and real-time, event-driven applications are often difficult to parallelize effectively
• Many games fall into this category
Iterative processes
• The results of an iteration depend on the preceding iteration
• Audio encoders fall into this category
Pregnancy is inherently sequential
• Adding more people will not shorten gestation
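The iterative-process case can be shown with a minimal sketch (not from the slides; the logistic map is an arbitrary stand-in for any iteration-dependent computation): because step k consumes the result of step k−1, the steps form a chain that no number of cores can shorten.

```python
# Sketch: an iterative process that resists parallelism.
def iterate(x, steps):
    for _ in range(steps):
        # Each iteration needs the PREVIOUS x -- a strict dependence
        # chain, so the iterations cannot run simultaneously.
        x = 3.5 * x * (1.0 - x)
    return x

# Neither domain nor task decomposition applies: there is only one
# data item, and each "task" must wait for the one before it.
print(iterate(0.5, 2))  # 0.3828125
```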
Summary
Clock speeds will not increase dramatically
Parallelism takes full advantage of multi-core processors
• Improve application turnaround or throughput
Two methods for implementing parallelism
• Domain Decomposition
• Task Decomposition