Recognizing Potential Parallelism Introduction to Parallel Programming Part 1


Recognizing Potential Parallelism

Introduction to Parallel Programming

Part 1

This course module is intended for single-user and academic use only.

Single users may utilize these course modules for personal use and individual training.

Individuals or institutions may use these modules, in whole or in part, in an academic environment, provided that they are members of the Intel Academic Community (http://software.intel.com/en-us/academic) and abide by its terms and conditions.

DISCLAIMER AND LEGAL INFORMATION

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL’S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL® PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. INTEL PRODUCTS ARE NOT INTENDED FOR USE IN MEDICAL, LIFE SAVING, OR LIFE SUSTAINING APPLICATIONS.

Intel may make changes to specifications and product descriptions at any time, without notice.

All products, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice.

Intel, processors, chipsets, and desktop boards may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests.  Any difference in system hardware or software design or configuration may affect actual performance.

Intel, Intel Inside, and the Intel logo are trademarks of Intel Corporation in the United States and other countries. 

*Other names and brands may be claimed as the property of others.

Copyright © 2008  Intel Corporation.


What Is Parallel Computing?

An attempt to speed up the solution of a particular task by:

1. Dividing the task into sub-tasks

2. Executing the sub-tasks simultaneously on multiple processors

Successful attempts require both:

1. An understanding of where parallelism can be effective

2. Knowledge of how to design and implement good solutions

Clock Speeds Have Flattened Out

Problems caused by higher speeds:

• Excessive power consumption

• Heat dissipation

• Current leakage

Power consumption critical for mobile devices

Mobile computing platforms are increasingly important:

• Retail laptop sales now exceed desktop sales

• Laptops may be 35% of the PC market in 2007

Multi-core Architectures

Potential performance = CPU speed × number of CPUs

Strategy:

• Limit CPU speed and sophistication

• Put multiple CPUs (“cores”) on a single chip

The potential performance stays the same: 1 CPU at speed 4 gives 4 × 1 = 4, while 2 CPUs at speed 2 give 2 × 2 = 4.

(Figure: bar chart comparing CPU speed and number of CPUs for the two designs.)

Concurrency vs. Parallelism

• Concurrency: two or more threads are in progress at the same time

• Parallelism: two or more threads are executing at the same time

• Multiple cores are needed for parallelism

(Figure: with concurrency on one core, Thread 1 and Thread 2 interleave over time; with parallelism on two cores, Thread 1 and Thread 2 execute simultaneously.)
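The distinction can be seen in a minimal sketch (assuming CPython, where threads are concurrent but the GIL prevents CPU-bound Python code from executing in parallel; the names `worker` and `log` are illustrative, not from the slides):

```python
import threading

# Two threads "in progress at the same time" (concurrency). Whether they
# *execute* at the same time (parallelism) depends on having multiple cores;
# in CPython, the GIL also serializes CPU-bound Python bytecode.
log = []

def worker(name, n):
    for i in range(n):
        log.append((name, i))  # appends from the two threads may interleave

t1 = threading.Thread(target=worker, args=("thread-1", 3))
t2 = threading.Thread(target=worker, args=("thread-2", 3))
t1.start(); t2.start()
t1.join(); t2.join()

print(sorted(log))  # all six entries arrive, in some interleaved order
```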

Improving Performance

Use parallelism in order to improve turnaround or throughput

Examples

• Automobile assembly line: each worker performs an assigned function

• Searching for pieces of Skylab: divide up the area to be searched

• US Postal Service: post office branches, mail sorters, delivery

Turnaround

Complete a single task in the smallest amount of time

Example: Setting a dinner table

• One to put down plates

• One to fold and place napkins

• One to place utensils

• One to place glasses

Throughput

Complete more tasks in a fixed amount of time

Example: Setting up banquet tables

• Multiple waiters each do separate tables

• Specialized waiters for plates, glasses, utensils, etc.

Methodology

Study problem, sequential program, or code segment

Look for opportunities for parallelism

Try to keep all processors busy doing useful work

Ways of Exploiting Parallelism

Domain decomposition

Task decomposition

Domain Decomposition

First, decide how data elements should be divided among processors

Second, decide which tasks each processor should be doing

Example: Vector addition
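The vector-addition example can be sketched as follows. This is an illustrative sketch, not the slides' code: the slides name no language, and Python's `ThreadPoolExecutor` stands in for the cores (with CPython's GIL, a process-based pool would be needed for true CPU parallelism).

```python
from concurrent.futures import ThreadPoolExecutor

# Domain decomposition: first divide the index range among workers,
# then have every worker do the same task (c[i] = a[i] + b[i]) on its chunk.
def add_chunk(a, b, c, lo, hi):
    for i in range(lo, hi):
        c[i] = a[i] + b[i]

def parallel_vector_add(a, b, workers=4):
    n = len(a)
    c = [0] * n
    # chunk boundaries: worker w handles indices [w*n/workers, (w+1)*n/workers)
    bounds = [(w * n // workers, (w + 1) * n // workers) for w in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for lo, hi in bounds:
            pool.submit(add_chunk, a, b, c, lo, hi)
    return c  # leaving the with-block waits for all chunks to finish

a = list(range(8))       # [0, 1, ..., 7]
b = list(range(8, 16))   # [8, 9, ..., 15]
print(parallel_vector_add(a, b))  # → [8, 10, 12, 14, 16, 18, 20, 22]
```

The chunks write to disjoint slices of `c`, so no synchronization between workers is needed.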

Domain Decomposition

Large data sets whose elements can be computed independently

• Divide data and associated computation among threads

Example: Grading test papers

• Multiple graders, each with the same key

What if different keys are needed?

Domain Decomposition

Find the largest element of an array

Core 0 Core 1 Core 2 Core 3

(Animation, repeated over several slides: the array is divided into four chunks, Core 0 through Core 3 each scan their own chunk for a local maximum, and the local maxima are then compared to find the overall largest element.)
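The find-the-largest-element animation can be sketched in code. As with the earlier sketch, `ThreadPoolExecutor` is an assumed stand-in for the four cores:

```python
from concurrent.futures import ThreadPoolExecutor

# Domain decomposition: split the array into one chunk per worker, let each
# worker find a local maximum, then reduce the local maxima to a global one.
def parallel_max(data, workers=4):
    n = len(data)
    chunks = [data[w * n // workers:(w + 1) * n // workers]
              for w in range(workers)]
    chunks = [c for c in chunks if c]  # guard against empty chunks
    with ThreadPoolExecutor(max_workers=workers) as pool:
        local_maxima = list(pool.map(max, chunks))  # parallel phase
    return max(local_maxima)                        # sequential combine step

print(parallel_max([3, 41, 7, 19, 56, 2, 30, 8]))  # → 56
```

The final comparison of the four local maxima is a small sequential step; its cost is negligible next to scanning the chunks.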

Task (Functional) Decomposition

First, divide tasks among processors

Second, decide which data elements are going to be accessed (read and/or written) by which processors

Example: Event-handler for GUI

Task Decomposition

Divide computation based on natural set of independent tasks

• Assign data for each task as needed

Example: Paint-by-numbers

• Painting a single color is a single task

• Number of tasks = number of colors

• Two artists: one does the even-numbered colors, the other the odd

(Figure: a paint-by-numbers picture whose regions are numbered 1 through 11, one number per color.)

Task Decomposition

f()  g()  h()  q()  r()  s()

(Animation, repeated over several slides: a dependence graph of the functions f(), g(), h(), q(), r(), and s(), whose tasks are assigned to Core 0, Core 1, and Core 2.)
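A sketch of task decomposition with these function names. The dependence structure below (f() and g() independent, s() consuming both results) is assumed for illustration only; the slides' actual graph is not recoverable from the transcript.

```python
from concurrent.futures import ThreadPoolExecutor

# Task decomposition: the *functions* are divided among workers, and each
# task pulls in the data it needs. The bodies here are placeholders.
def f():
    return sum(range(1000))   # one independent task

def g():
    return max(range(1000))   # another independent task

def s(x, y):
    return x + y              # depends on f() and g(), so it must run last

with ThreadPoolExecutor(max_workers=2) as pool:
    fut_f = pool.submit(f)    # f() and g() may run on different cores
    fut_g = pool.submit(g)
    result = s(fut_f.result(), fut_g.result())  # .result() waits as needed

print(result)  # → 499500 + 999 = 500499
```

Independent tasks run concurrently; a task with dependences waits on the futures of its predecessors.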

Recognizing Sequential Processes

Time is inherently sequential

• Dynamics and real-time, event-driven applications are often difficult to parallelize effectively

• Many games fall into this category

Iterative processes

• The results of an iteration depend on the preceding iteration

• Audio encoders fall into this category

Pregnancy is inherently sequential

• Adding more people will not shorten gestation
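An iterative process with this kind of loop-carried dependence can be made concrete (the example, Newton's iteration for a square root, is mine, not the slides'):

```python
# Each iteration reads the result of the preceding iteration, so the
# iterations form a chain and cannot be executed in parallel.
def iterate(x0, steps):
    x = x0
    for _ in range(steps):
        x = 0.5 * (x + 2.0 / x)  # Newton step toward sqrt(2); needs previous x
    return x

# Successive values converge toward sqrt(2); no two steps are independent.
print(iterate(1.0, 20))
```

Contrast this with the domain-decomposition examples, where the chunks are independent: here the dependence on the previous value is what makes the loop sequential.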

Summary

Clock speeds will not increase dramatically

Parallelism is needed to take full advantage of multi-core processors

• Improve application turnaround or throughput

Two methods for implementing parallelism

• Domain Decomposition

• Task Decomposition