intel confidential implementing a task decomposition introduction to parallel programming – part 9

47
INTEL CONFIDENTIAL Implementing a Task Decomposition Introduction to Parallel Programming – Part 9

Upload: ashton-miner

Post on 31-Mar-2015

218 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: INTEL CONFIDENTIAL Implementing a Task Decomposition Introduction to Parallel Programming – Part 9

INTEL CONFIDENTIAL

Implementing a Task DecompositionIntroduction to Parallel Programming – Part 9

Page 2: INTEL CONFIDENTIAL Implementing a Task Decomposition Introduction to Parallel Programming – Part 9

Copyright © 2009, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. * Other brands and names are the property of their respective owners.

2

Review & Objectives

Previously:Described how the OpenMP task pragma is different from

the for pragmaShowed how to code task decomposition solutions for while

loop and recursive tasks, with the OpenMP task construct

At the end of this part you should be able to:Design and implement a task decomposition solution

Page 3: INTEL CONFIDENTIAL Implementing a Task Decomposition Introduction to Parallel Programming – Part 9

Copyright © 2009, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. * Other brands and names are the property of their respective owners.

Case Study: The N Queens Problem

3

Is there a way to placeN queens on an N-by-Nchessboard such thatno queen threatens

another queen?

Page 4: INTEL CONFIDENTIAL Implementing a Task Decomposition Introduction to Parallel Programming – Part 9

Copyright © 2009, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. * Other brands and names are the property of their respective owners.

A Solution to the 4 Queens Problem

4

Page 5: INTEL CONFIDENTIAL Implementing a Task Decomposition Introduction to Parallel Programming – Part 9

Copyright © 2009, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. * Other brands and names are the property of their respective owners.

Exhaustive Search

5

Page 6: INTEL CONFIDENTIAL Implementing a Task Decomposition Introduction to Parallel Programming – Part 9

Copyright © 2009, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. * Other brands and names are the property of their respective owners.

Design #1 for Parallel Search

Create threads to explore different parts of the search tree simultaneously

If a node has childrenThe thread creates child nodesThe thread explores one child node itselfThread creates a new thread for every other

child node

6

Page 7: INTEL CONFIDENTIAL Implementing a Task Decomposition Introduction to Parallel Programming – Part 9

Copyright © 2009, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. * Other brands and names are the property of their respective owners.

Design #1 for Parallel Search

7

Thread W

Thread W NewThread X

NewThread Y

NewThread Z

Page 8: INTEL CONFIDENTIAL Implementing a Task Decomposition Introduction to Parallel Programming – Part 9

Copyright © 2009, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. * Other brands and names are the property of their respective owners.

Pros and Cons of Design #1

ProsSimple design, easy to implementBalances work among threads

ConsToo many threads createdLifetime of threads too shortOverhead costs too high

8

Page 9: INTEL CONFIDENTIAL Implementing a Task Decomposition Introduction to Parallel Programming – Part 9

Copyright © 2009, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. * Other brands and names are the property of their respective owners.

Design #2 for Parallel Search

One thread created for each subtree rooted at a particular depth

Each thread sequentially explores its subtree

9

Page 10: INTEL CONFIDENTIAL Implementing a Task Decomposition Introduction to Parallel Programming – Part 9

Copyright © 2009, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. * Other brands and names are the property of their respective owners.

Design #2 in Action

10

Thread1

Thread2

Thread3

Page 11: INTEL CONFIDENTIAL Implementing a Task Decomposition Introduction to Parallel Programming – Part 9

Copyright © 2009, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. * Other brands and names are the property of their respective owners.

Pros and Cons of Design #2

ProsThread creation/termination time minimized

ConsSubtree sizes may vary dramaticallySome threads may finish long before othersImbalanced workloads lower efficiency

11

Page 12: INTEL CONFIDENTIAL Implementing a Task Decomposition Introduction to Parallel Programming – Part 9

Copyright © 2009, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. * Other brands and names are the property of their respective owners.

Design #3 for Parallel Search

Main thread creates work pool—list of subtrees to explore

Main thread creates finite number of co-worker threads

Each subtree exploration is done by a single threadInactive threads go to pool to get more work

12

Page 13: INTEL CONFIDENTIAL Implementing a Task Decomposition Introduction to Parallel Programming – Part 9

Copyright © 2009, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. * Other brands and names are the property of their respective owners.

Work Pool Analogy

More rows than workersEach worker takes an

unpicked row and picks the crop

After completing a row, the worker takes another unpicked row

Process continues until all rows have been harvested

13

Page 14: INTEL CONFIDENTIAL Implementing a Task Decomposition Introduction to Parallel Programming – Part 9

Copyright © 2009, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. * Other brands and names are the property of their respective owners.

Design #3 in Action

14

Thread1

Thread2

Thread3

Thread3

Thread1

Page 15: INTEL CONFIDENTIAL Implementing a Task Decomposition Introduction to Parallel Programming – Part 9

Copyright © 2009, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. * Other brands and names are the property of their respective owners.

Pros and Cons of Strategy #3

ProsThread creation/termination time minimizedWorkload balance better than strategy #2

ConsThreads need exclusive access to data

structure containing work to be done, a sequential component

Workload balance worse than strategy #1Conclusion

Good compromise between designs 1 and 2

15

Page 16: INTEL CONFIDENTIAL Implementing a Task Decomposition Introduction to Parallel Programming – Part 9

Copyright © 2009, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. * Other brands and names are the property of their respective owners.

Implementing Strategy #3 for N Queens

Work pool consists of N boards representing N possible placements of queen on first row

16

Page 17: INTEL CONFIDENTIAL Implementing a Task Decomposition Introduction to Parallel Programming – Part 9

Copyright © 2009, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. * Other brands and names are the property of their respective owners.

Parallel Program Design

One thread creates list of partially filled-in boardsFork: Create one thread per coreEach thread repeatedly gets board from list, searches

for solutions, and adds to solution count, until no more board on list

Join: Occurs when list is emptyOne thread prints number of solutions found

17

Page 18: INTEL CONFIDENTIAL Implementing a Task Decomposition Introduction to Parallel Programming – Part 9

Copyright © 2009, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. * Other brands and names are the property of their respective owners.

Search Tree Node Structure

/* The ‘board’ struct contains information about a node in the search tree; i.e., partially filled- in board. The work pool is a singly linked list of ‘board’ structs. */

struct board { int pieces; /* # of queens on board*/ int places[MAX_N]; /* Queen’s pos in each row

*/ struct board *next; /* Next search tree node */};

18

Page 19: INTEL CONFIDENTIAL Implementing a Task Decomposition Introduction to Parallel Programming – Part 9

Copyright © 2009, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. * Other brands and names are the property of their respective owners.

Key Code in main Function

struct board *stack;...stack = NULL;for (i = 0; i < n; i++) { initial=(struct board *)malloc(sizeof(struct board)); initial->pieces = 1; initial->places[0] = i; initial->next = stack; stack = initial;}num_solutions = 0;search_for_solutions (n, stack, &num_solutions);printf ("The %d-queens puzzle has %d solutions\n", n,

num_solutions);

19

Page 20: INTEL CONFIDENTIAL Implementing a Task Decomposition Introduction to Parallel Programming – Part 9

Copyright © 2009, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. * Other brands and names are the property of their respective owners.

Insertion of OpenMP Code

struct board *stack;...stack = NULL;for (i = 0; i < n; i++) { initial=(struct board *)malloc(sizeof(struct board)); initial->pieces = 1; initial->places[0] = i; initial->next = stack; stack = initial;}num_solutions = 0;

#pragma omp parallel search_for_solutions (n, stack, &num_solutions);

printf ("The %d-queens puzzle has %d solutions\n", n, num_solutions);

20

Page 21: INTEL CONFIDENTIAL Implementing a Task Decomposition Introduction to Parallel Programming – Part 9

Copyright © 2009, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. * Other brands and names are the property of their respective owners.

Original C Function to Get Work

void search_for_solutions (int n, struct board *stack, int *num_solutions){ struct board *ptr; void search (int, struct board *, int *);

while (stack != NULL) { ptr = stack; stack = stack->next; search (n, ptr, num_solutions); free (ptr); }}

21

Page 22: INTEL CONFIDENTIAL Implementing a Task Decomposition Introduction to Parallel Programming – Part 9

Copyright © 2009, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. * Other brands and names are the property of their respective owners.

C/OpenMP Function to Get Work

void search_for_solutions (int n, struct board *stack, int *num_solutions){ struct board *ptr; void search (int, struct board *, int *);

while (stack != NULL) {#pragma omp critical{ ptr = stack; stack = stack->next; }

search (n, ptr, num_solutions); free (ptr); }}

22

Page 23: INTEL CONFIDENTIAL Implementing a Task Decomposition Introduction to Parallel Programming – Part 9

Copyright © 2009, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. * Other brands and names are the property of their respective owners.

Original C Search Function

void search (int n, struct board *ptr,int *num_solutions)

{ int i; int no_threats (struct board *);

if (ptr->pieces == n) { (*num_solutions)++; } else { ptr->pieces++; for (i = 0; i < n; i++) { ptr->places[ptr->pieces-1] = i; if (no_threats(ptr))

search (n, ptr, num_solutions); } ptr->pieces--; }}

23

Page 24: INTEL CONFIDENTIAL Implementing a Task Decomposition Introduction to Parallel Programming – Part 9

Copyright © 2009, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. * Other brands and names are the property of their respective owners.

C/OpenMP Search Function

void search (int n, struct board *ptr,int *num_solutions)

{ int i; int no_threats (struct board *);

if (ptr->pieces == n) { #pragma omp critical (*num_solutions)++; } else { ptr->pieces++; for (i = 0; i < n; i++) { ptr->places[ptr->pieces-1] = i; if (no_threats(ptr))

search (n, ptr, num_solutions); } ptr->pieces--; }}

24

Page 25: INTEL CONFIDENTIAL Implementing a Task Decomposition Introduction to Parallel Programming – Part 9

Copyright © 2009, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. * Other brands and names are the property of their respective owners.

Only One Problem: It Doesn’t Work!

OpenMP program throws an exceptionCulprit: Variable stack

25

Heap

stack

Page 26: INTEL CONFIDENTIAL Implementing a Task Decomposition Introduction to Parallel Programming – Part 9

Copyright © 2009, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. * Other brands and names are the property of their respective owners.

Problem Site

int main (){ struct board *stack; ... #pragma omp parallel search_for_solutions(n, stack, &num_solutions); ...}

void search_for_solutions (int n, struct board *stack, int *num_solutions){ ... while (stack != NULL) ...

26

Page 27: INTEL CONFIDENTIAL Implementing a Task Decomposition Introduction to Parallel Programming – Part 9

Copyright © 2009, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. * Other brands and names are the property of their respective owners.

1. Both Threads Point to Top

27

stack stack

Thread 1 Thread 2

Page 28: INTEL CONFIDENTIAL Implementing a Task Decomposition Introduction to Parallel Programming – Part 9

Copyright © 2009, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. * Other brands and names are the property of their respective owners.

2. Thread 1 Grabs First Element

28

stack

Thread 1 Thread 2

stack

ptr

Page 29: INTEL CONFIDENTIAL Implementing a Task Decomposition Introduction to Parallel Programming – Part 9

Copyright © 2009, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. * Other brands and names are the property of their respective owners.

3. Thread 2 Grabs “Next” Element

29

Thread 1 Thread 2

stack

ptr

stack

ptr

Error #1Thread 2

grabs same element

Page 30: INTEL CONFIDENTIAL Implementing a Task Decomposition Introduction to Parallel Programming – Part 9

Copyright © 2009, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. * Other brands and names are the property of their respective owners.

4. Thread 1 Deletes Element

30

stack

Thread 1 Thread 2

stack

ptr

?

Error #2Thread 2’s stack pointer dangles

Page 31: INTEL CONFIDENTIAL Implementing a Task Decomposition Introduction to Parallel Programming – Part 9

Copyright © 2009, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. * Other brands and names are the property of their respective owners.

Demonstrate error #2

31

stack

Thread 1 Thread 2

stack

ptr

Thread 1 gets hits critical

region & reads stack

Page 32: INTEL CONFIDENTIAL Implementing a Task Decomposition Introduction to Parallel Programming – Part 9

Copyright © 2009, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. * Other brands and names are the property of their respective owners.

Demonstrate error #2

32

stack

Thread 1 Thread 2

stack

ptr

Thread 1 copies stack to ptr

Page 33: INTEL CONFIDENTIAL Implementing a Task Decomposition Introduction to Parallel Programming – Part 9

Copyright © 2009, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. * Other brands and names are the property of their respective owners.

Demonstrate error #2

33

stack

Thread 1 Thread 2

stack

ptr

Thread 1 advances stack

Page 34: INTEL CONFIDENTIAL Implementing a Task Decomposition Introduction to Parallel Programming – Part 9

Copyright © 2009, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. * Other brands and names are the property of their respective owners.

Demonstrate error #2

34

stack

Thread 1 Thread 2

stack

ptr

Thread 1 exits critical region

Page 35: INTEL CONFIDENTIAL Implementing a Task Decomposition Introduction to Parallel Programming – Part 9

Copyright © 2009, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. * Other brands and names are the property of their respective owners.

Demonstrate error #2

35

stack

Thread 1 Thread 2

stack

ptr

?Thread 1 frees ptr

Thread 2 stack points to

undefined value

Page 36: INTEL CONFIDENTIAL Implementing a Task Decomposition Introduction to Parallel Programming – Part 9

Copyright © 2009, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. * Other brands and names are the property of their respective owners.

Remedy 1: Make stack Static

36

Thread 1 Thread 2

stack

Page 37: INTEL CONFIDENTIAL Implementing a Task Decomposition Introduction to Parallel Programming – Part 9

Copyright © 2009, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. * Other brands and names are the property of their respective owners.

Remedy 1: Make stack Static

37

Thread 2

stack

Thread 1

stack

ptrstack

ptr

Why would this work?

Page 38: INTEL CONFIDENTIAL Implementing a Task Decomposition Introduction to Parallel Programming – Part 9

Copyright © 2009, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. * Other brands and names are the property of their respective owners.

Remedy 1: Make stack Static

38

Thread 2

stack

Thread 1

stack

ptrstack

ptr

Thread 1 enters critical region

Page 39: INTEL CONFIDENTIAL Implementing a Task Decomposition Introduction to Parallel Programming – Part 9

Copyright © 2009, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. * Other brands and names are the property of their respective owners.

Remedy 1: Make stack Static

39

Thread 2

stack

Thread 1

stack

ptrstack

ptr

Thread 1 copies stack to ptr

Page 40: INTEL CONFIDENTIAL Implementing a Task Decomposition Introduction to Parallel Programming – Part 9

Copyright © 2009, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. * Other brands and names are the property of their respective owners.

Remedy 1: Make stack Static

40

Thread 2

stack

Thread 1

stack

ptrstack

ptr

Thread 1 advances stack

Page 41: INTEL CONFIDENTIAL Implementing a Task Decomposition Introduction to Parallel Programming – Part 9

Copyright © 2009, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. * Other brands and names are the property of their respective owners.

Remedy 1: Make stack Static

41

Thread 2

stack

Thread 1

stack

ptrstack

ptr

Thread 1 exits critical region

Page 42: INTEL CONFIDENTIAL Implementing a Task Decomposition Introduction to Parallel Programming – Part 9

Copyright © 2009, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. * Other brands and names are the property of their respective owners.

Remedy 1: Make stack Static

42

Thread 2

stack

Thread 1

stack

ptrstack

ptr

Thread 1 frees ptr – no dangling

memory

Page 43: INTEL CONFIDENTIAL Implementing a Task Decomposition Introduction to Parallel Programming – Part 9

Copyright © 2009, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. * Other brands and names are the property of their respective owners.

Remedy 2: Use Indirection (Best choice)

43

Thread 1 Thread 2

&stack

Now data is encapsulated inside function calls and no longer susceptible to

overwriting “global/static” variable

Page 44: INTEL CONFIDENTIAL Implementing a Task Decomposition Introduction to Parallel Programming – Part 9

Copyright © 2009, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. * Other brands and names are the property of their respective owners.

Corrected main Function

struct board *stack;...stack = NULL;for (i = 0; i < n; i++) { initial=(struct board *)malloc(sizeof(struct board)); initial->pieces = 1; initial->places[0] = i; initial->next = stack; stack = initial;}num_solutions = 0;#pragma omp parallel search_for_solutions (n, &stack, &num_solutions);printf ("The %d-queens puzzle has %d solutions\n", n,

num_solutions);

44

Page 45: INTEL CONFIDENTIAL Implementing a Task Decomposition Introduction to Parallel Programming – Part 9

Copyright © 2009, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. * Other brands and names are the property of their respective owners.

Corrected Stack Access Function

void search_for_solutions (int n, struct board **stack, int *num_solutions){ struct board *ptr; void search (int, struct board *, int *);

while (*stack != NULL) {#pragma omp critical{ ptr = *stack;

*stack = (*stack)->next; } search (n, ptr, num_solutions); free (ptr); }}

45

Page 46: INTEL CONFIDENTIAL Implementing a Task Decomposition Introduction to Parallel Programming – Part 9

Copyright © 2009, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States or other countries. * Other brands and names are the property of their respective owners.

References

Rohit Chandra, Leonardo Dagum, Dave Kohr, Dror Maydan, Jeff McDonald, and Ramesh Menon, Parallel Programming in OpenMP, Morgan Kaufmann (2001).

Barbara Chapman, Gabriele Jost, Ruud van der Pas, Using OpenMP: Portable Shared Memory Parallel Programming, MIT Press (2008).

Michael J. Quinn, Parallel Programming in C with MPI and OpenMP, McGraw-Hill (2004).

46

Page 47: INTEL CONFIDENTIAL Implementing a Task Decomposition Introduction to Parallel Programming – Part 9