Claudio F. R. Geyer, II - UFRGS
Inês de Castro Dutra, COPPE - Sistemas - UFRJ
2
Outline
  Introduction
  Sequential Implementation
  Parallel Implementation
  Performance
  Conclusions
  Future Work
3
Introduction
  Why logic programming?
    Formal basis
    expression power
    implicit parallelism
    suitability to some problems
  Main Language: Prolog
    syntax
    declarative and operational semantics
4
Introduction
parent(arthur,carol).
parent(carol,john).

grandparent(X,Y) :- parent(X,Z), parent(Z,Y).

length([H|T],N) :- length(T,N1), N is N1+1.
length([],0).
5
Sequential Implementation
  Interpreters vs. compilers
  WAM (Warren Abstract Machine)
    structure copying
    environments
    choicepoints
    heap
    trail
6
Sequential Implementation
7
Parallel Implementation
  Control Parallelism
    ORP: Or-parallelism
    ANDP: And-parallelism
    And + Or
  Data Parallelism
    Unification
    Path
8
Parallel Implementation: ORP
  Problems
    representation of multiple bindings to the same variable
  Solutions
    stack sharing
    stack copying
9
Parallel Implementation: ORP
10
Parallel Implementation: ORP
  Stack sharing
    binding arrays
    hash windows
    version vectors
    variable importation
    …
11
Parallel Implementation: ORP
  Speculative work
  Prolog semantics?
  Side-effects and pruning
  Scheduling
12
Parallel Implementation: ANDP
  IAP: Independent and-parallelism
  DAP: Dependent and-parallelism
  DetAP: Determinate and-parallelism
13
Parallel Implementation: IAP
  Goals that do not share variables can proceed in parallel.
  Compiler support
  CGEs: Conditional Graph Expressions
14
Parallel Implementation: IAP
paper(P,A,D,L) :- author(A), date(D), loc(P,A,D,L).
Possible CGE:
indep(A) & indep(D) => author(A) & date(D),loc(P,A,D,L)
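The run-time test behind a CGE condition can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Var` class, the tuple encoding of terms and the names `variables`, `indep` and `run_cge` are hypothetical and not taken from &-Prolog, and independence is taken to mean that two terms share no unbound variable.

```python
class Var:
    """An unbound logic variable (hypothetical minimal representation)."""

    def __init__(self, name):
        self.name = name


def variables(term):
    """Collect the unbound variables occurring in a term.

    Terms are assumed to be Var objects, atoms (str or int), or tuples
    of the form (functor, arg1, arg2, ...).
    """
    if isinstance(term, Var):
        return {term}
    if isinstance(term, tuple):
        found = set()
        for arg in term[1:]:
            found |= variables(arg)
        return found
    return set()  # atoms and numbers contain no variables


def indep(t1, t2):
    """True when the two terms share no unbound variable."""
    return variables(t1).isdisjoint(variables(t2))


def run_cge(t1, t2):
    """Pick an execution strategy the way a CGE condition would."""
    return "parallel" if indep(t1, t2) else "sequential"


a, d = Var("A"), Var("D")
print(run_cge(a, d))                    # two distinct variables
print(run_cge(("f", a), ("g", a, d)))   # both terms mention A
```

When the check fails, the goals simply run in their original sequential order, so the condition only ever adds parallelism and never changes the answers.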
15
Parallel Implementation: IAP
  Cross-product of solutions
  Recomputation

qsort([], []).
qsort([P|T],L) :- partition(T,P,A,B), qsort(A,L1), qsort(B,L2), append(L1,[P|L2],L).
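Once partition has finished, the two recursive qsort calls work on separate lists and share no variables, which is what makes them candidates for IAP. A rough fork-join analogue in Python is sketched below; the name `par_qsort` and the `depth` cutoff are hypothetical, and in CPython the threads show the shape of the computation rather than a real speedup.

```python
import threading


def partition(items, pivot):
    """Split items into those <= pivot and those > pivot."""
    small = [x for x in items if x <= pivot]
    large = [x for x in items if x > pivot]
    return small, large


def par_qsort(items, depth=2):
    """Quicksort whose two recursive calls run in parallel near the root."""
    if not items:
        return []
    pivot, rest = items[0], items[1:]
    small, large = partition(rest, pivot)
    if depth <= 0:
        # Below the cutoff, fall back to plain sequential recursion.
        return par_qsort(small, 0) + [pivot] + par_qsort(large, 0)

    result = {}

    def sort_small():
        result["small"] = par_qsort(small, depth - 1)

    worker = threading.Thread(target=sort_small)
    worker.start()                               # plays the role of qsort(A,L1)
    sorted_large = par_qsort(large, depth - 1)   # plays the role of qsort(B,L2)
    worker.join()                                # both must finish before append
    return result["small"] + [pivot] + sorted_large


print(par_qsort([3, 1, 4, 1, 5, 9, 2, 6]))
```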
16
Parallel Implementation: DAP
  Goals that share variables can proceed in parallel
  Producer and consumer
    Chosen at compile-time or runtime
    one value or stream
  Compiler support
17
Parallel Implementation: DAP
producer(N,Out) :- N > 0, N1 is N - 1, Out = [ferrari|Ms], producer(N1,Ms).
producer(0,Out) :- Out = [].

consumer([ferrari|Ms]) :- go_ride_ferrari, consumer(Ms).
consumer([]).
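The stream communication between producer and consumer can be mimicked in Python with a queue shared by two threads. This is a loose analogy under assumptions, not how a DAP system is implemented: the `DONE` sentinel stands in for the closing `[]`, and the helper `run` is hypothetical.

```python
import queue
import threading

DONE = object()  # sentinel playing the role of the closing []


def producer(n, stream):
    """Emit n items, then close the stream."""
    for _ in range(n):
        stream.put("ferrari")
    stream.put(DONE)


def consumer(stream, rides):
    """Consume items as they arrive until the stream is closed."""
    while True:
        item = stream.get()  # blocks until the producer supplies the next item
        if item is DONE:
            break
        rides.append(item)


def run(n):
    """Run producer and consumer concurrently and return what was consumed."""
    stream = queue.Queue()
    rides = []
    threads = [
        threading.Thread(target=producer, args=(n, stream)),
        threading.Thread(target=consumer, args=(stream, rides)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return rides


print(len(run(3)))
```

The consumer can start before the producer has finished, which is the point of dependent and-parallelism with a stream rather than a single value.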
18
Parallel Implementation: DetAP
  Goals that match at most one clause can be executed first and in parallel
  Compiler support
  Reduction of search space
19
Parallel Platforms
  Shared-memory
  Distributed memory
  Distributed-shared memory
Implicit vs. explicit parallelism
Programming Model
  Process or processor-based
20
Shared-memory Or-Parallel Systems
  Aurora
    WAM-based
    processor-based
    shared stacks
    binding arrays
21
Aurora: Binding Arrays
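The idea of a binding array can be sketched in a few lines of Python. This is a simplified model under assumptions, not Aurora's actual data layout: the classes `ConditionalVar` and `Worker` are hypothetical, and task switching (when a worker moves to another branch and must refresh its array) is left out.

```python
class ConditionalVar:
    """A variable in the shared stack; it stores only a binding-array index."""

    def __init__(self, index):
        self.index = index


class Worker:
    """A worker with a private binding array (hypothetical layout)."""

    def __init__(self, name, size):
        self.name = name
        self.bindings = [None] * size  # None means "unbound for this worker"

    def bind(self, var, value):
        self.bindings[var.index] = value

    def deref(self, var):
        return self.bindings[var.index]


x = ConditionalVar(index=0)  # created once, visible to all workers
w1, w2 = Worker("w1", 4), Worker("w2", 4)

w1.bind(x, "a")  # w1 explores the branch where X = a
w2.bind(x, "b")  # w2 explores the branch where X = b

print(w1.deref(x), w2.deref(x))
```

Because the shared cell holds only an index, each worker sees its own binding for the same variable without copying the stacks.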
22
Shared-memory Or-Parallel Systems
  Scheduling in Aurora
    Wavefront
    Argonne
    Manchester
    Bristol
    Dharma
23
Shared-memory Or-Parallel Systems
  Wavefront, Manchester and Argonne: topmost dispatching
  Bristol and Dharma: bottom-most dispatching, speculative work
24
Shared-memory Or-Parallel Systems
  Muse
    WAM-based
    processor-based
    stack copying
25
Muse: Stack Copying
Multiple environments maintained via stack-copying
Memory space divided into identical address spaces to avoid pointer relocation
Incremental copying
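Incremental copying can be pictured as copying only the part of the busy worker's stack that the idle worker does not already hold. The Python sketch below rests on simplifying assumptions: stacks are plain lists of frame labels, the shared part is found by comparing frames from the bottom, and the function name `incremental_copy` is hypothetical. A real system would locate the shared part through the choicepoint structure rather than by comparison.

```python
def incremental_copy(busy_stack, idle_stack):
    """Make idle_stack equal to busy_stack, copying only the differing tail.

    Returns the number of frames actually copied.
    """
    shared = 0
    limit = min(len(busy_stack), len(idle_stack))
    while shared < limit and busy_stack[shared] == idle_stack[shared]:
        shared += 1
    del idle_stack[shared:]      # discard the idle worker's old branch
    tail = busy_stack[shared:]
    idle_stack.extend(tail)      # copy only what is missing
    return len(tail)


busy = ["root", "cp1", "cp2", "cp3"]
idle = ["root", "cp1", "old_branch"]
copied = incremental_copy(busy, idle)
print(copied, idle)
```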
26
Shared-memory Or-Parallel Systems
  Scheduling in Muse
    Sophisticated operations to avoid data races
    Workers keep data structures about idle and busy workers below their subtrees
    Shadowing
    Preference to leftmost work
27
Shared-memory And-Parallel Systems
  &-Prolog
  &ACE
  DASWAM
28
Shared-memory And-Parallel Systems
  &-Prolog
    RAP-WAM
    CGEs
    compiler support
  &ACE
    based on &-Prolog
  DASWAM
    DAP and IAP, producer determined at runtime
29
Shared-memory And+Or Systems
  Andorra-I
  ACE
  SBA
  ParAKL
  Penny
  DAOS
30
Andorra-I
  Determinate and-parallelism
  or-parallelism
  side-effects, cuts and commits
  teams of workers
  scheduling
  reduction of search space
31
Andorra-I
[Figure: execution cycle between the DetAP phase and the ORP phase, with transitions labelled "#det goals = 0" and "#det goals <> 0"]
32
Shared-memory And+Or Systems
  ACE
    IAP + ORP
    Stack copying
    IAP a la &-Prolog
    Composition tree
    Last parallel call optimisation
33
ACE
34
SBA
  IAP + ORP
  Stack sharing
  Shared Binding Arrays
  IAP a la &ACE
  Binding array divided into fixed-size segments
  Conditional variable bound to a pair <seg#,offset>
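The <seg#,offset> addressing can be illustrated with a small Python model. This is a sketch under assumptions rather than the SBA design itself: the segment size, the on-demand allocation of segments and the class name `SegmentedBindingArray` are all hypothetical.

```python
SEGMENT_SIZE = 4  # hypothetical; the real segment size is not given here


class SegmentedBindingArray:
    """A binding array split into fixed-size segments, allocated on demand."""

    def __init__(self):
        self.segments = {}

    def bind(self, address, value):
        seg, offset = address
        segment = self.segments.setdefault(seg, [None] * SEGMENT_SIZE)
        segment[offset] = value

    def lookup(self, address):
        seg, offset = address
        segment = self.segments.get(seg)
        return None if segment is None else segment[offset]


array = SegmentedBindingArray()
x = (2, 1)  # a conditional variable holding the pair <seg#, offset>
array.bind(x, "john")
print(array.lookup(x), array.lookup((0, 3)))
```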
35
Performance
Andorra-I
36
Performance

prog name      Andorra-I  JAM   Aurora  Muse
nrv400         8.25       8.37  ----    ----
bt_cluster     9.37       9.70  ----    ----
bt_wms         3.32       ----  ----    ----
road_markings  6.24       ----  ----    ----
chat_80_db5    7.30       ----  7.30    5.91
5x4x3_puzzle   9.66       ----  9.51    8.69
warplan        1.20       ----  2.63    1.06
protein_all    6.81       ----  9.49    8.64
protein_1st    2.78       ----  4.10    3.12
fly_pan        6.88       ----  ----    ----
scanner        5.47       ----  ----    ----
cipher         5.65       ----  ----    ----
37
Performance
Pgm        map   8queen  Xword  8queenp  zebra  flypan
Prolog     5003  383146  6377   133612   19404  10539
Andorra-I  1047  214918  835    8496     5757   1517

Reduction in search space
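The size of the reduction is easier to judge as a ratio. The short Python snippet below divides each Prolog figure by the corresponding Andorra-I figure, using only the numbers in the table; the variable names are illustrative.

```python
prolog = {"map": 5003, "8queen": 383146, "Xword": 6377,
          "8queenp": 133612, "zebra": 19404, "flypan": 10539}
andorra_i = {"map": 1047, "8queen": 214918, "Xword": 835,
             "8queenp": 8496, "zebra": 5757, "flypan": 1517}

# Reduction factor: how many times smaller the Andorra-I figure is.
reduction = {name: prolog[name] / andorra_i[name] for name in prolog}

for name, factor in reduction.items():
    print(f"{name:8s} {factor:6.2f}")
```

The factors range from under 2 for 8queen to more than 15 for 8queenp.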
38
Performance: bt_cluster
39
Performance: chat-80
40
Performance: floorplan design
41
Applications
  Optimisation Problems
  Databases
  Natural Language Processing
  Data Mining
  Constraint Satisfaction Problems
  …
42
Conclusions
  Logic programming: high level of abstraction
  Favours implicit parallelism
  Several applications
  Good performance on small to medium parallel architectures
  High performance is coming!
43
Future Work
  More efficient methods to combine and + or parallelism
  Scheduling is an important issue
  Sophisticated compiler support
  Memory management
  Parallel constraint logic programming
  Efficient cluster implementations
  Applications
44
Future Work
Ideal System
45
Perspectives