October 26, 2006
Parallel Image Processing: Programming and Architecture
IST PhD Lunch Seminar
Wouter Caarls
Quantitative Imaging Group
Why Parallel?
• Processing time
  • Smaller timesteps, more scales, faster response times
• Memory
  • Larger images, more dimensions
• Energy consumption
  • More applications, smaller devices
Data parallelism
• Many image processing operations have locality of reference (segmentation, filtering, distance transforms, etc.)
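A minimal sketch of how locality of reference enables data parallelism. The 3x3 mean filter, the split into horizontal strips and the names `mean3x3` and `parallel_mean3x3` are assumptions chosen for illustration, not taken from the talk. Each worker filters its own strip and only reads one row beyond it, so the strips can be processed independently.

```python
from concurrent.futures import ThreadPoolExecutor

def mean3x3(img, row_start, row_end):
    """Filter rows [row_start, row_end) with a 3x3 mean, clamping at the borders."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(row_start, row_end):
        row = []
        for x in range(w):
            total = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    total += img[yy][xx]
            row.append(total / 9)
        out.append(row)
    return out

def parallel_mean3x3(img, workers=4):
    """Split a non-empty image into horizontal strips and filter each strip in its own task."""
    h = len(img)
    step = (h + workers - 1) // workers
    bounds = [(s, min(s + step, h)) for s in range(0, h, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        strips = pool.map(lambda b: mean3x3(img, *b), bounds)
        # pool.map preserves strip order, so the rows can simply be concatenated
        return [row for strip in strips for row in strip]

image = [[(x + y) % 7 for x in range(8)] for y in range(8)]
# The strip-wise result matches filtering the whole image in one go
print(parallel_mean3x3(image, workers=3) == mean3x3(image, 0, len(image)))
```

In standard CPython the threads will not speed up this pure-Python loop; the sketch only shows the decomposition, which carries over to processes or to hardware with real parallel units.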
Task farm parallelism
• An application consists of many different operations
• Some of these operations are independent (scale spaces, parameter sweeps, noise realizations, etc.)
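A small sketch of a task farm, assuming a threshold parameter sweep as the set of independent operations. The helper names `count_above` and `sweep` and the toy image are hypothetical. Every parameter value becomes one task, and idle workers pick up the next task from the pool.

```python
from concurrent.futures import ThreadPoolExecutor

def count_above(image, threshold):
    """One independent task: count the pixels brighter than threshold."""
    return sum(1 for row in image for p in row if p > threshold)

def sweep(image, thresholds, workers=4):
    """Farm out one task per parameter value; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        counts = pool.map(lambda t: count_above(image, t), thresholds)
        return dict(zip(thresholds, counts))

image = [[0, 10, 20], [30, 40, 50], [60, 70, 80]]
result = sweep(image, [15, 45, 75])
print(result)  # {15: 7, 45: 4, 75: 1}
```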
Pipeline parallelism
• An image processing algorithm consists of consecutive stages
• If multiple objects are to be processed, they may be in different stages at the same time
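A sketch of pipeline parallelism, assuming one thread per stage and FIFO queues between stages. The three stages (invert, threshold, count) and the names `stage`, `run_pipeline` and `STOP` are made up for the example. While the last stage counts pixels of the first frame, the first stage can already work on the next one.

```python
import queue
import threading

STOP = object()  # sentinel marking the end of the stream

def stage(func, inbox, outbox):
    """Apply func to every item from inbox and forward the result to outbox."""
    while True:
        item = inbox.get()
        if item is STOP:
            outbox.put(STOP)
            return
        outbox.put(func(item))

def run_pipeline(items, funcs):
    """Run each function in its own thread, connected by queues."""
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    threads = [threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]))
               for i, f in enumerate(funcs)]
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(STOP)
    results = []
    while True:
        out = queues[-1].get()
        if out is STOP:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results

frames = [[0, 100, 200], [50, 200, 250]]
stages = [
    lambda f: [255 - p for p in f],  # invert
    lambda f: [p > 100 for p in f],  # threshold
    sum,                             # count the pixels above threshold
]
counts = run_pipeline(frames, stages)
print(counts)  # [2, 1]
```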
Parallel hardware architectures: Fine grained
• Irregular
  • Superscalar (most modern microprocessors)
  • VLIW (DSPs)
• Regular
  • Vector (supercomputers, MMX)
  • SIMD (graphics processors)
• Custom
  • FPGA
Parallel hardware architectures: Coarse grained
• Homogeneous
  • Multi-core, SMP
  • Cluster
• Heterogeneous
  • Embedded systems
  • Grid
Obstacles
• Programming
  • Synchronization, bookkeeping
  • Different systems, languages, optimization strategies
• Choosing an architecture
  • Analyze program before it is written
  • Additional requirements or unexpected performance may require rewrite
Architecture-independent parallel programming
• Data parallelism
  • Differentiate between synchronization pattern and computation
  • Library provides pattern, user provides computation
• Task farm & pipeline parallelism
  • Operations do not work on images, but on streams
  • Sequences of operation calls do not imply an order, but a stream graph.
Example skeletons
• Pixel
• Neighbourhood
• Recursive neighbourhood
• Stack
• Filter
• Associative reduction
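A rough sketch of three of these skeletons, to show the split between pattern and computation. The function names `pixel_op`, `neighbourhood_op` and `reduction_op`, their signatures and the clamped border handling are assumptions, not the talk's actual library. These are sequential reference versions: a parallel backend could replace the loops without touching the user-supplied kernels.

```python
from functools import reduce

def pixel_op(kernel, image):
    """Pixel skeleton: the kernel sees one pixel at a time."""
    return [[kernel(p) for p in row] for row in image]

def neighbourhood_op(kernel, image, radius=1):
    """Neighbourhood skeleton: the kernel sees a (2r+1)x(2r+1) window, borders clamped."""
    h, w = len(image), len(image[0])

    def window(y, x):
        return [[image[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                 for dx in range(-radius, radius + 1)]
                for dy in range(-radius, radius + 1)]

    return [[kernel(window(y, x)) for x in range(w)] for y in range(h)]

def reduction_op(combine, image, initial):
    """Associative reduction skeleton: combine must be associative,
    so that a parallel version may merge partial results in any grouping."""
    return reduce(combine, (p for row in image for p in row), initial)

image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
doubled = pixel_op(lambda p: 2 * p, image)
dilated = neighbourhood_op(lambda win: max(max(r) for r in win), image)
total = reduction_op(lambda a, b: a + b, image, 0)
print(doubled)  # [[2, 4, 6], [8, 10, 12], [14, 16, 18]]
print(dilated)  # [[5, 6, 6], [8, 9, 9], [8, 9, 9]]
print(total)    # 45
```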
Constructing stream graphs
• By program (dynamic)
  capture(orig);
  normalize(orig, norm);
  dx(orig, x_der, 1.0);
  dy(orig, y_der, 1.0);
  direction(x_der, y_der, dir);
  display(dir);
• Visually (static)
  [Figure: stream graph with nodes capture, normalize, dx, dy, direction and display]
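A sketch of how operation calls can build a stream graph instead of executing immediately. The `StreamGraph` class is hypothetical, and which arguments of each call are inputs and which are outputs is an assumption read off the stream names in the call sequence above. Edges follow from matching each stream's producer to its consumers, so the order of the calls does not matter.

```python
class StreamGraph:
    """Records operation calls instead of executing them."""

    def __init__(self):
        self.nodes = []  # (operation name, input streams, output streams)

    def op(self, name, inputs=(), outputs=()):
        self.nodes.append((name, tuple(inputs), tuple(outputs)))

    def edges(self):
        """Connect the producer of each stream to every operation that reads it."""
        producer = {s: name for name, _, outs in self.nodes for s in outs}
        return sorted((producer[s], name)
                      for name, ins, _ in self.nodes for s in ins)

g = StreamGraph()
g.op("capture", outputs=["orig"])
g.op("normalize", inputs=["orig"], outputs=["norm"])
g.op("dx", inputs=["orig"], outputs=["x_der"])
g.op("dy", inputs=["orig"], outputs=["y_der"])
g.op("direction", inputs=["x_der", "y_der"], outputs=["dir"])
g.op("display", inputs=["dir"])
for edge in g.edges():
    print(edge)
```

In this graph dx and dy depend only on capture, so a scheduler is free to run them side by side.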
Dealing with heterogeneous tasks
[Figure: diagram with Processor 1 and Processor 2; numeric labels not recoverable]
Dealing with interconnect
[Figure: diagram with Processor 1, Processor 2 and Interconnect; numeric labels not recoverable]
Dealing with dependencies
[Figure: diagram with Processor 1, Processor 2 and Interconnect; numeric labels not recoverable]
Choosing an architecture automatically
• An architecture-independent program allows automatic analysis after it is written, but before an architecture is chosen
• Given a set of constraints, an architecture can be chosen automatically to optimize a cost function
• The tradeoff between cost, power and performance must be made by the designer
Search strategy: Constrained single objective
[Figure: plot of performance against cost, with a minimum performance level marked]
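A minimal sketch of a constrained single-objective search, assuming cost is the objective to minimize and the minimum performance is the constraint. The candidate architectures and their numbers are invented for illustration, and `cheapest_meeting` is a hypothetical helper, not the search procedure used in the talk.

```python
def cheapest_meeting(candidates, min_performance):
    """Return the lowest-cost candidate that meets the performance constraint, or None."""
    feasible = [c for c in candidates if c["performance"] >= min_performance]
    return min(feasible, key=lambda c: c["cost"], default=None)

# Hypothetical candidate architectures with made-up cost and performance figures
candidates = [
    {"name": "A", "cost": 10, "performance": 5},
    {"name": "B", "cost": 25, "performance": 12},
    {"name": "C", "cost": 40, "performance": 30},
    {"name": "D", "cost": 60, "performance": 28},
]
choice = cheapest_meeting(candidates, min_performance=10)
print(choice["name"])  # B
```

Raising the constraint to 29 would select C, and a constraint that no candidate meets returns None, which is where the designer has to revisit the tradeoff.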