basic optimization methods · integrated hw/sw-systems andreas mitschele-thiel 22-may-05 24...

TECHNISCHE UNIVERSITÄT ILMENAU
Integrated Hard- and Software Systems
http://www-ihs.theoinf.tu-ilmenau.de


Basic Optimization Methods

Problem Statement

Heuristic Search
- General Framework
- Hill-Climbing
- Random Search
- Simulated Annealing
- Genetic Algorithms
- Tabu Search

Single Pass Approaches
- Framework
- List scheduling
- Clustering

Branch-and-Bound

Integrated HW/SW-Systems, Andreas Mitschele-Thiel, 22-May-05

Problem Statement

Design and Implementation Issues:
- Mapping of modules, functions, operations, etc. on hardware entities
- HW/SW partitioning
- Scheduling of the execution of operations

Example: HW/SW Partitioning

[Figure: example system partitioned into an HW part and an SW part]


Enumeration and Branch and Bound

Complete enumeration checks all possible solutions for their quality.

This is a brute-force approach!

Branch-and-Bound
- subsequent (stepwise) construction of solutions
- put partial solutions on hold (bound branches) that do not seem interesting at the moment
- these partial solutions may be revisited (expanded) later on

Branch-and-Bound-with-Underestimates
- use best-case estimates (underestimates) to bound (and exclude) solutions during the search
- see later for details


A Simple Example ...

... that shows the exponential nature of the problem

A simple sequencing problem: Find the best order to traverse a given set of n cities according to some path optimization criterion (or find the best order to execute a set of tasks on a computer):

number of cities n    number of possible sequences n!
1                     1
2                     2
3                     6
10                    3 628 800

Most problems we deal with are NP-hard! In practice, this means that the time to compute the best solution increases exponentially with the size of the problem.
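The growth of n! can be checked directly. The following is a minimal sketch of complete enumeration for a tiny sequencing problem; the cost function `path_length` (cities as points on a line) and all function names are hypothetical and only serve as an illustration.

```python
import itertools
import math


def count_sequences(n):
    """Number of different orders in which n cities can be visited (n!)."""
    return math.factorial(n)


def path_length(order):
    """Hypothetical criterion: cities are points on a line, cost is the distance travelled."""
    return sum(abs(a - b) for a, b in zip(order, order[1:]))


def best_order(cities, cost=path_length):
    """Complete enumeration: evaluate every possible order and keep the cheapest."""
    return min(itertools.permutations(cities), key=cost)


if __name__ == "__main__":
    for n in (1, 2, 3, 10):
        print(n, count_sequences(n))
    print(best_order((3, 1, 2)))
```

Enumerating all orders is fine for three cities, but already for 10 cities `best_order` would have to evaluate 3 628 800 sequences.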


The Complexity Problem

Time complexity function vs. size n (linear scale):

function   n = 10    n = 20    n = 30      n = 40           n = 50               n = 60
n          0.01 ms   0.02 ms   0.03 ms     0.04 ms          0.05 ms              0.06 ms
n^2        0.1 ms    0.4 ms    0.9 ms      1.6 ms           2.5 ms               3.6 ms
n^3        1 ms      8 ms      27 ms       64 ms            125 ms               216 ms
n^5        100 ms    3.2 s     24.3 s      1.7 min          5.2 min              13 min
2^n        1 ms      1 s       17.9 min    12.7 days        35.7 years           366 centuries
3^n        59 ms     58 min    6.5 years   3855 centuries   2 x 10^8 centuries   1.3 x 10^13 centuries

Complete enumeration of all alternative solutions is out of the question even with the most modern computers.

Heuristics are needed that come up with good solutions that are not necessarily optimal.
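The entries of the complexity table can be recomputed with a few lines. This sketch assumes that one elementary step takes 1 microsecond; the slide does not state this, but the figures are consistent with it. The helper names are made up for illustration.

```python
STEPS_PER_SECOND = 1_000_000  # assumption: one elementary step takes 1 microsecond

FUNCTIONS = {
    "n": lambda n: n,
    "n^2": lambda n: n ** 2,
    "n^3": lambda n: n ** 3,
    "n^5": lambda n: n ** 5,
    "2^n": lambda n: 2 ** n,
    "3^n": lambda n: 3 ** n,
}


def run_time_seconds(f, n):
    """Run time in seconds for f(n) elementary steps."""
    return f(n) / STEPS_PER_SECOND


def pretty(seconds):
    """Format a duration using the largest unit that fits."""
    units = (
        ("centuries", 100 * 365 * 24 * 3600),
        ("years", 365 * 24 * 3600),
        ("days", 24 * 3600),
        ("min", 60),
        ("s", 1),
    )
    for unit, size in units:
        if seconds >= size:
            return f"{seconds / size:.3g} {unit}"
    return f"{seconds * 1000:.3g} ms"


if __name__ == "__main__":
    for name, f in FUNCTIONS.items():
        print(name, [pretty(run_time_seconds(f, n)) for n in range(10, 70, 10)])
```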


Heuristic Search

Most heuristics are based on an iterative search comprising the following elements:
- selection of an initial (intermediate) solution (e.g. a sequence)
- evaluation of the quality of the intermediate solution
- check of termination criteria

Flow of the search:
1. select initial solution
2. select next solution (based on previous solution); this step is the search strategy
3. evaluate quality
4. acceptance criteria satisfied? If yes: accept solution as "best solution so far"
5. termination criteria satisfied? If yes: stop; if no: continue with step 2
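The general framework can be sketched as a loop that is parameterized by the search strategy, the acceptance criteria and the termination criteria. All names below are hypothetical; the toy problem (find the integer x that maximizes -(x - 7)^2 by random unit steps) is only an assumption made to have something runnable.

```python
import random


def heuristic_search(initial, select_next, quality, accept, terminate):
    """Generic iterative search: propose, evaluate, accept, check termination."""
    best = initial
    iteration = 0
    while not terminate(iteration):
        candidate = select_next(best)                   # search strategy
        if accept(quality(candidate), quality(best)):   # acceptance criteria
            best = candidate                            # "best solution so far"
        iteration += 1
    return best


rng = random.Random(42)
result = heuristic_search(
    initial=0,
    select_next=lambda x: x + rng.choice((-1, 1)),   # random neighbor
    quality=lambda x: -(x - 7) ** 2,                 # toy quality function
    accept=lambda new, old: new > old,               # accept improvements only
    terminate=lambda i: i >= 200,                    # static bound on iterations
)
```

The heuristics discussed in the remainder differ mainly in what they plug into `select_next`, `accept` and `terminate`.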


Hill-Climbing

Idea:
- search neighborhood for improvements
- select best neighbor and continue

Algorithm:
1. select initial solution
2. select all neighbors (based on previous solution)
3. evaluate quality of neighbors
4. neighbor with better quality exists? If yes: accept best neighbor as intermediate solution and continue with step 2; if no: stop
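A minimal sketch of hill-climbing, assuming a hypothetical one-dimensional landscape `LANDSCAPE` in which each position has its left and right neighbor. The landscape has a small peak at position 2 and the highest peak at position 7.

```python
LANDSCAPE = [1, 3, 5, 4, 2, 6, 8, 9, 7, 0]  # hypothetical quality per position


def quality(position):
    return LANDSCAPE[position]


def neighbors(position):
    """Direct neighbors: one step to the left or to the right."""
    return [p for p in (position - 1, position + 1) if 0 <= p < len(LANDSCAPE)]


def hill_climbing(initial):
    current = initial
    while True:
        best_neighbor = max(neighbors(current), key=quality)
        if quality(best_neighbor) <= quality(current):
            return current            # no neighbor with better quality exists
        current = best_neighbor       # accept best neighbor as intermediate solution


print(hill_climbing(0))  # gets stuck on the small peak
print(hill_climbing(4))  # reaches the highest peak
```

Starting at position 0 the search stops at position 2 because both neighbors are worse, although position 7 has a higher quality.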


Hill-Climbing – Application to HW/SW Partitioning

[Figure: three panels showing (1) Initial Solution, (2) Candidates for Climb, (3) Select best improvement; each panel shows the tasks split between SW and HW]

Legend:
- node: indicates some module, function, operation etc.
- arc: indicates some kind of communication relation between tasks
- HW indicates programmable hardware units (FPGA, etc.)
- SW indicates programmable processor running software
- neighborhood definition: consider all tasks for a move between SW and HW that have neighbors which are implemented on another technology (i.e. are connected by an arc)
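The neighborhood definition can be expressed compactly. In this sketch the task names, the arcs and the assignment are hypothetical; the function only returns the tasks that qualify for a move, it does not evaluate the moves.

```python
def move_candidates(assignment, arcs):
    """Tasks that have at least one communication partner on the other technology."""
    candidates = set()
    for a, b in arcs:
        if assignment[a] != assignment[b]:   # the arc crosses the HW/SW boundary
            candidates.update((a, b))
    return sorted(candidates)


# Hypothetical example: a chain of four tasks, two in SW and two in HW.
assignment = {"t1": "SW", "t2": "SW", "t3": "HW", "t4": "HW"}
arcs = [("t1", "t2"), ("t2", "t3"), ("t3", "t4")]
print(move_candidates(assignment, arcs))
```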


Hill-Climbing – Discussion

- simple
- local optimizations only: the algorithm is not able to pass a valley to finally reach a higher peak
- the idea is applicable only as a small part of an optimization algorithm and needs to be complemented with other strategies to overcome local optima


Random Search

also called Monte Carlo algorithm

Idea:
- random selection of the candidates for a change of intermediate solutions, or
- random selection of the solutions (no use of neighborhood)

Discussion:
- simple (no neighborhood relation is needed)
- not time efficient, especially where the time to evaluate solutions is high
- sometimes used as a reference algorithm to evaluate and compare the quality of heuristic optimization algorithms
- idea of randomization is applied to other techniques, e.g. genetic algorithms and simulated annealing
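A minimal sketch of the second variant (random selection of complete solutions, no neighborhood). The solution encoding (8-bit vectors) and the quality function (number of ones) are assumptions chosen only to keep the example short.

```python
import random


def random_search(random_solution, quality, iterations, rng):
    """Draw independent random solutions and remember the best one seen."""
    best = random_solution(rng)
    for _ in range(iterations - 1):
        candidate = random_solution(rng)
        if quality(candidate) > quality(best):
            best = candidate
    return best


def random_bits(rng, length=8):
    """Hypothetical solution encoding: a random bit vector."""
    return [rng.randint(0, 1) for _ in range(length)]


first_sample = random_search(random_bits, sum, 1, random.Random(3))
best_of_500 = random_search(random_bits, sum, 500, random.Random(3))
```

Because every candidate has to be evaluated, the run time is dominated by `quality` when evaluation is expensive, which matches the discussion above.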


Simulated Annealing

Idea:
- simulate the annealing process of material: the slow cooling of material leads to a state with minimal energy, i.e. the global optimum

Classification:
- Search strategy: random local search
- Acceptance criteria: unconditional acceptance of the selected solution if it represents an improvement over previous solutions; otherwise probabilistic acceptance
- Termination criteria: static bound on the number of iterations (cooling process)


Simulated Annealing – Algorithm

1. select initial solution
2. select random solution from neighborhood
3. evaluate quality
4. new solution is better? If yes: accept solution; if no: probabilistic choice based on quality of solution (if the choice is positive: accept solution)
5. decrease acceptance prob. for poor solutions
6. maximum number of iterations exceeded? If yes: stop; if no: continue with step 2
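The algorithm can be sketched as follows. The slide only asks for a probabilistic choice whose acceptance probability decreases over time; the exponential acceptance rule exp(-delta / temperature) and the geometric cooling factor used here are common choices and are assumptions of this sketch, as are all names and the toy cost function.

```python
import math
import random


def simulated_annealing(initial, random_neighbor, cost, max_iterations,
                        t_start=10.0, cooling=0.95, rng=None):
    rng = rng or random.Random(0)
    current = best = initial
    temperature = t_start
    for _ in range(max_iterations):                 # static bound on iterations
        candidate = random_neighbor(current, rng)   # random local search
        delta = cost(candidate) - cost(current)
        # Improvements are always accepted, poorer solutions only with a
        # probability that shrinks with delta and with the temperature.
        if delta < 0 or rng.random() < math.exp(-delta / temperature):
            current = candidate
            if cost(current) < cost(best):
                best = current
        # Cooling: decrease the acceptance probability for poor solutions.
        temperature = max(temperature * cooling, 1e-12)
    return best


def step(x, rng):
    """Hypothetical neighborhood: one step left or right within 0..20."""
    return min(20, max(0, x + rng.choice((-1, 1))))


result = simulated_annealing(0, step, lambda x: (x - 7) ** 2, 2000)
```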


Simulated Annealing – Discussion and Variants

Discussion:
- parameter settings for the cooling process are essential (but complicated)
  - slow decrease results in long run times
  - fast decrease results in poor solutions
  - discussion whether temperature decrease should be linear or logarithmic
- straightforward to implement

Variants:
- deterministic acceptance
- nonlinear cooling (slow cooling in the middle of the process)
- adaptive cooling based on accepted solutions at a temperature
- reheating


Genetic Algorithms

Idea:
- application of evolution theory (survival of the fittest): individuals well adapted to the environment will have more descendants and better adapted descendants
- application of two basic operations, crossover and mutation, to derive new solutions

Classification:
- Search strategy: probabilistic selection of solutions from the population; higher quality solutions are selected with higher probability
- Acceptance criteria: new solutions replace older ones
- Termination criteria: static bound on the number of iterations, or dynamic, e.g. based on improvements of quality of solutions


Genetic Algorithms – Basic Operations

crossover

  1 1 0 0 1 0 1 0 1 1
  0 1 0 1 0 0 1 0 0 1
  ->
  1 1 0 0 0 0 1 0 0 1

mutation

  1 1 0 0 0 0 1 0 0 1
  ->
  1 1 0 0 0 1 1 0 0 1
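The two operations can be reproduced on the bit strings shown on this slide. The sketch assumes single-point crossover; with the crossover point after the fourth bit and a mutation of the sixth bit it yields exactly the strings of the slide. The function names are hypothetical.

```python
def crossover(parent_a, parent_b, point):
    """Single-point crossover: head of parent_a, tail of parent_b."""
    return parent_a[:point] + parent_b[point:]


def mutate(bits, position):
    """Flip the bit at the given position (0-based)."""
    flipped = list(bits)
    flipped[position] = 1 - flipped[position]
    return flipped


parent_a = [1, 1, 0, 0, 1, 0, 1, 0, 1, 1]
parent_b = [0, 1, 0, 1, 0, 0, 1, 0, 0, 1]

child = crossover(parent_a, parent_b, point=4)
mutant = mutate(child, position=5)
```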


Genetic Algorithms – Basic Algorithm

[Figure: cycle population -> selection -> crossover -> mutation -> replacement -> population]

Replacement and selection rely on some cost function defining the quality of each solution.

Crossover selection is typically random.

General parameters:
- size of population
- mutation probability
- candidate selection strategy (mapping quality on probability)
- replacement strategy (replace own parents, replace weakest, influence of probability)

Application-specific parameters:
- mapping of problem on appropriate coding
- handling of invalid solutions in codings


Genetic Algorithms – Application to HW/SW Partitioning

Problem statement:
- target system: one HW unit, one programmable (SW) unit
- 8 tasks to be assigned to HW or SW
- no constraints for task assignment (precedence constraints, etc.)
- cost function: cost table with different (normalized) cost for SW and HW implementation
- goal: find the HW/SW partition that minimizes Σ (cost + time) over all tasks

Coding:
- 1 represents assignment to HW
- 0 represents assignment to SW
- altogether 2^8 = 256 possible solutions

Algorithm details:
- 10 solutions in population
- random selection of crossover point
- mutation probability of 0.1
- ...

        cost        time
task    HW    SW    HW    SW
1       10    1     1     10
2       8     1     2     20
3       3     1     1     8
4       20    2     3     10
5       5     2     3     30
6       20    3     2     30
7       15    4     1     20
8       3     1     4     10


Genetic Algorithms – Minimum Spanning Tree

- small population results in inbreeding
- larger population works well with small mutation rate
- tradeoff between size of population and number of iterations


Genetic Algorithms – Discussion

- finding an appropriate coding for the binary vectors for the specific application at hand is not intuitive; problems are
  - redundant codings
  - codings that do not represent a valid solution, e.g. coding for a sequencing problem
- tuning of genetic algorithms may be time consuming
- parameter settings highly depend on problem specifics
- suited for parallelization


Tabu Search

Idea:
- extension of hill-climbing to avoid being trapped in local optima
- allow intermediate solutions with lower quality
- maintain history to avoid running in cycles

Classification:
- Search strategy: deterministic local search
- Acceptance criteria: acceptance of best solution in neighborhood which is not tabu
- Termination criteria: static bound on number of iterations, or dynamic, e.g. based on quality improvements of solutions


Tabu Search – Algorithm

1. select initial solution
2. select neighborhood set (based on current solution)
3. remove tabu solutions from set
4. set is empty? If yes: increase neighborhood
5. evaluate quality and select best solution from set
6. update tabu list
7. termination criteria satisfied? If yes: stop; if no: continue with step 2

The brain of the algorithm is the tabu list that stores and maintains information about the history of the search. In the simplest case a number of previous solutions are stored in the tabu list. More advanced techniques maintain attributes of the solutions rather than the solutions themselves.
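A minimal sketch of the simplest case, in which the tabu list stores the most recently visited solutions. The landscape and the neighborhood are hypothetical. Instead of increasing the neighborhood when all neighbors are tabu, this sketch simply stops, which is a simplification.

```python
from collections import deque

LANDSCAPE = [1, 3, 5, 4, 2, 6, 8, 9, 7, 0]  # hypothetical quality per position


def quality(position):
    return LANDSCAPE[position]


def neighbors(position):
    return [p for p in (position - 1, position + 1) if 0 <= p < len(LANDSCAPE)]


def tabu_search(initial, iterations, tabu_size=5):
    current = best = initial
    tabu = deque([initial], maxlen=tabu_size)   # history of recent solutions
    for _ in range(iterations):
        candidates = [p for p in neighbors(current) if p not in tabu]
        if not candidates:
            break                               # simplification: no neighborhood increase
        current = max(candidates, key=quality)  # best non-tabu neighbor, even if worse
        tabu.append(current)                    # update tabu list
        if quality(current) > quality(best):
            best = current
    return best


print(tabu_search(0, iterations=20))
```

Starting at position 0, the search walks through the valley at positions 3 and 4, because the way back is tabu, and finds the highest peak at position 7.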


Tabu Search – Organisation of the History

The history is maintained by the tabu list.
Attributes of solutions are a very flexible means to control the search.

Example of attributes of a HW/SW partitioning problem with 8 tasks assigned to 1 of 4 different HW entities:
- (A1) change of the value of a task assignment variable
- (A2) move to HW
- (A3) move to SW
- (A4) combined change of some attributes
- (A5) improvement of the quality of two subsequent solutions over or below a threshold value

Aspiration criteria: Under certain conditions tabus may be ignored, e.g. if
- a tabu solution is the best solution found so far
- all solutions in a neighborhood are tabu
- a tabu solution is better than the solution that triggered the respective tabu conditions

Intensification checks whether good solutions share some common properties.
Diversification searches for solutions that do not share common properties.
Update of history information may be recency-based or frequency-based (i.e. depending on the frequency that the attribute has been activated).


Tabu Search – Discussion

- easy to implement (at least the neighborhood search as such)
- non-trivial tuning of parameters
- tuning is crucial to avoid cyclic search
- advantage of usage of knowledge, i.e. feedback from the search to control the search (e.g. for the controlled removal of bottlenecks)


Heuristic Search Methods – Classification

Search strategy
- search area
  - global search (potentially all solutions considered)
  - local search (direct neighbors only – stepwise optimization)
- selection strategy
  - deterministic selection, i.e. according to some deterministic rules
  - random selection from the set of possible solutions
  - probabilistic selection, i.e. based on some probabilistic function
- history dependence, i.e. the degree to which the selection of the new candidate solution depends on the history of the search
  - no dependence
  - one-step dependence
  - multi-step dependence

Acceptance criteria
- deterministic acceptance, i.e. based on some deterministic function
- probabilistic acceptance, i.e. influenced by some random factor

Termination criteria
- static, i.e. independent of the actual solutions visited during the search
- dynamic, i.e. dependent on the search history


Heuristic Search Methods – Classification

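[Table: classification of the heuristics. Columns: Search strategy (Search area: local, global; Selection strategy: det., prob., random; History dependence: none, one-step, multi-step), Acceptance criterion (det., prob.), Termination criterion (stat., dyn.). Rows with their number of marks: hill-climbing (5), tabu search (6), simulated annealing (5), genetic algorithms (7), random search (5). The column positions of the marks did not survive extraction.]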


Single Pass Approaches

The techniques covered so far search through a high number of solutions.

Idea underlying single pass approaches:
- intelligent construction of a single solution (instead of updating and modification of a number of solutions)
- the solution is constructed by subsequently solving a number of subproblems

Discussion:
- single-pass algorithms are very quick
- quality of solutions is often low
- not applicable where lots of constraints are present (which require some kind of backtracking)

Important applications of the idea:
- list scheduling: subsequent selection of a task to be scheduled until the complete schedule has been computed
- clustering: subsequent merger of nodes/modules until a small number of clusters remains such that each cluster can be assigned to a single HW unit


Single Pass Approaches – Framework

1. derive guidelines for solution construction
2. select subproblem
3. decide subproblem based on guidelines
4. possibly recompute or adapt guidelines
5. final solution constructed? If yes: stop; if no: continue with step 2

The guidelines are crucial and represent the intelligence of the algorithm.


List Scheduling

List scheduling: subsequent selection of a task to be scheduled on some processor (or HW entity); operation is similar to a dynamic task scheduler of an operating system

1. assign priorities to the tasks according to some strategy (prioritization strategy)
2. select executable task with highest priority
3. assign task to a processor according to some strategy (assignment strategy)
4. schedule complete? If yes: stop; if no: continue with step 2


List Scheduling – Example (1)

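Problem:
- 2 processors
- 6 tasks with precedence constraints
- find schedule with minimal execution time

HLFET (highest level first with estimated times): the level is the length of the longest (critical) path to the sink node (node 6)

Assignment strategy: first fit

[Figure: task graph with nodes 1 to 6, annotated with estimated times (green) and levels (red, used as priorities). Values in extraction order, assignment to nodes not recoverable: estimated times 2, 4, 3, 1, 4, 1; levels 8, 5, 4, 1, 5, 6]

Resulting schedule:

[Figure: Gantt chart of the schedule on processors P1 and P2, time axis from 0 to 10]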
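List scheduling with HLFET priorities and first-fit assignment can be sketched as follows. The task graph below is hypothetical: its durations and arcs were chosen so that the computed levels match the level values of the slide, but the actual graph of the example cannot be recovered from the transcript. Communication times are ignored.

```python
# Hypothetical task graph: duration per task and successors per task.
DURATION = {1: 2, 2: 1, 3: 4, 4: 3, 5: 4, 6: 1}
SUCCESSORS = {1: [2, 3, 4], 2: [5], 3: [6], 4: [6], 5: [6]}


def levels(duration, successors):
    """Level = length of the longest path from the task to the sink (inclusive)."""
    memo = {}

    def level(task):
        if task not in memo:
            memo[task] = duration[task] + max(
                (level(s) for s in successors.get(task, ())), default=0)
        return memo[task]

    return {task: level(task) for task in duration}


def list_schedule(duration, successors, processors=2):
    priority = levels(duration, successors)
    preds = {task: set() for task in duration}
    for task, succs in successors.items():
        for s in succs:
            preds[s].add(task)
    finish = {}                    # task -> finish time
    free_at = [0] * processors     # time at which each processor becomes idle
    schedule = []                  # entries: (task, processor, start, finish)
    while len(finish) < len(duration):
        ready = [t for t in duration
                 if t not in finish and all(p in finish for p in preds[t])]
        task = max(ready, key=lambda t: priority[t])     # highest level first
        earliest = max((finish[p] for p in preds[task]), default=0)
        # first fit: first processor that is idle at the earliest start time,
        # otherwise the processor that becomes idle first
        proc = next((i for i, f in enumerate(free_at) if f <= earliest), None)
        if proc is None:
            proc = min(range(processors), key=lambda i: free_at[i])
        start = max(earliest, free_at[proc])
        finish[task] = start + duration[task]
        free_at[proc] = finish[task]
        schedule.append((task, proc, start, finish[task]))
    return schedule


schedule = list_schedule(DURATION, SUCCESSORS)
makespan = max(end for _, _, _, end in schedule)
```

Replacing `levels` by a function that computes co-levels (longest path from the source) and selecting the smallest value instead of the largest turns the same loop into SCFET.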


List Scheduling – Example (2)

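Problem (unchanged):
- 2 processors
- 6 tasks with precedence constraints
- find schedule with minimal execution time

SCFET (smallest co-level first with estimated times): the co-level is the length of the longest (critical) path to the source node (node 1)

Assignment strategy: first fit

[Figure: task graph with nodes 1 to 6, annotated with estimated times (green) and co-levels (blue, used as priorities). Values in extraction order, assignment to nodes not recoverable: estimated times 2, 4, 3, 1, 4, 1; co-levels 2, 6, 5, 8, 7, 3]

Resulting schedule:

[Figure: Gantt chart of the schedule on processors P1 and P2, time axis from 0 to 10]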


Clustering

probabilistic: Each node belongs with certain probabilities to different clusters
deterministic: A node belongs to exactly one cluster or not

hierarchical:
- Starts with a distance matrix of each pair of nodes
- Exact method: always the same result
- Termination after all nodes belong to one cluster

partitioning:
- Starts with given number of K clusters
- Results depend on the chosen initial set of clusters
- Termination after a given number of iterations


Clustering

Partitioning of a set of nodes in a given number of subsets

1. assign each node to a different cluster
2. compute the "distance" between any pair of clusters
3. select the pair of clusters with the highest affinity
4. merge the clusters
5. termination criterion holds? If yes: stop; if no: continue with step 2

Application:
- processor assignment (load balancing – minimize interprocess communication)
- scheduling (minimize critical path)
- HW/SW partitioning

Clustering may be employed as part of the optimization process, i.e. combined with other techniques


Hierarchical Clustering

1. Determine the distance between each pair of nodes
2. Select the smallest distance
3. Replace the selected pair in distance matrix by a cluster representative
4. Recompute distance matrix
5. All nodes in one cluster? If yes: stop (result: dendrogram); if no: continue with step 2
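The loop can be sketched in a few lines. This sketch assumes single linkage (the distance between two clusters is the smallest distance between their members) and recomputes the cluster distances from the members in every round instead of updating a matrix in place; the one-dimensional points are hypothetical.

```python
def hierarchical_clustering(points, distance):
    """Agglomerative clustering; returns the merges as (cluster_a, cluster_b, distance)."""
    clusters = [(p,) for p in points]      # start: every node is its own cluster
    merges = []                            # merge history, i.e. the dendrogram
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # single linkage (assumption of this sketch)
                d = min(distance(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best                     # select the smallest distance
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


merges = hierarchical_clustering([0, 1, 5, 7, 20], lambda a, b: abs(a - b))
for cluster_a, cluster_b, d in merges:
    print(cluster_a, cluster_b, d)
```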


Partitioning Clustering (k-means)

1. Choose positions of k initial cluster representatives
2. Assign each node to the nearest cluster representative
3. Recompute positions of the cluster representatives based on the positions of the nodes in each cluster
4. Number of iterations reached? If yes: stop; if no: continue with step 2
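A minimal k-means sketch for points in the plane. The points and the initial representatives are hypothetical; an empty cluster keeps its previous representative, which is a choice of this sketch.

```python
def k_means(points, representatives, iterations):
    reps = list(representatives)
    clusters = []
    for _ in range(iterations):            # static bound on the number of iterations
        # assign each node to the nearest cluster representative
        clusters = [[] for _ in reps]
        for x, y in points:
            nearest = min(range(len(reps)),
                          key=lambda i: (x - reps[i][0]) ** 2 + (y - reps[i][1]) ** 2)
            clusters[nearest].append((x, y))
        # recompute each representative as the mean position of its nodes
        reps = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else r
            for c, r in zip(clusters, reps)
        ]
    return reps, clusters


points = [(0, 0), (0, 2), (10, 0), (10, 2)]
reps, clusters = k_means(points, representatives=[(0, 0), (10, 0)], iterations=3)
```

With a different choice of initial representatives the result may differ, which is the dependence on the initial set of clusters mentioned earlier.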


Clustering – Application to Load Balancing

Optimization goal:
- minimize inter-process (inter-cluster) communication
- limit maximum load per processor (cluster) to 20

1. assign each node to a different cluster
2. compute the sum of the communication cost between any pair of clusters
3. select the pair of clusters with the highest comm. cost that does not violate the capacity constraints
4. merge the clusters
5. reduction of comm. cost without violation of constraints possible? If yes: continue with step 2; if no: stop


Clustering – Application to Load Balancing

[Figure: stepwise merging of clusters for the load-balancing example; the graph structure, node loads and communication costs are not recoverable from the transcript]


Clustering – Variants

Clustering methods
- Hierarchical methods
  - Agglomeration (bottom up): Single linkage, Complete linkage, Average group, Centroid, MST, ROCK
  - Division (top down): Wards, Tree-Structured Vector Quantization, Macnaughton-Smith algorithm
- Partitioning methods: k-means, Fuzzy-c-means, SOM, Clique, One Pass, Gustafson-Kessel algorithm

Distance Metrics
- Euclidean
- Manhattan
- Minkowski
- Mahalanobis
- Jaccard
- Canberra
- Chebyshev
- Correlation
- Chi-square
- Kendall's Rank Correlation


Clustering – Hierarchical Algorithms

[Figure: illustrations of Single linkage, Complete Linkage and Centroid-based distance between clusters]


Clustering – Single Linkage

[Figure: scatter plot of the entities P1 to P7; both axes range from 0 to 10]

Distance between groups is estimated as the smallest distance between entities.

Example:

$d_{(2,4),5} = \min[d_{2,5}, d_{4,5}] = d_{4,5} = 4.1$

Cluster #   P1    P2    P3    P4    P5    P6    P7
P1          0     7.2   5     7.1   6.1   9.2   7
P2          -     0     3     1.4   5.4   3     6.7
P3          -     -     0     2.2   2.8   4.3   4.3
P4          -     -     -     0     4.1   2.2   5.4
P5          -     -     -     -     0     5.1   1.4
P6          -     -     -     -     -     0     6
P7          -     -     -     -     -     -     0
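The example can be checked numerically. The sketch below uses the distance values as read from the matrix of this slide; the function names are hypothetical. For comparison it also contains the group-average rule that the deck introduces a few slides later.

```python
NAMES = ["P1", "P2", "P3", "P4", "P5", "P6", "P7"]

# Upper triangle of the distance matrix (row i, column j with i < j).
UPPER = [
    [0, 7.2, 5, 7.1, 6.1, 9.2, 7],
    [0, 0, 3, 1.4, 5.4, 3, 6.7],
    [0, 0, 0, 2.2, 2.8, 4.3, 4.3],
    [0, 0, 0, 0, 4.1, 2.2, 5.4],
    [0, 0, 0, 0, 0, 5.1, 1.4],
    [0, 0, 0, 0, 0, 0, 6],
    [0, 0, 0, 0, 0, 0, 0],
]


def dist(a, b):
    """Distance between two entities, looked up in the upper triangle."""
    i, j = sorted((NAMES.index(a), NAMES.index(b)))
    return UPPER[i][j]


def single_linkage(group_a, group_b):
    """Smallest distance between any pair of entities of the two groups."""
    return min(dist(a, b) for a in group_a for b in group_b)


def group_average(group_a, group_b):
    """Average distance over all pairs of entities of the two groups."""
    pairs = [(a, b) for a in group_a for b in group_b]
    return sum(dist(a, b) for a, b in pairs) / len(pairs)


print(single_linkage(("P2", "P4"), ("P5",)))
print(group_average(("P2", "P4"), ("P5",)))
```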


Clustering – Single Linkage

[Figure: scatter plot of the entities P1 to P7; both axes range from 0 to 10]

Distance matrix after merging P2 and P4 into C24 and P5 and P7 into C57:

Cluster #   P1    C24   P3    C57   P6
P1          0     7.1   5     6.1   9.2
C24         -     0     2.2   4.1   2.2
P3          -     -     0     2.8   4.3
C57         -     -     -     0     5.1
P6          -     -     -     -     0

Distance matrix after merging C24 and P3 into C243:

Cluster #   P1    C243  C57   P6
P1          0     5     6.1   9.2
C243        -     0     2.8   2.2
C57         -     -     0     5.1
P6          -     -     -     0


Clustering – Group Average

[Figure: scatter plot of the entities P1 to P7; both axes range from 0 to 10]

Distance between groups is defined as the average distance between all pairs of entities.

Example:

$d_{(2,4),5} = \frac{1}{2}\,(d_{2,5} + d_{4,5}) = 4.8$


Clustering – Group Average

[Figure: scatter plot of the entities P1 to P7; both axes range from 0 to 10]

Distance matrix after merging P2 and P4 into C24 and P5 and P7 into C57:

Cluster #   P1    C24   P3    C57   P6
P1          0     7.2   5     6.6   9.2
C24         -     0     2.6   4.5   2.6
P3          -     -     0     3.6   4.3
C57         -     -     -     0     5.6
P6          -     -     -     -     0

Distance matrix after merging C24 and P3 into C243:

Cluster #   P1    C243  C57   P6
P1          0     6.4   6.1   9.2
C243        -     0     4.8   2.5
C57         -     -     0     5.1
P6          -     -     -     0


Clustering – Centroid-based

[Figure: scatter plot of the entities P1 to P7 with the cluster centroids marked by x; both axes range from 0 to 10]

- Determine distances between centroids (k,l)
- Merge centroids with the least distance

$d(k,l) = \sqrt{(C_{x_k} - C_{x_l})^2 + (C_{y_k} - C_{y_l})^2}$


Clustering – Centroid-based

[Figure: scatter plot of the entities P1 to P7 with the cluster centroids marked by x; both axes range from 0 to 10]

Distance matrix of the centroids after merging P2 and P4 into C24 and P5 and P7 into C57:

Cluster #   C1    C24   C3    C57   C6
C1          0     7.1   5     6.5   9.2
C24         -     0     2.5   5.4   2.5
C3          -     -     0     3.5   4.3
C57         -     -     -     0     5.5
C6          -     -     -     -     0


Differences between Clustering Algorithms

[Figure: clustering results for the same data set in five panels: Single Linkage, Complete Linkage, Centroid Linkage, K-means, Ward. Axes: X (m) from -3 x 10^4 to 3 x 10^4, Y (m) from -2.5 x 10^4 to 2.5 x 10^4]


Clustering – Discussion

Results
- Exact results (single linkage)
- Non-exact results; often several iterations are necessary (K-means)

Metrics
- Strong impact on clustering results
- Not every metric is suitable for every clustering algorithm
- Decision for one- or multi-criteria metrics (separated or joint clustering)

Selection of Algorithm
- Depends strongly on the structure of the data set and the expected results
  - Some algorithms tend to separate outliers into their own clusters: some large clusters and a lot of very small clusters (complete linkage)
  - Only few algorithms are able to detect also branched, curved or cyclic clusters (single linkage)
  - Some algorithms tend to return clusters with nearly equal size (K-means, Ward)

Quality of clustering results
- the mean variance of the elements in each cluster (affinity parameter) is often used
- In general the homogeneity within clusters and the heterogeneity between clusters can be measured
- However, the quality prediction can be only as good as the quality of the used metric!


Branch and Bound with Underestimates

Application of the A* algorithm to the scheduling problem

Example: scheduling on a 2-processor system (processor A and B)

[Figure: process graph with 4 processes (green: processing times, blue: comm. times) and the search tree; the arcs of both graphs are not recoverable from the transcript]

Nodes of the search tree (node number: decision, estimate):
- 1: 1 -> A, f(1)=17
- 2: 1 -> B, f(2)=17
- 3: 2 -> A, f(3)=22
- 4: 2 -> B, f(4)=18
- 5: 3 -> A, f(5)=18
- 6: 3 -> B, f(6)=22
- 7: 2 -> A, f(7)=25
- 8: 2 -> B, f(8)=18
- 9: 4 -> A, f(9)=24
- 10: 4 -> B, f(10)=18

f(x) = g(x) + h(x)
- g(x): exact value of partial schedule
- h(x): underestimate for remainder

Search is terminated when min {f(x)} is a terminal node (in the search tree)
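The search principle (always expand the node with the smallest f = g + h and stop as soon as that node is terminal) can be sketched on a simplified problem. Unlike the example of the slide, this sketch assumes independent tasks without precedence constraints and without communication times; only the processing times 5, 8, 9 and 3 are borrowed from the example. The underestimate used here is an assumption: no schedule can be shorter than the total work divided by the number of processors, nor shorter than the longest task.

```python
import heapq


def branch_and_bound(durations, processors=2):
    """Best-first search for the assignment with minimal makespan."""
    lower_bound = max(sum(durations) / processors, max(durations))
    # heap entries: (f, number of assigned tasks, load per processor)
    heap = [(lower_bound, 0, (0,) * processors)]
    while heap:
        f, assigned, loads = heapq.heappop(heap)    # node with min f(x)
        if assigned == len(durations):
            return f, loads                         # min f(x) is a terminal node
        for p in range(processors):                 # branch: next task -> processor p
            new_loads = list(loads)
            new_loads[p] += durations[assigned]
            g = max(new_loads)                      # exact value of partial schedule
            h = max(0, lower_bound - g)             # underestimate for remainder
            heapq.heappush(heap, (g + h, assigned + 1, tuple(new_loads)))
    return None


makespan, loads = branch_and_bound([5, 8, 9, 3])
print(makespan, loads)
```

Because h never overestimates the remaining cost, the first terminal node taken from the heap is an optimal solution; partial solutions with a larger f stay on hold in the heap and are never expanded.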


Branch and Bound with Underestimates

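Example: computation of f(3)

[Figure: process graph and search tree with the nodes 1 -> A (f(1)=17), 1 -> B (f(2)=17) and 2 -> A (f(3)=22), plus Gantt charts for processors A and B over a time axis from 0 to 24 for both cases]

case 1: path 1-2-4
- g(x) = 5 + 8 = 13
- h(x) = 3
- f(x) = 16

case 2: path 1-3-4
- g(x) = 5
- h(x) = 5 + 9 + 3 = 17
- f(x) = 22

The larger of the two values determines the estimate: f(3) = 22.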
