8/8/2019 Parallel Graph Colouring Shared Memory
Parallel Graph Colouring Algorithms for Shared-Memory Machines
Ismet Isnaini, B.Eng.
June 2002
Department of Computer Science
The University Of Adelaide,
South Australia
Supervisor: Dr Paul Coddington
Submitted in partial fulfilment of the requirements for the Master's Degree
in Computer Science
Abstract
Graph colouring is useful in many different kinds of applications. The Graph Colouring
Problem (GCP), which is known to be NP-hard, is usually part of a larger computational
problem, so a good solution to the GCP is required. Much research has produced solutions
in the form of sequential algorithms, which are very useful for small-scale graphs. For
large graphs, these sequential algorithms might become a bottleneck in the overall
computation, particularly if the rest of the computation is done in parallel. Hence, a
parallel heuristic is required to reduce the computation time of the GCP.
The lack of research on parallel heuristics for the GCP has motivated us to seek a good
solution to the problem. This project aims to implement and compare a variety of
sequential and parallel algorithms. Moreover, most existing parallel algorithms have been
implemented on distributed-memory machines and typically give little or no speed-up.
Therefore, the algorithms developed here are written using Java Threads and run on
shared-memory machines in order to achieve a good speed-up. The performance of the
different algorithms is compared on graphs of different types and sizes to observe which
algorithm is best for particular types of graphs.
Alhamdulillaahi Rabbil Alamiin
praise is only for Allah who is the Lord of all the Universes
Acknowledgements
I would like to thank my supervisor, Paul Coddington, who has been patient and has given
me a lot of encouragement and guidance throughout the project.
My gratitude and sympathy go to my family overseas and friends here, who always
wish me the best in my study.
My special thanks go to my wife for her understanding and support, and to my two little
daughters . . . seeing them makes me forget the due date of this thesis . . .
Contents
1 Introduction
    1.1 Graph Colouring Problem
    1.2 Motivation
    1.3 Objective
2 Sequential Graph Colouring
    2.1 Common Graph Colouring Algorithms
    2.2 First-Fit (FF)
    2.3 Largest-Degree-First Algorithm (LDF)
    2.4 Smallest-Degree-Last (SDL)
    2.5 Incidence-Degree-Ordering (IDO)
    2.6 Saturation-Degree-Ordering (SDO)
3 Parallel Graph Colouring
    3.1 Parallel Graph Colouring Algorithm
    3.2 Synchronisation
    3.3 Independent Set
        3.3.1 Jones-Plassmann (JP)
        3.3.2 Largest-Degree-First Algorithm (LDF)
    3.4 Non-independent Set
        3.4.1 First Fit (FF)
        3.4.2 Gebremedhin-Manne (GEBMAN)
        3.4.3 Smallest-Degree-Last (SDL)
        3.4.4 Incidence-Degree-Ordering (IDO)
        3.4.5 Saturation-Degree-Ordering (SDO)
    3.5 Balanced Colouring
4 Implementation
    4.1 Previous Work
    4.2 Structure
        4.2.1 Java Thread
        4.2.2 Data Structures
        4.2.3 Class Structure
    4.3 Sequential version
    4.4 Parallel version
        4.4.1 Independent Set Vertices
        4.4.2 Non-Independent Set Vertices
    4.5 Balanced Colouring
5 Performance Measurement and Analysis
    5.1 Experiment conditions
    5.2 Results
        5.2.1 Different types of graphs
        5.2.2 Graphs with same number of vertices and different number of edges
        5.2.3 Different number of processors
    5.3 Balanced Colouring Graph
6 Conclusions and Future Work
    6.1 Conclusions
    6.2 Future Work
List of Tables
5.1 Testing Graphs 1: Random Graph
5.2 Testing Graphs 2: Sparse Matrix
5.3 Speed up of all algorithms
5.4 Time taken for each algorithm (in seconds)
5.5 Number of colours used in the algorithms using 4 processors
5.6 Speed up of each algorithm on Random Graphs
5.7 Time taken (in seconds) by each algorithm for Random Graphs
5.8 Number of colours used in each algorithm for Random Graphs
5.9 Computation time for each algorithm for Graphs of same nodes and different edges
5.10 Number of colours in each algorithm for Graphs of same nodes and different edges
5.11 Computation time for each algorithm in different machines (TITAN)
5.12 Distribution of Colour before balancing for 4 processors using FF Algorithm in 4elt problem
5.13 Distribution of Colour after balancing for 4 processors using FF Algorithm in 4elt problem
5.14 Distribution of Colour before balancing for 4 processors using FF Algorithm in 4elt2 problem
5.15 Distribution of Colour after balancing for 4 processors using FF Algorithm in 4elt2 problem
List of Figures
1.1 Principle of Graph Colouring
2.1 First Fit (FF) Algorithm
2.2 Largest Degree First (LDF) Algorithm
2.3 Largest Degree First (LDF) Algorithm
2.4 Smallest Degree Last (SDL) Algorithm
2.5 Smallest Degree Last (SDL) Algorithm
2.6 Incidence Degree Ordering (IDO) Algorithm
3.1 Incorrect Graph Colouring
3.2 Jones-Plassmann (JP) Algorithm
3.3 Jones-Plassmann (JP) Algorithm
3.4 Largest Degree First (LDF) Algorithm
3.5 Largest Degree First (LDF) Algorithm
3.6 First Fit (FF) Algorithm
3.7 Gebremedhin-Manne (GEBMAN) Algorithm
3.8 Smallest Degree Last (SDL) Algorithm
3.9 Incidence Degree Ordering (IDO) Algorithm
3.10 Balanced Coloured Graph
4.1 Colour Balancing method
5.1 Computation time for 3elt problem
5.2 Computation time for 4elt2 problem
5.3 Speed up for 3elt problem
5.4 Speed up for Random Graph (250 Nodes)
5.5 Computation time for Random Graph (500 Nodes)
5.6 Computation Time for Random Graph (250 Nodes)
5.7 Computation time for Graph of different number of edges
5.8 Computation time for 3elt Graph in Titan
5.9 Speedup for 3elt Graph in Titan
Chapter 1
Introduction
Graph colouring is the process of assigning labels (called colours) to the vertices of an
arbitrary graph such that neighbouring vertices (i.e. those connected by an edge of the
graph) do not have the same colour [8]. In other words, we avoid having two vertices of
the same colour connected by an edge, which usually signifies a relationship between the
vertices. Vertices of the same colour are therefore in some sense independent, which makes
them easier to manipulate, for example by updating them independently in parallel.
Figure 1.1 shows a graph in which no vertex has the same colour as any of its neighbours.
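The defining property above can be stated as a short check. The following is a minimal Java sketch, assuming the graph is stored as adjacency lists (`adj[v]` holds the neighbours of vertex `v`) and colours are non-negative integers. The class and method names are illustrative only and are not taken from the programs described in this thesis.

```java
/** Checks the defining property of a graph colouring on an adjacency-list graph. */
final class ColouringCheck {
    /** adj[v] lists the neighbours of vertex v; colour[v] is the label given to v. */
    static boolean isProper(int[][] adj, int[] colour) {
        for (int v = 0; v < adj.length; v++) {
            for (int u : adj[v]) {
                if (colour[u] == colour[v]) {
                    return false; // both endpoints of an edge share a colour
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[][] triangle = { {1, 2}, {0, 2}, {0, 1} };
        System.out.println(isProper(triangle, new int[] {0, 1, 2})); // prints true
        System.out.println(isProper(triangle, new int[] {0, 1, 0})); // prints false
    }
}
```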
Graph colouring algorithms have been applied in many different kinds of applications.
Timetabling of courses at a university [20], for example, can be viewed as a
graph-colouring application that optimises the allocation of subjects, students, rooms
and lecturers. These entities correspond to vertices in the graph, while the relationships
between the entities are the edges. Hence, for a given time period (colour), the graph
colouring algorithm makes sure there is no clash between rooms, students and lecturers.
The same approach can be applied to the scheduling of flights at airports and the
scheduling of running tasks on a multiprocessor machine. Another application is printed
circuit board testing, in which a graph colouring algorithm was used to check whether any
of the points on the board is short-circuited [9]. In this case, the lines between the
points on the board are the edges, while the points themselves are the vertices. There are
also other applications, such as optimising the solution of sparse Jacobian matrix
problems [6], parallel numerical computation [17] and register allocation [4].
Due to the importance of graph-colouring applications, much research has been conducted
to find the best algorithm for obtaining an optimal graph colouring. Unfortunately,
optimal graph colouring is an NP-hard problem [8], so no efficient algorithm is known for
finding a solution with the minimum number of colours; in practice we can only obtain a
good colouring with a small number of colours. However, this is acceptable for virtually
all applications. Many applications do not require the least number of colours; they may
be more interested in solving the problem in the shortest period of time. In order to
achieve a good solution, there are two main goals in designing the algorithm: first, the
algorithm should use as few colours as possible, and secondly, it should colour all the
vertices in the graph in the shortest period of time [8]. Nevertheless, there is always a
trade-off between these two goals. In some applications, we might need to emphasise
minimum time while allowing a larger number of colours. On the other hand, for some
applications the minimum number of colours is more important than the time constraint.
1.1 Graph Colouring Problem
The terminology in this thesis is defined as follows. Say we have a graph $G = (V, E)$
with vertex set $V$, with the number of vertices $|V|$, and edge set $E$, with the number
of edges $|E|$. Two vertices, $u$ and $v$, in $V$ are said to be adjacent (or neighbours of
each other) if there exists an edge connecting them, $(u, v) \in E$, and the set of
vertices adjacent to $v$ is denoted as $adj(v)$. Every vertex $v$ in the graph has a
degree, $deg(v)$, defined as the number of adjacent vertices, $deg(v) = |adj(v)|$. The
maximum and minimum degree of a vertex in a graph are denoted as $\Delta$ and $\delta$.
In solving the Graph Colouring Problem, we need to form a set of vertices, denoted by
$S \subseteq V$, with the number of vertices in the set $|S|$. An independent set of $V$ is
a set of vertices $S$ such that there is no edge between $u$ and $v$ for all $u, v \in S$.
In contrast, a non-independent set of $V$ is a set of vertices $S$ such that there is an
edge between $u$ and $v$ for some $u, v \in S$. In some algorithms, a vertex $v$ might be
assigned a random number, denoted as $r(v)$, or given a weight, denoted as $w(v)$.
The colour assigned to a vertex $v$ is denoted as $c(v)$, and the total number of colours
among its adjacent vertices is denoted as $nc(v)$.
1.2 Motivation
There has been relatively little research on parallel graph-colouring algorithms, which
has motivated this project to try to find improved algorithms. We have also tried to
achieve a balanced graph colouring, that is, to minimise the number of colours while also
requiring that each processor has approximately the same number of vertices of each
colour. This gives good load balancing when the colouring is used in other parallel
algorithms that operate on the graph.
The fact that there are many good sequential algorithms which have not yet been
parallelized is also one of the reasons behind this project. We have looked at some
well-known sequential algorithms and parallelized them. In most previous work, the
parallel algorithms gained little or no speedup [2, 14]. Jones and Plassmann [14] report
that they did not get any speedup for their algorithm. Most of these algorithms were also
written for distributed-memory machines. Therefore we would like to implement
shared-memory versions of parallel graph colouring algorithms, which hopefully can achieve
reasonably good performance in terms of speedup and the number of colours used.
Figure 1.1: The Principle of Graph Colouring
Recently Gebremedhin and Manne [11] implemented a parallel version of a standard
sequential algorithm and reported a good linear speedup. They also applied their
approach to a better colouring algorithm. This was done on a shared-memory machine
using OpenMP [11]. What we would like to know is whether their algorithm is better
than the other parallel algorithms that have given no speedup, or whether shared-memory
machines are simply better than distributed-memory machines for this particular
application.
This project is an extension of previous work on parallel graph-colouring algorithms by
Allwright et al. [2]. The programs in that work were written in old, non-standard parallel
programming languages. The message-passing programs were written in Express Fortran and
run on an Intel iPSC/860 computer. The data-parallel programs were written in CM-Fortran
and run on a 32-node Thinking Machines CM-5.
1.3 Objective
The aim of this project is to implement a variety of graph-colouring algorithms, both
sequential and parallel. We then compare their performance on different parallel computers
as well as on graphs with different numbers of vertices and edges.
The project concentrates on graph colouring algorithms for shared-memory parallel
computers. The programs are written in Java, which supports the Thread mechanism for
developing parallel programs.
The organisation of this thesis is as follows. Chapter 2 introduces the algorithms
for sequential graph colouring, while Chapter 3 describes the parallel versions of those
sequential algorithms. Chapter 4 describes how these algorithms are implemented in
Java Threads. The results of the experiments comparing different algorithms on graphs
of different types and sizes are presented in Chapter 5. Conclusions are drawn in
Chapter 6, where some future work is also suggested.
Chapter 2
Sequential Graph Colouring
2.1 Common Graph Colouring Algorithms
Many studies have been conducted on sequential graph colouring algorithms. Some of
these algorithms have proven to be quite efficient and reliable, such as
Saturation-Degree-Ordering (SDO) [3], Incidence-Degree-Ordering (IDO) [6],
Smallest-Degree-Last (SDL) [19], Largest-Degree-First (LDF) [26] and First-Fit or Greedy
Ordering [1, 2, 16]. NP-hard problems, such as the timetabling problem [26], can be given
almost optimal solutions when solved using these algorithms.
2.2 First-Fit (FF)
The First-Fit (or Greedy) algorithm is the simplest algorithm of all. It starts by
taking an arbitrary vertex $v$ in the graph $G$ and colouring it with the lowest available
colour (which is obviously 0 at the start). The next step is to take the next vertex
arbitrarily and colour it in the same fashion, until all vertices are coloured, as shown
in Figure 2.1.
for $i = 1$ to $|V|$ do
    find the lowest available colour, $c$, for vertex $v_i$
    set the colour of vertex $v_i$ to $c$
end for
Figure 2.1: First Fit (FF) Algorithm
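The loop in Figure 2.1 can be sketched in Java as follows. This is a minimal illustration, not the implementation used in this project: it assumes an adjacency-list graph, visits the vertices in index order as the "arbitrary" order, and uses -1 to mark an uncoloured vertex. The names `FirstFit`, `lowestAvailable` and `colour` are hypothetical.

```java
import java.util.Arrays;

final class FirstFit {
    /** Lowest colour not used by any already coloured neighbour of v (-1 = uncoloured). */
    static int lowestAvailable(int[][] adj, int[] colour, int v) {
        // A vertex of degree d always finds a free colour in the range 0..d.
        boolean[] used = new boolean[adj[v].length + 1];
        for (int u : adj[v]) {
            int c = colour[u];
            if (c >= 0 && c < used.length) {
                used[c] = true;
            }
        }
        int c = 0;
        while (used[c]) {
            c++;
        }
        return c;
    }

    /** Colours the vertices in index order, giving each the lowest available colour. */
    static int[] colour(int[][] adj) {
        int[] colour = new int[adj.length];
        Arrays.fill(colour, -1);
        for (int v = 0; v < adj.length; v++) {
            colour[v] = lowestAvailable(adj, colour, v);
        }
        return colour;
    }

    public static void main(String[] args) {
        int[][] path = { {1}, {0, 2}, {1, 3}, {2} }; // path 0-1-2-3
        System.out.println(Arrays.toString(colour(path))); // prints [0, 1, 0, 1]
    }
}
```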
2.3 Largest-Degree-First Algorithm (LDF)
The Largest-Degree-First algorithm is described in Figure 2.2 and Figure 2.3. Every
vertex $v$ in a graph is assigned its degree, $deg(v)$, i.e. the total number of
neighbouring vertices connected to that vertex. The algorithm uses $deg(v)$ to determine
which vertex is coloured first. The vertex with the highest degree (among its
neighbouring vertices) is coloured first.
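while (not all vertices in $V$ are coloured)
    for $i = 1$ to $|V|$ do
        if $v_i$ is uncoloured and $deg(v_i) \ge deg(u)$ for all uncoloured $u \in adj(v_i)$
            find lowest colour available, $c$, for vertex $v_i$
            set the colour of vertex $v_i$ to $c$
        end if
    end for
end while
Figure 2.2: Largest Degree First (LDF) Algorithm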
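A minimal Java sketch of the sweep in Figure 2.2 is given below. It rests on one assumption that the figure does not spell out: a vertex is eligible when no uncoloured neighbour has a strictly larger degree, so that sequential sweeps always make progress even when neighbours have equal degree. The names are illustrative and the graph is stored as adjacency lists with -1 marking an uncoloured vertex.

```java
import java.util.Arrays;

final class LargestDegreeFirst {
    /** Lowest colour not used by any already coloured neighbour of v (-1 = uncoloured). */
    static int lowestAvailable(int[][] adj, int[] colour, int v) {
        boolean[] used = new boolean[adj[v].length + 1];
        for (int u : adj[v]) {
            int c = colour[u];
            if (c >= 0 && c < used.length) {
                used[c] = true;
            }
        }
        int c = 0;
        while (used[c]) {
            c++;
        }
        return c;
    }

    /** Repeated sweeps; a vertex is coloured once no uncoloured neighbour has a larger degree. */
    static int[] colour(int[][] adj) {
        int n = adj.length;
        int[] colour = new int[n];
        Arrays.fill(colour, -1);
        int remaining = n;
        while (remaining > 0) {
            for (int v = 0; v < n; v++) {
                if (colour[v] >= 0) {
                    continue; // already coloured in an earlier sweep
                }
                boolean largest = true;
                for (int u : adj[v]) {
                    if (colour[u] < 0 && adj[u].length > adj[v].length) {
                        largest = false; // an uncoloured neighbour must go first
                        break;
                    }
                }
                if (largest) {
                    colour[v] = lowestAvailable(adj, colour, v);
                    remaining--;
                }
            }
        }
        return colour;
    }

    public static void main(String[] args) {
        int[][] path = { {1}, {0, 2}, {1, 3}, {2} }; // degrees 1, 2, 2, 1
        System.out.println(Arrays.toString(colour(path))); // prints [1, 0, 1, 0]
    }
}
```

On the path example the two inner vertices (degree 2) are coloured in the first sweep before the end vertex 0, which has to wait for its larger-degree neighbour.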
2.4 Smallest-Degree-Last (SDL)
The Smallest-Degree-Last (SDL) algorithm, on the other hand, uses a different system
for ordering the vertices. First of all, every vertex $v$ having the lowest degree,
$deg(v) = \delta$, is assigned a weight, $w(v)$, as can be seen in Figure 2.4. This set
of vertices $S$ is then removed from the graph, which affects the degrees of its
neighbours. In the next step, all the vertices that now have degree $\delta$ are again
removed, but are given a successively larger weight. If there is no vertex of degree
$\delta$, the algorithm removes all vertices with degree $\delta + 1$ and assigns them
the next weight. The neighbouring vertices are pushed back to the next weight. The
same step is repeated until all the vertices have been assigned a weight.
The colouring then proceeds as in the LDF algorithm, starting from the highest weight.
The detail of the algorithm is shown in Figure 2.5.
2.5 Incidence-Degree-Ordering (IDO)
The IDO algorithm, shown in Figure 2.6, first identifies the highest degree among the
vertices, $\Delta$, and then selects the set $S$ of vertices with that degree. The set
Figure 2.3: Largest Degree First (LDF) Algorithm
Figure 2.4: The first phase of Smallest Degree Last (SDL) Algorithm
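find the lowest degree of vertex, $\delta$, among all vertices;
$k = 1$;
while (not all vertices in $V$ are weighted)
    for $i = 1$ to $|V|$ do
        if $v_i$ is unweighted and $deg(v_i) \le \delta$
            assign it a weight, $w(v_i) = k$, and remove it from the graph
        end if
    end for
    increase $k$; if no vertex was weighted in this pass, increase $\delta$
end while
while (not all vertices in $V$ are coloured)
    for $i = 1$ to $|V|$ do
        if $w(v_i) = k$
            find lowest colour available, $c$, for vertex $v_i$
            set colour of $v_i$ to $c$
        end if
    end for
    decrease $k$
end while
Figure 2.5: Smallest Degree Last (SDL) Algorithm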
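The two phases of Figure 2.5 can be sketched in Java as follows. This is a hedged illustration under stated assumptions rather than the code used in this project: instead of tracking a degree threshold, each round recomputes the smallest remaining degree and removes every vertex that has it, which has the same effect as the "no vertex of this degree, try the next one" rule described above. All names are hypothetical.

```java
import java.util.Arrays;

final class SmallestDegreeLast {
    /** Lowest colour not used by any already coloured neighbour of v (-1 = uncoloured). */
    static int lowestAvailable(int[][] adj, int[] colour, int v) {
        boolean[] used = new boolean[adj[v].length + 1];
        for (int u : adj[v]) {
            int c = colour[u];
            if (c >= 0 && c < used.length) {
                used[c] = true;
            }
        }
        int c = 0;
        while (used[c]) {
            c++;
        }
        return c;
    }

    /** Phase 1: weight[v] is the round (1, 2, ...) in which v was removed from the graph. */
    static int[] weights(int[][] adj) {
        int n = adj.length;
        int[] weight = new int[n]; // 0 means "not yet weighted"
        int[] degree = new int[n]; // degree in the remaining graph
        for (int v = 0; v < n; v++) {
            degree[v] = adj[v].length;
        }
        int weighted = 0;
        for (int k = 1; weighted < n; k++) {
            int min = Integer.MAX_VALUE;
            for (int v = 0; v < n; v++) {
                if (weight[v] == 0) {
                    min = Math.min(min, degree[v]);
                }
            }
            for (int v = 0; v < n; v++) {
                if (weight[v] == 0 && degree[v] == min) {
                    weight[v] = k;
                    weighted++;
                }
            }
            // Removing this round's vertices lowers the degree of their remaining neighbours.
            for (int v = 0; v < n; v++) {
                if (weight[v] == k) {
                    for (int u : adj[v]) {
                        if (weight[u] == 0) {
                            degree[u]--;
                        }
                    }
                }
            }
        }
        return weight;
    }

    /** Phase 2: colour from the highest weight down to the lowest. */
    static int[] colour(int[][] adj) {
        int n = adj.length;
        int[] weight = weights(adj);
        int maxWeight = 0;
        for (int w : weight) {
            maxWeight = Math.max(maxWeight, w);
        }
        int[] colour = new int[n];
        Arrays.fill(colour, -1);
        for (int k = maxWeight; k >= 1; k--) {
            for (int v = 0; v < n; v++) {
                if (weight[v] == k) {
                    colour[v] = lowestAvailable(adj, colour, v);
                }
            }
        }
        return colour;
    }

    public static void main(String[] args) {
        int[][] path = { {1}, {0, 2}, {1, 3}, {2} };
        System.out.println(Arrays.toString(weights(path))); // prints [1, 2, 2, 1]
        System.out.println(Arrays.toString(colour(path)));  // prints [1, 0, 1, 0]
    }
}
```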
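find the highest degree of vertex, $\Delta$
for $i = 1$ to $|V|$ do
    if $deg(v_i) = \Delta$
        find lowest available colour, $c$, for vertex $v_i$
        set the colour of $v_i$ to $c$
    end if
end for
while (not all vertices in $V$ are coloured)
    for $i = 1$ to $|V|$ do
        get the number of coloured neighbours, $nc(v_i)$
        if $v_i$ is uncoloured and $nc(v_i)$ is the highest among the uncoloured vertices
            find lowest available colour, $c$, for $v_i$
            set colour of $v_i$ to $c$
        end if
    end for
end while
Figure 2.6: Incidence Degree Ordering (IDO)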
is then coloured: each of its members is given the lowest available colour. Once some
vertices are coloured, the algorithm repeatedly selects the vertices that have the highest
incidence degree, i.e. the largest number of coloured neighbours, and colours them with
the lowest available colour $c$. This step is repeated until all the vertices are
coloured.
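The selection rule described above can be sketched in Java as follows. As a simplifying assumption, this sketch colours one vertex at a time instead of sweeping over whole sets: at every step it picks the uncoloured vertex with the most coloured neighbours and breaks ties by larger degree, so the very first pick is a vertex of highest degree. The names are illustrative only.

```java
import java.util.Arrays;

final class IncidenceDegree {
    /** Lowest colour not used by any already coloured neighbour of v (-1 = uncoloured). */
    static int lowestAvailable(int[][] adj, int[] colour, int v) {
        boolean[] used = new boolean[adj[v].length + 1];
        for (int u : adj[v]) {
            int c = colour[u];
            if (c >= 0 && c < used.length) {
                used[c] = true;
            }
        }
        int c = 0;
        while (used[c]) {
            c++;
        }
        return c;
    }

    static int[] colour(int[][] adj) {
        int n = adj.length;
        int[] colour = new int[n];
        Arrays.fill(colour, -1);
        int[] incidence = new int[n]; // number of coloured neighbours of each vertex
        for (int step = 0; step < n; step++) {
            int best = -1;
            for (int v = 0; v < n; v++) {
                if (colour[v] >= 0) {
                    continue;
                }
                if (best < 0
                        || incidence[v] > incidence[best]
                        || (incidence[v] == incidence[best]
                            && adj[v].length > adj[best].length)) {
                    best = v;
                }
            }
            colour[best] = lowestAvailable(adj, colour, best);
            for (int u : adj[best]) {
                incidence[u]++; // best is now a coloured neighbour of u
            }
        }
        return colour;
    }

    public static void main(String[] args) {
        int[][] path = { {1}, {0, 2}, {1, 3}, {2} };
        System.out.println(Arrays.toString(colour(path))); // prints [1, 0, 1, 0]
    }
}
```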
2.6 Saturation-Degree-Ordering (SDO)
Instead of counting the number of coloured neighbours as in IDO, SDO takes into
consideration the number of differently coloured neighbours. Therefore, a vertex $u$
which has $m$ coloured neighbours using only $k$ different colours has the same degree as
a vertex $v$ which has only $k$ coloured neighbours, all of them coloured differently.
The pseudo-code of SDO is the same as that of IDO, except that it counts the number of
differently coloured neighbours. IDO and SDO take much longer than the other colouring
algorithms but usually give a lower number of colours.
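The difference between the two counts can be illustrated with a short Java sketch. It assumes the same one-vertex-at-a-time selection as a simple greedy loop, with the set of distinct neighbour colours recomputed on demand; this is deliberately naive and is not the implementation measured in this thesis. The names `SaturationDegree`, `neighbourColours`, `saturation` and `colour` are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class SaturationDegree {
    /** The distinct colours already used by neighbours of v (-1 = uncoloured). */
    static Set<Integer> neighbourColours(int[][] adj, int[] colour, int v) {
        Set<Integer> seen = new HashSet<>();
        for (int u : adj[v]) {
            if (colour[u] >= 0) {
                seen.add(colour[u]);
            }
        }
        return seen;
    }

    /** Saturation degree: number of different colours among the neighbours of v. */
    static int saturation(int[][] adj, int[] colour, int v) {
        return neighbourColours(adj, colour, v).size();
    }

    static int[] colour(int[][] adj) {
        int n = adj.length;
        int[] colour = new int[n];
        Arrays.fill(colour, -1);
        for (int step = 0; step < n; step++) {
            int best = -1;
            int bestSat = -1;
            for (int v = 0; v < n; v++) {
                if (colour[v] >= 0) {
                    continue;
                }
                int sat = saturation(adj, colour, v);
                // Ties on saturation degree are broken by the larger ordinary degree.
                if (sat > bestSat || (sat == bestSat && adj[v].length > adj[best].length)) {
                    best = v;
                    bestSat = sat;
                }
            }
            Set<Integer> used = neighbourColours(adj, colour, best);
            int c = 0;
            while (used.contains(c)) {
                c++;
            }
            colour[best] = c;
        }
        return colour;
    }

    public static void main(String[] args) {
        // Centre 0 has three coloured neighbours but only two distinct colours.
        int[][] star = { {1, 2, 3}, {0}, {0}, {0} };
        System.out.println(saturation(star, new int[] {-1, 0, 0, 1}, 0)); // prints 2
        int[][] path = { {1}, {0, 2}, {1, 3}, {2} };
        System.out.println(Arrays.toString(colour(path))); // prints [1, 0, 1, 0]
    }
}
```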
Chapter 3
Parallel Graph Colouring
In practice, the graph colouring problem is usually part of a larger computational
problem. If the graph colouring cannot be solved in a relatively short period of time, it
may affect the whole computation [23]. For a small graph, sequential algorithms might be
attractive, but for large graphs the sequential solution might become a bottleneck in the
overall computation. Therefore we need parallel graph colouring algorithms. Even if the
parallel heuristic does not give as good a colouring as the sequential version, it will
reduce the time taken by the overall computation.
Studies on parallel graph colouring algorithms are very limited. Most parallel
algorithms originate from sequential algorithms that were parallelized. The basic
approach to a parallel algorithm is to find an independent set of vertices to be updated
[2]; in other words, the algorithm cannot allow a pair of connected vertices to be
updated simultaneously. One of the first parallel algorithms was written by Luby [18],
called the Maximal Independent Set (MIS) algorithm. The MIS algorithm is based on the
selection of a maximal set of independent vertices, i.e. vertices which are unconnected,
which can then be coloured and removed from the graph. The next step is to look for the
next maximal independent set, and so on, until all vertices have been coloured.
Another parallel algorithm based on independent sets was developed by Jones and
Plassmann [14] (it is not derived from a sequential version). Every vertex in the graph
is assigned a random number. The algorithm then checks each vertex: if none of its
neighbouring vertices has a higher random number, that vertex is coloured. This
selection creates an independent set of vertices that can be coloured in parallel. This
algorithm has some deficiencies. First, the number of colours used by this heuristic is
slightly larger than the number of colours used by the best sequential heuristic.
Secondly, it cannot provide a balanced colouring, i.e. an approximately equal
distribution of colours among the threads, especially for graphs which have highly
variable local structure [11].
Other examples of parallel algorithms are the parallel versions of the two sequential
algorithms (LDF and SDL) described in Section 2.3 and Section 2.4, which were parallelized
by Allwright et al. [2]. They work on the same principle, namely selecting a set of
independent vertices to be coloured in parallel in the next stage.
Gjertsen, Jones and Plassmann worked on improving the earlier Jones-Plassmann algorithm,
trying to fix its deficiencies by introducing two new algorithms, namely Parallel
Deviance Reduction (PDR(k)) and Parallel Largest First (PLF(k)) [15]. These two
algorithms improve the balance of an existing colouring without increasing the number of
required colours.
Research on parallel implementations then halted for quite some time, until recent work
by Gebremedhin and Manne [11] described a parallel algorithm which is suited to
shared-memory programming and gives a linear speed-up on the PRAM model. Another
heuristic developed by the same authors shows an improvement in the number of colours
used. The experiments with these algorithms were done on an SGI Origin 2000. Further work
also shows that their approach is suitable for coarse-grained multithreading [10].
There is also one work implementing a parallel algorithm in Java Threads. Umland [24]
reports a Java implementation of the First-Fit algorithm that gives a reasonable speedup.
Nevertheless, the speedup reported in his paper is not linear: it reaches a maximum of
about 2 and slowly decreases for a high number of threads. Umland uses a pipelined
approach, which is not scalable and has overheads in filling the pipeline.
3.1 Parallel Graph Colouring Algorithm
As discussed in Chapter 1, a graph colouring algorithm finds a set of vertices in a
graph and colours them in such a way that no two neighbouring vertices have the same
colour. If we examine the existing graph colouring algorithms, there are some algorithms
in which the selection of vertices creates an independent set of vertices, while the rest
of the algorithms create a non-independent set of vertices. The algorithms in the first
group are JP and LDF, which select vertices in such a manner that none of the selected
vertices are neighbours. We also need to assign random numbers to vertices to break ties.
The rest of the algorithms, such as SDL, SDO, IDO and FF, use non-independent sets, in
which random numbers might also be required.
The fact that the first group of algorithms produce independent sets of vertices makes
them easy to parallelise. The vertices in the set can be distributed among the processors
and coloured concurrently. Some of the algorithms in the second group can be adapted to
produce an independent set of vertices. For example, the selection of vertices in SDL can
use a random number to break ties between two neighbours having the same weight. However,
there are still some algorithms for which it is quite hard to produce an independent set
of vertices, for example the First-Fit algorithm, due to the way it selects vertices.
Parallel graph colouring algorithms need communication between the processors. Each
processor needs to know the state (e.g. colour number, weighting, random number) of the
neighbours of its vertices, which might be on other processors. All parallel algorithms
need this information, which is why shared-memory machines should be better suited than
distributed-memory machines to this application.
This chapter describes the major component of this project, that is, composing parallel
versions of the sequential algorithms of Chapter 2. In the parallel version, the
vertices in the graph are distributed among a certain number of processors. The
distribution is based on the number of vertices, $|V|$, divided by the number of
processors available, $p$. Hence each processor will colour $|V|/p$ vertices.
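One way to realise this distribution is a contiguous block partition. The Java sketch below is an assumption about how the index ranges could be computed, including the case where the number of vertices is not an exact multiple of the number of processors (the first few threads then receive one extra vertex). The class and method names are hypothetical.

```java
import java.util.Arrays;

final class BlockPartition {
    /** Returns {start, end} (end exclusive) of the vertex block owned by thread t out of p. */
    static int[] block(int n, int p, int t) {
        int base = n / p;   // every thread gets at least this many vertices
        int extra = n % p;  // the first 'extra' threads get one more
        int start = t * base + Math.min(t, extra);
        int end = start + base + (t < extra ? 1 : 0);
        return new int[] { start, end };
    }

    public static void main(String[] args) {
        // 10 vertices on 4 threads: blocks [0,3), [3,6), [6,8), [8,10)
        for (int t = 0; t < 4; t++) {
            System.out.println(t + " -> " + Arrays.toString(block(10, 4, t)));
        }
    }
}
```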
In this chapter, the discussion of the development of the parallel algorithms is divided
according to the approaches discussed above. Section 3.2 discusses the importance of
synchronisation in parallel graph colouring. Section 3.3 then describes the algorithms
which produce sets of independent vertices, namely the Jones-Plassmann (JP) algorithm
[14] and the Largest-Degree-First (LDF) algorithm [26], while Section 3.4 covers the rest
of the algorithms, which use the second approach, namely the First-Fit (FF) algorithm
[1, 2, 16], the Gebremedhin-Manne algorithm [11], the Smallest-Degree-Last (SDL)
algorithm [19], Incidence Degree Ordering (IDO) [6] and Saturation Degree Ordering
(SDO) [3].
3.2 Synchronisation
Synchronisation plays an important role when developing a parallel version of an
algorithm. Proper synchronisation is required at certain stages of the algorithm in order
to minimise the running time and avoid any race condition.
Synchronisation takes place in the following cases. Threads have to be synchronised after
forming the set of independent vertices. For example, after giving a weight to a set of
vertices $S$, a thread has to wait for the other threads that are still running.
Otherwise, the selection of vertices will be wrong.
In most of the algorithms, colouring takes place just after forming the set of vertices
$S$, therefore a synchronisation is required. In the colouring phase, all threads colour
the vertices assigned to them concurrently. A race condition might occur here when two
adjacent vertices owned by two different threads are coloured at the same time with the
same colour. Thread 1, for example, is trying to find the lowest colour available for
vertex $u$, and it looks at $u$'s neighbour, say $v$, which at this stage has not been
coloured yet and is therefore ignored. At the same time, thread 2 is trying to colour
vertex $v$ and is searching for the lowest available colour among $v$'s neighbours, one
of which is $u$, which at this stage has not been coloured yet and is therefore ignored.
Hence, both threads might end up giving both vertices the same colour; in other words,
the colouring is wrong. Figure 3.1 shows how this might happen in a graph colouring on a
4-processor machine.
Figure 3.1: Incorrect Graph Colouring
Therefore, we need to make sure that both threads will not assign the same colour to
both vertices. There are two proposals to correct this. The first proposal is to make
sure that thread 1 colours vertex $u$ either before or after thread 2 colours vertex $v$,
and not at the same time. Therefore, vertex $u$ has to find out whether its neighbour
belongs to another thread or not (since only in this case can the race condition happen).
We also need to call the barrier synchroniser to hold thread 2 back from checking the
lowest available colour until thread 1 has finished colouring vertex $u$. The drawback of
this method is that if conflicts happen a significant number of times, the benefit of
parallelism will not be achieved, since this method uses more resources, both in time and
in memory.
The second proposal is to let those errors happen and afterwards check the whole graph
for adjacent vertices which have been given the same colour. These pairs of vertices are
stored and then fixed sequentially [11].
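The check-and-fix step of the second proposal might look like the following Java sketch. It is an illustration under assumptions, not the procedure of [11] or of this project: for each conflicting edge only the higher-numbered endpoint is recorded, and the recorded vertices are then recoloured one by one with the lowest colour not used by any neighbour. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ConflictRepair {
    /** Lowest colour not used by any neighbour of v. */
    static int lowestAvailable(int[][] adj, int[] colour, int v) {
        boolean[] used = new boolean[adj[v].length + 1];
        for (int u : adj[v]) {
            int c = colour[u];
            if (c >= 0 && c < used.length) {
                used[c] = true;
            }
        }
        int c = 0;
        while (used[c]) {
            c++;
        }
        return c;
    }

    /** Vertices that share a colour with a lower-numbered neighbour. */
    static List<Integer> findConflicts(int[][] adj, int[] colour) {
        List<Integer> conflicts = new ArrayList<>();
        for (int v = 0; v < adj.length; v++) {
            for (int u : adj[v]) {
                if (u < v && colour[u] == colour[v]) {
                    conflicts.add(v);
                    break; // record each vertex once
                }
            }
        }
        return conflicts;
    }

    /** Sequentially recolours every recorded vertex; the result has no conflicts left. */
    static void repair(int[][] adj, int[] colour) {
        for (int v : findConflicts(adj, colour)) {
            colour[v] = lowestAvailable(adj, colour, v);
        }
    }

    public static void main(String[] args) {
        int[][] path = { {1}, {0, 2}, {1, 3}, {2} };
        int[] colour = { 0, 0, 1, 1 }; // edges 0-1 and 2-3 are in conflict
        System.out.println(findConflicts(path, colour)); // prints [1, 3]
        repair(path, colour);
        System.out.println(Arrays.toString(colour)); // prints [0, 2, 1, 0]
    }
}
```

Because each recoloured vertex takes a colour unused by all of its current neighbours, the sequential pass cannot introduce a new conflict, which is why this step is kept sequential in the sketch.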
Another issue that might create problems in the synchronisation is that threads may
perform different numbers of iterations. Once a thread has finished its part in one stage
of the algorithm, the barrier synchroniser tells this thread to wait for the other
threads that are still running their tasks. In these tasks, a thread might need to
synchronise its work with other threads and hence invoke the barrier synchroniser. This
call to the barrier synchroniser might cause the threads that have been put to sleep to
be woken up and to continue with the next step of the algorithm. This would result in an
incorrectly coloured graph.
Nevertheless, synchronisation has a major drawback in terms of speed-up. We must be
very careful in selecting methods or classes of Java in which some of them might be
synchronised and therefore slow down the whole process.
3.3 Independent Set
3.3.1 Jones-Plassmann (JP)
The first phase of this algorithm assigns a random number to every vertex in the graph. The algorithm then forms a set of independent vertices in the following manner: each vertex looks at its neighbours and checks whether it has the highest random number among them. The next step is to colour all these highest vertices with the lowest available colour (one which has not been used by any of their neighbours) and remove them from the graph. The algorithm then chooses the next set of highest (random number) vertices and again colours them in the same manner. Figures 3.2 and 3.3 show how the algorithm works. All threads need to be synchronised once the set of independent vertices has been formed, before moving on to the colouring step. Similarly, once this set has been coloured, all the threads need to be synchronised once more, to avoid any wrong selection of vertices when the following set is formed. The algorithm iterates until all the vertices assigned to each thread are coloured.
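The selection and colouring steps described above can be sketched in Java as follows. This is a minimal single-threaded sketch, not the thesis implementation: the class name `JonesPlassmannSketch`, the adjacency-array representation `adj`, the use of -1 for "uncoloured" and the index-based tie-break are all assumptions made for illustration. The comments mark where a threaded version would call the barrier synchroniser.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

final class JonesPlassmannSketch {

    /** adj[v] lists the neighbours of vertex v; returns one colour (>= 0) per vertex. */
    static int[] colour(int[][] adj, long seed) {
        int n = adj.length;
        Random rng = new Random(seed);
        double[] rnd = new double[n];
        for (int v = 0; v < n; v++) rnd[v] = rng.nextDouble();

        int[] colour = new int[n];
        Arrays.fill(colour, -1);                       // -1 means "not coloured yet"
        int coloured = 0;
        while (coloured < n) {
            // Step 1: uncoloured vertices whose random number beats all uncoloured neighbours.
            List<Integer> independent = new ArrayList<>();
            for (int v = 0; v < n; v++) {
                if (colour[v] == -1 && isLocalMaximum(adj, colour, rnd, v)) independent.add(v);
            }
            // (a threaded version would synchronise all threads here)
            // Step 2: colour the independent set with the lowest available colours.
            for (int v : independent) colour[v] = lowestAvailable(adj, colour, v);
            coloured += independent.size();
            // (a threaded version would synchronise all threads here as well)
        }
        return colour;
    }

    static boolean isLocalMaximum(int[][] adj, int[] colour, double[] rnd, int v) {
        for (int w : adj[v]) {
            if (colour[w] != -1) continue;             // coloured vertices count as removed
            if (rnd[w] > rnd[v] || (rnd[w] == rnd[v] && w > v)) return false;
        }
        return true;
    }

    /** Smallest colour not used by any coloured neighbour of v. */
    static int lowestAvailable(int[][] adj, int[] colour, int v) {
        boolean[] used = new boolean[adj[v].length + 1];
        for (int w : adj[v]) {
            int c = colour[w];
            if (c >= 0 && c < used.length) used[c] = true;
        }
        int c = 0;
        while (used[c]) c++;
        return c;
    }
}
```

Because two adjacent uncoloured vertices can never both hold the highest number, every set formed in step 1 is independent, which is why no checking phase is needed for this family of algorithms.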
assign random number to each vertex v;
while (not all vertices in V are coloured)
    for i = 1 to n do
        if random(v_i) is the highest among the uncoloured neighbours of v_i
        then add v_i to the independent set I
        end if
    end for
    for each vertex v in I do
        find the lowest available colour, c, for vertex v;
        set the colour of vertex v to c;
    end for
    SYNCHRONISE ALL THE THREADS;
end while
Figure 3.2: Jones-Plassmann (JP) Algorithm
3.3.2 Largest-Degree-First Algorithm (LDF)
The basic principle is similar to the sequential version, i.e. to form a set of vertices which have the largest degree, and colour them independently (see Figure 3.4). In the parallel version, the vertices in each thread look at the degree of all their neighbours, even though these might belong to other threads. Any conflict between two vertices having the same degree is resolved by comparing their random numbers. Having formed the set of independent vertices, all the threads need to be synchronised before moving on to the colouring process. This synchronisation is essential for obtaining a correct colouring; without it, two threads might colour two adjacent vertices with the same colour. This could happen when one thread has finished finding the set of independent vertices while the others are still searching. After being synchronised, the colouring phase takes place concurrently (since all the vertices in the set are independent and not connected to each other). Nevertheless, each vertex still has to find out what the lowest available colour is (by looking at the colours of its neighbours). The threads once again need to be synchronised before moving on to the next stage of forming another set of independent vertices; otherwise, in the next step one thread might select vertices which are not coloured yet but are about to be coloured by other still-running threads. Figure 3.5 describes the process of colouring using the parallel LDF method.
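The LDF selection rule (largest degree first, random number to break ties) can be illustrated with a short Java sketch. The names `LdfSelectionSketch`, `beats` and `selectIndependentSet`, the adjacency-array layout and the final tie-break on the vertex index are assumptions of this sketch rather than details taken from the implementation.

```java
import java.util.ArrayList;
import java.util.List;

final class LdfSelectionSketch {

    /** True if v takes priority over w: larger degree first, then larger random number. */
    static boolean beats(int[][] adj, double[] rnd, int v, int w) {
        if (adj[v].length != adj[w].length) return adj[v].length > adj[w].length;
        if (rnd[v] != rnd[w]) return rnd[v] > rnd[w];
        return v > w;                        // last-resort tie-break, assumed for this sketch
    }

    /** Uncoloured vertices (colour -1) that are not beaten by any uncoloured neighbour. */
    static List<Integer> selectIndependentSet(int[][] adj, int[] colour, double[] rnd) {
        List<Integer> set = new ArrayList<>();
        for (int v = 0; v < adj.length; v++) {
            if (colour[v] != -1) continue;
            boolean best = true;
            for (int w : adj[v]) {
                if (colour[w] == -1 && beats(adj, rnd, w, v)) {
                    best = false;
                    break;
                }
            }
            if (best) set.add(v);
        }
        return set;
    }
}
```

In a threaded version each thread would run the outer loop over its own vertices only, while still reading the degrees and random numbers of neighbours owned by other threads.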
3.4 Non-independent Set
The methods below form a non-independent set of vertices and then start colouring it. When applied in parallel, most of these algorithms will give an incorrectly coloured graph. This occurs when two threads happen to access two adjacent vertices at the same time, each looking at the other's colour (which has yet to be assigned), and assign them the same colour. In the previous algorithms this does not happen, since all the vertices in a set are independent. Therefore a step has to be taken either to make sure that, when vertices have neighbours in other threads, their colouring phase is synchronised, or else to fix up those vertices which were assigned the wrong colour after the entire colouring process has finished.
Figure 3.3: The colouring stages in Jones-Plassmann Algorithm
Figure 3.4: Largest Degree First (LDF) Algorithm
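assign random number for each vertex v;
assign vertices to each thread;
while (not all vertices in V are coloured)
    for i = 1 to n do
        if degree(v_i) is higher than the degree of all its uncoloured neighbours
        then add v_i to the independent set I
        else if degree(v_i) equals the highest degree among its uncoloured neighbours
             and random(v_i) is higher than the random number of those neighbours
        then add v_i to the independent set I
        end if
    end for
    for each vertex v in I do
        find the lowest colour available, c, for vertex v;
        set the colour of vertex v to c;
    end for
    synchronise all the threads;
end while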
Figure 3.5: Largest Degree First (LDF) Algorithm
3.4.1 First Fit (FF)
As described in section 2.2, the First Fit algorithm colours the vertices by choosing them in arbitrary order. This also applies in the parallel version, so wrongly coloured vertices might occur here. As described previously, to prevent this from happening we would have to synchronise all threads accessing two adjacent vertices, which would cause a big overhead in the overall computation time. Gebremedhin and Manne [11] introduced a new approach in which we check for any wrongly coloured vertices at the end of the session and give them an appropriate colour afterwards. This part is done sequentially, to ensure that there is no further race condition between the threads. As we can see in Figure 3.6, the threads need only be synchronised once the colouring is done, before the checking commences.
distribute vertices to each thread;
while (not all vertices are coloured)
    select an arbitrary vertex in each thread;
    give them the lowest colour available
    synchronise all threads;
end while
for each thread,
    check if the graph is correct
    if not, store those incorrect vertices
end for
colour incorrect vertices sequentially
Figure 3.6: First Fit (FF) Algorithm
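The colour-then-repair scheme of Figure 3.6 might look roughly like this in Java. This is a hedged sketch under several assumptions: the class name `ParallelFirstFitSketch`, the block distribution of vertices and the adjacency-array layout are illustrative, and the conflict check is done sequentially here for brevity. The unsynchronised reads of the shared `colour` array in phase 1 are deliberate, since they are exactly what can produce conflicting colours; `Thread.join()` serves as the single synchronisation point before checking.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ParallelFirstFitSketch {

    /** Smallest colour not used by any neighbour of v that is currently coloured. */
    static int lowestAvailable(int[][] adj, int[] colour, int v) {
        boolean[] used = new boolean[adj[v].length + 1];
        for (int w : adj[v]) {
            int c = colour[w];                 // may still be -1 if another thread owns w
            if (c >= 0 && c < used.length) used[c] = true;
        }
        int c = 0;
        while (used[c]) c++;
        return c;
    }

    static int[] colour(int[][] adj, int threads) {
        int n = adj.length;
        int[] colour = new int[n];
        Arrays.fill(colour, -1);

        // Phase 1: every thread colours its own block of vertices without locking.
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int lo = (int) ((long) t * n / threads);
            final int hi = (int) ((long) (t + 1) * n / threads);
            workers[t] = new Thread(() -> {
                for (int v = lo; v < hi; v++) colour[v] = lowestAvailable(adj, colour, v);
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();                      // synchronise once the colouring is done
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }

        // Phase 2: store one endpoint of every edge whose endpoints share a colour.
        List<Integer> wrong = new ArrayList<>();
        for (int v = 0; v < n; v++) {
            for (int w : adj[v]) {
                if (v < w && colour[v] == colour[w]) {
                    wrong.add(v);
                    break;
                }
            }
        }
        // Phase 3: re-colour the stored vertices sequentially, so no new conflict can arise.
        for (int v : wrong) colour[v] = lowestAvailable(adj, colour, v);
        return colour;
    }
}
```

Whether any conflicts occur in a given run depends on thread timing, but the sequential repair step leaves a proper colouring either way.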
3.4.2 Gebremedhin-Manne (GEBMAN)
Gebremedhin and Manne developed two algorithms. The first one is basically the implementation of the FF algorithm in parallel. The other version (the GEBMAN algorithm) involves another phase before the checking and correcting stage. The first phase of this algorithm works exactly the same as FF, but the result of the colouring is regarded as a pseudo-colouring. We group the vertices which have the same colour into a ColourClass, starting from 0 up to the highest colour. Hence if the graph has 5 different colours, there will be 5 ColourClasses (see Figure 3.7). The second phase works on the basis that if we re-apply the FF algorithm to the graph and start the colouring with the ColourClass of the highest colour, we will first colour the vertices which are hardest to colour. In this manner, the graph is actually coloured in reverse order [11]. This will hopefully reduce the number of colours.
step 1: colour the graph as in FF
        vertices are coloured from 0 to the highest colour;
step 2:
    for k = highest colour down to 0 do
        distribute ColourClass(k) evenly among the threads;
        for each vertex v in ColourClass(k),
            get the lowest colour available, c, for v
            set the colour of v to c;
        end for
    end for
step 3: same as before: check whether the graph is correct or not
step 4: correct the graph if it is wrong (sequentially)
Figure 3.7: Gebremedhin-Manne (GEBMAN) Algorithm
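The second phase can be sketched sequentially in Java as below. The names `ColourClassRecolourSketch`, `firstFit` and `recolourByClass` are hypothetical, and the sketch assumes that the first-phase colouring is already proper and that the second phase writes into a fresh colour array; the parallel distribution of each ColourClass and the checking steps are left out.

```java
import java.util.Arrays;

final class ColourClassRecolourSketch {

    static int lowestAvailable(int[][] adj, int[] colour, int v) {
        boolean[] used = new boolean[adj[v].length + 1];
        for (int w : adj[v]) {
            int c = colour[w];
            if (c >= 0 && c < used.length) used[c] = true;
        }
        int c = 0;
        while (used[c]) c++;
        return c;
    }

    /** Plain First Fit in index order; stands in for phase 1. */
    static int[] firstFit(int[][] adj) {
        int[] colour = new int[adj.length];
        Arrays.fill(colour, -1);
        for (int v = 0; v < adj.length; v++) colour[v] = lowestAvailable(adj, colour, v);
        return colour;
    }

    /** Phase 2: visit the colour classes of the pseudo-colouring from highest to lowest. */
    static int[] recolourByClass(int[][] adj, int[] pseudo) {
        int highest = -1;
        for (int c : pseudo) highest = Math.max(highest, c);
        int[] colour = new int[adj.length];
        Arrays.fill(colour, -1);
        for (int k = highest; k >= 0; k--) {
            for (int v = 0; v < adj.length; v++) {
                if (pseudo[v] == k) colour[v] = lowestAvailable(adj, colour, v);
            }
        }
        return colour;
    }

    static int coloursUsed(int[] colour) {
        int highest = -1;
        for (int c : colour) highest = Math.max(highest, c);
        return highest + 1;
    }
}
```

When the input colouring is proper, the vertices within one class are mutually non-adjacent, so in this sequential setting the second pass never needs more colours than the first.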
3.4.3 Smallest-Degree-Last (SDL)
The parallel version of SDL, as can be seen in Figure 3.8, is quite similar to its sequential version, except in a few parts. The algorithm determines the lowest vertex degree in the graph and then searches for the vertices that have this degree. The work is distributed over a number of threads, each of which looks for the vertices with this degree and assigns them the lowest weight. This set of vertices is then removed from the graph, and the next iteration finds another set of vertices, with degree less than or equal to the increased degree value, and gives them the next weight. This weighting stage continues until all the vertices have been given a weight.
The next stage is the colouring phase, which starts from the vertices that have been assigned the highest weight and works down to the lowest weight. The colouring phase uses the approach introduced by Gebremedhin and Manne, namely to ignore any wrong colouring at the first stage and correct it later on. The SDL algorithm could also be directed to produce a set of independent vertices by introducing a random number to break ties between two adjacent vertices, similar to parallel LDF.
find the lowest degree of vertex, d, in all vertices;
distribute the vertices into the number of threads;
set the weight w to the lowest weight;
while (not all vertices in V are weighted)
    for each vertex v in this thread do
        if degree(v) <= d
            give v a weight of w
    end for
    increase d;
    w = w + 1;
end while;
SYNCHRONISE ALL THREADS;
while (not all vertices in V are coloured)
    for each vertex v in this thread do
        if weight(v) = w
            find the lowest colour available, c, for vertex v
            set colour of v = c
    end for
    decrease w
    SYNCHRONISE ALL THREADS;
end while;
for each thread
    check if the graph is correct
    if not, store the incorrect vertices
end for
fix up incorrect vertices sequentially
Figure 3.8: Smallest Degree Last (SDL) Algorithm
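The weighting and colouring stages can be illustrated with a sequential Java sketch. It rests on assumptions that are not confirmed by the figure: the minimum degree is recomputed on the remaining graph in every round (the classic smallest-last rule), weights start at 0, and the names `SmallestDegreeLastSketch`, `weights` and `colour` are hypothetical. The thread distribution and the check-and-fix steps are omitted.

```java
import java.util.Arrays;

final class SmallestDegreeLastSketch {

    /** Weighting stage: vertices removed in the same round share a weight. */
    static int[] weights(int[][] adj) {
        int n = adj.length;
        int[] weight = new int[n];
        Arrays.fill(weight, -1);                      // -1 = still in the graph
        int[] degree = new int[n];
        for (int v = 0; v < n; v++) degree[v] = adj[v].length;
        int remaining = n;
        int w = 0;
        while (remaining > 0) {
            int d = Integer.MAX_VALUE;                // lowest degree among remaining vertices
            for (int v = 0; v < n; v++) if (weight[v] == -1) d = Math.min(d, degree[v]);
            for (int v = 0; v < n; v++) {             // first mark the whole round ...
                if (weight[v] == -1 && degree[v] <= d) {
                    weight[v] = w;
                    remaining--;
                }
            }
            for (int v = 0; v < n; v++) {             // ... then remove it from the graph
                if (weight[v] != w) continue;
                for (int u : adj[v]) if (weight[u] == -1) degree[u]--;
            }
            w++;
        }
        return weight;
    }

    /** Colouring stage: highest weight first, lowest available colour for each vertex. */
    static int[] colour(int[][] adj) {
        int[] weight = weights(adj);
        int highest = 0;
        for (int x : weight) highest = Math.max(highest, x);
        int[] colour = new int[adj.length];
        Arrays.fill(colour, -1);
        for (int w = highest; w >= 0; w--) {
            for (int v = 0; v < adj.length; v++) {
                if (weight[v] == w) colour[v] = lowestAvailable(adj, colour, v);
            }
        }
        return colour;
    }

    static int lowestAvailable(int[][] adj, int[] colour, int v) {
        boolean[] used = new boolean[adj[v].length + 1];
        for (int u : adj[v]) {
            int c = colour[u];
            if (c >= 0 && c < used.length) used[c] = true;
        }
        int c = 0;
        while (used[c]) c++;
        return c;
    }
}
```

For a small tree with edges 0-1, 0-2, 1-3, 1-4 and 2-5, the leaves receive weight 0, vertices 1 and 2 weight 1 and the root weight 2, so the root is coloured first.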
3.4.4 Incidence-Degree-Ordering (IDO)
The first part of the parallel IDO algorithm is searching for the highest vertex degree in the whole graph. As Figure 3.9 shows, after this stage the work is done in parallel among a number of threads. Once the first set of vertices has been coloured (with the lowest available colour), we can start with the core of the algorithm, i.e. selecting vertices based on the total number of their coloured neighbours. Each vertex in every thread looks at its neighbours and counts how many of them are coloured, even though a neighbour might belong to another thread. The vertices with the highest counts are then coloured with the lowest available colour. Again the colouring is based on the Gebremedhin and Manne approach. The algorithm iterates until all the vertices are coloured.
find the highest degree of vertex in the graph;
distribute the work over the number of threads;
while (not all vertices coloured)
    ... same as the sequential version
end while
for each thread
    check if the graph is correct
    if not, store the incorrect vertices
end for
fix up those vertices which are incorrect
Figure 3.9: Incidence Degree Ordering (IDO) Algorithm
3.4.5 Saturation-Degree-Ordering (SDO)
There is no significant difference between the parallel versions of IDO and SDO, except that SDO takes into account the number of differently coloured neighbours (which must be less than or equal to the number of coloured neighbours). The algorithm can be seen in Figure 3.9. SDO and IDO are among the best graph colouring algorithms because they give the lowest number of colours. To our knowledge, these algorithms have not been implemented in parallel before; therefore this is the first implementation of parallel versions of IDO and SDO.
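The difference between the two selection measures can be made concrete with a small Java sketch. The class and method names (`DegreeMeasuresSketch`, `incidenceDegree`, `saturationDegree`, `nextVertex`) and the use of -1 for an uncoloured vertex are assumptions for illustration only.

```java
import java.util.HashSet;
import java.util.Set;

final class DegreeMeasuresSketch {

    /** IDO measure: number of neighbours that are already coloured. */
    static int incidenceDegree(int[][] adj, int[] colour, int v) {
        int count = 0;
        for (int w : adj[v]) if (colour[w] != -1) count++;
        return count;
    }

    /** SDO measure: number of distinct colours among the coloured neighbours. */
    static int saturationDegree(int[][] adj, int[] colour, int v) {
        Set<Integer> seen = new HashSet<>();
        for (int w : adj[v]) if (colour[w] != -1) seen.add(colour[w]);
        return seen.size();
    }

    /** Uncoloured vertex with the highest measure, or -1 if every vertex is coloured. */
    static int nextVertex(int[][] adj, int[] colour, boolean useSaturation) {
        int best = -1;
        int bestScore = -1;
        for (int v = 0; v < adj.length; v++) {
            if (colour[v] != -1) continue;
            int score = useSaturation
                    ? saturationDegree(adj, colour, v)
                    : incidenceDegree(adj, colour, v);
            if (score > bestScore) {
                bestScore = score;
                best = v;
            }
        }
        return best;
    }
}
```

A vertex with two coloured neighbours that share one colour has incidence degree 2 but saturation degree 1, which is the only point at which the two orderings diverge.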
3.5 Balanced Colouring
Achieving the fastest time and the lowest number of colours for each algorithm is one of the aims. Another aim of this project is to achieve a balanced graph colouring. There are a few techniques that can be implemented to achieve this. We have looked at two techniques of balanced colouring:
1. Balancing during colouring
Within the colouring phase, every thread should know how many colours the other threads have used so far and how many vertices have each colour. Hence, a public variable is required in the program so that every thread can know the number of vertices of a given colour in other threads. Instead of assigning the lowest colour available, we might then have to give a vertex a higher colour in order to maintain the balance between colours. This might increase the number of colours used. Some extra computation time might also be required to check the colour composition of other threads.
2. Balancing after colouring
We can also colour the graph initially with the lowest colour available, and then check the composition of each colour in every thread. Having this information, we can then sweep through every colour and exchange the colour of a vertex for a higher or lower colour (one which is used by fewer vertices in the whole graph). Here, we also have to make sure that the new colour conforms to the basic requirement of graph colouring, i.e. that none of the neighbours has the same colour.
Figure 3.10: Balanced Coloured Graph
Gjertsen, Jones and Plassmann implemented the second balanced colouring method in their later algorithms and allow several passes over the graph to rebalance it. This is the k factor in their PLF(k) and PDR(k) algorithms [15].
Chapter 4
Implementation
4.1 Previous Work
The Maximal Independent Set (MIS) algorithm by Luby [18] takes an average time of O(log n) using the P-RAM model; however, it was not implemented on a real machine. The next algorithm was introduced by Jones and Plassmann, who reported no speedup for their algorithm, which used PVM on a distributed-memory machine. A further implementation of the JP algorithm was developed by Gjertsen Jr. et al. [15], who developed a set of new algorithms, PLF(k) and PDR(k), which require fewer colours than the older JP algorithm but use slightly more execution time. This work also does not report any speedup for the new algorithms, although they achieve a well-balanced colouring. Allwright et al. [2] parallelised some well-known sequential algorithms such as LDF and SDL, and implemented them on both SIMD and MIMD parallel architectures. Unfortunately, their work also did not achieve any speedup for any of these algorithms.
Most of these algorithms were implemented on distributed-memory machines. There are also some recent works which have implemented the algorithms on shared-memory machines. Umland [24] implemented a parallel version of the First Fit (FF) algorithm using Java Threads on a 4-processor machine and achieved a maximum speedup of 2. Gebremedhin and Manne [11] developed two new algorithms and claimed to have achieved an almost linear speedup as well as an improvement in the number of colours used compared to the standard FF algorithm. Their algorithms were implemented in Fortran 90 using OpenMP on an SGI Origin 2000 supercomputer. Since they only implemented one particular algorithm, namely First Fit, we would like to find out whether their good speedup is due to the algorithm or whether it shows that a shared-memory machine performs better for parallel graph colouring algorithms than a distributed-memory machine.
4.2 Structure
The algorithms in Chapter 2 and Chapter 3 are implemented using Java Threads. Java was selected because an object-oriented programming language, such as Java, is well suited to graph algorithms. Moreover, Java has built-in support for shared-memory parallelism through its Thread class.
4.2.1 Java Thread
A thread is a part of a program which has a beginning, an execution sequence and an end, just like any other sequential program. Multithreading is a mechanism by which we can run several jobs concurrently in one program. Java supports multithreaded programming, in which we can assign several tasks to different threads at the same time. There are two methods of implementing threads in Java [13, 21]:
Subclassing Thread and overriding its run method
The implementation is a subclass of the Thread class and defines a run method that overrides the run method of Thread. The run method is then invoked by calling the start method of the Thread class.
Implementing the Runnable interface
Instead of subclassing the Thread class, we can implement the Runnable interface, which means we have to implement the run method defined in the interface. This is very useful when our class has to subclass a class other than Thread.
In our implementation, we initially chose the second method, with the hope that the class would be generic and could be used by all other classes in our program. Nevertheless, at a later stage of the development, we found that we needed an almost different thread class for every algorithm we developed. Therefore we changed the implementation to use the first method.
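The two ways of creating a thread can be compared side by side in a short Java sketch. The class names `ThreadStyles`, `ColouringThread` and `ColouringTask` are hypothetical, and the body of each `run` method is a placeholder for real colouring work.

```java
final class ThreadStyles {

    /** Method 1: subclass Thread and override run(). */
    static final class ColouringThread extends Thread {
        private final int id;
        private final int[] result;

        ColouringThread(int id, int[] result) {
            this.id = id;
            this.result = result;
        }

        @Override
        public void run() {
            result[id] = id + 1;               // placeholder for the real work
        }
    }

    /** Method 2: implement Runnable and hand the task to a plain Thread. */
    static final class ColouringTask implements Runnable {
        private final int id;
        private final int[] result;

        ColouringTask(int id, int[] result) {
            this.id = id;
            this.result = result;
        }

        @Override
        public void run() {
            result[id] = id + 1;               // placeholder for the real work
        }
    }

    static int[] runBoth() {
        int[] result = new int[2];
        Thread first = new ColouringThread(0, result);
        Thread second = new Thread(new ColouringTask(1, result));
        first.start();                         // start() is what causes run() to be invoked
        second.start();
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result;
    }
}
```

Calling `run()` directly would execute the body in the calling thread; only `start()` creates a new thread of execution.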
4.2.2 Data Structures
Java does not have a graph class and therefore we implemented our own. The class contains the data structure of the graph, which stores the vertices and edges, as well as various methods to access the data in the graph, such as firstNode(), which returns the first Node in the list of vertices, firstEdgefrom(Node n), which returns the first Edge of vertex n, and so on. The class also needs to read an input file either in standard form (for sparse matrix graphs) or in a user-defined format (for the random graphs). Therefore we wrote two separate input parsers to do this.
Initially the graph was stored in a Vector, since a Vector can grow by itself and we do not know in advance how many vertices or edges the input file will have. However, this choice has a major drawback which affects the speedup, since Vector is synchronized. Hence, every time a thread is trying to access a particular vertex in the graph, other threads have to wait until it has finished. This defeated the purpose of parallel programming. We therefore changed the data structure to an array to avoid any synchronisation. This is acceptable since most of the graphs are static. The work of Gortz [12] also shows that using Vector as the data structure causes unnecessary synchronisation.
4.2.3 Class Structure
The algorithms are implemented using Java Threads and organised in such a way that common methods are collected in one class. The algorithms implemented are those discussed in Chapter 2 and Chapter 3.
For every algorithm, a few classes are written:
1. Main file: contains the main method, a method for parsing the graph, and a method for distributing jobs to different threads and starting them, which invokes the run method of the Thread class.
2. Thread class: overrides the run method of the Thread class, which invokes the method in the colouring / algorithm class.
3. Algorithm class: consists of methods to form the set of vertices.
4. Colouring class: contains a method to colour the set of vertices. In simple algorithms, this class is combined with the algorithm class in one class.
On top of these classes, there are also other general classes:
1. Graph Generator: creates input files of random graphs with certain parameters, e.g. the number of vertices, the number of edges, the percentage of edges per vertex.
2. Graph Parser: reads the input file and forms the graph.
3. Barrier Synchronisation: used in the parallel version; contains a method to tell a thread to wait for the other threads until they have finished running (synchronising the threads).
4. Function class: a collection of common methods used in most of the algorithms, for example finding the lowest/highest degree of vertex, finding the lowest colour available, checking the colour balance, etc.
Besides the Graph Generator class used to create random graph input files, we have also developed sets of testing files for different numbers of threads on different machines.
4.3 Sequential version
Six sequential algorithms are implemented, namely Jones-Plassmann (JP), Largest Degree First (LDF), Smallest Degree Last (SDL), Incidence Degree Ordering (IDO), Saturation Degree Ordering (SDO) and First Fit (FF). In order of increasing complexity, these algorithms are FF (the simplest), JP, LDF, SDL, IDO and SDO. All of these algorithms choose the vertex to be coloured following a set of rules. The vertices are then coloured one after another (with the lowest colour available) until all the vertices in the graph are coloured. Note that the JP algorithm does not actually have a sequential version, but we developed one (based on the same principle as the parallel version, i.e. using the highest random number to choose the vertex to be coloured) so that the speedup achieved by the parallel algorithm can be measured.
4.4 Parallel version
The main issue with parallel colouring is that we cannot in general colour nodes independently, otherwise we might get a wrong colouring, i.e. two adjacent vertices having the same colour. In the sequential version, the vertices are coloured one after another, so we can make sure that none of a vertex's neighbours has the same colour. In contrast, parallel colouring requires the colouring to be done simultaneously while at the same time avoiding any mistake in the colouring phase. To achieve this we need a few synchronisation methods at some stages of the program.
We have developed a barrier synchroniser which tells the threads whether they have to wait before executing the next part of the program. To do this, we use two Java methods, namely wait() and notifyAll() (both defined in the Object class), to let other threads know whether the caller wants them to wait or to be released from the waiting queue [13, 7]. Once a thread invokes the wait() method, it waits until another thread calls the notifyAll() method, at which point all the waiting threads are woken up and start executing the next part of the program.
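A barrier built on wait() and notifyAll() could be sketched as follows. This is an illustrative sketch and not the synchroniser used in the implementation: the class name `SimpleBarrier`, its `await` and `demo` methods and the generation counter are assumptions. The generation counter is there so that a thread woken up by a later call, or by a spurious wake-up, does not leave the barrier early.

```java
final class SimpleBarrier {
    private final int parties;
    private int waiting = 0;
    private int generation = 0;

    SimpleBarrier(int parties) {
        this.parties = parties;
    }

    /** Blocks until `parties` threads have called await() for the current generation. */
    synchronized void await() throws InterruptedException {
        int arrivedIn = generation;
        waiting++;
        if (waiting == parties) {
            waiting = 0;
            generation++;                      // open the barrier for this round
            notifyAll();                       // wake every thread waiting on this object
        } else {
            while (generation == arrivedIn) wait();
        }
    }

    /** Two-phase demo: each thread writes its slot, waits, then sums all slots. */
    static int[] demo(int threads) {
        SimpleBarrier barrier = new SimpleBarrier(threads);
        int[] phase1 = new int[threads];
        int[] sums = new int[threads];
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                try {
                    phase1[id] = id + 1;
                    barrier.await();           // nobody reads before everybody has written
                    int s = 0;
                    for (int x : phase1) s += x;
                    sums[id] = s;
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return sums;
    }
}
```

Later Java releases (5 and above) ship `java.util.concurrent.CyclicBarrier`, which provides the same behaviour out of the box.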
The barrier synchroniser is invoked mostly at three places:
1. After the formation of the independent (or non-independent) set of vertices. Nevertheless, this only applies to those algorithms which take into consideration the number of coloured neighbours, such as SDL, SDO and IDO. Other algorithms such as FF, JP and LDF need not be synchronised at this stage. To illustrate the importance of synchronisation, let us look at an example using the SDO algorithm. Say we have two threads, where Thread 1 is faster than Thread 2. Having finished selecting its set of vertices, say S1, for the first iteration, Thread 1 moves on to colouring that set. Thread 2, on the other hand, is still selecting the vertices which have the highest number of differently coloured neighbours, say S2. While Thread 2 is selecting S2, the vertices in S1 (which might be neighbours of vertices in S2) are being coloured. When Thread 2 is selecting S2, a vertex in S1 might not be coloured yet, but it might become coloured just after S2 is formed. Hence the selection of S2 is wrong.
2. After the colouring phase. This synchronisation has basically the same function as the first one, that is, to avoid any possibility of one thread identifying a vertex as part of an independent set while another thread is colouring one of its neighbours.
3. After all the vertices in the graph are coloured and before we check for any wrongly coloured vertices. The reason for this is that an uncoloured vertex would be ignored by the check and might later be given a wrong colour.
4.4.1 Independent Set Vertices
The algorithms in this category produce a correctly coloured graph since all the vertices in the set are independent, and hence no adjacent vertices are given the same colour. The method of finding the lowest colour available plays a very important role in making sure that all the vertices are correctly coloured. Nevertheless, a check is performed at the end of the algorithm for debugging purposes. Since this check is not required by the algorithm, the time it takes is not included in the timing. Synchronisation for this set of algorithms takes place as mentioned above, namely after the grouping of vertices and after the colouring of the set of vertices.
4.4.2 Non-Independent Set Vertices
For each algorithm, the set of vertices is coloured in the order in which it was stored in the collection. Errors of giving the same colour to adjacent vertices are likely to occur during this phase, since the threads are not forced to wait for others to finish colouring (see Figure 3.1). In the implementation, we chose to use the second approach (as in section 4.4), i.e. letting the errors happen and fixing them afterwards. Hence, checking is essential in the later stage of the algorithm, in order to fix the colour of those vertices. The checking of the graph is done in parallel, but the correction is done sequentially in order to avoid any further errors.
Set the Threshold value, t;
Loop over vertices
;
if
and
for to
if vertex
having the colour
Check if colour
exists in the
if not then swap(
)
if
, threshold
then stop
else
iterate until
or
end if
end if
end for
end loop
Figure 4.1: Colour Balancing method
4.5 Balanced Colouring
The balancing method used is the second approach explained in section 3.5, with some modification. The algorithm is described in Figure 4.1. The method first colours the graph as normal, so that the number of colours used is known. The number of vertices of each colour is stored in an array and then compared with the ideal number. The ideal number is defined as the number of vertices per processor divided by the number of colours. In the case where the threads have used different numbers of colours, we use the highest colour among all threads. Vertices which have been given a colour that is used more often than the ideal number have to be re-coloured with another colour that is used less often than the ideal number. This swapping of colours also respects the main rule of graph colouring, namely that the new colour must not belong to any of the adjacent vertices.
We also set a threshold to stop the process of re-colouring in the case where the colour of a vertex cannot be swapped with another colour (since all of the colours already exist in the adjacent vertices). The threshold here is a percentage of the ideal number which we are trying to achieve for every thread. The method keeps checking: if the distribution of a given colour within all the threads is less than the threshold, the iteration of swapping colours is stopped. The drawback of this method is that it sweeps the graph only once and stops even though some of the colours might not be distributed evenly. Ideally, we might need a few sweeps across the graph to re-order the distribution of colours in the case where no further swap can be done. This balancing method is very simple and could be improved in many ways.
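A single balancing sweep of this kind might be sketched in Java as follows. The sketch simplifies the method in several assumed ways: it runs in one thread over the whole graph, so the ideal number is the total number of vertices divided by the number of colours, it leaves out the threshold test, and the names `BalanceSketch`, `counts` and `balanceOnce` are hypothetical.

```java
final class BalanceSketch {

    /** Number of vertices holding each colour. */
    static int[] counts(int[] colour, int numColours) {
        int[] count = new int[numColours];
        for (int c : colour) count[c]++;
        return count;
    }

    static boolean usedByNeighbour(int[][] adj, int[] colour, int v, int k) {
        for (int w : adj[v]) if (colour[w] == k) return true;
        return false;
    }

    /** One sweep: move vertices from over-used colours to under-used, still legal colours. */
    static void balanceOnce(int[][] adj, int[] colour) {
        int numColours = 0;
        for (int c : colour) numColours = Math.max(numColours, c + 1);
        int[] count = counts(colour, numColours);
        double ideal = (double) colour.length / numColours;
        for (int v = 0; v < colour.length; v++) {
            int old = colour[v];
            if (count[old] <= ideal) continue;          // this colour is not over-used
            for (int k = 0; k < numColours; k++) {
                if (count[k] < ideal && !usedByNeighbour(adj, colour, v, k)) {
                    colour[v] = k;                      // swap to the under-used colour
                    count[old]--;
                    count[k]++;
                    break;
                }
            }
        }
    }
}
```

On a six-vertex graph with a single edge 0-1 and the colouring 0, 1, 0, 0, 0, 0, one sweep moves vertices 2 and 3 to colour 1, giving three vertices of each colour; vertex 0 keeps its colour because its neighbour already holds colour 1.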
Chapter 5
Performance Measurement and Analysis
A major component of this project is to observe the performance of the newly developed algorithms and find out whether they gain any speedup in computation time. Most of the previous work has gained little or no speedup. Allwright et al. report that they did not get any speedup [2]. Jones and Plassmann, in the paper in which they describe the JP algorithm, do not report any speedup for their algorithm [14]. The only work that has shown good speedup is that of Gebremedhin and Manne [11], who used a shared-memory machine. This chapter describes the performance of the algorithms we have developed, in terms of running time, speedup gained and the number of colours used in the graph.
5.1 Experiment conditions
The algorithms were tested on a 4-processor shared-memory machine, a Sun E420R (Orion) of the Physics Department, University of Adelaide. Orion is made up of 40 Sun E420R servers, in which each processor is a 450 MHz UltraSPARC II with 4 MB of level 2 cache [5]. These tests were run on a few nodes of the Orion machine. We tried to make sure that no other jobs were running during the execution of the program, in order to obtain reliable results. In the later part of the experiment, we also tested the algorithms on a larger machine, Titan, a 20-processor SGI Power Challenge with 195 MHz MIPS R10000 processors and 2 MB of level 2 cache [25].
The test graphs are of two different types:
Random Graph: We developed a graph generator to produce a random graph with a given number of nodes and a given percentage of edges per node. A few large graphs in the order of several hundred nodes were selected, with different numbers of edges.
Sparse Graph: These were taken from the collection of standard sparse matrix graphs available on the internet [22].
Tables 5.1 and 5.2 show the number of vertices and edges for each test graph. The tests were conducted for each algorithm on 1, 2, 3 and 4 processors, since the E420R has only 4 processors. Any speedup shown in the graphs is the time taken by the sequential version of the algorithm divided by the time taken by the parallel version.
Nodes Edges
250 6062
500 9490
1000 19764
Table 5.1: Testing Graphs 1 : Random Graph
Name Nodes Edges
3elt 4720 27444
4elt2 11143 65636
4elt 15606 91756
Table 5.2: Testing Graphs 2 : Sparse Matrix
5.2 Results
5.2.1 Different types of graphs
Sparse matrix graphs
For the sparse graphs, the algorithms show a good speedup. Tests were conducted on a small graph (3elt) as well as large graphs (4elt and 4elt2). In terms of the time taken to solve the GCP, Figures 5.1 and 5.2 show that the FF algorithm took the smallest amount of time, followed by its close variant, GEBMAN. The IDO and SDO algorithms are the slowest, while JP and LDF are in between. In terms of speedup (Table 5.3), on the other hand, the FF algorithm, being the simplest and fastest algorithm, has a fairly reasonable gain of between 2 and 4, while SDO and IDO, which are the slowest, gain a high speedup of between 5 and 6. This gain might be due to the fact that the sequential versions of these two algorithms are very slow and Orion might have had a heavy load
Figure 5.1: Computation time for 3elt problem
Figure 5.2: Computation time for 4elt2 problem
Figure 5.3: Speed up for 3elt problem
Figure 5.4: Speed up for Random Graph (250 Nodes)
Figure 5.6: Computation Time for Random Graph (250 Nodes)
Figure 5.7: Computation Time for Graph of same nodes (500 Nodes) and different number of Edges
5.2.3 Different number of processors
The algorithms were also tested on another machine called Titan, which has 20 processors but less cache memory. The graph used in the test is the 3elt problem from the sparse matrix collection. If we compare the timing results for Titan in Table 5.11 and for Orion in Table 5.4 for the same number of processors (4), Orion performed better in terms of the time taken, as Orion has faster processors (450 MHz compared to 195 MHz).
No of processors  FF  SDL  GEBMAN  JP  LDF  IDO  SDO
1 922.8 2246.5 1492.4 5022.6 5660.7
2 493.2 1503.2 739.7 2848.2 2775.0
4 264.6 857.5 391.2 1521.0
8 133.4 839.3 220.6 776.0
12 91.2 883.3 192.3 555.3
16 114.7 1095.4 155.5 485.6
Table 5.11: Computation time for each algorithm in different machines (TITAN)
Figure 5.8 shows the overall performance of all the algorithms. In general, the larger the
number of threads used to solve the problem, the smaller the time taken. Nevertheless, the
speedup tends to level off, or even decline, once the number of processors exceeds 12.
Even so, we can say that most of the algorithms scale well to more than 4 processors.
From figure 5.9 we can see that FF has an increasing speedup up to 12 processors,
while the LDF, GEBMAN and JP algorithms have an increasing speedup up to
16 processors. Interestingly, SDL has a low speedup, which might be because the
machine was heavily loaded at the time of the run. The results obtained for 16 processors
might not be reliable either, for the same reason.
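The speedup figures discussed above can be derived directly from the measured times. The following is a minimal sketch, assuming speedup is taken as the one-processor time divided by the p-processor time (the thesis may instead use the sequential program as the baseline) and assuming that the first numeric column of table 5.11 belongs to FF. The class and method names are illustrative only.

```java
import java.util.Locale;

/** Illustrative sketch: speedup and efficiency computed from measured run times. */
final class SpeedupTable {

    /** Speedup on p processors, S(p) = T(1) / T(p). */
    static double speedup(double timeOnOne, double timeOnP) {
        return timeOnOne / timeOnP;
    }

    /** Efficiency, E(p) = S(p) / p; a value of 1.0 would be ideal linear scaling. */
    static double efficiency(double timeOnOne, double timeOnP, int p) {
        return speedup(timeOnOne, timeOnP) / p;
    }

    public static void main(String[] args) {
        int[] processors = {1, 2, 4, 8, 12, 16};
        // First numeric column of table 5.11 (ASSUMED here to be the FF column).
        double[] times = {922.8, 493.2, 264.6, 133.4, 91.2, 114.7};
        for (int i = 0; i < processors.length; i++) {
            System.out.printf(Locale.ROOT, "p=%2d  speedup=%5.2f  efficiency=%4.2f%n",
                    processors[i],
                    speedup(times[0], times[i]),
                    efficiency(times[0], times[i], processors[i]));
        }
    }
}
```

Under these assumptions the speedup peaks at about 10.1 on 12 processors and drops back to about 8.0 on 16, which matches the levelling-off described for FF.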
5.3 Balanced Colouring Graph
The approach to balanced colouring uses the checking-and-fixing method
of section 4.5. Table 5.12 shows the colour distribution for the 4elt problem using the FF
algorithm before we apply the balancing method. Note that the standard deviation
measures how much the number of vertices of a given colour differs between processors.
Hence a large standard deviation indicates that the number of vertices of a given colour
in each thread is unbalanced, or far apart. Table 5.13 shows the improvement obtained by
balanced colouring for the same graph problem. Before balancing, the deviations are large.
After balancing, most of the colours have zero standard deviation (all threads have the
same number of vertices of a given colour); some colours still have a large standard
deviation, but it remains of the order of 10 percent of the ideal number per colour.
The fact that some threads have more vertices of a colour than the rest
(see colours 3 and 5 of table 5.13) is because the balancing
Figure 5.8: Computation Time for 3elt Graph in Titan
Figure 5.9: Speedup for 3elt Graph in Titan
Chapter 6
Conclusions and Future Work
6.1 Conclusions
We have implemented some of the existing sequential graph colouring algorithms, namely
Smallest Degree Last (SDL), Largest Degree First (LDF), First Fit (FF), Saturation
Degree Ordering (SDO) and Incidence Degree Ordering (IDO), in Java. We have also
developed their parallel versions, together with the parallel algorithm of Jones-Plassmann
(JP), for shared-memory machines using Java Threads. The sequential algorithms were
transformed into parallel versions using two different approaches: forming an independent
set, and using a non-independent set. The algorithms were implemented in Java since it
supports parallel programming through its Thread class.
The performance of these algorithms shows that the FF algorithm is the fastest but gives a
larger number of colours. On the other hand, SDO and IDO are the slowest algorithms,
but give a better colouring in terms of the number of colours.
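The trade-off between vertex ordering and colour count can be illustrated with a small sketch. The code below is not the thesis implementation: it is a minimal greedy colouring, assuming the graph is given as symmetric adjacency lists, in which FF visits vertices in index order and LDF first sorts them by decreasing degree. All names are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Illustrative sketch: greedy colouring under First Fit and Largest Degree First orders. */
final class GreedyColouring {

    /** Colours vertices in the given order with the smallest colour unused by neighbours. */
    static int[] colourInOrder(int[][] adj, Integer[] order) {
        int n = adj.length;
        int[] colour = new int[n];
        Arrays.fill(colour, -1);                  // -1 means "not yet coloured"
        boolean[] used = new boolean[n + 1];
        for (int v : order) {
            for (int w : adj[v]) if (colour[w] >= 0) used[colour[w]] = true;
            int c = 0;
            while (used[c]) c++;                  // first free colour
            colour[v] = c;
            for (int w : adj[v]) if (colour[w] >= 0) used[colour[w]] = false;  // reset marks
        }
        return colour;
    }

    private static Integer[] identityOrder(int n) {
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        return order;
    }

    /** FF: vertices are taken in their natural (index) order. */
    static int[] firstFit(int[][] adj) {
        return colourInOrder(adj, identityOrder(adj.length));
    }

    /** LDF: vertices are sorted by decreasing degree before greedy colouring. */
    static int[] largestDegreeFirst(int[][] adj) {
        Integer[] order = identityOrder(adj.length);
        Arrays.sort(order, Comparator.comparingInt((Integer v) -> -adj[v].length));
        return colourInOrder(adj, order);
    }

    /** Number of distinct colours, assuming colours are numbered from 0 without gaps. */
    static int coloursUsed(int[] colour) {
        return Arrays.stream(colour).max().orElse(-1) + 1;
    }

    /** True if no edge joins two vertices of the same colour. */
    static boolean isProper(int[][] adj, int[] colour) {
        for (int v = 0; v < adj.length; v++)
            for (int w : adj[v])
                if (colour[v] == colour[w]) return false;
        return true;
    }
}
```

For example, on the four-vertex path with adjacency lists {2}, {3}, {0, 3}, {2, 1}, firstFit uses three colours whereas largestDegreeFirst uses two. FF skips the sorting step, which is one reason it is cheaper, but the resulting colouring depends entirely on the input order.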
The choice of graph colouring algorithm depends on the sort of problem to be
solved. For problems in which we require the lowest number of colours, IDO and
SDO will probably be the most suitable (even though they are slow). If the application
does not require this, the FF, SDL or JP algorithms will be sufficient (and they are
faster as well).
Most of the algorithms have shown a reasonably good speedup on shared-memory
machines, so we conclude that the improvement comes from shared-memory machines being
better suited than distributed-memory machines to this sort of application, and not from
the algorithms themselves as stated by Gebremedhin and Manne [11].
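To make the shared-memory argument concrete, the following is a rough sketch in the spirit of the Gebremedhin-Manne approach [11] as we understand it: vertices are split into contiguous blocks, each thread colours its block speculatively, conflicts are detected in parallel, and conflicting vertices are recoloured sequentially. It is an assumption-laden illustration, not the code used in this project. The use of AtomicIntegerArray for the shared colour array, the block partition and all names are choices made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicIntegerArray;
import java.util.function.IntConsumer;

/** Illustrative sketch only: speculative block colouring with a sequential fix-up. */
final class SpeculativeColouring {

    /** Smallest colour not currently held by any neighbour of v (first fit). */
    private static int firstFit(int[][] adj, AtomicIntegerArray colour, int v) {
        boolean[] used = new boolean[adj[v].length + 1];
        for (int w : adj[v]) {
            int c = colour.get(w);
            if (c >= 0 && c < used.length) used[c] = true;
        }
        int c = 0;
        while (used[c]) c++;
        return c;
    }

    /** Splits vertices 0..n-1 into contiguous blocks, one thread per block, and waits. */
    private static void runBlocks(int n, int threads, IntConsumer perVertex) {
        int block = (n + threads - 1) / threads;
        List<Thread> pool = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int from = Math.min(n, t * block);
            final int to = Math.min(n, from + block);
            Thread worker = new Thread(() -> {
                for (int v = from; v < to; v++) perVertex.accept(v);
            });
            pool.add(worker);
            worker.start();
        }
        try {
            for (Thread worker : pool) worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while colouring", e);
        }
    }

    /** Returns a proper colouring of an undirected graph given as symmetric adjacency lists. */
    static int[] colourGraph(int[][] adj, int threads) {
        int n = adj.length;
        AtomicIntegerArray colour = new AtomicIntegerArray(n);
        for (int v = 0; v < n; v++) colour.set(v, -1);    // -1 means "not yet coloured"

        // Phase 1: every thread colours its own block. Neighbours in other blocks
        // may be coloured at the same moment, so some conflicts can slip through.
        runBlocks(n, threads, v -> colour.set(v, firstFit(adj, colour, v)));

        // Phase 2: detect conflicts in parallel; flag the higher-numbered endpoint.
        boolean[] conflict = new boolean[n];
        runBlocks(n, threads, v -> {
            for (int w : adj[v]) {
                if (w < v && colour.get(w) == colour.get(v)) conflict[v] = true;
            }
        });

        // Phase 3: recolour the flagged vertices one at a time (sequential fix-up).
        for (int v = 0; v < n; v++) {
            if (conflict[v]) colour.set(v, firstFit(adj, colour, v));
        }

        int[] result = new int[n];
        for (int v = 0; v < n; v++) result[v] = colour.get(v);
        return result;
    }
}
```

On a shared-memory machine the threads read their neighbours' colours directly from the shared array, with no message passing. This is the property that the conclusion above attributes the speedup to.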
This project has also developed parallel implementations of the best sequential
algorithms, namely SDO and IDO, which are the first implementations of such algorithms.
6.2 Future Work
1. There are some algorithms, for example SDL, which can be parallelised by forming
an independent set of vertices. It would be interesting to observe the performance
of both versions: one using a selection of independent vertices, and the other,
implemented here, using the non-independent set (which also uses the
Gebremedhin-Manne approach to fix up the incorrectly coloured vertices). We could then
compare which one is better in terms of time, speedup and the number of colours used.
2. We also recommend that more effort be put into testing on larger machines
as well as larger graphs. The performance of the algorithms should also be tested
against the complexity of the graphs, to classify which algorithms work best for
which types of graphs.
3. There are also other sequential algorithms which are yet to be parallelised and
compared with the existing ones.
4. Balanced colouring is an interesting feature of graph colouring algorithms.
There are other methods, besides the ones mentioned in this paper, which
could be implemented. We strongly recommend improving the balanced colouring
method by allowing it to make several sweeps over the graph in order to refine the
distribution of vertices in each colour class.
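Any refinement of the balancing method proposed in item 4 needs a way to measure balance. A minimal sketch of the measure described in section 5.3 follows, assuming vertices are assigned to threads in contiguous blocks and assuming the population (not sample) standard deviation is intended. The thesis does not state either detail, and the names are hypothetical.

```java
/** Illustrative sketch: per-colour standard deviation of vertex counts across threads. */
final class ColourBalance {

    /** counts[t][c] = number of vertices with colour c in the block of thread t. */
    static int[][] countsPerThread(int[] colour, int threads, int numColours) {
        int n = colour.length;
        int block = (n + threads - 1) / threads;    // ASSUMED contiguous block partition
        int[][] counts = new int[threads][numColours];
        for (int v = 0; v < n; v++) counts[v / block][colour[v]]++;
        return counts;
    }

    /** Population standard deviation of the count of colour c over all threads. */
    static double stdDev(int[][] counts, int c) {
        double mean = 0;
        for (int[] row : counts) mean += row[c];
        mean /= counts.length;
        double sumSq = 0;
        for (int[] row : counts) sumSq += (row[c] - mean) * (row[c] - mean);
        return Math.sqrt(sumSq / counts.length);
    }
}
```

A value of zero for a colour means every thread holds the same number of vertices of that colour. Comparing the values before and after each sweep would show whether an extra sweep is still improving the distribution.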
Bibliography
[1] A. Aho, J. Hopcroft, and J. Ullman. Data Structures and Algorithms. Addison-Wesley
Publishing Company, 1983.
[2] J. R. Allwright, R. Bordawekar, P. Coddington, K. Dincer, and C. L. Martin. A
Comparison of Parallel Graph Colouring Algorithms. Technical Report SCCS-666,
Northeast Parallel Architecture Centre, Syracuse University, 1995.
[3] D. Brelaz. New Methods to Colour the Vertices of a Graph. Communications Of
The ACM, 22:251, 1979.
[4] G. J. Chaitin, M. Auslander, A. K. Chandra, J. Cocke, M. E. Hopkins, and P. Markstein.
Register Allocation via Colouring. Computer Languages, 6:47-57, 1981.
[5] P. Coddington. DHPC Groups Beowulf Cluster Projects.
http://www.dhpc.adelaide.edu.au/projects/beowulf/index.html, accessed online
on 24th June 2002.
[6] T. Coleman and J. J. Moré. Estimation of Sparse Jacobian Matrices and Graph
Colouring Problems. SIAM Journal of Numerical Analysis, 20:187-209, 1983.
[7] EPCC. The Java Grande Forum Multithreaded Benchmarks.
http://www.epcc.ed.ac.uk/javagrande/threads/contents.html, accessed online
on 24th May 2002.
[8] M. Garey and D. Johnson. Computers and Intractability. W.H. Freeman, New
York, 1979.
[9] M. Garey, D. Johnson, and H. C. So. An Application of Graph Colouring to Printed
Circuit Testing. IEEE Transactions On Circuit and Systems, pages 591-599, 1976.
[10] A. Gebremedhin, I. Lassous, J. Gustedt, and J. Telle. Graph Colouring on a Coarse
Grained Multiprocessor. In Proceedings of WG 2000, 26th International Workshop
on Graph-Theoretic Concepts in Computer Science, Germany, 15-17 Jun 2000.
[11] A. H. Gebremedhin and F. Manne. Scalable Parallel Graph Colouring Algorithms.
Concurrency: Practice and Experience, 12:1131-1146, May 2000.