Graphs and Trees Graph theory • Purpose: In CS, data can either be linear or nonlinear Nonlinear data is used to depict relationships or a hierarchy A graph is a general way to model nonlinear data Overall goals Definition, examples, properties Representation (will look familiar to you!) Special types of graphs Ways to manipulate and analyze graphs

Upload: eleanor-george

Post on 29-Dec-2015



Graphs and Trees

• Graph theory
• Purpose:
  – In CS, data can either be linear or nonlinear
  – Nonlinear data is used to depict relationships or a hierarchy
  – A graph is a general way to model nonlinear data
• Overall goals
  – Definition, examples, properties
  – Representation (will look familiar to you!)
  – Special types of graphs
  – Ways to manipulate and analyze graphs


Graph

• Not like the graphs you see in calculus.
• These graphs are more like “connect the dots.”
• Many applications in CS:
  – Finite automata and other computer models
  – Networking
  – Compiler design: finding loops, syntax, tracking variable use
  – Data structures and algorithms: sometimes the things we want to manipulate are relationships among data, instead of numbers
• You have seen graphs when evaluating Fibonacci numbers, which creates a “tree” of recursive calls.


Graph

• A nonlinear data structure
• Useful to model any kind of network, or set of relationships
• Questions we may want to ask:
  – How many vertices / edges are there?
  – Does an edge exist from x to y?
  – How far apart are x and y?
  – How many edges are incident on x? (i.e. find the degree)
  – How many nodes are within some distance from x?
  – Is y reachable from x?
  – Is there a systematic way to visit every node and return back to the beginning?


Definition & examples

• A graph has 2 sets: vertices and edges. The purpose of an edge is to “connect” two vertices to make them adjacent to each other.

[Figure: drawings of three example graphs]

           Vertices            Edges
Graph 1    A, B, C             AB, BC, AC
Graph 2    A, B, C, D          AB, AD
Graph 3    A, B, C, D, E, F    AB, BC, CD, DE, EF, FA, AD, BE, CF


Representations

• Adjacency list
  – For each vertex in the graph, we maintain a list (e.g. linked list or array list) of other vertices that are directly connected to this one
• Adjacency matrix
  – A 2-d array
  – The vertices are in some order, such as alphabetical order (A, B, C, …)
  – The entry at a given row / column indicates whether there is an edge or not. (1 or 0)
  – Elegantly handles weighted and directed graphs.
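The two representations can be sketched in a few lines of Python. This is only an illustration: the function name build_adjacency is made up here, and the graph is assumed to be undirected and unweighted.

```python
def build_adjacency(vertices, edges):
    """Return (adjacency list, adjacency matrix) for an undirected graph."""
    index = {v: i for i, v in enumerate(vertices)}    # fixed vertex order
    adj_list = {v: [] for v in vertices}
    matrix = [[0] * len(vertices) for _ in vertices]
    for u, v in edges:
        adj_list[u].append(v)
        adj_list[v].append(u)
        matrix[index[u]][index[v]] = 1
        matrix[index[v]][index[u]] = 1                # undirected: mirror the entry
    return adj_list, matrix


# A small example: vertices A, B, C, D with edges AB and AD.
vertices = ["A", "B", "C", "D"]
edges = [("A", "B"), ("A", "D")]
adj_list, matrix = build_adjacency(vertices, edges)

print(adj_list)
for row in matrix:
    print(row)
```

The list uses space proportional to the number of edges, while the matrix always needs n × n entries but answers "is there an edge from x to y?" with a single lookup.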


Internal rep’n

• Inside the computer, a graph is often represented as an adjacency matrix.
• As a check: for an undirected graph, the matrix should be symmetric.

[Table: 8 × 8 adjacency matrix for vertices A through H. Number of 1 entries per row: A 3, B 3, C 3, D 4, E 3, F 3, G 3, H 2]


Degree sequence

• Degree of vertex = # of edges incident to it
• Degree sequence: list of all degrees
  – For example, the 2nd graph on the “Definition & examples” slide (edges AB, AD): 0, 1, 1, 2
• Tells a lot about a graph (though not everything)
• # edges in graph = ½ (sum of degrees)
• Are these possible degree sequences?
  – 2, 3, 3, 4, 4, 5
  – 2, 3, 4, 4, 5
  – 5, 4, 3, 3, 3, 2
  – 4, 3, 3, 2, 2
  – 1, 1, 3, 3, 5, 5
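One way to test a candidate degree sequence by machine is the Havel-Hakimi procedure. The slides do not name it, so treat this as one possible approach rather than the intended one. The sketch assumes simple graphs (no loops, no parallel edges), and the function name is_graphical is made up here.

```python
def is_graphical(degrees):
    """Havel-Hakimi test: can a simple graph have this degree sequence?"""
    seq = sorted(degrees, reverse=True)
    if sum(seq) % 2 == 1:             # sum of degrees = 2 * (number of edges)
        return False
    while seq and seq[0] > 0:
        d = seq.pop(0)                # vertex with the largest remaining degree
        if d > len(seq):              # not enough other vertices to connect to
            return False
        for i in range(d):            # join it to the d next-largest vertices
            seq[i] -= 1
            if seq[i] < 0:
                return False
        seq.sort(reverse=True)
    return True


for candidate in ([2, 3, 3, 4, 4, 5], [2, 3, 4, 4, 5], [5, 4, 3, 3, 3, 2],
                  [4, 3, 3, 2, 2], [1, 1, 3, 3, 5, 5]):
    print(candidate, is_graphical(candidate))
```

The first check in the function is the "# edges = ½ (sum of degrees)" rule: an odd sum can never work.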


Properties of graphs

• Graph properties
  – Bipartiteness
  – Isomorphic to another graph
  – Pseudograph, multigraph, subgraph
• Path
• Cycle
  – Hamiltonian
  – Euler


Bipartite

• Property that a graph may have
• Useful to find a variable’s usage in a program
• Bipartite = it is possible to partition the set of vertices into 2 subsets, such that within a subset no two vertices are adjacent
  – As a consequence, you won’t see triangles (or any other odd-length cycle) anywhere.
  – To determine: try to partition vertices. Adjacent vertices must go to different camps.
• Are these bipartite?
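The "different camps" test can be sketched as a breadth-first 2-coloring. The graph is assumed to be given as an adjacency list (a dict of neighbor lists); the function name two_color and the two sample graphs are made up for illustration.

```python
from collections import deque


def two_color(adj):
    """Return a camp (0 or 1) for every vertex, or None if not bipartite."""
    camp = {}
    for start in adj:                       # handle every connected component
        if start in camp:
            continue
        camp[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in camp:
                    camp[v] = 1 - camp[u]   # neighbor goes to the other camp
                    queue.append(v)
                elif camp[v] == camp[u]:    # two adjacent vertices in one camp
                    return None
    return camp


square = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B", "D"], "D": ["A", "C"]}
triangle = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
print(two_color(square))     # a valid split into two camps
print(two_color(triangle))   # None: a triangle cannot be split
```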


Isomorphism

• Meaning: same shape
• How can we tell if 2 graphs are essentially the same?
• Definition: the vertices can be put into a 1-1 correspondence that preserves adjacency. (So, after relabeling, both graphs have the same adj matrix.)
• Easier to disprove when not isomorphic. Checklist:
  – Same # vertices and same # edges
  – Same degree sequence
  – Connectedness (both are, or both are not)
  – Existence of cycles
  – Adjacency of conspicuous vertices
  – Consider the complement if e > n(n – 1) / 4
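For very small graphs the definition can be checked directly by trying every correspondence. The sketch below is brute force (n! candidates), assumes simple graphs, and uses made-up names and sample graphs; it is not a practical algorithm for large inputs.

```python
from itertools import permutations


def are_isomorphic(vertices1, edges1, vertices2, edges2):
    """Brute force: try every 1-1 correspondence between the vertex sets."""
    if len(vertices1) != len(vertices2) or len(edges1) != len(edges2):
        return False                                  # quick checklist items
    target = {frozenset(e) for e in edges2}
    for perm in permutations(vertices2):
        mapping = dict(zip(vertices1, perm))          # candidate correspondence
        image = {frozenset((mapping[u], mapping[v])) for u, v in edges1}
        if image == target:                           # adjacency is preserved
            return True
    return False


abcd = ["A", "B", "C", "D"]
wxyz = ["W", "X", "Y", "Z"]
cycle1 = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
cycle2 = [("W", "Y"), ("Y", "X"), ("X", "Z"), ("Z", "W")]
path = [("A", "B"), ("B", "C"), ("C", "D")]
star = [("W", "X"), ("W", "Y"), ("W", "Z")]

print(are_isomorphic(abcd, cycle1, wxyz, cycle2))   # same shape, different labels
print(are_isomorphic(abcd, path, wxyz, star))       # degree sequences differ
```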


Examples

• Isomorphic since A, B, C, D correspond to W, Z, Y, X

• Check out adjacency of degree-2 vertices!
• Question: can a graph be isomorphic to its complement?

[Figure: two graphs, one on vertices A, B, C, D and one on vertices W, X, Y, Z]


How many graphs…

• How many nonisomorphic graphs have 4 vertices and 3 edges?

• For this question, let’s expand the definition of “graph” to include
  – Loop(s) on a single vertex: pseudograph
  – Multiple edges between same pair of vertices: multigraph
• I think the answer is 20. Can you draw all of them?
  – 3 loops (3); 2 loops (5); 1 loop (4)
  – Parallel (3); parallel and loop (2)
  – Simple (3)


Subgraph

• Analogous to a subset
• To obtain a subgraph from an existing graph, feel free to remove edges/vertices. But you can’t remove a vertex if some edge needs it!
• What are the subgraphs of a triangle? It’s convenient to classify by the number of edges.
  – e = 0: You only have vertices. 8 subsets of 3 elements.
  – e = 1: Choose which vertex is not connected. For each case, you can even remove that vertex.
  – e = 2: Just choose which edge not to draw.
  – e = 3: The entire graph itself.
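The triangle count can be double-checked by enumeration. This sketch treats the vertices as labeled (A, B, C are assumed names) and, like the "8 subsets" in the e = 0 case, it includes the empty vertex set.

```python
from itertools import combinations
from collections import Counter

vertices = ["A", "B", "C"]
edges = [("A", "B"), ("B", "C"), ("A", "C")]


def all_subsets(items):
    """Yield every subset of items as a tuple, including the empty one."""
    for size in range(len(items) + 1):
        yield from combinations(items, size)


counts = Counter()
for kept_vertices in all_subsets(vertices):
    for kept_edges in all_subsets(edges):
        # An edge may stay only if both of its endpoints stay.
        if all(u in kept_vertices and v in kept_vertices for u, v in kept_edges):
            counts[len(kept_edges)] += 1

print(dict(counts))           # number of subgraphs, grouped by number of edges
print(sum(counts.values()))   # total number of subgraphs
```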


Paths and Cycles

• A path is simply a sequence of edges that allow us to “travel” from one vertex to another.
  – Formally, each edge’s first vertex must match the second vertex of the previous edge. And analogously, each edge’s second vertex must match the next edge’s first vertex. In other words, you can’t just list the edges in random order.
  – (Should not repeat vertices or edges along the way!)
• Cycle = a path where first vertex = last vertex. (That start/end vertex is the only repeat.)
• There are 2 interesting types of cycles
  – Hamiltonian: passes thru every vertex once
  – Euler: includes every edge once (vertices may be revisited)


Hamiltonian

• Can refer to a path or a cycle that…
  – goes thru every vertex exactly once
  – And, in the case of a cycle, returns home.
• When does a graph contain one?
  – For a cycle, there needs to be a connected subgraph containing every vertex where all vertices have degree 2, such as a polygon.
  – The trick is to remove edges we don’t really need, and recognize edges that are essential.
  – Keeping the essential edges, we may find we are forced to visit a vertex twice.
  – After removing unnecessary edges, the graph may become disconnected.
• *** Examples in book


Euler

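• Can refer to a path or a cycle that…
  – Goes thru every edge exactly once
• Euler showed it was impossible to cross all 7 bridges of Königsberg exactly once each and return to the same point.
• Key: for an Euler cycle in a connected graph, every vertex must have even degree. (An Euler path that does not return home allows exactly 2 vertices of odd degree.)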
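The even-degree test is easy to run on the Königsberg bridges. In this sketch the graph is a list of edges (parallel edges allowed), the land-mass labels N, S, I, E are our own, and connectivity is assumed rather than checked.

```python
from collections import Counter


def has_euler_cycle(edges):
    """Degree test for a connected (multi)graph given as a list of edges."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return all(d % 2 == 0 for d in degree.values())


# Koenigsberg: north bank N, south bank S, island I, east side E; 7 bridges.
konigsberg = [("N", "I"), ("N", "I"), ("S", "I"), ("S", "I"),
              ("N", "E"), ("S", "E"), ("I", "E")]
square = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]

print(has_euler_cycle(konigsberg))   # all four land masses have odd degree
print(has_euler_cycle(square))
```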


Tree

• A Tree is a special kind of graph
  – Definition
  – Why they are used
  – Huffman code
  – Binary search tree
  – Creating a tree using BFS or DFS
  – Mathematical expressions


Tree

• Connected acyclic graph
• If n vertices, then n – 1 edges
• Vertices partitioned into 2 types
  – Internal
  – External (leaf)
• Rooted tree: one specific vertex identified as special
  – Otherwise it’s called a “free” tree
• Important terminology for rooted trees:
  – Parent, child, sibling, ancestor, descendant, (uncle, niece)


Some tree applications

• Any hierarchical classification system
• Structure of a document
• File system
• A method for compressing data: Huffman code
• Efficient data structure: Binary search tree
• Visiting vertices of a graph systematically
• Mathematical expression
• Computer program / Call graph / Find loops
• Depicting relationships among data


Huffman code example

• Suppose you want to send a message, and you know the only letters you need are A, D, E, L, N, P, S.
• A Huffman code might look like this table:

  Letter   A     D     E    L     N      P      S
  Code     001   100   01   101   0001   0000   11

• How would you decode this message? 01110000101001000100110001
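Decoding can be sketched as a left-to-right scan that emits a letter as soon as the bits read so far match a codeword. This works because no codeword in the table is the beginning of another. The function name decode is made up for illustration.

```python
code = {"A": "001", "D": "100", "E": "01", "L": "101",
        "N": "0001", "P": "0000", "S": "11"}


def decode(bits, code):
    """Read bits left to right, emitting a letter whenever a codeword matches."""
    letter_for = {word: letter for letter, word in code.items()}
    message, current = [], ""
    for bit in bits:
        current += bit
        if current in letter_for:       # safe: no codeword is a prefix of another
            message.append(letter_for[current])
            current = ""
    if current:
        raise ValueError("leftover bits: " + current)
    return "".join(message)


print(decode("01110000101001000100110001", code))
```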


How to create code

• We’re given the set of letters used for the message, and their frequencies.
  – Ex. A = 5, B = 10, C = 20, D = 25, E = 30
  – Ex. P = 5, N = 10, D = 10, L = 15, A = 20, S = 20, E = 30
• It’s convenient to arrange the frequencies in order.
• Group the letters in pairs, always looking for the smallest sum of frequencies.
• The resulting structure is a “tree”. Each left arm = “0” in the code; each right arm is a “1”.
• When done, let’s compute average # bits per symbol.
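The pairing procedure can be sketched with a priority queue. The function name huffman_code is made up, and ties between equal frequencies are broken arbitrarily here, so the individual codewords may differ from a hand-drawn tree even though the average number of bits per symbol comes out the same.

```python
import heapq
from itertools import count


def huffman_code(freq):
    """Build a Huffman code: repeatedly merge the two smallest weights."""
    tie = count()                       # tie-breaker so the heap never compares trees
    heap = [(w, next(tie), letter) for letter, w in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tie), (left, right)))
    code = {}

    def walk(node, prefix):
        if isinstance(node, tuple):     # internal vertex: left arm 0, right arm 1
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                           # leaf: a letter
            code[node] = prefix or "0"

    walk(heap[0][2], "")
    return code


freq = {"P": 5, "N": 10, "D": 10, "L": 15, "A": 20, "S": 20, "E": 30}
code = huffman_code(freq)
total_bits = sum(freq[ch] * len(code[ch]) for ch in freq)
print(code)
print(total_bits / sum(freq.values()))   # average bits per symbol
```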


Fun with trees

• Binary search tree
• Tree traversals
  – Breadth first search
  – Depth first search
• Trees to model expressions
• Traversals on binary trees
  – Inorder
  – Preorder
  – Postorder


Binary search tree

• Each vertex in tree has a key value. Can be number or text (ASCII code).
• For each vertex:
  – left child ≤ you ≤ right child
  – Or better yet: all elements in L subtree ≤ you ≤ all elements in R subtree
• How do we …?
  – Find a value that may be in the tree
  – Insert a value
  – Find the highest/lowest key values
  – Find the range of values that can go in a vacant child location
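A minimal sketch of the first three operations follows. The class and function names are made up for illustration, and keys equal to a vertex are sent to the right subtree, which is one convention among several.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key and return the (possibly new) root of the subtree."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root


def find(root, key):
    """Walk down from the root, going left or right by comparison."""
    while root is not None and root.key != key:
        root = root.left if key < root.key else root.right
    return root is not None


def lowest(root):
    while root.left is not None:      # keep going left
        root = root.left
    return root.key


def highest(root):
    while root.right is not None:     # keep going right
        root = root.right
    return root.key


root = None
for key in [50, 30, 70, 20, 40, 60, 80]:
    root = insert(root, key)

print(find(root, 60), find(root, 65), lowest(root), highest(root))
```

In this sample tree, the vacant left child of 60 could hold any key greater than 50 and less than 60: the bounds come from the ancestors passed on the way down.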


Traversing a tree

2 basic strategies

• Breadth-first search (BFS)
  – Start at the root (top) of tree
  – Fan out in all directions simultaneously
  – Good if you think what you’re looking for is near the top.
• Depth-first search (DFS)
  – Start at the root, as usual
  – Go as far as you can down one path of the tree.
  – If not found, back up and try another path.
  – Good if you have an idea what area to search first.


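Example

Suppose we’re looking for file at node 9. We visit the nodes of the tree in the following order.

BFS: 1, 2, 3, 4, 5, 6, 7, 8, 9

DFS: 1, 2, 4 (back up), 5 (back up), 3, 6, 7, 9

[Figure: tree rooted at 1, with nodes 2 and 3 on the second level; 4, 5, 6 on the third; 7 and 8 on the fourth; 9, 10, 11 on the bottom level]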
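Both searches can be sketched with a queue (BFS) and a stack (DFS). The child lists below are reconstructed from the two visiting orders; where nodes 10 and 11 attach is a guess, which does not affect the orders up to node 9. The function names are made up for illustration.

```python
from collections import deque

# Children of each node. The parents of 10 and 11 are assumed, not known.
tree = {1: [2, 3], 2: [4, 5], 3: [6], 4: [], 5: [], 6: [7, 8],
        7: [9, 10], 8: [11], 9: [], 10: [], 11: []}


def bfs_order(tree, root, target):
    """Visit level by level; return the nodes visited up to the target."""
    visited, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        visited.append(node)
        if node == target:
            break
        queue.extend(tree[node])
    return visited


def dfs_order(tree, root, target):
    """Follow one path as deep as possible, backing up at dead ends."""
    visited, stack = [], [root]
    while stack:
        node = stack.pop()
        visited.append(node)
        if node == target:
            break
        stack.extend(reversed(tree[node]))   # leftmost child is explored first
    return visited


print(bfs_order(tree, 1, 9))
print(dfs_order(tree, 1, 9))
```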


Expression as tree

• Arithmetic expression is inherently hierarchical
• We also have linear/text representations.
  – Infix, prefix, postfix
  – Note: prefix and postfix do not need grouping symbols
• Example: (25 – 5) * (6 + 7) + 9 into a tree
  – Which is the last operator performed? This is the root. And we can deduce where left and right subtrees are.
  – Next, for the subtree: (25 – 5) * (6 + 7), last op is the *, so this is the “root” of this subtree.
  – Note: Numbers are leaves; operators are internal. This is why the tree drawing is straightforward.
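The example expression can be written down as a tree and then evaluated or flattened into prefix and postfix form. The nested-tuple encoding (operator, left, right) is just one convenient choice made for this sketch.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

# (operator, left subtree, right subtree); a bare number is a leaf.
expr = ("+", ("*", ("-", 25, 5), ("+", 6, 7)), 9)


def evaluate(node):
    """Evaluate both subtrees, then apply the operator at this vertex."""
    if not isinstance(node, tuple):
        return node
    op, left, right = node
    return OPS[op](evaluate(left), evaluate(right))


def prefix(node):
    """Preorder traversal: operator, left subtree, right subtree."""
    if not isinstance(node, tuple):
        return [str(node)]
    op, left, right = node
    return [op] + prefix(left) + prefix(right)


def postfix(node):
    """Postorder traversal: left subtree, right subtree, operator."""
    if not isinstance(node, tuple):
        return [str(node)]
    op, left, right = node
    return postfix(left) + postfix(right) + [op]


print(evaluate(expr))            # 269
print(" ".join(prefix(expr)))    # + * - 25 5 + 6 7 9
print(" ".join(postfix(expr)))   # 25 5 - 6 7 + * 9 +
```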


Tree & traversal

• Given a (binary) tree, we can find its traversals. √
• How about the other way?
  – Mathematical expression had enough context information that 1 traversal would be enough.
  – But in general, we need 2 traversals, one of them being inorder.
• Example: Draw the binary tree having these traversals.
  Postorder: S C X H R J Q T
  Inorder: S R C H X T J Q
  – Hint: End of the postorder is the root of the tree. Find where the root lies in the inorder. This will show you the 2 subtrees. Continue with each subtree, finding its root and subtrees, etc.
• Exercise: Find 2 distinct binary trees t1 and t2 where preorder(t1) = preorder(t2) and postorder(t1) = postorder(t2).
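The hint translates almost line for line into a recursive function. This sketch assumes every label occurs only once, and the names rebuild and preorder are made up for illustration; printing the preorder is one way to check a hand-drawn answer.

```python
def rebuild(postorder, inorder):
    """Return the tree as (root, left, right) tuples; None is an empty subtree."""
    if not postorder:
        return None
    root = postorder[-1]                      # end of the postorder is the root
    k = inorder.index(root)                   # splits inorder into left | root | right
    left = rebuild(postorder[:k], inorder[:k])
    right = rebuild(postorder[k:-1], inorder[k + 1:])
    return (root, left, right)


def preorder(tree):
    if tree is None:
        return []
    root, left, right = tree
    return [root] + preorder(left) + preorder(right)


tree = rebuild("SCXHRJQT", "SRCHXTJQ")
print(" ".join(preorder(tree)))
```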


Other graph types

• Directed graphs
  – Application: finding a loop in code
• Weighted graph
  – Finding the shortest path
  – Finding the cheapest network


Directed graph

• Also called digraphs
• Each edge has a direction
• Adjacency matrix rep’n: not necessarily symmetric
• Used by compiler to represent control flow; or for analyzing relations
• How do you find a loop?


Where are the loops?

• Not hard for us

• But control-flow data just has sequence of blocks

• Need to gather info about transitions between blocks.

[Figure: control-flow graph with blocks 1 through 7]


Finding loops

• For each block, determine
  – Successors: Where can I go immediately after this block?
  – Predecessors: Where could I have just come from?
  – Dominators: Where must I have been, to reach here?


Example

[Figure: control-flow graph with blocks 1 through 7]

block   pred   succ   dom
1       -      2
2       1,6    3
3       2      4,6
4       3,5    5
5       4      4,6
6       3,5    2,7
7       6      -


Example

[Figure: control-flow graph with blocks 1 through 7]

block   pred   succ   dom
1       -      2      1
2       1,6    3      1,2
3       2      4,6    1,2,3
4       3,5    5      1,2,3,4
5       4      4,6    1,2,3,4,5
6       3,5    2,7    1,2,3,6
7       6      -      1,2,3,6,7


Aha! A loop

• We have a loop whenever a block can say: “One of my successors is also one of my dominators.”

• In other words, I’m going to a place I’ve already been. Hence a back edge, and a loop.
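The table and the loop test can be reproduced with a short sketch. The successor lists are taken from the example table; the iterative computation of dominators (a block’s dominators are itself plus whatever dominates all of its predecessors) is a standard technique that the slides do not spell out, and it assumes every block is reachable from the entry block.

```python
succ = {1: [2], 2: [3], 3: [4, 6], 4: [5], 5: [4, 6], 6: [2, 7], 7: []}


def predecessors(succ):
    pred = {b: [] for b in succ}
    for b, targets in succ.items():
        for t in targets:
            pred[t].append(b)
    return pred


def dominators(succ, entry):
    """Iterate dom(b) = {b} | intersection of dom(p) over predecessors p."""
    pred = predecessors(succ)
    dom = {b: set(succ) for b in succ}        # start from "everything dominates b"
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for b in succ:
            if b == entry or not pred[b]:
                continue
            new = {b} | set.intersection(*(dom[p] for p in pred[b]))
            if new != dom[b]:
                dom[b], changed = new, True
    return dom


def back_edges(succ, entry):
    """Edges b -> s where the successor s is also a dominator of b."""
    dom = dominators(succ, entry)
    return [(b, s) for b in succ for s in succ[b] if s in dom[b]]


print(dominators(succ, 1))
print(back_edges(succ, 1))   # each back edge marks a loop
```

For the example this reports the edges 5 → 4 and 6 → 2, i.e. an inner loop over blocks 4 and 5 and an outer loop headed by block 2.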


Weighted graph

• Each edge is labelled with a number, implying some distance or cost.
• Adjacency matrix stores these values.
  – How do we represent that 2 vertices are not adjacent?
• Two big questions for weighted graph
  – Cheapest path to go from one vertex to another: this is solved by Dijkstra’s algorithm.
  – Cheapest “network”, i.e. minimum spanning tree
• BFS & DFS don’t care about the weight of edges, so we need a different approach.


Adjacency matrix

• The values inside the adjacency matrix have a different meaning if the graph is weighted vs. unweighted.

• Here is what the numbers mean:

Situation               Unweighted           Weighted
Vertices adjacent       1                    Non-zero number
Vertices not adjacent   0                    Infinity
Vertex itself           Not a special case   0


Dijkstra’s algorithm

• How do you find the shortest path in a network?
• General case solved by Edsger Dijkstra, 1959


• Let’s say we want to go from “A” to “Z”.
• The idea is to label each vertex with a number: its best known distance from A. As we work, we may find a cheaper distance, until we “mark” or finalize the vertex.

1. Label A with 0, and mark A.
2. Label A’s neighbors with their distances from A.
3. Find the lowest unmarked vertex and mark it. Let’s call this vertex “B”.
4. Recalculate distances for B’s neighbors via B. Some of these neighbors may now have a shorter known distance.
5. Repeat steps 3 and 4 until you mark Z.

[Figure: weighted graph with edges A–B = 4, A–C = 7, B–C = 2, B–Z = 3, C–Z = 4]


First, we label A with 0. Mark A as final.

The neighbors of A are B and C. Label B = 4 and C = 7.

Now, the unmarked vertices are B=4 and C=7. The lowest of these is B.

Mark B, and recalculate B’s neighbors via B. The unmarked neighbors of B are C and Z.
– If we go to C via B, the total distance is 4 + 2 = 6. This is better than the old distance of 7. So re-label C = 6.
– If we go to Z via B, the total distance is 4 + 3 = 7.


Now, the unmarked vertices are C=6 and Z=7. The lowest of these is C.

Mark C, and recalculate C’s neighbors via C. The only unmarked neighbor of C is Z.
– If we go to Z via C, the total distance is 6 + 4 = 10. This is worse than the current distance to Z, so Z’s label is unchanged.

The only unmarked vertex now is Z, so we mark it and we are done. Its label is the shortest distance from A.


Postscript. I want to clarify something…

The idea is to label each vertex with a number – its best known distance from A. As we work, we may find a cheaper distance, until we “mark” or finalize the vertex.

When you mark a vertex and look to recalculate distances to its neighbors:
– We don’t need to recalculate distance for a vertex if marked. So, only consider unmarked neighbors.
– We only update a vertex’s distance if it is an improvement: if it’s shorter than what we previously had.
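The labeling and marking procedure can be sketched with a priority queue doing the "find the lowest unmarked vertex" step. The graph below is the A, B, C, Z example from the walk-through; the function name dijkstra and the dict-of-dicts layout are choices made for this sketch, and edge weights are assumed to be non-negative.

```python
import heapq

graph = {"A": {"B": 4, "C": 7}, "B": {"A": 4, "C": 2, "Z": 3},
         "C": {"A": 7, "B": 2, "Z": 4}, "Z": {"B": 3, "C": 4}}


def dijkstra(graph, start, goal):
    """Return the shortest distance from start to goal (weights must be >= 0)."""
    best = {start: 0}                 # best known distance (the "label")
    marked = set()                    # finalized vertices
    heap = [(0, start)]
    while heap:
        dist, u = heapq.heappop(heap) # lowest unmarked vertex
        if u in marked:
            continue                  # stale entry for an already-marked vertex
        marked.add(u)
        if u == goal:
            return dist
        for v, w in graph[u].items():
            if v not in marked and dist + w < best.get(v, float("inf")):
                best[v] = dist + w    # only update on an improvement
                heapq.heappush(heap, (best[v], v))
    return float("inf")               # goal not reachable


print(dijkstra(graph, "A", "Z"))   # 7, via B
print(dijkstra(graph, "A", "C"))   # 6, via B
```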


Graph applications

• Shortest paths:
  – Practice Dijkstra’s algorithm
  – Traveling salesman problem
• Cheapest network
  – a.k.a. “minimum spanning tree”
  – Kruskal’s algorithm
  – Prim’s algorithm


Shortest Paths

• Dijkstra’s algorithm: What is the shortest distance between 2 points in a network/graph?
• A related problem: What is the shortest distance for me to visit all the points in the graph and return home? This is called the traveling salesman problem. Nobody knows how to solve this problem efficiently: every known exact method takes exponential time in the worst case, and the obvious one is an exhaustive search over all routes. Open question in CS: why is this problem so hard?


[Figure: weighted graph on vertices A, B, C, D, E with edge weights 8, 6, 6, 4, 3, 2, 4, 9, 5, 12]


Min spanning tree

• MST also known as the shortest network problem
  – Want to connect all vertices with minimum total length of edges.
• Applications
  – Sources of oil need to be connected to pipelines. Want to minimize total mileage.
  – Private telecom networks are billed according to total mileage of the network. Client should not have to pay for phone company’s inefficiency.
• Some Algorithms
  – O. Borůvka (1926): published Czech paper in obscure journal (first known solution)
  – J. B. Kruskal (1956): AT&T Bell Labs
  – R. C. Prim (1957): also from Bell Labs


How to make one

• Kruskal’s algorithm
  – For n vertices, we want the n – 1 cheapest edges that do not form a cycle
  – Repeatedly add edges from low to high, skipping any edge that would close a cycle, until you have added n – 1 edges.
• Prim’s algorithm
  – Start with any vertex
  – Tree grows during algorithm…
  – Add cheapest edge that brings new vertex into tree
• In both cases, make sure you never create a cycle!
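Both algorithms fit in a short sketch. The sample graph reuses the small A, B, C, Z network from the Dijkstra walk-through. The cycle check in the Kruskal version uses a union-find structure, which is an implementation choice not mentioned in the slides, and the function names are made up for illustration.

```python
import heapq


def kruskal(vertices, edges):
    """edges: list of (weight, u, v). Returns the chosen tree edges."""
    parent = {v: v for v in vertices}

    def find(v):                          # representative of v's component
        while parent[v] != v:
            parent[v] = parent[parent[v]] # shorten the path as we go
            v = parent[v]
        return v

    tree = []
    for w, u, v in sorted(edges):         # from low to high
        ru, rv = find(u), find(v)
        if ru != rv:                      # different components: no cycle created
            parent[ru] = rv
            tree.append((w, u, v))
    return tree


def prim(vertices, edges, start):
    """Grow a tree from start, always adding the cheapest edge to a new vertex."""
    adj = {v: [] for v in vertices}
    for w, u, v in edges:
        adj[u].append((w, u, v))
        adj[v].append((w, v, u))
    in_tree, tree = {start}, []
    heap = list(adj[start])
    heapq.heapify(heap)
    while heap and len(in_tree) < len(vertices):
        w, u, v = heapq.heappop(heap)     # cheapest edge leaving the tree so far
        if v in in_tree:
            continue                      # would create a cycle
        in_tree.add(v)
        tree.append((w, u, v))
        for edge in adj[v]:
            heapq.heappush(heap, edge)
    return tree


vertices = ["A", "B", "C", "Z"]
edges = [(4, "A", "B"), (7, "A", "C"), (2, "B", "C"), (3, "B", "Z"), (4, "C", "Z")]

print(kruskal(vertices, edges))
print(prim(vertices, edges, "A"))
```

Both calls pick n – 1 = 3 edges with total weight 9; they may list the edges in a different order.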