
EE564/CSE554: Error Correcting Codes Spring 2018

Week 11: March 26-30, 2018Lecturer: Viveck R. Cadambe Scribe: Himanshu Sukheja

Disclaimer: These notes have not been subjected to the usual scrutiny reserved for formal publications.They may be distributed outside this class only with the permission of the Instructor.

11.1 Message Passing Algorithms - Some elementary puzzles and algorithms

It turns out that quite a few decoding problems for codes on graphs can be solved by message-passing algorithms, in which simple messages are passed locally among simple processors whose operations lead, after some time, to the solution of the global problem. Message-passing algorithms are useful for more general applications outside of coding theory, specifically for inference in graphical models. We will motivate these algorithms through some simple puzzles.

A guiding principle: all the information node i receives from its other neighbors is "extrinsic" to node j.

11.1.1 Algorithm 1 - Counting Number of Nodes in a Tree

Consider a situation where there are nodes in an undirected (finite) tree, and each node knows only its neighbors. The goal is for each node to be able to count all the nodes in the graph. We give a "message passing" algorithm below, in which each node passes messages to its neighbors and is thereby able to evaluate the total number of nodes.

1. Count your number of neighbors, N.

2. Keep count of the number of messages you have received from your neighbors, m, and of the values v1, v2, . . . , vm of each of those messages. Let V = v1 + · · · + vm be the running total of the messages you have received.

3. If the number of messages you have received, m, is equal to N − 1, then identify the neighbor who has not sent you a message and tell them the number V + 1.


4. If the number of messages you have received, m, is equal to N, then:

• the number V + 1 is the required total;

• for each neighbor n, send neighbor n the number V + 1 − vn.
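The steps above can be simulated directly. Below is a minimal sketch, assuming synchronous rounds in which every node repeatedly checks its message count; the scheduling and the dictionary-based message delivery are implementation choices, not part of the algorithm.

```python
from collections import defaultdict

def count_nodes_by_message_passing(edges):
    """Each node learns the total number of nodes in the tree by
    exchanging messages with its neighbors, following steps 1-4."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    nodes = list(neighbors)
    received = {v: {} for v in nodes}   # received[v][u] = value of message u -> v
    sent = set()                        # directed edges (u, v) already used
    total = {}                          # each node's final answer

    progress = True
    while progress:                     # keep firing until no node can send
        progress = False
        for v in nodes:
            N = len(neighbors[v])       # step 1: number of neighbors
            m = len(received[v])        # step 2: messages received so far
            V = sum(received[v].values())
            if m == N - 1:
                # Step 3: one neighbor hasn't written; tell it V + 1.
                (u,) = neighbors[v] - set(received[v])
                if (v, u) not in sent:
                    received[u][v] = V + 1
                    sent.add((v, u))
                    progress = True
            elif m == N:
                # Step 4: V + 1 is the total; answer neighbor n with V + 1 - vn.
                total[v] = V + 1
                for u in neighbors[v]:
                    if (v, u) not in sent:
                        received[u][v] = V + 1 - received[v][u]
                        sent.add((v, u))
                        progress = True
    return total
```

Leaves (N = 1) satisfy the step-3 condition immediately, so they start the cascade inward; once messages have flowed back out, every node knows the total.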

11.1.2 Algorithm 2 - Path counting problem

A more substantial task than counting nodes is counting the number of paths through a grid, and finding how many paths pass through any given point in the grid.

Messages sent in the forward path

In the above figure we are given a rectangular grid and a path through the grid connecting points A and B. A valid path is one that starts from A and proceeds to B by rightward and downward moves. Two questions arise:

1) How many paths are there from A to B?

2) For an arbitrary node, how many paths go through that node?

Counting all the paths from A to B doesn't seem straightforward. The number of paths is expected to be pretty huge - even if the permitted grid were a diagonal strip only three nodes wide, there would still be about 2^(N/2) possible paths.

For any node i, let P(i) be the number of paths to i from A, and let N(i) be the set of inward (upstream) neighbors of i. Then

P(i) = ∑j∈N(i) P(j)

The key computational insight in this problem is that to find the number of paths, we do not have to enumerate all the paths explicitly. Pick a point P in the grid and consider the number of paths from A to P. Every path from A to P must come into P through one of its upstream neighbors (as in the figure above).

We start by sending the '1' message from A. When any node has received messages from all its upstream neighbors, it sends their sum on to its downstream neighbors. At B, the number 5 emerges: we have counted the number of paths from A to B without enumerating them all.


Messages sent in the forward and backward passes

Probability of passing through a node

By making a backward pass as well as the forward pass, we can deduce how many of the paths go through each node; and if we divide that by the total number of paths, we obtain the probability that a randomly selected path passes through that node. The above figure shows the backward-passing messages in the lower-right corners of the tables, and the original forward-passing messages in the upper-left corners. By multiplying these two numbers at a given vertex we find the total number of paths passing through that vertex.
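Both passes can be sketched in code. The figure's grid (which yields 5 paths) is not fully specified in these notes, so the sketch below assumes a full 3×3 rectangular grid; the logic is exactly the forward/backward message passing described above.

```python
def count_grid_paths(rows, cols):
    """Forward pass: fwd[i][j] = number of rightward/downward paths from
    A = (0, 0) to node (i, j); each node sums its upstream neighbors."""
    fwd = [[0] * cols for _ in range(rows)]
    fwd[0][0] = 1                            # the '1' message sent from A
    for i in range(rows):
        for j in range(cols):
            if i > 0:
                fwd[i][j] += fwd[i - 1][j]   # message from the node above
            if j > 0:
                fwd[i][j] += fwd[i][j - 1]   # message from the node to the left
    # Backward pass: the same recursion, run from B = (rows-1, cols-1).
    bwd = [[0] * cols for _ in range(rows)]
    bwd[rows - 1][cols - 1] = 1
    for i in reversed(range(rows)):
        for j in reversed(range(cols)):
            if i < rows - 1:
                bwd[i][j] += bwd[i + 1][j]
            if j < cols - 1:
                bwd[i][j] += bwd[i][j + 1]
    # Paths through node (i, j) = forward message x backward message.
    through = [[fwd[i][j] * bwd[i][j] for j in range(cols)] for i in range(rows)]
    return fwd, bwd, through
```

On a 3×3 grid the forward message at B is 6, and 4 of those 6 paths pass through the center node; dividing gives the probability 4/6 that a random path visits the center.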

11.1.3 Algorithm 3 - Minimum Sum Algorithm/Viterbi Algorithm

Route diagram for A to B, showing costs associated with the edges

Imagine you wish to travel as fast as possible from A to B. The various possible routes are shown in the figure, along with the cost of traversing each edge in the graph. For example, the route A-I-L-N-B has cost 8. We would like to find the lowest-cost path without explicitly evaluating the cost of all paths. We can do this efficiently by finding, for each node, the cost of the lowest-cost path to that node from A. These quantities can be computed by message passing, starting from node A. The message-passing algorithm is called the min-sum algorithm or Viterbi algorithm.

For brevity, we'll call the cost of the lowest-cost path from node A to node x 'the cost of x'. Each node can broadcast its cost to its descendants once it knows the costs of all its predecessors.


The cost of A is zero. We pass this news to H and I. As the message passes along each edge in the graph, the cost of that edge is added. We find the costs of H and I are 4 and 1 respectively. Similarly, the costs of J and L are found to be 6 and 2 respectively. Out of edge H-K comes the message that a path of cost 5 exists from A to K via H; from edge I-K we learn of an alternative path of cost 3. The min-sum algorithm sets the cost of K equal to the minimum of these, and records which was the smallest-cost route into K by retaining only the edge I-K and pruning away the other edges leading to K. Further iterations of the algorithm reveal that there is a path from A to B with cost 6.

Note: if the min-sum algorithm encounters a tie, where the minimum-cost path to a node is achieved by more than one route to it, the algorithm can pick any of those routes at random.

We can recover this lowest-cost path by backtracking from B, following the trail of surviving edges back to A. We deduce that the lowest-cost path is A-I-K-M-B.

The lowest cost route from A to B
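The procedure above can be sketched as min-sum on a DAG. The edge list below is hypothetical: it is chosen to be consistent with the costs quoted in the text (cost of I = 1, cost of K = 3, best total cost 6 along A-I-K-M-B), not read off the figure, and nodes J, L's full neighborhood, and N are omitted.

```python
from collections import defaultdict

def min_sum_path(edges, source, target):
    """Min-sum / Viterbi on a DAG: each node's cost is the minimum over
    incoming edges of (predecessor cost + edge cost); the surviving
    incoming edge is recorded so the best path can be backtracked."""
    succ, indeg, nodes = defaultdict(list), defaultdict(int), set()
    for u, v, w in edges:
        succ[u].append((v, w))
        indeg[v] += 1
        nodes |= {u, v}
    # Topological order (Kahn's algorithm), so predecessors come first.
    order, frontier = [], [n for n in nodes if indeg[n] == 0]
    while frontier:
        u = frontier.pop()
        order.append(u)
        for v, _ in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                frontier.append(v)
    cost = {n: float('inf') for n in nodes}
    cost[source] = 0                     # the cost of A is zero
    survivor = {}                        # surviving incoming edge per node
    for u in order:
        for v, w in succ[u]:
            if cost[u] + w < cost[v]:    # keep only the cheapest route in
                cost[v] = cost[u] + w
                survivor[v] = u
    # Backtrack from the target along surviving edges.
    path, n = [target], target
    while n != source:
        n = survivor[n]
        path.append(n)
    return cost[target], path[::-1]

# Hypothetical edge costs, consistent with the quantities quoted above.
edges = [("A", "H", 4), ("A", "I", 1), ("H", "K", 1), ("I", "K", 2),
         ("I", "L", 1), ("K", "M", 2), ("L", "B", 7), ("M", "B", 1)]
best_cost, best_path = min_sum_path(edges, "A", "B")
```

Running this reproduces the text's conclusion: cost 6 via A-I-K-M-B.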

Application in decoding: The Viterbi algorithm is used in decoding to determine the most probable path from A to B. Imagine a graph G as above, except that the number on each edge is replaced by the probability p that the path goes through that edge, given that the path goes through the edge's leading vertex. Note that the weights of all outgoing edges from a vertex sum to 1. Note also that the probability of any path is the product of the weights of the edges along that path (Exercise: prove this!). Then finding the most probable path is equivalent to finding the least-cost path on the same graph with each edge weight p replaced by −log p; the cost of a path in the latter graph is the negative logarithm of the probability of the path in the original graph.
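The −log p transformation can be checked numerically. The edge probabilities below are made up for illustration (the outgoing weights from A sum to 1, as required).

```python
import math

# Hypothetical outgoing-edge probabilities on a tiny A -> {H, I} -> B graph.
probs = {("A", "H"): 0.2, ("A", "I"): 0.8,
         ("H", "B"): 1.0, ("I", "B"): 1.0}

# Replace each weight p by -log p: minimizing the sum of costs is then
# the same as maximizing the product of probabilities.
costs = {e: -math.log(p) for e, p in probs.items()}

# Path probability = product of edge probs; path cost = sum of -log p.
p_path = probs[("A", "I")] * probs[("I", "B")]
c_path = costs[("A", "I")] + costs[("I", "B")]
assert abs(c_path - (-math.log(p_path))) < 1e-12
```

Because −log is decreasing, the most probable path is exactly the least-cost path in the transformed graph.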

11.2 Directed Graphical Model

A directed graphical model consists of a directed acyclic graph G = (V, E), where the vertices in V represent random variables and the directed edges (arrows) satisfy E ⊆ V × V. (The notation (i, j) ∈ E means that there is a directed edge from i to j.)

Directed graphs define families of distributions which factor by functions of nodes and their parents. In particular, we assign to each node i a random variable xi and a non-negative function fi(xi, xπi) such that

∑xi∈X fi(xi, xπi) = 1,

and the joint distribution factors as

∏i fi(xi, xπi) = p(x1, . . . , xn)

where πi denotes the set of parents of node i. Assuming the graph is acyclic (has no directed cycles), we must have fi(xi, xπi) = pxi|xπi(xi|xπi), i.e. fi(·, ·) represents the conditional probability distribution of


xi conditioned on its parents. If the graph has a cycle, then there is no consistent way to assign conditional probability distributions along the cycle. Therefore, we assume that all directed graphical models are over directed acyclic graphs (DAGs).

G = (V, E), V = {x1, x2, . . . , xn}

In general, by the chain rule, the joint distribution of any n random variables x1, x2, . . . , xn can be written as

px1,...,xn(x1, . . . , xn) = px1(x1) px2|x1(x2|x1) · · · pxn|x1,...,xn−1(xn|x1, . . . , xn−1).

From the above figure we can deduce that

P(A,B,C,D,E) = P(A)P(B|A)P(C|A,B)P(D|A,B,C)P(E|A,B,C,D)

= P(A)P(B)P(C|A,B)P(D|B)P(E|C),

where the second line uses the conditional independencies encoded by the graph.
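The factorization P(A)P(B)P(C|A,B)P(D|B)P(E|C) can be checked numerically to define a valid joint distribution. The conditional probability tables below are hypothetical, chosen only so that each conditional sums to 1.

```python
from itertools import product

# Hypothetical CPTs for binary variables A..E, matching the factorization
# P(A) P(B) P(C|A,B) P(D|B) P(E|C) read off the example graph.
pA = {0: 0.6, 1: 0.4}
pB = {0: 0.3, 1: 0.7}
pC = {(a, b): {0: 0.5 + 0.1 * a - 0.2 * b, 1: 0.5 - 0.1 * a + 0.2 * b}
      for a in (0, 1) for b in (0, 1)}
pD = {b: {0: 0.8 - 0.3 * b, 1: 0.2 + 0.3 * b} for b in (0, 1)}
pE = {c: {0: 0.1 + 0.6 * c, 1: 0.9 - 0.6 * c} for c in (0, 1)}

def joint(a, b, c, d, e):
    """Joint probability via the graph's factorization."""
    return pA[a] * pB[b] * pC[(a, b)][c] * pD[b][d] * pE[c][e]

# Because each factor is a valid conditional, the joint sums to 1.
total = sum(joint(*x) for x in product((0, 1), repeat=5))
assert abs(total - 1.0) < 1e-12
```

Each node contributes one factor conditioned only on its parents, which is exactly how the graph prunes the full chain-rule expansion.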

11.2.1 Factor Graphs

Factor graphs can capture structure that traditional directed (and undirected) graphical models cannot. A factor graph consists of a vector of random variables x = (x1, . . . , xN) and a graph G = (V, E, F) which, in addition to the variable nodes, also has factor nodes F. Furthermore, the graph is constrained to be bipartite between variable nodes and factor nodes.

The joint probability distribution associated with a factor graph is given by

p(x1, . . . , xN) = (1/Z) ∏j=1..m fj(xfj),

where xfj denotes the variables adjacent to factor node fj and Z is a normalizing constant.


A general factor graph

For example, in the above figure f1 is a function of x1 and x2.

For a binary code such as an LDPC code, the Tanner graph corresponds to its factor graph, e.g., with parity checks

x1 + x2 = 0 ; x3 + x4 = 0 ; x5 + x6 = 0 ; . . . ; xn−1 + xn = 0

Also, for x1, x2, . . . , xn distributed as per a graphical model:

• Computing a marginal pxi(xi) (marginalization) is done by the sum-product algorithm.

• Computing max~x P(~x) is done by the max-sum algorithm.

11.3 Sum-Product Algorithm on trees

We can obtain the marginals for every node in the graph by computing 2(N − 1) messages, one for each direction along each edge. When computing the message mi→j(xj), we need the incoming messages mk→i(xi) from the other neighbors k ∈ N(i) \ {j} of node i.

px1,x2,...,xn(x1, . . . , xn) = ∏i=1..n φi(xi) fi(xi, xπi) [where πi → parent of node i]

Note that in a tree, every node has at most one parent, so all factors have two arguments. The terms φi(xi), known as potentials, make it convenient to calculate marginals and conditional marginal distributions in many applications. In the example graph (x1 the root with children x2 and x3, and x4, x5 the children of x2), note that

px1,...,x5(x1, x2, x3, x4, x5) = φ1(x1)φ2(x2)φ3(x3)φ4(x4)φ5(x5) f2(x2, x1) f3(x3, x1) f4(x4, x2) f5(x5, x2)

Our goal is to calculate

px1(x1) = ∑x2,x3,x4,x5 φ1(x1)φ2(x2)φ3(x3)φ4(x4)φ5(x5) f4(x4, x2) f5(x5, x2) f2(x2, x1) f3(x3, x1)

We compute the following messages from the bottom of the tree rooted at x1.

m4→2(x2) = ∑x4 φ4(x4) f4(x4, x2)

m5→2(x2) = ∑x5 φ5(x5) f5(x5, x2)

Marginalizing out x4 and x5, we can write px1,x2,x3(x1, x2, x3) = φ1(x1)φ2(x2)φ3(x3) f2(x2, x1) f3(x3, x1) m4→2(x2) m5→2(x2)


Similarly, m2→1(x1) = ∑x2 φ2(x2) f2(x2, x1) m4→2(x2) m5→2(x2)

Hence, with m3→1(x1) = ∑x3 φ3(x3) f3(x3, x1), we can compute

px1(x1) = φ1(x1) m2→1(x1) m3→1(x1)

Downward messages are computed similarly, e.g.

m1→3(x3) = ∑x1 φ1(x1) f3(x3, x1) m2→1(x1)

In general, in the sum-product algorithm, messages can be computed using the following rule:

mi→j(xj) = ∑xi φi(xi) fij(xi, xj) ∏k∈N(i)\{j} mk→i(xi)

The marginal for each variable is obtained using the formula:

pxi(xi) = φi(xi) ∏j∈N(i) mj→i(xi)
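The whole derivation can be checked numerically on the five-node tree of the example, comparing the message-passing marginal at the root x1 against brute-force summation. The potentials and pairwise factors below are random positive numbers, for illustration only.

```python
from itertools import product
import random

random.seed(0)
# The example tree: x2, x3 hang off the root x1; x4, x5 hang off x2.
X = (0, 1)                                  # binary variables
phi = {i: {x: random.uniform(0.1, 1.0) for x in X} for i in range(1, 6)}
edges = [(2, 1), (3, 1), (4, 2), (5, 2)]    # (child, parent)
f = {e: {(xc, xp): random.uniform(0.1, 1.0) for xc in X for xp in X}
     for e in edges}

def joint(x):                               # x = (x1, ..., x5), unnormalized
    p = 1.0
    for i in range(1, 6):
        p *= phi[i][x[i - 1]]
    for (c, pa) in edges:
        p *= f[(c, pa)][(x[c - 1], x[pa - 1])]
    return p

def msg(c, pa, xp):
    """Upward message m_{c -> pa}(x_pa), exactly as in the derivation."""
    total = 0.0
    for xc in X:
        t = phi[c][xc] * f[(c, pa)][(xc, xp)]
        for (cc, ppa) in edges:
            if ppa == c:                    # messages from c's own children
                t *= msg(cc, c, xc)
        total += t
    return total

# Marginal at the root: p(x1) ∝ φ1(x1) m_{2->1}(x1) m_{3->1}(x1).
unnorm = {x1: phi[1][x1] * msg(2, 1, x1) * msg(3, 1, x1) for x1 in X}
Z = sum(unnorm.values())
marg_mp = {x1: unnorm[x1] / Z for x1 in X}

# Brute-force check: sum the joint over x2..x5 and normalize.
brute = {x1: sum(joint((x1,) + rest) for rest in product(X, repeat=4))
         for x1 in X}
Zb = sum(brute.values())
marg_bf = {x1: brute[x1] / Zb for x1 in X}
assert all(abs(marg_mp[x] - marg_bf[x]) < 1e-9 for x in X)
```

The brute-force sum touches 2^5 configurations, while the message-passing computation only ever sums over one variable at a time; on large trees this is the difference between exponential and linear work.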