network information processing

A Quick Safari Through Network Information Processing

Reza Rahimi,Software Engineering Systems,

University of Regina, Canada.

Problem Formulation in Networking

Main Problem: In a given network, we want to transfer data from a group of sources to a group of destinations.

Constraints:

Network topology and architecture (at the abstract level: directed graph, undirected graph, or a special family of graphs such as trees, meshes, layered graphs, random graphs, geometric graphs, …).

Physical constraints (link capacity, power constraints, noise, …).

Optimization Metrics:

Maximum amount of information delivered to the terminals (Internet).

Energy (wireless sensor networks).

Delay (Internet telephony).

Load balancing (important in almost every network).

Fault tolerance (especially in wireless networks).

For simplicity, the problem can also be divided into three main subproblems:

Unicast: We consider the transfer of data from one source to one destination.

Multicast: We consider the transfer of data from one source to a group of destinations, but not to all of the nodes.

Broadcast: We consider the transfer of data from one source to all of the nodes in the network.

http://upload.wikimedia.org/wikipedia/commons/d/dc/Broadcast.svg

http://upload.wikimedia.org/wikipedia/commons/3/30/Multicast.svg

http://upload.wikimedia.org/wikipedia/commons/7/75/Unicast.svg

The main problem can be formulated in general using optimization methods, and it can sometimes be solved in a centralized or distributed manner, at least in theory.

In many cases, the optimization approach yields an integer optimization problem, which is NP-hard in general. We must use relaxation to make it tractable.

Other methods usually give us much better insight into the problem and better algorithms for it.

In this note I consider the problem under one metric: the maximum amount of information that can be transferred.

We will investigate the theoretical bounds for the problem and consider different techniques for achieving them.

Network Information Processing

Assumptions:

We almost always consider directed acyclic graphs (DAGs).

We assume that the capacity of each edge is one unit (one can easily convert any integer-weighted graph into such a normalized graph).

[Figure: converting an integer-weighted graph into an equivalent graph with unit-capacity edges.]

Question 1: What is the maximum amount of information that can be transferred in the unicast scenario?

Question 2: Is this maximum amount tractable with deterministic, randomized, or distributed algorithms?

Maximum-Flow Min-Cut Theorem

Flow Network: A flow network is a finite directed graph (not necessarily acyclic) G = (V, E) with the following features:

Each edge e has a positive capacity: $c(e) > 0$.

There is a single source s.

There is a single terminal (destination) t.

[Figure: example flow network with source s, intermediate nodes 2, 3, 4, 5, and terminal t; each edge is labeled with its capacity.]

Flow Function: An s-t flow function is $f: E \to \mathbb{R}$ with the following properties:

It must be non-negative and must not exceed the capacity of each edge: $0 \le f(e) \le c(e)$.

For each node except s and t, the sum of the incoming flow must equal the sum of the outgoing flow (a physical law, e.g. conservation of information): for every $v \in V \setminus \{s, t\}$, $\sum_{e \text{ into } v} f(e) = \sum_{e \text{ out of } v} f(e)$.

Flow Value: the amount of information that enters the destination node.

[Figure: the example network with each edge labeled by its capacity and its flow; flow value = 12.]

Question 1: What is the maximum amount of information flow achievable in this network?

First attempt: Use linear programming (LP) to compute the amount in polynomial time. (Integer programming is NP-hard in general, but when the capacities are integers the max-flow LP always has an integral optimal solution.)
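As a sketch of the LP mentioned in the first attempt, one standard way to write it reuses the capacity and conservation constraints of the flow function. The name $\mathrm{val}(f)$ is notation introduced here for illustration and does not appear in the original slides.

```latex
\begin{aligned}
\text{maximize}\quad & \mathrm{val}(f) = \sum_{e \text{ into } t} f(e) \\
\text{subject to}\quad & 0 \le f(e) \le c(e) && \text{for all } e \in E, \\
& \sum_{e \text{ into } v} f(e) = \sum_{e \text{ out of } v} f(e) && \text{for all } v \in V \setminus \{s, t\}.
\end{aligned}
```

There is one variable per edge and one conservation constraint per intermediate node, so the program has size polynomial in the graph.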

Second attempt: a heuristic method.

Algorithm(G, s, t):

Set the initial flow to zero.

For every simple path P from s to t in graph G:

(Greedily) push as much positive flow along P as the constraints allow.

Update the flow.

[Figure: two flows on the same network from S to D, each edge labeled flow/capacity. The greedy choice gets stuck at flow value = 20, while the maximum flow has flow value = 30.]

But how can we correct the previous algorithm?

Suppose we pushed flow forward along one path, but our choice may not have been suitable. We therefore remember it by recording the reverse path.

Collecting this information, we get a second graph, which is called the Residual Graph.

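[Figure: the network G with the greedy flow of value 20 (edges labeled flow/capacity), next to its residual graph Gf, which shows the remaining forward capacities and the reverse edges.]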

So we can revise the previous algorithm as follows:

Ford-Fulkerson Method (G, s, t):

Start with zero flow.

While there is a simple path from the source to the destination in the residual graph Gf:

Push flow along it and update the flow function.
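The method above can be sketched in Python as follows. This is a minimal illustration, not the author's code: it picks augmenting paths by breadth-first search (the Edmonds-Karp rule, one possible choice of path), and the intermediate node names `u` and `v` are hypothetical labels for the small S-D network, whose figure gives only the edge capacities.

```python
from collections import deque

def max_flow(capacity, s, t):
    """Ford-Fulkerson with BFS path search; returns (flow value, residual graph)."""
    # Residual capacities: forward edges start at capacity, reverse edges at 0.
    residual = {}
    for (u, v), c in capacity.items():
        residual.setdefault(u, {})
        residual.setdefault(v, {})
        residual[u][v] = residual[u].get(v, 0) + c
        residual[v].setdefault(u, 0)

    value = 0
    while True:
        # Breadth-first search for an s-t path with positive residual capacity.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, r in residual[u].items():
                if r > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return value, residual  # no augmenting path left

        # Walk back from t to collect the path edges, then find the bottleneck.
        path = []
        v = t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)

        # Push flow: shrink forward residuals, grow reverse residuals.
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        value += bottleneck

# Hypothetical labeling of the small S-D example (node names u, v are assumed).
capacity = {("S", "u"): 20, ("S", "v"): 10, ("u", "v"): 30,
            ("u", "D"): 10, ("v", "D"): 20}
value, _ = max_flow(capacity, "S", "D")
print(value)  # 30
```

The reverse edges are what allow a later path to undo part of an earlier greedy push, which is how the flow of 20 is repaired to 30 in this example.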

Let's consider one example graphically:

[Figures: successive Ford-Fulkerson iterations on the example network. Each step shows G with the current flow on every edge next to the residual graph Gf with its residual capacities and the chosen augmenting path. The flow value grows from 0 to 8, 10, 16, 18, and finally 19.]

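Cut: An s-t cut is a partition of the vertex set V into sets A and B such that:

$A \cup B = V, \quad A \cap B = \emptyset, \quad s \in A, \quad t \in B.$

Cut Capacity: The capacity of an s-t cut (A, B) is denoted by:

$c(A, B) = \sum_{e \text{ out of } A} c(e).$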

And finally we have the famous Max-Flow Min-Cut Theorem:

Max-Flow Min-Cut Theorem:

In every flow network, the Ford-Fulkerson method reaches the maximum flow of the graph, and this value is equal to the minimum cut capacity.
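To make the theorem concrete, the following sketch enumerates every s-t cut of a small network by brute force and reports the cheapest one. It is only an illustration under assumptions: the network is the small S-D example with hypothetical node names `u` and `v`, and exhaustive enumeration is exponential, so it is only suitable for toy graphs.

```python
from itertools import combinations

def cut_capacity(capacity, A):
    """Sum of the capacities of the edges that leave the set A."""
    return sum(c for (u, v), c in capacity.items() if u in A and v not in A)

def min_cut(capacity, s, t):
    """Try every set A with s in A and t not in A; return (capacity, A)."""
    nodes = {n for edge in capacity for n in edge}
    inner = sorted(nodes - {s, t})
    best = None
    for k in range(len(inner) + 1):
        for chosen in combinations(inner, k):
            A = {s, *chosen}
            cap = cut_capacity(capacity, A)
            if best is None or cap < best[0]:
                best = (cap, A)
    return best

# Same hypothetical labeling of the S-D example as in the greedy discussion.
capacity = {("S", "u"): 20, ("S", "v"): 10, ("u", "v"): 30,
            ("u", "D"): 10, ("v", "D"): 20}
best_cap, best_A = min_cut(capacity, "S", "D")
print(best_cap, best_A)  # 30 {'S'}
```

The minimum cut capacity of 30 matches the maximum flow value of 30 reported for this network, as the theorem predicts.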

There are several Polynomial Time Algorithms suggested for this problem. The following table shows some of the famous ones.

So we can reach the maximum information transfer with routing (forwarding only) in polynomial time in the unicast scenario.

Maximum Information Transferring in Multicasting Scenario

What is the maximum amount of information that could be transferred in the multicasting scenario?

The following graph shows the basic idea for multicasting.

[Figure: a super terminal is attached to the terminals by edges of capacity ∞.]

Insert a super node and use the Max-Flow Min-Cut Theorem.

So we cannot exceed this bound. Now another question arises:

How can we get a greater diversity of independent packets at each destination?

Simple Routing with Forwarding

Packet Duplication: Routing with Duplicate and Forward

Routing with Addition and Subtraction

[Figure: packets R and B are duplicated at intermediate nodes and one node forwards the sum R+B, so that one terminal receives R and R+B and the other receives B and R+B.]

Lesson that we have learned:

By applying some functions at each routing node, we can get a much greater diversity of information at each terminal node.

In general we can model this technique as a Linear Operation. For a node with inputs x, y, z and outputs a, b:

$\begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} \alpha_{11} & \alpha_{12} & \alpha_{13} \\ \alpha_{21} & \alpha_{22} & \alpha_{23} \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix}$

Note that one cannot achieve more than the max flow for each terminal (upper bound).
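The addition-and-subtraction idea can be tried out with a few lines of Python. This is a sketch under assumptions: packets are modeled as integers, addition and subtraction are both taken to be bitwise XOR (as they are in a field of characteristic 2), and the function name `two_terminal_coding` is invented for illustration.

```python
def two_terminal_coding(r: int, b: int):
    """One round of the R / B / R+B example; returns what each terminal decodes."""
    mixed = r ^ b  # the adding node forwards R+B on the shared edge

    # Terminal 1 receives R directly and R+B over the shared edge.
    t1_r, t1_mixed = r, mixed
    decoded_1 = (t1_r, t1_r ^ t1_mixed)   # B = (R+B) - R

    # Terminal 2 receives B directly and R+B over the shared edge.
    t2_b, t2_mixed = b, mixed
    decoded_2 = (t2_b ^ t2_mixed, t2_b)   # R = (R+B) - B

    return decoded_1, decoded_2

print(two_terminal_coding(0x5A, 0xC3))  # ((90, 195), (90, 195))
```

Both terminals end up with both packets although the shared edge carried only one packet, which forwarding or duplication alone cannot achieve on that edge.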

The previous technique is divided into two categories:

Duplicate and Forward (Routing).

Network Coding.

The first strategy is the one used in current networking technology.

The second one may be used in the near future.

Duplicate and Forward Strategy

If we allow duplication, the route of a packet forms a tree in the DAG.

So we can formulate the problem as follows:

Pack trees so as to get the maximum throughput at each terminal node.

There are some points to note about this formulation:

In general, the number of trees is exponential in the size of the input graph.

If we consider only integer values, it becomes an integer linear program.

So where exactly does the tree packing problem sit in the complexity hierarchy?

It can be proved that this problem is NP-hard.

So it seems that, in general, the problem is hard.

Let's simplify the problem a bit to see whether it becomes tractable.

Let's assume that we want to pack trees in such a way that all of the terminals get the same number of colors.

Clearly, the number of colors cannot exceed $\min_i \text{max-flow}(s, T_i)$.

Unfortunately, this version is again not tractable.

It is equivalent to Packing Steiner Trees, which is NP-hard.

In general, there is no polynomial-time algorithm that optimally transfers packets with only the Duplicate and Forward strategy in multicasting (assuming P ≠ NP).
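For reference, the Steiner tree packing problem can be sketched as the following program. The notation is introduced here for illustration and is not taken from the slides: $\mathcal{T}$ denotes the set of all trees rooted at s that reach every terminal, and $x_T$ is the rate (number of colors) carried by tree T.

```latex
\begin{aligned}
\text{maximize}\quad & \sum_{T \in \mathcal{T}} x_T \\
\text{subject to}\quad & \sum_{T \in \mathcal{T}:\, e \in T} x_T \le c(e) && \text{for all } e \in E, \\
& x_T \ge 0 && \text{for all } T \in \mathcal{T}.
\end{aligned}
```

There is one variable per tree, which is why the number of variables can be exponential in the size of the graph, and restricting every $x_T$ to integer values gives the integer version discussed above.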

What happens if we now empower each node with full linear operations (that is, if we switch to network coding)?

Linear Network Coding in Multicasting

We work in the field $GF(2^q)$ and assume that each packet is an element of this field.

All arithmetic operations are valid, just as in the field of real numbers.

As in the previous section, we assume that the graph is a DAG.

There is no delay at a node for combining its inputs into its outputs.

We use the variable X for inputs, Y for intermediate nodes, and Z for the output signals to be recovered.
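The claim that arithmetic in $GF(2^q)$ behaves like ordinary arithmetic can be checked with a small sketch. The choices below are assumptions for illustration only: q = 8, elements stored as integers 0..255, and the reduction polynomial $x^8 + x^4 + x^3 + x + 1$ (the one used by AES), since the slides do not fix a particular field representation.

```python
Q = 8
MODULUS = 0x11B  # x^8 + x^4 + x^3 + x + 1, irreducible over GF(2) (assumed choice)

def gf_add(a, b):
    """Addition and subtraction in GF(2^Q) are both bitwise XOR."""
    return a ^ b

def gf_mul(a, b):
    """Shift-and-add polynomial multiplication with reduction modulo MODULUS."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a >> Q:        # degree reached Q, so reduce
            a ^= MODULUS
        b >>= 1
    return result

def gf_inv(a):
    """Multiplicative inverse via a^(2^Q - 2); a must be non-zero."""
    result, base, exponent = 1, a, (1 << Q) - 2
    while exponent:
        if exponent & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        exponent >>= 1
    return result

print(hex(gf_mul(0x57, 0x83)))        # 0xc1
print(gf_mul(0x57, gf_inv(0x57)))     # 1
```

Every non-zero element has an inverse, so a node's linear combinations can be undone at the terminals by solving linear equations, exactly as over the real numbers.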

Types of nodes and their input-output relations

Source node, with inputs $x_1, \dots, x_n$ and outgoing edges $e_1, \dots, e_m$:

$y(e_j) = \sum_{i=1}^{n} \alpha_{i,e_j}\, x_i, \quad j = 1, \dots, m.$

Intermediate node, with incoming edges $e^*_1, \dots, e^*_n$ and outgoing edges $e_1, \dots, e_m$:

$y(e_j) = \sum_{i=1}^{n} \beta_{e^*_i,e_j}\, y(e^*_i), \quad j = 1, \dots, m.$

Terminal node, with incoming edges $e_1, \dots, e_m$ and outputs $z_1, \dots, z_n$:

$z_j = \sum_{i=1}^{m} \varepsilon_{e_i,j}\, y(e_i), \quad j = 1, \dots, n.$

Conversion

Each edge seems to play a much more important role than the nodes, so we convert the original graph into a new graph in which each node stands for an edge of the original one.

[Figure: the original graph with nodes v1 to v4, edges e1 to e7, inputs x1, x2, x3 and outputs z1, z2, z3, next to the converted graph whose nodes are e1 to e7. The links of the converted graph carry the coefficients βe1,e4, βe1,e5, βe2,e4, βe2,e5, βe3,e6, βe3,e7, βe4,e6, βe4,e7. Each input xi enters e1, e2, e3 through αi,e1, αi,e2, αi,e3, and each output zi is read from e5, e6, e7 through εe5,i, εe6,i, εe7,i.]

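Input Matrix:

$A = \begin{bmatrix} \alpha_{1,e_1} & \alpha_{1,e_2} & \alpha_{1,e_3} & 0 & 0 & 0 & 0 \\ \alpha_{2,e_1} & \alpha_{2,e_2} & \alpha_{2,e_3} & 0 & 0 & 0 & 0 \\ \alpha_{3,e_1} & \alpha_{3,e_2} & \alpha_{3,e_3} & 0 & 0 & 0 & 0 \end{bmatrix}$

Internal Matrix:

$F = \begin{bmatrix} 0 & 0 & 0 & \beta_{e_1,e_4} & \beta_{e_1,e_5} & 0 & 0 \\ 0 & 0 & 0 & \beta_{e_2,e_4} & \beta_{e_2,e_5} & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & \beta_{e_3,e_6} & \beta_{e_3,e_7} \\ 0 & 0 & 0 & 0 & 0 & \beta_{e_4,e_6} & \beta_{e_4,e_7} \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}$

Output Matrix:

$B = \begin{bmatrix} 0 & 0 & 0 & 0 & \varepsilon_{e_5,1} & \varepsilon_{e_6,1} & \varepsilon_{e_7,1} \\ 0 & 0 & 0 & 0 & \varepsilon_{e_5,2} & \varepsilon_{e_6,2} & \varepsilon_{e_7,2} \\ 0 & 0 & 0 & 0 & \varepsilon_{e_5,3} & \varepsilon_{e_6,3} & \varepsilon_{e_7,3} \end{bmatrix}$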

Question: How can we relate inputs and outputs using these matrices?

Clearly, A describes how the inputs are injected into the network and, likewise, B describes how the information in the network is injected into the outputs.

How do we capture the propagation inside the network?

We must account for all walks between the source edges and the output edges.

Using algebraic graph theory, it can be shown that:

$z = x\, A \left( \sum_{i=0}^{\infty} F^{i} \right) B^{T}$

We can simplify the previous equation with the following assumption. If we number the edges in topological order, F becomes strictly upper triangular (hence nilpotent), the sum is finite, and we get the simpler equation:

$z = x\, A\, (I - F)^{-1} B^{T}$

And finally, with some more work, we have the famous network coding theorem:

Consider a DAG G with unit capacities that has a single source node s (with h sources) and a set of terminal nodes T. The multicast property with rate h is said to be satisfied if max-flow(s, Ti) ≥ h for all Ti. If G satisfies the multicast property, a network code that supports the multicast rate h is guaranteed to exist as long as the field size is larger than |T|.

So if the field size is large enough, there always exists a network coding scheme that reaches the limit.

There are polynomial-time algorithms for constructing network codes, for example LIF and randomized network coding algorithms.

For some special graphs, network coding lets us reach the maximum flow for each node.
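The transfer relation $z = x\,A\,(I - F)^{-1} B^{T}$ can be checked numerically on the seven-edge example. The sketch below rests on several assumptions that are not in the slides: a prime field GF(257) stands in for $GF(2^q)$ to keep the arithmetic readable, the coefficient values are one illustrative choice (pure forwarding along e1-e5, e2-e4-e6 and e3-e7), and all helper names are invented.

```python
P = 257        # assumption: prime field GF(257) instead of GF(2^q)
N_EDGES = 7

def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]

def mat_mul(X, Y):
    """Matrix product modulo P."""
    return [[sum(a * b for a, b in zip(row, col)) % P for col in zip(*Y)]
            for row in X]

def walk_sum(F):
    """I + F + F^2 + ... ; the sum is finite because F is nilpotent on a DAG."""
    total = power = identity(len(F))
    for _ in range(len(F)):
        power = mat_mul(power, F)
        total = [[(a + b) % P for a, b in zip(r1, r2)]
                 for r1, r2 in zip(total, power)]
    return total

def sparse(rows, cols, entries):
    """Build a rows x cols matrix from {(row, col): value}, 1-based indices."""
    M = [[0] * cols for _ in range(rows)]
    for (r, c), value in entries.items():
        M[r - 1][c - 1] = value % P
    return M

# Illustrative coefficients (assumed): every listed alpha, beta, epsilon is 1.
A = sparse(3, N_EDGES, {(1, 1): 1, (2, 2): 1, (3, 3): 1})    # alpha_{i,e_j}
F = sparse(N_EDGES, N_EDGES,
           {(1, 5): 1, (2, 4): 1, (4, 6): 1, (3, 7): 1})     # beta_{e_i,e_j}
B = sparse(3, N_EDGES, {(1, 5): 1, (2, 6): 1, (3, 7): 1})    # row i: epsilon_{e_j,i}

S = walk_sum(F)                          # plays the role of (I - F)^{-1}
B_T = [list(col) for col in zip(*B)]
M = mat_mul(mat_mul(A, S), B_T)          # transfer matrix A (I - F)^{-1} B^T

x = [[10, 20, 30]]
z = mat_mul(x, M)
print(z)  # [[10, 20, 30]]
```

With this choice the transfer matrix is the identity, so the terminal reads the inputs directly. For other coefficient choices the terminal can still decode whenever M is invertible, which is the condition a network code has to meet at every terminal.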

So in summary we have:

With network coding we can reach the maximum throughput in polynomial time.

Comparison Between the Two Methods in the Multicasting Scenario

What is the theoretical gap between network coding and routing?

It can be proved that if the graph is directed, the gap can be very large: Ω(log n), where n is the number of terminals.

But if the graph is undirected, the gap is bounded by a constant.

Network Coding Example

Consider the following directed graph: $G^{h}_{a,b}$.

Lemma:

Under routing, the capacity of $G^{h}_{2h,\,C(2h,h)}$ is less than 2.

With network coding, the capacity of the network can be h.

With some error-control codes, we can achieve the maximum capacity for network coding.

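Example: Reed-Solomon codes, RS(n, k):

Message: $M = [M_0, M_1, \dots, M_{k-1}]$.

Evaluation points: $\alpha_1, \alpha_2, \dots, \alpha_n \in GF(q)$, $\alpha_i \neq 0$.

Codeword: $C = [C_0, C_1, \dots, C_{n-1}]$, where

$C = M \begin{bmatrix} 1 & 1 & \cdots & 1 \\ \alpha_1 & \alpha_2 & \cdots & \alpha_n \\ \vdots & \vdots & & \vdots \\ \alpha_1^{k-1} & \alpha_2^{k-1} & \cdots & \alpha_n^{k-1} \end{bmatrix}_{k \times n}$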

The above matrix has Vandermonde structure, and from any h symbols of the codeword we can reconstruct the original message.

So at the source node we can use RS(a, h), and at the terminals the original signal can be reconstructed.

This concept is sometimes categorized as source coding.
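The "any h symbols suffice" property can be tried out with a small sketch. It makes assumptions that go beyond the slides: a prime field GF(257) is used instead of a general GF(q), the evaluation points are taken to be distinct as well as non-zero, decoding is done by plain Gauss-Jordan elimination on the Vandermonde system, and the function names are invented.

```python
from itertools import combinations

P = 257  # assumption: prime field GF(257) for readable arithmetic

def rs_encode(message, alphas):
    """C_j = sum_i M_i * alpha_j^i, i.e. C = M times the Vandermonde matrix."""
    return [sum(m * pow(a, i, P) for i, m in enumerate(message)) % P
            for a in alphas]

def rs_decode(points, k):
    """Recover k message symbols from any k (alpha, symbol) pairs.

    Solves the k x k Vandermonde system by Gauss-Jordan elimination mod P.
    """
    rows = [[pow(a, i, P) for i in range(k)] + [c] for a, c in points]
    for col in range(k):
        pivot = next(r for r in range(col, k) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = pow(rows[col][col], P - 2, P)          # inverse by Fermat's little theorem
        rows[col] = [x * inv % P for x in rows[col]]
        for r in range(k):
            if r != col and rows[r][col]:
                factor = rows[r][col]
                rows[r] = [(x - factor * y) % P
                           for x, y in zip(rows[r], rows[col])]
    return [row[k] for row in rows]

message = [12, 34, 56]                 # k = h = 3 symbols
alphas = [1, 2, 3, 4, 5, 6]            # n = a = 6 distinct non-zero points
codeword = rs_encode(message, alphas)

# Every choice of 3 received symbols reproduces the message.
ok = all(rs_decode([(alphas[i], codeword[i]) for i in subset], 3) == message
         for subset in combinations(range(6), 3))
print(ok)  # True
```

Distinct evaluation points make every k x k submatrix of the Vandermonde matrix invertible, which is why each terminal can decode from whichever h symbols reach it.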

Maximum Information Transferring in Broadcasting Scenario

In broadcasting, according to Edmonds' paper, we can always pack k edge-disjoint spanning trees, where $k = \min_i \text{max-flow}(s, T_i)$.

So in this scenario, routing with duplication has the same power as network coding in the general case.

Conclusion

The basics of routing and its theoretical bounds are reviewed.

The basics of network coding and its theoretical bounds are reviewed.

It seems that, in general, network coding gives us much more throughput, but at a higher computational complexity than plain routing.

Network Coding vs. Routing:

Unicast: The same as each other.

Multicast: The performance of network coding is much better, and to use routing we face an NP-hard problem.

Broadcast: The same as each other.
