Information Capacity of the BSC Permutation Channel
Anuran Makur
EECS Department, Massachusetts Institute of Technology
Allerton Conference 2018
A. Makur (MIT) Capacity of BSC Permutation Channel 5 October 2018 1 / 21
Outline
1 Introduction
    Motivation: Coding for Communication Networks
    The Permutation Channel Model
    Capacity of the BSC Permutation Channel
2 Achievability
3 Converse
4 Conclusion
Motivation: Point-to-Point Communication in Networks

[Figure: SENDER → NETWORK → RECEIVER]

Model the communication network as a channel:
- Alphabet symbols = all possible L-bit packets ⇒ 2^L input symbols
- Multipath routed network (or evolving network topology) ⇒ packets received with transpositions
- Packets are impaired (e.g. deletions, substitutions) ⇒ model using channel probabilities
Example: Coding for a Random Deletion Network

Consider a communication network where packets can be dropped:

[Figure: SENDER → ERASURE CHANNEL → RANDOM PERMUTATION → RECEIVER]

Abstraction:
- n-length codeword = sequence of n packets
- Erasure channel: erase each symbol/packet of the codeword independently with probability p ∈ (0, 1)
- Random permutation block: randomly permute the packets of the codeword
- Coding: add sequence numbers and use standard coding techniques (packet size = L + log(n) bits ⇒ alphabet size = n · 2^L)
- More refined coding techniques simulate sequence numbers, e.g. [Mitzenmacher 2006], [Metzner 2009]

How do you code in such channels without increasing the alphabet size?
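The sequence-number workaround above can be sketched in a few lines of Python. This is an illustrative simulation, not code from the talk; the helper names (`send_with_sequence_numbers`, `recover`) are my own.

```python
import random

def send_with_sequence_numbers(packets, p, rng):
    """Prefix each packet with its index, erase each packet independently
    with probability p, then randomly permute the survivors."""
    tagged = [(i, pkt) for i, pkt in enumerate(packets)]   # add sequence numbers
    survivors = [t for t in tagged if rng.random() >= p]   # i.i.d. erasures
    rng.shuffle(survivors)                                 # random permutation
    return survivors

def recover(received, n):
    """Sort by sequence number; missing slots become erasures (None)."""
    out = [None] * n
    for i, pkt in received:
        out[i] = pkt
    return out

rng = random.Random(0)
packets = ["pkt%d" % i for i in range(8)]
received = send_with_sequence_numbers(packets, p=0.25, rng=rng)
decoded = recover(received, len(packets))
# Every surviving packet is back in its original position; the rest are erasures.
assert all(d in (None, packets[i]) for i, d in enumerate(decoded))
```

The price of this fix is exactly the one the slide notes: each packet grows by log(n) bits, so the effective alphabet size grows with the block length.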
The Permutation Channel Model

[Figure: M → ENCODER → X → CHANNEL → Z → RANDOM PERMUTATION → Y → DECODER → M̂]

- Sender sends message M ~ Uniform(M)
- Possibly randomized encoder f_n : M → X^n produces codeword X_1^n = (X_1, ..., X_n) = f_n(M) (with block-length n)
- Discrete memoryless channel P_{Z|X} with input and output alphabets X and Y produces Z_1^n:
      P_{Z_1^n | X_1^n}(z_1^n | x_1^n) = ∏_{i=1}^n P_{Z|X}(z_i | x_i)
- Random permutation generates Y_1^n from Z_1^n
- Possibly randomized decoder g_n : Y^n → M produces estimate M̂ = g_n(Y_1^n) at the receiver
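The channel model above can be simulated directly: a memoryless channel acts on each symbol, then a uniformly random permutation scrambles the outputs. This is a minimal sketch with names of my own choosing (`permutation_channel`, `bsc`), not code from the talk.

```python
import random

def permutation_channel(x, channel, rng):
    """One use of the permutation channel: pass each input symbol through a
    memoryless channel `channel(symbol) -> symbol`, then apply a uniformly
    random permutation to the outputs."""
    z = [channel(xi) for xi in x]   # DMC acts independently on each symbol
    rng.shuffle(z)                  # random permutation destroys the ordering
    return z

# Example with a BSC(p): each bit flips independently with probability p.
def bsc(p, rng):
    return lambda bit: bit ^ (rng.random() < p)

rng = random.Random(0)
x = [0, 0, 0, 1, 1]
y = permutation_channel(x, bsc(0.1, rng), rng)
# Only the multiset (i.e. the empirical distribution) of outputs is
# informative; the order carries no information about the codeword.
assert len(y) == len(x) and set(y) <= {0, 1}
```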
Coding for the Permutation Channel

[Figure: M → ENCODER → X → CHANNEL → Z → RANDOM PERMUTATION → Y → DECODER → M̂]

General Principle: "Encode the information in an object that is invariant under the [permutation] transformation." [Kovacevic-Vukobratovic 2013]

Multiset codes are studied in [Kovacevic-Vukobratovic 2013], [Kovacevic-Vukobratovic 2015], and [Kovacevic-Tan 2018]

What about the information-theoretic aspects of this model?
Information Capacity of the Permutation Channel

[Figure: M → ENCODER → X → CHANNEL → Z → RANDOM PERMUTATION → Y → DECODER → M̂]

- Average probability of error: P_error^n ≜ P(M̂ ≠ M)
- "Rate" of an encoder-decoder pair (f_n, g_n):
      R ≜ log(|M|) / log(n), i.e. |M| = n^R
  (normalizing by log(n) rather than n because the number of empirical distributions of Y_1^n is only poly(n))
- Rate R ≥ 0 is achievable ⇔ ∃ {(f_n, g_n)}_{n∈N} such that lim_{n→∞} P_error^n = 0

Definition (Permutation Channel Capacity)
    C_perm(P_{Z|X}) ≜ sup{R ≥ 0 : R is achievable}
Capacity of the BSC Permutation Channel

[Figure: M → ENCODER → BSC(p) → RANDOM PERMUTATION → DECODER → M̂]

- Channel is the binary symmetric channel, denoted BSC(p):
      for all z, x ∈ {0, 1}: P_{Z|X}(z|x) = 1 − p if z = x, and p if z ≠ x
- Alphabets are X = Y = {0, 1}
- Assume crossover probability p ∈ (0, 1) and p ≠ 1/2

Main Question
What is the permutation channel capacity of the BSC?
Outline
1 Introduction
2 Achievability
    Encoder and Decoder
    Testing between Converging Hypotheses
    Intuition via Central Limit Theorem
    Second Moment Method for TV Distance
3 Converse
4 Conclusion
Warm-up: Sending Two Messages

[Figure: M → ENCODER → BSC(p) → RANDOM PERMUTATION → DECODER → M̂]

- Fix a message m ∈ {0, 1}, and encode m as f_n(m) = X_1^n i.i.d. ~ Ber(q_m)
  [Figure: the two parameters q_0 and q_1, e.g. 1/3 and 2/3, marked on the interval [0, 1]]
- Memoryless BSC(p) outputs Z_1^n i.i.d. ~ Ber(p * q_m), where p * q_m ≜ p(1 − q_m) + q_m(1 − p) is the convolution of p and q_m
- Random permutation generates Y_1^n i.i.d. ~ Ber(p * q_m)
- Maximum likelihood (ML) decoder: M̂ = 1{(1/n) ∑_{i=1}^n Y_i ≥ 1/2}
- (1/n) ∑_{i=1}^n Y_i → p * q_m in probability as n → ∞ [WLLN]
  ⇒ lim_{n→∞} P_error^n = 0, since p * q_0 ≠ p * q_1
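The two-message warm-up is easy to check numerically. The sketch below, with illustrative parameters of my own (q_0 = 1/3, q_1 = 2/3, so that q_0 + q_1 = 1 and the ML threshold is exactly 1/2), runs the encode → BSC → permute → threshold pipeline and confirms that errors are rare for moderate n.

```python
import random

def encode_transmit_decode(m, n, p, q, rng):
    """Warm-up scheme: message m in {0,1} -> X_1^n i.i.d. Ber(q[m]) ->
    BSC(p) -> random permutation -> threshold decoder 1{sample mean >= 1/2}."""
    x = [int(rng.random() < q[m]) for _ in range(n)]
    z = [xi ^ (rng.random() < p) for xi in x]   # BSC(p): Z_i ~ Ber(p * q_m)
    rng.shuffle(z)                              # permutation leaves i.i.d. law unchanged
    return int(sum(z) / n >= 0.5)               # ML threshold (valid since q0 + q1 = 1)

rng = random.Random(0)
q = (1/3, 2/3)
errors = sum(encode_transmit_decode(m, 500, 0.1, q, rng) != m
             for m in (0, 1) for _ in range(200))
assert errors / 400 < 0.05   # WLLN: the empirical mean concentrates at p * q_m
```

The decoder never looks at the order of the received bits, only their sum, which is precisely the permutation-invariant statistic the scheme relies on.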
Encoder and Decoder

- Suppose M = {1, ..., n^R} for some R > 0
- Randomized encoder: given m ∈ M, f_n(m) = X_1^n i.i.d. ~ Ber(m / n^R)
  [Figure: the candidate parameters m / n^R marked on the interval [0, 1]]
- Given m ∈ M, Y_1^n i.i.d. ~ Ber(p * (m / n^R)) (as before)
- ML decoder: for y_1^n ∈ {0, 1}^n, g_n(y_1^n) = arg max_{m ∈ M} P_{Y_1^n | M}(y_1^n | m)
- Challenge: although (1/n) ∑_{i=1}^n Y_i → p * (m / n^R) in probability as n → ∞, consecutive messages become indistinguishable, i.e. (m + 1)/n^R − m/n^R = 1/n^R → 0
- Fact: consecutive messages distinguishable ⇒ lim_{n→∞} P_error^n = 0

What is the largest R such that two consecutive messages can be distinguished?
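The full scheme with |M| = n^R messages can also be simulated. As a simplification (my own, not from the talk), the sketch below decodes by matching the empirical mean of Y_1^n to the nearest candidate p * (m / n^R), a minimum-distance proxy for the ML rule; it illustrates that decoding succeeds when R is comfortably below 1/2.

```python
import random

def simulate(n, R, p, rng, trials=200):
    """Sketch: m in {1,...,floor(n^R)} is encoded as X_1^n i.i.d. Ber(m/n^R);
    the decoder matches the empirical mean of Y_1^n to the nearest candidate
    p * (m/n^R) (a minimum-distance proxy for ML). Returns the error rate."""
    M = int(n ** R)
    def conv(a, b):                      # p * q = p(1-q) + q(1-p)
        return a * (1 - b) + b * (1 - a)
    errs = 0
    for _ in range(trials):
        m = rng.randrange(1, M + 1)
        qm = m / n ** R
        y = [int(rng.random() < conv(p, qm)) for _ in range(n)]  # law of Y_1^n
        mean = sum(y) / n
        m_hat = min(range(1, M + 1), key=lambda k: abs(conv(p, k / n ** R) - mean))
        errs += (m_hat != m)
    return errs / trials

rng = random.Random(0)
# For R well below 1/2, the gap ~1/n^R between consecutive candidates dwarfs
# the Theta(1/sqrt(n)) fluctuation of the empirical mean, so decoding succeeds.
assert simulate(n=2000, R=0.25, p=0.1, rng=rng) < 0.1
```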
Testing between Converging Hypotheses

Binary Hypothesis Testing:
- Consider hypothesis H ~ Ber(1/2), i.e. with uniform prior
- For any n ∈ N, q ∈ (0, 1), and R > 0, consider the likelihoods:
      Given H = 0: X_1^n i.i.d. ~ P_{X|H=0} = Ber(q)
      Given H = 1: X_1^n i.i.d. ~ P_{X|H=1} = Ber(q + 1/n^R)
- Define the zero-mean sufficient statistic of X_1^n for H:
      T_n ≜ (1/n) ∑_{i=1}^n X_i − q − 1/(2 n^R)
- Let Ĥ_ML^n(T_n) denote the ML decoder for H based on T_n, with minimum probability of error P_ML^n ≜ P(Ĥ_ML^n(T_n) ≠ H)

Want: What is the largest R > 0 such that lim_{n→∞} P_ML^n = 0?
Intuition via Central Limit Theorem

- For large n, P_{T_n|H}(·|0) and P_{T_n|H}(·|1) are approximately Gaussian [CLT]
- |E[T_n | H = 0] − E[T_n | H = 1]| = 1/n^R
- Standard deviations are Θ(1/√n)

[Figure: the two conditional densities of T_n, centered at ∓1/(2n^R), each of width Θ(1/√n)]

- Case R < 1/2: the separation 1/n^R dominates the spread Θ(1/√n) ⇒ decoding is possible
- Case R > 1/2: the spread Θ(1/√n) dominates the separation 1/n^R ⇒ decoding is impossible
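The R = 1/2 phase transition in the hypothesis test is visible in a quick Monte Carlo experiment. This is an illustrative sketch (the midpoint-threshold test on the sample mean is the ML rule for this two-sided Bernoulli problem); the parameter choices are mine.

```python
import random

def p_error(n, R, q, rng, trials=400):
    """Monte Carlo estimate of the error probability for testing Ber(q) vs
    Ber(q + 1/n^R) from n i.i.d. samples, using the midpoint threshold on
    the sample mean."""
    delta = 1 / n ** R
    thresh = q + delta / 2
    errs = 0
    for _ in range(trials):
        h = rng.randrange(2)                               # H ~ Ber(1/2)
        mean = sum(int(rng.random() < q + h * delta) for _ in range(n)) / n
        errs += (int(mean >= thresh) != h)
    return errs / trials

rng = random.Random(0)
# Separation 1/n^R vs noise Theta(1/sqrt(n)):
assert p_error(4000, R=0.3, q=0.3, rng=rng) < 0.1   # R < 1/2: test is easy
assert p_error(4000, R=0.9, q=0.3, rng=rng) > 0.3   # R > 1/2: near coin-flipping
```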
Second Moment Method for TV Distance

Lemma (2nd Moment Method [Evans-Kenyon-Peres-Schulman 2000])
    ||P_{T_n|H=1} − P_{T_n|H=0}||_TV ≥ (E[T_n | H = 1] − E[T_n | H = 0])^2 / (4 VAR(T_n))
where ||P − Q||_TV = (1/2) ||P − Q||_{ℓ1} is the total variation (TV) distance between the distributions P and Q.

Proof: Let T_n^+ ~ P_{T_n|H=1} and T_n^− ~ P_{T_n|H=0}. Then:
    (E[T_n^+] − E[T_n^−])^2
        = ( ∑_t t √(P_{T_n}(t)) · (P_{T_n|H}(t|1) − P_{T_n|H}(t|0)) / √(P_{T_n}(t)) )^2
        ≤ ( ∑_t t^2 P_{T_n}(t) ) ( ∑_t (P_{T_n|H}(t|1) − P_{T_n|H}(t|0))^2 / P_{T_n}(t) )   [Cauchy-Schwarz]
        = 4 VAR(T_n) · ( (1/4) ∑_t (P_{T_n|H}(t|1) − P_{T_n|H}(t|0))^2 / P_{T_n}(t) )   [T_n is zero-mean]
        ≤ 4 VAR(T_n) · ||P_{T_n|H=1} − P_{T_n|H=0}||_TV
where the bracketed quantity in the penultimate line is the Vincze-Le Cam distance, and the argument parallels the Hammersley-Chapman-Robbins bound. ∎
Second Moment Method for TV Distance
Lemma (2nd Moment Method [Evans-Kenyon-Peres-Schulman 2000])∥∥PTn|H=1 − PTn|H=0
∥∥TV≥ (E[Tn|H = 1]− E[Tn|H = 0])2
4VAR(Tn)
where ‖P − Q‖TV = 12 ‖P − Q‖`1 is the total variation (TV) distance
between the distributions P and Q.
Proof: Let T+n ∼ PTn|H=1 and T−n ∼ PTn|H=0(
E[T+n
]− E
[T−n])2
=
(∑t
t(PTn|H(t|1)− PTn|H(t|0)
))2
�
A. Makur (MIT) Capacity of BSC Permutation Channel 5 October 2018 14 / 21
Second Moment Method for TV Distance
Lemma (2nd Moment Method [Evans-Kenyon-Peres-Schulman 2000])∥∥PTn|H=1 − PTn|H=0
∥∥TV≥ (E[Tn|H = 1]− E[Tn|H = 0])2
4VAR(Tn)
where ‖P − Q‖TV = 12 ‖P − Q‖`1 is the total variation (TV) distance
between the distributions P and Q.
Proof: Let T+n ∼ PTn|H=1 and T−n ∼ PTn|H=0(
E[T+n
]− E
[T−n])2
=
(∑t
t√PTn(t)
(PTn|H(t|1)− PTn|H(t|0)
)√PTn(t)
)2
�
A. Makur (MIT) Capacity of BSC Permutation Channel 5 October 2018 14 / 21
Second Moment Method for TV Distance
Lemma (2nd Moment Method [Evans-Kenyon-Peres-Schulman 2000])∥∥PTn|H=1 − PTn|H=0
∥∥TV≥ (E[Tn|H = 1]− E[Tn|H = 0])2
4VAR(Tn)
where ‖P − Q‖TV = 12 ‖P − Q‖`1 is the total variation (TV) distance
between the distributions P and Q.
Proof: Cauchy-Schwarz inequality
(E[T+n
]− E
[T−n])2
=
(∑t
t√
PTn(t)
(PTn|H(t|1)− PTn|H(t|0)
)√PTn(t)
)2
≤
(∑t
t2PTn(t)
)(∑t
(PTn|H(t|1)− PTn|H(t|0)
)2PTn(t)
)
�
A. Makur (MIT) Capacity of BSC Permutation Channel 5 October 2018 14 / 21
Second Moment Method for TV Distance
Lemma (2nd Moment Method [Evans-Kenyon-Peres-Schulman 2000])∥∥PTn|H=1 − PTn|H=0
∥∥TV≥ (E[Tn|H = 1]− E[Tn|H = 0])2
4VAR(Tn)
where ‖P − Q‖TV = 12 ‖P − Q‖`1 is the total variation (TV) distance
between the distributions P and Q.
Proof: Recall that Tn is zero-mean(E[T+n
]− E
[T−n])2
=
(∑t
t√
PTn(t)
(PTn|H(t|1)− PTn|H(t|0)
)√PTn(t)
)2
≤ VAR(Tn)
(∑t
(PTn|H(t|1)− PTn|H(t|0)
)2PTn(t)
)
�
A. Makur (MIT) Capacity of BSC Permutation Channel 5 October 2018 14 / 21
Second Moment Method for TV Distance
Lemma (2nd Moment Method [Evans-Kenyon-Peres-Schulman 2000])∥∥PTn|H=1 − PTn|H=0
∥∥TV≥ (E[Tn|H = 1]− E[Tn|H = 0])2
4VAR(Tn)
where ‖P − Q‖TV = 12 ‖P − Q‖`1 is the total variation (TV) distance
between the distributions P and Q.
Proof: Hammersley-Chapman-Robbins bound
(E[T+n
]− E
[T−n])2
=
(∑t
t√
PTn(t)
(PTn|H(t|1)− PTn|H(t|0)
)√PTn(t)
)2
≤ 4VAR(Tn)
(1
4
∑t
(PTn|H(t|1)− PTn|H(t|0)
)2PTn(t)
)︸ ︷︷ ︸
Vincze-Le Cam distance
�
A. Makur (MIT) Capacity of BSC Permutation Channel 5 October 2018 14 / 21
Second Moment Method for TV Distance
Lemma (2nd Moment Method [Evans-Kenyon-Peres-Schulman 2000])∥∥PTn|H=1 − PTn|H=0
∥∥TV≥ (E[Tn|H = 1]− E[Tn|H = 0])2
4VAR(Tn)
where ‖P − Q‖TV = 12 ‖P − Q‖`1 is the total variation (TV) distance
between the distributions P and Q.
Proof:(E[T+n
]− E
[T−n])2
=
(∑t
t√
PTn(t)
(PTn|H(t|1)− PTn|H(t|0)
)√PTn(t)
)2
≤ 4VAR(Tn)
(1
4
∑t
(PTn|H(t|1)− PTn|H(t|0)
)2PTn(t)
)≤ 4VAR(Tn)
∥∥PTn|H=1 − PTn|H=0
∥∥TV
�A. Makur (MIT) Capacity of BSC Permutation Channel 5 October 2018 14 / 21
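The lemma can be sanity-checked numerically. The sketch below is my own illustration (helper names are mine): it takes T_n to be a binomial count under H ∼ Ber(1/2) and verifies the stated TV lower bound.

```python
import math

def binom_pmf(n: int, p: float):
    """PMF of bin(n, p) as a list indexed by the count t."""
    return [math.comb(n, t) * p ** t * (1 - p) ** (n - t) for t in range(n + 1)]

def check_second_moment_bound(n: int, p0: float, p1: float) -> bool:
    """Check ||P_{T|H=1} - P_{T|H=0}||_TV >= (E_1 - E_0)^2 / (4 VAR(T))
    for T a binomial count and H ~ Ber(1/2)."""
    P0, P1 = binom_pmf(n, p0), binom_pmf(n, p1)
    tv = 0.5 * sum(abs(a - b) for a, b in zip(P0, P1))
    mix = [(a + b) / 2 for a, b in zip(P0, P1)]       # unconditional law of T
    mean = sum(t * w for t, w in enumerate(mix))
    var = sum((t - mean) ** 2 * w for t, w in enumerate(mix))
    e0 = sum(t * w for t, w in enumerate(P0))
    e1 = sum(t * w for t, w in enumerate(P1))
    return tv >= (e1 - e0) ** 2 / (4 * var)

print(check_second_moment_bound(50, 0.3, 0.35))  # True: the bound holds
```

Since the lemma holds for any statistic under an equiprobable mixture, the check returns True for any valid (n, p0, p1).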
Achievability Proof

Theorem (Achievability)
For any 0 < R < 1/2, consider the binary hypothesis testing problem with H ∼ Ber(1/2), and X_1^n i.i.d. ∼ Ber(q + h/n^R) given H = h ∈ {0, 1}. Then, lim_{n→∞} P^n_ML = 0. This implies that:

C_perm(BSC(p)) ≥ 1/2.

Proof: For any 0 < R < 1/2, start with Le Cam's relation, apply the second moment method lemma, and simplify after explicit computation:

P^n_ML = (1/2) (1 − ‖P_{T_n|H=1} − P_{T_n|H=0}‖_TV)
 ≤ (1/2) (1 − (E[T_n|H = 1] − E[T_n|H = 0])² / (4 VAR(T_n)))
 ≤ 3/(2 n^{1−2R}) → 0 as n → ∞  ∎

A. Makur (MIT) Capacity of BSC Permutation Channel 5 October 2018 15 / 21
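A quick Monte Carlo illustration of the theorem (my own sketch, not from the talk; q = 0.1, the midpoint-threshold test, and all names are my choices): decide between the two biases from the sample mean, and compare the empirical error to the 3/(2 n^{1−2R}) bound in the proof.

```python
import random

def ml_error_rate(n: int, R: float, q: float = 0.1,
                  trials: int = 2000, seed: int = 0) -> float:
    """Monte Carlo error of a midpoint-threshold test (a simple proxy for
    the ML test) for H=0: X_i ~ Ber(q) vs H=1: X_i ~ Ber(q + 1/n^R)."""
    rng = random.Random(seed)
    shift = 1.0 / n ** R
    threshold = q + shift / 2          # midpoint threshold on the sample mean
    errors = 0
    for _ in range(trials):
        h = rng.randint(0, 1)
        p = q + h * shift
        mean = sum(rng.random() < p for _ in range(n)) / n
        errors += int((mean >= threshold) != h)
    return errors / trials

# The empirical error falls well below the 3/(2 n^{1-2R}) bound when R < 1/2.
print(ml_error_rate(100, 0.25), 3 / (2 * 100 ** (1 - 2 * 0.25)))
```

The threshold test is not exactly ML (the exact ML threshold for two binomials is slightly off the midpoint), but for this one-parameter shift it behaves comparably and suffices to see the decay.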
Outline
1 Introduction
2 Achievability
3 Converse
  Fano's Inequality Argument
  CLT Approximation
4 Conclusion
A. Makur (MIT) Capacity of BSC Permutation Channel 5 October 2018 16 / 21
Converse: Fano's Inequality Argument

Consider the Markov chain M → X_1^n → Z_1^n → Y_1^n → S_n ≜ Σ_{i=1}^n Y_i, and a sequence of encoder-decoder pairs {(f_n, g_n)}_{n∈ℕ} such that |M| = n^R and lim_{n→∞} P^n_error = 0.

Standard argument, cf. [Cover-Thomas 2006], using that M is uniform, Fano's inequality, the data processing inequality (DPI), and the sufficiency of S_n:

R log(n) = H(M) = H(M|M̂) + I(M; M̂)
 ≤ 1 + P^n_error R log(n) + I(M; Y_1^n)   [Fano's inequality, DPI]
 = 1 + P^n_error R log(n) + I(M; S_n)   [sufficiency]
 ≤ 1 + P^n_error R log(n) + I(X_1^n; S_n)   [DPI]

Divide by log(n) and let n → ∞:

R ≤ lim_{n→∞} I(X_1^n; S_n) / log(n)

A. Makur (MIT) Capacity of BSC Permutation Channel 5 October 2018 17 / 21
Converse: CLT Approximation

Upper bound on I(X_1^n; S_n): since S_n ∈ {0, …, n}, and given X_1^n = x_1^n with Σ_{i=1}^n x_i = k, S_n ∼ bin(k, 1−p) + bin(n−k, p):

I(X_1^n; S_n) = H(S_n) − H(S_n|X_1^n)
 ≤ log(n + 1) − Σ_{x_1^n ∈ {0,1}^n} P_{X_1^n}(x_1^n) H(bin(k, 1−p) + bin(n−k, p))
 ≤ log(n + 1) − Σ_{x_1^n ∈ {0,1}^n} P_{X_1^n}(x_1^n) H(bin(n/2, p))   [Problem 2.14 in Cover-Thomas 2006]
 = log(n + 1) − (1/2) log(πe p(1−p) n) + O(1/n)   [CLT approximation of binomial entropy, cf. Adell-Lekuona-Yu 2010]

Hence, we have:

R ≤ lim_{n→∞} I(X_1^n; S_n) / log(n) = 1/2

Theorem (Converse)

C_perm(BSC(p)) ≤ 1/2

A. Makur (MIT) Capacity of BSC Permutation Channel 5 October 2018 18 / 21
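The binomial-entropy step can be checked numerically. The sketch below (my own illustration, in nats; function names are mine) compares the exact entropy of bin(n, p) against the Gaussian approximation (1/2)log(2πe n p(1−p)); applied to bin(n/2, p), this is exactly the (1/2)log(πe p(1−p)n) term on the slide.

```python
import math

def binom_entropy(n: int, p: float) -> float:
    """Exact entropy (in nats) of bin(n, p)."""
    h = 0.0
    for t in range(n + 1):
        w = math.comb(n, t) * p ** t * (1 - p) ** (n - t)
        if w > 0:
            h -= w * math.log(w)
    return h

def clt_entropy(n: int, p: float) -> float:
    """Gaussian (CLT) approximation: (1/2) log(2*pi*e*n*p*(1-p))."""
    return 0.5 * math.log(2 * math.pi * math.e * n * p * (1 - p))

# The approximation error shrinks, matching the O(1/n) term in the expansion.
for n in (10, 100, 1000):
    print(n, abs(binom_entropy(n, 0.2) - clt_entropy(n, 0.2)))
```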
Outline
1 Introduction
2 Achievability
3 Converse
4 Conclusion
A. Makur (MIT) Capacity of BSC Permutation Channel 5 October 2018 19 / 21
Conclusion

Theorem (Permutation Channel Capacity of BSC)

C_perm(BSC(p)) =
 1, for p ∈ {0, 1}
 1/2, for p ∈ (0, 1/2) ∪ (1/2, 1)
 0, for p = 1/2

[Figure: plot of C_perm(BSC(p)) against p, equal to 1 at p ∈ {0, 1}, to 1/2 on (0, 1/2) ∪ (1/2, 1), and to 0 at p = 1/2]

Remarks:
C_perm(·) is discontinuous and non-convex
C_perm(·) is generally agnostic to parameters of the channel
Computationally tractable coding scheme in proof
Proof technique yields more general results

A. Makur (MIT) Capacity of BSC Permutation Channel 5 October 2018 20 / 21
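The theorem's piecewise formula admits a one-line-per-case implementation; a minimal sketch (function name is mine):

```python
def bsc_permutation_capacity(p: float) -> float:
    """Permutation channel capacity of BSC(p), per the theorem above."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a crossover probability in [0, 1]")
    if p in (0.0, 1.0):
        return 1.0   # noiseless or bit-complementing channel
    if p == 0.5:
        return 0.0   # output is independent of the input
    return 0.5       # every other p

print([bsc_permutation_capacity(p) for p in (0.0, 0.25, 0.5, 0.9, 1.0)])
# -> [1.0, 0.5, 0.0, 0.5, 1.0]
```

The discontinuities at p ∈ {0, 1/2, 1} noted in the remarks are visible directly in the branch structure.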
Thank You!
A. Makur (MIT) Capacity of BSC Permutation Channel 5 October 2018 21 / 21