Task 1: Decoding
[Figure: factor graph with message bits $X_1 \ldots X_4$, parity bits $X_5 \ldots X_7$, and received signals $Y_1 \ldots Y_7$]
Task 1: Decoding
[Figure: reduced factor graph showing the message bits, parity bits, and received signal corrupted by Gaussian noise]
Model:
For $i = 1 \ldots 4$: $p(Y_i \mid X) = p_{\mathcal{N}}(Y_i; X_i, \sigma^2)$
$p(Y_5 \mid X) = p_{\mathcal{N}}(Y_5; X_1 \oplus X_2 \oplus X_3, \sigma^2)$
$p(Y_6 \mid X) = p_{\mathcal{N}}(Y_6; X_2 \oplus X_3 \oplus X_4, \sigma^2)$
$p(Y_7 \mid X) = p_{\mathcal{N}}(Y_7; X_3 \oplus X_4 \oplus X_1, \sigma^2)$
$p(X_1, X_2, X_3, X_4)$ is uniform
$X_i \in \{0, 1\}$ (bits), $Y_i \in \mathbb{R}$ (measured signal)
This defines $p(X, Y)$.
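As a minimal sketch of this model in code (the helper names encode, MESSAGES, CODEWORDS, and log_p_y_given_x are my own, not part of the assignment):

import itertools
import numpy as np

def encode(x):
    """Append the parity bits X5, X6, X7 to the message bits x = (X1, ..., X4)."""
    x1, x2, x3, x4 = x
    return np.array([x1, x2, x3, x4,
                     x1 ^ x2 ^ x3,
                     x2 ^ x3 ^ x4,
                     x3 ^ x4 ^ x1], dtype=float)

# all 2^4 messages and the corresponding 7-bit codewords
MESSAGES = np.array(list(itertools.product([0, 1], repeat=4)))
CODEWORDS = np.stack([encode(m) for m in MESSAGES])

def log_p_y_given_x(y, codeword, sigma):
    """log p(y | x): sum of Gaussian log-densities, up to an additive constant."""
    return -np.sum((y - codeword) ** 2) / (2.0 * sigma ** 2)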
• Metrics:
  1) bit error: the number of bad bits
  2) packet error: the whole packet is broken if at least one bit is broken
• Step 1, given y and sigma (a code sketch of the decoders and the simulation follows Step 4 below):
  • given y, compute the baseline naive solution $x_{1\ldots4}$ using only the inputs $y_{1\ldots4}$
  • given y, compute the maximum a posteriori (MAP) solution $x_{1\ldots4}$ by enumerating all candidate solutions
• Step 2, using the decoders above:
  • implement the encoder and the noisy channel: for a given x, generate y from p(Y | x)
  • compute the average error rates:
    • enumerate all x
    • generate y from p(Y | x) as above
    • decode y to obtain the solution x'
    • compute the packet error rate: the frequency of the cases in which not all bits of x' are correct
    • compute the bit error rate: the average number of bits that are wrong
• Step 3: Plot the error rates versus sigma for sigma in [0.001, 2]
• Step 4:
  • Implement the MM decoder by computing the marginal distributions $p(X_i \mid y)$ and the most probable marginal assignment $x_i$ for each $i$ (see the sketch following this list)
  • For each error metric, plot the rates of the 3 methods: baseline, MAP, MM
  • Which method is better for low noise, sigma < 0.2?
  • Which method is better for high noise, sigma = 1, under the packet error metric?
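A possible end-to-end sketch of Steps 1-4, reusing encode, MESSAGES, CODEWORDS, and log_p_y_given_x from the model sketch above (all names and the number of noise samples per codeword are illustrative choices, not requirements):

def decode_baseline(y):
    """Naive decoder: threshold the systematic part y1..y4 at 0.5."""
    return (y[:4] > 0.5).astype(int)

def decode_map(y, sigma):
    """MAP decoder: pick the message whose codeword best explains y."""
    scores = np.array([log_p_y_given_x(y, c, sigma) for c in CODEWORDS])
    return MESSAGES[np.argmax(scores)]

def decode_mm(y, sigma):
    """MM decoder: posterior marginals p(Xi | y), most probable value per bit."""
    scores = np.array([log_p_y_given_x(y, c, sigma) for c in CODEWORDS])
    post = np.exp(scores - scores.max())
    post /= post.sum()                    # p(x | y) over the 16 messages
    p1 = post @ MESSAGES                  # marginals p(Xi = 1 | y)
    return (p1 > 0.5).astype(int)

def error_rates(decoder, sigma, rng, n_noise=500):
    """Average bit and packet error rates: enumerate all x, sample noisy y, decode."""
    bit_err = packet_err = n = 0
    for x in MESSAGES:
        cw = encode(x)
        for _ in range(n_noise):
            y = cw + rng.normal(0.0, sigma, size=7)   # noisy channel p(Y | x)
            wrong = int(np.sum(decoder(y) != x))
            bit_err += wrong / 4.0
            packet_err += (wrong > 0)
            n += 1
    return bit_err / n, packet_err / n

# example: error-rate curves for the MAP decoder (Step 3)
rng = np.random.default_rng(0)
sigmas = np.linspace(0.001, 2.0, 20)
map_rates = [error_rates(lambda y: decode_map(y, s), s, rng) for s in sigmas]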
Task 1: Decoding
• Report (zip):
  • Your implementation (any language)
  • Plots
• Time spent working on the task (including debugging and preparing the report). The estimate is 2-4 hours.
Task 1: Decoding
[Figure: example plots of the bit error rate and the packet error rate versus sigma (from 0 to 2) for the baseline, MAP, and MM decoders]
Task 2: Variational Bayes
The task is to try variational Bayesian learning on a small test problem.
• Measurements: $x \in \mathbb{R}$
• Class: $y \in \{0, 1\}$
• Model: $p(y \mid x) = 1/(1 + \exp(-(wx + b)))$, a logistic model with 1D input
• Parameters: $\theta = (w, b)$
Data:
• For training, generate N/2 points of class 0 and N/2 points of class 1
• For $y = 0$, generate $x \sim p_{\mathcal{N}}(-2, 1^2)$ (mean and variance parameters)
• For $y = 1$, generate $x \sim p_{\mathcal{N}}(2, 2^2)$
• Form two arrays of length N:
  x = (samples of class 0, samples of class 1)
  y = (0, ..., 0, 1, ..., 1)
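A minimal data-generation sketch under these specifications (the function name and the use of numpy's default_rng are my choices):

import numpy as np

def generate_data(N, rng):
    """N/2 samples of class 0 from N(-2, 1^2) and N/2 samples of class 1 from N(2, 2^2)."""
    n = N // 2
    x0 = rng.normal(-2.0, 1.0, size=n)   # class 0: mean -2, standard deviation 1
    x1 = rng.normal(2.0, 2.0, size=n)    # class 1: mean 2, standard deviation 2
    x = np.concatenate([x0, x1])
    y = np.concatenate([np.zeros(n, dtype=int), np.ones(n, dtype=int)])
    return x, y

x, y = generate_data(8, np.random.default_rng(0))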
Task 2: Variational Bayes
[Figure: example data for N = 8]
Step 1 - Maximum Likelihood
import numpy as np
import scipy.optimize

def prediction(x, theta):
    """ model posterior probability p(y=0 | x; theta) """
    pass

def log_likelihood(py, y):
    """ log likelihood of a given prediction at ground truth y """
    r = np.log(py[y == 0]).sum()
    r += np.log(1 - py[y == 1]).sum()
    return r

nll = lambda theta: -log_likelihood(prediction(x, theta), y)
theta0 = np.ones([2])
o = scipy.optimize.minimize(nll, theta0, method='Nelder-Mead')
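One possible way to complete the prediction stub (a sketch, assuming the logistic form from the model slide is used directly as p(y=0 | x; theta), which matches the docstring):

def prediction(x, theta):
    """ model posterior probability p(y=0 | x; theta), theta = (w, b) """
    w, b = theta
    return 1.0 / (1.0 + np.exp(-(w * x + b)))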
Step 1: Maximum Likelihood estimator
• Find $\hat\theta = \arg\min_\theta \, -\sum_t \log p(y_t \mid x_t; \theta)$
• use the simplest optimization method, e.g. Nelder-Mead as in the code snippet above
Compute the validation accuracy:
• Generate a new data sample (x, y) with N = 1000
• Classify it using the model: $\hat y_t = 0$ if $p(y = 0 \mid x_t, \hat\theta) > p(y = 1 \mid x_t, \hat\theta)$, and $\hat y_t = 1$ otherwise
• Compute the test error rate: $\sum_t [[\hat y_t \neq y_t]] / N$
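A sketch of this validation step, reusing generate_data and the completed prediction from the sketches above (thresholding p(y=0 | x) at 0.5 is equivalent to comparing the two class posteriors):

def test_error(theta_hat, rng, N=1000):
    """Fraction of misclassified points on a freshly generated sample."""
    x_val, y_val = generate_data(N, rng)
    p0 = prediction(x_val, theta_hat)       # p(y = 0 | x; theta_hat)
    y_hat = (p0 <= 0.5).astype(int)         # predict 0 iff p(y=0|x) > p(y=1|x)
    return np.mean(y_hat != y_val)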
Step 1
Plot the model prediction $p(y = 0 \mid x, \hat\theta)$ for $x \in [-5, 5]$; an example is shown for $\hat\theta = (1, 1)$.
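A minimal plotting sketch, assuming matplotlib and the optimizer result o from the snippet above (o.x holds the ML estimate):

import matplotlib.pyplot as plt

xs = np.linspace(-5, 5, 200)
plt.plot(xs, prediction(xs, o.x))   # fitted theta_hat = (w, b)
plt.xlabel('x')
plt.ylabel('p(y = 0 | x, theta_hat)')
plt.show()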
• Parameters $w, b$ are considered as random variables
• $w \sim \mathcal{N}(\mu_w, \sigma_w^2)$
• $b \sim \mathcal{N}(\mu_b, \sigma_b^2)$
• We will optimize over $(\mu_w, \mu_b, \sigma_w^2, \sigma_b^2)$
• For the variances use the parametrization:
  $\sigma^2 = \begin{cases} \exp(z), & z < 0, \\ 1 + z, & z \geq 0 \end{cases}$
• Denote $\eta = (\mu_w, \mu_b, z_w, z_b)$
• We can now use unconstrained minimization over $\eta$
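A small helper implementing this parametrization (a sketch; the function name sigma2 is mine, and the clipping inside exp only guards against overflow before the other branch is selected):

def sigma2(z):
    """Variance parametrization: exp(z) for z < 0, 1 + z for z >= 0."""
    z = np.asarray(z, dtype=float)
    return np.where(z < 0, np.exp(np.minimum(z, 0.0)), 1.0 + np.maximum(z, 0.0))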
Step 2 - Variational Bayes
[Figure: plot of the parametrization function $\sigma^2(z)$ for z roughly in [-10, 10]]
Step 2 - Variational Bayes
import math
import numpy as np

def expected_prediction(x: np.ndarray, mu_w, mu_b, var_w, var_b) -> np.ndarray:
    """ compute expectation of prediction p(y=0 | x; theta) over theta ~ q """
    v_0 = math.pi ** 2 / 3
    m = mu_w * x + mu_b
    v = var_w * x**2 + var_b
    a = m / np.sqrt(v / v_0 + 1)
    py = 1 / (1 + np.exp(-a))
    return py
For prediction with q, we need the expectation
$\mathbb{E}_{\theta \sim q}\, p(y \mid x, \theta) = \int p(y \mid x, \theta)\, q(\theta)\, d\theta = \int p(y \mid x, w, b)\, q(w)\, q(b)\, dw\, db$
We will not use sampling to compute it; instead we use an analytic approximation (the expected_prediction function above).
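As an optional sanity check, not required by the assignment, the analytic approximation can be compared against a plain Monte Carlo estimate of the same integral:

def expected_prediction_mc(x, mu_w, mu_b, var_w, var_b, rng, n_samples=100000):
    """Monte Carlo estimate of E_{theta~q} p(y=0 | x; theta) for a scalar x."""
    w = rng.normal(mu_w, np.sqrt(var_w), size=n_samples)
    b = rng.normal(mu_b, np.sqrt(var_b), size=n_samples)
    return np.mean(1.0 / (1.0 + np.exp(-(w * x + b))))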
Step 2 - Variational Bayes
VB objective:
$\mathrm{KL}(q(\theta|\eta) \,\|\, p(\theta \mid D)) = -\mathbb{E}_{\theta \sim q(\theta|\eta)} \log p(D \mid \theta) + \mathrm{KL}(q(\theta|\eta) \,\|\, p(\theta)) + \mathrm{const}$
• For the first term use the approximation
  $-\mathbb{E}_{\theta \sim q(\theta|\eta)} \log p(D \mid \theta) = -\sum_t \mathbb{E}_{\theta \sim q(\theta|\eta)} \log p(y_t \mid x_t, \theta) \approx -\sum_t \log \mathbb{E}_{\theta \sim q(\theta|\eta)} p(y_t \mid x_t, \theta)$
• Notice that the expression under the logarithm is the prediction with q, for which we already have an approximate analytic solution from the previous slide
For the second term use that:
• the KL divergence of independent variables (w and b under q) is the sum of the individual KL divergences
• the prior on w is assumed to be $\mathcal{N}(0, 10^6)$ and the prior on b is $\mathcal{N}(0, 10^8)$ (uninformative)
• use the formula for the KL divergence of two Gaussian variables, given below (a combined code sketch follows it)
Task 2: Variational Bayes
$\mathrm{KL}(\mathcal{N}(\mu_1, \sigma_1^2) \,\|\, \mathcal{N}(\mu_2, \sigma_2^2)) = \frac{1}{2} \left( \log \sigma_2^2 - \log \sigma_1^2 + \frac{\sigma_1^2 + (\mu_1 - \mu_2)^2}{\sigma_2^2} - 1 \right)$
• Repeat the plot of prediction, using the approximate expected posterior
• Compute test error
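Putting the pieces together, a hedged sketch of the Gaussian KL term and of the full VB objective over $\eta = (\mu_w, \mu_b, z_w, z_b)$; the function names are mine, and it reuses sigma2 and expected_prediction from above together with the priors stated earlier:

import scipy.optimize

def gaussian_kl(mu1, var1, mu2, var2):
    """KL( N(mu1, var1) || N(mu2, var2) ) for scalar Gaussians."""
    return 0.5 * (np.log(var2) - np.log(var1) + (var1 + (mu1 - mu2) ** 2) / var2 - 1.0)

def vb_objective(eta, x, y):
    """Approximate negative ELBO: -sum_t log E_q p(y_t|x_t,theta) + KL(q(w)||p(w)) + KL(q(b)||p(b))."""
    mu_w, mu_b, z_w, z_b = eta
    var_w, var_b = sigma2(z_w), sigma2(z_b)
    py0 = expected_prediction(x, mu_w, mu_b, var_w, var_b)     # approx. E_q p(y=0 | x)
    data_term = -(np.log(py0[y == 0]).sum() + np.log(1 - py0[y == 1]).sum())
    kl_term = gaussian_kl(mu_w, var_w, 0.0, 1e6) + gaussian_kl(mu_b, var_b, 0.0, 1e8)
    return data_term + kl_term

eta0 = np.ones(4)
res = scipy.optimize.minimize(lambda eta: vb_objective(eta, x, y), eta0, method='Nelder-Mead')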
Report
• For N=8, repeat the experiment several times; show plots as above with both methods for two cases: one where the data happens to be separable (training error 0) and one where it is non-separable
• Report the average test accuracy of both methods over 100 trials with N=8 points
• Repeat the above for N=20, N=1000
• Write your observations and conclusions for these cases. Were the results what you expected? Which method provided a more reasonable confidence estimate?
• Indicate the time spent working on this problem. The estimate is 3-6 hours.
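For the averaged accuracies requested above, one possible driver loop (illustrative only; it reuses generate_data, prediction, log_likelihood, test_error, vb_objective, sigma2, and expected_prediction from the sketches above):

def run_trial(N, rng):
    """One trial: generate data, fit both models, return their test error rates."""
    x, y = generate_data(N, rng)
    # maximum likelihood fit
    nll = lambda theta: -log_likelihood(prediction(x, theta), y)
    theta_ml = scipy.optimize.minimize(nll, np.ones(2), method='Nelder-Mead').x
    # variational Bayes fit
    eta = scipy.optimize.minimize(lambda e: vb_objective(e, x, y), np.ones(4),
                                  method='Nelder-Mead').x
    mu_w, mu_b, z_w, z_b = eta
    err_ml = test_error(theta_ml, rng, N=1000)
    # test error of the VB predictor, thresholding the expected prediction at 0.5
    x_val, y_val = generate_data(1000, rng)
    p0 = expected_prediction(x_val, mu_w, mu_b, sigma2(z_w), sigma2(z_b))
    err_vb = np.mean((p0 <= 0.5).astype(int) != y_val)
    return err_ml, err_vb

rng = np.random.default_rng(0)
errors = np.array([run_trial(8, rng) for _ in range(100)])
print('average test accuracy (ML, VB):', 1.0 - errors.mean(axis=0))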