Task 1: Decoding
[Figure: factor graph with message bits $X_1 \ldots X_4$, parity bits $X_5 \ldots X_7$, and received signals $Y_1 \ldots Y_7$]
Task 1: Decoding
[Figure: reduced factor graph showing the message bits, parity bits, and received signal corrupted by Gaussian noise]
Model:
For $i = 1 \ldots 4$: $p(Y_i \mid X) = p_{\mathcal{N}}(Y_i; X_i, \sigma^2)$
$p(Y_5 \mid X) = p_{\mathcal{N}}(Y_5; X_1 \oplus X_2 \oplus X_3, \sigma^2)$
$p(Y_6 \mid X) = p_{\mathcal{N}}(Y_6; X_2 \oplus X_3 \oplus X_4, \sigma^2)$
$p(Y_7 \mid X) = p_{\mathcal{N}}(Y_7; X_3 \oplus X_4 \oplus X_1, \sigma^2)$
$p(X_1, X_2, X_3, X_4)$ is uniform
$X_i \in \{0, 1\}$ (bits), $Y_i \in \mathbb{R}$ (measured signal)
This defines $p(X, Y)$.
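As a minimal sketch of this model in code (the helper names encode, MESSAGES, CODEWORDS, and log_p_y_given_x are my own, not part of the assignment):

import itertools
import numpy as np

def encode(x):
    """Append the parity bits X5, X6, X7 to the message bits x = (X1, ..., X4)."""
    x1, x2, x3, x4 = x
    return np.array([x1, x2, x3, x4,
                     x1 ^ x2 ^ x3,
                     x2 ^ x3 ^ x4,
                     x3 ^ x4 ^ x1], dtype=float)

# all 2^4 messages and the corresponding 7-bit codewords
MESSAGES = np.array(list(itertools.product([0, 1], repeat=4)))
CODEWORDS = np.stack([encode(m) for m in MESSAGES])

def log_p_y_given_x(y, codeword, sigma):
    """log p(y | x): sum of Gaussian log-densities, up to an additive constant."""
    return -np.sum((y - codeword) ** 2) / (2.0 * sigma ** 2)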
• Metrics:
  1) bit error: the number of bad bits
  2) packet error: the whole packet is broken if at least one bit is broken
• Step 1, given y and sigma (a code sketch of the decoders and the simulation follows Step 4 below):
  • given y, compute the baseline naive solution $x_{1\ldots4}$ using only the inputs $y_{1\ldots4}$
  • given y, compute the maximum a posteriori (MAP) solution $x_{1\ldots4}$ by enumerating all candidate solutions
• Step 2, using the decoders above:
  • implement the encoder and the noisy channel: for a given x, generate y from p(Y | x)
  • compute the average error rates:
    • enumerate all x
    • generate y from p(Y | x) as above
    • decode y to obtain the solution x'
    • compute the packet error rate: the frequency of the cases in which not all bits of x' are correct
    • compute the bit error rate: the average number of bits that are wrong
• Step 3: Plot the error rates versus sigma for sigma in [0.001, 2]
• Step 4:
  • Implement the MM decoder by computing the marginal distributions $p(X_i \mid y)$ and the most probable marginal assignment $x_i$ for each $i$ (see the sketch following this list)
  • For each error metric, plot the rates of the 3 methods: baseline, MAP, MM
  • Which method is better for low noise, sigma < 0.2?
  • Which method is better for high noise, sigma = 1, under the packet error metric?
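A possible end-to-end sketch of Steps 1-4, reusing encode, MESSAGES, CODEWORDS, and log_p_y_given_x from the model sketch above (all names and the number of noise samples per codeword are illustrative choices, not requirements):

def decode_baseline(y):
    """Naive decoder: threshold the systematic part y1..y4 at 0.5."""
    return (y[:4] > 0.5).astype(int)

def decode_map(y, sigma):
    """MAP decoder: pick the message whose codeword best explains y."""
    scores = np.array([log_p_y_given_x(y, c, sigma) for c in CODEWORDS])
    return MESSAGES[np.argmax(scores)]

def decode_mm(y, sigma):
    """MM decoder: posterior marginals p(Xi | y), most probable value per bit."""
    scores = np.array([log_p_y_given_x(y, c, sigma) for c in CODEWORDS])
    post = np.exp(scores - scores.max())
    post /= post.sum()                    # p(x | y) over the 16 messages
    p1 = post @ MESSAGES                  # marginals p(Xi = 1 | y)
    return (p1 > 0.5).astype(int)

def error_rates(decoder, sigma, rng, n_noise=500):
    """Average bit and packet error rates: enumerate all x, sample noisy y, decode."""
    bit_err = packet_err = n = 0
    for x in MESSAGES:
        cw = encode(x)
        for _ in range(n_noise):
            y = cw + rng.normal(0.0, sigma, size=7)   # noisy channel p(Y | x)
            wrong = int(np.sum(decoder(y) != x))
            bit_err += wrong / 4.0
            packet_err += (wrong > 0)
            n += 1
    return bit_err / n, packet_err / n

# example: error-rate curves for the MAP decoder (Step 3)
rng = np.random.default_rng(0)
sigmas = np.linspace(0.001, 2.0, 20)
map_rates = [error_rates(lambda y: decode_map(y, s), s, rng) for s in sigmas]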
Task 1: Decoding
• Report (zip):
  • Your implementation (any language)
  • Plots
• Time spent working on the task (including debugging and preparing the report). The estimate is 2-4 hours.
Task 1: Decoding
[Figure: example plots of the bit error rate and the packet error rate versus sigma (from 0 to 2) for the baseline, MAP, and MM decoders]
Task 2: Variational Bayes
The task is to try variational Bayesian learning on a small test problem.
• Measurements: $x \in \mathbb{R}$
• Class: $y \in \{0, 1\}$
• Model: $p(y \mid x) = 1/(1 + \exp(-(wx + b)))$, a logistic model with 1D input
• Parameters: $\theta = (w, b)$
Data:
• For training, generate N/2 points of class 0 and N/2 points of class 1
• For $y = 0$, generate $x \sim p_{\mathcal{N}}(-2, 1^2)$ (mean and variance parameters)
• For $y = 1$, generate $x \sim p_{\mathcal{N}}(2, 2^2)$
• Form two arrays of length N:
  x = (samples of class 0, samples of class 1)
  y = (0, ..., 0, 1, ..., 1)
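A minimal data-generation sketch under these specifications (the function name and the use of numpy's default_rng are my choices):

import numpy as np

def generate_data(N, rng):
    """N/2 samples of class 0 from N(-2, 1^2) and N/2 samples of class 1 from N(2, 2^2)."""
    n = N // 2
    x0 = rng.normal(-2.0, 1.0, size=n)   # class 0: mean -2, standard deviation 1
    x1 = rng.normal(2.0, 2.0, size=n)    # class 1: mean 2, standard deviation 2
    x = np.concatenate([x0, x1])
    y = np.concatenate([np.zeros(n, dtype=int), np.ones(n, dtype=int)])
    return x, y

x, y = generate_data(8, np.random.default_rng(0))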
Task 2: Variational Bayes
[Figure: example data for N = 8]
Step 1 - Maximum Likelihood
import numpy as np
import scipy.optimize

def prediction(x, theta):
    """ model posterior probability p(y=0 | x; theta) """
    pass

def log_likelihood(py, y):
    """ log likelihood of a given prediction at ground truth y """
    r = np.log(py[y == 0]).sum()
    r += np.log(1 - py[y == 1]).sum()
    return r

nll = lambda theta: -log_likelihood(prediction(x, theta), y)
theta0 = np.ones([2])
o = scipy.optimize.minimize(nll, theta0, method='Nelder-Mead')
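One possible way to complete the prediction stub (a sketch, assuming the logistic form from the model slide is used directly as p(y=0 | x; theta), which matches the docstring):

def prediction(x, theta):
    """ model posterior probability p(y=0 | x; theta), theta = (w, b) """
    w, b = theta
    return 1.0 / (1.0 + np.exp(-(w * x + b)))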
Step 1: Maximum Likelihood estimator
• Find $\hat\theta = \arg\min_\theta \, -\sum_t \log p(y_t \mid x_t; \theta)$
• use the simplest optimization method, e.g. Nelder-Mead as in the code snippet above
Compute the validation accuracy:
• Generate a new data sample (x, y) with N = 1000
• Classify it using the model: $\hat y_t = 0$ if $p(y = 0 \mid x_t, \hat\theta) > p(y = 1 \mid x_t, \hat\theta)$, and $\hat y_t = 1$ otherwise
• Compute the test error rate: $\sum_t [[\hat y_t \neq y_t]] / N$
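A sketch of this validation step, reusing generate_data and the completed prediction from the sketches above (thresholding p(y=0 | x) at 0.5 is equivalent to comparing the two class posteriors):

def test_error(theta_hat, rng, N=1000):
    """Fraction of misclassified points on a freshly generated sample."""
    x_val, y_val = generate_data(N, rng)
    p0 = prediction(x_val, theta_hat)       # p(y = 0 | x; theta_hat)
    y_hat = (p0 <= 0.5).astype(int)         # predict 0 iff p(y=0|x) > p(y=1|x)
    return np.mean(y_hat != y_val)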
Step 1
Plot the model prediction $p(y = 0 \mid x, \hat\theta)$ for $x \in [-5, 5]$; an example is shown for $\hat\theta = (1, 1)$.
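A minimal plotting sketch, assuming matplotlib and the optimizer result o from the snippet above (o.x holds the ML estimate):

import matplotlib.pyplot as plt

xs = np.linspace(-5, 5, 200)
plt.plot(xs, prediction(xs, o.x))   # fitted theta_hat = (w, b)
plt.xlabel('x')
plt.ylabel('p(y = 0 | x, theta_hat)')
plt.show()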
• Parameters $w, b$ are considered as random variables
• $w \sim \mathcal{N}(\mu_w, \sigma_w^2)$
• $b \sim \mathcal{N}(\mu_b, \sigma_b^2)$
• We will optimize over $(\mu_w, \mu_b, \sigma_w^2, \sigma_b^2)$
• For the variances use the parametrization:
  $\sigma^2 = \begin{cases} \exp(z), & z < 0, \\ 1 + z, & z \geq 0 \end{cases}$
• Denote $\eta = (\mu_w, \mu_b, z_w, z_b)$
• We can now use unconstrained minimization over $\eta$
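A small helper implementing this parametrization (a sketch; the function name sigma2 is mine, and the clipping inside exp only guards against overflow before the other branch is selected):

def sigma2(z):
    """Variance parametrization: exp(z) for z < 0, 1 + z for z >= 0."""
    z = np.asarray(z, dtype=float)
    return np.where(z < 0, np.exp(np.minimum(z, 0.0)), 1.0 + np.maximum(z, 0.0))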
Step 2 - Variational Bayes
[Figure: plot of the parametrization function $\sigma^2(z)$ for z roughly in [-10, 10]]
Step 2 - Variational Bayes
import math
import numpy as np

def expected_prediction(x: np.ndarray, mu_w, mu_b, var_w, var_b) -> np.ndarray:
    """ compute expectation of prediction p(y=0 | x; theta) over theta ~ q """
    v_0 = math.pi ** 2 / 3
    m = mu_w * x + mu_b
    v = var_w * x**2 + var_b
    a = m / np.sqrt(v / v_0 + 1)
    py = 1 / (1 + np.exp(-a))
    return py
For prediction with q, we need the expectation
$\mathbb{E}_{\theta \sim q}\, p(y \mid x, \theta) = \int p(y \mid x, \theta)\, q(\theta)\, d\theta = \int p(y \mid x, w, b)\, q(w)\, q(b)\, dw\, db$
We will not use sampling to compute it; instead we use an analytic approximation (the expected_prediction function above).
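As an optional sanity check, not required by the assignment, the analytic approximation can be compared against a plain Monte Carlo estimate of the same integral:

def expected_prediction_mc(x, mu_w, mu_b, var_w, var_b, rng, n_samples=100000):
    """Monte Carlo estimate of E_{theta~q} p(y=0 | x; theta) for a scalar x."""
    w = rng.normal(mu_w, np.sqrt(var_w), size=n_samples)
    b = rng.normal(mu_b, np.sqrt(var_b), size=n_samples)
    return np.mean(1.0 / (1.0 + np.exp(-(w * x + b))))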
Step 2 - Variational Bayes
VB objective:
$\mathrm{KL}(q(\theta|\eta) \,\|\, p(\theta \mid D)) = -\mathbb{E}_{\theta \sim q(\theta|\eta)} \log p(D \mid \theta) + \mathrm{KL}(q(\theta|\eta) \,\|\, p(\theta)) + \mathrm{const}$
• For the first term use the approximation
  $-\mathbb{E}_{\theta \sim q(\theta|\eta)} \log p(D \mid \theta) = -\sum_t \mathbb{E}_{\theta \sim q(\theta|\eta)} \log p(y_t \mid x_t, \theta) \approx -\sum_t \log \mathbb{E}_{\theta \sim q(\theta|\eta)} p(y_t \mid x_t, \theta)$
• Notice that the expression under the logarithm is the prediction with q, for which we already have an approximate analytic solution from the previous slide
For the second term use that:
• the KL divergence of independent variables (w and b under q) is the sum of the individual KL divergences
• the prior on w is assumed to be $\mathcal{N}(0, 10^6)$ and the prior on b is $\mathcal{N}(0, 10^8)$ (uninformative)
• use the formula for the KL divergence of two Gaussian variables, given below (a combined code sketch follows it)
Task 2: Variational Bayes
$\mathrm{KL}(\mathcal{N}(\mu_1, \sigma_1^2) \,\|\, \mathcal{N}(\mu_2, \sigma_2^2)) = \frac{1}{2} \left( \log \sigma_2^2 - \log \sigma_1^2 + \frac{\sigma_1^2 + (\mu_1 - \mu_2)^2}{\sigma_2^2} - 1 \right)$
• Repeat the plot of prediction, using the approximate expected posterior
• Compute test error
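Putting the pieces together, a hedged sketch of the Gaussian KL term and of the full VB objective over $\eta = (\mu_w, \mu_b, z_w, z_b)$; the function names are mine, and it reuses sigma2 and expected_prediction from above together with the priors stated earlier:

import scipy.optimize

def gaussian_kl(mu1, var1, mu2, var2):
    """KL( N(mu1, var1) || N(mu2, var2) ) for scalar Gaussians."""
    return 0.5 * (np.log(var2) - np.log(var1) + (var1 + (mu1 - mu2) ** 2) / var2 - 1.0)

def vb_objective(eta, x, y):
    """Approximate negative ELBO: -sum_t log E_q p(y_t|x_t,theta) + KL(q(w)||p(w)) + KL(q(b)||p(b))."""
    mu_w, mu_b, z_w, z_b = eta
    var_w, var_b = sigma2(z_w), sigma2(z_b)
    py0 = expected_prediction(x, mu_w, mu_b, var_w, var_b)     # approx. E_q p(y=0 | x)
    data_term = -(np.log(py0[y == 0]).sum() + np.log(1 - py0[y == 1]).sum())
    kl_term = gaussian_kl(mu_w, var_w, 0.0, 1e6) + gaussian_kl(mu_b, var_b, 0.0, 1e8)
    return data_term + kl_term

eta0 = np.ones(4)
res = scipy.optimize.minimize(lambda eta: vb_objective(eta, x, y), eta0, method='Nelder-Mead')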
Report
• For N=8, repeat the experiment several times; show plots as above with both methods for two cases: one where the data happens to be separable (training error 0) and one where it is non-separable
• Report the average test accuracy of both methods over 100 trials with N=8 points
• Repeat the above for N=20, N=1000
• Write your observations and conclusions for these cases. Were the results what you expected? Which method provided a more reasonable confidence estimate?
• Indicate the time spent working on this problem. The estimate is 3-6 hours.
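For the averaged accuracies requested above, one possible driver loop (illustrative only; it reuses generate_data, prediction, log_likelihood, test_error, vb_objective, sigma2, and expected_prediction from the sketches above):

def run_trial(N, rng):
    """One trial: generate data, fit both models, return their test error rates."""
    x, y = generate_data(N, rng)
    # maximum likelihood fit
    nll = lambda theta: -log_likelihood(prediction(x, theta), y)
    theta_ml = scipy.optimize.minimize(nll, np.ones(2), method='Nelder-Mead').x
    # variational Bayes fit
    eta = scipy.optimize.minimize(lambda e: vb_objective(e, x, y), np.ones(4),
                                  method='Nelder-Mead').x
    mu_w, mu_b, z_w, z_b = eta
    err_ml = test_error(theta_ml, rng, N=1000)
    # test error of the VB predictor, thresholding the expected prediction at 0.5
    x_val, y_val = generate_data(1000, rng)
    p0 = expected_prediction(x_val, mu_w, mu_b, sigma2(z_w), sigma2(z_b))
    err_vb = np.mean((p0 <= 0.5).astype(int) != y_val)
    return err_ml, err_vb

rng = np.random.default_rng(0)
errors = np.array([run_trial(8, rng) for _ in range(100)])
print('average test accuracy (ML, VB):', 1.0 - errors.mean(axis=0))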