![Page 1: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/1.jpg)
Training Binarized Neural Networks usingMIP and CP
Rodrigo Toro Icarte Leon Illanes Margarita P. CastroAndre A. Cire Sheila A. McIlraith J. Christopher Beck
CP 2019October 4
![Page 2: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/2.jpg)
Binarized Neural Networks
![Page 3: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/3.jpg)
Deep learning
Binarized Neural Networks (BNNs)
BNNs are NNs with binary weights and activations.
Similar performance to standard deep learning.
More efficient (w.r.t. energy and memory) at deploy time.
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 2 / 42
![Page 4: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/4.jpg)
Deep learning
Binarized Neural Networks (BNNs)
BNNs are NNs with binary weights and activations.
Similar performance to standard deep learning.
More efficient (w.r.t. energy and memory) at deploy time.
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 2 / 42
![Page 5: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/5.jpg)
Deep learning
Binarized Neural Networks (BNNs)
BNNs are NNs with binary weights and activations.
Similar performance to standard deep learning.
More efficient (w.r.t. energy and memory) at deploy time.
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 2 / 42
![Page 6: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/6.jpg)
How to train BNNs
![Page 7: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/7.jpg)
Supervised learning
Objective
Learn a function that maps inputs to outputs from examples.
5 0 4 1 8 6 2 7 3 9
Example: Digit recognition.
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 4 / 42
![Page 8: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/8.jpg)
Supervised learning
Objective
Learn a function that maps inputs to outputs from examples.
5 0 4 1 8 6 2 7 3 9
Example: Digit recognition.
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 4 / 42
![Page 9: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/9.jpg)
The binarized perceptron
... ... ...
{-1,1}
n =
{+1 if
∑i wi · xi ≥ 0
−1 otherwise, where wi ∈ {−1, 0, 1}.
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 5 / 42
![Page 10: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/10.jpg)
The binarized perceptron
... ... ...
-1
n =
{+1 if
∑i wi · xi ≥ 0
−1 otherwise, where wi ∈ {−1, 0, 1}.
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 5 / 42
![Page 11: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/11.jpg)
The binarized perceptron
... ... ...
-1
n =
{+1 if
∑i wi · xi ≥ 0
−1 otherwise, where wi ∈ {−1, 0, 1}.
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 5 / 42
![Page 12: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/12.jpg)
The binarized perceptron
... ... ...
+1
n =
{+1 if
∑i wi · xi ≥ 0
−1 otherwise, where wi ∈ {−1, 0, 1}.
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 5 / 42
![Page 13: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/13.jpg)
The binarized perceptron
... ... ...
-1
n =
{+1 if
∑i wi · xi ≥ 0
−1 otherwise, where wi ∈ {−1, 0, 1}.
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 5 / 42
![Page 14: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/14.jpg)
The binarized perceptron
... ... ...
-1
How can we use the perceptron for digit recognition?
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 5 / 42
![Page 15: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/15.jpg)
The binarized perceptron
... ... ...
nj =
{+1 if
∑i wij · xi ≥ 0
−1 otherwise, where wij ∈ {−1, 0, 1}.
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 6 / 42
![Page 16: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/16.jpg)
The binarized perceptron
... ... ...
nj =
{+1 if
∑i wij · xi ≥ 0
−1 otherwise, where wij ∈ {−1, 0, 1}.
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 6 / 42
![Page 17: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/17.jpg)
The binarized perceptron
... ... ...
n0 =
{+1 if
∑i wi0 · xi ≥ 0
−1 otherwise, where wij ∈ {−1, 0, 1}.
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 6 / 42
![Page 18: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/18.jpg)
The binarized perceptron
... ... ...
n1 =
{+1 if
∑i wi1 · xi ≥ 0
−1 otherwise, where wij ∈ {−1, 0, 1}.
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 6 / 42
![Page 19: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/19.jpg)
The binarized perceptron
... ... ...
n2 =
{+1 if
∑i wi2 · xi ≥ 0
−1 otherwise, where wij ∈ {−1, 0, 1}.
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 6 / 42
![Page 20: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/20.jpg)
The binarized perceptron
... ... ...
n3 =
{+1 if
∑i wi3 · xi ≥ 0
−1 otherwise, where wij ∈ {−1, 0, 1}.
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 6 / 42
![Page 21: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/21.jpg)
The binarized perceptron
... ... ...
n4 =
{+1 if
∑i wi4 · xi ≥ 0
−1 otherwise, where wij ∈ {−1, 0, 1}.
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 6 / 42
![Page 22: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/22.jpg)
The binarized perceptron
... ... ...
n5 =
{+1 if
∑i wi5 · xi ≥ 0
−1 otherwise, where wij ∈ {−1, 0, 1}.
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 6 / 42
![Page 23: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/23.jpg)
The binarized perceptron
... ... ...
n6 =
{+1 if
∑i wi6 · xi ≥ 0
−1 otherwise, where wij ∈ {−1, 0, 1}.
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 6 / 42
![Page 24: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/24.jpg)
The binarized perceptron
... ... ...
n7 =
{+1 if
∑i wi7 · xi ≥ 0
−1 otherwise, where wij ∈ {−1, 0, 1}.
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 6 / 42
![Page 25: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/25.jpg)
The binarized perceptron
... ... ...
n8 =
{+1 if
∑i wi8 · xi ≥ 0
−1 otherwise, where wij ∈ {−1, 0, 1}.
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 6 / 42
![Page 26: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/26.jpg)
The binarized perceptron
... ... ...
n9 =
{+1 if
∑i wi9 · xi ≥ 0
−1 otherwise, where wij ∈ {−1, 0, 1}.
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 6 / 42
![Page 27: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/27.jpg)
The binarized perceptron
... ... ...
nj =
{+1 if
∑i wij · xi ≥ 0
−1 otherwise, where wij ∈ {−1, 0, 1}.
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 6 / 42
![Page 28: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/28.jpg)
The binarized perceptron
... ... ...
nj =
{+1 if
∑i wij · xi ≥ 0
−1 otherwise, where wij ∈ {−1, 0, 1}.
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 6 / 42
![Page 29: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/29.jpg)
The binarized perceptron
... ... ...
nj =
{+1 if
∑i wij · xi ≥ 0
−1 otherwise, where wij ∈ {−1, 0, 1}.
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 6 / 42
![Page 30: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/30.jpg)
The binarized perceptron
... ... ...
Problem: for most training sets, this problem is infeasible.
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 6 / 42
![Page 31: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/31.jpg)
(Deep) binarized neural networks
... ... ...
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 7 / 42
![Page 32: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/32.jpg)
(Deep) binarized neural networks
... ... ...
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 7 / 42
![Page 33: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/33.jpg)
(Deep) binarized neural networks
... ... ...
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 7 / 42
![Page 34: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/34.jpg)
(Deep) binarized neural networks
... ... ...
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 7 / 42
![Page 35: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/35.jpg)
(Deep) binarized neural networks
... ... ...
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 7 / 42
![Page 36: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/36.jpg)
(Deep) binarized neural networks
... ... ...
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 7 / 42
![Page 37: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/37.jpg)
(Deep) binarized neural networks
... ... ...
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 7 / 42
![Page 38: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/38.jpg)
(Deep) binarized neural networks
... ... ...
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 7 / 42
![Page 39: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/39.jpg)
(Deep) binarized neural networks
... ... ...
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 7 / 42
![Page 40: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/40.jpg)
(Deep) binarized neural networks
... ... ...
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 7 / 42
![Page 41: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/41.jpg)
(Deep) binarized neural networks
... ... ...
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 7 / 42
![Page 42: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/42.jpg)
(Deep) binarized neural networks
... ... ...
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 7 / 42
![Page 43: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/43.jpg)
(Deep) binarized neural networks
... ... ...
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 7 / 42
![Page 44: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/44.jpg)
(Deep) binarized neural networks
... ... ...
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 7 / 42
![Page 45: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/45.jpg)
(Deep) binarized neural networks
... ... ...
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 7 / 42
![Page 46: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/46.jpg)
Formal definition
Any assignment to W defines a function NW : RN0 → {−1, 1}NL :
n0j = xj (first layer).
n`j = 1 if∑
i wi`j · n(`−1)i ≥ 0; -1 otherwise.
Training a BNN
Given T = {(x1, y1), . . . , (xT , yT )}, find W s.t. NW(xk) ≈ yk ∀k .
How can we train a BNN?
Use gradient descent!
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 8 / 42
![Page 47: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/47.jpg)
Formal definition
Any assignment to W defines a function NW : RN0 → {−1, 1}NL :
n0j = xj (first layer).
n`j = 1 if∑
i wi`j · n(`−1)i ≥ 0; -1 otherwise.
Training a BNN
Given T = {(x1, y1), . . . , (xT , yT )}, find W s.t. NW(xk) ≈ yk ∀k .
How can we train a BNN?
Use gradient descent!
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 8 / 42
![Page 48: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/48.jpg)
Formal definition
Any assignment to W defines a function NW : RN0 → {−1, 1}NL :
n0j = xj (first layer).
n`j = 1 if∑
i wi`j · n(`−1)i ≥ 0; -1 otherwise.
Training a BNN
Given T = {(x1, y1), . . . , (xT , yT )}, find W s.t. NW(xk) ≈ yk ∀k .
How can we train a BNN?
Use gradient descent!
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 8 / 42
![Page 49: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/49.jpg)
Formal definition
Any assignment to W defines a function NW : RN0 → {−1, 1}NL :
n0j = xj (first layer).
n`j = 1 if∑
i wi`j · n(`−1)i ≥ 0; -1 otherwise.
Training a BNN
Given T = {(x1, y1), . . . , (xT , yT )}, find W s.t. NW(xk) ≈ yk ∀k .
How can we train a BNN?
Use gradient descent!
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 8 / 42
![Page 50: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/50.jpg)
How to train binarized neural networks
Wait! we cannot compute gradients over discrete weights.
Train over continuous weights and activations.
Binarize the weights and activations during the forward pass.
Use continuous weights and activations in the backward pass.
[To me] It feels like an odd hack to GD... but it works in practice.
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 9 / 42
![Page 51: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/51.jpg)
How to train binarized neural networks
Wait! we cannot compute gradients over discrete weights.
Train over continuous weights and activations.
Binarize the weights and activations during the forward pass.
Use continuous weights and activations in the backward pass.
[To me] It feels like an odd hack to GD... but it works in practice.
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 9 / 42
![Page 52: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/52.jpg)
How to train binarized neural networks
Wait! we cannot compute gradients over discrete weights.
Train over continuous weights and activations.
Binarize the weights and activations during the forward pass.
Use continuous weights and activations in the backward pass.
[To me] It feels like an odd hack to GD... but it works in practice.
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 9 / 42
![Page 53: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/53.jpg)
About this work
![Page 54: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/54.jpg)
Main contributions
1. Show that training BNNs is a discrete optimization problem.
2. Propose a MIP, CP, and MIP/CP hybrid model to train BNNs.
3. Run an extensive experimental comparison.
Code: https://bitbucket.org/RToroIcarte/bnn
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 11 / 42
![Page 55: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/55.jpg)
Related Work
SAT and MIP have been used for generating adversarial examplesand verifying properties of BNNs:
e.g. Fischetti et al. (2017), Tjeng et al. (2017), Khalil et al. (2018),
Narodytska (2018), Cheng et al. (2018), among others.
To the best of our knowledge, this is the first work that proposes
model-based approaches to train BNNs... but why?
Model-based approaches can find provably optimal solutions, butthey have two (fundamental) issues:
Scalability.
Overfitting.
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 12 / 42
![Page 56: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/56.jpg)
Related Work
SAT and MIP have been used for generating adversarial examplesand verifying properties of BNNs:
e.g. Fischetti et al. (2017), Tjeng et al. (2017), Khalil et al. (2018),
Narodytska (2018), Cheng et al. (2018), among others.
To the best of our knowledge, this is the first work that proposes
model-based approaches to train BNNs.
.. but why?
Model-based approaches can find provably optimal solutions, butthey have two (fundamental) issues:
Scalability.
Overfitting.
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 12 / 42
![Page 57: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/57.jpg)
Related Work
SAT and MIP have been used for generating adversarial examplesand verifying properties of BNNs:
e.g. Fischetti et al. (2017), Tjeng et al. (2017), Khalil et al. (2018),
Narodytska (2018), Cheng et al. (2018), among others.
To the best of our knowledge, this is the first work that proposes
model-based approaches to train BNNs... but why?
Model-based approaches can find provably optimal solutions, butthey have two (fundamental) issues:
Scalability.
Overfitting.
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 12 / 42
![Page 58: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/58.jpg)
Related Work
SAT and MIP have been used for generating adversarial examplesand verifying properties of BNNs:
e.g. Fischetti et al. (2017), Tjeng et al. (2017), Khalil et al. (2018),
Narodytska (2018), Cheng et al. (2018), among others.
To the best of our knowledge, this is the first work that proposes
model-based approaches to train BNNs... but why?
Model-based approaches can find provably optimal solutions
, butthey have two (fundamental) issues:
Scalability.
Overfitting.
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 12 / 42
![Page 59: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/59.jpg)
Related Work
SAT and MIP have been used for generating adversarial examplesand verifying properties of BNNs:
e.g. Fischetti et al. (2017), Tjeng et al. (2017), Khalil et al. (2018),
Narodytska (2018), Cheng et al. (2018), among others.
To the best of our knowledge, this is the first work that proposes
model-based approaches to train BNNs... but why?
Model-based approaches can find provably optimal solutions, butthey have two (fundamental) issues:
Scalability.
Overfitting.
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 12 / 42
![Page 60: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/60.jpg)
Scalability
![Page 61: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/61.jpg)
A rough estimation
AlexNet:
Neurons: 154K (input) – 707K (hidden) – 1K (output).
Weights: 62.3 millions.
ImageNet: 14 million examples.
How many discrete decision variables are needed?62.3M (weights) + 0.7M× 14M (hidden activations)
≈ 9.89 · 1012 decision variables!
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 14 / 42
![Page 62: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/62.jpg)
A rough estimation
AlexNet:
Neurons: 154K (input) – 707K (hidden) – 1K (output).
Weights: 62.3 millions.
ImageNet: 14 million examples.
How many discrete decision variables are needed?
62.3M (weights) + 0.7M× 14M (hidden activations)≈ 9.89 · 1012 decision variables!
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 14 / 42
![Page 63: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/63.jpg)
A rough estimation
AlexNet:
Neurons: 154K (input) – 707K (hidden) – 1K (output).
Weights: 62.3 millions.
ImageNet: 14 million examples.
How many discrete decision variables are needed?62.3M (weights)
+ 0.7M× 14M (hidden activations)≈ 9.89 · 1012 decision variables!
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 14 / 42
![Page 64: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/64.jpg)
A rough estimation
AlexNet:
Neurons: 154K (input) – 707K (hidden) – 1K (output).
Weights: 62.3 millions.
ImageNet: 14 million examples.
How many discrete decision variables are needed?62.3M (weights) + 0.7M× 14M (hidden activations)
≈ 9.89 · 1012 decision variables!
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 14 / 42
![Page 65: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/65.jpg)
A rough estimation
AlexNet:
Neurons: 154K (input) – 707K (hidden) – 1K (output).
Weights: 62.3 millions.
ImageNet: 14 million examples.
How many discrete decision variables are needed?62.3M (weights) + 0.7M× 14M (hidden activations)
≈ 9.89 · 1012 decision variables!
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 14 / 42
![Page 66: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/66.jpg)
Fortunately, not all is about big data ;)
![Page 67: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/67.jpg)
Few-shot learning
Challenge: learn a good classifier given limited training data.
Why?1. Humans learn with far less examples than deep networks.2. Collecting large amounts of labeled data is expensive
... and sometimes impossible (e.g., healthcare).
E.g., let’s say that we only have access to the following examples:
5 0 4 1 8 6 2 7 3 9
We better classify them correctly!
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 16 / 42
![Page 68: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/68.jpg)
Few-shot learning
Challenge: learn a good classifier given limited training data.
Why?1. Humans learn with far less examples than deep networks.2. Collecting large amounts of labeled data is expensive
... and sometimes impossible (e.g., healthcare).
E.g., let’s say that we only have access to the following examples:
5 0 4 1 8 6 2 7 3 9
We better classify them correctly!
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 16 / 42
![Page 69: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/69.jpg)
Few-shot learning
Challenge: learn a good classifier given limited training data.
Why?1. Humans learn with far less examples than deep networks.2. Collecting large amounts of labeled data is expensive
... and sometimes impossible (e.g., healthcare).
E.g., let’s say that we only have access to the following examples:
5 0 4 1 8 6 2 7 3 9
We better classify them correctly!
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 16 / 42
![Page 70: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/70.jpg)
Few-shot learning
Challenge: learn a good classifier given limited training data.
Why?1. Humans learn with far less examples than deep networks.2. Collecting large amounts of labeled data is expensive
... and sometimes impossible (e.g., healthcare).
E.g., let’s say that we only have access to the following examples:
5 0 4 1 8 6 2 7 3 9
We better classify them correctly!
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 16 / 42
![Page 71: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/71.jpg)
Few-shot learning
Challenge: learn a good classifier given limited training data.
Why?1. Humans learn with far less examples than deep networks.2. Collecting large amounts of labeled data is expensive
... and sometimes impossible (e.g., healthcare).
E.g., let’s say that we only have access to the following examples:
5 0 4 1 8 6 2 7 3 9
We better classify them correctly!
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 16 / 42
![Page 72: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/72.jpg)
Training BNNs: A feasibility perspective
Objective: Find a BNN that fits the data.
Problem definition
Given T = {(x1, y1), . . . , (xT , yT )}, find W s.t. NW(xk) = yk ∀k .
...also known as 100% train performance.
Let’s start by formulating this problem as a MIP model.
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 17 / 42
![Page 73: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/73.jpg)
Training BNNs: A feasibility perspective
Objective: Find a BNN that fits the data.
Problem definition
Given T = {(x1, y1), . . . , (xT , yT )}, find W s.t. NW(xk) = yk ∀k .
...also known as 100% train performance.
Let’s start by formulating this problem as a MIP model.
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 17 / 42
![Page 74: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/74.jpg)
Training BNNs: A feasibility perspective
Objective: Find a BNN that fits the data.
Problem definition
Given T = {(x1, y1), . . . , (xT , yT )}, find W s.t. NW(xk) = yk ∀k .
...also known as 100% train performance.
Let’s start by formulating this problem as a MIP model.
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 17 / 42
![Page 75: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/75.jpg)
MIP Model
5 0 4 1 8 6 2 7 3 9
... ... ...
Goal: Find W such that NW(xk) = yk for all k ∈ {1 . . .T}.
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 18 / 42
![Page 76: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/76.jpg)
MIP Model
5 0 4 1 8 6 2 7 3 9
... ... ...
Goal: Find W such that NW(xk) = yk for all k ∈ {1 . . .T}.
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 18 / 42
![Page 77: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/77.jpg)
MIP Model
5 0 4 1 8 6 2 7 3 9
... ... ...
Goal: Find W such that NW(xk) = yk for all k ∈ {1 . . .T}.
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 18 / 42
![Page 78: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/78.jpg)
MIP Model
... ... ...
Goal: Find W such that NW(xk) = yk for all k ∈ {1 . . .T}.
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 18 / 42
![Page 79: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/79.jpg)
MIP Model
... ... ...
∑i
wij · x0i ≥ 0 j = 5∑i
wij · x0i < 0 j 6= 5
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 18 / 42
![Page 80: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/80.jpg)
MIP Model
... ... ...
∑i
wij · x1i ≥ 0 j = 0∑i
wij · x1i < 0 j 6= 0
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 18 / 42
![Page 81: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/81.jpg)
MIP Model
wij ∈ {−1, 0, 1} ∀i ∈ N0, j ∈ NL
N0∑i=1
xki · wij ≥ 0 ∀j ∈ NL, k ∈ T : ykj = 1
N0∑i=1
xki · wij ≤ −ε ∀j ∈ NL, k ∈ T : ykj = −1
What if the BNN has hidden layers?
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 19 / 42
![Page 82: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/82.jpg)
MIP Model
wij ∈ {−1, 0, 1} ∀i ∈ N0, j ∈ NL
N0∑i=1
xki · wij ≥ 0 ∀j ∈ NL, k ∈ T : ykj = 1
N0∑i=1
xki · wij ≤ −ε ∀j ∈ NL, k ∈ T : ykj = −1
What if the BNN has hidden layers?
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 19 / 42
![Page 83: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/83.jpg)
MIP Model
... ... ...
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 20 / 42
![Page 84: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/84.jpg)
MIP Model
... ... ...
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 20 / 42
![Page 85: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/85.jpg)
MIP Model
We need to model the neuron activations using extra variables:
uk`j is 1 if neuron j in layer ` is active given xk ∈ T and 0 o/w.
2 · uk`j − 1 is the neuron’s output.
(uk`j = 1) =⇒∑
i∈N`−1wi`j · (2 · uk(`−1)i − 1) ≥ 0
(uk`j = 0) =⇒∑
i∈N`−1wi`j · (2 · uk(`−1)i − 1) ≤ −ε
We need to model w · n using extra variables:
Add variable cki`j to represent wi`j · (2 · uk(`−1)i − 1)
(uk`j = 1) =⇒ (cki`j = wi`j)
(uk`j = 0) =⇒ (cki`j = −wi`j)
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 21 / 42
![Page 86: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/86.jpg)
MIP Model
We need to model the neuron activations using extra variables:
uk`j is 1 if neuron j in layer ` is active given xk ∈ T and 0 o/w.
2 · uk`j − 1 is the neuron’s output.
(uk`j = 1) =⇒∑
i∈N`−1wi`j · (2 · uk(`−1)i − 1) ≥ 0
(uk`j = 0) =⇒∑
i∈N`−1wi`j · (2 · uk(`−1)i − 1) ≤ −ε
We need to model w · n using extra variables:
Add variable cki`j to represent wi`j · (2 · uk(`−1)i − 1)
(uk`j = 1) =⇒ (cki`j = wi`j)
(uk`j = 0) =⇒ (cki`j = −wi`j)
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 21 / 42
![Page 87: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/87.jpg)
MIP Model
We need to model the neuron activations using extra variables:
uk`j is 1 if neuron j in layer ` is active given xk ∈ T and 0 o/w.
2 · uk`j − 1 is the neuron’s output.
(uk`j = 1) =⇒∑
i∈N`−1wi`j · (2 · uk(`−1)i − 1) ≥ 0
(uk`j = 0) =⇒∑
i∈N`−1wi`j · (2 · uk(`−1)i − 1) ≤ −ε
We need to model w · n using extra variables:
Add variable cki`j to represent wi`j · (2 · uk(`−1)i − 1)
(uk`j = 1) =⇒ (cki`j = wi`j)
(uk`j = 0) =⇒ (cki`j = −wi`j)
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 21 / 42
![Page 88: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/88.jpg)
MIP Model
We need to model the neuron activations using extra variables:
uk`j is 1 if neuron j in layer ` is active given xk ∈ T and 0 o/w.
2 · uk`j − 1 is the neuron’s output.
(uk`j = 1) =⇒∑
i∈N`−1wi`j · (2 · uk(`−1)i − 1) ≥ 0
(uk`j = 0) =⇒∑
i∈N`−1wi`j · (2 · uk(`−1)i − 1) ≤ −ε
We need to model w · n using extra variables:
Add variable cki`j to represent wi`j · (2 · uk(`−1)i − 1)
(uk`j = 1) =⇒ (cki`j = wi`j)
(uk`j = 0) =⇒ (cki`j = −wi`j)
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 21 / 42
![Page 89: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/89.jpg)
MIP Model
ak`j =∑
i∈N`−1
cki`j ∀` ∈ L, j ∈ N`, k ∈ T
akLj ≥ 0 ∀j ∈ NL, k ∈ T : ykj = 1
akLj ≤ −ε ∀j ∈ NL, k ∈ T : y tj = −1
(uk`j = 1) =⇒ (ak`j ≥ 0) ∀` ∈ LL−1, j ∈ N`, k ∈ T
(uk`j = 0) =⇒ (ak`j ≤ −ε) ∀` ∈ LL−1, j ∈ N`, k ∈ T
cki1j = xki · wi1j ∀i ∈ N0, j ∈ N1, k ∈ T
(uk`j = 1) =⇒ (cki`j = wi`j ) ∀` ∈ L2, i ∈ N`−1, j ∈ N`, k ∈ T
(uk`j = 0) =⇒ (cki`j = −wi`j ) ∀` ∈ L2, i ∈ N`−1, j ∈ N`, k ∈ T
wi`j ∈ {−1, 0, 1} ∀` ∈ L, i ∈ N`−1, j ∈ N`
uk`j ∈ {0, 1} ∀` ∈ LL−1, j ∈ N`, k ∈ T
cki`j ∈ R ∀` ∈ L, i ∈ N`−1, j ∈ N`, k ∈ T
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 22 / 42
![Page 90: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/90.jpg)
MIP Model
“This model is a nightmare for a MIP solver”—Andre A. Cire
1. It has (way too) many auxiliary variables:
uk`j ∈ {0, 1} ∀` ∈ LL−1, j ∈ N`, k ∈ T
cki`j ∈ R ∀` ∈ L, i ∈ N`−1, j ∈ N`, k ∈ T
2. Everywhere I look, I see an implication constraint:
(uk`j = 1) =⇒ (ak`j ≥ 0) ∀` ∈ LL−1, j ∈ N`, k ∈ T
(uk`j = 0) =⇒ (ak`j ≤ −ε) ∀` ∈ LL−1, j ∈ N`, k ∈ T
(uk`j = 1) =⇒ (cki`j = wi`j) ∀` ∈ L2, i ∈ N`−1, j ∈ N`, k ∈ T
(uk`j = 0) =⇒ (cki`j = −wi`j) ∀` ∈ L2, i ∈ N`−1, j ∈ N`, k ∈ T
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 23 / 42
![Page 91: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/91.jpg)
MIP Model
“This model is a nightmare for a MIP solver”—Andre A. Cire
1. It has (way too) many auxiliary variables:
uk`j ∈ {0, 1} ∀` ∈ LL−1, j ∈ N`, k ∈ T
cki`j ∈ R ∀` ∈ L, i ∈ N`−1, j ∈ N`, k ∈ T
2. Everywhere I look, I see an implication constraint:
(uk`j = 1) =⇒ (ak`j ≥ 0) ∀` ∈ LL−1, j ∈ N`, k ∈ T
(uk`j = 0) =⇒ (ak`j ≤ −ε) ∀` ∈ LL−1, j ∈ N`, k ∈ T
(uk`j = 1) =⇒ (cki`j = wi`j) ∀` ∈ L2, i ∈ N`−1, j ∈ N`, k ∈ T
(uk`j = 0) =⇒ (cki`j = −wi`j) ∀` ∈ L2, i ∈ N`−1, j ∈ N`, k ∈ T
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 23 / 42
![Page 92: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/92.jpg)
MIP Model
“This model is a nightmare for a MIP solver”—Andre A. Cire
1. It has (way too) many auxiliary variables:
uk`j ∈ {0, 1} ∀` ∈ LL−1, j ∈ N`, k ∈ T
cki`j ∈ R ∀` ∈ L, i ∈ N`−1, j ∈ N`, k ∈ T
2. Everywhere I look, I see an implication constraint:
(uk`j = 1) =⇒ (ak`j ≥ 0) ∀` ∈ LL−1, j ∈ N`, k ∈ T
(uk`j = 0) =⇒ (ak`j ≤ −ε) ∀` ∈ LL−1, j ∈ N`, k ∈ T
(uk`j = 1) =⇒ (cki`j = wi`j) ∀` ∈ L2, i ∈ N`−1, j ∈ N`, k ∈ T
(uk`j = 0) =⇒ (cki`j = −wi`j) ∀` ∈ L2, i ∈ N`−1, j ∈ N`, k ∈ T
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 23 / 42
![Page 93: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/93.jpg)
CP model
We do not need auxiliary variables for this problem:
nkLj = ykj ∀j ∈ NL, k ∈ T
wi`j ∈ {−1, 0, 1} ∀` ∈ L, i ∈ N`−1, j ∈ N`
where nk`j is a CP expression recursively defined as follows:
nk0j = xkj ∀j ∈ N0, k ∈ T
nk`j = 2(scal prod(w`j ,n
k`−1) ≥ 0
)− 1 ∀` ∈ L \ {L}, j ∈ N`, k ∈ T
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 24 / 42
![Page 94: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/94.jpg)
CP model
We do not need auxiliary variables for this problem:
nkLj = ykj ∀j ∈ NL, k ∈ T
wi`j ∈ {−1, 0, 1} ∀` ∈ L, i ∈ N`−1, j ∈ N`
where nk`j is a CP expression recursively defined as follows:
nk0j = xkj ∀j ∈ N0, k ∈ T
nk`j = 2(scal prod(w`j ,n
k`−1) ≥ 0
)− 1 ∀` ∈ L \ {L}, j ∈ N`, k ∈ T
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 24 / 42
![Page 95: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/95.jpg)
CP model
We do not need auxiliary variables for this problem:
nkLj = ykj ∀j ∈ NL, k ∈ T
wi`j ∈ {−1, 0, 1} ∀` ∈ L, i ∈ N`−1, j ∈ N`
where nk`j is a CP expression recursively defined as follows:
nk0j = xkj ∀j ∈ N0, k ∈ T
nk`j = 2(scal prod(w`j ,n
k`−1) ≥ 0
)− 1 ∀` ∈ L \ {L}, j ∈ N`, k ∈ T
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 24 / 42
![Page 96: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/96.jpg)
CP model
We do not need auxiliary variables for this problem:
nkLj = ykj ∀j ∈ NL, k ∈ T
wi`j ∈ {−1, 0, 1} ∀` ∈ L, i ∈ N`−1, j ∈ N`
where nk`j is a CP expression recursively defined as follows:
nk0j = xkj ∀j ∈ N0, k ∈ T
nk`j = 2(scal prod(w`j ,n
k`−1) ≥ 0
)− 1 ∀` ∈ L \ {L}, j ∈ N`, k ∈ T
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 24 / 42
![Page 97: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/97.jpg)
A feasibility experiment
Approaches:
GD: Standard gradient-based approach.
MIP: MIP model solved by Gurobi 8.1
CP: CP model solved by CP Optimizer 12.8
Problem instances:
A 100 small training sets sampled from MNIST.
Each training set has from 1 to 10 examples per class.
Zero, one, and two hidden layers with 16 neurons.
Question:
Which approach solves more instances given a 2h time limit?
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 25 / 42
![Page 98: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/98.jpg)
A feasibility experiment
Approaches:
GD: Standard gradient-based approach.
MIP: MIP model solved by Gurobi 8.1
CP: CP model solved by CP Optimizer 12.8
Problem instances:
A 100 small training sets sampled from MNIST.
Each training set has from 1 to 10 examples per class.
Zero, one, and two hidden layers with 16 neurons.
Question:
Which approach solves more instances given a 2h time limit?
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 25 / 42
![Page 99: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/99.jpg)
A feasibility experiment
Approaches:
GD: Standard gradient-based approach.
MIP: MIP model solved by Gurobi 8.1
CP: CP model solved by CP Optimizer 12.8
Problem instances:
A 100 small training sets sampled from MNIST.
Each training set has from 1 to 10 examples per class.
Zero, one, and two hidden layers with 16 neurons.
Question:
Which approach solves more instances given a 2h time limit?
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 25 / 42
![Page 100: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/100.jpg)
A feasibility experiment
Approaches:
GD: Standard gradient-based approach.
MIP: MIP model solved by Gurobi 8.1
CP: CP model solved by CP Optimizer 12.8
Problem instances:
A 100 small training sets sampled from MNIST.
Each training set has from 1 to 10 examples per class.
Zero, one, and two hidden layers with 16 neurons.
Question:
Which approach solves more instances given a 2h time limit?
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 25 / 42
![Page 101: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/101.jpg)
A feasibility experiment
No hidden layers One hidden layer Two hidden layers
|T | MIP GD CP MIP GD CP MIP GD CP
10 10 10 10 10 9.6 10 9 9.2 1020 10 10 10 7 5.6 10 0 8.4 1030 10 10 10 0 0.4 9 0 5.2 1040 10 10 10 0 0 8 0 6.2 1050 10 10 10 0 0 8 0 4.2 1060 10 10 10 0 0 7 0 2.2 1070 10 10 10 0 0 3 0 0 1080 10 10 10 0 0 3 0 0 1090 10 10 8 0 0 1 0 0 8
100 10 10 8 0 0 0 0 0 6
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 26 / 42
![Page 102: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/102.jpg)
Overfitting
![Page 103: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/103.jpg)
Overfitting
Memorizing is not learning!
The real goal is to find weights that generalize (small testing error).
small training error 6= small testing error
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 28 / 42
![Page 104: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/104.jpg)
Overfitting
Memorizing is not learning!
The real goal is to find weights that generalize (small testing error).
small training error 6= small testing error
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 28 / 42
![Page 105: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/105.jpg)
Towards finding solutions that generalize
5 0 4 1 8 6 2 7 3 9
We better classify them correctly!
While most solutions overfit... some generalize.
How can we identify them?
Two principles: simplicity & robustness.
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 29 / 42
![Page 106: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/106.jpg)
Towards finding solutions that generalize
5 0 4 1 8 6 2 7 3 9
We better classify them correctly!
While most solutions overfit... some generalize.
How can we identify them?
Two principles: simplicity & robustness.
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 29 / 42
![Page 107: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/107.jpg)
Towards finding solutions that generalize
5 0 4 1 8 6 2 7 3 9
We better classify them correctly!
While most solutions overfit... some generalize.
How can we identify them?
Two principles: simplicity & robustness.
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 29 / 42
![Page 108: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/108.jpg)
Towards finding solutions that generalize
5 0 4 1 8 6 2 7 3 9
We better classify them correctly!
While most solutions overfit... some generalize.
How can we identify them?
Two principles: simplicity & robustness.
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 29 / 42
![Page 109: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/109.jpg)
Towards finding solutions that generalize
5 0 4 1 8 6 2 7 3 9
We better classify them correctly!
While most solutions overfit... some generalize.
How can we identify them?
Two principles: simplicity & robustness.
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 29 / 42
![Page 110: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/110.jpg)
Towards finding solutions that generalize
5 0 4 1 8 6 2 7 3 9
We better classify them correctly!
While most solutions overfit... some generalize.
How can we identify them?
Two principles: simplicity & robustness.
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 29 / 42
![Page 111: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/111.jpg)
Towards finding solutions that generalize
5 0 4 1 8 6 2 7 3 9
We better classify them correctly!
While most solutions overfit... some generalize.
How can we identify them?
Two principles: simplicity & robustness.
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 29 / 42
![Page 112: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/112.jpg)
Towards finding solutions that generalize
5 0 4 1 8 6 2 7 3 9
We better classify them correctly!
While most solutions overfit... some generalize.
How can we identify them?
Two principles: simplicity & robustness.
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 29 / 42
![Page 113: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/113.jpg)
Towards finding solutions that generalize
5 0 4 1 8 6 2 7 3 9
We better classify them correctly!
While most solutions overfit... some generalize.
How can we identify them?
Two principles: simplicity & robustness.
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 29 / 42
![Page 114: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/114.jpg)
Towards finding solutions that generalize
5 0 4 1 8 6 2 7 3 9
We better classify them correctly!
While most solutions overfit... some generalize.
How can we identify them?
Two principles: simplicity & robustness.
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 29 / 42
![Page 115: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/115.jpg)
Towards finding solutions that generalize
5 0 4 1 8 6 2 7 3 9
We better classify them correctly!
While most solutions overfit... some generalize.
How can we identify them?
Two principles: simplicity & robustness.
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 29 / 42
![Page 116: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/116.jpg)
Towards finding solutions that generalize
5 0 4 1 8 6 2 7 3 9
We better classify them correctly!
?
While most solutions overfit... some generalize.
How can we identify them?
Two principles: simplicity & robustness.
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 29 / 42
![Page 117: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/117.jpg)
Towards finding solutions that generalize
5 0 4 1 8 6 2 7 3 9
We better classify them correctly!
?
While most solutions overfit... some generalize.
How can we identify them?
Two principles: simplicity & robustness.
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 29 / 42
![Page 118: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/118.jpg)
The simplicity principle
Occam’s razor: prefer the simplest BNN that fits the data.
minW
{∑w∈W
|w | : NW(x) = y, ∀(x, y) ∈ T , w ∈ {−1, 0, 1}, ∀w ∈W
}.
(min-weight)
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 30 / 42
![Page 119: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/119.jpg)
The simplicity principle
Occam’s razor: prefer the simplest BNN that fits the data.
minW
{∑w∈W
|w | : NW(x) = y, ∀(x, y) ∈ T , w ∈ {−1, 0, 1}, ∀w ∈W
}.
(min-weight)
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 30 / 42
![Page 120: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/120.jpg)
The simplicity principle
Occam’s razor: prefer the simplest BNN that fits the data.
minW
{∑w∈W
|w | : NW(x) = y, ∀(x, y) ∈ T , w ∈ {−1, 0, 1}, ∀w ∈W
}.
(min-weight)
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 30 / 42
![Page 121: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/121.jpg)
The robustness principle
Prefer robust solutions
BNNs that fit the data under small perturbations to their weights.
maxW
∑`∈L,j∈N`
min{|a`j(x)| : (x, y) ∈ T }
s.t. NW(x) = y ∀(x, y) ∈ Tw ∈ {−1, 0, 1} ∀w ∈W
(max-margin)
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 31 / 42
![Page 122: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/122.jpg)
The robustness principle
Prefer robust solutions
BNNs that fit the data under small perturbations to their weights.
maxW
∑`∈L,j∈N`
min{|a`j(x)| : (x, y) ∈ T }
s.t. NW(x) = y ∀(x, y) ∈ Tw ∈ {−1, 0, 1} ∀w ∈W
(max-margin)
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 31 / 42
![Page 123: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/123.jpg)
The robustness principle
Prefer robust solutions
BNNs that fit the data under small perturbations to their weights.
A
B
maxW
∑`∈L,j∈N`
min{|a`j(x)| : (x, y) ∈ T }
s.t. NW(x) = y ∀(x, y) ∈ Tw ∈ {−1, 0, 1} ∀w ∈W
(max-margin)
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 31 / 42
![Page 124: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/124.jpg)
The robustness principle
Prefer robust solutions
BNNs that fit the data under small perturbations to their weights.
A
B
maxW
∑`∈L,j∈N`
min{|a`j(x)| : (x, y) ∈ T }
s.t. NW(x) = y ∀(x, y) ∈ Tw ∈ {−1, 0, 1} ∀w ∈W
(max-margin)
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 31 / 42
![Page 125: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/125.jpg)
An optimality experiment
Approaches:
CPw and CPm: min-weight and max-margin CP models.
MIPw and MIPm: min-weight and max-margin MIP models.
Problem instances:
Same 100 instances using 0, 1, or 2 hidden layers.
Question:
Will MIP or CP find better solutions given a 2h time limit?
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 32 / 42
![Page 126: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/126.jpg)
An optimality experiment
Approaches:
CPw and CPm: min-weight and max-margin CP models.
MIPw and MIPm: min-weight and max-margin MIP models.
Problem instances:
Same 100 instances using 0, 1, or 2 hidden layers.
Question:
Will MIP or CP find better solutions given a 2h time limit?
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 32 / 42
![Page 127: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/127.jpg)
An optimality experiment
Approaches:
CPw and CPm: min-weight and max-margin CP models.
MIPw and MIPm: min-weight and max-margin MIP models.
Problem instances:
Same 100 instances using 0, 1, or 2 hidden layers.
Question:
Will MIP or CP find better solutions given a 2h time limit?
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 32 / 42
![Page 128: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/128.jpg)
An optimality experiment
Approaches:
CPw and CPm: min-weight and max-margin CP models.
MIPw and MIPm: min-weight and max-margin MIP models.
Problem instances:
Same 100 instances using 0, 1, or 2 hidden layers.
Question:
Will MIP or CP find better solutions given a 2h time limit?
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 32 / 42
![Page 129: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/129.jpg)
An optimality experiment
No hidden layers One hidden layer Two hidden layers
100 101 102 103100
101
102
103
MIPw
CPw
(a) Min-weight optimization
101 102 103 104 105 106
101
102
103
104
105
106
MIPm
CPm
(b) Max-margin optimization
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 33 / 42
![Page 130: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/130.jpg)
CP/MIP hybrid methods
Idea: use CP to find feasible solutions and MIP to optimize them.
Option 1: model HW
Use the CP solution as a warm-start for MIP.
Option 2: model HA
Use the CP solution to fix the activations of all neurons in the MIPmodel and search only over the weights.
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 34 / 42
![Page 131: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/131.jpg)
CP/MIP hybrid methods
Idea: use CP to find feasible solutions and MIP to optimize them.
Option 1: model HW
Use the CP solution as a warm-start for MIP.
Option 2: model HA
Use the CP solution to fix the activations of all neurons in the MIPmodel and search only over the weights.
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 34 / 42
![Page 132: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/132.jpg)
CP/MIP hybrid methods
Idea: use CP to find feasible solutions and MIP to optimize them.
Option 1: model HW
Use the CP solution as a warm-start for MIP.
Option 2: model HA
Use the CP solution to fix the activations of all neurons in the MIPmodel and search only over the weights.
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 34 / 42
![Page 133: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/133.jpg)
CP/MIP hybrid methods
“This model is a nightmare for a MIP solver”—Andre A. Cire
1. It has (way too) many auxiliary variables:
uk`j ∈ {0, 1} ∀` ∈ LL−1, j ∈ N`, k ∈ T
cki`j ∈ R ∀` ∈ L, i ∈ N`−1, j ∈ N`, k ∈ T
2. Everywhere I look, I see an implication constraint:
(uk`j = 1) =⇒ (ak`j ≥ 0) ∀` ∈ LL−1, j ∈ N`, k ∈ T
(uk`j = 0) =⇒ (ak`j ≤ −ε) ∀` ∈ LL−1, j ∈ N`, k ∈ T
(uk`j = 1) =⇒ (cki`j = wi`j) ∀` ∈ L2, i ∈ N`−1, j ∈ N`, k ∈ T
(uk`j = 0) =⇒ (cki`j = −wi`j) ∀` ∈ L2, i ∈ N`−1, j ∈ N`, k ∈ T
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 35 / 42
![Page 134: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/134.jpg)
CP/MIP hybrid methods
A
B
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 36 / 42
![Page 135: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/135.jpg)
CP/MIP hybrid methods
CP
A
B
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 36 / 42
![Page 136: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/136.jpg)
CP/MIP hybrid methods
CP
HW
A
B
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 36 / 42
![Page 137: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/137.jpg)
CP/MIP hybrid methods
HA
CP
HW
A
B
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 36 / 42
![Page 138: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/138.jpg)
CP/MIP hybrid methods
HA
CP
HW
A
B
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 36 / 42
![Page 139: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/139.jpg)
Results
![Page 140: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/140.jpg)
A generalization experiment
Approaches:
CPw , CPm, MIPw , and MIPm as before.
HWw and HWm: min-weight and max-margin warm-start CP/MIP.
HAw and HAm: min-weight and max-margin fixed-activation CP/MIP.
GDb and GDt : Two versions of gradient descent.
Problem instances:
Same 100 instances using 0, 1, or 2 hidden layers.
Question:
Which model finds solutions that generalize better within 2h?
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 38 / 42
![Page 141: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/141.jpg)
A generalization experiment
Approaches:
CPw , CPm, MIPw , and MIPm as before.
HWw and HWm: min-weight and max-margin warm-start CP/MIP.
HAw and HAm: min-weight and max-margin fixed-activation CP/MIP.
GDb and GDt : Two versions of gradient descent.
Problem instances:
Same 100 instances using 0, 1, or 2 hidden layers.
Question:
Which model finds solutions that generalize better within 2h?
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 38 / 42
![Page 142: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/142.jpg)
A generalization experiment
Approaches:
CPw , CPm, MIPw , and MIPm as before.
HWw and HWm: min-weight and max-margin warm-start CP/MIP.
HAw and HAm: min-weight and max-margin fixed-activation CP/MIP.
GDb and GDt : Two versions of gradient descent.
Problem instances:
Same 100 instances using 0, 1, or 2 hidden layers.
Question:
Which model finds solutions that generalize better within 2h?
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 38 / 42
![Page 143: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/143.jpg)
A generalization experiment
Test performance
No hidden layers One hidden layer Two hidden layers
0 0.2 0.4 0.60
0.2
0.4
0.6
HAw
max{GDb,GDt}
(c) Min-weight optimization
0 0.2 0.4 0.60
0.2
0.4
0.6
HAmm
ax{GDb,GDt}
(d) Max-margin optimization
HAm outperforms max{GDb, GDt} in 253 out of 300 experiments!
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 39 / 42
![Page 144: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/144.jpg)
A generalization experiment
Test performance
No hidden layers One hidden layer Two hidden layers
0 0.2 0.4 0.60
0.2
0.4
0.6
HAw
max{GDb,GDt}
(e) Min-weight optimization
0 0.2 0.4 0.60
0.2
0.4
0.6
HAmm
ax{GDb,GDt}
(f) Max-margin optimization
HAm outperforms max{GDb, GDt} in 253 out of 300 experiments!
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 39 / 42
![Page 145: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/145.jpg)
A generalization experiment
HAm outperforms alternatives by a large margin!
10 50 900
0.2
0.4
0.6
# Examples
TestPerform
ance
(g) 2 HL, min-weight
10 50 900
0.2
0.4
0.6
# Examples
TestPerform
ance
(h) 2 HL, max-margin
CP MIP HW HA GDb GDt
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 40 / 42
![Page 146: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/146.jpg)
A generalization experiment
HAm outperforms alternatives by a large margin!
10 50 900
0.2
0.4
0.6
# Examples
TestPerform
ance
(i) 2 HL, min-weight
10 50 900
0.2
0.4
0.6
# Examples
TestPerform
ance
(j) 2 HL, max-margin
CP MIP HW HA GDb GDt
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 40 / 42
![Page 147: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/147.jpg)
Concluding remarks
![Page 148: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/148.jpg)
Concluding remarks
Summary:
Training BNNs is a discrete optimization problem.
We can train BNNs using MIP and CP, but:
Use small datasets.Optimize some proxy for generalizability.
Our HAm model either outperformed GD or timed out.
Open questions:
How far can model-based approaches scale?
What other proxies for generalization are worth studying?
Are there meaningful ways to combine GD with MIP and CP?
Thanks!
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 42 / 42
![Page 149: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/149.jpg)
Concluding remarks
Summary:
Training BNNs is a discrete optimization problem.
We can train BNNs using MIP and CP, but:
Use small datasets.Optimize some proxy for generalizability.
Our HAm model either outperformed GD or timed out.
Open questions:
How far can model-based approaches scale?
What other proxies for generalization are worth studying?
Are there meaningful ways to combine GD with MIP and CP?
Thanks!
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 42 / 42
![Page 150: Training Binarized Neural Networks using MIP and CPrntoro/docs/cp19_bnns_slides.pdf · 2019. 10. 4. · Deep learning Binarized Neural Networks (BNNs) BNNs are NNs with binary weights](https://reader035.vdocuments.mx/reader035/viewer/2022071002/5fbec77f39c1e42e426eeffd/html5/thumbnails/150.jpg)
Concluding remarks
Summary:
Training BNNs is a discrete optimization problem.
We can train BNNs using MIP and CP, but:
Use small datasets.Optimize some proxy for generalizability.
Our HAm model either outperformed GD or timed out.
Open questions:
How far can model-based approaches scale?
What other proxies for generalization are worth studying?
Are there meaningful ways to combine GD with MIP and CP?
Thanks!
Toro Icarte et al: Training Binarized Neural Networks using MIP and CP 42 / 42