Heterogeneous Bitwidth Binarization: Weird Operators with Big Benefits
Josh Fromm
Network Binarization
● Multiply-accumulate becomes xnor-popcount.
● 5-30x theoretical speedup.
● 32x weight memory compression.
● 1-bit is fast, but its accuracy is too low.
● 2-bit accuracy is high, but it is too slow.
● How to bridge the gap?
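To make the xnor-popcount bullet concrete, here is a minimal Python sketch of a binarized dot product. It assumes values are restricted to +1/-1 and that bit 1 encodes +1; the helper names `pack_signs` and `xnor_popcount_dot` are hypothetical and not from the talk. Packing one bit per value instead of a 32-bit float is where the 32x weight compression comes from.

```python
def pack_signs(values):
    """Pack a sequence of +1/-1 values into one integer (assumed encoding: bit i is 1 when values[i] is +1)."""
    word = 0
    for i, v in enumerate(values):
        if v > 0:
            word |= 1 << i
    return word


def xnor_popcount_dot(a_word, b_word, n):
    """Dot product of two packed +1/-1 vectors of length n."""
    mask = (1 << n) - 1                  # keep only the n valid bit positions
    agree = ~(a_word ^ b_word) & mask    # XNOR: 1 wherever the two signs match
    matches = bin(agree).count("1")      # popcount of the agreement bits
    # Each match contributes +1 and each mismatch -1: matches - (n - matches)
    return 2 * matches - n


a = [1, -1, 1, 1, -1, -1, 1, -1]
b = [1, 1, -1, 1, -1, 1, 1, -1]
result = xnor_popcount_dot(pack_signs(a), pack_signs(b), len(a))
reference = sum(x * y for x, y in zip(a, b))  # ordinary multiply-accumulate
```

Here `result` and `reference` agree, so one XNOR and one popcount replace eight multiply-accumulates. A 2-bit variant would need several such passes per dot product, which is one plausible reading of why the slide calls it slower.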
Mixed Bitwidth Tensors
Middle-Out Bit Distribution
Super-Linear Scaling
Hard to Implement!
Implementing on CPU
● Needs efficient sparse tensor library support
Implementing on FPGA
● Gates can be directly laid out for big benefits
● Designing FPGAs is hard, especially for non-uniform computation
TVM can enable these platforms!