fpga-based real-time super-resolution system for ultra ... · compared six configurations no....
TRANSCRIPT
![Page 1: FPGA-based Real-Time Super-Resolution System for Ultra ... · Compared six configurations No. Preprocessing Upscaling #Mult. PSNR(dB) SSIM 1 None Interpolation 6.6*10^7 35.51 0.9138](https://reader033.vdocuments.mx/reader033/viewer/2022052013/602a77bb6f47822fcb02c1cf/html5/thumbnails/1.jpg)
FPGA-based Real-Time Super-Resolution System
for Ultra High Definition Videos
Zhuolun He, Hanxian Huang, Ming Jiang, Yuanchao Bai, and Guojie Luo
Peking University
FCCM 2018
![Page 2: FPGA-based Real-Time Super-Resolution System for Ultra ... · Compared six configurations No. Preprocessing Upscaling #Mult. PSNR(dB) SSIM 1 None Interpolation 6.6*10^7 35.51 0.9138](https://reader033.vdocuments.mx/reader033/viewer/2022052013/602a77bb6f47822fcb02c1cf/html5/thumbnails/2.jpg)
Ultra High Definition (UHD) Technology
UHD Television UHD Projector
UHD Phone UHD Camera
Content? • Limited Creators
• High network
bandwidth cost
• Huge storage cost
![Page 3: FPGA-based Real-Time Super-Resolution System for Ultra ... · Compared six configurations No. Preprocessing Upscaling #Mult. PSNR(dB) SSIM 1 None Interpolation 6.6*10^7 35.51 0.9138](https://reader033.vdocuments.mx/reader033/viewer/2022052013/602a77bb6f47822fcb02c1cf/html5/thumbnails/3.jpg)
High-Resolution <---> Low-Resolution
Desired HR Image 𝑿
Blur
Down-Sampling
Observed LR Image 𝒀
Noise 𝑛
Su
pe
r-R
eso
luti
on
![Page 4: FPGA-based Real-Time Super-Resolution System for Ultra ... · Compared six configurations No. Preprocessing Upscaling #Mult. PSNR(dB) SSIM 1 None Interpolation 6.6*10^7 35.51 0.9138](https://reader033.vdocuments.mx/reader033/viewer/2022052013/602a77bb6f47822fcb02c1cf/html5/thumbnails/4.jpg)
Spectrum of Super Resolution Methods
Interpolation
• Fast
• Easy to implement
• Blurry results
Model-based
• Interpretable
• High complexity
• Assumed known
blur kernel/noise
Example-based
• State-of-the-art quality
• High complexity
• Training data needed
Complicated Simple
![Page 5: FPGA-based Real-Time Super-Resolution System for Ultra ... · Compared six configurations No. Preprocessing Upscaling #Mult. PSNR(dB) SSIM 1 None Interpolation 6.6*10^7 35.51 0.9138](https://reader033.vdocuments.mx/reader033/viewer/2022052013/602a77bb6f47822fcb02c1cf/html5/thumbnails/5.jpg)
Model-based Method is also Compute-Intensive
Desired HR Image 𝑿
Blur
Down-Sampling
Observed LR Image 𝒀
Noise 𝑛
Su
pe
r-R
eso
luti
on
…
Iteration 1 Iteration 2 X
Model-based methods may not be needed
• The computation also has a layered structure
• We can use a neural network to approximate
![Page 6: FPGA-based Real-Time Super-Resolution System for Ultra ... · Compared six configurations No. Preprocessing Upscaling #Mult. PSNR(dB) SSIM 1 None Interpolation 6.6*10^7 35.51 0.9138](https://reader033.vdocuments.mx/reader033/viewer/2022052013/602a77bb6f47822fcb02c1cf/html5/thumbnails/6.jpg)
Total Variation Distribution
Fact: Blocks contain DIFFERENT
amount of information
(NOT equally important)
Insight: Use DIFFERENT upscaling
methods for different blocks
![Page 7: FPGA-based Real-Time Super-Resolution System for Ultra ... · Compared six configurations No. Preprocessing Upscaling #Mult. PSNR(dB) SSIM 1 None Interpolation 6.6*10^7 35.51 0.9138](https://reader033.vdocuments.mx/reader033/viewer/2022052013/602a77bb6f47822fcb02c1cf/html5/thumbnails/7.jpg)
A Hybrid Algorithm
INPUT: LR Image 𝒀
1. Crop 𝒀 into sub-images {𝒚}
2.1. 𝒙 <- Upscale(𝒚) IF 𝑴 𝒙 > 𝑻
2.2. ELSE 𝒙 <- CheapUpscale(𝒚)
3. Mosaic 𝑿 with {𝒙} OUTPUT: HR Image 𝑿
M: Total Variation (TV)
Upscale: FSRCNN-s
CheapUpscale: Intepolation
![Page 8: FPGA-based Real-Time Super-Resolution System for Ultra ... · Compared six configurations No. Preprocessing Upscaling #Mult. PSNR(dB) SSIM 1 None Interpolation 6.6*10^7 35.51 0.9138](https://reader033.vdocuments.mx/reader033/viewer/2022052013/602a77bb6f47822fcb02c1cf/html5/thumbnails/8.jpg)
Overall System
Low-Res
Image
High-Res
Image
Pipelined Neural Network
Conv(32, 1, 5) Conv(5, 3, 5) Conv(5, 1, 32)
Interpolator
Accelerator
Deconv(32, 9, 1) Conv(1, 5, 32)
Feature Extration Shrinking Mapping Expanding Deconvolution
Dispatcher
![Page 9: FPGA-based Real-Time Super-Resolution System for Ultra ... · Compared six configurations No. Preprocessing Upscaling #Mult. PSNR(dB) SSIM 1 None Interpolation 6.6*10^7 35.51 0.9138](https://reader033.vdocuments.mx/reader033/viewer/2022052013/602a77bb6f47822fcb02c1cf/html5/thumbnails/9.jpg)
Stencil Access of TV Computation
x[offset]
f3
x[right]
f2 ……
x[down]
f1
0
height
width 𝑁
𝑁
(𝛻𝑥)offset = 𝑎𝑏𝑠(𝑥 right − 𝑥 offset ) + 𝑎𝑏𝑠(𝑥 down − 𝑥 offset )
x[offset]
f3
x[right]
f2 ……
x[down]
f1
![Page 10: FPGA-based Real-Time Super-Resolution System for Ultra ... · Compared six configurations No. Preprocessing Upscaling #Mult. PSNR(dB) SSIM 1 None Interpolation 6.6*10^7 35.51 0.9138](https://reader033.vdocuments.mx/reader033/viewer/2022052013/602a77bb6f47822fcb02c1cf/html5/thumbnails/10.jpg)
Micro-architecture for Stencil Computation
s1 buffer1(𝑁-1) s2 s3
f1 f2 f3
buffer2(1)
x[i][j]…x[i-1][j+2] x[i-1][j+1]
x[i][j]
(x[down])
x[i-1][j+1]
(x[right])
x[i-1][j]
(x[offset])
Buffering System for array x
Computation Kernel (𝛻𝑥)𝑖,𝑗
![Page 11: FPGA-based Real-Time Super-Resolution System for Ultra ... · Compared six configurations No. Preprocessing Upscaling #Mult. PSNR(dB) SSIM 1 None Interpolation 6.6*10^7 35.51 0.9138](https://reader033.vdocuments.mx/reader033/viewer/2022052013/602a77bb6f47822fcb02c1cf/html5/thumbnails/11.jpg)
Convolutional Neural Network
Pipelined Neural Network
Conv(32, 1, 5) Conv(5, 3, 5) Conv(5, 1, 32) Deconv(32, 9, 1) Conv(1, 5, 32)
Feature Extraction Shrinking Mapping Expanding Deconvolution
![Page 12: FPGA-based Real-Time Super-Resolution System for Ultra ... · Compared six configurations No. Preprocessing Upscaling #Mult. PSNR(dB) SSIM 1 None Interpolation 6.6*10^7 35.51 0.9138](https://reader033.vdocuments.mx/reader033/viewer/2022052013/602a77bb6f47822fcb02c1cf/html5/thumbnails/12.jpg)
Convolution 𝑁i 𝑁i+1
ni ci fi
Input Compute Output
sliding window(s)
1
Conv(ci, fi, ni)
![Page 13: FPGA-based Real-Time Super-Resolution System for Ultra ... · Compared six configurations No. Preprocessing Upscaling #Mult. PSNR(dB) SSIM 1 None Interpolation 6.6*10^7 35.51 0.9138](https://reader033.vdocuments.mx/reader033/viewer/2022052013/602a77bb6f47822fcb02c1cf/html5/thumbnails/13.jpg)
Deconvolution
Input Compute Output
sliding window(s)
𝑁i 𝑁i+1
s
Deconv(ci, fi, ni) ci fi ni
![Page 14: FPGA-based Real-Time Super-Resolution System for Ultra ... · Compared six configurations No. Preprocessing Upscaling #Mult. PSNR(dB) SSIM 1 None Interpolation 6.6*10^7 35.51 0.9138](https://reader033.vdocuments.mx/reader033/viewer/2022052013/602a77bb6f47822fcb02c1cf/html5/thumbnails/14.jpg)
Pipeline Balancing
Layer 𝒄𝒊 𝒇𝒊 𝒏𝒊 𝑵𝒊 #Mult. Ideal
#DSP Ideal II
Alloc.
#DSP Alloc. II
Extraction 1 5 32 36 819200 201 4076 200 4096
Shrinking 32 1 5 32 163840 40 4096 32 4096
Mapping 5 3 5 32 202500 50 4050 45 4500
Expanding 5 1 32 30 144000 35 4115 32 4500
Deconvolution 32 9 1 30 2332800 573 4072 519 4500
Overall - - - - 3662340 899 4115 828 4500
Available (ZC706) - - - - - 900 - 900 -
![Page 15: FPGA-based Real-Time Super-Resolution System for Ultra ... · Compared six configurations No. Preprocessing Upscaling #Mult. PSNR(dB) SSIM 1 None Interpolation 6.6*10^7 35.51 0.9138](https://reader033.vdocuments.mx/reader033/viewer/2022052013/602a77bb6f47822fcb02c1cf/html5/thumbnails/15.jpg)
Sub-image Size
• Padding
• 𝑁𝑖 ≡ 𝑘 + 𝑓𝑖 − 1#𝐶𝑜𝑛𝑣𝑖
• If sub-image size too small
• Large border-to-block ratio
• Limited by memory bandwidth
• If sub-image size too large
• Large feature maps
• Limited by on-chip BRAM capacity
![Page 16: FPGA-based Real-Time Super-Resolution System for Ultra ... · Compared six configurations No. Preprocessing Upscaling #Mult. PSNR(dB) SSIM 1 None Interpolation 6.6*10^7 35.51 0.9138](https://reader033.vdocuments.mx/reader033/viewer/2022052013/602a77bb6f47822fcb02c1cf/html5/thumbnails/16.jpg)
Sub-image Size vs. Performance vs. #mult.
0,915
0,920
0,925
0,930
0,935
0,940
36,0
36,5
37,0
37,5
38,0
38,5
39,0
10 20 30 40 50
SS
IM
PS
NR
(d
B)
Block Size
PSNR SSIM
8,00E+09
8,50E+09
9,00E+09
9,50E+09
10 20 30 40 50
Block Size
Multiplications
![Page 17: FPGA-based Real-Time Super-Resolution System for Ultra ... · Compared six configurations No. Preprocessing Upscaling #Mult. PSNR(dB) SSIM 1 None Interpolation 6.6*10^7 35.51 0.9138](https://reader033.vdocuments.mx/reader033/viewer/2022052013/602a77bb6f47822fcb02c1cf/html5/thumbnails/17.jpg)
Overall Comparisons
• Compared six configurations
No. Preprocessing Upscaling #Mult. PSNR(dB) SSIM
1 None Interpolation 6.6*10^7 35.51 0.9138
2 None Neural Network 8.2*10^9 38.55 0.9421
3 Blocking Interpolation 6.6*10^7 35.51 0.9138
4 Blocking Neural Network 8.4*10^9 38.55 0.9420
5 Blocking Mixed-Random 2.2*10^9 36.10 0.9211
6 Blocking Mixed-TV 2.2*10^9 37.36 0.9287
+3.04dB
No Performance Loss
+1.26dB
-1.19dB -75%
>100x
![Page 18: FPGA-based Real-Time Super-Resolution System for Ultra ... · Compared six configurations No. Preprocessing Upscaling #Mult. PSNR(dB) SSIM 1 None Interpolation 6.6*10^7 35.51 0.9138](https://reader033.vdocuments.mx/reader033/viewer/2022052013/602a77bb6f47822fcb02c1cf/html5/thumbnails/18.jpg)
Example Outputs
Configuration 1
None/Interpolation
Configuration 2
None/Neural Network
Configuration 3
Blocking/Interpolation
Configuration 4
Blocking/Neural Network
Configuration 5
Blocking/Mixed-Random
Configuration 6
Blocking/Mixed-TV
![Page 19: FPGA-based Real-Time Super-Resolution System for Ultra ... · Compared six configurations No. Preprocessing Upscaling #Mult. PSNR(dB) SSIM 1 None Interpolation 6.6*10^7 35.51 0.9138](https://reader033.vdocuments.mx/reader033/viewer/2022052013/602a77bb6f47822fcb02c1cf/html5/thumbnails/19.jpg)
Summary Flow
• Crop each frame into blocks • Suitable for low-level (pixel-level) tasks
• GOOD: on-chip buffer friendly
• BAD: Computation overheads
• Dispatch blocks according to TV value • Micro-architecture for buffering system
• Fully-pipelined CNN for upscaling • Sliding window for convolution/deconvolultion
• Pipeline balancing
• Performance • Full-HD (1920x1080) -> Ultra-HD (3940x2160): 31.7fps
![Page 20: FPGA-based Real-Time Super-Resolution System for Ultra ... · Compared six configurations No. Preprocessing Upscaling #Mult. PSNR(dB) SSIM 1 None Interpolation 6.6*10^7 35.51 0.9138](https://reader033.vdocuments.mx/reader033/viewer/2022052013/602a77bb6f47822fcb02c1cf/html5/thumbnails/20.jpg)
Thank you!
![Page 21: FPGA-based Real-Time Super-Resolution System for Ultra ... · Compared six configurations No. Preprocessing Upscaling #Mult. PSNR(dB) SSIM 1 None Interpolation 6.6*10^7 35.51 0.9138](https://reader033.vdocuments.mx/reader033/viewer/2022052013/602a77bb6f47822fcb02c1cf/html5/thumbnails/21.jpg)
TV Threshold vs. Performance vs. #mult.
0,910
0,915
0,920
0,925
0,930
0,935
0,940
0,945
35,0
35,5
36,0
36,5
37,0
37,5
38,0
38,5
30 40 50 60 70
SS
IM
PS
NR
(d
B)
TV Threshold
PSNR SSIM
0,0E+00
5,0E+09
1,0E+10
1,5E+10
2,0E+10
2,5E+10
30 40 50 60 70
TV Threshold
Multiplications
![Page 22: FPGA-based Real-Time Super-Resolution System for Ultra ... · Compared six configurations No. Preprocessing Upscaling #Mult. PSNR(dB) SSIM 1 None Interpolation 6.6*10^7 35.51 0.9138](https://reader033.vdocuments.mx/reader033/viewer/2022052013/602a77bb6f47822fcb02c1cf/html5/thumbnails/22.jpg)
Resource Utilizations
Component BRAM DSP FF LUT
Dispatcher 1 2 618 1138
Neural Network 178 844 63149 98439
Interpolator 0 10 1414 3076
Total 327 858 66261 103714
Available 1090 900 437200 218600
Utilization (%) 30 95 15 47