DRASTIC: Dynamically Reconfigurable Architecture Systems for Time-varying
Image Constraints
Marios S. Pattichis image and video Processing and Communications Laboratory (ivPCL)
Department of Electrical and Computer Engineering University of New Mexico Albuquerque, New Mexico
ivpcl.org Based on collaborative research with Dr. Y. Jiang, Dr. D. Llamocca, Dr. A. Panayides, and
Mr. C. Carranza.
Acknowledgment This material is based upon work supported by the National Science Foundation under NSF AWD
CNS1422031.
Talk Outline
• Motivation
• Related work
• Video communication examples
• Video analysis examples
• Discrete Periodic Radon Transform (DPRT)
• Conclusion
Motivation: Video Compression
HDTV video bandwidth requirements (interlaced and progressive modes):
• 720p: progressive, 1280x720 pixels, 60 frames per second. Raw BW (24 bits/pixel): 1.3 Gbps
• 1080i: interlaced encoding, 1920x1080 pixels, 25 frames per second. Raw BW (24 bits/pixel): 1.2 Gbps
• 1080p: progressive, 1920x1080 pixels, 59.94 frames per second. Raw BW (24 bits/pixel): 2.98 Gbps
Ultra High Definition (hypothetical frame rates):
• 4K UHD: 3840x2160 (2160p, 16:9), 24 bits/pixel @ 30 fps: 5.56 Gbps. 4096x2048 (4K x 2K), 4096x2160 (1.9:1), 4096x2304 (16:9), 4096x3072, 24 bits/pixel @ 120 fps: 33.75 Gbps.
• 8K UHD: 7680x4320 (4320p), 24 bits/pixel @ 30 fps: 22.24 Gbps. 8192x4096, 8192x4320, 24 bits/pixel @ 120 fps: 94.92 Gbps.
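The raw bandwidth figures follow from width x height x frame rate x bits per pixel. A minimal sketch of that arithmetic is given below. The function name and the two prefix constants are illustrative; the HDTV figures appear to use decimal gigabits (10^9), while the UHD figures match when dividing by 2^30, so both conventions are shown.

```python
def raw_bitrate_bps(width, height, fps, bits_per_pixel=24):
    """Uncompressed video bitrate in bits per second."""
    return width * height * fps * bits_per_pixel

GIGA = 10**9   # decimal prefix (assumed for the HDTV figures)
GIBI = 2**30   # binary prefix (assumed for the UHD figures)

hd_720p = raw_bitrate_bps(1280, 720, 60)
uhd_4k = raw_bitrate_bps(4096, 3072, 120)
uhd_8k = raw_bitrate_bps(8192, 4320, 120)

print(f"720p @ 60 fps:       {hd_720p / GIGA:.2f} Gbps")
print(f"4096x3072 @ 120 fps: {uhd_4k / GIBI:.2f} Gbps")
print(f"8192x4320 @ 120 fps: {uhd_8k / GIBI:.2f} Gbps")
```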
Mobile Comm. Networks Data Transfer Rates
Type: theoretical transfer rates; typical transfer rates
• 2G GSM (early 1990s): theoretical 9.6 – 115 kbps; typical about 10 kbps
• 2.5G GPRS (2001): theoretical 9.6 – 171.2 kbps; typical between 30-50 kbps
• 2.5G EDGE (2003): theoretical 9.6 – 384 kbps; typical between 75-135 kbps
• 3G UMTS (2001): theoretical 144 kbps – 2 Mbps; typical between 220-384 kbps
• 3.5G HSPA (Rel. 7) (HSDPA, Rel. 5, 2005; HSUPA, Rel. 6, 2008): theoretical DL 14 Mbps, UL 5.8 Mbps; typical DL 1-4 Mbps, UL 500 kbps - 2 Mbps
• 3.5G Mobile WiMAX (IEEE 802.16e, 2005): theoretical DL 46 Mbps, UL 5.6 Mbps; typical as for 3.5G HSPA
• 4G LTE-Advanced (Rel. 10, Oct. 2010): theoretical DL 1 Gbps, UL 100 Mbps; typical N/A: see below
• 4G WirelessMAN-Advanced (IEEE 802.16m, Oct. 2010): theoretical DL 1 Gbps, UL 100 Mbps; typical N/A: see below
Refer to slides 159-188 from “4G …” from http://www.4gamericas.org/
Mobile Communication Networks Evolution
Video Compression Rates
3G UMTS (2001), max typical = 384 kbps:
• HDTV 720p: 1.3 Gbps / 384 kbps = 3,549
• UHD 8192x4320 (hyp): 94.92 Gbps / 384 kbps = 251,058
4G LTE-Advanced, max theoretical upload = 100 Mbps:
• HDTV 720p: 1.3 Gbps / 100 Mbps = 13.3
• UHD 8192x4320 (hyp): 94.92 Gbps / 100 Mbps = 972
CR for HEVC, from studies using ultrasound videos:
• Raw video: 720x576 (4CIF), 8 bits/pixel, yuv420 @ 25 fps = 81.1 Mbps
• HEVC encoding using QP 36 and the x265 ultrafast preset: PSNR 32 dB, 364 kbps, compression ratio = 223
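The required compression ratio is the raw bitrate divided by the available channel rate. The sketch below works through a few of the ratios quoted above. The function name is illustrative, and the prefix conventions are assumptions chosen because they reproduce the quoted figures: binary prefixes for the network comparisons and decimal prefixes for the HEVC example.

```python
def required_compression_ratio(raw_bps, channel_bps):
    """How many times the raw stream must shrink to fit the channel."""
    return raw_bps / channel_bps

KIBI, MEBI, GIBI = 2**10, 2**20, 2**30  # binary prefixes (assumption)

# HDTV 720p over 3G UMTS (384 kbps) and over 4G LTE-Advanced upload (100 Mbps)
cr_720p_3g = required_compression_ratio(1.3 * GIBI, 384 * KIBI)
cr_720p_4g = required_compression_ratio(1.3 * GIBI, 100 * MEBI)

# Hypothetical 8192x4320 UHD over 4G LTE-Advanced upload
cr_8k_4g = required_compression_ratio(94.92 * GIBI, 100 * MEBI)

# Ultrasound video: 81.1 Mbps raw, 364 kbps after HEVC (decimal prefixes)
cr_hevc = required_compression_ratio(81.1e6, 364e3)

print(int(cr_720p_3g), round(cr_720p_4g, 1), round(cr_8k_4g), round(cr_hevc))
```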
Still Image Criteria for Medical Video
Rating: Plaque Motion; Stenosis; Plaque Morphology
• 5: plaque(s) motion(s) in transmitted video identifiable as in original; degree of stenosis in transmitted video determined as in original; plaque morphology in transmitted video is the same as in original
• 4: plaque(s) motion(s) in transmitted video has artefacts that do not compromise diagnosis; enough clinical data to determine degree of stenosis; some artefacts are seen that do not compromise morphology visualization
• 3: plaque(s) motion(s) artefacts that can compromise diagnosis; clinical data only allow approximation of degree of stenosis; artefacts may compromise morphology visualization
• 2: plaque(s) motion(s) artefacts that significantly limit diagnosis; very limited ability to estimate degree of stenosis; significantly limit diagnosis
• 1: not visible; not determinable; not visible
Assessed by humans for atherosclerotic plaque ultrasound.
Motivation: Multi-objective Opt
Provide Constraints for better control on Image and Video Compression
Power
Bitrate
Quality
Motivation: Multi-obj (more)
• Adaptive accuracy based on changes in the video
• Real-time video communications performance
• Fast image processing with limited computational resources (e.g., scalable DPRT)
• Large datasets with limited resources
Video Processing System
[Figure: DRASTIC video processing system. A dynamic digital system, made of static and dynamic HW components, maps video input to video output. Memory holds n sets of bitstreams + frequency: the Pareto-optimal HW realizations (Module 1 through Module n) and their estimated objectives (Objectives 1 through Objectives n). The control block takes the constraints and external constraints together with input and output measurements, keeps a parameter history and a multi-objective history, performs model building for a Pareto front prediction model, and uses a realization selector to choose the parameters and hardware realization.]
Modes for Video Comm.
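Mode: constrained optimization formulation
• Max Im. Qual.: $\max Q$ subj. to $(\mathrm{BPS} \le \mathrm{BPS}_{\max}) \,\&\, (\mathrm{DP} \le \mathrm{DP}_{\max})$
• Min Bitrate: $\min \mathrm{BPS}$ subj. to $(Q \ge Q_{\min}) \,\&\, (\mathrm{DP} \le \mathrm{DP}_{\max})$
• Min Dyn. Power: $\min \mathrm{DP}$ subj. to $(\mathrm{BPS} \le \mathrm{BPS}_{\max}) \,\&\, (Q \ge Q_{\min})$
• Typical Mode: $\max\; \alpha \cdot Q - \beta \cdot \mathrm{BPS} - \gamma \cdot \mathrm{DP}$ subj. to $(Q \ge Q_{\min}) \,\&\, (\mathrm{DP} \le \mathrm{DP}_{\max}) \,\&\, (\mathrm{BPS} \le \mathrm{BPS}_{\max})$
We have the following objectives and bounds:
• DP: Dynamic Power, max avail. = $\mathrm{DP}_{\max}$
• Q: Image Quality, min. acceptable = $Q_{\min}$
• BPS: Bits Per Sample, max avail. = $\mathrm{BPS}_{\max}$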
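The four modes can be read as a search over a finite set of candidate configurations: filter by the active constraints, then optimize the mode's objective. The sketch below illustrates this under stated assumptions. The `Config` class, the `select` function, the mode names, and all numeric values are hypothetical and are not taken from the DRASTIC implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Config:
    name: str
    q: float    # image quality (higher is better)
    bps: float  # bits per sample
    dp: float   # dynamic power

# mode -> (active constraints, score to maximize); weights = (alpha, beta, gamma)
MODES = {
    "max_quality": (("bps", "dp"), lambda c, w: c.q),
    "min_bitrate": (("q", "dp"), lambda c, w: -c.bps),
    "min_power": (("bps", "q"), lambda c, w: -c.dp),
    "typical": (("q", "bps", "dp"),
                lambda c, w: w[0] * c.q - w[1] * c.bps - w[2] * c.dp),
}

def select(configs, mode, q_min, bps_max, dp_max, weights=(1.0, 1.0, 1.0)):
    """Return the best feasible configuration for a mode, or None."""
    active, score = MODES[mode]
    checks = {
        "q": lambda c: c.q >= q_min,
        "bps": lambda c: c.bps <= bps_max,
        "dp": lambda c: c.dp <= dp_max,
    }
    feasible = [c for c in configs if all(checks[k](c) for k in active)]
    if not feasible:
        return None
    return max(feasible, key=lambda c: score(c, weights))

# Hypothetical candidates, for illustration only
configs = [
    Config("A", q=30, bps=0.5, dp=100),
    Config("B", q=36, bps=1.0, dp=150),
    Config("C", q=40, bps=2.0, dp=220),
    Config("D", q=33, bps=0.8, dp=180),
]
bounds = dict(q_min=32, bps_max=1.5, dp_max=250)
for mode in MODES:
    print(mode, select(configs, mode, **bounds).name)
```

Returning `None` when no configuration is feasible leaves the caller free to keep the current configuration or relax a bound.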
DRASTIC DCT: System
We have software/hardware control implementing DRASTIC modes using:
• SW: Adjustable Quantization Table (QF only)
• HW: Variable Zonal Coding
• HW: Adjustable Bitwidth of the DCT coefficients
DRASTIC DCT: DCT HW
Variable Zonal Coding:
• Fine control providing 8 configurations
• Output will be extended to 16 bits
• Full implementation shown here: Removal of red-highlighted regions for implementing zonal=7 hardware mode.
[Figure: DCT hardware datapath. Eight signed 8-bit inputs p0 to p7, in the range [-128, 127], pass through sign-extension (ext), add/subtract, and trim stages in a 1D filter (Y1), with a ping-pong transpose memory. Annotated bus widths range from 8 to 21 bits.]
DRASTIC DCT: Pareto Front
• QF: 5:5:100
• Zonal: 1:1:8
• Bitwidth: 2:1:9
Configurations = 1280
Pareto Optimal = 841
Pareto-optimal configurations are shown in red (based on median results over the LIVE image database).
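The configuration count can be checked by enumerating the three parameter ranges. The sketch below assumes the start:step:stop notation is inclusive at both ends, which gives the 1280 configurations stated above. The variable names are illustrative.

```python
from itertools import product

qf_values = range(5, 101, 5)    # QF: 5:5:100 -> 20 values
zonal_values = range(1, 9)      # Zonal: 1:1:8 -> 8 values
bitwidth_values = range(2, 10)  # Bitwidth: 2:1:9 -> 8 values

# Every (QF, zonal, bitwidth) triple is one candidate configuration
configurations = list(product(qf_values, zonal_values, bitwidth_values))
print(len(configurations))  # 20 * 8 * 8
```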
DRASTIC DCT: Image Database
DRASTIC DCT: Mode Transition
Ongoing Work: HEVC
HEVC builds on the H.264/AVC video coding standard. H.264/AVC was initially developed during 1999-2003 and was further extended with scalable video coding (SVC) and multi-view video coding (MVC) during 2003-2009.
A formal joint Call for Proposals (CfP) on HEVC was issued by VCEG and MPEG in January 2010.
Motivation:
8K-UHD (8192 × 4320)
4K (4096 x 3072)
Ongoing Work: Intra-Prediction
HEVC Intra-Pred (Config)
HEVC Intra-Pred (Space)
Adaptive Real-time HEVC
This is an emergency video example.
• Adaptation uses the maximum video quality mode subject to real-time encoding, which translates to a frame rate ≥ 25 fps, and to the available 3G bandwidth.
• The bandwidth constraint changes from (1) BW ≤ 250 kbps to (2) BW ≤ 384 kbps (max 3G upload speed).
• Lower bandwidth: PSNR of 29.7 dB (33.88 fps @ 226 kbps).
• Second bandwidth: PSNR of 31.9 dB (29.29 fps @ 363.53 kbps).
• Encoding delay was 0.426 seconds with 6 core [email protected] GHz (WPP streams / pool / frames: 18 / 6 / 2 for x265 encoding).
Video Image Analysis
[Figure, panels (a)-(c): FPGA system for dynamic partial reconfiguration (DPR). A PPC/µBlaze processor on the PLB connects to System ACE (CF card), memory, an Ethernet MAC, a DMA core, an ICAP core, and the filter peripheral. The filter peripheral holds the FIR Filter IP core in a partially reconfigurable region (PRR) with iFIFO/oFIFO interfaces, frequency and PR control (clkfx, PR_done), and EPA constraints. Timing panels show the PRR loaded with row and column filters over time slots t1, t2, ..., t8 for successive frames. Storage: 'n' filters (Row 1/Col 1 through Row n/Col n) take '2n' bitstreams in memory; 'm' filterbanks (Filterbank 1 through Filterbank m) take '2n*m' bitstreams in memory.]
Video Analysis: Filterbanks
[Figure, panels (a)-(c): dynamic management of the 2D FIR filter.]
2D FIR filter (row filter and column filter, each loaded via DPR):
• A: input image
• B: user constraints
• C: output frames
• Parameters: N, NH, OB, NXr, NXc
• Frequency, set through DPR and/or frequency control
Dynamic manager: generates Pareto-optimal point PO i (1 ≤ i ≤ n) and loads it into the 2D FIR filter.
• Consider new constraints based on image type, user input, or output.
• If an EPA constraint is generated, look for a PO point that satisfies the EPA constraints.
• If such a PO point exists, load the <row i, col i> bitstreams into the 2D FIR filter via DPR, and/or modify the frequency i.
Pareto-optimal point PO i in memory: Energy, Performance, Accuracy, pointer to bitstream row i, pointer to bitstream col i.
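The dynamic manager flow can be sketched as a lookup over stored Pareto-optimal points followed by a reconfiguration step. The code below is a minimal illustration under stated assumptions. The class names, field units, file names, and numbers are hypothetical. The `loaded` list is a stand-in for the actual DPR and frequency control. Taking the first matching point is one possible policy, not necessarily the one used in the system.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass(frozen=True)
class ParetoPoint:
    energy: float        # energy per frame (units assumed: mJ)
    performance: float   # e.g. frames per second (assumed)
    accuracy: float      # e.g. PSNR in dB (assumed)
    row_bitstream: str   # pointer to bitstream row i
    col_bitstream: str   # pointer to bitstream col i
    frequency: float     # clock frequency i

@dataclass(frozen=True)
class EpaConstraint:
    max_energy: float = float("inf")
    min_performance: float = 0.0
    min_accuracy: float = 0.0

    def satisfied_by(self, p: ParetoPoint) -> bool:
        return (p.energy <= self.max_energy
                and p.performance >= self.min_performance
                and p.accuracy >= self.min_accuracy)

@dataclass
class DynamicManager:
    points: List[ParetoPoint]
    current: Optional[ParetoPoint] = None
    loaded: list = field(default_factory=list)  # stand-in for DPR actions

    def on_constraint(self, c: EpaConstraint) -> Optional[ParetoPoint]:
        """Look for a PO point that satisfies c; reconfigure if one exists."""
        match = next((p for p in self.points if c.satisfied_by(p)), None)
        if match is None:
            return self.current  # no PO point exists: keep the current setup
        self.loaded.append((match.row_bitstream, match.col_bitstream,
                            match.frequency))
        self.current = match
        return match

# Hypothetical PO points, for illustration only
points = [
    ParetoPoint(0.20, 30.0, 50.0, "row0.bit", "col0.bit", 100.0),
    ParetoPoint(0.30, 30.0, 62.0, "row1.bit", "col1.bit", 100.0),
    ParetoPoint(0.35, 25.0, 66.0, "row2.bit", "col2.bit", 80.0),
]
manager = DynamicManager(points)
chosen = manager.on_constraint(EpaConstraint(min_accuracy=60.0))
unchanged = manager.on_constraint(EpaConstraint(max_energy=0.1))
print(chosen.row_bitstream, unchanged is chosen)
```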
Autom. Analysis: Adaptive Acc.
[Figure, panels (a)-(d): adaptive accuracy for automated video analysis. Plots show PSNR (dB) versus frame #, energy per frame (mJ) versus PSNR (dB), and the difference of sum of absolute values versus frame # against a threshold, with constraints on energy per frame (0.3 mJ) and accuracy (45 dB, 65 dB). Sample frames are marked at frame # 30, 31, 90, 150, 185, 191, 215, 245, 260, and 270.]
Filter configurations shown:
• N=20, NH=10, OB=8
• N=32, NH=16, OB=16
• N=24, NH=16, OB=16
• N=8, NH=12, OB=8
• N=24, NH=10, OB=16
Annotated statistics: avg=49.65 (std=0.0977), avg=61.14 (std=0.1425), avg=23.56 (std=0.18), avg=66.18 (std=0.5568), avg=88.45 (std=0.0475).
Accuracy states:
• LOW ACCURACY: 49.58 dB, 0.214 mJ
• MEDIUM ACCURACY: 62.33 dB, 0.281 mJ
• HIGH ACCURACY: 65.82 dB, 0.346 mJ
State transitions are triggered by runs of frames relative to the threshold T=0.01: 10 frames above, 15 frames above, 15 frames below, or 30 frames below.
If the conditions do not hold, stay in the current state.
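The accuracy switching can be sketched as a three-state machine driven by runs of frames above or below the threshold T=0.01. The frame counts 10, 15, 15, and 30 come from the figure, but which transition each count belongs to, the starting state, and the reset of the run counter after a transition are assumptions made for this sketch.

```python
THRESHOLD = 0.01

# (current state, run direction) -> (frames required, next state)
# ASSUMPTION: the assignment of frame counts to transitions is a guess.
RULES = {
    ("LOW", "above"): (10, "MEDIUM"),
    ("MEDIUM", "above"): (15, "HIGH"),
    ("HIGH", "below"): (15, "MEDIUM"),
    ("MEDIUM", "below"): (30, "LOW"),
}

def track_states(diffs, state="MEDIUM", threshold=THRESHOLD):
    """Return the accuracy state after each frame's difference measure."""
    run_dir, run_len = None, 0
    history = []
    for d in diffs:
        direction = "above" if d > threshold else "below"
        if direction == run_dir:
            run_len += 1
        else:
            run_dir, run_len = direction, 1
        rule = RULES.get((state, direction))
        if rule is not None and run_len >= rule[0]:
            state = rule[1]
            run_len = 0  # restart the count after a transition (assumption)
        # If no rule fires, stay in the current state.
        history.append(state)
    return history

# 15 frames above T, then 45 frames below T (synthetic input)
diffs = [0.02] * 15 + [0.001] * 45
history = track_states(diffs)
print(history[13], history[14], history[29], history[59])
```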
DPRT: Prime Directions
[Figure: prime directions of the DPRT on a pixel grid with axes i and j. The directions are (1,0), (1,1), (1,2), (1,3), (1,4), (1,5), (1,6), and (0,1); the pixels (X) summed along direction (1,2) are highlighted.]
DPRT: New Architecture
[Figure, panels (a)-(g): FDPRT architecture for a 7x7 image with entries f0,0 through f6,6 (axes i, j). Seven 7-operand adder trees (tree_0 through tree_6) produce the projection R0(0), ..., R0(6). Circular left shift stages, CLS(1) through CLS(6), realign the rows so that the same adder trees produce R1(0), ..., R1(6), and so on. The shift stage carries labels $D_{\langle k-i \rangle_7}$ and $D_k$ for k = 0, ..., 6. Each output is
$R(p,q) = \sum_{i=0}^{6} f(i, \langle q - i p \rangle_7)$,
that is, $f(0,q) + f(1,\langle q-p \rangle_7) + f(2,\langle q-2p \rangle_7) + \dots + f(6,\langle q-6p \rangle_7)$.]
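The projection sum from the architecture figure, R(p,q) as the sum over i of f(i, ⟨q − ip⟩_N), can be written directly in software. The sketch below is a plain reference computation, not the hardware architecture. It assumes N is prime, and it adds the final projection for direction (0,1) as row sums following the usual DPRT definition, which is an assumption since the figure only shows R(p,q).

```python
def dprt(f):
    """Discrete Periodic Radon Transform of an N x N image, N prime.

    Returns N + 1 projections, each of length N.
    """
    n = len(f)
    # Projections p = 0 .. N-1: R(p, q) = sum_i f(i, <q - i*p>_N)
    r = [[sum(f[i][(q - p * i) % n] for i in range(n)) for q in range(n)]
         for p in range(n)]
    # Last projection (assumed convention): R(N, q) = sum_j f(q, j)
    r.append([sum(f[q]) for q in range(n)])
    return r

# Small synthetic 7 x 7 image
n = 7
f = [[(3 * i + 5 * j + i * j) % 11 for j in range(n)] for i in range(n)]
r = dprt(f)
total = sum(sum(row) for row in f)
print(len(r), all(sum(proj) == total for proj in r))
```

Every projection sums to the total image intensity, which gives a quick sanity check on any implementation.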
Recent Work: Scalable FDPRT
[Figure: the N x N image is partitioned into blocks, Block 0 through Block K-1, with block dimension H, and the hardware processes the blocks sequentially.]
The scalable FDPRT allows effective implementations under different constraints on hardware resources and image sizes (powers of 2 + prime).
Split by 2: 2N cycles to load + N for transpose.
Recent Work: Fast 2D Circular Convolution
Cyclic convolution using: computational complexity
• Definition: $O(N^4)$
• Discrete Fourier Transform: $O(N^2 \log_2 N + N^2)$
• FDPRT: $O(N + \lceil \log_2 N \rceil + N^2)$
• Scalable FDPRT: $O(\lceil N/2^h \rceil \cdot N + 2N + h + N^2)$
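The DPRT route to 2D circular convolution rests on a convolution property: for prime N, each projection of the 2D circular convolution equals the 1D circular convolution of the corresponding projections of the two inputs. The sketch below checks this in plain Python against the O(N^4) definition. It is a software illustration of the property only and does not reproduce the cycle counts of the hardware. The function names and the row-sum convention for the last projection are assumptions.

```python
def dprt(f):
    """Forward DPRT for prime N: R(p,q) = sum_i f(i, <q - i*p>_N)."""
    n = len(f)
    r = [[sum(f[i][(q - p * i) % n] for i in range(n)) for q in range(n)]
         for p in range(n)]
    r.append([sum(f[q]) for q in range(n)])  # last projection: row sums
    return r

def idprt(r):
    """Inverse DPRT for integer data; exact for a valid set of projections."""
    n = len(r) - 1
    total = sum(r[0])  # every projection sums to the image total
    return [[(sum(r[p][(j + p * i) % n] for p in range(n)) - total + r[n][i]) // n
             for j in range(n)] for i in range(n)]

def circ_conv_1d(a, b):
    """1D circular convolution of two equal-length sequences."""
    n = len(a)
    return [sum(a[s] * b[(q - s) % n] for s in range(n)) for q in range(n)]

def circ_conv_2d_dprt(f, g):
    """2D circular convolution via per-projection 1D circular convolutions."""
    conv = [circ_conv_1d(x, y) for x, y in zip(dprt(f), dprt(g))]
    return idprt(conv)

def circ_conv_2d_direct(f, g):
    """2D circular convolution from the definition, O(N^4)."""
    n = len(f)
    return [[sum(f[a][b] * g[(i - a) % n][(j - b) % n]
                 for a in range(n) for b in range(n))
             for j in range(n)] for i in range(n)]

# Small synthetic 5 x 5 inputs
n = 5
f = [[(2 * i + 3 * j + i * j) % 7 for j in range(n)] for i in range(n)]
g = [[(i * i + 4 * j + 1) % 5 for j in range(n)] for i in range(n)]
print(circ_conv_2d_dprt(f, g) == circ_conv_2d_direct(f, g))
```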
DPRT: Pareto-Optimal HW
[Figure: clock cycles versus resource usage (1-bit D flip-flops) for a 251 x 251 image, on logarithmic axes. Series: Serial, Systolic, SFDPRT (H=2, …, 251), FDPRT, and the Pareto front (optimal).]
Annotated points (resource usage, clock cycles):
• (4056, 15939504)
• (516096, 63253)
• (1135022, 511)
• (6275, 32638)
• (1135022, 1266)
• H=113: (517813, 1897)
• H=⌈251/2⌉: (567762, 1396)
• H=84: (385285, 1607)
The SFDPRT series runs from H=2 to H=251; two of the annotated points are marked Pareto optimal.
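A Pareto front over (resource usage, clock cycles) keeps every point for which no other point is at least as good in both coordinates. The sketch below applies that filter to the points annotated on the plot only, not to the full SFDPRT series, so the result covers only those annotations. The function name is illustrative.

```python
def pareto_front(points):
    """Keep points not dominated by another point (both values minimized)."""
    front = []
    for p in points:
        dominated = any(
            q != p and q[0] <= p[0] and q[1] <= p[1]
            for q in points
        )
        if not dominated:
            front.append(p)
    return sorted(front)

# (resource usage in 1-bit D flip-flops, clock cycles), as annotated
points = [
    (4056, 15939504),
    (516096, 63253),
    (1135022, 511),
    (6275, 32638),
    (1135022, 1266),
    (517813, 1897),   # H=113
    (567762, 1396),   # H=ceil(251/2)
    (385285, 1607),   # H=84
]
for resources, cycles in pareto_front(points):
    print(resources, cycles)
```

Among these annotations, the H=84 and H=⌈251/2⌉ points stay on the front while H=113 is dominated by H=84.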
Conclusion
• Example Applications Demonstrate Promise
• Approach can handle joint software-hardware optimization as well as software-only optimization
• Current research focused on automatic constraint generation and real-time Pareto-front estimation without the need to pre-compute over a training set