1
Motivation
• Video Communication over Heterogeneous Networks
– Diverse client devices
– Various network connection bandwidths
• Limitations of Scalable Video Coding Schemes
– Limited layers supported
– No video format changes
• Video Transcoding Provides Dynamic Solutions
– Channel bandwidth adaptation
– Video coding format adaptation
2
Challenges in Video Transcoding
• Improve Efficiency of Video Transcoding
– Large data volume
– High computational complexity
• Optimize Visual Quality for a Given Bit Rate
– Human vision system (HVS) based video transcoding is desirable
Figure: Video transcoder. The input compressed video stream passes through partial decoding, video manipulation, and entropy encoding to produce the output compressed video stream.
3
Proposed Solutions
• Exploit Foveation Property of the HVS in Video Transcoding
• Develop Fast Algorithms for Video Transcoding
– DCT-domain foveation filtering technique
– Fast algorithms for DCT-domain inverse motion compensation
• Local bandwidth constrained DCT-domain inverse motion compensation
• Look-up-table based DCT-domain inverse motion compensation
Figure: Foveation embedded video transcoder. The uniform resolution compressed video passes through partial decoding, video manipulation and foveation, and video re-encoding to produce the foveated video stream.
4
Foveation
• The Human Eye Samples the Visual Field Non-uniformly
– The highest sampling resolution is at the fovea
– The sampling resolution decreases rapidly with distance from the fovea
• Retinal Images are Inherently Non-uniform in Spatial Resolution
Figure: Cells per degree versus eccentricity (deg), left eye.
5
Foveation Modelling
• Foveated Contrast Threshold [Geisler & Perry 98]
$CT(f, e) = CT_0 \exp\left(a f \frac{e + e_2}{e_2}\right)$
• f: Spatial frequency (cyc/degree)
• e: Retinal eccentricity (degree)
• a: Spatial frequency decay constant
• e2: Half-resolution eccentricity
• CT0: Minimum contrast threshold
• CT: Contrast threshold
$a = 0.106,\quad e_2 = 2.3,\quad CT_0 = 1/64 \text{ to } 1/76$
• Foveated Cut-off Frequency fc
$f_c(e) = \frac{e_2 \ln(1 / CT_0)}{a (e + e_2)}$
• Spatial Frequencies Beyond the Cut-off Frequency are Invisible (Foveated Image)
Figure: Local cut-off frequency (cyc/deg) versus pixel position relative to the foveation point (unit: pixel). Image size: 512 x 512. Unit of v: image height.
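A minimal sketch of how the contrast threshold model and the cut-off frequency on this slide could be evaluated numerically. The parameter values (a = 0.106, e2 = 2.3, CT0 = 1/64), the pixel-to-eccentricity conversion, and all function names are assumptions chosen for illustration; they are not taken from the authors' implementation.

```python
import math

# Assumed model parameters (illustrative values for the Geisler & Perry model).
A = 0.106        # spatial frequency decay constant
E2 = 2.3         # half-resolution eccentricity (degrees)
CT0 = 1.0 / 64   # minimum contrast threshold


def eccentricity_deg(distance_px, image_height_px=512, v=3.0):
    """Eccentricity in degrees of a pixel distance_px away from the foveation point.

    Assumption: the viewing distance v is measured in image heights.
    """
    return math.degrees(math.atan(distance_px / (image_height_px * v)))


def contrast_threshold(f, e):
    """CT(f, e) = CT0 * exp(a * f * (e + e2) / e2)."""
    return CT0 * math.exp(A * f * (e + E2) / E2)


def cutoff_frequency(e):
    """Frequency (cyc/deg) at which the contrast threshold reaches 1."""
    return E2 * math.log(1.0 / CT0) / (A * (e + E2))


if __name__ == "__main__":
    for d in (0, 64, 128, 256):
        e = eccentricity_deg(d)
        print(d, round(e, 2), round(cutoff_frequency(e), 2))
```

The cut-off follows from setting CT(f, e) = 1, the largest possible contrast, and solving for f, so `contrast_threshold(cutoff_frequency(e), e)` evaluates to 1 up to rounding.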
6
Foveated Images
Figure: JPEG-coded uniform image (168 KB) and JPEG-coded foveated image (136 KB). The foveation point is marked by 'X'.
7
Foveated Contrast Sensitivity Function (FCSF)
• Foveated Contrast Sensitivity Function (FCSF)
$FCSF(f, e) = \frac{1}{CT(f, e)} = \frac{1}{CT_0} \exp\left(-a f \frac{e + e_2}{e_2}\right)$
• Shape the Compression Distortion According to FCSF
Figure: Normalized contrast sensitivity of the human eye versus distance from the foveation point (unit: pixel). Image size: 512 x 512. Viewing distance: 3 times the image height.
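The FCSF is the reciprocal of the foveated contrast threshold, so a per-pixel sensitivity weight can be sketched in a few lines. The parameter values, the conversion from pixel distance to eccentricity, and the normalization to the value at the foveation point are assumptions made for illustration, as are the function names.

```python
import math

# Assumed parameters for the contrast threshold model (illustrative values).
A, E2, CT0 = 0.106, 2.3, 1.0 / 64


def fcsf(f, e):
    """FCSF(f, e) = 1 / CT(f, e) = exp(-a * f * (e + e2) / e2) / CT0."""
    return math.exp(-A * f * (e + E2) / E2) / CT0


def normalized_fcsf(f, distance_px, image_height_px=512, viewing_distance=3.0):
    """Sensitivity at distance_px from the foveation point, relative to the fovea.

    Assumption: viewing_distance is expressed in image heights.
    """
    e = math.degrees(math.atan(distance_px / (image_height_px * viewing_distance)))
    return fcsf(f, e) / fcsf(f, 0.0)


if __name__ == "__main__":
    # Sensitivity to an 8 cyc/deg component at increasing distances.
    for d in (0, 64, 128, 256):
        print(d, round(normalized_fcsf(8.0, d), 3))
```

A weight of this kind equals 1 at the foveation point and decays with distance, which is the shape one would want compression distortion to follow inversely.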
8
Video Transcoding Architecture
• Open-Loop Video Transcoding
– Simple and fast
– Error drift
Figure: Transcoding error propagation across an I P P P frame sequence.
Figure: Open-loop transcoder between input rate R_in and output rate R_out. Blocks shown: VLD, delay, bit allocation analysis, requantization, VLC.
VLC: Variable Length Coding
VLD: Variable Length Decoding
9
Drift Free Video Transcoders
• Cascaded Pixel Domain Video Transcoding
– Low efficiency
– Long delay
• Fast Pixel Domain Video Transcoding
– Saves motion estimation, one frame memory, and one IDCT operation
• Fast DCT-Domain Video Transcoding
– No IDCT-DCT operations; lower data volume
– DCT-domain inverse motion compensation is complex (research topic)
Figure: Cascaded transcoder. R_in, complete decoding (Q1), encoding (Q2), R_out.
Figure: Fast pixel domain video transcoder. Blocks shown between R_in and R_out: IQ1, Q2, IQ2, IDCT, MC, DCT, FM, and adders.
Figure: Fast DCT domain video transcoder. Blocks shown between R_in and R_out: IQ1, Q2, IQ2, DCT-domain inverse motion compensation, and adders.
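The difference between open-loop requantization and a drift-free cascaded transcoder can be illustrated with a toy scalar model. This is a sketch under strong assumptions: each "frame" is a single number, the first frame is predicted from zero (acting as the I frame), prediction is simply the previous reconstruction, and the step sizes 2 and 6 are arbitrary. None of the names below come from the slides.

```python
import math


def quantize(value, step):
    """Uniform quantizer: returns the integer level, rounding half up."""
    return int(math.floor(value / step + 0.5))


def encode(frames, step):
    """Closed-loop predictive encoder: each residual is taken against the
    encoder's own reconstruction of the previous frame."""
    levels, recon = [], 0
    for x in frames:
        level = quantize(x - recon, step)
        levels.append(level)
        recon += level * step
    return levels


def decode(levels, step):
    """Decoder: accumulates dequantized residuals."""
    out, recon = [], 0
    for level in levels:
        recon += level * step
        out.append(recon)
    return out


def open_loop_transcode(levels, q1, q2):
    """Requantize every residual independently (no feedback loop)."""
    return [quantize(level * q1, q2) for level in levels]


def cascaded_transcode(levels, q1, q2):
    """Fully decode with q1, then re-encode with q2 (drift free)."""
    return encode(decode(levels, q1), q2)


if __name__ == "__main__":
    frames = [100, 110, 120, 130]           # I P P P
    levels = encode(frames, 2)
    for name, transcode in (("open loop", open_loop_transcode),
                            ("cascaded", cascaded_transcode)):
        decoded = decode(transcode(levels, 2, 6), 6)
        print(name, [d - x for d, x in zip(decoded, frames)])
```

With these inputs the open-loop errors grow frame by frame (2, 4, 6, 8), because each requantization error is inherited by every later predicted frame, while the cascaded errors stay within one half step (2, -2, 0, 2). The fast pixel domain and DCT-domain transcoders aim for the same bounded behaviour at lower cost.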
10
Foveation Embedded DCT Domain Video Transcoding
Figure: Foveation embedded DCT-domain video transcoder with input R1(n) and output R2(n), drawn as a DECODER half and an ENCODER half. Blocks shown: VLD, IQ1, Q2, VLC, IQ2, FM (two instances), DCT-domain IMC (two instances), DCT-domain foveation, DCT-domain MVR, foveation point selection, and adders. Signals shown: MVs, intra-coded frame.
VLD: Variable Length Decoding
VLC: Variable Length Coding
FM: Frame Memory
Q: Quantization
IQ: Inverse Quantization
MVR: Motion Vector Refinement
IMC: Inverse Motion Compensation
MVs: Motion Vectors
11
Foveation Filtering
• Pixel Domain Foveation Filtering Technique [Lee, 99]
– High computational complexity
12
DCT-Domain Foveation Filtering
• DCT-Domain Block Mirror Filtering [Rao, 90]
• Pros
– Significantly simplified
– Combine with inverse quantization
– Easy to parallelize
• Cons
– Blocking artifacts
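$\mathrm{DCT}(\tilde{f} * \tilde{h}) = \hat{f} \cdot h$
• f: 1-D signal block
• f~: Block-mirrored version of f
• f^: DCT of f
• h, h~: Filter kernel
Figure: Block mirroring of a 1-D signal.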
H. R. Sheikh, S. Liu, B. L. Evans and A. C. Bovik, “Real-Time Foveation Techniques for H.263 Video Encoding in Software”, ICASSP 2001.
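The idea behind block mirror filtering can be checked numerically on a single 1-D block: applying a symmetric low-pass kernel to the mirrored block in the pixel domain gives the same result as scaling each DCT coefficient by the kernel's frequency response. The sketch below is an illustration only; the 3-tap kernel, the unnormalized DCT-II convention, and all names are assumptions, and in a foveated transcoder the kernel would vary per block with distance from the foveation point.

```python
import math


def dct(x):
    """Unnormalized DCT-II of a 1-D block."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                for i in range(n))
            for k in range(n)]


def idct(c):
    """Inverse of dct() above (scaled DCT-III)."""
    n = len(c)
    return [(c[0] / 2 + sum(c[k] * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                            for k in range(1, n))) * 2 / n
            for i in range(n)]


def filter_in_dct_domain(block, taps):
    """Scale DCT coefficient k by the kernel response at frequency pi * k / n.

    taps = [h0, h1, ...] describes a symmetric kernel with h[-m] = h[m].
    """
    n = len(block)
    gains = [taps[0] + 2 * sum(taps[m] * math.cos(math.pi * k * m / n)
                               for m in range(1, len(taps)))
             for k in range(n)]
    return idct([c * g for c, g in zip(dct(block), gains)])


def filter_with_block_mirroring(block, taps):
    """Pixel-domain reference: convolve the mirrored block with the kernel."""
    n = len(block)

    def mirrored(i):
        i %= 2 * n
        return block[i] if i < n else block[2 * n - 1 - i]

    return [sum(taps[abs(m)] * mirrored(i - m)
                for m in range(-(len(taps) - 1), len(taps)))
            for i in range(n)]


if __name__ == "__main__":
    block = [52, 55, 61, 66, 70, 61, 64, 73]
    taps = [0.5, 0.25]   # kernel [0.25, 0.5, 0.25]
    a = filter_in_dct_domain(block, taps)
    b = filter_with_block_mirroring(block, taps)
    print(max(abs(p - q) for p, q in zip(a, b)))
```

Because the filter reduces to one multiplication per coefficient, it can in principle be folded into the inverse quantization scaling, which is the benefit listed under the pros on this slide. Each block is filtered in isolation, which is also why blocking artifacts appear among the cons.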
13
Multipoint Video Conferencing
Figure: Multipoint control unit (MCU) connected to the conferees over the Internet. Streams from conferee #1, #2, ..., #N each enter through a buffer (BUF) and VLD, feed a video combiner and the foveation embedded DCT domain video transcoder, and leave through buffers (BUF) to conferee #1, #2, ..., #N.
H. R. Sheikh, S. Liu, Z. Wang and A. C. Bovik, “Foveated Multipoint Videoconferencing at Low Bit Rates”, ICASSP 2002, accepted.
14
Simulation Results
Figure: Uniform resolution video at 256 kb/s and foveated video at 256 kb/s. The foveation point is at the center of the upper-left quadrant.
15
Foveation Point Selection
• Interactive Methods
– Mouse, eye tracker
– Reverse channel is assumed
– End-to-end delay is assumed short enough
• Automatic Methods
– Fixation points analysis (very challenging)
– Application oriented methods
• DCT-Domain Human Face Detection [Wang & Chang, 97]
– Skin color region segmentation
– Face template constraint
– Spatial verification