
1

Motivation

• Video Communication over Heterogeneous Networks
– Diverse client devices
– Various network connection bandwidths

• Limitations of Scalable Video Coding Schemes
– Limited layers supported
– No video format changes

• Video Transcoding Provides Dynamic Solutions
– Channel bandwidth adaptation
– Video coding format adaptation

2

Challenges in Video Transcoding

• Improve Efficiency of Video Transcoding
– Large data volume
– High computational complexity

• Optimize Visual Quality for a Given Bit Rate
– Human vision system (HVS) based video transcoding is desirable

[Figure: Video Transcoder. Input compressed video stream → Decoding (partially) → Video manipulation → Entropy encoding → Output compressed video stream]

3

Proposed Solutions

• Exploit Foveation Property of the HVS in Video Transcoding

• Develop Fast Algorithms for Video Transcoding
– DCT-domain foveation filtering technique
– Fast algorithms for DCT-domain inverse motion compensation
   • Local bandwidth constrained DCT-domain inverse motion compensation
   • Look-up-table based DCT-domain inverse motion compensation

[Figure: Foveation Embedded Video Transcoder. Uniform resolution compressed video → Decoding (partially) → Video manipulation & foveation → Video re-encoding → Foveated video stream]

4

Foveation

• The Human Eye Samples the Visual Field Non-uniformly
– The highest sampling resolution is at the fovea
– The sampling resolution decreases rapidly with distance from the fovea

• Retinal Images are Inherently Non-uniform in Spatial Resolution

[Figure: Cells per degree vs. eccentricity (deg), left eye]

5

Foveation Modelling

• Foveated Contrast Threshold [Geisler & Perry 98]

$$CT(f, e) = CT_0 \exp\left(a f \frac{e + e_2}{e_2}\right)$$

– f: spatial frequency (cyc/degree)
– e: retinal eccentricity (degree)
– a: spatial frequency decay constant
– e_2: half-resolution eccentricity
– CT_0: minimum contrast threshold
– CT: contrast threshold

Parameter values: a = 0.106, e_2 = 2.3, CT_0 = 1/64, 1/76

• Foveated Cut-off Frequency f_c

$$f_c(e) = \frac{e_2 \ln(1 / CT_0)}{a (e + e_2)}$$

• Spatial Frequencies Beyond the Cut-off Frequency are Invisible (Foveated Image)

[Figure: Local cut-off frequency (cyc/deg) vs. pixel position relative to foveation point (unit: pixel). Image size: 512 x 512. Unit of v: image height]
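The cut-off frequency follows from setting the contrast threshold to its maximum value, CT(f_c, e) = 1, and solving for f. A minimal Python sketch of the model, under stated assumptions: the constants are the commonly quoted Geisler & Perry fit, the function names are hypothetical, and the pixel-to-eccentricity conversion assumes a viewing distance measured in image heights.

```python
import math

# Assumed parameter values (illustrative, not confirmed by the slide).
A = 0.106        # spatial frequency decay constant
E2 = 2.3         # half-resolution eccentricity (degrees)
CT0 = 1.0 / 64   # minimum contrast threshold


def eccentricity_deg(dist_px, image_height_px=512, viewing_distance=3.0):
    """Eccentricity (degrees) of a pixel dist_px away from the foveation point.

    viewing_distance is in units of image height (assumed convention).
    """
    return math.degrees(math.atan(dist_px / (image_height_px * viewing_distance)))


def contrast_threshold(f, e):
    """CT(f, e) = CT0 * exp(a * f * (e + e2) / e2)."""
    return CT0 * math.exp(A * f * (e + E2) / E2)


def cutoff_frequency(e):
    """Frequency (cyc/degree) at which CT(f, e) reaches 1."""
    return E2 * math.log(1.0 / CT0) / (A * (e + E2))


if __name__ == "__main__":
    for d in (0, 64, 128, 256):
        e = eccentricity_deg(d)
        print(d, round(e, 2), round(cutoff_frequency(e), 2))
```

With these definitions the cut-off frequency at e = e_2 is exactly half its value at the foveation point, which is what "half-resolution eccentricity" means.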

6

Foveated Images

[Figure: JPEG-coded uniform image (168 KB) and JPEG-coded foveated image (136 KB)]

Foveation point is marked by ‘X’

7

Foveated Contrast Sensitivity Function (FCSF)

• Foveated Contrast Sensitivity Function (FCSF)

$$FCSF(f, e) = \frac{1}{CT(f, e)} = \frac{1}{CT_0} \exp\left(-a f \frac{e + e_2}{e_2}\right)$$

• Shape the Compression Distortion According to FCSF

[Figure: Normalized contrast sensitivity of human eye vs. distance from foveation point (unit: pixel). Image size: 512 x 512. Viewing distance: 3 times the image height]
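As a rough illustration of how the FCSF could serve as a weighting profile, the Python sketch below evaluates it as a function of pixel distance from the foveation point. The constants, the function names, and the choice to normalize by the value at the foveation point are assumptions made for this example, not details taken from the slide.

```python
import math

# Assumed parameter values (illustrative only).
A, E2, CT0 = 0.106, 2.3, 1.0 / 64


def fcsf(f, e):
    """FCSF(f, e) = (1 / CT0) * exp(-a * f * (e + e2) / e2)."""
    return (1.0 / CT0) * math.exp(-A * f * (e + E2) / E2)


def normalized_sensitivity(f, dist_px, image_height_px=512, viewing_distance=3.0):
    """Sensitivity at dist_px from the foveation point, relative to the fovea.

    viewing_distance is in units of image height (assumed convention).
    """
    e = math.degrees(math.atan(dist_px / (image_height_px * viewing_distance)))
    return fcsf(f, e) / fcsf(f, 0.0)


if __name__ == "__main__":
    for d in (0, 100, 200, 300):
        print(d, round(normalized_sensitivity(8.0, d), 3))
```

A weight of this kind could scale the tolerated distortion per block, so that regions far from the foveation point receive fewer bits.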

8

Video Transcoding Architecture

• Open-Loop Video Transcoding
– Simple and fast
– Error drift

[Figure: Transcoding error propagation across an I P P P frame sequence]

[Figure: Open-loop transcoder. R_in → VLD → Requantization → VLC → R_out, with Delay and Bit Allocation Analysis blocks]

VLC: Variable Length Coding

VLD: Variable Length Decoding
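The requantization step of an open-loop transcoder can be sketched as follows. This is a simplified illustration assuming uniform quantizers with round-to-nearest; the function names and sample values are hypothetical, and real codecs use per-coefficient quantization matrices and dead zones.

```python
def quantize(coeff, step):
    """Uniform quantizer with round-half-up on the magnitude."""
    sign = -1 if coeff < 0 else 1
    return sign * int(abs(coeff) / step + 0.5)


def requantize(levels, q1, q2):
    """Map levels coded with step q1 to levels for a coarser step q2."""
    return [quantize(level * q1, q2) for level in levels]


def requantization_error(levels, q1, q2):
    """Difference between the original and the transcoded reconstruction."""
    new_levels = requantize(levels, q1, q2)
    return [old * q1 - new * q2 for old, new in zip(levels, new_levels)]


if __name__ == "__main__":
    levels = [12, -5, 3, 1, 0]
    print(requantize(levels, 4, 10))
    print(requantization_error(levels, 4, 10))
```

Because an open-loop transcoder never feeds this error back into the prediction loop, the error introduced in a reference frame accumulates in the P frames that depend on it, which is the drift shown on the slide.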

9

Drift Free Video Transcoders

• Cascaded Pixel Domain Video Transcoding
– Low efficiency
– Long delay

• Fast Pixel Domain Video Transcoding
– Save motion estimation, one frame memory and one IDCT operation

• Fast DCT-Domain Video Transcoding
– No IDCT-DCT operations; lower data volume
– DCT-domain inverse motion compensation is complex (research topic)

[Figure: Cascaded transcoder. R_in → Complete decoding (Q1) → Encoding (Q2) → R_out]

[Figure: Fast Pixel Domain Video Transcoder. Blocks: IQ1, Q2, IQ2, IDCT, MC, FM, DCT; input R_in, output R_out]

[Figure: Fast DCT Domain Video Transcoder. Blocks: IQ1, Q2, IQ2, DCT-domain inverse motion compensation; input R_in, output R_out]
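To show why inverse motion compensation can stay in the DCT domain, the sketch below uses the standard brute-force formulation: a predicted block that overlaps four reference blocks is a sum of row-selection and column-selection matrix products, and an orthonormal DCT turns each product into a product of DCTs. This is a baseline illustration, not the fast algorithms proposed in these slides; all names and the synthetic test data are hypothetical.

```python
import math

N = 8  # block size


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


# Orthonormal DCT-II matrix T, so that DCT(X) = T X T^T and T T^T = I.
T = [[math.sqrt((1.0 if k == 0 else 2.0) / N)
      * math.cos(math.pi * k * (2 * i + 1) / (2 * N)) for i in range(N)]
     for k in range(N)]
TT = transpose(T)


def dct2(x):
    return matmul(matmul(T, x), TT)


def shift(d):
    """Matrix S with (S X)[i][j] = X[i + d][j]; rows outside X become zero."""
    return [[1.0 if k == i + d else 0.0 for k in range(N)] for i in range(N)]


def imc_dct(dct_blocks, dy, dx):
    """DCT of the predicted block from the DCTs of its four neighbours.

    dct_blocks = (top-left, top-right, bottom-left, bottom-right);
    (dy, dx) is the offset of the predicted block inside the top-left block.
    """
    row_sel = (dct2(shift(dy)), dct2(shift(dy - N)))
    col_sel = (dct2(transpose(shift(dx))), dct2(transpose(shift(dx - N))))
    total = [[0.0] * N for _ in range(N)]
    for idx, block in enumerate(dct_blocks):
        term = matmul(matmul(row_sel[idx // 2], block), col_sel[idx % 2])
        total = [[t + u for t, u in zip(rt, ru)] for rt, ru in zip(total, term)]
    return total


def check(dy=3, dx=5):
    """Largest deviation from pixel-domain motion compensation (synthetic data)."""
    ref = [[(7 * y + 3 * x * x) % 23 for x in range(2 * N)] for y in range(2 * N)]
    blocks = [[row[c:c + N] for row in ref[r:r + N]] for r in (0, N) for c in (0, N)]
    predicted = [[ref[i + dy][j + dx] for j in range(N)] for i in range(N)]
    expected = dct2(predicted)
    actual = imc_dct([dct2(b) for b in blocks], dy, dx)
    return max(abs(e - a) for re_, ra in zip(expected, actual) for e, a in zip(re_, ra))
```

Calling `check()` returns the maximum absolute difference between the two routes, which should be at floating-point noise level. The cost of this direct form is several 8 x 8 matrix products per block, which is the complexity that faster DCT-domain methods aim to reduce.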

10

Foveation Embedded DCT Domain Video Transcoding

VLD: Variable Length Decoding
VLC: Variable Length Coding
FM: Frame Memory
Q: Quantization
IQ: Inverse Quantization
MVR: Motion Vector Refinement
IMC: Inverse Motion Compensation
MVs: Motion Vectors

[Figure: Foveation embedded DCT-domain transcoder. Decoder side: R_1(n) → VLD → IQ1, with FM and DCT-domain IMC. Encoder side: DCT-domain foveation → Q2 → VLC → R_2(n), with IQ2, FM, DCT-domain IMC and DCT-domain MVR. MVs pass from decoder to encoder; foveation point selection operates on the intra-coded frame]

11

Foveation Filtering

• Pixel Domain Foveation Filtering Technique [Lee, 99]

– High computational complexity

12

DCT-Domain Foveation Filtering

• DCT-Domain Block Mirror Filtering [Rao, 90]

$$\mathrm{DCT}(h * \tilde{f}) = \tilde{h} \cdot \hat{f}$$

– h: filter kernel
– f: 1-D signal
– f̃: block-mirrored f
– f̂: DCT of f

• Pros
– Significantly simplified
– Combine with inverse quantization
– Easy to parallelize

• Cons
– Blocking artifacts

[Figure: Block mirroring of a 1-D signal f into f̃, and the filter kernel h with h̃]

H. R. Sheikh, S. Liu, B. L. Evans and A. C. Bovik, “Real-Time Foveation Techniques for H.263 Video Encoding in Software”, ICASSP 2001.
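The property behind block mirror filtering can be checked numerically. The sketch below assumes a symmetric kernel and a half-sample mirror extension of each block, in which case filtering reduces to one multiplication per DCT-II coefficient. It is an illustration of that general property with hypothetical function names, not the authors' implementation, and reading h̃ as the per-coefficient gain is an assumption about the slide's notation.

```python
import math


def dct(f):
    """Unnormalized 1-D DCT-II."""
    n = len(f)
    return [sum(f[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i in range(n))
            for k in range(n)]


def idct(c):
    """Inverse of dct() above."""
    n = len(c)
    return [c[0] / n + (2.0 / n) * sum(c[k] * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                                      for k in range(1, n))
            for i in range(n)]


def kernel_gains(half, n):
    """Per-coefficient gains; half = [h0, h1, ..., hM] with h[-m] = h[m]."""
    return [half[0] + 2.0 * sum(half[m] * math.cos(math.pi * k * m / n)
                                for m in range(1, len(half)))
            for k in range(n)]


def filter_dct(coeffs, half):
    """Filter in the DCT domain: one multiplication per coefficient."""
    return [g * c for g, c in zip(kernel_gains(half, len(coeffs)), coeffs)]


def filter_spatial(f, half):
    """Reference: convolve the mirror-extended block in the pixel domain."""
    n = len(f)

    def mirrored(idx):
        j = idx % (2 * n)
        return f[j] if j < n else f[2 * n - 1 - j]

    m_max = len(half) - 1
    return [sum(half[abs(m)] * mirrored(i - m) for m in range(-m_max, m_max + 1))
            for i in range(n)]


if __name__ == "__main__":
    block = [3.0, 7.0, 1.0, 8.0, 2.0, 9.0, 4.0, 6.0]
    half = [0.5, 0.25]  # kernel [0.25, 0.5, 0.25]
    print(idct(filter_dct(dct(block), half)))
    print(filter_spatial(block, half))
```

Since the gains are plain per-coefficient multipliers, they can be folded into the inverse quantization step sizes, which is one way to read the "combine with inverse quantization" advantage. For foveation, the kernel (and hence the gains) would vary from block to block with the local cut-off frequency.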

13

Multipoint Video Conferencing

[Figure: Multipoint Control Unit (MCU) connected to the conferees over the Internet. Streams from conferee #1 ... #N → BUF → VLD → Video combiner → Foveation embedded DCT domain video transcoder → BUF → to conferee #1 ... #N]

H. R. Sheikh, S. Liu, Z. Wang and A. C. Bovik, “Foveated Multipoint Videoconferencing at Low Bit Rates”, ICASSP 2002, accepted.

14

Simulation Results

[Figure: Uniform resolution video at 256 kb/s (left) and foveated video at 256 kb/s (right)]

Foveation point is at the center of the upper-left quadrant

15

Foveation Point Selection

• Interactive Methods
– Mouse, eye tracker
– Reverse channel is assumed
– End-to-end delay is assumed short enough

• Automatic Methods
– Fixation points analysis (very challenging)
– Application oriented methods

• DCT-Domain Human Face Detection [Wang & Chang, 97]
– Skin color region segmentation
– Face template constraint
– Spatial verification
