ele 488 f06 ele 488 fall 2006 image processing and transmission (11-28 -06) digital video motion...

39
ELE 488 F06 ELE 488 Fall 2006 Image Processing and Transmission (11-28 -06) Digital Video •Motion Pictures •Broadcast Television •Digital Video 11/28

Upload: mercy-banks

Post on 27-Dec-2015

218 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: ELE 488 F06 ELE 488 Fall 2006 Image Processing and Transmission (11-28 -06) Digital Video Motion Pictures Broadcast Television Digital Video 11/28

ELE 488 F06

ELE 488 Fall 2006Image Processing and Transmission

(11-28 -06)

Digital Video

•Motion Pictures

•Broadcast Television

•Digital Video

11/28

Page 2: ELE 488 F06 ELE 488 Fall 2006 Image Processing and Transmission (11-28 -06) Digital Video Motion Pictures Broadcast Television Digital Video 11/28

ELE 488 F06

Motion Picture Television Digital Video

• Broadcast Television (analog)– why invent new technology? – movie at home, mass market– influence of movie on development

• Key Steps– convert pictures to electric signal– send electric signal – convert electric signal to picture

• Comparison with motion picture• High Definition Television - analog digital, compression

• Video telephone - analog predecessor• Video conference - travel cost, people cost• Cable (narrowcast), satellite, interactive, ...

Page 3: ELE 488 F06 ELE 488 Fall 2006 Image Processing and Transmission (11-28 -06) Digital Video Motion Pictures Broadcast Television Digital Video 11/28

ELE 488 F06

NTSC (National Television Systems Committee)

• 525 lines

– 2 dots less than 1/2000 of distance from eye are not separated (merge into one)

– Assume view at distance 4 times the screen height. No need to have more than 500 lines

– NTSC set 525 lines (475 active)

• Movies in 1940 has 4:3 aspect ratio (width to height)

• 25 or more pictures per second to see continuous motion

• 50 or more pictures per second to avoid flicker

– movies use 24 frames/sec, each shown twice

• 30 frames/sec with 2:1 interlace (60 even-odd fields/sec)

Page 4: ELE 488 F06 ELE 488 Fall 2006 Image Processing and Transmission (11-28 -06) Digital Video Motion Pictures Broadcast Television Digital Video 11/28

ELE 488 F06

Bandwidth of Broadcast Television

• Without interlace (progressive scan), 60 frames/sec– 500 lines alternating black and white gives 250 full cycles– each horizontal line has 250 x 4/3 ~ 350 full cycles– 60 (frames/sec) x 500 (line) x 350 = 10 MHz (video ONLY)

• With 2:1 interlace, 5 MHz for video

• FCC assigns 6 MHz per broadcast channel– real usable bandwidth is less, MUCH less– actual resolvable lines per vertical height ~250

• Color insertion - must compatible with B/W receiver– Change R-G-B to Y-Cb-Cr – Y is luminance (brightness), Cb and Cr are chrominances– B/W sets converts Y to picture, color sets converts Y-Cb-Cr to

R-G-B then display

Page 5: ELE 488 F06 ELE 488 Fall 2006 Image Processing and Transmission (11-28 -06) Digital Video Motion Pictures Broadcast Television Digital Video 11/28

ELE 488 F06

Digital Video

• What drives digital video?– Information technology:

electronics, communication, storage, new functionality, …– HDTV

• R-G-B component video– 640 x 480 (pixel) x 3 (color) x 8 (bits/color) x 30 = 221 Mb/sec

• Y-Cb-Cr with subsampled Cb and Cr– 640 x 480 (pixel) x 1.5 (color) x 8 (bits/color) x 30 = 110 Mb/sec

• Compression - MPEG (motion picture expert group)– MPEG-1: CD-ROM, 1.5Mb/sec, 1.2Mb/sec for video,

352x240 (CIF), progressive scan, motion compensation– MPEG-2: extension of MPEG-1, interlace, HD– MPEG-4: object/region based– H.2xx

Page 6: ELE 488 F06 ELE 488 Fall 2006 Image Processing and Transmission (11-28 -06) Digital Video Motion Pictures Broadcast Television Digital Video 11/28

ELE 488 F06

Video Coding

• Video consists of frames In(i,j)– Code each frame as a still picture – motion JPEG

• Each frame is close to the previous frame– Code the difference FDn(i,j) = In(i,j) – In-1(i,j)

– Differential coding (DPCM, predictive coding)

( In-1(i,j) is the predicted value of In(i,j) )

– Need to code the first frame

Page 7: ELE 488 F06 ELE 488 Fall 2006 Image Processing and Transmission (11-28 -06) Digital Video Motion Pictures Broadcast Television Digital Video 11/28

ELE 488 F06

Encoding Three Frame Types

Differential encoding of video I – Intra Frame, code by itselfP – Prediction Frame, code by referring to previous I or P frameB – Bi-direction Frame, code by referring to forward AND backward I or P frames

I

BP PP P

B B B B B B B B B

I

Page 8: ELE 488 F06 ELE 488 Fall 2006 Image Processing and Transmission (11-28 -06) Digital Video Motion Pictures Broadcast Television Digital Video 11/28

ELE 488 F06

Coding of I-frame – same as still image

Page 9: ELE 488 F06 ELE 488 Fall 2006 Image Processing and Transmission (11-28 -06) Digital Video Motion Pictures Broadcast Television Digital Video 11/28

ELE 488 F06

I Frames

• I frames are Intra-coded using the JPEG coded• I frames can be decoded without reference to other

frames of the video.• Sometimes called anchor frames

I frame: JPEG

Frame 31

A group of pictures (GOP) begins with an I-frame and ends before the next I-frame

A typical GOP length is 15 framesWith only 1 I-frame per GOP (the first frame)

Page 10: ELE 488 F06 ELE 488 Fall 2006 Image Processing and Transmission (11-28 -06) Digital Video Motion Pictures Broadcast Television Digital Video 11/28

ELE 488 F06

Coding P Frames

• Each frame is close to the previous frame– Code frame difference (differential coding – DPCM)

• Occlusion – parts of current frame is blocked in previous frame

– need future frame to “predict” FDn(i,j) = In(i,j) – In+1(i,j)

current frame In frame difference In - In-1

Page 11: ELE 488 F06 ELE 488 Fall 2006 Image Processing and Transmission (11-28 -06) Digital Video Motion Pictures Broadcast Television Digital Video 11/28

ELE 488 F06

Coding P Frames

• Each frame is close to the previous frame– Code frame difference (differential coding – DPCM)

current frame In frame difference In - In-1

Page 12: ELE 488 F06 ELE 488 Fall 2006 Image Processing and Transmission (11-28 -06) Digital Video Motion Pictures Broadcast Television Digital Video 11/28

ELE 488 F06

Coding of P Frames

• Video consists of frames In(i,j)– Code each frame as a still picture – motion JPEG

• Each frame is close to the previous frame– Code the difference FDn(i,j) = In(i,j) – In-1(i,j)

– Differential coding (predictive coding)

– In-1(i,j) is the predicted value of In(i,j)

• Observe: – Most part of frame is unchanged– Except for moving objects– Motion Compensated Coding MPEG

Page 13: ELE 488 F06 ELE 488 Fall 2006 Image Processing and Transmission (11-28 -06) Digital Video Motion Pictures Broadcast Television Digital Video 11/28

ELE 488 F06

Motion Compensated Video Coding

Observe: Most of picture remains unchanged

But some objects have moved.

So code Displaced Frame Difference

Motion Compensated Coding

previous frame current frame

Page 14: ELE 488 F06 ELE 488 Fall 2006 Image Processing and Transmission (11-28 -06) Digital Video Motion Pictures Broadcast Television Digital Video 11/28

ELE 488 F06

Displaced Frame Difference

Page 15: ELE 488 F06 ELE 488 Fall 2006 Image Processing and Transmission (11-28 -06) Digital Video Motion Pictures Broadcast Television Digital Video 11/28

ELE 488 F06

Displaced Frame Difference

Page 16: ELE 488 F06 ELE 488 Fall 2006 Image Processing and Transmission (11-28 -06) Digital Video Motion Pictures Broadcast Television Digital Video 11/28

ELE 488 F06

Page 17: ELE 488 F06 ELE 488 Fall 2006 Image Processing and Transmission (11-28 -06) Digital Video Motion Pictures Broadcast Television Digital Video 11/28

ELE 488 F06

P Frames

I frame: JPEG P frame: motion compensated. macro-blocks and macro-blockmotion vectors are indicated

Frame 31 Frame 34

P frames are coded using two methods: - block motion compensation + error coding - jpeg (intra-coded), without referring to previous frames P frames are also anchor frames

Divide P-frame into Macro-blocksMB ~16x16

Page 18: ELE 488 F06 ELE 488 Fall 2006 Image Processing and Transmission (11-28 -06) Digital Video Motion Pictures Broadcast Television Digital Video 11/28

ELE 488 F06

Finding Motion Vectors

• Matching a block from current frame with a displaced block in reference frame using: (a) sum of squared difference (SSD), or

(b) sum of absolute difference (SAD) (almost always used)• The displacement giving best match is the motion vector of the block • Search methods:

• Global search over the entire anchor frame• Restricted search over local neighborhood• Fast search – over a selected neighborhood,

anchor frame current frame

Page 19: ELE 488 F06 ELE 488 Fall 2006 Image Processing and Transmission (11-28 -06) Digital Video Motion Pictures Broadcast Television Digital Video 11/28

ELE 488 F06

Illustration: P-frame Macro-Blocks

Frame 34 P-frame

Page 20: ELE 488 F06 ELE 488 Fall 2006 Image Processing and Transmission (11-28 -06) Digital Video Motion Pictures Broadcast Television Digital Video 11/28

ELE 488 F06

MPEG: I and P frames (anchor frames)

Page 21: ELE 488 F06 ELE 488 Fall 2006 Image Processing and Transmission (11-28 -06) Digital Video Motion Pictures Broadcast Television Digital Video 11/28

ELE 488 F06

Block Matching Motion Estimation

current frame

Block for which motion vector to be determined

a position for comparison

previous frame

another position

Mean Absolute Diff erence:

MAD(k,l, x,y) = (1/ MN)

1M

0i

1N

0j

| I n(k+i,l+j ) – I n–1(k+x+i,l+y+j) |

Motion vector f or (k,l)th

block:

v(k,l) = arg min(x,y) MAD(k,l, x, y)

Blocks of size MxN

Page 22: ELE 488 F06 ELE 488 Fall 2006 Image Processing and Transmission (11-28 -06) Digital Video Motion Pictures Broadcast Television Digital Video 11/28

ELE 488 F06

Motion Compensated Encoding of P Frame

current frame

previous frame

Y

Page 23: ELE 488 F06 ELE 488 Fall 2006 Image Processing and Transmission (11-28 -06) Digital Video Motion Pictures Broadcast Television Digital Video 11/28

ELE 488 F06

Coding of P frame

reconstructed previous frame

Encoder contains decoder

Page 24: ELE 488 F06 ELE 488 Fall 2006 Image Processing and Transmission (11-28 -06) Digital Video Motion Pictures Broadcast Television Digital Video 11/28

ELE 488 F06

More Detail

Page 25: ELE 488 F06 ELE 488 Fall 2006 Image Processing and Transmission (11-28 -06) Digital Video Motion Pictures Broadcast Television Digital Video 11/28

ELE 488 F06

Need for Bi-directional Encoding

I

BP PP P

B B B B B B B B B

I

Page 26: ELE 488 F06 ELE 488 Fall 2006 Image Processing and Transmission (11-28 -06) Digital Video Motion Pictures Broadcast Television Digital Video 11/28

ELE 488 F06

Bidirectional Encoding

Page 27: ELE 488 F06 ELE 488 Fall 2006 Image Processing and Transmission (11-28 -06) Digital Video Motion Pictures Broadcast Television Digital Video 11/28

ELE 488 F06

Frame Transmit Order vs Viewing Order

View order

Decode order= transmit order

Page 28: ELE 488 F06 ELE 488 Fall 2006 Image Processing and Transmission (11-28 -06) Digital Video Motion Pictures Broadcast Television Digital Video 11/28

ELE 488 F06

B-frames

B-frames are coded in the same way as P-frames except that for each macro-block, search for the best matching block in both the preceding and succeeding anchor frames. Use the encoding that requires the fewest bits. Called bidirectional encoding.

Page 29: ELE 488 F06 ELE 488 Fall 2006 Image Processing and Transmission (11-28 -06) Digital Video Motion Pictures Broadcast Television Digital Video 11/28

ELE 488 F06

Block Matching Motion Estimation

current frame

Block for which motion vector to be determined

a position for comparison

previous frame

another position

Mean Absolute Diff erence:

MAD(k,l, x,y) = (1/ MN)

1M

0i

1N

0j

| I n(k+i,l+j ) – I n–1(k+x+i,l+y+j) |

Motion vector f or (k,l)th

block:

v(k,l) = arg min(x,y) MAD(k,l, x, y)

Blocks of size MxN

Page 30: ELE 488 F06 ELE 488 Fall 2006 Image Processing and Transmission (11-28 -06) Digital Video Motion Pictures Broadcast Television Digital Video 11/28

ELE 488 F06

Complexity of Exhaustive Block-Matching

• Assumptions

– Block size NxN and image size S=M1xM2– Search step size is 1 pixel ~ “integer-pixel accuracy”– Search range +/–R pixels both horizontally and vertically

• Computation complexity

– # Candidate matching blocks = (2R+1)2 – # Operations for computing MAD for one block ~ O(N2)– # Operations for MV estimation per block ~ O((2R+1)2 N2)– # Blocks = S / N2 – Total # operations for entire frame ~ O((2R+1)2 S)

• i.e., overall computation load is independent of block size!

• E.g., M=512, N=16, R=16, 30fps => On the order of 8.55 x 109 operations per second!– Was difficult for real time estimation, but possible with parallel

hardware

UM

CP

EN

EE

40

8G

Slid

es

(cre

ate

d b

y M

.Wu

& R

.Liu

© 2

00

2)

Page 31: ELE 488 F06 ELE 488 Fall 2006 Image Processing and Transmission (11-28 -06) Digital Video Motion Pictures Broadcast Television Digital Video 11/28

ELE 488 F06

Exhaustive Search: Cons and Pros

• Pros– Guaranteed optimality within search range and motion model

• Cons– Motion vectors are integer valued– High computation complexity

• On the order of [search-range-size * image-size] for 1-pixel step size

• How to improve accuracy?– Half pixel – significantly improvement– Quarter pixel – some improvement– Requires interpolation

• How to improve speed?– Fast search– Try to exclude unlikely candidates

UM

CP

EN

EE

40

8G

Slid

es

(cre

ate

d b

y M

.Wu

& R

.Liu

© 2

00

2)

Page 32: ELE 488 F06 ELE 488 Fall 2006 Image Processing and Transmission (11-28 -06) Digital Video Motion Pictures Broadcast Television Digital Video 11/28

ELE 488 F06

Half pixel resolution in matching

B

a b

cdp

Page 33: ELE 488 F06 ELE 488 Fall 2006 Image Processing and Transmission (11-28 -06) Digital Video Motion Pictures Broadcast Television Digital Video 11/28

ELE 488 F06

M24

M15 M14 M13

M16

M11

M12

M5 M4 M3

M17 M18 M19

-6 M6 M1 M2 +6

M7 M8 M9

dx

dy

Fast Algorithm: 3-Step Search

• Search candidates at 9 positions

• Reduce step-size after each iteration– Start with step size

approx. half of max. search range

motion vector {dx, dy} = {1, 6}

Total number of computations: 9 + 82 = 25 (3-step) (2R+1)2 = 169 (full search)

(Fig. from Ken Lam – HK Poly Univ. short course in summer’2001)

UM

CP

EN

EE

40

8G

Slid

es

(cre

ate

d b

y M

.Wu

& R

.Liu

© 2

00

2)

Page 34: ELE 488 F06 ELE 488 Fall 2006 Image Processing and Transmission (11-28 -06) Digital Video Motion Pictures Broadcast Television Digital Video 11/28

ELE 488 F06

Lowest resolution

Lower resolution

Original resolution

Hierarchical Block Matching• Problem with fast search at full resolution

– Small mis-alignment may give large displacement error esp. for texture and edge blocks

• Hierarchical (multi-resolution) block matching

– Match with coarse resolution to narrow down search range

– Match with high resolution to refine motion estimation

(From Wang’s Preprint Fig.6.19)

UM

CP

EN

EE

40

8G

Slid

es

(cre

ate

d b

y M

.Wu

& R

.Liu

© 2

00

2)

Page 35: ELE 488 F06 ELE 488 Fall 2006 Image Processing and Transmission (11-28 -06) Digital Video Motion Pictures Broadcast Television Digital Video 11/28

ELE 488 F06

Pixel Decimation

IEEE Trans. on Video Technology, April 1993, pp. 148- 157.

a block in current frame

part of a block in reference frame

Page 36: ELE 488 F06 ELE 488 Fall 2006 Image Processing and Transmission (11-28 -06) Digital Video Motion Pictures Broadcast Television Digital Video 11/28

ELE 488 F06

Pixel Decimation

Page 37: ELE 488 F06 ELE 488 Fall 2006 Image Processing and Transmission (11-28 -06) Digital Video Motion Pictures Broadcast Television Digital Video 11/28

ELE 488 F06

Subsampled Motion Field

Page 38: ELE 488 F06 ELE 488 Fall 2006 Image Processing and Transmission (11-28 -06) Digital Video Motion Pictures Broadcast Television Digital Video 11/28

ELE 488 F06

Subsampled Motion Field

Page 39: ELE 488 F06 ELE 488 Fall 2006 Image Processing and Transmission (11-28 -06) Digital Video Motion Pictures Broadcast Television Digital Video 11/28

ELE 488 F06

What else can you do with MPEG video?

• The MPEG encoder-decoder is asymmetric. – Encoder is much more complex than the decoder.

Determining motion vectors is a major task– Decoding is easy and fast. – The encoding only has to be done once, the decoding will

be done many times or at many locations.

• Symmetric application?

• Compression loses information. But– compressed video has information not readily available in

original video