Automatic Caption Localization in Compressed Video

By Yu Zhong, Hongjiang Zhang, and Anil K. Jain, Fellow, IEEE

IEEE Transactions on Pattern Analysis and Machine Intelligence

Vol. 22, No. 4, April 2000

Introduction

Caption text on video
General methods for caption extraction
Proposed method
How it works
Evaluation

Caption Text on Video

Parsing, indexing, and abstracting video

Caption text information in video
  Describes the content
  Catches “highlights”

General Extraction Methods

Component-based
  Geometrical arrangement
  Homogeneous color

Texture-based
  Contrast with the background
  Horizontal intensity variation

Most published methods
  Applied to uncompressed images

Digital video and images
  Compressed (MPEG & JPEG)
  DCT (Discrete Cosine Transform) coding
  Reducing interframe redundancy (for MPEG)

Proposed Method

Step 1 & 2: Detecting blocks
Step 3: Refinement
Step 4: Segmentation of rows

[Figure: source frame and the results of Step 1, Step 2, Step 3, and Step 4]

Step 1 & 2: Detecting Blocks of High Horizontal Spatial Intensity Variation

Operates in the DCT domain
  Not necessary to decompress
  Unit: 8x8 blocks in I-frames (intracoded)

Quantized DCT coefficients
  Readily extracted
  Fast

[Figure: DCT blocks with high horizontal intensity variation]
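
The block test on this slide can be sketched as follows. This is only an illustration under stated assumptions: each 8x8 block is given as a list of DCT coefficients `block[u][v]` with `u` the vertical and `v` the horizontal frequency, and the band limits `v_lo`, `v_hi` and the `threshold` are hypothetical values, not figures taken from the slides.

```python
def horizontal_energy(block, v_lo=2, v_hi=6):
    """Sum |c[0][v]| over a band of horizontal harmonics of one 8x8 DCT block.

    Assumption: block[u][v] has u = vertical frequency, v = horizontal frequency.
    The band limits are illustrative defaults, not values from the source.
    """
    return sum(abs(block[0][v]) for v in range(v_lo, v_hi + 1))


def candidate_mask(blocks, threshold):
    """Mark every block whose horizontal energy reaches the threshold.

    blocks is a 2D grid (rows x columns) of 8x8 coefficient blocks.
    """
    return [[1 if horizontal_energy(b) >= threshold else 0 for b in row]
            for row in blocks]


def empty_block():
    """An 8x8 block with all coefficients zero (a flat image patch)."""
    return [[0] * 8 for _ in range(8)]


# Three hand-made blocks: flat, strong horizontal variation, vertical variation only.
flat = empty_block()
texty = empty_block()
texty[0][3] = 40
texty[0][5] = -25
vertical_only = empty_block()
vertical_only[4][0] = 60

mask = candidate_mask([[flat, texty, vertical_only]], threshold=50)
print(mask)
```

Only the block with large horizontal harmonics is marked. The block whose energy sits in the vertical harmonics is ignored at this stage.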

Step 3: Remove Noise by Applying Morphological Operations

Steps 1 & 2
  Pick up high-contrast nontext blocks
  Leave text blocks disconnected (wide spacing, low contrast, large fonts)

Step 3
  Removes most isolated blocks
  Merges nearby blocks

[Figure: Applying morphological operations]
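
A minimal sketch of such a cleanup on the block mask, assuming a horizontal 1x3 structuring element and a closing followed by an opening. Both choices are assumptions made for illustration, since the slides only say that morphological operations are applied. Positions outside the frame are simply skipped.

```python
def dilate_row(row):
    """A block is set if it or a horizontal neighbor is set."""
    n = len(row)
    return [1 if any(row[j] for j in (i - 1, i, i + 1) if 0 <= j < n) else 0
            for i in range(n)]


def erode_row(row):
    """A block stays set only if it and its in-frame horizontal neighbors are set."""
    n = len(row)
    return [1 if all(row[j] for j in (i - 1, i, i + 1) if 0 <= j < n) else 0
            for i in range(n)]


def clean_mask(mask):
    """Closing (merge nearby blocks), then opening (drop isolated blocks), row by row."""
    cleaned = []
    for row in mask:
        closed = erode_row(dilate_row(row))
        opened = dilate_row(erode_row(closed))
        cleaned.append(opened)
    return cleaned


# One row of blocks: two text fragments split by a one-block gap, plus a stray block.
noisy = [[1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0]]
cleaned = clean_mask(noisy)
print(cleaned)  # [[1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]]
```

The closing fills the one-block gap between the two fragments, and the opening then removes the stray block that has no neighbors.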

Step 4: Segmentation Based on Vertical Intensity Variation

Detected text regions
  Large vertical intensity variation
  Local vertical harmonics

Corresponding row of text
  High vertical spectrum energy

[Figure: After horizontal/vertical text energy test]

[Figure: Dilating the previous result by one block]
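
The vertical energy test can be sketched in the same spirit. This is a simplified illustration rather than the authors' exact procedure: it assumes `block[u][v]` with `u` as the vertical frequency, averages the vertical energy over the marked blocks in each block row, and uses hypothetical band limits and a hypothetical threshold.

```python
def vertical_energy(block, u_lo=1, u_hi=6):
    """Sum |c[u][0]| over a band of vertical harmonics of one 8x8 DCT block.

    Assumption: block[u][v] has u = vertical frequency; band limits are illustrative.
    """
    return sum(abs(block[u][0]) for u in range(u_lo, u_hi + 1))


def text_rows(blocks, mask, threshold):
    """Return indices of block rows whose marked blocks have a mean
    vertical energy of at least the threshold."""
    rows = []
    for r, (block_row, mask_row) in enumerate(zip(blocks, mask)):
        energies = [vertical_energy(b) for b, m in zip(block_row, mask_row) if m]
        if energies and sum(energies) / len(energies) >= threshold:
            rows.append(r)
    return rows


def make_block(coeffs):
    """Build an 8x8 block from a {(u, v): value} dictionary; other entries are zero."""
    block = [[0] * 8 for _ in range(8)]
    for (u, v), value in coeffs.items():
        block[u][v] = value
    return block


# Row 0 has strong vertical harmonics (text-like); row 1 has none.
strong = make_block({(2, 0): 30, (3, 0): -20})
weak = make_block({(0, 3): 50})
grid = [[strong, strong], [weak, weak]]
region = [[1, 1], [1, 1]]

kept = text_rows(grid, region, threshold=40)
print(kept)  # [0]
```

Only the first block row passes, so the candidate region is cut down to the row that behaves like a line of text.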

Evaluation

Does not work properly with:
  Very big characters
  Too widely spaced text
  Image texture

Evaluation

Commonly used captions
  NOT very big characters
  NOT too widely spaced text
  NOT image texture

Therefore, the important information is retrieved!

Evaluation

Future work
  Extend to other transform-based compressions
  Also use color information to improve accuracy
  Combine DCT blocks to support larger fonts
  Find a solution for P- and B-frames

Summary

Proposed caption localization method
  For compressed video
  Fast

Further development is needed to improve:
  Accuracy
  Support for other compression methods
