Automatic Caption Localization in Compressed Video
By Yu Zhong, Hongjiang Zhang, and Anil K. Jain, Fellow, IEEE
IEEE Transactions on Pattern Analysis and Machine Intelligence
Vol 22, No. 4, April 2000
Introduction
Caption text on video
General methods for caption extraction
Proposed method
How it works
Evaluation
Caption Text on Video
Parsing, indexing, and abstracting of video
Caption text
  Carries information about the video
  Describes the content
  Captures “highlights”
General Extraction Methods
Component-based
  Geometrical arrangement
  Homogeneous color
Texture-based
  Contrast with the background
  Horizontal intensity variation
Most published methods are applied to uncompressed images
Digital video and images are compressed (MPEG and JPEG)
  DCT (Discrete Cosine Transform) coding
  Reducing interframe redundancy (for MPEG)
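To make the DCT coding bullet concrete, here is a minimal sketch of the 8x8 block DCT that JPEG and MPEG intra coding are built on. It is an illustration only: a real decoder would read quantized coefficients from the bitstream rather than recompute them from pixels, and quantization is left out. The function names `dct_matrix` and `blockwise_dct` are made up for this example.

```python
import numpy as np

BLOCK = 8  # block size used by JPEG and MPEG intra coding


def dct_matrix(n=BLOCK):
    """Orthonormal DCT-II basis C, so that coeffs = C @ block @ C.T."""
    k = np.arange(n)[:, None]  # frequency index
    x = np.arange(n)[None, :]  # sample index
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)  # DC row has its own normalization
    return c


def blockwise_dct(image):
    """Return DCT coefficients with shape (H/8, W/8, 8, 8).

    Index order is [block_row, block_col, u, v], where u is the
    vertical frequency and v is the horizontal frequency.
    """
    h, w = image.shape
    if h % BLOCK or w % BLOCK:
        raise ValueError("image sides must be multiples of 8")
    c = dct_matrix()
    # Split the image into non-overlapping 8x8 blocks: [i, j, x, y].
    blocks = image.astype(float).reshape(h // BLOCK, BLOCK, w // BLOCK, BLOCK)
    blocks = blocks.swapaxes(1, 2)
    # coeffs[i, j] = C @ blocks[i, j] @ C.T for every block at once.
    return np.einsum("ux,ijxy,vy->ijuv", c, blocks, c)


# Vertical stripes: intensity changes only along the horizontal axis,
# so all the AC energy ends up in the first coefficient row (u = 0).
stripes = np.tile(np.array([0.0, 255.0] * 8), (16, 1))
stripe_coeffs = blockwise_dct(stripes)
```

With this index convention, horizontal intensity variation shows up in the coefficients `[0, v]` and vertical variation in the coefficients `[u, 0]`.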
Proposed Method
Steps 1 & 2: Detecting blocks
Step 3: Refinement
Step 4: Segmentation of rows
[Figure panels: Source frame, Step 1, Step 2, Step 3, Step 4]
Steps 1 & 2: Detecting Blocks of High Horizontal Spatial Intensity Variation
Operates in the DCT domain
  Not necessary to decompress
  Unit: 8x8 blocks in I-frames (intracoded)
Quantized DCT coefficients
  Readily extracted
  Fast
DCT blocks with high horizontal intensity variation
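The block detection above can be sketched as a per-block energy measure followed by a threshold. This is a hedged illustration, not the authors' exact procedure: the coefficient range `v_lo..v_hi`, the adaptive threshold `factor * mean`, and the function names are all assumptions chosen for the example. The input is assumed to be an array of DCT coefficients indexed `[block_row, block_col, u, v]`, with `v` the horizontal frequency.

```python
import numpy as np


def horizontal_text_energy(coeffs, v_lo=2, v_hi=6):
    """Sum of |c[0, v]| for v_lo <= v <= v_hi, for every 8x8 block.

    c[0, v] are the purely horizontal AC coefficients; the band
    limits are illustrative parameters, not values from the source.
    """
    return np.abs(coeffs[:, :, 0, v_lo:v_hi + 1]).sum(axis=2)


def detect_candidate_blocks(coeffs, factor=1.45, v_lo=2, v_hi=6):
    """Mark blocks whose horizontal energy exceeds factor * frame mean.

    The adaptive threshold is an assumption made for this sketch.
    """
    energy = horizontal_text_energy(coeffs, v_lo, v_hi)
    return energy > factor * energy.mean()


# Synthetic 4x6 grid of blocks: two blocks with strong horizontal
# harmonics and one block with only vertical energy.
demo = np.zeros((4, 6, 8, 8))
demo[1, 2, 0, 3] = -50.0   # horizontal AC, counted
demo[1, 3, 0, 2] = 40.0    # horizontal AC, counted
demo[3, 0, 5, 0] = 100.0   # vertical AC, ignored by this step
candidates = detect_candidate_blocks(demo)
```

Because only a handful of coefficients per block are read and no inverse DCT is run, the cost per frame stays small, which matches the "fast" claim on the slide.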
Step 3: Remove Noise by Applying Morphological Operations
Steps 1 & 2
  Pick up high-contrast nontext blocks
  Leave text blocks disconnected (wide spacing, low contrast, large fonts)
Step 3
  Removes most isolated blocks
  Merges nearby blocks
Applying morphological operations
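The refinement step can be sketched as a morphological closing (to merge nearby blocks) followed by an opening (to drop isolated blocks) on the binary block mask. The 1x3 horizontal structuring element and the order of the two operations are assumptions for this illustration; the slides only say that morphological operations are applied. All function names are made up for the example.

```python
import numpy as np


def _shifted_stack(mask, width):
    """Stack horizontally shifted copies of mask (zero padded)."""
    r = width // 2
    padded = np.pad(mask, ((0, 0), (r, r)), constant_values=False)
    return np.stack([padded[:, s:s + mask.shape[1]] for s in range(width)])


def dilate_h(mask, width=3):
    """Binary dilation with a 1 x width horizontal element (width odd)."""
    return _shifted_stack(mask, width).any(axis=0)


def erode_h(mask, width=3):
    """Binary erosion with a 1 x width horizontal element (width odd)."""
    return _shifted_stack(mask, width).all(axis=0)


def refine_block_mask(mask, width=3):
    """Closing then opening along block rows."""
    mask = np.asarray(mask, dtype=bool)
    # Work on a zero-padded copy so frame borders behave like background.
    work = np.pad(mask, ((0, 0), (width, width)), constant_values=False)
    closed = erode_h(dilate_h(work, width), width)     # fill one-block gaps
    opened = dilate_h(erode_h(closed, width), width)   # drop short runs
    return opened[:, width:-width]


noisy = np.array([
    [0, 1, 1, 0, 1, 1, 0, 0, 0, 0],   # text row broken by a gap
    [0, 0, 0, 0, 0, 0, 0, 1, 0, 0],   # isolated high-contrast block
], dtype=bool)
refined = refine_block_mask(noisy)
```

In this toy mask the gap in the first row is filled and the lone block in the second row is removed, which is the behavior the slide describes.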
Step 4: Segmentation Based on Vertical Intensity Variation
Detected text regions
  Large vertical intensity variation
  Local vertical harmonics
Corresponding row of text
  High vertical spectrum energy
After horizontal/vertical text energy test
Dilating the previous result by one block
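A rough sketch of this step: compute a vertical energy per block, keep a candidate region only if one of its block rows has high vertical energy, then grow the surviving mask by one block. The coefficient range, the use of a per-row mean, the fixed `row_threshold`, 4-connectivity, and the 3x3 dilation are all assumptions made for the illustration, and the function names are hypothetical.

```python
import numpy as np
from collections import deque


def vertical_text_energy(coeffs, u_lo=1, u_hi=6):
    """Sum of |c[u, 0]| for u_lo <= u <= u_hi, for every 8x8 block."""
    return np.abs(coeffs[:, :, u_lo:u_hi + 1, 0]).sum(axis=2)


def label_regions(mask):
    """4-connected component labelling by breadth-first search."""
    rows, cols = mask.shape
    labels = np.zeros(mask.shape, dtype=int)
    count = 0
    for i in range(rows):
        for j in range(cols):
            if not mask[i, j] or labels[i, j]:
                continue
            count += 1
            labels[i, j] = count
            queue = deque([(i, j)])
            while queue:
                a, b = queue.popleft()
                for y, x in ((a + 1, b), (a - 1, b), (a, b + 1), (a, b - 1)):
                    if (0 <= y < rows and 0 <= x < cols
                            and mask[y, x] and not labels[y, x]):
                        labels[y, x] = count
                        queue.append((y, x))
    return labels, count


def keep_text_regions(mask, v_energy, row_threshold):
    """Keep regions that contain a block row with high mean vertical energy."""
    mask = np.asarray(mask, dtype=bool)
    labels, count = label_regions(mask)
    keep = np.zeros(mask.shape, dtype=bool)
    for k in range(1, count + 1):
        region = labels == k
        row_count = region.sum(axis=1)
        row_mean = (v_energy * region).sum(axis=1) / np.maximum(row_count, 1)
        if (row_mean > row_threshold).any():
            keep |= region
    return keep


def dilate_one_block(mask):
    """Grow the mask by one block in all eight directions."""
    mask = np.asarray(mask, dtype=bool)
    padded = np.pad(mask, 1, constant_values=False)
    out = np.zeros(mask.shape, dtype=bool)
    for dy in range(3):
        for dx in range(3):
            out |= padded[dy:dy + mask.shape[0], dx:dx + mask.shape[1]]
    return out


regions = np.zeros((5, 6), dtype=bool)
regions[1, 1:4] = True          # region with strong vertical harmonics
regions[3, 4:6] = True          # region with weak vertical harmonics
energy = np.where(regions, 1.0, 0.0)
energy[1, 1:4] = 10.0
kept = keep_text_regions(regions, energy, row_threshold=5.0)
grown = dilate_one_block(kept)
```

The final dilation gives the detected caption a one-block margin, so character strokes that fall just outside the high-energy blocks are still covered.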
Evaluation
Does not work properly with:
  Very big characters
  Too widely spaced text
  Image texture
Evaluation
Commonly used captions
  NOT very big characters
  NOT too widely spaced text
  NOT image texture
Therefore, the important information is retrieved!
Evaluation
Future work
  Extend to other transform-based compression schemes
  Also use color information to improve accuracy
  Combine DCT blocks to support larger fonts
  Find a solution for P- and B-frames
Summary
Proposed caption localization method
  For compressed video
  Fast
Further development is needed to:
  Improve accuracy
  Support other compression methods