
www.petamedia.eu

IRP of Special Interest Group 2 - Leader: TU Berlin
Tools for Tag Generation

Introduction

The aim of this integrative research project (IRP) is the generation of tags and metadata using signal processing and/or users’ annotation. The IRP covers algorithms for key frame extraction and video shot clustering that enable users to tag videos more easily.

Workflow diagram: Database (TUB) → Annotation (TUB, TUD) → Shot/subshot boundary detection (TUB) → Visual quality (EPFL) → Key framing (TUB) → Text detection / feature extraction (TUB) → Video clustering (TUB, EPFL, TUD, QMUL)

Integration activities

The IRP’s main topic is the integration of different expertise in the areas of image search engines, audio/video signal processing, machine learning and text detection/recognition.

Preparations

The first activity was to set up a database of videos from the unstructured channel “Travel” on YouTube.com using the “NUE YouTube Downloader”. This tool is also useful for other IRPs, e.g. “Social Media Acquisition”.

Tool used for setting up the database

This common database, consisting of 100 videos and affiliated metadata (keywords, comments, user information, etc.), was then annotated for shot boundaries.

Key Frame Extraction

Temporal video segmentation divides the video stream into a set of segments, from each of which one representative frame is extracted based on attention features. Key frame extraction is a simple yet effective way of summarizing a long video sequence and can be used for applications that only work on images, such as content-based image retrieval (CBIR) engines or image clustering algorithms. These key frames can also be used for automatic or manual tagging, because they facilitate users’ annotation.

Extracted key frames of a video sequence
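The segment-then-pick-a-representative scheme described above can be sketched as follows. The poster does not specify which attention features are used, so this toy version stands in grey-level histograms and a simple difference threshold (both illustrative assumptions):

```python
# Toy key frame extraction: frames are modelled as normalised grey-level
# histograms; a large histogram difference marks a segment boundary, and
# the temporal midpoint of each segment is kept as its key frame.

def hist_diff(h1, h2):
    """L1 distance between two normalised histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def extract_key_frames(histograms, threshold=0.5):
    """Return the index of one representative frame per detected segment."""
    boundaries = [0]
    for i in range(1, len(histograms)):
        if hist_diff(histograms[i - 1], histograms[i]) > threshold:
            boundaries.append(i)          # a new segment starts here
    boundaries.append(len(histograms))
    # representative frame = temporal midpoint of each segment
    return [(start + end - 1) // 2
            for start, end in zip(boundaries, boundaries[1:])]

# Two synthetic "shots": four dark frames followed by four bright frames.
dark, bright = [0.9, 0.1], [0.1, 0.9]
frames = [dark] * 4 + [bright] * 4
print(extract_key_frames(frames))  # → [1, 5], one key frame per shot
```

A real system would of course operate on decoded video frames and richer features; the structure of the loop, however, is the same.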

Tag Generation

A topic of this IRP is the generation of tags. The following aspects have been considered:

• Quality Tags: Key frames are used to produce quality tags by no-reference video quality assessment.

Tags derived by no-reference video quality assessment (e.g. “good quality”)

• Tags derived from Text:

Recognizing text within video sequences is also a possibility for generating tags.

Another possibility to produce tags is to identify persons and locations by analyzing the sentence structure of affiliated descriptions and user comments.

• Tags generated by concept detectors (like indoor/outdoor, face, etc.)
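As an illustration of the quality-tag idea from the first bullet, the sketch below assigns a coarse tag from a simple sharpness proxy. The actual no-reference metric used in the IRP is not named on the poster, so both the measure and the threshold here are assumptions:

```python
# Hedged sketch of no-reference quality tagging: sharpness is approximated
# by the mean absolute difference between horizontally adjacent pixels of a
# 2-D grey-level frame; low values suggest blur, i.e. lower quality.

def sharpness(frame):
    """Mean absolute horizontal gradient of a 2-D grey-level frame."""
    total, count = 0.0, 0
    for row in frame:
        for a, b in zip(row, row[1:]):
            total += abs(a - b)
            count += 1
    return total / count

def quality_tag(frame, threshold=10.0):
    """Map the sharpness score onto a coarse quality tag."""
    return "good quality" if sharpness(frame) >= threshold else "low quality"

sharp_frame   = [[0, 50, 0, 50]] * 4     # strong local contrast
blurred_frame = [[20, 22, 21, 23]] * 4   # almost flat

print(quality_tag(sharp_frame), quality_tag(blurred_frame))
```

Production no-reference assessment would use calibrated measures of blockiness, blur and noise rather than a single gradient statistic.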

Clustering

A fundamental step in this video summarization is to create a similarity matrix and organize key frames into a tree structure using an ant-tree clustering method.

Tree structuring of video frames by ant-tree clustering
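The similarity-matrix step might look like the following minimal sketch. The low-level features actually used are not specified on the poster, so plain feature vectors with an L1 distance are assumed here:

```python
# Minimal sketch of the first clustering step: building a similarity matrix
# from key-frame feature vectors (illustrative colour-histogram-like vectors;
# the IRP's real low-level features are not specified).

def l1(u, v):
    """L1 distance between two feature vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))

def similarity_matrix(features):
    """Pairwise similarity in [0, 1]; 1.0 means identical feature vectors."""
    n = len(features)
    max_d = max(l1(features[i], features[j])
                for i in range(n) for j in range(n)) or 1.0
    return [[1.0 - l1(features[i], features[j]) / max_d for j in range(n)]
            for i in range(n)]

feats = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]  # frames 0 and 1 are alike
S = similarity_matrix(feats)
print(round(S[0][1], 3), round(S[0][2], 3))  # → 0.875 0.0
```

A tree-building step (such as the ant-tree method named above) would then attach each key frame under its most similar node in this matrix.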

Low-level features and tags of key frames are clustered to find related video content, and visualized using the FastMap algorithm applied on distance matrices. Propagation of widely shared tags within compact clusters is to be studied.

Clustered key frames to perform similarity search
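The FastMap visualization step can be illustrated with a one-axis projection. This simplified sketch picks the pivot pair by exhaustive search rather than FastMap's original choose-distant-objects heuristic, but uses the standard cosine-law projection formula:

```python
# One-axis FastMap projection from a distance matrix: each object i is
# mapped to x_i = (d(a,i)^2 + d(a,b)^2 - d(b,i)^2) / (2 d(a,b)),
# where (a, b) is a pair of mutually distant pivot objects.

def fastmap_axis(D):
    """Project objects described by distance matrix D onto one line."""
    n = len(D)
    # pivot pair (a, b): here simply the two most distant objects
    a, b = max(((i, j) for i in range(n) for j in range(n)),
               key=lambda p: D[p[0]][p[1]])
    dab = D[a][b]
    return [(D[a][i] ** 2 + dab ** 2 - D[b][i] ** 2) / (2 * dab)
            for i in range(n)]

# distances between three points lying on a line at positions 0, 1 and 4
D = [[0, 1, 4],
     [1, 0, 3],
     [4, 3, 0]]
print(fastmap_axis(D))  # → [0.0, 1.0, 4.0], the 1-D layout is recovered
```

Applying the same formula to the residual distances yields further axes, which is how FastMap produces 2-D layouts for visualization.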

Future Work

• Automatic ROI Image Tagging

A solution for automatic image tagging can be achieved by object duplicate detection in static images or key frames. The goal of the detection is to propagate tags of objects from an annotated training set.

Illustration: a user annotates an object in one image (“What is this object?” → “Taj Mahal”); the tag is then propagated automatically to duplicates of that object in other images.

Tag propagation by object duplicate detection

• Subject Classification

Automatic subject tagging of video involves the assignment of a subject label to a video object or to a time point within a video object. The subject label reflects the semantic theme treated by the video; it reflects what the video is about rather than what is depicted in the visual channel.

• Semantic Key Frame Extraction

Semantic key frame extraction is the task of selecting one or more key frames to represent the intellectual content of a video or of a given segment of the video stream.

• Visual Reranking to improve Video Retrieval

Low-level visual features will be exploited for improving semantic-theme-based retrieval of videos indexed using speech recognition transcripts of their spoken content.
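The planned tag propagation by object duplicate detection could, in its simplest form, look like the nearest-neighbour sketch below. The feature vectors, the distance threshold, and the second landmark name are purely illustrative; a real system would match robust local descriptors rather than toy vectors:

```python
# Hedged sketch of tag propagation by object duplicate detection: an
# unlabelled key frame inherits the tag of its nearest annotated object in
# feature space, provided the match is close enough to count as a duplicate.

def l2(u, v):
    """Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

def propagate_tag(query, annotated, max_dist=0.5):
    """Return the tag of the closest training object, or None if too far."""
    tag, best = None, max_dist
    for features, label in annotated:
        d = l2(features, query)
        if d < best:
            tag, best = label, d
    return tag

# illustrative annotated training set (vectors and labels are made up)
training = [([0.9, 0.1, 0.2], "Taj Mahal"),
            ([0.1, 0.8, 0.3], "Brandenburg Gate")]
print(propagate_tag([0.85, 0.15, 0.25], training))  # → Taj Mahal
print(propagate_tag([0.5, 0.5, 0.9], training))     # no duplicate → None
```

The `None` case matters in practice: without a distance cut-off, every query would inherit some tag, however poor the match.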

Contact

Coordination: Pascal Kelm
Web: www.petamedia.eu
Email: {[email protected]}