cross-media intelligent searching in digital library


Page 1: Cross-media Intelligent Searching in Digital Library

Cross-media Intelligent Searching in Digital Library

Yueting Zhuang

Zhejiang University, China

Nov. 18, 2006, Egypt

Page 2: Cross-media Intelligent Searching in Digital Library

ICUDL06, YT Zhuang

Outline

1. CADAL: China digital library

2. Our vision for the next generation of digital library

3. From multimedia retrieval to cross-media retrieval

4. Retrieval of Chinese calligraphy characters: a cross-media practice

5. Building a personalized portal

6. Conclusion


Page 4: Cross-media Intelligent Searching in Digital Library

3rd Workshop 2004, CMU, USA

Page 5: Cross-media Intelligent Searching in Digital Library


ICUDL 2005, Zhejiang University, China

Page 6: Cross-media Intelligent Searching in Digital Library


Page 7: Cross-media Intelligent Searching in Digital Library

1. CADAL: China Digital Library

China-US One Million Book Digital Library Project

• a unique library resource for scholars, students, and citizens
• contains over one million scanned books
• a big step towards the goal of creating a universal, free-to-read digital library: make knowledge available on the web to anyone, anytime, anywhere

http://www.cadal.zju.edu.cn

Page 8: Cross-media Intelligent Searching in Digital Library


Page 9: Cross-media Intelligent Searching in Digital Library

As of today, CADAL has achieved:

• 1.023 million books have been digitized, including: degree dissertations, modern Chinese books, traditional cultural resources, English books
• supported multimedia resources: image, audio, video, 3D model, Chinese calligraphy
• about 200,000 clicks a day (http://www.cadal.zju.edu.cn)
• users spread over 70 countries and regions
• 16 scanning centers in China, occupying more than 2000 square meters

Page 10: Cross-media Intelligent Searching in Digital Library


Scanning books

Processing digitized books

Page 11: Cross-media Intelligent Searching in Digital Library

[Map labels: Chengdu, Changchun, Xi'an, Guangzhou, Beijing, Nanjing, Shanghai, Hangzhou, Wuhan]

Page 12: Cross-media Intelligent Searching in Digital Library


Users spread over 70 countries and regions

Page 13: Cross-media Intelligent Searching in Digital Library

Service structure of CADAL:

• CALIS Integration
• Unified Authentication
• Personal Portal
• Personal Service
• Unified Quick Search
• Advanced Search
• Knowledge Map
• Sign Language
• Movie Search
• Calligraphy Search
• Image Search
• Cultural Relics
• Illustration Search
• Bilingual Translation
• Help System
• Full-Text Search
• Metadata Harvesting
• Resource Location
• Access Control Policy
• User Management Logging

Page 14: Cross-media Intelligent Searching in Digital Library

Current services provided by CADAL:

(1) Metadata searching

• digital resources are classified into 8 classes according to publication time and type
• both unified and advanced search are provided for all resources

Page 15: Cross-media Intelligent Searching in Digital Library

(2) Unified search

Page 16: Cross-media Intelligent Searching in Digital Library

China Ancient

Choose the types of resources to search

Page 17: Cross-media Intelligent Searching in Digital Library

Search results contain each type of resource.

Page 18: Cross-media Intelligent Searching in Digital Library

(3) Advanced search

Users can choose the search scope, how results are combined, and the result style.

Secondary search, full texts, and detailed information are available on the result page.

Page 19: Cross-media Intelligent Searching in Digital Library

(4) Full-text search

Full-text search uses the text produced by OCR.
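A minimal sketch of full-text search over OCR output, assuming a simple in-memory inverted index. The slides do not describe CADAL's actual engine; the function names, page identifiers, and sample texts below are hypothetical.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to the set of page ids whose OCR text contains it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(page_id)
    return index


def search(index, query):
    """Return the ids of pages that contain every query word (boolean AND)."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical OCR output keyed by page id (illustration only).
ocr_pages = {
    "book1-p1": "Ancient Chinese calligraphy on bamboo slips",
    "book1-p2": "Modern Chinese books and dissertations",
    "book2-p1": "Calligraphy works written with a brush",
}
index = build_index(ocr_pages)
hits = search(index, "chinese calligraphy")
```

A production system would also need to cope with OCR errors and Chinese word segmentation, which this sketch ignores.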


Page 21: Cross-media Intelligent Searching in Digital Library

2. Our Vision for the Next Generation of Digital Library

What will the next generation of DL look like?

• support multimodal sources
• enable cross-media retrieval

Typical features of existing DLs:

• books are indexed by title, author, keywords…
• users query books by keyword input
• mostly only text information is returned
• multimodal data is not fully supported

Page 22: Cross-media Intelligent Searching in Digital Library

Extension of the concept of “Book”

The key to our vision of the next generation of digital library is the extension of the “book” concept.

• A book is regarded as not only the written symbols on paper, but also any type of multimedia “item”, such as:
  • a video clip
  • an audio clip
  • a painting
  • ……

Page 23: Cross-media Intelligent Searching in Digital Library

So in the next generation of DL, a “book” can be multimodal:

• scenery image
• Chinese calligraphy
• video fragment
• audio clips
• ……

feature analysis, knowledge mining

a general data representation for multimodal data

We can find a general data structure to represent multimodal “books”.

Page 24: Cross-media Intelligent Searching in Digital Library

Supporting multimodal data is an important trend in multimedia retrieval:

We get multimodal information from the real world; can we likewise get multimodal data from the digital world, especially from a digital library?

[Diagram: real world (multimodal) versus digital world (multimodal?): texts, image, audio, video, ……]

Page 25: Cross-media Intelligent Searching in Digital Library

Cross-media retrieval

After the extension of the “Book” concept, retrieval shall also be extended.

We call it “cross-media retrieval”.

Page 26: Cross-media Intelligent Searching in Digital Library

Scenario: a simple example of cross-media retrieval

A user can start a query from any type of media, and relevant multimedia data will be returned.

[Diagram: any of the following can be the starting query, and cross-media links lead to the others:
• “Giant Panda” image
• “Giant Panda” text
• “Giant Panda” audio
• textual description of the giant panda: the giant panda is a kind of bear which ……]

Page 27: Cross-media Intelligent Searching in Digital Library

Cross-media retrieval is a useful way to access multimodal data:

[Diagram: texts, image, audio, video, …… are each available from the others]

Cross-media retrieval can be regarded as a simulation of the real world, and it helps us get multimodal data in a more flexible and more informative way!

Page 28: Cross-media Intelligent Searching in Digital Library

What does cross-media retrieval need to do?

[Diagram components:
• user query interface: submit a query example (it can be an image, audio, or keywords…)
• cross-media search engine
• multimodal representation & index
• knowledge base
• raw data: texts, image, audio, video
• query results: texts, images, audios…]


Page 30: Cross-media Intelligent Searching in Digital Library

3. From Multimedia Retrieval to Cross-media Retrieval

(1) Image retrieval: content-based

Page 31: Cross-media Intelligent Searching in Digital Library

[Screenshot: searching images with a query example; relevance feedback with positive and negative examples]

Page 32: Cross-media Intelligent Searching in Digital Library

multimedia retrieval

(2) Image retrieval: text-based

[Screenshot: query text]

Page 33: Cross-media Intelligent Searching in Digital Library

(3) Motion retrieval

Given a query example of motion data, we can find similar motion data in the database.

Page 34: Cross-media Intelligent Searching in Digital Library

(4) Audio retrieval: content-based

System framework

[Diagram: the user submits an audio query example to the content-based audio search engine, which searches the audio repository and returns audio results. The user judges the results, and relevance feedback adjusts the feature weights and the query center.]

Page 35: Cross-media Intelligent Searching in Digital Library

Audio retrieval: key techniques

• extract auditory features in the compressed domain from audio clips
• apply fuzzy clustering to the auditory features
• represent audio clips with the cluster centers
• retrieve similar audio clips by cluster center matching
• introduce relevance feedback techniques
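The clustering and matching steps can be sketched as follows. This is only an illustration: plain k-means stands in for the fuzzy clustering named on the slide, the 2-D feature vectors are made up, and `kmeans` and `clip_distance` are hypothetical helper names.

```python
import math


def kmeans(points, k, iters=20):
    """Plain k-means with a deterministic start (the first k points are the initial centers)."""
    centers = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            groups[nearest].append(p)
        for c, group in enumerate(groups):
            if group:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(group) for col in zip(*group)]
    return centers


def clip_distance(centers_a, centers_b):
    """Average distance from each center of clip A to its closest center of clip B."""
    total = sum(min(math.dist(a, b) for b in centers_b) for a in centers_a)
    return total / len(centers_a)


# Made-up per-frame feature vectors for three clips (illustration only).
clips = {
    "query": [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)],
    "similar": [(0.2, 0.1), (0.1, 0.9), (5.1, 5.2), (4.9, 5.8)],
    "different": [(20.0, 20.0), (30.0, 30.0), (21.0, 20.0), (30.0, 31.0)],
}
centers = {name: kmeans(frames, k=2) for name, frames in clips.items()}
ranked = sorted(
    (name for name in clips if name != "query"),
    key=lambda name: clip_distance(centers["query"], centers[name]),
)
```

Representing a clip by a handful of centers keeps matching cheap, since only centers are compared rather than every frame.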

Page 36: Cross-media Intelligent Searching in Digital Library

Audio retrieval: an example

[Screenshot: query example, feature weights, relevance feedback, weight adjusting]
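One common way to realize the two feedback adjustments (moving the query center and re-weighting features) is sketched below. The slides do not give the formulas, so the Rocchio-style update, the inverse-spread weighting, and the parameter values are all assumptions.

```python
def move_query(query, positives, negatives, alpha=1.0, beta=0.75, gamma=0.25):
    """Rocchio-style update: pull the query toward positive examples, push it away from negative ones."""
    def mean(vectors, i):
        return sum(v[i] for v in vectors) / len(vectors) if vectors else 0.0

    return [
        alpha * q + beta * mean(positives, i) - gamma * mean(negatives, i)
        for i, q in enumerate(query)
    ]


def reweight(positives, eps=1e-6):
    """Give more weight to features on which the positive examples agree (small spread)."""
    weights = []
    for i in range(len(positives[0])):
        values = [p[i] for p in positives]
        mu = sum(values) / len(values)
        std = (sum((v - mu) ** 2 for v in values) / len(values)) ** 0.5
        weights.append(1.0 / (std + eps))
    total = sum(weights)
    return [w / total for w in weights]  # normalized so the weights sum to 1


# Made-up 2-D feature vectors judged by the user (illustration only).
query = [0.0, 0.0]
positives = [[1.0, 0.0], [1.0, 2.0]]
negatives = [[-1.0, 0.0]]
new_query = move_query(query, positives, negatives)
weights = reweight(positives)
```

Here the positives agree exactly on the first feature, so that feature receives almost all of the weight in the next round.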

Page 37: Cross-media Intelligent Searching in Digital Library

(5) Video retrieval: overview

Unlike text resources, video is unstructured:
• rich in visual content;
• poor in semantic understanding.

The challenging issues:
• summarization & structuring;
• video mining

Page 38: Cross-media Intelligent Searching in Digital Library

(5) Video retrieval: key techniques

Video structuring:
• construct a video table of contents (VTOC)
• make it physically structured

Video summarization:
• help the user quickly grasp the content of video clips
• support video browsing
• video encoding/compression

Page 39: Cross-media Intelligent Searching in Digital Library

Video structuring

[Diagram: the video stream is analyzed using temporal features and spatial features. Processing steps: shot boundary detection, key frame extraction, grouping, scene construction, concept clustering. Resulting hierarchy for the table of contents: video, scene, group, shot, key frame.]
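The first two structuring steps, shot boundary detection and key frame extraction, can be sketched with gray-level histograms. This is an assumption for illustration: the slides do not say which features or thresholds are used, and picking the middle frame of a shot is just one simple key-frame rule.

```python
def histogram(frame, bins=4, max_value=256):
    """Normalized gray-level histogram of a frame given as a flat list of pixel values."""
    counts = [0] * bins
    for pixel in frame:
        counts[pixel * bins // max_value] += 1
    return [c / len(frame) for c in counts]


def shot_boundaries(frames, threshold=0.5):
    """Return every index i at which a new shot starts (histogram jump above the threshold)."""
    hists = [histogram(f) for f in frames]
    cuts = []
    for i in range(1, len(hists)):
        diff = sum(abs(a - b) for a, b in zip(hists[i - 1], hists[i]))
        if diff > threshold:
            cuts.append(i)
    return cuts


def key_frames(frames, cuts):
    """Pick the middle frame of every shot as its key frame."""
    starts = [0] + cuts
    ends = cuts + [len(frames)]
    return [(s + e - 1) // 2 for s, e in zip(starts, ends)]


# Toy video: three dark frames followed by two bright frames (illustration only).
dark, bright = [10] * 16, [200] * 16
video = [dark, dark, dark, bright, bright]
cuts = shot_boundaries(video)
keys = key_frames(video, cuts)
```

Grouping and scene construction would then cluster the resulting shots, for example by comparing their key-frame histograms.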

Page 40: Cross-media Intelligent Searching in Digital Library

Video summary: video content mining

[Diagram: original video (redundant), video content mining, summarized video (concise and informative)]

Find meaningful patterns to support efficient video browsing.

Page 41: Cross-media Intelligent Searching in Digital Library

Video summary: an example

Two news videos are separated into 6 video shots (the following are the key frames), and their total length is 3 minutes.

Page 42: Cross-media Intelligent Searching in Digital Library

After video summarization, the video is 3 seconds long.

It consists of the 3 key frames below.

Page 43: Cross-media Intelligent Searching in Digital Library

Video shot clustering result

[Screenshot: video shots of the original video; similar video shots are clustered together]

Page 44: Cross-media Intelligent Searching in Digital Library


Video Retrieval

video browse

Page 45: Cross-media Intelligent Searching in Digital Library

[Screenshot: video browse with key frames and summary]

Page 46: Cross-media Intelligent Searching in Digital Library

(6) 3D model retrieval: overview

Compare 3D models by shape similarity.

Page 47: Cross-media Intelligent Searching in Digital Library

(6) 3D model retrieval: an example

[Screenshot: query example]

Page 48: Cross-media Intelligent Searching in Digital Library

As shown above, multimedia retrieval is generally content-based X retrieval (CBXR).

Page 49: Cross-media Intelligent Searching in Digital Library

Towards cross-media retrieval

Motivation

[Diagram: CBXR (image retrieval, audio retrieval, video retrieval, motion retrieval, 3D model retrieval, ……), intelligent integration, cross-media retrieval]

We can provide a more flexible and efficient way to access multimodal data.

We call it cross-media retrieval.

Page 50: Cross-media Intelligent Searching in Digital Library

Support multimodal sources:
• smooth integration of multimodal data;
• query media objects by examples of different modalities;

Challenging issues:
• texts, images, audios, etc. are represented with different features
• different features are heterogeneous
• cross-media similarity can’t be measured by content features
• there is a semantic gap between low-level features and semantics

Page 51: Cross-media Intelligent Searching in Digital Library

Our Solution to Cross-media Retrieval

• build cross-indexing from multimodal data
• organize multimedia documents
• explore cross-media correlations
• ……

Page 52: Cross-media Intelligent Searching in Digital Library

Cross-indexing-based retrieval: General idea

[Diagram components:
• input media: text, image, audio, video, graphics
• preprocessing
• per-media search engines: text, image, audio, video, graphics
• cross-index graph
• SVM based clustering
• cross-index multimodal search engine
• retrieval interface: query, results, relevance feedback
• search results fusion
• ……]

Page 53: Cross-media Intelligent Searching in Digital Library

(1) Cross-index retrieval: interface

The system now supports images, audios and videos. Users can submit any of these media objects, and the system returns relevant images, audios and videos.

[Screenshot: an image query example with retrieved images, retrieved video, and retrieved audio]

Page 54: Cross-media Intelligent Searching in Digital Library

Building multimedia document: General idea

Definition of multimedia document:
• a logical representation of multimodal data;
• consists of semantically related media objects;

Formal structure:

Document := <ID, Title, URI, KeywordList, ElementSet, LinkSet>

ElementSet := { (Audio | Image | Text | Video)_i | i ∈ N }

Audio := <ID, ParentID, URI, Size, KeywordList, AudioFeature>

Image := <ID, ParentID, URI, Size, KeywordList, ImageFeature>

Text := <ID, ParentID, URI, KeywordList>

Video := <ID, ParentID, URI, Frames, KeywordList, VideoFeature>
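The formal structure maps naturally onto record types. The sketch below is one possible rendering, not the authors' implementation: the field types, the representation of features as lists of floats, and the modeling of LinkSet as (from_id, to_id) pairs are assumptions, since the slides do not define them.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass
class Text:
    id: str
    parent_id: str  # ID of the owning Document
    uri: str
    keywords: List[str] = field(default_factory=list)


@dataclass
class Image:
    id: str
    parent_id: str
    uri: str
    size: int
    keywords: List[str] = field(default_factory=list)
    feature: List[float] = field(default_factory=list)  # ImageFeature (assumed vector)


@dataclass
class Audio:
    id: str
    parent_id: str
    uri: str
    size: int
    keywords: List[str] = field(default_factory=list)
    feature: List[float] = field(default_factory=list)  # AudioFeature (assumed vector)


@dataclass
class Video:
    id: str
    parent_id: str
    uri: str
    frames: int
    keywords: List[str] = field(default_factory=list)
    feature: List[float] = field(default_factory=list)  # VideoFeature (assumed vector)


Element = Union[Audio, Image, Text, Video]


@dataclass
class Document:
    id: str
    title: str
    uri: str
    keywords: List[str] = field(default_factory=list)
    elements: List[Element] = field(default_factory=list)  # ElementSet
    links: List[Tuple[str, str]] = field(default_factory=list)  # LinkSet, assumed (from_id, to_id)

    def add(self, element: Element) -> None:
        """Attach a media object, enforcing the ParentID back-reference."""
        if element.parent_id != self.id:
            raise ValueError("element belongs to another document")
        self.elements.append(element)

    def of_type(self, kind: type) -> List[Element]:
        """Return the media objects of one modality, e.g. all images."""
        return [e for e in self.elements if isinstance(e, kind)]


# Hypothetical example document (identifiers and URIs are made up).
doc = Document(id="d1", title="Giant Panda", uri="doc://d1", keywords=["panda"])
doc.add(Image(id="i1", parent_id="d1", uri="img://i1", size=2048, keywords=["panda"]))
doc.add(Text(id="t1", parent_id="d1", uri="txt://t1", keywords=["panda", "bamboo"]))
doc.links.append(("i1", "t1"))
```

Because every element carries its ParentID, a hit on a single media object can be traced back to the whole document.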

Page 55: Cross-media Intelligent Searching in Digital Library

Building multimedia document: framework

[Diagram components:
• input media: text, image, audio, video, graphics
• Preprocessing
• Storage Subsystem: multimedia documents, semantic skeleton base
• Learning and Relevance Feedback Subsystem
• Query Processor (multimedia document + media objects)
• keyword]

Page 56: Cross-media Intelligent Searching in Digital Library

Building multimedia document: retrieval interface

Besides keyword-based search, the user can perform a content-based search with a specific media object as the query example.

A multimedia document is visualized as its sketch, i.e. text, images and key-frame lists for videos.

[Screenshot: image, video, text, and multimedia document results. The left figure shows the relevant media data retrieved by the query “water”.]

Page 57: Cross-media Intelligent Searching in Digital Library

Exploring cross-media correlations: challenges

Challenges:
1. multimodal data reside in heterogeneous feature spaces
2. the semantic gap

[Diagram: Gap 1, the content gap, lies between the visual feature space and the auditory feature space. Gap 2, the semantic gap, lies between these feature spaces and high-level semantics: war, dog, bird, car, tiger.]

Page 58: Cross-media Intelligent Searching in Digital Library

Exploring cross-media correlations: solutions

Images and audios represent high-level semantics from different perspectives. If we can find the correlation between these perspectives, we can enable cross-media retrieval, using the correlations as a bridge.

[Diagram: correlations between images and audios of bird, explosion, tiger, dog, car]

Page 59: Cross-media Intelligent Searching in Digital Library

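Exploring cross-media correlations: mathematical realization

Canonical correlation analysis

Input: image feature matrix $X_{n \times p}$ and audio feature matrix $Y_{n \times q}$

$$
X = \begin{pmatrix}
x_{11} & x_{12} & \cdots & x_{1p} \\
x_{21} & x_{22} & \cdots & x_{2p} \\
\vdots & \vdots & & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{np}
\end{pmatrix},
\qquad
Y = \begin{pmatrix}
y_{11} & y_{12} & \cdots & y_{1q} \\
y_{21} & y_{22} & \cdots & y_{2q} \\
\vdots & \vdots & & \vdots \\
y_{n1} & y_{n2} & \cdots & y_{nq}
\end{pmatrix}
$$

X and Y are of different dimension!

Output:

$$
X' = \begin{pmatrix}
x'_{11} & x'_{12} & \cdots & x'_{1m} \\
x'_{21} & x'_{22} & \cdots & x'_{2m} \\
\vdots & \vdots & & \vdots \\
x'_{n1} & x'_{n2} & \cdots & x'_{nm}
\end{pmatrix},
\qquad
Y' = \begin{pmatrix}
y'_{11} & y'_{12} & \cdots & y'_{1m} \\
y'_{21} & y'_{22} & \cdots & y'_{2m} \\
\vdots & \vdots & & \vdots \\
y'_{n1} & y'_{n2} & \cdots & y'_{nm}
\end{pmatrix}
$$

X' and Y' are of the same dimension!

Basic idea: at the same time, the correlation between X and Y maximally coincides with the correlation between X' and Y'.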
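For reference, the standard CCA objective can be written as follows. The symbols $w_x$, $w_y$, $C_{xx}$, $C_{yy}$, $C_{xy}$, $W_x$, and $W_y$ are notation assumed here rather than taken from the slides, and $X$ and $Y$ are assumed to be column-centered.

```latex
% Sample covariance matrices of the centered feature matrices
C_{xx} = \tfrac{1}{n-1} X^{\top} X, \qquad
C_{yy} = \tfrac{1}{n-1} Y^{\top} Y, \qquad
C_{xy} = \tfrac{1}{n-1} X^{\top} Y

% First pair of canonical directions: maximize the correlation of the projections
\rho = \max_{w_x \in \mathbb{R}^{p},\; w_y \in \mathbb{R}^{q}}
\frac{w_x^{\top} C_{xy}\, w_y}
     {\sqrt{w_x^{\top} C_{xx}\, w_x}\;\sqrt{w_y^{\top} C_{yy}\, w_y}}

% Stacking the top m pairs of directions (m \le \min(p, q)) as columns
X' = X W_x, \quad W_x \in \mathbb{R}^{p \times m}, \qquad
Y' = Y W_y, \quad W_y \in \mathbb{R}^{q \times m}
```

After projection, an image row of $X'$ and an audio row of $Y'$ live in the same $m$-dimensional subspace, so they can be compared directly.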

Page 60: Cross-media Intelligent Searching in Digital Library

Exploring cross-media correlations: subsequent challenges

1. how to measure both intra- and inter-media correlations?
2. how to introduce new media objects into the system?

[Diagram: the correlation network in the subspace, with intra-media and cross-media links; testing data are located in the network]


Page 62: Cross-media Intelligent Searching in Digital Library

4. Retrieval of Chinese Calligraphy Character

Motivation: Original calligraphy works are unique. They exist on paper and bamboo slips, and are easily destroyed.

Page 63: Cross-media Intelligent Searching in Digital Library

How to search?

In our digital library, we digitize Chinese calligraphy works and design retrieval systems to make them sharable by everyone on the Internet.

Page 64: Cross-media Intelligent Searching in Digital Library

The objective:

1. to query similar characters

Similar characters can be found and returned to users. This is like traditional content-based image retrieval.

Page 65: Cross-media Intelligent Searching in Digital Library

2. to find out where a character comes from

We aim to provide an intelligent way to find the surrounding characters and present them to users.

Character “其” comes from this work

Page 66: Cross-media Intelligent Searching in Digital Library

System Overview

[Diagram: ancient books are digitized with a scanner into raw data; segmentation produces individual characters; feature extraction produces feature data; raw data and feature data are stored in the database, which the search engine queries]

Page 67: Cross-media Intelligent Searching in Digital Library

(1) Segmentation:
• noise elimination
• page-image analysis
• smoothing

(2) Retrieval:
• feature extraction
• shape matching
• speed up

Page 68: Cross-media Intelligent Searching in Digital Library

(1) Segmentation

We segment each page into columns, and cut the columns into individual characters, each within its minimum bounding box.

[Figure: minimum bounding box]
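A minimal sketch of this column-then-character cutting, assuming a clean binary page image (1 = ink, 0 = background) and simple projection profiles. The slides do not specify the segmentation algorithm, so `runs` and `segment` are hypothetical helpers, and real pages would first need the noise elimination and smoothing steps.

```python
def runs(profile):
    """Return (start, end) pairs of consecutive non-zero entries; end is exclusive."""
    spans, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment(page):
    """Return character boxes as (top, left, bottom, right); bottom and right are exclusive."""
    boxes = []
    # Vertical projection: blank pixel columns separate the text columns.
    col_profile = [sum(row[x] for row in page) for x in range(len(page[0]))]
    for left, right in runs(col_profile):
        # Horizontal projection inside one text column: blank rows separate characters.
        row_profile = [sum(row[left:right]) for row in page]
        for top, bottom in runs(row_profile):
            # Tighten the box horizontally to get the minimum bounding box.
            ink = [sum(page[y][x] for y in range(top, bottom)) for x in range(left, right)]
            tight = runs(ink)
            boxes.append((top, left + tight[0][0], bottom, left + tight[-1][1]))
    return boxes


# Toy 5x6 binary page with two text columns (illustration only).
page = [
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
]
boxes = segment(page)
```

Projection profiles fail when characters touch or columns are skewed, which is one reason page-image analysis is listed as a separate step.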

Page 69: Cross-media Intelligent Searching in Digital Library

(2) Retrieval of Chinese Calligraphy Characters

Feature extraction:

Calligraphy characters are written with a brush instead of a hard pen. The brush makes strokes vary in shape and thickness. Ancient calligraphy has also suffered much degradation from natural aging.

We use contour points to represent each calligraphy character, and keep the features of each individual character in the database.

Page 70: Cross-media Intelligent Searching in Digital Library

Shape matching:

• use polar coordinates to represent the characters:

Divide the directions into 8 equal bins, and divide each bin into 4 areas. Then count the points in every bin, as shown in the picture.
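The 8 x 4 binning can be sketched as below. The slide does not say what the polar origin is or how the 4 areas are spaced, so this sketch assumes the centroid of the contour points as the origin and equal-width rings up to the largest radius; `polar_histogram` and `histogram_distance` are hypothetical names.

```python
import math


def polar_histogram(points, n_angles=8, n_rings=4):
    """Count contour points in n_angles x n_rings polar bins around the centroid."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    radii = [math.hypot(x - cx, y - cy) for x, y in points]
    max_r = max(radii) or 1.0  # avoid division by zero when all points coincide
    hist = [[0] * n_rings for _ in range(n_angles)]
    for (x, y), r in zip(points, radii):
        angle = math.atan2(y - cy, x - cx) % (2 * math.pi)
        a = min(int(angle / (2 * math.pi) * n_angles), n_angles - 1)
        ring = min(int(r / max_r * n_rings), n_rings - 1)
        hist[a][ring] += 1
    return hist


def histogram_distance(h1, h2):
    """Sum of absolute bin differences between two polar histograms."""
    return sum(abs(a - b) for r1, r2 in zip(h1, h2) for a, b in zip(r1, r2))


# Made-up contour: four outer points and four inner points around the origin.
contour = [
    (2, 1), (-1, 2), (-2, -1), (1, -2),
    (0.2, 0.1), (-0.1, 0.2), (-0.2, -0.1), (0.1, -0.2),
]
hist = polar_histogram(contour)
```

Two characters can then be compared by the distance between their histograms, with smaller values meaning more similar shapes.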

Page 71: Cross-media Intelligent Searching in Digital Library

Speed-up strategy:

• coarse-to-fine strategy
• improved shape matching algorithm
  • dynamic time warping (DTW) of projection histograms
  • extended DTW for 2D calligraphy contour warping
• high-dimensional indexing
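As an illustration of the first DTW variant, the sketch below computes projection histograms of binary character images and aligns them with classic dynamic time warping. The absolute-difference step cost is an assumption, and the extended 2D variant mentioned on the slide is not covered.

```python
def projection(char, axis):
    """Ink count per row (axis=0) or per column (axis=1) of a binary character image."""
    if axis == 0:
        return [sum(row) for row in char]
    return [sum(col) for col in zip(*char)]


def dtw(a, b):
    """Classic dynamic time warping cost between two 1-D sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Toy binary character image (illustration only).
char = [
    [0, 1],
    [1, 1],
]
rows = projection(char, axis=0)
cols = projection(char, axis=1)
stretched = dtw([1, 2, 3], [1, 2, 2, 3])
shifted = dtw([1, 2, 3], [2, 3, 4])
```

Because DTW tolerates local stretching, the same character written taller or wider still yields a low cost, which suits a cheap coarse filter before full contour matching.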

Page 72: Cross-media Intelligent Searching in Digital Library

Visualization of Chinese Calligraphy

Shape-based character retrieval

[Screenshot: submit example; retrieval result]


Page 74: Cross-media Intelligent Searching in Digital Library

5. Building Personalized Portal

Personalized portal

Web personalization is the technique that helps users quickly locate interesting information, which features multimedia and cross-media.

• Service integration around the content
• Information-filtering-based recommendation

Show me the information that I really need!

Page 75: Cross-media Intelligent Searching in Digital Library

Personalized portal

Personalization services provided by the portal:
• my bookshelf
• my bookmark
• my rules
• personal profile setting

[Screenshot: My bookshelf; My bookmark; Books recommended by rules]

Page 76: Cross-media Intelligent Searching in Digital Library

Service integration around the content

• detailed information about a book
• translate metadata
• full-text search
• my bookshelf management
• ranking
• CALIS union catalog and inter-library loan

• “My bookshelf” management
• “my bookmark” management
• bilingual translation
• full-text search

Page 77: Cross-media Intelligent Searching in Digital Library

Information-filtering-based recommendation

The classification of Web data:
• content data: texts, images……
• structure data: XML/HTML tags
• usage data: Web access logs
• user profile: preferences, demographic information

Implementing information filtering techniques:
• content-based filtering method
• collaborative filtering method
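A minimal sketch of the collaborative filtering method, assuming user-based filtering over bookshelf contents with cosine similarity. The slides do not describe the portal's actual recommender; the user names, book identifiers, and function names are made up.

```python
import math


def cosine(a, b):
    """Cosine similarity between two users' sets of shelved books."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def recommend(user, shelves, top_n=3):
    """Score books the user has not shelved by the similarity of the users who shelved them."""
    mine = shelves[user]
    scores = {}
    for other, books in shelves.items():
        if other == user:
            continue
        sim = cosine(mine, books)
        if sim == 0.0:
            continue  # users with no overlap contribute nothing
        for book in books - mine:
            scores[book] = scores.get(book, 0.0) + sim
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda b: (-scores[b], b))[:top_n]


# Hypothetical "my bookshelf" contents (illustration only).
shelves = {
    "ann": {"A", "B"},
    "bob": {"A", "B", "C"},
    "cat": {"B", "D"},
    "dan": {"E"},
}
suggestions = recommend("ann", shelves)
```

Content-based filtering would instead compare book metadata or text with the user's profile, so it can recommend items that no other user has shelved yet.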

Page 78: Cross-media Intelligent Searching in Digital Library

6. Conclusion

• The next generation of digital library shall focus more on multimedia, and finally on cross-media retrieval.

• But more research issues remain to be faced……
  • Cross-Media Representation Framework
  • Cross-Media Knowledge-based Reasoning
  • Analysis and Recognition
  • Complex retrieval

Page 79: Cross-media Intelligent Searching in Digital Library

Thanks!