ucb digital library project: research agenda


October 26, 1999 ASIS Annual Meeting 1999: Ray R. Larson

Information Access for a Digital Library:

Cheshire II and the Berkeley Environmental Digital Library

Ray R. Larson

School of Information Management & Systems, University of California, Berkeley

[email protected]@sherlock.berkeley.edu

Chad Carson

Computer Science Division, EECS, University of California, Berkeley

[email protected]@eecs.berkeley.edu

UCB Digital Library Project: Research Agenda

• Funded by NSF/NASA/DARPA Digital Library Initiative (Phases I and II)

• Research agenda
– Understand user needs.
– Extend functionality of documents.
  • “Enliven” legacy documents.
– Improve access to information.
– Scale to large systems.
– Re-invent scholarly information access and use.

Testbed: An Environmental Digital Library

• Collection: Diverse material relevant to California’s key habitats.

• Users: A consortium of state agencies, development corporations, private corporations, regional government alliances, educational institutions, and libraries.

• Potential: Impact on state-wide environmental system (CERES)

The Environmental Library - Users/Contributors

• California Resources Agency, California Environment Resources Evaluation System (CERES)

• California Department of Water Resources

• The California Department of Fish & Game

• SANDAG

• UC Water Resources Center Archives

• New Partners: CDL and SDSC

The Environmental Library - Contents

• Environmental technical reports, bulletins, etc.
• County general plans
• Aerial and ground photography
• USGS topographic maps
• Land use and other special purpose maps
• Sensor data
• “Derived” information
• Collection databases for the classification and distribution of the California biota (e.g., SMASCH)
• Supporting 3-D, economic, traffic, etc. models
• Videos collected by the California Resources Agency

The Environmental Library - Contents

• As of mid 1999, the collection represents about three quarters of a terabyte of data, including over 70,000 digital images, over 300,000 pages of environmental documents, and over a million records in geographical and botanical databases.

Botanical Data:

The CalFlora Database contains taxonomical and distribution information for more than 8000 native California plants. The Occurrence Database includes over 300,000 records of California plant sightings from many federal, state, and private sources. The botanical databases are linked to our CalPhotos collection of California plants, and are also linked to external collections of data, maps, and photos.

Geographical Data:

Much of the geographical data in our collection is being used to develop our web-based GIS Viewer. The Street Finder uses 500,000 TIGER records of S.F. Bay Area streets along with the 70,000 records from the USGS GNIS database. California Dams is a database of information about the 1395 dams under state jurisdiction. An additional 11 GB of geographical data represents maps and imagery that have been processed for inclusion as layers in our GIS Viewer. This includes Digital Ortho Quads and DRG maps for the S.F. Bay Area.

Documents:

Most of the 300,000 pages of digital documents are environmental reports and plans that were provided by California state agencies. This collection includes documents, maps, articles, and reports on the California environment including Environmental Impact Reports (EIRs), educational pamphlets, water usage bulletins, and county plans. Documents in this collection come from the California Department of Water Resources (DWR), California Department of Fish and Game (DFG), San Diego Association of Governments (SANDAG), and many other agencies. Among the most frequently accessed documents are County General Plans for every California county and a survey of 125 Sacramento Delta fish species.

Documents - cont.

The collection also includes about 20 MB of full-text (HTML) documents from the World Conservation Digital Library. In addition to providing online access to important environmental documents, the document collection is the testbed for our Multivalent Document research.

Photographs:

The photo collection includes 17,000 images of California natural resources from the state Department of Water Resources, several hundred aerial photos, 17,000 photos of California native plants from St. Mary's College, the California Academy of Sciences, and others, a small collection of California animals, and 40,000 Corel stock photos.

Testbed Success Stories

• LUPIN: CERES’ Land Use Planning Information Network
– California County General Plans and other environmental documents.
– Enter at Resources Agency Server, documents stored at and retrieved from UCB DLIB server.

• California flood relief efforts
– High demand for some data sets only available on our server (created by document recognition).

• CalFlora: Creation and interoperation of repositories pertaining to plant biology.

• Cloning of services at Cal State Library, FBI

Research Highlights

• Documents
– Multivalent Document prototype
  • Page images, structured documents, GIS data, photographs

• Intelligent Access to Content
– Document recognition
– Vision-based Image Retrieval: stuff, thing, scene retrieval
– Natural Language Processing: categorizing the web, Cheshire II, TileBar Interfaces

User Interface Paradigms: Multivalent Documents

• An approach to new document types and their authoring.

• Supports active, distributed, composable transformations of multimedia documents.

• Enables sophisticated annotations, intelligent result handling, user-modifiable interface, composite documents.

Multivalent Documents

[Figure: a multivalent document drawn as layers over a scanned page image ("History of The Classical World"): Cheshire Layer, OCR Layer, OCR Mapping Layer, GIS Layer, Table Layer, and Network Protocols & Resources.]

Valence: 2: The relative capacity to unite, react, or interact (as with antigens or a biological substrate). (Webster’s 7th Collegiate Dictionary)

GIS in the MVD Framework

• Layers are georeferenced data sets.

• Behaviors are
– display semi-transparently
– pan
– zoom
– issue query
– display context
– “spatial hyperlinks”
– annotations

• Written in Java (to be merged with MVD-1 code line?)

GIS Viewer Example

http://elib.cs.berkeley.edu/annotations/gis/buildings.html

Overview of Cheshire II

• The Cheshire II system is intended to provide an easy-to-use, standards-compliant system capable of retrieving any type of information in a wide variety of settings.

Overview of Cheshire II

• It supports SGML and XML.
• It is a client/server application.
• Uses the Z39.50 Information Retrieval Protocol.
• Server supports a Relational Database Gateway.
• Supports Boolean searching of all servers.
• Supports probabilistic ranked retrieval in the Cheshire search engine.
• Search engine supports “nearest neighbor” searches and relevance feedback.
• GUI interface on X window displays.
• WWW/CGI forms interface for DL, using combined client/server CGI scripting via WebCheshire.
• Image content retrieval using Blobworld
• Support for the SDLIP (Simple Digital Library Interoperability Protocol) for search and as Z39.50 Gateway

Cheshire II Searching

[Figure: Cheshire II searching local and remote collections (images, scanned text) across the Internet, with Z39.50 on each connection.]

Current Usage of Cheshire II

• Web clients for:
– NSF/NASA/ARPA Digital Library
  • Includes support for full-text and page-level search.
  • Experimental Blobworld image search
– SunSite
– University of Liverpool
– University of Essex, HDS (part of AHDS)
– California Sheet Music Project
– Cha-Cha (Berkeley Intranet Search Engine)
– Univ. of Virginia

• Cheshire ranking algorithm is basis for Inktomi (i.e., Yahoo, Hotbot, MSN? and others)

Image Retrieval Research

• Finding “Stuff” vs “Things”

• Blobworld

• Other Vision Research

Blobworld: use regions for retrieval

• We want to find general objects, so represent images based on coherent regions

Outline

• Why regions?

• Creating Blobworld: segmentation and description

• Using Blobworld: query experiments

• Indexing blobs for faster querying

• Conclusions

Creating and using Blobworld

• Create: extract features → segment image → describe regions
• Use: query

Extract features for each pixel

• Color
– Take average color (L*a*b*) at the selected scale
  • ignore local color variations due to texture
– “zebra = gray horse + stripes”

• Texture
– Find contrast, anisotropy, polarity at the selected scale

• Position

Find groups in feature space

• Model feature distribution as a mixture of Gaussians using Expectation-Maximization (EM)

Find regions in the image

• Label each pixel based on its Gaussian cluster

• Find connected components → regions

[Figure: example image with regions labeled 1 to 4.]
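The step from per-pixel cluster labels to regions can be sketched as a flood fill. This is a minimal illustration, not the Blobworld code: the function name `connected_regions` and the choice of 4-connectivity are assumptions, since the slide does not say which neighborhood is used.

```python
from collections import deque

def connected_regions(labels):
    """Split a 2-D grid of cluster labels into 4-connected regions.

    Returns a grid of region ids (0, 1, 2, ...) with the same shape.
    4-connectivity is an assumption made for this sketch.
    """
    rows, cols = len(labels), len(labels[0])
    region = [[-1] * cols for _ in range(rows)]
    next_id = 0
    for r in range(rows):
        for c in range(cols):
            if region[r][c] != -1:
                continue  # pixel already belongs to a region
            # Flood-fill every pixel reachable through same-label neighbours.
            region[r][c] = next_id
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and region[ny][nx] == -1
                            and labels[ny][nx] == labels[y][x]):
                        region[ny][nx] = next_id
                        queue.append((ny, nx))
            next_id += 1
    return region
```

Note that one Gaussian cluster can yield several regions: pixels with the same label that do not touch end up with different region ids.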

Describe regions by color, texture, shape

• Color
– Color histogram within region
– Quadratic distance: encode similarity between color bins
  $d^2_{hist}(x, y) = (x - y)' A (x - y)$

• Texture
– Mean contrast and anisotropy
  • stripes vs. spots vs. smooth

• (Basic) Shape
– Fourier descriptors of contour

Select appropriate scale for processing

• Polarity: do all the gradient vectors point in the same direction?

• Choose scale where polarity stabilizes (the window then includes one approximate period)
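One way to turn the polarity question into a number is to project each gradient vector in a window onto a reference direction and compare the mass on the positive and negative sides. The sketch below is a simplified assumption, not the presented implementation: the function name `polarity`, the unweighted sums, and the caller-supplied `orientation` are all illustrative, and any windowing or weighting is omitted.

```python
import math

def polarity(gradients, orientation):
    """Polarity in [0, 1] for one window of gradient vectors.

    gradients   -- list of (gx, gy) pairs from the window
    orientation -- angle in radians of the direction to project onto
    Returns 1.0 when all gradients point the same way along that
    direction and 0.0 when the two sides cancel exactly.
    """
    nx, ny = math.cos(orientation), math.sin(orientation)
    positive = negative = 0.0
    for gx, gy in gradients:
        projection = gx * nx + gy * ny
        if projection > 0:
            positive += projection
        else:
            negative -= projection
    total = positive + negative
    return abs(positive - negative) / total if total > 0 else 0.0
```

Evaluating this over windows of increasing size and stopping once the value stops changing is one reading of "choose scale where polarity stabilizes".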

Initialize means using image data

• Before, we picked random initialization

• Now, choose initial means based on image tiles

• Add noise to means and restart EM (4 runs per K)

[Figure: segmentations for K = 2, K = 3, K = 4, K = 5.]

Grouping: Expectation-Maximization

• Given class characteristics (μ, Σ), find class membership
• Given class membership, find class characteristics (μ, Σ)
• Iterate

[Figure: loop alternating "update labels" and "update μ, Σ".]

How many Gaussians?

• Model selection: Minimum Description Length
– Prefer fewer Gaussians if performance is comparable

[Figure: two segmentations of the same image compared ("vs.").]
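A common way to express "prefer fewer Gaussians if performance is comparable" is a two-part score: the negative log-likelihood plus a penalty that grows with the number of free parameters. The sketch below assumes that form; the function names, the penalty `0.5 * m * log(N)`, and the parameter count for full-covariance Gaussians are assumptions for illustration, not values taken from the slides.

```python
import math

def gaussian_mixture_params(k, d):
    """Free parameters of a K-component, d-dimensional mixture (assumed):
    K-1 weights, K means of length d, K symmetric d x d covariances."""
    return (k - 1) + k * d + k * d * (d + 1) // 2

def description_length(log_likelihood, num_params, num_samples):
    """Smaller is better: poor fit and extra parameters both cost bits."""
    return -log_likelihood + 0.5 * num_params * math.log(num_samples)

def choose_k(log_likelihoods, d, num_samples):
    """log_likelihoods maps each candidate K to its fitted log-likelihood."""
    return min(
        log_likelihoods,
        key=lambda k: description_length(
            log_likelihoods[k], gaussian_mixture_params(k, d), num_samples),
    )
```

With 1000 one-dimensional samples, a gain of one log-likelihood unit from K = 2 to K = 3 does not pay for the three extra parameters, so `choose_k` keeps K = 2; a gain of 100 units does.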

Find groups in feature space

• Model feature distribution as a mixture of Gaussians using Expectation-Maximization (EM)

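EM math

Probability density:

$$f(x \mid \Theta) = \sum_{i=1}^{K} \alpha_i f_i(x \mid \theta_i), \qquad f_i(x \mid \theta_i) = \frac{1}{(2\pi)^{d/2} \det(\Sigma_i)^{1/2}} \, e^{-\frac{1}{2}(x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i)}$$

Update equations:

$$\alpha_i^{new} = \frac{1}{N} \sum_{j=1}^{N} p(i \mid x_j, \Theta^{old})$$

$$\mu_i^{new} = \frac{\sum_{j=1}^{N} x_j \, p(i \mid x_j, \Theta^{old})}{\sum_{j=1}^{N} p(i \mid x_j, \Theta^{old})}$$

$$\Sigma_i^{new} = \frac{\sum_{j=1}^{N} p(i \mid x_j, \Theta^{old}) \, (x_j - \mu_i^{new})(x_j - \mu_i^{new})^T}{\sum_{j=1}^{N} p(i \mid x_j, \Theta^{old})}$$

where

$$p(i \mid x_j, \Theta^{old}) = \frac{\alpha_i f_i(x_j \mid \theta_i)}{\sum_{k=1}^{K} \alpha_k f_k(x_j \mid \theta_k)}$$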
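The EM loop for a mixture of Gaussians can be illustrated in one dimension, where each covariance matrix reduces to a single variance. This is a sketch under that simplifying assumption; the function names, the fixed iteration count, and the variance floor are illustrative choices and not part of the presented system.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def em_step(data, weights, means, variances):
    """One EM iteration for a 1-D Gaussian mixture; returns new parameters."""
    k, n = len(weights), len(data)
    # E-step: membership probabilities p(i | x_j) under the old parameters.
    resp = []
    for x in data:
        joint = [weights[i] * gaussian_pdf(x, means[i], variances[i])
                 for i in range(k)]
        total = sum(joint)
        resp.append([p / total for p in joint])
    # M-step: re-estimate weight, mean and variance of each component.
    new_w, new_m, new_v = [], [], []
    for i in range(k):
        mass = sum(resp[j][i] for j in range(n))
        mean = sum(resp[j][i] * data[j] for j in range(n)) / mass
        var = sum(resp[j][i] * (data[j] - mean) ** 2 for j in range(n)) / mass
        new_w.append(mass / n)
        new_m.append(mean)
        new_v.append(max(var, 1e-6))  # floor keeps a component from collapsing
    return new_w, new_m, new_v

def fit(data, initial_means, iterations=50):
    """Run EM from the given initial means with equal weights, unit variances."""
    k = len(initial_means)
    weights, means, variances = [1.0 / k] * k, list(initial_means), [1.0] * k
    for _ in range(iterations):
        weights, means, variances = em_step(data, weights, means, variances)
    return weights, means, variances
```

For example, `fit([0.0, 0.1, -0.1, 10.0, 10.1, 9.9], [1.0, 9.0])` settles on two components centered near 0 and 10 with roughly equal weights.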

Encode similarity between color bins

• Quadratic distance

• Distance between histograms x and y:

$$d^2_{hist}(x, y) = (x - y)' A (x - y)$$

• $A_{ij}$ is based on the similarity between bins i and j
– Neighboring bins have $A_{ij} = 0.5$
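A small worked example shows why the matrix A matters. The sketch below assumes a one-dimensional row of bins with 1 on the diagonal of A and 0.5 for adjacent bins; the diagonal value, the 1-D bin layout, and both function names are assumptions made for illustration (real color bins would have neighbors in more than one dimension).

```python
def neighbor_similarity(n_bins, neighbor_weight=0.5):
    """Similarity matrix A: 1 on the diagonal (assumed), neighbor_weight
    for adjacent bins in a 1-D layout (assumed), 0 elsewhere."""
    return [[1.0 if i == j else neighbor_weight if abs(i - j) == 1 else 0.0
             for j in range(n_bins)]
            for i in range(n_bins)]

def quadratic_distance_sq(x, y, A):
    """Squared quadratic-form distance (x - y)' A (x - y)."""
    diff = [xi - yi for xi, yi in zip(x, y)]
    size = len(diff)
    return sum(diff[i] * A[i][j] * diff[j]
               for i in range(size) for j in range(size))
```

With three bins, moving all mass from bin 0 to the adjacent bin 1 gives a squared distance of 1.0, while moving it to the distant bin 2 gives 2.0. With an identity matrix both cases would score 2.0, so the off-diagonal entries are what make similar colors count as close.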

Fourier descriptors for shape

• [Zahn & Roskies ’72, Kuhl & Giardina ’82]

• Find (x, y) representation of outer contour

• Find Fourier series of (x, y)
– Coefficients specify an ellipse (4 parameters): major axis, minor axis, orientation, starting point

• Remove starting point ambiguity

• Store first ten Fourier coefficients
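The idea of a contour descriptor that does not depend on the starting point can be sketched with a plain complex Fourier transform. This substitutes a simpler technique for the elliptic descriptors cited on the slide: each contour point becomes the complex number x + iy, and only coefficient magnitudes are kept, which drops the start-point phase (and, as a side effect, orientation). The function name and the use of magnitudes are assumptions for this sketch.

```python
import cmath
import math

def fourier_descriptors(contour, count=10):
    """Magnitudes of the first `count` non-constant Fourier coefficients
    of a closed contour given as a list of (x, y) points.

    Skipping the constant term removes dependence on position; taking
    magnitudes removes dependence on which point the contour starts at.
    """
    n = len(contour)
    z = [complex(x, y) for x, y in contour]
    descriptors = []
    for k in range(1, count + 1):
        coeff = sum(z[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)) / n
        descriptors.append(abs(coeff))
    return descriptors
```

Tracing the same square outline from a different corner, or after shifting it elsewhere in the image, yields the same descriptor values up to rounding.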

Creating and using Blobworld

• Create: extract features → segment image → describe regions
• Use: query

Querying: let user see the representation

• Current systems are unsatisfying
– User can’t see what the computer sees
– Unclear how parameters relate to the image

• User should interact with the representation
– Helps in query formulation
– Makes results understandable
– Minimizes disappointment

http://elib.cs.berkeley.edu/photos/blobworld

Query experiments

• Collection of 10,000 Corel stock photos

• Five query images in each of ten categories (e.g., cheetahs, polar bears, airplanes)

• Compare Blobworld to global histogram queries

• Precision (% of retrieved images that are correct) vs. Recall (% of correct images that are retrieved)
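The two measures defined above can be computed directly from the set of retrieved images and the set of correct ones. A minimal sketch; the function name and the use of string identifiers for images are illustrative assumptions.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) as fractions between 0 and 1.

    precision -- share of retrieved images that are correct
    recall    -- share of correct images that were retrieved
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

If a query returns four images of which two are among the three correct ones, precision is 0.5 and recall is 2/3.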

Distinctive objects

• Tigers, cheetahs, and zebras:
– Blobworld does better than global histograms

[Figure: results for cheetahs and zebras.]

Distinctive objects and backgrounds

• Eagles and black bears:
– Blobworld does better than global histograms

[Figure: results for black bears.]

Distinctive scenes

• Airplanes and brown bears:
– Global histograms do better than Blobworld
– But Blobworld has room to grow (shape, etc.)

[Figure: results for airplanes.]

Index to search huge collections

• Indexing is trickier than for traditional data

• We can afford some mistakes: even with full search, we’ll miss some tigers and include some pumpkins

• Two approaches we have tried:
– Store terms and treat image as a document
– Store features and index using a tree

• Final (“correct”) ranking of images from index

Index using conventional IR methods

• Treat each database blob as a document
– Store “terms” (bins) for color, texture, location, and shape
– Repeat color terms based on histogram weights

• Index using Cheshire II

• Treat each query blob as a document
– Repeat “terms” according to query weights
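Treating a blob as a document can be sketched as turning its binned features into a list of words, with color words repeated in proportion to their histogram weight so that a text engine's term frequency reflects how much of the blob has that color. The term spellings, the function name, and the `max_repeats` scaling below are hypothetical; the slides do not give the actual encoding.

```python
def blob_to_terms(color_hist, texture_bin, location_bin, shape_bin,
                  max_repeats=10):
    """Encode one blob as a bag of pseudo-words for a text index.

    color_hist -- list of weights summing to about 1, one per color bin
    The other arguments are already-quantized bin numbers.
    """
    terms = []
    for bin_index, weight in enumerate(color_hist):
        # A heavier color bin contributes more copies of its term.
        repeats = round(weight * max_repeats)
        terms.extend([f"color{bin_index}"] * repeats)
    terms.append(f"texture{texture_bin}")
    terms.append(f"location{location_bin}")
    terms.append(f"shape{shape_bin}")
    return terms
```

A blob that is 70% color bin 0 and 30% color bin 1 produces seven `color0` terms, three `color1` terms, and one term each for texture, location, and shape. A query blob would be encoded the same way, with repeats driven by the query weights.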

Indexing and Retrieval with Cheshire II

• Originally used the same probabilistic algorithm used for text
– Blobs are not distributed like text words or stems

• Now using a weighting based on coordination level match with a minimum threshold (must have at least half of the characteristics of the query cluster).

• Still eyeballing data, but seems much better for many types of queries
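Coordination-level matching scores a candidate by how many of the query's distinct terms it shares, and the threshold drops candidates that share fewer than half. The sketch below illustrates that idea only: the function names, the use of the raw shared-term count as the score, and the tie-breaking by identifier are assumptions, not the Cheshire II implementation.

```python
def coordination_match(query_terms, blob_terms, min_fraction=0.5):
    """Number of distinct query terms the blob shares, or None if the
    blob has fewer than min_fraction of the query's characteristics."""
    query, blob = set(query_terms), set(blob_terms)
    shared = len(query & blob)
    if not query or shared < min_fraction * len(query):
        return None
    return shared

def rank_blobs(query_terms, blobs):
    """blobs maps a blob id to its term list; returns ids, best first."""
    scored = []
    for blob_id, terms in blobs.items():
        score = coordination_match(query_terms, terms)
        if score is not None:
            scored.append((score, blob_id))
    # Higher score first; ties broken alphabetically for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [blob_id for _, blob_id in scored]
```

For a four-term query, a blob sharing all four terms ranks first, a blob sharing exactly two still qualifies, and a blob sharing only one is dropped.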

Conclusions

• Image retrieval in general collections requires region segmentation and description

• Blobworld yields high precision in queries for distinctive objects

• Blobworld can be indexed to allow fast querying

Further Information

• Full Cheshire II client and server source is available at ftp://sherlock.berkeley.edu/pub/cheshire/
– Includes HTML and Troff documentation

• http://cheshire.lib.berkeley.edu/

• UC Berkeley Digital Library Project
– http://elib.cs.berkeley.edu