
AOL Search Speaker Series

Virginia Tech’s Digital Library Research Laboratory

Dec. 20, 2004 -- AOL HQ
Edward A. Fox, [email protected]

Virginia Tech, Blacksburg, VA 24061 USA
http://fox.cs.vt.edu/talks/2004/

http://fox.cs.vt.edu/cv.htm

Acknowledgements (Selected)

• Sponsors: ACM, Adobe, AOL, CAPES, CNI, CONACyT, DFG, IBM, Microsoft, NASA, NDLTD, NLM, NSF (IIS-9986089, 0086227, 0080748, 0325579; ITR-0325579; DUE-0121679, 0136690, 0121741, 0333601), OCLC, SOLINET, SUN, SURA, UNESCO, US Dept. Ed. (FIPSE), VTLS

Acknowledgements: Faculty, Staff

• Lillian Cassel, Debra Dudley, Roger Ehrich, Joanne Eustis, Weiguo Fan, James Flanagan, C. Lee Giles, Eberhard Hilf, John Impagliazzo, Filip Jagodzinski, Rohit Kelapure, Neill Kipp, Douglas Knight, Deborah Knox, Aaron Krowne, Alberto Laender, Gail McMillan, Claudia Medeiros, Manuel Perez, Naren Ramakrishnan, Layne Watson, …

Acknowledgements: Students

• Pavel Calado, Yuxin Chen, Fernando Das Neves, Shahrooz Feizabadi, Robert France, Marcos Goncalves, Nithiwat Kampanya, S.H. Kim, Aaron Krowne, Bing Liu, Ming Luo, Paul Mather, Saverio Perugini, Unni Ravindranathan, Ryan Richardson, Rao Shen, Ohm Sornil, Hussein Suleman, Ricardo Torres, Wensi Xi, Xiaoyan Yu, Baoping Zhang, Qinwei Zhu, …

Rao Shen’s Preliminary Exam: Hypothesis and Research Questions

• The 5S framework provides effective solutions to DL integration.

– Formally define the DL integration problem?
– Guide integration of domain-focused DLs?
  • How to formally model such domain-specific DLs?
  • How to integrate formally defined DL models into a union DL model?
  • How to use the union DL model to help design and implement high-quality integrated DLs?
– Assess the integration?

Related Work

[Figure: concept map of DL interoperability approaches. Nodes: intermediary-based approach (mediator, wrapper, agent); mapping-based approach (hybrid mapper, composite mapper); two architectures (federation, union archiving); schema mapping; SemInt and LSD (examples); GA; DL integration formalization. Edge labels: consists of, use, used in, has an example, interrelated with, trained by, based on.]

Formal Definition of DL Integration

• DLi = (Ri, DMi, Servi, Soci), 1 ≤ i ≤ n

– Ri is a network accessible repository

– DMi is a set of metadata catalogs for all collections

– Servi is a set of services

– Soci is a society

• UnionRep
• UnionCat
• UnionServices
• UnionSociety

Formal Definition of DL Integration (Cont.)

• DL integration problem definition:

Given n individual libraries, integrate the n DLs to create a UnionDL.
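The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the formal 5S construction: it assumes each of the four components can be modeled as a plain set and that the union operators (UnionRep, UnionCat, UnionServices, UnionSociety) reduce to set union. The class name, field names, and sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DigitalLibrary:
    """One DL_i = (R_i, DM_i, Serv_i, Soc_i), each modeled as a set."""
    repository: frozenset  # R_i: identifiers of stored digital objects
    catalogs: frozenset    # DM_i: (object id, metadata) pairs
    services: frozenset    # Serv_i: service names
    society: frozenset     # Soc_i: community names


def integrate(libraries):
    """Combine n DLs into a UnionDL (assumption: plain set union)."""
    if not libraries:
        raise ValueError("need at least one DL to integrate")
    return DigitalLibrary(
        repository=frozenset().union(*(dl.repository for dl in libraries)),
        catalogs=frozenset().union(*(dl.catalogs for dl in libraries)),
        services=frozenset().union(*(dl.services for dl in libraries)),
        society=frozenset().union(*(dl.society for dl in libraries)),
    )


# Hypothetical sample data for two small DLs.
dl1 = DigitalLibrary(
    repository=frozenset({"vn:obj1", "vn:obj2"}),
    catalogs=frozenset({("vn:obj1", "bowl"), ("vn:obj2", "bone")}),
    services=frozenset({"searching"}),
    society=frozenset({"archaeologists"}),
)
dl2 = DigitalLibrary(
    repository=frozenset({"hd:obj1"}),
    catalogs=frozenset({("hd:obj1", "seed")}),
    services=frozenset({"browsing"}),
    society=frozenset({"general public"}),
)
union_dl = integrate([dl1, dl2])
print(sorted(union_dl.services))
```

A real integration also has to reconcile differing metadata formats before catalogs can be merged, which plain set union ignores.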

Demonstration: ETANA-DL (NSF ITR w. CWRU)

feathers.dlib.vt.edu

[Figure: Architecture of a Union DL. Labels: DL1 and DL2, each with a repository (Repository1, Repository2), a catalog (Catalog1, Catalog2), services (searching, browsing), and a society (archaeologists, general public); Union DL with Union Repository, Union Catalog, Union Service (harvesting, mapping, searching, browsing, clustering, visualization), and Union Society (archaeologists, general public).]

Union Catalog Integration

[Figure: Union catalog integration in the Union ArchDL. Labels: Virtual Nimrin (VN) with VN Metadata Format and VN Catalog; Halif DigMaster (HD) with HD Metadata Format and HD Catalog; a Wrapper and a Mapping Tool for each site; Global Metadata Format; Union Catalog.]
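As a rough illustration of the wrapper and mapping-tool idea in the figure, the sketch below maps records from two local catalogs into one global format and merges them into a union catalog. The slides do not give the VN or HD schemas, so every field name, mapping table, and sample record here is a hypothetical stand-in.

```python
# Hypothetical local-to-global field mappings; the real VN and HD
# metadata formats are not given in the slides.
VN_TO_GLOBAL = {"obj_id": "identifier", "locus": "context", "desc": "description"}
HD_TO_GLOBAL = {"ID": "identifier", "Field": "context", "Notes": "description"}


def make_wrapper(source, mapping):
    """Return a function that maps one local record into the global format."""
    def wrap(record):
        # Keep only the fields the mapping knows about, renamed to global names.
        mapped = {mapping[k]: v for k, v in record.items() if k in mapping}
        mapped["source"] = source
        return mapped
    return wrap


def build_union_catalog(catalogs):
    """Merge (wrapper, records) pairs into one catalog keyed by (source, id)."""
    union = {}
    for wrap, records in catalogs:
        for record in records:
            mapped = wrap(record)
            union[(mapped["source"], mapped["identifier"])] = mapped
    return union


vn_records = [{"obj_id": "1", "locus": "A-12", "desc": "pottery sherd"}]
hd_records = [{"ID": "1", "Field": "IV", "Notes": "animal bone", "Extra": "x"}]

union_catalog = build_union_catalog([
    (make_wrapper("VN", VN_TO_GLOBAL), vn_records),
    (make_wrapper("HD", HD_TO_GLOBAL), hd_records),
])
for key in sorted(union_catalog):
    print(key, union_catalog[key]["description"])
```

Keying the union catalog by source and identifier keeps records from different sites apart even when their local identifiers collide.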

Example of Union Service: CitiViz

CitiViz: A Visual User Interface to the CITIDEL System

ECDL 2004, Bath, England, September 2004

Nithiwat Kampanya, Rao Shen, Seonho Kim, Chris North, and Edward A. Fox
[email protected] http://fox.cs.vt.edu

[Figure: A Minimal DL in the 5S Framework. Labels: Streams, Structures, Spaces, Scenarios, Societies; structured stream; hypertext; services (indexing, browsing, searching); digital object; descriptive metadata specification; structural metadata specification; metadata catalog; collection; repository; minimal DL.]

[Figure: A Minimal ArchDL in the 5S Framework. Labels: Streams, Structures, Spaces, Scenarios, Societies; structured stream; hypertext; services (indexing, browsing, searching); descriptive metadata specification; SpaTemOrg; StraDia; Arch descriptive metadata specification; ArchDO; ArchObj; ArchColl; Arch metadata catalog; ArchDColl; ArchDR; minimal ArchDL.]

[Figure: generating ETANA-DL with 5SGraph and 5SGen. Labels: 5SGraph; 5S Archaeology MetaModel; ArchDL Expert; ArchDL Designer; Structure Sub-model (VN Metadata Format, ETANA-DL Metadata Format, HD Metadata Format); Scenario Sub-model (ETANA-DL Union Services Descriptions: harvesting, mapping, searching, browsing); Mapping Tool; Wrapper4VN; Wrapper4HD; XOAI; VN Catalog; HD Catalog; Union Catalog; 5SGen; Component Pool; Index; Inverted Files; Services DB; Browse DB; Browse Service; Search Service; other ETANA-DL services; Web Interface.]

Browsing…

Computing and Information Technology Interactive Digital Educational Library (CITIDEL)

• Domain: computing / information technology

• Genre: one-stop-shopping for teachers & learners: courseware (CSTC, JERIC), leading DLs (ACM, IEEE-CS, DB&LP, CiteSeer), PlanetMath.org, NCSTRL (technical reports), …

• Submission & Collection: sub/partner collections www.citidel.org

www.CITIDEL.org

• Led by Virginia Tech, with co-PIs:
– Fox (director, DL systems)
– Lee (history)
– Perez (user interface, Spanish support)
– Students: Ryan Richardson, Kate McDevitt, Jon Pryor, Baoping Zhang

• Partners
– College of New Jersey (Knox)
– Hofstra (Impagliazzo)
– Villanova (Cassel)
– Penn State (Giles)

[Figure: Digital library architecture for local and interoperable CITIDEL services. Layers: PORTALS (educators, administrators, learners); SERVICES (multilingual searching, revising, annotating, filtering, browsing, administering); REPOSITORIES (annotations, filtering profiles, user profiles, union metadata). An OAI data harvester and an OAI data provider connect to remote and peer digital libraries (e.g., NSDL-CIS).]

CITIDEL Technology Features

• Component architecture (Open Digital Library)
– Re-use and compose re-deployable digital library components.
• Built using open standards and technologies
– OAI: used to collect DL resources and for DL interoperability
– XSL and XML: interface rendering with multilingual, community-based translation of screens and content (Spanish, …)
– Perl: component integration
– ESSEX: search engine functionality
  • Very fast, utilizing in-memory processing
  • Includes snapshots for persistence
• Multi-scheming (Aaron Krowne, now at Emory U. Library)
– Integrates multiple classifications / views through maps, closure
• Extensions: clustering, visualization, personalization, …
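To illustrate the "in-memory processing with snapshots for persistence" idea attributed to ESSEX above, here is a small sketch of an in-memory inverted index that can serialize itself and be restored. It is not ESSEX's actual design or API; the class, its methods, and the JSON snapshot format are assumptions made for the example.

```python
import json
from collections import defaultdict


class InMemoryIndex:
    """Toy inverted index held entirely in memory (hypothetical, not ESSEX)."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of document ids

    def add(self, doc_id, text):
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return ids of documents containing every query term."""
        terms = query.lower().split()
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result

    def snapshot(self):
        """Serialize the index; a real system would write this to disk."""
        return json.dumps({t: sorted(ids) for t, ids in self.postings.items()})

    @classmethod
    def restore(cls, blob):
        index = cls()
        for term, ids in json.loads(blob).items():
            index.postings[term] = set(ids)
        return index


index = InMemoryIndex()
index.add("d1", "digital library search")
index.add("d2", "digital preservation")

restored = InMemoryIndex.restore(index.snapshot())
print(sorted(restored.search("digital")))
```

Queries never touch disk, and a snapshot is only needed so that the index survives a restart.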

Cluster Search Results from CITIDEL

Cluster NDLTD-Computing

CITIDEL + PIPE

• Adds interaction personalization to CITIDEL

• Automatically handles multi-modal conversion to cell phone, PDA, etc.

• Can be adapted to any digital data set; only requires an XML file of content with hierarchy maintained.

Naren Ramakrishnan and Saverio Perugini (U. Dayton)

CITIDEL -> NSDL

• A collection project in the National STEM (science, technology, engineering, and mathematics) education Digital Library – NSDL

• National Science Digital Library

• www.nsdl.org

• (Next slides courtesy Lee Zia, NSF)

NSDL Program Tracks

• Core Integration: coordinate a distributed alliance of resource collection and service providers; and ensure reliable and extensible access to and usability of the resulting network of learning environments and resources

• Collections: aggregate and actively manage a subset of the digital library’s content within a coherent theme / specialty

• Services: increase the impact, reach, efficiency, and value of the digital library in its fully operational form

• Targeted (Applied) Research: have immediate impact on one or more of the other three tracks

• Pathways: large efforts across broad ranges of areas or approaches or users

NSDL Information Architecture
Essentially as developed by the Technical Infrastructure Workgroup

[Figure: NSDL information architecture built around a core NSDL “bus”. Labels: User Interfaces (portals and clients); Usage Enhancement (CI services: annotation, discussion, personalization, authentication, browsing; core services: information retrieval; other NSDL services); Collection Building (core collection-building services: harvesting, protocols; core services: metadata gathering); NSDL collections; special databases; referenced items and collections.]

OCKHAM Library Network (NSDL)

[Figure: OCKHAM Library Network. Labels: NSDL; NSDL services; OCKHAM services; library services; OCKHAM Library Network; teachers, learners, librarians.]

OCKHAM (Ming Luo)

• Simplicity (a la OCCAM’s razor)
• Support by Mellon and DLF
• Four main ideas:
  1. Components
  2. Lightweight protocols
  3. Open reference models (e.g., 5S, OAIS)
  4. Community perspective and involvement

• Funded by NSF in NSDL, with P2P, with Emory, Notre Dame, Oregon State, …

OCKHAM Proposed Services

• Alerting
• Browsing
• Cataloging
• Conversion
• OAI – Z39.50
• Pathfinding
• Registry
• (plus others such as from adapted ODL)

A Digital Library Case Study

• Domain: graduate education, research

• Genre: ETDs = electronic theses & dissertations

• Submission: http://etd.vt.edu

• Collection: http://www.theses.org

Project: Networked Digital Library of Theses & Dissertations (NDLTD) http://www.ndltd.org (supported by Ming Luo)

OCLC SRU Interface => Dr. A.K. Tyagi

ETD Union Search Mirror Site in China (CALIS) (http://ndltd.calis.edu.cn – popular site!)

LOCKSS Extensions: Bing Liu, Xiaoyu Zhang, Ji-Sun Kim

• Lots of copies keep stuff safe
• Stanford (Vicky Reich)
• Initial focus on lower levels, journals
• Shift to OAI, esp. for ETDs
• Collab with Emory (Martin Halbert)
– NDIIPP: AmericanSouth, MetaArchive
– Help deploy and adapt, apply in other contexts
• Another registry
• Set of publisher manifests (information providers)
• Set of storage systems (archival storage)

[Figure: open digital library. Bit streams of several kinds (programs, documents, images, video) are each exposed as an Open Archive (OA) through the Protocol for Metadata Harvesting (PMH). An Open Digital Library component is an extended Open Archive, and the Open Digital Library protocol is an extended OAI-PMH (XPMH).]

Hussein Suleman (Cape Town, S. Africa)

Open Digital Library Components

• Running now
– XML-File (data provider from file system)
– Search: simple or in-memory (Essex) or generalized
– Union, browse, recent, filter
– E-journal/review, Submit, Edit, Annotation
– Recommender, Rating; Mirroring (see JCDL’02)
– Working with NCSA: from DB, unstructured text

• Others in process
– Classification/categorization
– Registry (and other connections with web services)
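The components listed above build on OAI-PMH. As a hedged sketch of the base protocol only (not the extended ODL protocol, whose details are not given in these slides), the code below builds a standard `ListRecords` request URL and extracts record headers and the resumption token from a response. The base URL, identifiers, and sample response are invented for illustration; no network access is performed.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None,
                     resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def parse_headers(xml_text):
    """Return ([(identifier, datestamp), ...], resumption token or None)."""
    root = ET.fromstring(xml_text)
    headers = [
        (h.findtext("oai:identifier", namespaces=OAI_NS),
         h.findtext("oai:datestamp", namespaces=OAI_NS))
        for h in root.iterfind(".//oai:record/oai:header", OAI_NS)
    ]
    token = root.findtext(".//oai:resumptionToken", namespaces=OAI_NS)
    return headers, token or None


# Invented sample response from a hypothetical ETD data provider.
SAMPLE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:etd-1</identifier>
        <datestamp>2004-12-01</datestamp>
      </header>
    </record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>
""".strip()

url = list_records_url("http://example.org/oai")
headers, token = parse_headers(SAMPLE)
next_url = list_records_url("http://example.org/oai", resumption_token=token)
print(url)
print(headers, token)
```

A union component would repeat the request with each returned resumption token until none is returned, then merge the harvested records from all providers.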

ETD DL for the Networked Digital Library of Theses and Dissertations (www.ndltd.org)

[Figure: Example Open Digital Library. Labels: ETD collections (ETD-1, ETD-2, ETD-3, ETD-4) among other bit streams (programs, documents, images, video); PMH links; ODL-Union components (Union); ODL-Search (Search), ODL-Browse (Browse), and ODL-Recent (Recent) components; Filter components; USER INTERFACE; students and researchers.]

Open Digital Library Deployments

• NDLTD (www.ndltd.org)
• Computer Science Teaching Center (www.cstc.org)
• Computing and Information Technology Interactive Digital Educational Library (www.citidel.org)
• Open Archives Distributed (NSF, DFG) – enhancements to PhysNet
• OCKHAM
• Open to others through DL-in-a-box

Interest-based User Grouping Model for Collaborative Filtering in Digital Libraries

7th ICADL 2004

Shanghai, P.R. China

Dec. 15, 2004

Edward A. Fox, Seonho Kim
Virginia Tech, Blacksburg, VA 24061 USA

Some Other Students/Projects

• Wensi Xi: Matrices, reinforcement, clusters (Microsoft)
• Paul Mather: mod/sim of large DLs on clusters; characterization: uses, files (NASA)
• Ming Luo: personalization aided by demographics
• Ryan Richardson: CLIR with concept maps
• Xiaoyan Yu: Stepping Stones and Pathways (NSF, Fernando Das Neves completed & returned to Argentina)
• Baoping Zhang: Physics and classification (NSF, DFG)
• Several: TREC with GP
• New projects:
– Superimposed information w. PSU (NSF NSDL)
– Quality and metasearch and structure w. Emory (IMLS)
• …

Conclusion

• Many DL/IR: areas, projects, students
• Theory
• Architecture
• Modeling and simulation
• Systems development and testing to: validate above, demonstrate innovations
• Users, interfaces, visualization, usability

• Special thanks to AOL for 4 years of Fellowships!