AOL Search Speaker Series
Virginia Tech's Digital Library Research Laboratory
Dec. 20, 2004 -- AOL HQ
Edward A. Fox
Virginia Tech, Blacksburg, VA 24061 USA
http://fox.cs.vt.edu/talks/2004/
http://fox.cs.vt.edu/cv.htm
Acknowledgements (Selected)
• Sponsors: ACM, Adobe, AOL, CAPES, CNI, CONACyT, DFG, IBM, Microsoft, NASA, NDLTD, NLM, NSF (IIS-9986089, 0086227, 0080748, 0325579; ITR-0325579; DUE-0121679, 0136690, 0121741, 0333601), OCLC, SOLINET, SUN, SURA, UNESCO, US Dept. Ed. (FIPSE), VTLS
Acknowledgements: Faculty, Staff
• Lillian Cassel, Debra Dudley, Roger Ehrich, Joanne Eustis, Weiguo Fan, James Flanagan, C. Lee Giles, Eberhard Hilf, John Impagliazzo, Filip Jagodzinski, Rohit Kelapure, Neill Kipp, Douglas Knight, Deborah Knox, Aaron Krowne, Alberto Laender, Gail McMillan, Claudia Medeiros, Manuel Perez, Naren Ramakrishnan, Layne Watson, …
Acknowledgements: Students
• Pavel Calado, Yuxin Chen, Fernando Das Neves, Shahrooz Feizabadi, Robert France, Marcos Goncalves, Nithiwat Kampanya, S.H. Kim, Aaron Krowne, Bing Liu, Ming Luo, Paul Mather, Saverio Perugini, Unni. Ravindranathan, Ryan Richardson, Rao Shen, Ohm Sornil, Hussein Suleman, Ricardo Torres, Wensi Xi, Xiaoyan Yu, Baoping Zhang, Qinwei Zhu, …
Rao Shen's Preliminary Exam: Hypothesis and Research Questions
• The 5S framework provides effective solutions to DL integration.
– Formally define the DL integration problem?
– Guide integration of domain-focused DLs?
   • How to formally model such domain-specific DLs?
   • How to integrate formally defined DL models into a union DL model?
   • How to use the union DL model to help design and implement high-quality integrated DLs?
– Assess the integration?
Related Work
[Figure: Concept map of DL interoperability approaches. Intermediary-based approaches consist of mediators, wrappers, and agents, which are used in two architectures: federation and union archiving. Mapping-based approaches consist of hybrid and composite mappers that use schema mapping (examples: SemInt, LSD). The map also links a GA (genetic algorithm) through a "trained by" relation and shows DL integration formalization as interrelated with these approaches.]
Formal Definition of DL Integration
• $DL_i = (R_i, DM_i, Serv_i, Soc_i)$, $1 \le i \le n$
– $R_i$ is a network-accessible repository
– $DM_i$ is a set of metadata catalogs for all collections
– $Serv_i$ is a set of services
– $Soc_i$ is a society
• UnionRep
• UnionCat
• UnionServices
• UnionSociety
Formal Definition of DL Integration (Cont.)
• DL integration problem definition:
Given n individual libraries, integrate the n DLs to create a UnionDL.
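To make the tuple concrete, here is a minimal Python sketch, assuming a repository is a dict from object ID to content, each metadata catalog maps a collection name to metadata records, and services and societies are plain sets of names. The class `DL` and the function `union_dl` are hypothetical names for illustration, not part of the 5S tools, and this union only combines the four components; it does no metadata mapping.

```python
from dataclasses import dataclass, field


@dataclass
class DL:
    """Simplified DL_i = (R_i, DM_i, Serv_i, Soc_i); the representation is hypothetical."""
    repository: dict[str, str]                        # R_i: object ID -> content
    catalogs: dict[str, dict[str, dict]]              # DM_i: collection -> {object ID -> metadata}
    services: set[str] = field(default_factory=set)   # Serv_i: service names
    society: set[str] = field(default_factory=set)    # Soc_i: community names


def union_dl(libraries: list[DL]) -> DL:
    """Naively combine n DLs into (UnionRep, UnionCat, UnionServices, UnionSociety).

    Object IDs get a "DL<i>:" prefix so equal local IDs from different
    libraries stay distinct. No metadata mapping is performed.
    """
    union_rep: dict[str, str] = {}
    union_cat: dict[str, dict[str, dict]] = {}
    union_services: set[str] = set()
    union_society: set[str] = set()
    for i, lib in enumerate(libraries, start=1):
        for obj_id, content in lib.repository.items():
            union_rep[f"DL{i}:{obj_id}"] = content
        for coll, records in lib.catalogs.items():
            merged = union_cat.setdefault(coll, {})
            for obj_id, record in records.items():
                merged[f"DL{i}:{obj_id}"] = record
        union_services |= lib.services
        union_society |= lib.society
    return DL(union_rep, union_cat, union_services, union_society)
```

If two libraries both hold an object `o1`, the union repository keeps them apart as `DL1:o1` and `DL2:o1`, while services and societies simply become set unions.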
Demonstration: ETANA-DL (NSF ITR w. CWRU)
feathers.dlib.vt.edu
[Figure: Architecture of a Union DL. DL1 (Repository1, Catalog1, a searching service, and a society of archaeologists) and DL2 (Repository2, Catalog2, a browsing service, and a general-public society) are combined into a Union DL with a Union Repository, a Union Catalog, a Union Society (archaeologists and the general public), and Union Services: harvesting, mapping, searching, browsing, clustering, and visualization.]
Union Catalog Integration
[Figure: Catalogs from two sites, Virtual Nimrin (VN) and Halif DigMaster (HD), each pass through a wrapper and a mapping tool that convert the VN and HD metadata formats into a global metadata format, producing the union catalog of the Union ArchDL.]
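The wrapper-plus-mapping-tool step can be sketched as renaming each site's fields into the global format and tagging every record with its source. The field names in `VN_TO_GLOBAL` and `HD_TO_GLOBAL` are invented for illustration and are not the real VN, HD, or ETANA-DL schemas; a production mapping tool would also convert values, not just rename fields.

```python
# Hypothetical field maps: local field name -> global field name.
VN_TO_GLOBAL = {"ObjectName": "title", "Locus": "locus", "Period": "period"}
HD_TO_GLOBAL = {"Description": "title", "LocusID": "locus", "DateRange": "period"}


def wrap_and_map(records: list[dict], field_map: dict[str, str], source: str) -> list[dict]:
    """Rename local fields to global ones; unmapped fields are dropped.

    Each output record carries a "source" field so the union catalog
    keeps track of which site it came from.
    """
    mapped = []
    for rec in records:
        out = {glob: rec[local] for local, glob in field_map.items() if local in rec}
        out["source"] = source
        mapped.append(out)
    return mapped


def build_union_catalog(sources: list[tuple[list[dict], dict[str, str], str]]) -> list[dict]:
    """Concatenate mapped records from every (records, field_map, site_name) source."""
    union: list[dict] = []
    for records, field_map, name in sources:
        union.extend(wrap_and_map(records, field_map, name))
    return union
```

Keeping one field map per site means adding a new excavation only requires a new map, not changes to the union catalog code.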
CitiViz: A Visual User Interface to the CITIDEL System
ECDL 2004, Bath, England, September 2004
Nithiwat Kampanya, Rao Shen, Seonho Kim, Chris North, and Edward A. Fox
http://fox.cs.vt.edu
A Minimal DL in the 5S Framework
[Figure: Digital objects, organized into collections held in a repository, are described by a metadata catalog built on descriptive and structural metadata specifications. The 5S notions (Streams, Structures, Spaces, Scenarios, Societies) underpin structured streams, hypertext, and the indexing, searching, and browsing services of the minimal DL.]
A Minimal ArchDL in the 5S Framework
[Figure: The same 5S notions and services (indexing, searching, browsing, hypertext, structured streams) specialized for archaeology. An Arch descriptive metadata specification (with SpaTemOrg and StraDia), archaeological digital objects and objects (ArchDO, ArchObj), collections (ArchColl, ArchDColl), an Arch metadata catalog, and a repository (ArchDR) together form the minimal ArchDL.]
[Figure: Building ETANA-DL with the 5S tools. An archaeology expert and an ArchDL designer use 5SGraph with the 5S archaeology metamodel to produce a structure sub-model and a scenario sub-model (descriptions of the ETANA-DL union services: harvesting, mapping, searching, browsing, …). 5SGen draws on a component pool to generate the services: wrappers (Wrapper4VN, Wrapper4HD) and XOAI feed the VN and HD catalogs through a mapping tool into the ETANA-DL metadata format and the union catalog; a search service over an index and inverted files, a browse service over a browse DB, other ETANA-DL services, and a services DB sit behind a web interface.]
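The search and browse services above rest on inverted files and a browse DB. As a toy, in-memory stand-in (not what 5SGen actually generates), an inverted file maps each term to the records whose metadata contain it, a Boolean AND search intersects the posting sets of the query terms, and a browse view groups records by one metadata field. All function names and the catalog layout are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase and split into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_file(catalog: dict[str, dict[str, str]]) -> dict[str, set[str]]:
    """Map each term to the set of record IDs whose metadata values contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for rec_id, record in catalog.items():
        for value in record.values():
            for term in tokenize(value):
                index[term].add(rec_id)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Boolean AND search: record IDs containing every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


def browse(catalog: dict[str, dict[str, str]], field: str) -> dict[str, list[str]]:
    """Group record IDs by the value of one metadata field (a tiny browse DB)."""
    groups: dict[str, list[str]] = defaultdict(list)
    for rec_id, record in catalog.items():
        groups[record.get(field, "(none)")].append(rec_id)
    return {key: sorted(ids) for key, ids in sorted(groups.items())}
```

Intersecting posting sets is what makes multi-term queries cheap: the cost depends on the sizes of the postings, not on the size of the whole catalog.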
Computing and Information Technology Interactive Digital Educational Library (CITIDEL)
• Domain: computing / information technology
• Genre: one-stop-shopping for teachers & learners: courseware (CSTC, JERIC), leading DLs (ACM, IEEE-CS, DB&LP, CiteSeer), PlanetMath.org, NCSTRL (technical reports), …
• Submission & Collection: sub/partner collections www.citidel.org
www.CITIDEL.org
• Led by Virginia Tech, with co-PIs:
– Fox (director, DL systems)
– Lee (history)
– Perez (user interface, Spanish support)
– Students: Ryan Richardson, Kate McDevitt, Jon Pryor, Baoping Zhang
• Partners
– College of New Jersey (Knox)
– Hofstra (Impagliazzo)
– Villanova (Cassel)
– Penn State (Giles)
[Figure: Digital library architecture for local and interoperable CITIDEL services. Repositories: union metadata gathered by an OAI data harvester, plus annotations, filtering profiles, and user profiles; an OAI data provider shares content with remote and peer digital libraries (e.g., NSDL-CIS). Services: multilingual searching, revising, annotating, filtering, browsing, administering. Portals serve educators, learners, and administrators.]
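CITIDEL gathers partner metadata with an OAI harvester. A minimal OAI-PMH harvester in Python, assuming unqualified Dublin Core (`metadataPrefix=oai_dc`) and a placeholder base URL, pages through `ListRecords` responses by following `resumptionToken`. It is a sketch with no retry, error, or rate handling, not CITIDEL's own Perl harvester.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Optional, Union

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url: str, token: Optional[str] = None) -> str:
    """Build a ListRecords request; resumptionToken is an exclusive argument."""
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_list_records(xml_text: Union[str, bytes]):
    """Return ([(identifier, [dc:title, ...]), ...], resumptionToken or None)."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter(OAI + "record"):
        ident = rec.findtext(f"{OAI}header/{OAI}identifier")
        titles = [t.text for t in rec.iter(DC + "title")]
        records.append((ident, titles))
    token_el = root.find(f".//{OAI}resumptionToken")
    # An empty resumptionToken element marks the last page of the list.
    token = token_el.text if token_el is not None and token_el.text else None
    return records, token


def harvest(base_url: str):
    """Yield records page by page until no resumptionToken is returned."""
    token = None
    while True:
        with urllib.request.urlopen(list_records_url(base_url, token)) as resp:
            records, token = parse_list_records(resp.read())
        yield from records
        if not token:
            break
```

Separating URL building and parsing from the network loop keeps the protocol logic testable offline; `harvest("http://example.org/oai")` (a placeholder URL) would drive the whole loop against a real data provider.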
CITIDEL Technology Features
• Component architecture (Open Digital Library)
• Re-use and compose re-deployable digital library components
• Built using open standards & technologies
– OAI: used to collect DL resources and for DL interoperability
– XSL and XML: interface rendering, with multilingual, community-based translation of screens and content (Spanish, …)
– Perl: component integration
• ESSEX: search engine functionality
– Very fast, using in-memory processing
– Includes snapshots for persistence
• Multi-scheming (Aaron Krowne, now at Emory U. Library)
– Integrates multiple classifications / views through maps, closure
• Extensions: clustering, visualization, personalization, …
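Multi-scheming's "maps, closure" idea can be illustrated as graph reachability: treat narrower-category links within each scheme and cross-scheme mappings as directed edges, and show under a category every item classified anywhere in its closure. This is a simplified interpretation for illustration, not Krowne's implementation; the function names and category labels are made up.

```python
from collections import deque


def closure(start: str, children: dict[str, list[str]], cross_maps: dict[str, list[str]]) -> set[str]:
    """All categories reachable from `start` (breadth-first).

    children:   category -> narrower categories in the same scheme
    cross_maps: category -> mapped categories in other schemes
    """
    seen = {start}
    queue = deque([start])
    while queue:
        cat = queue.popleft()
        # Follow both narrower-term links and cross-scheme mappings.
        for nxt in children.get(cat, []) + cross_maps.get(cat, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def items_in_view(start: str, children: dict[str, list[str]],
                  cross_maps: dict[str, list[str]],
                  assignments: dict[str, list[str]]) -> list[str]:
    """Items classified under any category in the closure of `start`."""
    cats = closure(start, children, cross_maps)
    return sorted(item for item, item_cats in assignments.items()
                  if any(c in cats for c in item_cats))
```

With this view, an item classified only in scheme B still appears when a user browses the mapped category in scheme A, which is the point of integrating several classifications.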
CITIDEL + PIPE
• Adds interaction personalization to CITIDEL
• Automatically handles multi-modal conversion to cell phone, PDA, etc.
• Can be adapted to any digital data set; only requires an XML file of content with the hierarchy maintained
Naren Ramakrishnan and Saverio Perugini (U. Dayton)
CITIDEL -> NSDL
• A collection project in the National STEM (science, technology, engineering, and mathematics) Education Digital Library (NSDL)
• NSDL: National Science Digital Library
• www.nsdl.org
• (Next slides courtesy Lee Zia, NSF)
NSDL Program Tracks
• Core Integration: coordinate a distributed alliance of resource collection and service providers; and ensure reliable and extensible access to and usability of the resulting network of learning environments and resources
• Collections: aggregate and actively manage a subset of the digital library’s content within a coherent theme / specialty
• Services: increase the impact, reach, efficiency, and value of the digital library in its fully operational form
• Targeted (Applied) Research: have immediate impact on one or more of the other three tracks
• Pathways: large efforts across broad ranges of areas or approaches or users
NSDL Information Architecture
Essentially as developed by the Technical Infrastructure Workgroup
[Figure: NSDL collections and special databases (referenced items & collections) connect through the core NSDL "bus" to core collection-building services (harvesting, protocols), core services (metadata gathering, information retrieval), CI services (annotation, discussion, personalization, authentication, browsing), NSDL services and other NSDL services, and portals & clients. Layers are labeled collection building, usage enhancement, and user interfaces.]
OCKHAM Library Network (NSDL)
[Figure: The OCKHAM library network alongside NSDL: OCKHAM services, NSDL services, and library services serve teachers, learners, and librarians.]
OCKHAM (Ming Luo)
• Simplicity (à la Occam's razor)
• Supported by Mellon and DLF
• Four main ideas:
1. Components
2. Lightweight protocols
3. Open reference models (e.g., 5S, OAIS)
4. Community perspective and involvement
• Funded by NSF in NSDL, with P2P, with Emory, Notre Dame, Oregon State, …
OCKHAM Proposed Services
• Alerting
• Browsing
• Cataloging
• Conversion
• OAI – Z39.50
• Pathfinding
• Registry
• (plus others, such as those adapted from ODL)
A Digital Library Case Study
• Domain: graduate education, research
• Genre: ETDs = electronic theses & dissertations
• Submission: http://etd.vt.edu
• Collection: http://www.theses.org
Project: Networked Digital Library of Theses & Dissertations (NDLTD) http://www.ndltd.org (supported by Ming Luo)
LOCKSS Extensions: Bing Liu, Xiaoyu Zhang, Ji-Sun Kim
• Lots of copies keep stuff safe
• Stanford (Vicky Reich)
• Initial focus on lower levels, journals
• Shift to OAI, esp. for ETDs
• Collaboration with Emory (Martin Halbert)
– NDIIPP: AmericanSouth, MetaArchive
– Help deploy and adapt, apply in other contexts
• Another registry
• Set of publisher manifests (information providers)
• Set of storage systems (archival storage)
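The "lots of copies" idea rests on peers comparing their copies and repairing disagreements. The sketch below reduces that to a single majority vote over SHA-256 digests; the real LOCKSS polling protocol adds nonces, sampling, and rate limiting, so treat this as an illustration of the principle only. The functions `audit` and `repair` and the peer names are hypothetical.

```python
import hashlib
from collections import Counter


def digest(content: bytes) -> str:
    """Content fingerprint used as a peer's 'vote'."""
    return hashlib.sha256(content).hexdigest()


def audit(replicas: dict[str, bytes]) -> tuple[str, list[str]]:
    """Majority-vote audit of one archival unit held by several peers.

    Returns the winning digest and the peers whose copy disagrees with it.
    Raises ValueError when no digest has a strict majority.
    """
    votes = {peer: digest(data) for peer, data in replicas.items()}
    winner, count = Counter(votes.values()).most_common(1)[0]
    if count * 2 <= len(votes):
        raise ValueError("no majority; cannot decide which copy is good")
    damaged = sorted(peer for peer, d in votes.items() if d != winner)
    return winner, damaged


def repair(replicas: dict[str, bytes]) -> dict[str, bytes]:
    """Replace damaged copies with a copy matching the majority digest."""
    winner, damaged = audit(replicas)
    good = next(data for data in replicas.values() if digest(data) == winner)
    return {peer: (good if peer in damaged else data) for peer, data in replicas.items()}
```

Requiring a strict majority is the conservative choice: with a tie, the sketch refuses to guess rather than risk overwriting a good copy with a damaged one.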
[Figure: Open digital library. Collections of programs, documents, images, and video sit behind components that communicate through OAI-PMH (PMH) and extended OAI-PMH (XPMH) interfaces.]
Hussein Suleman (Cape Town, South Africa)
Open Digital Library Components
• Running now
– XML-File (data provider from file system)
– Search: simple, in-memory (Essex), or generalized
– Union, Browse, Recent, Filter
– E-journal/review, Submit, Edit, Annotation
– Recommender, Rating; Mirroring (see JCDL'02)
– Working with NCSA: from DB, unstructured text
• Others in process
– Classification/categorization
– Registry (and other connections with web services)
ETD DL for the Networked Digital Library of Theses and Dissertations (www.ndltd.org)
[Figure: Example Open Digital Library. ETD collections (ETD-1 to ETD-4, held alongside programs, documents, images, and video) are harvested over OAI-PMH by ODLUnion components; Filter, ODLSearch, ODLBrowse, and ODLRecent components then provide search, browse, and recent-items views through a user interface for students and researchers.]
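Because ODL components talk to one another over OAI-PMH (the PMH/XPMH links in the figures), each component both consumes and exposes records, so components chain. The sketch replaces the HTTP protocol with a hypothetical in-process `list_records()` method to show how Union, Filter, and Recent compose over ETD sources; the class names and the `Record` fields are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Record:
    identifier: str
    datestamp: str  # ISO 8601 date, so string order matches time order
    title: str


class Source:
    """Stand-in for an ETD archive acting as a data provider."""
    def __init__(self, records: Iterable[Record]):
        self._records = list(records)

    def list_records(self) -> list[Record]:
        return list(self._records)


class UnionComponent:
    """Merge upstream components, keeping the first copy of each identifier."""
    def __init__(self, *upstream):
        self.upstream = upstream

    def list_records(self) -> list[Record]:
        seen: set[str] = set()
        out: list[Record] = []
        for comp in self.upstream:
            for rec in comp.list_records():
                if rec.identifier not in seen:
                    seen.add(rec.identifier)
                    out.append(rec)
        return out


class FilterComponent:
    """Pass through only records accepted by a predicate."""
    def __init__(self, upstream, keep: Callable[[Record], bool]):
        self.upstream, self.keep = upstream, keep

    def list_records(self) -> list[Record]:
        return [r for r in self.upstream.list_records() if self.keep(r)]


class RecentComponent:
    """Expose the n most recent records, newest first."""
    def __init__(self, upstream, n: int):
        self.upstream, self.n = upstream, n

    def list_records(self) -> list[Record]:
        recs = sorted(self.upstream.list_records(), key=lambda r: r.datestamp, reverse=True)
        return recs[: self.n]
```

Since every component offers the same interface as a source, `RecentComponent(FilterComponent(UnionComponent(a, b), keep), 10)` is itself a valid upstream for yet another component, which is what lets ODL deployments be assembled from parts.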
Open Digital Library Deployments
• NDLTD (www.ndltd.org)
• Computer Science Teaching Center (www.cstc.org)
• Computing and Information Technology Interactive Digital Educational Library (www.citidel.org)
• Open Archives Distributed (NSF, DFG) – enhancements to PhysNet
• OCKHAM
• Open to others through DL-in-a-box
Interest-based User Grouping Model
for Collaborative Filtering in Digital Libraries
7th ICADL 2004
Shanghai, P.R. China
Dec. 15, 2004
Edward A. Fox, Seonho Kim
Virginia Tech, Blacksburg, VA 24061 USA
Some Other Students/Projects
• Wensi Xi: matrices, reinforcement, clusters (Microsoft)
• Paul Mather: modeling/simulation of large DLs on clusters; characterization: uses, files (NASA)
• Ming Luo: personalization aided by demographics
• Ryan Richardson: CLIR with concept maps
• Xiaoyan Yu: Stepping Stones and Pathways (NSF; Fernando Das Neves completed & returned to Argentina)
• Baoping Zhang: physics and classification (NSF, DFG)
• Several: TREC with GP
• New projects:
– Superimposed information w. PSU (NSF NSDL)
– Quality and metasearch and structure w. Emory (IMLS)
• …