Page 1: Federating Repositories of Scientific Literature

Federating Repositories of Scientific Literature

The Interspace Prototype (1997-2000)

Digital Libraries Initiative (1994-1998)

Worm Community System (1990-1993)

Telesophy System (1984-1989)

www.canis.uiuc.edu

Page 2: Federating Repositories of Scientific Literature

Federating Repositories of Scientific Literature

The University of Illinois Digital Libraries Initiative (DLI)
Project Status & Retrospective

Bruce R. Schatz

http://dli.grainger.uiuc.edu

AAAS-98, Digital Libraries Session
Philadelphia, February 1998

Page 3: Federating Repositories of Scientific Literature

Evolution of Information Retrieval across the Net

[Figure: timeline from 1960 to 2010 showing Grand Visions, then Text Search (Syntax), Document Search (Structure), and Concept Search (Semantics)]

from: Bruce R. Schatz, “Information Retrieval in Digital Libraries: Bringing Search to the Net” cover article in Science, vol 275, Jan 17, 1997 special issue on Bioinformatics

Page 4: Federating Repositories of Scientific Literature

Illinois DLI Status

• Production Testbed based in a Real Library
– Document Search based on Structure
– SGML Publisher Stream deployed at U of Illinois

• Technology Research for Scalable Federation
– Concept Search based on Semantics
– Statistical Indexes across subjects and media

Page 5: Federating Repositories of Scientific Literature

Production Testbed Status

• Based in major Engineering Library
• Production Stream: in testbed before on shelves

• Full-text SGML: Federated Structure Search
• 5 publishers, 55 journals, 40,000 articles

• Web version campus rollout October 1997
• integrated within library information services

Page 6: Federating Repositories of Scientific Literature

Production Testbed Evaluation

• 700 users, steadily increasing to max 1500
• used in intro Computer Science classes

• developers and evaluators work closely
• needs assessment and usability studies

• careful multi-modal usage evaluation
• session observations and transaction logs

Page 7: Federating Repositories of Scientific Literature

Primary Partners

• journal/magazine Publishers:

– American Institute of Physics (AIP)

– American Physical Society (APS)

– American Astronomical Society (AAS)

– American Society of Civil Engineers (ASCE)

– American Society of Mechanical Engineers (ASME)

– American Society of Agricultural Engineers (ASAE)

– American Institute of Aeronautics & Astronautics (AIAA)

– Institute of Electrical and Electronics Engineers (IEEE)

– Institution of Electrical Engineers (IEE)

– IEEE Computer Society (IEEE-CS)

• testbed: SoftQuad, OpenText

• infrastructure: Hewlett-Packard, Microsoft

Page 8: Federating Repositories of Scientific Literature

DeLIver Search Interface

Page 9: Federating Repositories of Scientific Literature

DeLIver Search Results

Page 10: Federating Repositories of Scientific Literature

(Full Text Retrieval)

Page 11: Federating Repositories of Scientific Literature

Result of “Figure Caption Search”

Page 12: Federating Repositories of Scientific Literature

Dynamic Linking in Bibliography

Page 13: Federating Repositories of Scientific Literature

Testbed Difficulties

• Original plan was to modify Mosaic for search
– Web became commercial; we lost control of developers

• Plan was to use standard BRS as full-text backend
– needed to use SGML-specific OpenText search engine

• Good-quality SGML simply not available
– we had to train every publisher; nothing was ready

• SGML interactive display not journal quality
– physics requires equations, which are hard to display well

• Custom software hard to deploy widely
– Web widespread but too low-end for professional search

Page 14: Federating Repositories of Scientific Literature

Testbed Successes

• Willing to build custom encoding procedures
– so succeeded with SGML where Elsevier and OCLC failed

• Canonical encoding for structure tags
– so can federate across publishers and journals

• Willing to build custom software for Search
– so able to do multiple views, not a single stream like the Web

• Production repositories for real Publishers
– became R&D arm of major scientific publishers

• Changing the nature of libraries with research
– research prototype becomes standard service
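
The point about canonical structure tags can be illustrated with a minimal Python sketch. It is not the testbed's actual encoding: the publisher names, tag names, and sample articles below are all hypothetical, chosen only to show how mapping each publisher's tags onto one canonical set lets a single structure search (for example, a figure caption search) run across publishers.

```python
# Hypothetical per-publisher tag names mapped onto one canonical tag set.
# None of these names come from the slides; they are illustrative only.
CANONICAL_TAGS = {
    "publisher_a": {"arttitle": "title", "au": "author", "figcap": "caption"},
    "publisher_b": {"ti": "title", "aut": "author", "caption": "caption"},
}


def normalize(publisher, fields):
    """Rename a publisher's field tags to canonical tags, dropping unknown ones."""
    mapping = CANONICAL_TAGS[publisher]
    return {mapping[tag]: text for tag, text in fields.items() if tag in mapping}


def structure_search(articles, tag, term):
    """Return ids of articles whose canonical field `tag` contains `term`."""
    term = term.lower()
    return [a["id"] for a in articles if term in a["fields"].get(tag, "").lower()]


# Invented sample articles from two publishers with different tag names.
raw = [
    ("publisher_a", "a1", {"arttitle": "Laser cooling of ions", "figcap": "Trap geometry"}),
    ("publisher_b", "b1", {"ti": "Bridge fatigue tests", "caption": "Laser scan of girder"}),
]
articles = [{"id": i, "fields": normalize(p, f)} for p, i, f in raw]

# The same query can be restricted to different structural fields.
caption_hits = structure_search(articles, "caption", "laser")
title_hits = structure_search(articles, "title", "laser")
```

Once every article uses the same canonical tags, the search code needs no per-publisher logic, which is the property the slide credits for federation.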

Page 15: Federating Repositories of Scientific Literature

Technology Transfer

• Illinois DLI considered R&D arm of publishers
– broad spectrum of major publishers in scientific literature
– successful annual partners' workshop plus high-level visits

• Technology transferred to Publisher partners
– contract with AIP to clone testbed software & processing
– arrangements with ASCE for a second cloning

• Testbed Continuance by University Library
– industrial partners program between Library & Publishers
– company formed to provide software and service

Page 16: Federating Repositories of Scientific Literature

Technology Research

• Scalable Semantics becoming feasible
– statistical clustering proves useful interactively
– concept spaces and category maps

• Semantic indexes for large collections
– 400K Inspec (1995)
– 4M Compendex (1996)

• Simulation of Community Repositories
– 1000 collections across all of engineering
– testbed for vocabulary switching (federation)

Page 17: Federating Repositories of Scientific Literature

Vocabulary Switching

• Grand Challenge of Digital Libraries
– semantic interoperability across subject domains
– vocabulary switching to suggest across domains

• Generating 1000 community repositories
– 600 categories across engineering (38 top-level)
– 150 categories across EE, CS, physics
– 3M raw abstracts, about 10M in community spaces

• Large-scale supercomputer simulation
– 7 days of dedicated computation (10 days overall)
– have space navigation; need space intersection
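
One way to picture "space intersection" for vocabulary switching is the following Python sketch. It is an assumption, not the method used in the simulation: each concept space is a plain dictionary of term-to-term weights, the bridge terms are those a source term's neighbourhood shares with the target space, and the toy spaces and weights are invented.

```python
def switch_vocabulary(term, source_space, target_space, k=3):
    """Suggest target-domain terms for a source-domain `term`.

    Bridge terms are the term itself plus its source-space neighbours that
    also appear in the target space (a simple "space intersection").
    Scoring by the product of the two weights is an illustrative choice.
    """
    candidates = {term: 1.0, **source_space.get(term, {})}
    scores = {}
    for bridge, w_src in candidates.items():
        for target_term, w_tgt in target_space.get(bridge, {}).items():
            if target_term == term:
                continue
            score = w_src * w_tgt
            scores[target_term] = max(scores.get(target_term, 0.0), score)
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Toy, hand-written spaces; real ones would be computed from abstracts.
civil = {"fluid dynamics": {"turbulence": 0.8, "pipe flow": 0.6}}
ocean = {"turbulence": {"ocean mixing": 0.9, "wave breaking": 0.5}, "pipe flow": {}}

suggestions = switch_vocabulary("fluid dynamics", civil, ocean)
```

Here "turbulence" is the only bridge with neighbours in the target space, so the suggestions come from its target-side neighbourhood.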

Page 28: Federating Repositories of Scientific Literature

Multimedia Federation

• Semantic Indexing within Media
– Text, Image, Number

• Semantic Interoperability across Media
– Spatial Data (GIS) dataset intersection

• Multi-site DLI Collaboration
– U Illinois: systems and supercomputers
– U Arizona: algorithms and experiments
– UC Santa Barbara: collections and metadata

Page 29: Federating Repositories of Scientific Literature

Semantic Analysis of Multimedia

• Collections of Objects containing Units
– Text: community repository (topic proximity); document abstracts containing noun phrases
– Image: aerial photograph (spatial proximity); feature regions containing texture tiles

• Units are media-dependent (statistical parsers)
– Text: phrase segmentation (nouns on word parts of speech)
– Image: texture segmentation (orientation on pixel densities)

• Indexes are media-independent (statistical clusters)
– Concept: co-occurrence similarity of units within objects
– Category: self-organizing maps of objects within collections
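
The "co-occurrence similarity of units within objects" idea can be sketched in a few lines of Python. This is a simplified stand-in, not the project's concept space algorithm: it uses plain Jaccard similarity over the objects in which two units appear, and the sample abstracts and phrases are invented. Because it only sees sets of units, the same code would apply to noun phrases in abstracts or texture tiles in image regions.

```python
from collections import Counter
from itertools import combinations


def concept_space(objects):
    """Build unit-to-unit similarity from co-occurrence within objects.

    `objects` is a list of sets of units (e.g. noun phrases per abstract).
    Similarity is Jaccard: objects containing both units divided by
    objects containing either unit (an assumed, simplified measure).
    """
    unit_count = Counter()
    pair_count = Counter()
    for units in objects:
        unit_count.update(units)
        pair_count.update(combinations(sorted(units), 2))
    space = {}
    for (a, b), both in pair_count.items():
        score = both / (unit_count[a] + unit_count[b] - both)
        space.setdefault(a, {})[b] = score
        space.setdefault(b, {})[a] = score
    return space


def suggest(space, unit, k=3):
    """Return the k units most similar to `unit`, best first."""
    related = space.get(unit, {})
    return sorted(related, key=lambda u: (-related[u], u))[:k]


# Invented sample data: each set holds the noun phrases of one abstract.
abstracts = [
    {"neural network", "pattern recognition", "image processing"},
    {"neural network", "pattern recognition"},
    {"image processing", "texture analysis"},
]
space = concept_space(abstracts)
top_for_nn = suggest(space, "neural network")
```

Term suggestion then amounts to looking up a unit's strongest neighbours in the resulting space.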

Page 30: Federating Repositories of Scientific Literature

Media Interoperability Experiment

• Feature regions containing texture tiles in aerial photos
– 1M regions in 5K photos around southern California (GIS)

• Text concept space and category map in geoscience
– 10M phrases in 500K abstracts from Georef and Petroleum Abstracts

• Image concept space and category map in aerial photos
– tile similarity space and visual thesaurus maps (10M tiles)

• Numeric satellite sensor data
– 1M NASA AVHRR temperature records, 2M GNIS feature names

• Spatial gazetteer as bridge image<=>text<=>number
– images are labeled by GNIS gazetteer (feature names for text search)
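
The gazetteer bridge can be sketched as follows, under stated assumptions: photos are represented only by a latitude/longitude bounding box, gazetteer entries are named points, and all names and coordinates are invented placeholders rather than real GNIS records. The sketch shows how labeling images with feature names makes them reachable by text search.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    """A named gazetteer point (hypothetical, not a real GNIS record)."""
    name: str
    lat: float
    lon: float


@dataclass
class Photo:
    """An aerial photo reduced to its bounding box."""
    photo_id: str
    south: float
    west: float
    north: float
    east: float

    def covers(self, feature):
        """True if the feature's point lies inside this photo's bounding box."""
        return (self.south <= feature.lat <= self.north
                and self.west <= feature.lon <= self.east)


def label_photos(photos, gazetteer):
    """Attach to each photo the names of gazetteer features it covers."""
    return {p.photo_id: [f.name for f in gazetteer if p.covers(f)] for p in photos}


def search_photos(labels, text):
    """Text search over feature names returns the photos they label."""
    text = text.lower()
    return sorted(pid for pid, names in labels.items()
                  if any(text in n.lower() for n in names))


# Invented placeholder data.
gazetteer = [Feature("Example Lake", 34.10, -118.30), Feature("Sample Peak", 34.50, -117.90)]
photos = [Photo("p1", 34.0, -118.5, 34.2, -118.2), Photo("p2", 34.4, -118.0, 34.6, -117.8)]

labels = label_photos(photos, gazetteer)
lake_photos = search_photos(labels, "lake")
```

The same coordinates could key numeric records, which is how a gazetteer can link image, text, and number.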

Page 33: Federating Repositories of Scientific Literature

Federated Search

• Multiple Indexes in Distributed Repositories
– text search: SGML for full-text articles (Testbed); bibliographic abstracts for full coverage (INSPEC)
– term suggestion: thesaurus for taxonomy (INSPEC); concept spaces for term coverage (SGML)

• Multiple View User Interface Client
– uniform displays for multiple indexes
– drag-and-drop between display views to mix-and-match
– uniform search across multiple repositories

• Multiple Protocol Stateful Gateway
– single query stream analogous to single user interface
– will handle distributed repositories for federation, e.g. AAS
– OpenText (socket), term-suggest (SQL), Ovid/DRA (Z39.50)
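
The gateway idea, one query stream fanned out to repositories that each speak a different protocol, can be sketched in Python. This is a hypothetical illustration, not the DLI gateway: the class names are invented, and an in-memory backend stands in for real socket, SQL, or Z39.50 adapters, which would implement the same `search` method.

```python
class InMemoryBackend:
    """Stand-in for a real protocol adapter (socket, SQL, Z39.50).

    A real adapter would translate the query into its own protocol;
    this one just matches substrings in titles.
    """

    def __init__(self, name, docs):
        self.name = name
        self.docs = docs  # maps doc_id -> title

    def search(self, query):
        q = query.lower()
        return [(doc_id, title) for doc_id, title in self.docs.items()
                if q in title.lower()]


class Gateway:
    """Stateful gateway: one query stream in, many backends out."""

    def __init__(self, backends):
        self.backends = backends
        self.history = []  # session state: queries seen so far

    def search(self, query):
        """Send the query to every backend and merge results in one format."""
        self.history.append(query)
        results = []
        for backend in self.backends:
            for doc_id, title in backend.search(query):
                results.append({"source": backend.name, "id": doc_id, "title": title})
        return results


# Invented sample repositories.
gateway = Gateway([
    InMemoryBackend("fulltext", {"ft1": "Concept spaces for engineering"}),
    InMemoryBackend("abstracts", {"ab1": "Engineering thesaurus design",
                                  "ab2": "Worm genetics"}),
])
hits = gateway.search("engineering")
```

Because results are merged into one record format tagged with their source, a client can show uniform displays over several repositories.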

Page 34: Federating Repositories of Scientific Literature

IODyne Engineering Search Example

Page 35: Federating Repositories of Scientific Literature

Building a new Community

starting the field of Digital Libraries

• IEEE Computer DLI special issue, May 1996
• Computer DLI retrospective planned for 1999

• Allerton workshops on DL Sociology
• edited book planned on DL Evaluation

• DLI National Coordination effort
• Illinois DLI retrospective conference (Mar 98)

Page 36: Federating Repositories of Scientific Literature

The 21st Century: Analysis

• Beyond Search to Analysis
• Cross-Correlating Information from many sources across the Net
• The Net solves problems

• Every community has its own special library
• Every community and every person does indexing!!

• The Internet evolves into the Interspace