FEDORA at Northwestern University
Bill Parod
Academic Technologies
Northwestern University
CNI April 15, 2004, Northwestern University, Bill Parod
Academic Technologies
General Background
• Academic Technologies
• Faculty projects
• Library partnerships
• Institutional partnerships
• Diverse clientele
• Diverse content
• “One-off” projects
Current FEDORA Projects
• Block Museum of Art
• The Last Expression Art Collection
• Introduction to Asian Art History
• BBC Spoken Word Archive
• Paris Map Collection
• Encyclopedia of Chicago
• WordHoard Text Analysis Project
Diversity of Content
• Art collections, wall murals, photographs, historical maps, GIS maps, newspapers, book page images
• Digital video, spoken word, literary works, encyclopedias, lexical data, census data, event data
Diversity of Systems
• Wavelet image servers
• Vector image processors
• Streaming media servers
• RDBMS
• XML databases
• XSLT processors
• GIS
• Servlet engines
Abstract Image Models
• Art collections, wall murals, photographs, historical maps, GIS maps, newspapers, book page images
• Digital video, spoken word, literary works, encyclopedias, lexical data, census data, event data
4 Image Behavior Classes
• Core behavior
  – getCoverpage
  – getThumbnail
• Basic image (UVa)
  – getThumbnail
  – getMedium
  – getHigh
  – getVeryHigh
• Addressable image
  – getRegion(rgn,size)
  – getViewer
• Layered image
  – getRegion(,,layers)
  – getViewer(layers)
• Geographic image
  – getRegion(,,, coords)
  – getViewer(, coords)
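One way to picture these behavior classes is as a set of abstract interfaces. The sketch below is only an illustration in Python, not the FEDORA behavior definitions themselves. It assumes that each class extends the previous one, that the blank parameter slots in `getRegion(,,layers)` stand for the inherited `rgn` and `size`, and that every method returns a string; `DemoZoomImage` and its return values are hypothetical.

```python
from abc import ABC, abstractmethod


class CoreBehavior(ABC):
    """Behaviors listed under 'Core behavior'."""

    @abstractmethod
    def getCoverpage(self) -> str: ...

    @abstractmethod
    def getThumbnail(self) -> str: ...


class BasicImage(CoreBehavior):
    """Fixed-resolution renditions (the 'Basic image (UVa)' class)."""

    @abstractmethod
    def getMedium(self) -> str: ...

    @abstractmethod
    def getHigh(self) -> str: ...

    @abstractmethod
    def getVeryHigh(self) -> str: ...


class AddressableImage(BasicImage):
    """Adds region addressing and a viewer."""

    @abstractmethod
    def getRegion(self, rgn, size) -> str: ...

    @abstractmethod
    def getViewer(self) -> str: ...


class LayeredImage(AddressableImage):
    """Assumed reading of getRegion(,,layers): rgn and size carry over."""

    @abstractmethod
    def getRegion(self, rgn, size, layers=None) -> str: ...

    @abstractmethod
    def getViewer(self, layers=None) -> str: ...


class DemoZoomImage(AddressableImage):
    """Hypothetical concrete object; it only echoes the request it received."""

    def __init__(self, pid):
        self.pid = pid

    def getCoverpage(self): return f"{self.pid}/getCoverpage"
    def getThumbnail(self): return f"{self.pid}/getThumbnail"
    def getMedium(self): return f"{self.pid}/getMedium"
    def getHigh(self): return f"{self.pid}/getHigh"
    def getVeryHigh(self): return f"{self.pid}/getVeryHigh"
    def getViewer(self): return f"{self.pid}/getViewer"

    def getRegion(self, rgn, size):
        return f"{self.pid}/getRegion?rgn={rgn}&size={size}"
```

Inheritance is a simplification chosen for brevity; an object could equally subscribe to several independent behavior sets.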
4 Image Content Models
• Core behavior
  – XML Metadata
  – HTML XSLT script
  – Thumbnail Image
• Basic image (UVa)
  – Thumbnail jpeg
  – Medium Res jpeg
  – High Res jpeg
  – Very High Res jpeg
• Addressable image
  – Image metadata
  – Viewer XSLT script
• Layered image
  – Layer metadata
• Geographic image
  – World file for projection
Images    Simple        Zoom          Layers
Core      getThumbnail  getThumbnail  getThumbnail
          getCoverpage  getCoverpage  getCoverpage
Basic     getThumbnail  getThumbnail  getThumbnail
          getMedium     getMedium     getMedium
          getHigh       getHigh       getHigh
          getVeryHigh   getVeryHigh   getVeryHigh
Addr                    getRegion     getRegion
                        getViewer     getViewer
Layered                               getRegion
                                      getViewer
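The matrix above can be read as each image type subscribing to a cumulative list of behavior sets. The following is a minimal lookup sketch of that reading. The assignment of the Addr and Layered rows to the Zoom and Layers columns is inferred from the layout, and the names `BEHAVIOR_SETS`, `SUBSCRIPTIONS`, and `supports` are hypothetical.

```python
# Methods offered by each behavior set, as listed in the matrix.
BEHAVIOR_SETS = {
    "Core": ["getThumbnail", "getCoverpage"],
    "Basic": ["getThumbnail", "getMedium", "getHigh", "getVeryHigh"],
    "Addr": ["getRegion", "getViewer"],
    "Layered": ["getRegion", "getViewer"],
}

# Behavior sets each image type subscribes to (assumed from the column layout).
SUBSCRIPTIONS = {
    "Simple": ["Core", "Basic"],
    "Zoom": ["Core", "Basic", "Addr"],
    "Layers": ["Core", "Basic", "Addr", "Layered"],
}


def supports(image_type: str, method: str) -> bool:
    """Return True if any behavior set of the image type offers the method."""
    return any(method in BEHAVIOR_SETS[name] for name in SUBSCRIPTIONS[image_type])
```

An application could use such a check to decide whether to offer a zoom control for a given object.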
BDEF Interface Definition
BMECH Description
• Method bindings to implementation
• HTTP URL templates to image servlet
• Accepts image server metadata stream
• Accepts specific user parameters
• Provides implementation flexibility
• Currently using TrueSpectra/Scene7 image server
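The binding idea can be sketched as a table of URL templates that is filled in from the object's image metadata and the caller's parameters. This is a rough illustration only: the template syntax, the query parameter names, and the host `images.example.edu` are invented for the example and are not the TrueSpectra/Scene7 URL format or the FEDORA BMECH schema.

```python
from string import Template
from urllib.parse import quote

# Hypothetical method-to-URL-template bindings for an image servlet.
BINDINGS = {
    "getThumbnail": Template("$server/image?src=$path&size=thumb"),
    "getRegion": Template("$server/image?src=$path&rgn=$rgn&size=$size"),
}


def bind(method: str, image_metadata: dict, **user_params) -> str:
    """Fill the method's template from stored metadata plus user parameters.

    image_metadata stands in for the image server metadata stream;
    user_params are the values supplied with the dissemination request.
    A missing value raises KeyError instead of producing a partial URL.
    """
    values = dict(image_metadata)
    for name, value in user_params.items():
        values[name] = quote(str(value), safe=",")
    return BINDINGS[method].substitute(values)


if __name__ == "__main__":
    meta = {"server": "http://images.example.edu", "path": "demo/0001.tif"}
    print(bind("getRegion", meta, rgn="0,0,512,512", size=256))
```

Swapping the image server would then mean changing only the templates, which is one way to read "implementation flexibility".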
getCoverPage() for simple image – Block Museum Collection
getCoverPage() for zoomable image – History of Asian Art class
Ingesting Images
• Imaging person deposits master TIFF images in WebDAV-enabled file store
• Image server configured with “virtual path” to WebDAV store for master TIFF image
• TIFF master is converted to FlashPix and cached in image server
• Image server handles request for FEDORA dissemination
Image Workflow: FEDORA – TrueSpectra – Xythos
[Diagram labels: Metadata in Excel, METS, FEDORA; TIFFs in Xythos, TrueSpectra Image Server; Dissemination Requests; Department, Academic Technologies. Legend: Data flow, Requests, Users.]
• Catalog in Excel converted to METS for FEDORA ingest
• TIFF masters deposited in collection’s Xythos directory
• Access to Xythos directory enabled for TrueSpectra virtual paths
• METS/FEDORA record includes link to TrueSpectra image
• Access to image is through FEDORA image behaviors
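The catalog-to-METS step might look roughly like the sketch below. It assumes the Excel catalog has been exported as CSV with the hypothetical columns `id`, `title`, `creator`, and `filename`, and it emits a pared-down METS record with one Dublin Core descriptive section and one file link. It omits `structMap` and anything specific to FEDORA's ingest profile, so it is not a valid ingest file as written.

```python
import csv
import io
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("METS", METS_NS)
ET.register_namespace("xlink", XLINK_NS)
ET.register_namespace("dc", DC_NS)

# Hypothetical catalog export; real column names will differ.
DEMO_CATALOG = "id,title,creator,filename\ndemo:1,Untitled study,Unknown,demo/0001.tif\n"


def row_to_mets(row: dict, image_base_url: str) -> ET.Element:
    """Build a minimal METS element for one catalog row."""
    mets = ET.Element(f"{{{METS_NS}}}mets", {"OBJID": row["id"], "LABEL": row["title"]})

    # Descriptive metadata wrapped as Dublin Core.
    dmd = ET.SubElement(mets, f"{{{METS_NS}}}dmdSec", {"ID": "DMD1"})
    wrap = ET.SubElement(dmd, f"{{{METS_NS}}}mdWrap", {"MDTYPE": "DC"})
    data = ET.SubElement(wrap, f"{{{METS_NS}}}xmlData")
    ET.SubElement(data, f"{{{DC_NS}}}title").text = row["title"]
    ET.SubElement(data, f"{{{DC_NS}}}creator").text = row["creator"]

    # Link to the image as served by the image server (URL shape is assumed).
    file_sec = ET.SubElement(mets, f"{{{METS_NS}}}fileSec")
    group = ET.SubElement(file_sec, f"{{{METS_NS}}}fileGrp")
    file_el = ET.SubElement(group, f"{{{METS_NS}}}file", {"ID": "FILE1"})
    ET.SubElement(
        file_el,
        f"{{{METS_NS}}}FLocat",
        {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": image_base_url + row["filename"]},
    )
    return mets


if __name__ == "__main__":
    for row in csv.DictReader(io.StringIO(DEMO_CATALOG)):
        record = row_to_mets(row, "http://images.example.edu/view?src=")
        print(ET.tostring(record, encoding="unicode"))
```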
Physical Collection Management Scenario: FEDORA – Content Service – Xythos Integration
[Diagram labels: Files in Xythos, Auto-ingester, FEDORA; TrueSpectra, Streaming Server, Search; Dissemination Requests; Metadata update; Faculty or Support, Academic Technologies. Legend: Data flow, Requests, Users.]
• FEDORA collection object attached to Xythos directory
• Xythos notifies collection object of changes in the directory
• File added – collection creates new member item
• File updated – item accepts new version for file stream
• File removed – item is set dormant in FEDORA
• Metadata added/updated online or batch
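The three notification cases above can be sketched as a small event handler on the collection object. This is a toy in-memory model, not the FEDORA or Xythos API; the event names, the `Item` and `Collection` classes, and the PID scheme are all assumptions made for the illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    pid: str
    versions: int = 1        # number of versions held for the file stream
    state: str = "active"    # "active" or "dormant"


@dataclass
class Collection:
    pid: str
    members: dict = field(default_factory=dict)  # filename -> Item

    def on_directory_change(self, event: str, filename: str) -> Item:
        """Apply one directory notification to the collection."""
        if event == "added":
            # File added: the collection creates a new member item.
            item = Item(pid=f"{self.pid}-{len(self.members) + 1}")
            self.members[filename] = item
        elif event == "updated":
            # File updated: the item accepts a new version of its file stream.
            item = self.members[filename]
            item.versions += 1
        elif event == "removed":
            # File removed: the item is kept but marked dormant.
            item = self.members[filename]
            item.state = "dormant"
        else:
            raise ValueError(f"unknown event: {event}")
        return item
```

Keeping removed items as dormant members, instead of deleting them, preserves their identifiers for anything that already references them.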
Basic Collection Object
• Collection behavior
  – getSearchForm
  – performSearch()
  – getItem()
  – getItems()
  – addItem()
  – deleteItem()
  – reindex()
  – displayItem()
• Core behavior
  – getCoverpage
  – getThumbnail
• Collections: Block Museum of Art, The Last Expression, Vesalius Figures, BBC Audio, History of Asian Art
Collection Content Model
• Search form
• XSLT for search results
• Index
• Header/footer XML for result stream
• Member PIDs
Search Implementation
• FEDORA METS files currently indexed offline
• Plan to integrate update notification and indexing
• Search engine
  – Have 3 implementations: FEDORA native search, Sgrep, OpenText
• Investigating SRW/CQL
• Search results passed through XSLT
• Easy to provide search capability to collections
FEDORA – External Service
[Diagram labels: Dissemination Requests, FEDORA, BMECH, Data Request, Dissemination, External Services (Image Server, Search Engine), Cache data.]
Virtual Collections
• Collection maintenance
  – Topical galleries
• Ad-hoc or dynamic collections
  – For classes...
  – personal collections…
  – special exhibits…
Database Integration
• SQL/XQuery for object “data streams”
• SQL/XQuery for object disseminations
Encyclopedia of Chicago
• In active development
• Metadata continually updated by research staff in Microsoft Access
• New content continually added to MS Access and file store
• Varied entry types
• All have dynamic “See Also”s
SQL Datastreams
• “See Also” and “Content” datastreams
  – Cocoon URLs that perform SQL queries on dynamic research data and convert to XML
  – Dynamic updates during development
  – When project finished, will consider moving to more robust database or “freeze” streams in the repository as “managed”
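The query-then-convert pattern behind such a datastream can be sketched without Cocoon. The example below uses an in-memory SQLite database in place of the project's research database; the `see_also` table, its columns, the sample rows, and the output element names are hypothetical and are not taken from the Encyclopedia of Chicago data.

```python
import sqlite3
import xml.etree.ElementTree as ET


def make_demo_db() -> sqlite3.Connection:
    """Create a throwaway database standing in for the research data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE see_also (entry_id INTEGER, target_id INTEGER, title TEXT)")
    conn.executemany(
        "INSERT INTO see_also VALUES (?, ?, ?)",
        [(1, 2, "Beta"), (1, 3, "Alpha"), (2, 1, "Gamma")],
    )
    return conn


def see_also_xml(conn: sqlite3.Connection, entry_id: int) -> str:
    """Run the SQL query for one entry and serialize the rows as XML."""
    rows = conn.execute(
        "SELECT target_id, title FROM see_also WHERE entry_id = ? ORDER BY title",
        (entry_id,),
    )
    root = ET.Element("seeAlso", {"entry": str(entry_id)})
    for target_id, title in rows:
        ET.SubElement(root, "ref", {"target": str(target_id)}).text = title
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(see_also_xml(make_demo_db(), 1))
```

Because the XML is produced at request time, edits made by research staff show up in the next dissemination; "freezing" the stream would amount to storing one such result in the repository.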
FEDORA – External Service
[Diagram labels: Dissemination Requests, FEDORA, BMECH, Data Request, Dissemination, Data stream, External Services (RDBMS, Search Engine, Image Server), Cache data.]
WordHoard Text Analysis
• Large TEI XML e-text corpora
• Word-level grammatical and frequency data
• Text requests via XQuery
• Word-level lexical queries via SQL
Basic Text Behavior
• BMECH backed by eXist database
Viewer Object
• Presentation uncoupled from data object
Example Book Model
TEXT TOC Service
• Request for TOC keyed by text PID
• TOC XML requested from text
• TOC DOM cached in service
• User requests with “open nodes” parameter
• Pruned DOM styled with XSLT from Viewer content model
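The caching and pruning steps listed above can be sketched as follows. The TOC format (nested `node` elements with `id` attributes), the `TocService` class, and the fetch callback are assumptions made for the illustration, and the final XSLT styling step is left out because the Python standard library has no XSLT processor.

```python
import copy
import xml.etree.ElementTree as ET


class TocService:
    """Caches one parsed TOC per text PID and serves pruned copies."""

    def __init__(self, fetch_toc_xml):
        self._fetch = fetch_toc_xml  # callable: pid -> TOC XML string
        self._cache = {}             # pid -> parsed TOC root element

    def _toc(self, pid):
        # Request the TOC XML from the text only on the first request.
        if pid not in self._cache:
            self._cache[pid] = ET.fromstring(self._fetch(pid))
        return self._cache[pid]

    def pruned(self, pid, open_nodes):
        """Return a copy of the TOC with children kept only under open nodes."""
        root = copy.deepcopy(self._toc(pid))  # never mutate the cached tree
        self._prune(root, set(open_nodes), is_root=True)
        return root

    def _prune(self, node, open_nodes, is_root=False):
        if is_root or node.get("id") in open_nodes:
            for child in node:
                self._prune(child, open_nodes)
        else:
            # Closed node: drop its whole subtree.
            for child in list(node):
                node.remove(child)


DEMO_TOC = (
    '<toc>'
    '<node id="a"><node id="a1"><node id="a1x"/></node><node id="a2"/></node>'
    '<node id="b"><node id="b1"/></node>'
    '</toc>'
)

if __name__ == "__main__":
    service = TocService(lambda pid: DEMO_TOC)
    print(ET.tostring(service.pruned("demo:text", ["a"]), encoding="unicode"))
```

Pruning a deep copy keeps the cached tree intact, so requests with different "open nodes" values do not interfere with each other.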
Abstract Text Model
• Art collections, wall murals, photographs, historical maps, GIS maps, newspapers, book page images
• Digital video, spoken word, literary works, encyclopedias, lexical data, census data, event data
Text Methods
• Structured text (UVa)
  – getHeading
  – getTOC(level)
  – getChunk(idref)
  – getPage(idref)
• Core behavior
  – getCoverpage
  – getThumbnail
Time-based Media Model
• Digital video, spoken word, literary works, encyclopedias, lexical data, census data, event data
• Art collections, wall murals, photographs, historical maps, GIS maps, newspapers, book page images
Time-based Media Behaviors
• Core behavior
  – getCoverpage
  – getThumbnail
• Time-based media
  – Play
  – playSection()
Behaviors by Type
[Table: columns Image, Map, A/V, Book, News, EText; rows Core, Image, Hi-Res, Layered, Geo, Time, Text. Cell markings were not preserved in the transcript.]
Next Steps
• Implement more object types
  – Event, video, tabular data
• Transactions
  – Ad-hoc groupings of repository objects
  – Asset management, annotation
  – Access control for user editing
• Interoperability
  – Search protocols and repository interactions
• Consider application models
  – Specialized clients
Specialized Clients
Viewer Object
Summary
• Code reuse through object abstraction
• Flexible implementation binding
• Comprehensible APIs for applications
• Stable APIs for content reuse