A Multi-Tiered Architecture for Distributed Data Collection and Centralized Data Delivery
Stacy Kowalczyk and James Halliday
April 28, 2008
Project Overview
IN Harmony is
• An IMLS funded grant
• Awarded in Fall 2004
• To be completed in Fall 2008
• A partnership of
  • Indiana University Digital Library Program
  • Indiana University Lilly Library
  • Indiana State Library
  • Indiana State Museum
  • Indiana Historical Society
April 28, 2008 | IN Harmony – DLP Spring Forum 2008
Project Goals
1. To provide a model for fostering collaborative digital library development by partnering with institutions with complementary collections;
2. To digitize a portion of the sheet music from these collections and offer access to these materials free of charge on the web;
3. To bring these materials and their attendant metadata together on a single web site, offering both federated searching of the entire collection and searching of one or more selected collections;
Deliverables
• Tools to
  • Process the images
  • Capture metadata
  • Provide search and display functions
• 10,000 pieces of sheet music scanned and cataloged
  • 4,000 Indiana University Lilly Library
  • 2,000 Indiana State Library
  • 2,000 Indiana State Museum
  • 2,000 Indiana Historical Society
Cataloging and Imaging Workflow Goals
• Data integrity
  • Quality of the scans
  • Quality of the metadata
  • Accuracy of the links between page images
  • Accuracy of the links between metadata and images
• Simplicity of use
• Balance of flexibility and constraints
Cataloging and Imaging Use Cases
1. Catalog first
2. Scanning first
3. Metadata created in another system and imported into IN Harmony
Digitizing Quality Control
• Two-phase Quality Control process
• Automated QC process verifies:
  • All TIFF tags of every digital file
  • TIFF must be uncompressed
  • File names
  • Embedded profile appropriate to its bit depth
  • Consistency of pixel dimensions within a score
  • Appropriate resolution
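The automated checks listed above can be sketched as a small validation function. This is only an illustration, not the project's QC software: the `PageScan` record, the file-name pattern, the 400 dpi minimum, and the pairing of bit depth with profile type are all hypothetical, and the metadata is assumed to have been read from the TIFF tags by some other tool. "Consistency of pixel dimensions" is read strictly here as identical dimensions on every page. The one fixed fact used is that a TIFF Compression tag value of 1 means uncompressed.

```python
import re
from dataclasses import dataclass

# Hypothetical naming convention: <score id>-<3-digit page>.tif
NAME_PATTERN = re.compile(r"^[a-z0-9]+-\d{3}\.tif$")
MIN_DPI = 400  # assumed threshold, not taken from the slides
# Assumed pairing of bit depth and colour profile type
EXPECTED_PROFILE = {8: "gray", 24: "rgb"}


@dataclass
class PageScan:
    """Metadata for one page, assumed to be pre-extracted from TIFF tags."""
    filename: str
    compression: int  # TIFF Compression tag; 1 means uncompressed
    bit_depth: int
    profile: str
    width: int
    height: int
    dpi: int


def check_score(pages):
    """Return a list of human-readable QC failures for one score."""
    errors = []
    for p in pages:
        if not NAME_PATTERN.match(p.filename):
            errors.append(f"{p.filename}: bad file name")
        if p.compression != 1:
            errors.append(f"{p.filename}: TIFF is compressed")
        if EXPECTED_PROFILE.get(p.bit_depth) != p.profile:
            errors.append(f"{p.filename}: profile does not match bit depth")
        if p.dpi < MIN_DPI:
            errors.append(f"{p.filename}: resolution below {MIN_DPI} dpi")
    # Every page of a score must share the same pixel dimensions
    if len({(p.width, p.height) for p in pages}) > 1:
        errors.append("pixel dimensions differ between pages")
    return errors
```

An empty list means the score can move on to manual QC; any entries would be reported back to the person who uploaded the scans.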
Digitizing Quality Control (2)
• Manual QC – at 100% pixel display, verify:
  • Correct page orientation and order
  • Correct color balance
  • Sharp and in-focus scan
  • No digital artifacts
• When all QC is passed, derivative files are created
  • Large and small JPGs for screen delivery
  • PDF sized for 8.5 x 11 printing
Digitizing Quality Control Software
Designing the metadata model
• User studies
• Work with the partners
• Define fields
• Write cataloging guidelines with partner input
• Representation in MODS
Types of fields
• Title elements
• Name elements
• Publication elements
• Subject elements
• Identification elements
• Note elements
• Cover information
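As a rough illustration of how these field types might be represented in MODS, the sketch below builds a small record with Python's standard XML library. The MODS element names (`titleInfo`, `name`, `originInfo`, `subject`, `identifier`, `note`) are standard, but the mapping from IN Harmony fields to them, the input dictionary keys, and the sample values are assumptions made for this example. Cover information is left out because the slides do not say how it is encoded.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def q(tag):
    """Qualify a tag name with the MODS namespace."""
    return f"{{{MODS_NS}}}{tag}"


def build_mods(record):
    """Map a flat dict with hypothetical keys to a MODS element tree."""
    mods = ET.Element(q("mods"))
    title_info = ET.SubElement(mods, q("titleInfo"))        # title elements
    ET.SubElement(title_info, q("title")).text = record["title"]
    name = ET.SubElement(mods, q("name"))                   # name elements
    ET.SubElement(name, q("namePart")).text = record["composer"]
    origin = ET.SubElement(mods, q("originInfo"))           # publication elements
    ET.SubElement(origin, q("publisher")).text = record["publisher"]
    ET.SubElement(origin, q("dateIssued")).text = record["date"]
    subject = ET.SubElement(mods, q("subject"))             # subject elements
    ET.SubElement(subject, q("topic")).text = record["subject"]
    ET.SubElement(mods, q("identifier")).text = record["identifier"]
    ET.SubElement(mods, q("note")).text = record["note"]
    return mods


# Invented sample values, for illustration only
example = {
    "title": "Example Waltz",
    "composer": "Doe, Jane",
    "publisher": "Example Music Co.",
    "date": "1905",
    "subject": "Waltzes",
    "identifier": "example-0001",
    "note": "Sample record",
}
xml_text = ET.tostring(build_mods(example), encoding="unicode")
```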
Metadata Collection Tool
Public Search and Discovery System
Demo
ARCHITECTURE OVERVIEW
JIM HALLIDAY
IN Harmony Technical Overview
[Architecture diagram. Components and connection labels shown: Scanner, FTP, Quality Control (Perl Web Application), Cataloging Client (Java Swing), Authentication Service, Oracle, MODS Export, Fedora, Mass Storage System, SRU and http, Web Browser]
Getting Data Into IN Harmony
2 primary data sources
• Cataloging client
• Image QC/upload application
Other data sources
• XML data exported from other cataloging systems
• Score images exported from older systems
Image QC/upload application
1. User scans scores and uploads to IN Harmony server
2. User accesses Perl-based web application to initiate automated quality control
3. A second user proceeds with manual QC, then uses web application to signal that manual QC is finished
4. The application moves and backs up the files, creates derivatives, and alerts both Fedora and the internal database that the process is complete
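The four steps above can be read as a small state machine. The sketch below is a hypothetical model written in Python for illustration (the actual application is Perl-based); the class and stage names are invented, the step 4 actions are recorded as placeholder log entries rather than performed, and the rule that the manual QC user must differ from the uploader is an assumption drawn from the phrase "a second user".

```python
from enum import Enum


class Stage(Enum):
    UPLOADED = "uploaded"
    AUTO_QC_PASSED = "auto_qc_passed"
    MANUAL_QC_PASSED = "manual_qc_passed"
    COMPLETE = "complete"


# User-driven transitions (steps 2 and 3)
NEXT_STAGE = {
    Stage.UPLOADED: Stage.AUTO_QC_PASSED,
    Stage.AUTO_QC_PASSED: Stage.MANUAL_QC_PASSED,
}


class ScoreBatch:
    """Tracks one uploaded score through the QC workflow."""

    def __init__(self, score_id, uploaded_by):
        self.score_id = score_id
        self.uploaded_by = uploaded_by
        self.stage = Stage.UPLOADED  # step 1 has already happened
        self.log = []

    def advance(self, user):
        """Move the batch to the next stage on behalf of `user`."""
        if self.stage not in NEXT_STAGE:
            raise ValueError("batch is already complete")
        if self.stage is Stage.AUTO_QC_PASSED and user == self.uploaded_by:
            # Assumed rule: manual QC is done by someone other than the uploader
            raise PermissionError("manual QC needs a second user")
        self.stage = NEXT_STAGE[self.stage]
        self.log.append((user, self.stage.value))
        if self.stage is Stage.MANUAL_QC_PASSED:
            self._finalize()

    def _finalize(self):
        """Step 4: placeholders for the work the application does itself."""
        for action in ("move and back up files",
                       "create derivatives",
                       "notify Fedora and internal database"):
            self.log.append(("application", action))
        self.stage = Stage.COMPLETE
```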
IN Harmony Derivatives
• Three sizes of JPGs produced per page
  • Full (1200px high)
  • Screen (600px high)
  • Thumb (200px high)
• Multi-page, playable PDF
• Approx. 1MB for an average score
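Since the slides give only target heights, the width of each derivative presumably follows from the master image's aspect ratio. A minimal sketch of that arithmetic, assuming the aspect ratio is preserved and widths are rounded to whole pixels (the function and dictionary names are made up for this example):

```python
# Target heights in pixels, as listed on the slide
DERIVATIVE_HEIGHTS = {"full": 1200, "screen": 600, "thumb": 200}


def derivative_sizes(master_width, master_height):
    """Return {name: (width, height)} scaled to each target height."""
    sizes = {}
    for name, target_h in DERIVATIVE_HEIGHTS.items():
        # Keep the aspect ratio; round to the nearest whole pixel
        width = round(master_width * target_h / master_height)
        sizes[name] = (width, target_h)
    return sizes
```

For a 3000 x 4000 pixel master, this gives 900 x 1200, 450 x 600, and 150 x 200.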
IN Harmony cataloging client
• Standalone Java Swing-based client
• Connects to Oracle database and outputs MODS for Fedora ingestion
• Implemented as a client-server application via web services using Axis
• Specialized UI components (such as ‘smart’ combo boxes) assist with quick, correct data entry
Internal IN Harmony database
• Oracle database stores record and user data in our own internal format

• Communicates with upload/QC application and cataloging client
• Cataloging client and internal scripts can output to MODS format for ingestion into Fedora
IN Harmony authentication
• CAS (IU’s Central Authentication Service) is used to authenticate all users
• Non-IU users must create IU Guest Accounts to authenticate
• All account/password maintenance in user’s control
Fedora and IN Harmony
• Fedora used as a single storage and infrastructure solution for Digital Library Program projects at IU
• Data (score images and metadata) ingested into Fedora and referenced as METS objects
• Master images sent to IU’s mass storage system
• Derivatives stored internally
• Objects indexed using Lucene for SRU-based searching
Fedora Object Model
[Diagram: object levels shown, top to bottom: Collection, Sheet music, Copy, Page]
IN Harmony end-user interface
- Java Struts-based web application
- Offers searching, browsing, and record display
- Each partner institution is offered a personalized view of their data only
Interaction with Fedora
- Application sends CQL queries to Fedora and retrieves MODS data which is transformed via XSLT
- PURLs (persistent URLs) are used to access image derivatives
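To make the query step concrete, here is a minimal sketch of how an SRU `searchRetrieve` request carrying a CQL query can be assembled. It is written in Python for illustration, whereas the real front end is a Java Struts application. The parameter names are standard SRU; the endpoint URL, the protocol version, the `mods` schema name, and the CQL index names `title` and `collection` are assumptions, since the slides do not give them.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; the real server address is not given in the slides
SRU_BASE = "https://fedora.example.edu/sru"


def sru_search_url(cql_query, start=1, maximum=10):
    """Build an SRU searchRetrieve URL carrying a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",          # assumed protocol version
        "query": cql_query,
        "startRecord": start,
        "maximumRecords": maximum,
        "recordSchema": "mods",    # assumed schema short name
    }
    return SRU_BASE + "?" + urlencode(params)


# Index names here are illustrative; a real server defines its own
url = sru_search_url('title = "waltz" and collection = "lilly"')
```

The response to such a request would contain the MODS records, which the application could then pass through its XSLT stylesheets for display.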
METS Navigator
• METS Navigator is used to page through scores online
• Uses METS structMap to facilitate navigation
• Allows views of multiple sizes of images
• Released by IU as open source – see http://metsnavigator.sourceforge.net
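The role of the structMap in paging can be illustrated with a short sketch. This is not METS Navigator's code (which is a separate open-source application); it only shows the general idea of reading page order from a METS document. The sample document is invented, and the `TYPE` values `score` and `page` are assumptions, since METS leaves them to local convention.

```python
import xml.etree.ElementTree as ET

METS = "{http://www.loc.gov/METS/}"

# Invented two-page example, trimmed to the parts used for paging
SAMPLE = """<mets xmlns="http://www.loc.gov/METS/">
  <structMap TYPE="physical">
    <div TYPE="score">
      <div TYPE="page" ORDER="2" LABEL="Page 1"><fptr FILEID="p2"/></div>
      <div TYPE="page" ORDER="1" LABEL="Cover"><fptr FILEID="p1"/></div>
    </div>
  </structMap>
</mets>"""


def page_sequence(mets_xml):
    """Return (label, file id) pairs sorted by structMap ORDER."""
    root = ET.fromstring(mets_xml)
    pages = root.findall(f"{METS}structMap//{METS}div[@TYPE='page']")
    pages.sort(key=lambda d: int(d.get("ORDER")))
    return [(d.get("LABEL"), d.find(f"{METS}fptr").get("FILEID"))
            for d in pages]
```

A viewer would then resolve each file id to the image derivative of the requested size and offer previous and next links along this sequence.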
IN Harmony Technical Overview
[Architecture diagram. Components and connection labels shown: Scanner, FTP, Quality Control (Perl Web Application), Cataloging Client (Java Swing), Authentication Service, Oracle, MODS Export, Fedora, Mass Storage System, SRU and http, Web Browser]
IN Harmony Links
• IN Harmony Public Interface
• IN Harmony Project Information
• Cataloging Tool Release date – June 2008
Questions?