Building Collections Using Greenstone
Tod A. Olson
Sr. Programmer/Analyst Digital Library Development Center
University of Chicago Library
http://www.lib.uchicago.edu/dldc/talks/2003/dlf-greenstone/
Greenstone
New Zealand Digital Library Project at the University of Waikato
• In cooperation with UNESCO, Human Info NGO
International, every continent
Examples:
• Academic
  – Digitization projects
  – Classes on digital libraries
• Non-academic
  – UNESCO humanitarian documentation
Greenstone features
• Works with existing documents
  – Imports several formats
• Searching: full text and metadata
  – Dublin Core, custom metadata
• Browse
• Structured documents
  – Indexing, access
• Extensible & customizable
• Open source software (GPL)
Greenstone Architecture
[Diagram: Receptionists talk to Collection Servers through the Protocol; each Collection Server draws on DB & Indexes, which are produced from a Collection by Import.]
Redrawn from Witten & Bainbridge, How to Build a Digital Library, p. 356
Greenstone Architecture
Receptionist
• Provides user interface
• Accepts user input
• Sends it to the appropriate collection server
• Accepts results
• Dynamic page generation
Collection Server
• Handles collection content
• Searches and filters information
• Returns results
• Multiple collections
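The division of labour between receptionist and collection server can be illustrated with a toy model. This is only a sketch: the class names, method names, and sample data are hypothetical and do not reflect Greenstone's actual code or protocol.

```python
from dataclasses import dataclass, field


@dataclass
class CollectionServer:
    """Hypothetical stand-in: holds collection content and answers searches."""
    documents: dict[str, str] = field(default_factory=dict)  # title -> text

    def search(self, term: str) -> list[str]:
        # Search and filter the collection's content, returning matching titles.
        term = term.lower()
        return sorted(t for t, text in self.documents.items() if term in text.lower())


@dataclass
class Receptionist:
    """Hypothetical stand-in: takes user input, routes it, renders the results."""
    servers: dict[str, CollectionServer] = field(default_factory=dict)

    def handle(self, collection: str, term: str) -> str:
        # Send the query to the appropriate collection server and accept the results.
        hits = self.servers[collection].search(term)
        # Generate the result page dynamically from whatever came back.
        items = "".join(f"<li>{title}</li>" for title in hits)
        return f"<h1>{collection}: {len(hits)} result(s)</h1><ul>{items}</ul>"


# Made-up sample content, for illustration only.
chopin = CollectionServer({"Nocturne, no.15": "score in F minor", "Ballade": "score in G minor"})
desk = Receptionist({"chopin": chopin})
page = desk.handle("chopin", "f minor")
```

Because the receptionist only knows collection servers by name, one receptionist can front several collections, which matches the "multiple collections" point on the slide.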
Building Collections
[Diagram: HTML, PDF, ??? → Import → GSAF → Build → DB & Indexes]
Building collections
• Create a collection framework
  – or work with an old collection
• Select documents
• Import documents
  – Converts to internal XML format (GSAF)
• Build collection
  – Creates search indexes and browse listings
GSAF: internal XML format
<Section>
  <Description>
    <Metadata name="Title" value="…">
  <Content>
    [Text, images, links, etc.]
  <Section>
    <Description>
      <Metadata name="Title" …>
    <Content>…
    <Section>…
  <Section>…
  <Section>…
GSAF: internal XML format
Section:
• Description
  – Metadata fields
• Content
  – Text, internal markup, images
• Section
  – No limit in number or depth
Hierarchical documents
Sections nest, tree structure
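A minimal sketch of the nesting described above, using Python's standard ElementTree. It follows the simplified notation on the slide (`<Metadata name="Title" value="…">`); the exact element and attribute layout of real Greenstone archive files may differ, and the helper names and sample titles are hypothetical.

```python
import xml.etree.ElementTree as ET


def make_section(title, content="", children=()):
    """Build one <Section> with a <Description>, a <Content>, and nested child sections."""
    section = ET.Element("Section")
    description = ET.SubElement(section, "Description")
    # Slide notation for metadata; real archive files may encode this differently.
    ET.SubElement(description, "Metadata", name="Title", value=title)
    ET.SubElement(section, "Content").text = content
    section.extend(children)
    return section


def depth(section):
    """Nesting depth of a section tree (a lone section has depth 1)."""
    return 1 + max((depth(child) for child in section.findall("Section")), default=0)


# Sections nest without a fixed limit: book > part > chapters.
book = make_section("Example book", children=[
    make_section("Part 1", children=[
        make_section("Chapter 1", "Text of chapter 1"),
        make_section("Chapter 2", "Text of chapter 2"),
    ]),
])
xml_text = ET.tostring(book, encoding="unicode")
```

Each level carries its own metadata and content, which is what makes section-level indexing and access possible.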
Config file: collect.cfg
Collection-specific configuration file, collect.cfg, specifies:
• File types to import
• Indexes and browse lists
  – Document or section level
  – Paragraph (text index only)
• Display of results and browse listings
• Document displays
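To make the four roles of collect.cfg concrete, here is a small sketch that parses a config-like text into directives. The directive names (`indexes`, `plugin`, `classify`, `format`) and the plugin and classifier names are recalled from Greenstone 2 documentation of that period and should be checked against your installation; the sample values are illustrative and are not the Chopin collection's configuration. The `parse_cfg` helper is hypothetical.

```python
import shlex
from collections import defaultdict

SAMPLE_CFG = '''
# Illustrative only: not the actual Chopin configuration.
indexes   document:text section:Title
plugin    GAPlug
plugin    HTMLPlug
classify  AZList -metadata Title
format    SearchVList "<td>[link][Title][/link]</td>"
'''


def parse_cfg(text):
    """Group whitespace-separated directives by their first word."""
    directives = defaultdict(list)
    for line in text.splitlines():
        tokens = shlex.split(line, comments=True)  # honours quotes and # comments
        if tokens:
            directives[tokens[0]].append(tokens[1:])
    return dict(directives)


cfg = parse_cfg(SAMPLE_CFG)
# Index specs pair a level (document, section) with what is indexed at that level.
index_levels = sorted({spec.split(":")[0] for spec in cfg["indexes"][0]})
```

In this reading, `plugin` lines govern which file types are imported, `indexes` and `classify` define search indexes and browse lists, and `format` controls how results and documents are displayed.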
Chopin Early Editions
Over 400 early-edition Chopin scores, 1830s to 1880s
Target audience: music scholars & musicians. On web, page-turnable JPEG images.
Online in March 2003
Currently 372 scores in online collection
Usage: nearly 100 hits per day; more than 30% of use is international.
Build overview
[Diagram: catalog records, scanned images, and structural metadata → METS & MODS → XSLT → Greenstone Archive Format → Greenstone Digital Library Software. Two labels span the pipeline: human processing, then XML-based automated processing.]
Structural and other metadata
"chopin","108","001","","1",""
"chopin","108","002","","1",""
"chopin","108","003","1","1","Nocturne, no.15"
"chopin","108","004","2","1",""
"chopin","108","005","3","1",""
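Rows like these are easy to read with Python's `csv` module. The sketch below is an illustration only: the slide does not label the columns, so the field names (`collection`, `item`, `image`, `page_label`, `flag`, `title`) are guesses.

```python
import csv
import io
from collections import namedtuple

# Field names are guesses for illustration; the slide does not label the columns.
PageRow = namedtuple("PageRow", "collection item image page_label flag title")

ROWS = '''"chopin","108","001","","1",""
"chopin","108","002","","1",""
"chopin","108","003","1","1","Nocturne, no.15"
"chopin","108","004","2","1",""
"chopin","108","005","3","1",""
'''

pages = [PageRow(*row) for row in csv.reader(io.StringIO(ROWS))]

# Group image numbers by (collection, item) so each score can be assembled in order.
scores = {}
for p in pages:
    scores.setdefault((p.collection, p.item), []).append(p.image)

# Rows with a non-empty last column carry a title for that point in the score.
titled = [p.title for p in pages if p.title]
```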
METS & MODS
[Diagram: three inputs feed one METS document]
• Catalog record (MARC) → dmdSec: MODS
• Scanned images (JPEG) → fileSec: URL: page1.jpg, URL: page2.jpg
• Structural metadata → structMap: div DMDID=1, containing div FILEID=1 and div FILEID=2
METS & MODS
Program uses structural metadata to:
• Generate structMap
• Generate image URLs for fileSec
  – Images stored by naming convention
• Structural metadata carries catalog record no.
• Extract MARC from catalog
• Crosswalk to MODS
• Embed in dmdSec
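The assembly steps above might look roughly like the following sketch. It uses standard METS element names (`dmdSec`, `mdWrap`, `fileSec`, `FLocat`, `structMap`, `fptr`), but the URL naming convention, the IDs, and the `build_mets` helper are hypothetical, and the MODS record is an empty placeholder rather than a real MARC crosswalk.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS)
ET.register_namespace("xlink", XLINK)


def q(tag):
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS}}}{tag}"


def image_url(collection, item, image):
    # Hypothetical naming convention; the real one is not given in the talk.
    return f"http://example.org/{collection}/{item}/{image}.jpg"


def build_mets(collection, item, images, mods_record):
    mets = ET.Element(q("mets"))
    # dmdSec: embed the MODS record produced by the MARC crosswalk.
    dmd = ET.SubElement(mets, q("dmdSec"), ID="dmd1")
    wrap = ET.SubElement(dmd, q("mdWrap"), MDTYPE="MODS")
    ET.SubElement(wrap, q("xmlData")).append(mods_record)
    # fileSec: one file per scanned image, located by URL.
    group = ET.SubElement(ET.SubElement(mets, q("fileSec")), q("fileGrp"))
    # structMap: one div for the score, one child div per page.
    struct = ET.SubElement(mets, q("structMap"))
    score = ET.SubElement(struct, q("div"), TYPE="score", DMDID="dmd1")
    for order, image in enumerate(images, start=1):
        file_id = f"file{image}"
        f = ET.SubElement(group, q("file"), ID=file_id)
        href = image_url(collection, item, image)
        ET.SubElement(f, q("FLocat"), {"LOCTYPE": "URL", f"{{{XLINK}}}href": href})
        page = ET.SubElement(score, q("div"), TYPE="page", ORDER=str(order))
        ET.SubElement(page, q("fptr"), FILEID=file_id)
    return mets


mods = ET.Element("{http://www.loc.gov/mods/v3}mods")  # placeholder for the crosswalked record
doc = build_mets("chopin", "108", ["001", "002", "003"], mods)
```

Because the images follow a naming convention, the file URLs can be derived from the structural rows alone, without a separate file inventory.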
GSAF
• XML format for internal storage
• Hierarchical document structure
  – Nested sections: e.g. part 1, chapt. 2
• METS to GSAF via XSLT
• Natural mapping from METS to GSAF
  – Map structural hierarchy
  – Follow links
    • Descriptive metadata
    • File content
METS to GSAF
[Diagram: METS on one side, the resulting GSAF on the other]
METS:
• dmdSec: MODS: Title, …
• fileSec: page1.jpg, page2.jpg
• structMap: div: Score, containing div: Page 1 and div: Page 2
GSAF:
• Section
  – Description: Metadata: Title, …
  – Content: Title, …
  – Section: Content: Page 1, page1.jpg
  – Section: Content: Page 2, page2.jpg
METS to GSAF
• Walk structural metadata to create the tree of <Section> elements
• Descriptive metadata:
  – <Description>:
    • Crosswalk to desired metadata names
  – <Content>:
    • Format metadata desired for display
• File data
  – <Content>:
    • Inline text, link to images, etc.
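The talk performs this mapping with XSLT. Purely to illustrate the tree walk, the sketch below does the same kind of traversal in Python instead: it follows each `fptr` into the `fileSec` and emits nested `<Section>` elements. The sample METS is a cut-down example without a `dmdSec`, using `LABEL` as the section title is an assumption, and the output follows the simplified GSAF notation of the slides rather than Greenstone's exact archive format.

```python
import xml.etree.ElementTree as ET

NS = {"m": "http://www.loc.gov/METS/", "x": "http://www.w3.org/1999/xlink"}

SAMPLE_METS = """<mets xmlns="http://www.loc.gov/METS/" xmlns:xlink="http://www.w3.org/1999/xlink">
  <fileSec><fileGrp>
    <file ID="f1"><FLocat LOCTYPE="URL" xlink:href="page1.jpg"/></file>
    <file ID="f2"><FLocat LOCTYPE="URL" xlink:href="page2.jpg"/></file>
  </fileGrp></fileSec>
  <structMap>
    <div LABEL="Score">
      <div LABEL="Page 1"><fptr FILEID="f1"/></div>
      <div LABEL="Page 2"><fptr FILEID="f2"/></div>
    </div>
  </structMap>
</mets>"""


def to_section(div, urls):
    """Turn one METS <div> (and its child divs) into a <Section> subtree."""
    section = ET.Element("Section")
    description = ET.SubElement(section, "Description")
    # Slide notation for metadata; using LABEL as the Title is an assumption.
    ET.SubElement(description, "Metadata", name="Title", value=div.get("LABEL", ""))
    content = ET.SubElement(section, "Content")
    # Follow fptr links into the fileSec and reference each image.
    for fptr in div.findall("m:fptr", NS):
        ET.SubElement(content, "img", src=urls[fptr.get("FILEID")])
    # Recurse: the structural hierarchy becomes the section hierarchy.
    for child in div.findall("m:div", NS):
        section.append(to_section(child, urls))
    return section


mets = ET.fromstring(SAMPLE_METS)
href = "{%s}href" % NS["x"]
urls = {f.get("ID"): f.find("m:FLocat", NS).get(href) for f in mets.iterfind(".//m:file", NS)}
root = to_section(mets.find("m:structMap/m:div", NS), urls)
```

The recursion mirrors the "natural mapping" point: each METS `div` becomes exactly one `<Section>`, so no restructuring is needed along the way.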
Customizing Chopin collection
• Focus on navigation
  – Metadata for custom access
    • E.g. genre, dedicatee not in MARC/AACR2
    • Can support with METS, MODS, Greenstone
  – Custom document navigation
    • Separate description from scores
    • Custom page navigation
  – Improves usability
• Branding in next phase
Comments on Chopin Early Editions
• Data created by staff using familiar tools
  – Structural metadata created in desktop application
• Catalog records a luxury
• Catalog is DB of record
  – Project IDs in 909
  – POIs point into Greenstone
• METS/MODS assembled by program
  – Expect to repurpose METS for other applications
• Customization: navigation, not branding
  – Faster to bring up collection, get user reaction
Greenstone benefits for Chopin
• Robust, mature system
• Recovered time in project
  – Fast to bring up
  – UI out of the box
  – Dynamic page generation
  – Incremental customization
• XML compliant
  – Natural mapping from METS to GSAF
Future work: Chopin
• Add DjVu image format
• Repurpose METS for other applications
  – OAI
• Standardize new digitization production flow
  – Project was first for METS, MODS, GS, & 6 depts.
  – Standardize collection of structural metadata
  – Plug in descriptive metadata as appropriate
    • Store archival descriptive metadata in METS object
    • Repurpose via XSLT for delivery
Other custom UI examples
• Lehigh Digital Bridges
  – Extensive changes to look
• Washington Research Libraries Consortium (WRLC)
  – Custom page banner
  – Popup page turner in Perl
  – GS as component of DL suite
Ongoing work: Greenstone
• Greenstone Librarian Interface (GLI)
• Greenstone 3
Greenstone Librarian Interface (GLI)
• Collection management
  – Informed by work at GS sites
  – Assist collection designer
  – Support all phases of collection build process
  – Do not specify workflow
• Java-based GUI tool
  – Formerly called the “Gatherer”
• 2 yrs in development
• In beta outside of lab
  – Bangalore, other sites
  – In current distribution
Greenstone 3
GS2 mature, 5+ yrs., wide deployment
  – Constraints: support legacy systems
  – Other technologies have matured: Java, XML
GS3: rewrite in Java, XML, XSLT
• Distributed architecture, SOAP
• METS as internal format
  – Group assembled for Greenstone METS profile(s)
• OAI support planned
• 1 year in dev; alpha testing in lab
Conclusion
• Positive experiences
• Good direction for development
• Strong user community
• Proven in real digital library projects
Links & Further Information
Chopin Early Editions: http://chopin.lib.uchicago.edu/
Greenstone: http://www.greenstone.org/
  Downloads, documentation, examples
New Zealand Digital Library Project: http://www.nzdl.org/
  UNESCO & related collections, many demos
Witten & Bainbridge. How to Build a Digital Library. Morgan Kaufmann, 2003.