University of North Texas Libraries
Building Search Systems for Digital Library Collections
Mark E. Phillips
Texas Conference on Digital Libraries
May 31, 2007, Austin, Texas
University of North Texas Libraries - Digital Initiatives
Library Digital Collections = 31,000+ Digital Objects
• 3 “Systems”
  – Congressional Research Service Archive
    • 9,500+ CRS Reports
  – Portal to Texas History
    • 20,000+ records – 115,205 files
  – UNT Libraries “Digital Collections”
    • 1,800+ records – 131,481 files
• Digital Object Types
  – Images = 18,282
  – Physical Objects = 1,019
  – Texts = 11,668
  – Websites = 46
  – Sound Records = 20
Infrastructure
• UNT Libraries Digital Library Infrastructure
  – Highly customized installation of IndexData’s Keystone Digital Library System
  – OAIS based system
  – Digital objects housed as xml files on filesystem
  – One xml file per digital object
  – Supports simple, complex and link records
  – Custom workflow for batch ingest
  – Manages web presentable files and descriptive and preservation metadata
  – Digital masters stored in separate system
Search 1.0
• Keystone supplied search
  – Zebra retrieval engine
  – 1 index per “system”
  – Highly customizable search system
  – Vendor supplied search interface and functionality
Search 1.0 - Issues
• Difficult configuration
• Issues with large xml file retrieval (10MB+ xml files)
• Search grammar not functioning correctly
• Relevance ranking was “magic”
• No custom searching
• Only searching at the digital object level
Search 1.5
• MySQL database for page level searching
  – In Document Searching (IDS)
  – Two levels of granularity (Zebra=object and MySQL=page)
  – Easy customization
  – More documentation on relevance ranking
  – Logical search grammars
Search 1.5 – Issues
• Different search grammars: Zebra vs. MySQL full-text
• Scaling issues
• Search Performance
• System Resources
Search System Criteria
• Customizable relevance ranking
• Sorting
• Simple search syntax
  – Fielded Searching
  – Term Modifiers
    • Wildcard Searches
    • Fuzzy Searches
    • Proximity Searches
    • Range Searches
  – Boolean Operators
  – Grouping
• Caching
• Implemented as a web-service
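The syntax criteria above can be illustrated with a few query strings. This is only a sketch, assuming Lucene query syntax (the grammar used by Solr, which the next slide introduces). The field names `title`, `creator`, and `date` are hypothetical and do not come from the slides.

```python
# Example query strings in Lucene query syntax, one per criterion.
# Field names (title, creator, date) are hypothetical placeholders.
EXAMPLE_QUERIES = {
    "fielded": 'title:"texas history"',
    "wildcard": "title:rail*",
    "fuzzy": "creator:phillips~",
    "proximity": '"congressional report"~5',
    "range": "date:[1900 TO 1950]",
    "boolean": "texas AND history NOT oklahoma",
    "grouping": "(railroad OR railway) AND title:texas",
}


def describe(queries):
    """Return one 'criterion: query' line per entry, in insertion order."""
    return [f"{name}: {query}" for name, query in queries.items()]


for line in describe(EXAMPLE_QUERIES):
    print(line)
```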
Search 2.0
• Solr is an open source enterprise search server based on the Lucene Java search library.
• XML/HTTP based
• Hit highlighting
• Faceted search
• Caching
• Replication
• Web administration interface
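Because Solr is driven over HTTP, a search is just a URL with query parameters. The sketch below builds such a URL with hit highlighting and faceting turned on. The host, path, and the field names `collection` and `fulltext` are assumptions for illustration; `q`, `rows`, `hl`, `hl.fl`, `facet`, and `facet.field` are standard Solr request parameters.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; the real host and path are not given in the slides.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_search_url(query, facet_field="collection",
                     highlight_field="fulltext", rows=10):
    """Build a Solr select URL with highlighting and one facet field enabled."""
    params = [
        ("q", query),                  # the user's query string
        ("rows", rows),                # number of results to return
        ("hl", "true"),                # turn on hit highlighting
        ("hl.fl", highlight_field),    # field to highlight (hypothetical name)
        ("facet", "true"),             # turn on faceting
        ("facet.field", facet_field),  # field to facet on (hypothetical name)
    ]
    return SOLR_SELECT_URL + "?" + urlencode(params)


url = build_search_url("texas history")
print(url)
```

The response would then be fetched and parsed by the calling application; that step is omitted here because it depends on the deployment.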
Current Architecture
[Architecture diagram. Labels: Query, Digital Collections Server, Digital Object Index (Solr), Page Index (Solr), Spelling Suggestions, Results Page]
Customizable Relevance
• Combine Full-text AND descriptive metadata
  – Positive Boost to Title – (+20)
  – Positive Boost to Subject – (+15)
  – Positive Boost to Creator – (+14)
  – Positive Boost to Metadata overall – (+5)
  – Full-text = Neutral boost
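One way such boosts could be expressed is with Lucene's `^` boost operator, searching the same terms across every field. This is a sketch under assumptions, not necessarily how the UNT system is configured: the field names are hypothetical, the slide's "+20" style values are read as `^20` query boosts, and "neutral" is read as Lucene's default boost of 1.

```python
# Boost values are taken from the slide; the field names are hypothetical.
FIELD_BOOSTS = [
    ("title", 20),
    ("subject", 15),
    ("creator", 14),
    ("metadata", 5),
    ("fulltext", 1),  # "neutral" is read here as Lucene's default boost of 1
]


def boosted_query(terms, boosts=FIELD_BOOSTS):
    """OR the same terms across every field, attaching ^boost to each clause."""
    clauses = []
    for field, boost in boosts:
        clause = f"{field}:({terms})"
        if boost != 1:  # a boost of 1 is the default, so it is left implicit
            clause += f"^{boost}"
        clauses.append(clause)
    return " OR ".join(clauses)


print(boosted_query("texas railroads"))
```

In practice the user's terms would need escaping before being placed inside the field groups; that is left out to keep the sketch short.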
Better results
• Helps to overcome IDF’s effect on results
• Results order more logically
• Takes advantage of both metadata and full-text
• User defined relevance ranking?
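The IDF point above can be made concrete with a small calculation. The sketch assumes Lucene's classic inverse document frequency formula, 1 + ln(numDocs / (docFreq + 1)). The collection size is rounded from the slides, and the two document frequencies are invented for illustration.

```python
import math


def classic_idf(doc_freq, num_docs):
    """Classic Lucene IDF: 1 + ln(num_docs / (doc_freq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


NUM_DOCS = 31000  # roughly the collection size given in the slides

# Hypothetical document frequencies, chosen only to show the contrast.
rare = classic_idf(doc_freq=10, num_docs=NUM_DOCS)
common = classic_idf(doc_freq=5000, num_docs=NUM_DOCS)

print(f"rare term idf={rare:.2f}, common term idf={common:.2f}")
```

A term that appears in few documents receives a much larger weight than a common one, so an incidental full-text match on a rare word can outrank a title match on a common word. Field boosts are one way to counteract that.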
Questions?