Document management (aka ‘digital libraries’)
The Greenstone Group:
Professor Ian Witten (leader); David Bainbridge, Dave Nichols, S.J. Cunningham, Steve Jones, Te Taka Keegan, Annika Hinze
Our work includes…
• Document management
• Content management
• Metadata management
• Multimedia documents
• Alerting and event notification support
• OCR-ing services
• Document & collection visualization
• User needs analysis
• Text mining
• Automatic metadata extraction
Greenstone software
• ‘digital library’ construction, use, and maintenance software
• Developed at Waikato (www.greenstone.org)
• Open Source
• Widely used internationally (UNESCO, FAO, Texas A&M Uni, Kyrgyz Republic, …)
Digital library: a collection of digital objects (text, video, audio) along with methods for access and retrieval [user], and for selection, organisation, and maintenance [librarian]
Greenstone software features
Feature areas: Collections, Documents, Access, Importing, Distributing
• “Library” = set of separate collections
• “Collection” = set of separate documents
• Multigigabyte collections
• Hierarchical document model
• Multimedia: picture, voice, music, video collections
• Multi-language documents: Unicode throughout
• Multi-language interfaces: French, Chinese, Arabic …
• Web browser or CD-ROM
• Searching: full-text and fielded, ranked or boolean
• Browsing: hierarchical indexes created from metadata
• Metadata: Dublin Core + collection-specific extensions
• Plugins: different document types and metadata specifications
• Classifiers: create browsing indexes (collection editor decides)
• Compression techniques throughout: uses MG
• Distributed collections: coming soon, with Corba
• Open-source software: free, extensible
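The idea of classifiers that create browsing indexes from metadata can be illustrated with a minimal sketch. This is not Greenstone's classifier code or API: the function names, the `_docs` key, and the sample records are all hypothetical, and the metadata field names are only loosely modelled on Dublin Core.

```python
from collections import defaultdict

# Hypothetical documents with Dublin Core style metadata (illustrative data only).
DOCS = [
    {"id": "d1", "dc.Title": "Farming in the tropics", "dc.Subject": "Agriculture/Crops"},
    {"id": "d2", "dc.Title": "Animal husbandry", "dc.Subject": "Agriculture/Livestock"},
    {"id": "d3", "dc.Title": "Clean water", "dc.Subject": "Health/Sanitation"},
]

def az_classifier(docs, field="dc.Title"):
    """Group document ids by the first letter of a metadata field."""
    index = defaultdict(list)
    for doc in sorted(docs, key=lambda d: d[field].lower()):
        index[doc[field][0].upper()].append(doc["id"])
    return dict(index)

def hierarchy_classifier(docs, field="dc.Subject", sep="/"):
    """Build a nested dict from path-like metadata values.

    Document ids are stored under the (assumed) key '_docs' at each leaf.
    """
    root = {}
    for doc in docs:
        node = root
        for part in doc[field].split(sep):
            node = node.setdefault(part, {})
        node.setdefault("_docs", []).append(doc["id"])
    return root
```

In this sketch the collection editor's decision amounts to choosing which function to run over which metadata field: `az_classifier(DOCS)` gives an A to Z title list, while `hierarchy_classifier(DOCS)` gives a subject tree.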
Greenstone supports: multilanguage documents
Greenstone supports: hierarchically structured documents
A book
Greenstone supports: collection design, maintenance
Designing a collection with the Gatherer
Greenstone supports: a wide (and growing) set of file formats
• DOC
• PDF
• XLS
• LaTeX
• Refer
• MARC
• …
• highly extensible through ‘plugin’ mechanism
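A plugin mechanism of the kind listed above can be sketched as a registry that maps file extensions to handler functions. This only illustrates the general pattern and is not Greenstone's actual plugin interface. The names `PLUGINS`, `plugin`, `import_file`, and the two sample handlers are hypothetical.

```python
from pathlib import Path

PLUGINS = {}  # maps a lowercase file extension to a handler function

def plugin(*extensions):
    """Register the decorated function as the handler for the given extensions."""
    def register(func):
        for ext in extensions:
            PLUGINS[ext.lower()] = func
        return func
    return register

@plugin(".txt")
def text_plugin(name, data):
    return {"source": name, "format": "text", "text": data.decode("utf-8")}

@plugin(".htm", ".html")
def html_plugin(name, data):
    # A fuller plugin would strip markup and harvest metadata; this one only tags the format.
    return {"source": name, "format": "html", "text": data.decode("utf-8")}

def import_file(name, data):
    """Dispatch to the plugin registered for the file's extension."""
    ext = Path(name).suffix.lower()
    handler = PLUGINS.get(ext)
    if handler is None:
        raise ValueError(f"no plugin registered for {ext!r}")
    return handler(name, data)
```

Supporting a new format in this sketch means writing one more decorated function, with no change to `import_file`, which is the sense in which such a design is extensible.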
Mobile document access
• handheld information access
• browsing methods for varying screen sizes
• studies on search behaviour (on- and off-line)
• support for non-text documents (FunkyZoom views of maps, images)
Browsing and exploration: hierarchical phrase index
What’s in this collection?
Is it any good?
What coverage for topic X?
My query returned too much/little, what now?
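One way to picture a hierarchical phrase index is as a table of word sequences with counts, in which a phrase can be expanded into the longer phrases that contain it. The sketch below assumes simple whitespace tokenisation and a fixed maximum phrase length. It is not the group's actual algorithm, and `phrase_counts` and `expand` are hypothetical names.

```python
from collections import Counter

def phrase_counts(texts, max_len=3):
    """Count every word n-gram of length 1..max_len across the texts."""
    counts = Counter()
    for text in texts:
        words = text.lower().split()
        for n in range(1, max_len + 1):
            for i in range(len(words) - n + 1):
                counts[tuple(words[i:i + n])] += 1
    return counts

def expand(counts, phrase, min_count=2):
    """Return phrases one word longer than `phrase` that contain it, most frequent first."""
    target = tuple(phrase.lower().split())
    n = len(target)
    hits = [
        (" ".join(p), c)
        for p, c in counts.items()
        if len(p) == n + 1 and c >= min_count and (p[:n] == target or p[1:] == target)
    ]
    # Sort by descending count, then alphabetically for a stable order.
    return sorted(hits, key=lambda item: (-item[1], item[0]))
```

Repeatedly expanding a phrase gives a rough answer to the coverage question: a topic word that expands into many frequent longer phrases is well represented, and one with no expansions is not.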
Recent and proposed projects
• Making documents mobile: moving between large online collections and a PDA
• Text mining: extracting quality metadata from legacy documents
• User needs analysis: what sort of documents do a given set of users require, and how can the collection be managed?
• Visualization: making it easy to ‘see’ what’s in a collection, and supporting effective browsing
Recent and proposed projects
• Multi-language collections: tailoring a document collection interface and interaction mechanisms to the language of its users
• Alerting services: bringing potentially useful documents to the user’s attention, without overwhelming them
• Supporting unusual users: collections for the physically disabled, illiterate or semi-literate, children, …
• Audio and image collections: novel browsing and searching mechanisms
Recent and proposed projects
• Storage and searching: developed highly efficient techniques for storing, indexing, and searching text documents; implemented in Greenstone, but portable to other document management software
• Usability analysis: how easy is it to use your current document collection? How can access be improved?
• And a host of wacky and cool things: collaging document collections, music retrieval systems, ‘aerial’ views of documents, …
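For the storage and searching bullet above, a toy inverted index shows the difference between boolean and ranked retrieval. This is a minimal sketch and not the indexing used in Greenstone: it applies no compression, the tf × idf weighting is just one common choice, and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def build_index(docs):
    """Map each term to {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(text.lower().split()).items():
            index[term][doc_id] = tf
    return index

def boolean_and(index, query):
    """Return the sorted ids of documents containing every query term."""
    postings = [set(index.get(t, {})) for t in query.lower().split()]
    return sorted(set.intersection(*postings)) if postings else []

def ranked(index, query, n_docs):
    """Score documents by summed tf * idf over the query terms, best first."""
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(1 + n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Ties are broken by document id so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

A boolean query returns only documents that match every term, whereas a ranked query returns any document matching at least one term, ordered by score.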