Fourth Annual Midwest CONTENTdm Users Group Meeting
CONTENTdm Directions
Geri Ingram
OCLC Digital Collection Services
User Services Manager
Purdue
West Lafayette, Indiana
March 19, 2009
Information--looking for grounding in Web 2.0
No wonder we talk about information “space”
Porous boundaries:
no strictly “personally interesting” info
no completely “professional” info
If I can’t find it, I didn’t park it right. (tag it, catalog it, file it)?
If I can’t find it, I can’t get it, or use it (share it).
Striking our roots
Ranganathan 1892-1972, Bangalore, India—known as the father of library science in India. A mathematician, epistemologist and librarian
Five laws of library science:
1. Books are for use.
2. Every reader his [or her] book.
3. Every book its reader.
4. Save the time of the User.
5. The library is a growing organism.
Saving the time of the user, at 21st century speed
Most of our information transfer is digital—flying at the speed of light
• Where did I see (hear) that?
• What was the context?
• How does it relate—because it ALL relates!
• Who needs to know this?
• How can I share it?
Ranganathan’s book
•Media, education, public service—
•We’re all in the same business, helping people find, get, and use information—THEIR information
•WHEN they need it, regardless of where it’s found, or how it’s requested.
Ranganathan’s book… might be an image
Every reader his or her book.
• So, too, every digitized item is special to someone;
• The aggregate is special to most;
• And its integration with other complementary content is valuable to all
What’s important to you today?
• The state of the software we’re here to discuss
Have we been listening?
• What is the significance of Release 5?
Are we still listening?
• The future of the software
Will we keep listening?
From its inception CONTENT(dm) has been a response to searchers’ needs
From its roots in bio-medical engineering
• To Libraries’ special collections
• To integrated digital collections for research and teaching
• To a globally created and accessed multi-media repository
CONTENTdm developers have listened to users
• We’re still listening
• We’re putting in processes to ensure that we listen well, and for the long haul
Version 3.4 Jan 2003
OAI support for harvesting of metadata
Multi-Site Server
Version 3.5 Jul / Sep 2003
WorldCat link for metadata harvesting
Customizable Web Templates
Version 3.6 Feb 2004
Batch Add wizard
Advanced Search has a new interface and added functionality
User Support Center introduced
Version 3.7 & 3.8 Jul 2004
Zoom and pan toolbar for viewing images; Compare button
Tab-delimited Text Import
Collection and item-level security
Web browser-based editor
Version 3.8
JPEG2000
Version 4.0 Jun / Oct 2005
EAD support
PHP-based API which supports broader customizations and interoperability
User Interface based on the new PHP API
OCR Extension with ABBYY FineReader
Version 4.1 Mar 2006
Redirects for obsolete URLs
PHP 5.0 supported
Version 4.2 Dec 2006
Multiple Compound Object Wizard
OCR Extension upgraded to ABBYY FineReader Version 8
Ability to highlight, view and clip individual newspaper articles
Search and browse by date range
Version 4.3 Oct 2007
Easier to manage and access PDF documents, especially multiple-page PDFs
Improved controlled vocabulary
OCLC Connexion digital import
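CONTENTdm upgrades for users (release timeline)
3.1 May 2001
3.2 Feb 2002
3.3 Jul 2002
3.4 Jan 2003
3.5 Jul 2003
3.5.1 Sept 2003
3.6 Feb 2004
3.7 & 3.8 Jul 2004
4.0 Jun 2005
4.0.1 Oct 2005
4.1 Mar 2006
4.2 Dec 2006
4.3 Oct 2007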
We have a long history of enhancing the software in response to our users’ requirements
2003: Librarians said, “What about interoperability? What about providing federated searching?”
And, “I need to brand my collections for disparate user groups!”
2004: “We need to move more data, faster; I need to build big text collections, fast!”
And, “My users want more search functionality, and the ability to manipulate the display images.”
And, “My users want to see high-resolution images but they don’t want to install plug-ins! I don’t want to get tied to proprietary formats, but I don’t want to expose my valuable source images either.”
And in 2005, you added:
“I need to protect some of my data from access by some of my users”
“I need to be able to maintain my metadata through a web browser.”
“We need to provide our finding aids online—what do you have?”
And, “We want more control over our interface!”
In 2006, librarians said:
“Loading a book is an improvement, but I need to load whole libraries!”
And “My users are interested in newspapers—give me some desktop OCR and a good searching/highlight mechanism”
And, “When are you going to fix those darn DATES?”
By 2007 we knew we had to find a way to search the PDF faster, and completely
You said, “My researchers need to find every instance of a phrase across thousands of documents, and they don’t want to wait!”
And, “Help my catalogers mainstream the work; we want to leverage our existing MARC records.”
By 2008, we finally got back to those EAD finding aids
…And to Unicode
…And the need for faceted searching, with relevancy ranking.
We are listening
As your digital library programs have grown, so have your needs for tools
Stewarding the materials through the entire life-cycle
Helping “every book (to find) its reader”
OCLC Digital Collection Services: Trends in library digital collections
Aggregation and integration of digital content is important
• Surface/expose collections in common and familiar discovery tools - search engines and aggregators - make collections radically accessible
• Universal search - the ability to search digital collections and other electronic resources through the same user interface rather than specialized sites
We are listening--how CONTENTdm fits into the bigger picture
Build a digital repository within the OCLC cooperative
A rich set of digital collections created by libraries, museums, and other cultural heritage organizations
Linked through the global discovery of WorldCat
• As of March 3rd, 1.4 million records (titles)
• 1.4 billion holdings
OCLC Digital Collection Services
Solutions to help you to create, manage, share and preserve your digital collections
Digitization
CONTENTdm
Hosting Services
Web Harvesting
WorldCat Harvesting
Digital Archive
CONTENTdm Version 5 Released December 2008
For more information about Version 5: www.oclc.org/news/releases/20093.htm
CONTENTdm 5
• Milestone release
• Released December 17th, 2008
• Significant changes throughout the software
• Sets foundation for future enhancements that will continue to further CONTENTdm’s use as the leading digital repository platform
State of the release--we are listening!
• HUGE diversity of environments and uses
• First service pack released February (5.0.1)
• Second service pack (5.0.2) to be released by tomorrow; it fixes some serious and many just irritating problems!
• There may be a third service pack in April.
• There will be a version release 5.1 in May.
• Schedule—dual systems—migration
• Waiver of EULA restriction
CONTENTdm 5 Top Ten
1. Unicode Support
• Full support of Unicode for importing, storing, displaying and searching Unicode languages
• OCR language support expanded – 184 languages
• Supports the creation and exposure of digital collections in any language
CONTENTdm 5 Top Ten
2. Find Search Engine
• Find search engine integrated into CONTENTdm software
• More robust capacity and the ability to offer additional search features
• Relevancy sorting
• Faceted searching
• Spelling suggestion
• Unicode searching
CONTENTdm 5 Top Ten
2. Find Search Engine
• Leveraging existing OCLC technology by integrating the Find search engine
• Search in any language
• More tools to help end-users find what they are looking for, faster
• Better end-user experience
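To make "faceted searching" concrete, here is a minimal sketch in Python. It is not the Find search engine or any CONTENTdm API; the records, field names, and the `search` and `facet_counts` helpers are all hypothetical, chosen only to show how a result set can be summarized by field values that a user could then click to narrow a search.

```python
from collections import Counter

# Hypothetical records; the field names are illustrative only.
RECORDS = [
    {"title": "Main Street, 1910", "type": "Image", "subject": "Streets"},
    {"title": "Campus map", "type": "Map", "subject": "Universities"},
    {"title": "Main Street parade", "type": "Image", "subject": "Parades"},
]

def search(records, term):
    """Return records whose title contains the term (case-insensitive)."""
    term = term.lower()
    return [r for r in records if term in r["title"].lower()]

def facet_counts(records, field):
    """Count how many of the given records carry each value of a field."""
    return Counter(r[field] for r in records if field in r)

hits = search(RECORDS, "main street")
type_facets = facet_counts(hits, "type")
subject_facets = facet_counts(hits, "subject")
```

Each facet is just a count per field value over the current hits; a real engine would also rank the hits by relevance, which this sketch leaves out.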
CONTENTdm 5 Top Ten
3. Controlled Vocabularies
• Integration with OCLC Terminologies Service
• Providing nine new thesauri for CONTENTdm users
• Adds efficiency to collection building by providing pre-loaded thesauri for cataloging
CONTENTdm 5 Top Ten
4. Reports
• More robust, scalable reporting module integrated into software
• Provides expanded reports:
• Views by collection and item
• Top searches within CONTENTdm
• Web statistics by month, day, hour
• Top URLs, errors, referring sites, IP addresses, authenticated users, browsers, and countries
• Access to log files
• Export CONTENTdm reports
CONTENTdm 5 Top Ten
5. Flexible Workflows
• Added more options for approving and indexing items
• New batch and subset handling of pending items
• One-click approve & index on demand
• Scheduling options for approve and index
• Background processing
CONTENTdm 5 Top Ten
6. Registration
• New registration process added during installation
• One-click sends server information to OCLC
• Registered servers called once a month to gather data on usage
• FEEDBACK!
• User Support Center is being completely overhauled now!
CONTENTdm 5 Top Ten
7. Project Client
• New client application replaces old version
• New programming language
• New, more intuitive interface
• Unicode support
• More robust
• And many other enhancements
CONTENTdm 5 – Project Client
Some notes from the developer--Project Client Goals
Update Technology
Increase throughput
Improve editing capability
Expand and improve support for different data types
Update Technology
.NET 3.0
• Build on the newest Microsoft technology to create a modern application
HTTP file transfer
• Do away with old technologies FTP and SFTP
• Transfer small packets
• Easy to configure
• Robust
Why? Increase Throughput over the Acquisition Station
Acquisition Station
• Not optimized for multi processors
• Editing one at a time
• Long wait time for imports and uploads
• Maximum of 5000 items
• Pull down 100 items at a time for editing
Parallel Processing
• Decrease time waiting for operations to complete
• Maximize use of multi processor machines
• Network transfers asynchronously
• Upload asynchronously
Parallel Processing: Multi Processor Systems
Project Client is optimized to support this architecture
Utilizes .NET threading technologies to scale well between 1 and n CPUs
Uses threads to move tasks to the background and allow parallel work
• Upload Manager
• Background data checking
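The idea of moving uploads and data checking to background threads can be sketched as follows. The Project Client itself is built on .NET threading; this example uses Python's standard `concurrent.futures` module purely as an illustration, and the `upload` and `check_metadata` functions are hypothetical stand-ins, not Project Client code.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def upload(item):
    """Hypothetical stand-in for a slow network upload."""
    time.sleep(0.01)
    return f"uploaded {item}"

def check_metadata(item):
    """Hypothetical stand-in for background data checking."""
    return bool(item.strip())

items = ["photo_001.tif", "photo_002.tif", "letter_003.pdf"]

with ThreadPoolExecutor(max_workers=4) as pool:
    # Queue the slow work; submit() returns immediately.
    upload_futures = [pool.submit(upload, i) for i in items]
    check_futures = [pool.submit(check_metadata, i) for i in items]

    # The foreground stays free for other work (here, a trivial edit).
    edited = [i.upper() for i in items]

    # Collect the background results once they are needed.
    upload_results = [f.result() for f in upload_futures]
    check_results = [f.result() for f in check_futures]
```

The point of the design is that the operator keeps editing while uploads and checks proceed, instead of waiting on each import as with the Acquisition Station.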
Template Creator
We added a hierarchical structure with the ability to turn templates on and off based on data types
• General
• Images
• TIF
• JPEG
• JPEG 2000
• Compound Object
• Video
• Audio
• URL
CONTENTdm 5 Top Ten
8. File Transfer
• Replaced FTP with custom HTTP transfer protocol
• Uploading items occurs in the background
• Continue working while items are uploaded
• Pause process and resume later
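A transfer that moves small packets and can pause and resume might look roughly like this. The sketch is an in-memory simulation only: the `ChunkedTransfer` class, its chunk size, and its method names are assumptions made for illustration, and it does not reproduce the actual CONTENTdm HTTP transfer protocol.

```python
class ChunkedTransfer:
    """Sends a payload in small chunks and remembers how far it got."""

    def __init__(self, payload: bytes, chunk_size: int = 4):
        self.payload = payload
        self.chunk_size = chunk_size
        self.offset = 0                # bytes already delivered
        self.received = bytearray()    # stand-in for the server side

    @property
    def done(self):
        return self.offset >= len(self.payload)

    def send_chunks(self, max_chunks=None):
        """Send up to max_chunks chunks (all remaining chunks if None)."""
        sent = 0
        while not self.done:
            if max_chunks is not None and sent >= max_chunks:
                break  # "pause": stop here but keep the offset
            chunk = self.payload[self.offset:self.offset + self.chunk_size]
            self.received.extend(chunk)
            self.offset += len(chunk)
            sent += 1
        return self.done

transfer = ChunkedTransfer(b"scanned-page-bytes", chunk_size=4)
transfer.send_chunks(max_chunks=2)   # pause after two chunks
paused_at = transfer.offset
transfer.send_chunks()               # resume from the saved offset
```

Because only the offset has to be remembered, a paused or interrupted transfer can pick up where it left off rather than starting over.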
CONTENTdm 5 – File Transfer
CONTENTdm 5 Top Ten
9. EAD
• New import process and display options
• Custom metadata mapping
• Full text searching
• Search term highlighting within the EAD
• Multiple display views
• XML Web service
CONTENTdm 5 – EAD
CONTENTdm 5 Top Ten
10. Capacity
• Increased capacity throughout application
• Supports more collections, items for batch processing, and metadata fields
• Expand metadata schemas to incorporate preservation metadata or more custom fields
• Faster batch processing and conversion from existing databases
What’s up next?
• The CONTENTdm enhancements most requested now:
• Modern, easy-to-customize viewers for all media
• Integration of Web 2.0 tools like tagging
• A modern, interactive User Support Center for the community
• A better metadata harvest to WorldCat
Web customization and viewers
Goals:
Lessen the labor required to upgrade
Provide a more coherent viewer experience
Provide smooth integration with players and viewers—leverage and improve upon the Web 2.0 features of worldcat.org (listmaking, tagging, etc.)
Approach:
• Completely overhaul the web interfaces
User Support Center
New expert staff
Re-design for an interactive community experience
Surveyed stakeholders
Researching 3rd party platforms for a Fall launch
USC re-design: goals
Make the USC a compelling user community space
Create a ‘one-stop-shop’ for all CONTENTdm known-issues, documentation, extensions, etc.
Increase user satisfaction
Make the USC extensible to support info on all the Digital Collection Services products and services
Provide an interactive space for User Groups to meet, plan, record, and share!
Improve exposure of digital items on the Web: Introducing the Digital Collection Gateway
• Improve access & presence for digital collections
• Synchronize non-MARC metadata with WorldCat
• Provide self-service tools to drive synchronization
• Available for CONTENTdm collections with CONTENTdm 5.1--May 2009
You design the WorldCat.org display of your metadata
You can adjust where source metadata fields appear in the WorldCat.org display
The map from source metadata to WorldCat.org display for this collection is now changed for all records being synchronized with WorldCat
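The mapping idea described above, one map per collection applied to every record being synchronized, can be sketched like this. The field names, display labels, and the `map_record` and `sync_collection` helpers are hypothetical; the real Gateway is a self-service web tool, and this is only an illustration of the concept.

```python
# Hypothetical map from source metadata fields to display labels.
FIELD_MAP = {
    "title": "Title",
    "creator": "Author",
    "date": "Date",
    "description": "Summary",
}

def map_record(record, field_map):
    """Rename mapped fields; skip unmapped fields and empty values."""
    return {field_map[k]: v for k, v in record.items() if k in field_map and v}

def sync_collection(records, field_map):
    """Apply the same map to every record in the collection."""
    return [map_record(r, field_map) for r in records]

source = [
    {"title": "Barn raising", "creator": "Unknown", "rights": "Public domain"},
    {"title": "County fair", "date": "1923", "description": ""},
]
display = sync_collection(source, FIELD_MAP)
```

Changing one entry in the map changes where that field appears for the whole collection, which mirrors the behavior described on the slide.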
Digital Collection Gateway
Builds upon fundamental work with XML Web Services
Available with CONTENTdm 5.1--May 2009 for CONTENTdm collections
• Will extend to support other digital repositories in second phase
The bigger picture: your users want digital content from many sources (wherever!)
[Diagram labels: Metadata; Content management server; The Web; End users retrieve the information they need]
Next step for DCG is to enable other metadata to WorldCat
• We know you need many interoperable tools to build and manage your organizations’ repositories
• CONTENTdm is the foundation for over 1,000 digital libraries worldwide—over 500 licenses shared collaboratively
• CONTENTdm is also fundamental for the OCLC Digital Repository
Every item is special to someone, the aggregate is special to most and the integration with other content is valuable to all
Thank you!