OCLC Online Computer Library Center
Data Mining Library Collection Silos: Print Books and E-books in Library Collections
Lynn Silipigni Connaway, Ed O’Neill, Chandra Prabha, Brian Lavoie

Collection Assessment
Why assess collections?
  – Provide data for member libraries for decision-making
    • Description of the collection
      – Identify specific subject areas
        » Determine collection age
        » Rate of growth
        » Strengths and weaknesses
    • Overlap/gap analysis
    • Identify last copy
    • Useful information
      – Outside funding
      – Library collection comparisons
      – Remote storage decisions
      – Collection development and management
      – Identify role of non-ARL libraries
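As a rough illustration of the overlap/gap analysis and last-copy identification listed above, the sketch below works on a toy mapping from title identifiers to the set of libraries holding each title. The data, the library symbols, and the function names are all hypothetical and are not drawn from WorldCat.

```python
# Toy holdings data: title id -> set of library symbols (all hypothetical).
holdings = {
    "t1": {"LIB_A", "LIB_B"},
    "t2": {"LIB_A"},
    "t3": {"LIB_B", "LIB_C"},
    "t4": {"LIB_C"},
}

def titles_held_by(library, holdings):
    """Return the set of title ids held by one library."""
    return {title for title, libs in holdings.items() if library in libs}

def overlap_and_gap(lib_x, lib_y, holdings):
    """Overlap = titles both libraries hold; gap = titles lib_y holds that lib_x lacks."""
    x = titles_held_by(lib_x, holdings)
    y = titles_held_by(lib_y, holdings)
    return x & y, y - x

def last_copies(holdings):
    """Titles held by exactly one library, mapped to that library."""
    return {t: next(iter(libs)) for t, libs in holdings.items() if len(libs) == 1}

overlap, gap = overlap_and_gap("LIB_A", "LIB_B", holdings)
print(sorted(overlap))
print(sorted(gap))
print(last_copies(holdings))
```

Set operations keep the sketch short; at WorldCat scale the same logic would more likely run as grouped queries over a holdings table.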

WorldCat as a Collection
World’s largest bibliographic database
  – July 1, 2003 = 50 million+ records
  – 1 billion holdings
Ideal source for data-mining
Characteristics of WorldCat
  – Age
  – Subject, using NATC
  – Holdings by type of library
    • ARL
    • Academic, non-ARL
    • Public
    • School
    • Special

WorldCat as a Collection
Use of MARC data elements in WorldCat
  – Types of materials
  – Library holdings to determine audience levels
Collection assessment and collection use
  – Unique titles
  – Analyze and compare aggregate holdings for libraries
  – Identify print books (p-books) and electronic books (e-books)

WorldCat Holdings by Library Types
[Bar chart. Axis: number of holdings, 0 to 400,000,000.
Library types, in the order listed on the chart: Training Symbols; Networks & Processing Centers; Unidentified; State & National Libraries; School Libraries; Gov Libraries (Excludes State & National); Special Libraries (Other); Public Libraries; ARL Libraries; Academic Libraries (Non-ARL).
Bar values, in the order listed on the chart: 528,372; 3,491,551; 6,921,912; 14,741,919; 15,674,236; 15,968,006; 21,468,245; 210,018,358; 215,263,036; 368,450,292.]

WorldCat Number of Holdings [chart]
WorldCat Number of Records [chart]
WorldCat Holdings [two charts]

Study Objective
Digital materials constitute an increasing proportion of library collections
Effective strategies for integrating print and digital materials within a library collection
  – Eliminate redundancies
  – Meet user expectations
Data-mining increasingly important to support collection management decisions
  – WorldCat
    • World’s largest bibliographic database
    • Ideal as source for data-mining
Data-mine WorldCat in order to examine characteristics of p-books and e-books

Rationale
Collection management
  – Development
  – Cooperation
  – Deselection
  – Preservation
Space allocation and management
Meet user expectations
Services for off-site users
Migration from print to digital
Convenient access
  – 24/7 access
  – Desk-top delivery

Scope
WorldCat
  – July 1, 2003 = 50 million+ records
  – 1 billion holdings
Digital Items
Books
  – Print (p-book)
  – Digital (e-book)

Strategy
Identify digital items
Identify digital items with at least one other manifestation in WorldCat
  – FRBRize database
    • Work
      – Distinct intellectual or artistic expression
      – Cluster works in WorldCat
    • Manifestation
      – Physical embodiment of a work
Identify digital items with p-book equivalents
  – Assumption
    • If digital items have p-book equivalents, then digital items are e-books
  – Identify publishers and publication dates
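The strategy above can be sketched in a few lines: group records into work clusters, then keep the digital items whose cluster also contains a p-book manifestation. This is only a minimal sketch. The record layout, the `digital` and `pbook` flags, and the crude author/title key are assumptions made for illustration; real FRBR work clustering relies on much richer matching.

```python
from collections import defaultdict

# Hypothetical, simplified records (not real WorldCat data).
records = [
    {"id": 1, "author": "Austen, Jane", "title": "Emma", "digital": False, "pbook": True},
    {"id": 2, "author": "Austen, Jane", "title": "Emma.", "digital": True, "pbook": False},
    {"id": 3, "author": "", "title": "Library Home Page", "digital": True, "pbook": False},
]

def work_key(record):
    """Crude work key: lowercased author and title with punctuation removed."""
    def norm(text):
        return "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace()).strip()
    return (norm(record["author"]), norm(record["title"]))

def cluster_by_work(records):
    """Group manifestation-level records under their (assumed) work key."""
    works = defaultdict(list)
    for rec in records:
        works[work_key(rec)].append(rec)
    return works

def ebook_candidates(records):
    """Ids of digital items whose work cluster also contains a p-book."""
    found = []
    for cluster in cluster_by_work(records).values():
        if any(r["pbook"] for r in cluster):
            found.extend(r["id"] for r in cluster if r["digital"])
    return found

print(ebook_candidates(records))
```

Record 3 is digital but has no print manifestation in its cluster, so under the stated assumption it is not counted as an e-book.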

Need to Determine
Comparison of p-books and e-books
  – What is a book?
  – What is a p-book?
  – What is an e-book?
  – What is a digital item?
  – How do we extend p-book criteria to digital world?

What is a Digital Item?
Working definition of digital item
  – Computer file
  – OR Electronic resource
  – OR Appropriate 856 field
    • Indicates electronic location or access
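The working definition is a three-way OR, which might be written as a simple predicate. The record here is a plain dict with a hypothetical layout, not an object from any MARC library. The specific codes tested (`m` for a computer file, `s` for an electronic form of item) are assumptions about how the slide's categories map onto MARC coding. The sketch also treats any 856 field as qualifying, whereas the slide says "appropriate", which implies some further filtering not described here.

```python
def is_digital_item(record):
    """Apply the three OR-ed tests from the working definition.

    `record` is a plain dict (hypothetical layout):
      record_type  - one-character type-of-record code
      form_of_item - one-character form-of-item code
      fields       - list of MARC field tags present, as strings
    """
    is_computer_file = record.get("record_type") == "m"   # assumed coding for "computer file"
    is_electronic = record.get("form_of_item") == "s"      # assumed coding for "electronic resource"
    has_856 = "856" in record.get("fields", [])            # electronic location and access
    return is_computer_file or is_electronic or has_856

print_book = {"record_type": "a", "form_of_item": " ", "fields": ["245", "300"]}
linked_text = {"record_type": "a", "form_of_item": " ", "fields": ["245", "856"]}
print(is_digital_item(print_book), is_digital_item(linked_text))
```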

What is a P-book?
No consensus for definition of a book
  – Text (type = a) and monograph (bib level = m)
    • Broadsides?
    • Pamphlets?
    • Government documents?
    • Children’s books?
    • Microforms?
  – Authoritative Definitions
    • UNESCO
      – Nonperiodical literary publication consisting of > 49 pages, covers excluded
    • ANSI
      – Publications consisting of > 49 pages
      – Hard covers
    • US Postal Service (publication)
      – Publications > 24 pages

A P-book IS:
Based on UNESCO definition
Working definition of a p-book
  – Printed on paper (excludes microform)
  – Language material
  – Monograph
  – Physical description
  – Form of item = regular or large print
  – Title does not include a GMD
  – Substantial length (> 49 pages; > 25 to include juvenile titles)
  – Excludes manuscripts (dissertations and theses)
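Read as a checklist, the working definition might be applied to a record roughly as follows. Every key in the record dict is a hypothetical stand-in for a value that would be parsed from a MARC record. The page rule is interpreted here as "more than 49 pages, or more than 25 for juvenile titles", which is one possible reading of the slide and is labeled as an assumption in the code.

```python
def is_pbook(rec):
    """Return True if a simplified record dict meets every p-book criterion."""
    # Assumed reading of the page rule: lower threshold for juvenile titles only.
    min_pages = 25 if rec.get("juvenile", False) else 49
    return bool(
        rec.get("record_type") == "a"                               # language material (text)
        and rec.get("bib_level") == "m"                             # monograph
        and rec.get("form_of_item") in {"regular", "large_print"}   # on paper, not microform
        and rec.get("has_physical_description", False)
        and not rec.get("title_has_gmd", False)                     # no general material designation
        and not rec.get("is_manuscript", False)                     # excludes theses, dissertations
        and rec.get("pages", 0) > min_pages
    )

novel = {"record_type": "a", "bib_level": "m", "form_of_item": "regular",
         "has_physical_description": True, "pages": 320}
picture_book = dict(novel, pages=32, juvenile=True)
pamphlet = dict(novel, pages=32)
microfilm = dict(novel, form_of_item="microfilm")

print(is_pbook(novel), is_pbook(picture_book), is_pbook(pamphlet), is_pbook(microfilm))
```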

What is an E-book?
Difficult to define e-book
  – Digital version of p-book (straightforward)
  – New conceptual views of a book in digital environment
Assumption
  – P-book is well-defined
  – If digital item has manifestation as a p-book, then digital item must also be a book
  – If p-book has digital equivalent or vice-versa, ignore e-book that has no print equivalents

An E-book IS:
E-Book = Electronic (Digital) + Book
Definition of e-Book:
  – Digital equivalents of p-books
  – New conceptual definitions of books in digital environment

WorldCat Record Analysis
P-book records = 24,048,235 (48% of WC)
Digital item records = 795,630 (about 1.6% of WC)
  – Web sites
    • Collections of interlinked, Web-accessible materials residing at a single location on the Internet
  – Documents
    • Various forms of electronic documents
    • E-books with no p-book equivalents and no minimum page requirements
      – Book chapters
      – Broadsides
      – Brochures
      – Pamphlets
  – Reprints
    • E-books with p-book equivalents = 76,375 (about 0.15% of WC)
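The record counts on this slide can be converted to shares of WorldCat with a few lines of arithmetic. The sketch assumes the WorldCat total is the one implied by "p-book records = 48% of WC", which is consistent with the "50 million+ records" figure given earlier. Under that assumption, digital items come to roughly 1.6% of WorldCat and e-books with p-book equivalents to roughly 0.15%.

```python
pbook_records = 24_048_235
digital_records = 795_630
ebooks_with_pbook = 76_375

# Assumption: total WorldCat size implied by "p-books = 48% of WC".
implied_total = pbook_records / 0.48

def share(count, total):
    """Percentage of `total` represented by `count`, rounded to two decimals."""
    return round(100 * count / total, 2)

print(round(implied_total))                        # implied number of WorldCat records
print(share(digital_records, implied_total))       # digital items as % of WorldCat
print(share(ebooks_with_pbook, implied_total))     # e-books with p-book equivalents as % of WorldCat
print(share(ebooks_with_pbook, digital_records))   # same e-books as % of digital items
```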

WorldCat Record Analysis
Digital item records (continued)
  – Interactive learning objects
    • Computer programs offering self-contained, interactive tutorial or educational experience
  – Software
    • Computer programs for creating and manipulating information
  – Serials
    • Journals
    • Proceedings
  – Images
  – Theses
  – Other (2 records)
    • Computer game
    • Raw data file

Digital Items in WorldCat
[Pie chart of digital item types]
  Documents 32%
  Web Sites 35%
  Reprints 7%
  Interactive Learning Objects 7%
  Software 7%
  Other 1%
  Theses 2%
  Images 3%
  Serials 6%

Publication Dates of Digital Items With P-Book Equivalents in WorldCat
[Bar chart. X-axis: Publication Date, in the bins Pre-1800, 1800-1849, 1850-1899, 1900-1924, 1925-1949, 1950-1959, 1960-1969, 1970-1979, 1980-1989, 1990-1994, 1995-1999, 2000-. Y-axis: Number of Documents, 0 to 12,000.]
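A chart like this one depends on assigning each publication year to one of the uneven date bins on the x-axis. The sketch below shows one way to do that with a sorted list of bin boundaries. The bin labels come from the chart; the sample years are made up for illustration.

```python
from bisect import bisect_right
from collections import Counter

# Lower bounds of each bin after "Pre-1800", taken from the chart's x-axis.
STARTS = [1800, 1850, 1900, 1925, 1950, 1960, 1970, 1980, 1990, 1995, 2000]
LABELS = ["Pre-1800", "1800-1849", "1850-1899", "1900-1924", "1925-1949",
          "1950-1959", "1960-1969", "1970-1979", "1980-1989", "1990-1994",
          "1995-1999", "2000-"]

def date_bin(year):
    """Map a publication year to its chart bin label."""
    return LABELS[bisect_right(STARTS, year)]

# Hypothetical publication years, not actual WorldCat data.
sample_years = [1799, 1800, 1923, 1994, 1995, 2001, 2003]
counts = Counter(date_bin(y) for y in sample_years)
for label in LABELS:
    if counts[label]:
        print(label, counts[label])
```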

Publishers of Digital Items With P-Book Equivalents in WorldCat
Approximately 15,000 unique publishers
Approximately 150 publishers with > 25 records
Top 10 publishers
  – Institute of Electrical and Electronics Engineers (IEEE)
  – National Bureau of Economic Research
  – US Government Printing Office
  – Springer
  – Inter-University Consortium for Political and Social Research
  – PowerKids Press
  – University of Virginia Library
  – MIT Press
  – Microsoft
  – Broderbund Software and Books
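Figures of this kind (unique publishers, publishers with more than 25 records, and a top-10 list) can be produced by counting records per publisher name. The sketch below is illustrative only. The function name and the sample list are hypothetical, and real publisher strings would need far more normalization than trimming and lowercasing.

```python
from collections import Counter

def publisher_summary(publisher_names, min_records=25, top_n=10):
    """Return (unique publishers, publishers with > min_records, top_n names)."""
    counts = Counter(name.strip().lower() for name in publisher_names if name.strip())
    frequent = {p for p, n in counts.items() if n > min_records}
    top = [p for p, _ in counts.most_common(top_n)]
    return len(counts), len(frequent), top

# Hypothetical input: one publisher string per record.
names = ["IEEE"] * 30 + ["MIT Press"] * 26 + ["Springer"] * 25 + ["Tiny Press"]
unique, frequent, top = publisher_summary(names, top_n=2)
print(unique, frequent, top)
```

Note that the threshold is strict, matching the slide's "> 25 records": a publisher with exactly 25 records is not counted.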

Discussion of Analysis
Small number of
  – E-books with p-book equivalents
  – Publishers with > 25 records for e-books with p-book equivalents
Recent publication dates for e-books with p-book equivalents
More Web sites than documents or reprints
Difficult to identify and categorize digital items
  – Inconsistent cataloging policies and practices for digital items
  – Inconsistent definitions for types of digital items

Future Research
Establish accepted criteria for defining an e-book independent of p-books
Identify and compare type of library holdings and NATC subjects for p-books and e-books
  – Identify electronic collection silos
Continue to collect these data to compare for trends
Identify types of content/materials that are better suited for either print or digital environment