OCLC Online Computer Library Center Data Mining Library Collection Silos: Print Books and E-books in Library Collections Lynn Silipigni Connaway Ed O’Neill Chandra Prabha Brian Lavoie

Post on 24-Dec-2015


Page 1: Data Mining Library Collection Silos: Print Books and E-books in Library Collections

OCLC Online Computer Library Center

Lynn Silipigni Connaway
Ed O’Neill
Chandra Prabha
Brian Lavoie

Page 2: Collection Assessment

Why assess collections?
– Provide data for member libraries for decision-making
  • Description of the collection
    – Identify specific subject areas
      » Determine collection age
      » Rate of growth
      » Strengths and weaknesses
  • Overlap/gap analysis
  • Identify last copy
  • Useful information
    – Outside funding
    – Library collection comparisons
    – Remote storage decisions
    – Collection development and management
    – Identify role of non-ARL libraries
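A minimal sketch of overlap/gap analysis and last-copy identification, assuming each library's holdings can be reduced to a set of title identifiers. The library symbols and identifiers below are hypothetical and are not taken from WorldCat.

```python
from collections import Counter

# Hypothetical holdings: library symbol -> set of title identifiers.
holdings = {
    "LIB_A": {"t1", "t2", "t3", "t4"},
    "LIB_B": {"t3", "t4", "t5"},
    "LIB_C": {"t4", "t6"},
}

def overlap(a, b):
    """Titles held by both libraries."""
    return holdings[a] & holdings[b]

def gap(a, b):
    """Titles held by library b that library a lacks."""
    return holdings[b] - holdings[a]

def last_copies(library):
    """Titles for which `library` is the only holder in the group."""
    counts = Counter(t for titles in holdings.values() for t in titles)
    return {t for t in holdings[library] if counts[t] == 1}

print(sorted(overlap("LIB_A", "LIB_B")))
print(sorted(gap("LIB_A", "LIB_B")))
print(sorted(last_copies("LIB_A")))
```

Plain set operations are enough here because holdings are treated as presence or absence of a title; copy counts and editions would need a richer model.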

Page 3: WorldCat as a Collection

World’s largest bibliographic database
– July 1, 2003 = 50 million+ records
– 1 billion holdings

Ideal source for data-mining

Characteristics of WorldCat
– Age
– Subject, using NATC
– Holdings by type of library
  • ARL
  • Academic, non-ARL
  • Public
  • School
  • Special

Page 4: WorldCat as a Collection

Use of MARC data elements in WorldCat
– Types of materials
– Library holdings to determine audience levels

Collection assessment and collection use
– Unique titles
– Analyze and compare aggregate holdings for libraries
– Identify print books (p-books) and electronic books (e-books)

Page 5: WorldCat Holdings by Library Types

[Bar chart: number of holdings by library type, axis 0 to 400,000,000. Values and labels are paired in the order they appear in the chart.]

Training Symbols: 528,372
Networks & Processing Centers: 3,491,551
Unidentified: 6,921,912
State & National Libraries: 14,741,919
School Libraries: 15,674,236
Gov Libraries (Excludes State & National): 15,968,006
Special Libraries (Other): 21,468,245
Public Libraries: 210,018,358
ARL Libraries: 215,263,036
Academic Libraries (Non-ARL): 368,450,292

Page 6: WorldCat Number of Holdings

Page 7: WorldCat Number of Records

Page 8: WorldCat Holdings

Page 9: WorldCat Holdings

Page 10: Study Objective

Digital materials constitute increasing proportion of library collections

Effective strategies for integrating print and digital materials within a library collection
– Eliminate redundancies
– Meet user expectations

Data-mining increasingly important to support collection management decisions
– WorldCat
  • World’s largest bibliographic database
  • Ideal as source for data-mining

Data-mine WorldCat in order to examine characteristics of p-books and e-books

Page 11: Rationale

Collection management
– Development
– Cooperation
– Deselection
– Preservation

Space allocation and management

Meet user expectations

Services for off-site users

Migration from print to digital

Convenient access
– 24/7 access
– Desk-top delivery

Page 12: Scope

WorldCat
– July 1, 2003 = 50 million+ records
– 1 billion holdings

Digital Items

Books
– Print (p-book)
– Digital (e-book)

Page 13: Strategy

Identify digital items

Identify digital items with at least one other manifestation in WorldCat
– FRBRize database
  • Work
    – Distinct intellectual or artistic expression
    – Cluster works in WorldCat
  • Manifestation
    – Physical embodiment of a work

Identify digital items with p-book equivalents
– Assumption
  • If digital items have p-book equivalents, then digital items are e-books
– Identify publishers and publication dates
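The strategy above can be sketched as follows, under strong simplifying assumptions. Each record is a small dictionary, works are clustered on a normalized author/title key (a crude stand-in for FRBR work clustering, not the algorithm used in the study), and the `is_digital` and `is_p_book` flags are assumed to have been set beforehand. All field names and sample records are hypothetical.

```python
import re
from collections import defaultdict

def work_key(record):
    """Crude work-level key: normalized author plus title (an assumption)."""
    def norm(text):
        return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()
    return (norm(record["author"]), norm(record["title"]))

def cluster_works(records):
    """Group manifestation-level records into work clusters."""
    works = defaultdict(list)
    for rec in records:
        works[work_key(rec)].append(rec)
    return works

def ebooks_with_print_equivalents(records):
    """Digital items whose work cluster also contains a p-book."""
    result = []
    for manifestations in cluster_works(records).values():
        if any(m["is_p_book"] for m in manifestations):
            result.extend(m for m in manifestations if m["is_digital"])
    return result

# Hypothetical sample records.
records = [
    {"id": 1, "author": "Melville, Herman", "title": "Moby Dick",
     "is_p_book": True, "is_digital": False},
    {"id": 2, "author": "Melville, Herman.", "title": "Moby-Dick",
     "is_p_book": False, "is_digital": True},
    {"id": 3, "author": "", "title": "Library Web Site",
     "is_p_book": False, "is_digital": True},
]

print([r["id"] for r in ebooks_with_print_equivalents(records)])
```

Record 3 is digital but shares a work cluster with no p-book, so under the stated assumption it is not counted as an e-book.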

Page 14: Need to Determine

Comparison of p-books and e-books
– What is a book?
– What is a p-book?
– What is an e-book?
– What is a digital item?
– How do we extend p-book criteria to digital world?

Page 15: What is a Digital Item?

Working definition of digital item
– Computer file
– OR Electronic resource
– OR Appropriate 856 field
  • Indicates electronic location or access
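A minimal sketch of the working definition as a predicate, assuming records have been flattened into dictionaries. The keys `record_type` (MARC Leader/06, where `m` means computer file), `form_of_item` (where `s` means electronic), and `fields_856` are hypothetical names for this illustration. The slides do not say what makes an 856 field "appropriate"; here it is assumed to mean an 856 field carrying a URL in subfield `u`.

```python
def is_digital_item(record):
    """True if the record meets any one of the three criteria."""
    is_computer_file = record.get("record_type") == "m"
    is_electronic = record.get("form_of_item") == "s"
    # Assumption: an "appropriate" 856 field is one with a non-empty subfield u.
    has_856 = any(f.get("u") for f in record.get("fields_856", []))
    return is_computer_file or is_electronic or has_856

# Hypothetical sample records.
print_book = {"record_type": "a", "form_of_item": " ", "fields_856": []}
online_text = {"record_type": "a", "form_of_item": " ",
               "fields_856": [{"u": "http://example.org/ebook"}]}
software = {"record_type": "m"}

print(is_digital_item(print_book), is_digital_item(online_text),
      is_digital_item(software))
```

Because the criteria are joined with OR, a record that looks like print in its fixed fields still counts as digital if it has a qualifying 856 field.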

Page 16: What is a P-book?

No consensus for definition of a book
– Text (type = a) and monograph (bib level = m)
  • Broadsides?
  • Pamphlets?
  • Government documents?
  • Children’s books?
  • Microforms?
– Authoritative Definitions
  • UNESCO
    – Nonperiodical literary publication consisting of > 49 pages, covers excluded
  • ANSI
    – Publications consisting of > 49 pages
    – Hard covers
  • US Postal Service (publication)
    – Publications > 24 pages

Page 17: A P-book IS:

Based on UNESCO definition

Working definition of a p-book
– Printed on paper (excludes microform)
– Language material
– Monograph
– Physical description
– Form of item = regular or large print
– Title does not include a GMD
– Substantial length (> 49 pages; > 25 to include juvenile titles)
– Excludes manuscripts (dissertations and theses)
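The working definition above could be expressed as a filter along these lines. This is a sketch under assumptions, not the study's actual selection logic: records are flattened dictionaries with hypothetical keys, the MARC codes used are `a` for language material, `m` for monograph, and blank or `d` for regular or large print, and the page threshold is left as a parameter so that either 49 or 25 can be applied.

```python
def is_p_book(record, min_pages=49):
    """Apply the working p-book criteria to one flattened record."""
    return (
        record.get("record_type") == "a"                   # language material
        and record.get("bib_level") == "m"                 # monograph
        and record.get("form_of_item", " ") in (" ", "d")  # regular or large print, so no microform
        and bool(record.get("physical_description"))       # has a physical description
        and not record.get("gmd")                          # title carries no GMD
        and not record.get("is_thesis", False)             # excludes dissertations and theses
        and record.get("pages", 0) > min_pages             # substantial length
    )

# Hypothetical sample records.
novel = {"record_type": "a", "bib_level": "m", "form_of_item": " ",
         "physical_description": "320 p. ; 24 cm.", "pages": 320}
picture_book = dict(novel, physical_description="32 p. : ill.", pages=32)
microfiche = dict(novel, form_of_item="b")
thesis = dict(novel, is_thesis=True)

print(is_p_book(novel), is_p_book(picture_book),
      is_p_book(picture_book, min_pages=25))
```

Keeping the threshold as a parameter makes it easy to compare how many records each cutoff admits.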

Page 18: What is an E-book?

Difficult to define e-book
– Digital version of p-book (straightforward)
– New conceptual views of a book in digital environment

Assumption
– P-book is well-defined
– If digital item has manifestation as a p-book, then digital item must also be a book
– If p-book has digital equivalent or vice-versa, ignore e-book that has no print equivalents

Page 19: An E-book IS:

E-Book = Electronic (Digital) + Book

Definition of e-Book:
– Digital equivalents of p-books
– New conceptual definitions of books in digital environment

Page 20: WorldCat Record Analysis

P-book records = 24,048,235 (48% of WC)

Digital item records = 795,630 (about 1.6% of WC)
– Web sites
  • Collections of interlinked, Web-accessible materials residing at a single location on the Internet
– Documents
  • Various forms of electronic documents
  • E-books with no p-book equivalents and no minimum page requirements
    – Book chapters
    – Broadsides
    – Brochures
    – Pamphlets
– Reprints
  • E-books with p-book equivalents = 76,375 (about 0.15% of WC)

Page 21: WorldCat Record Analysis

Digital item records (continued)
– Interactive learning objects
  • Computer programs offering self-contained, interactive tutorial or educational experience
– Software
  • Computer programs for creating and manipulating information
– Serials
  • Journals
  • Proceedings
– Images
– Theses
– Other (2 records)
  • Computer game
  • Raw data file
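As a rough arithmetic check on the record counts reported in the WorldCat Record Analysis slides, the sketch below computes each count's share of WorldCat. The total of 50,000,000 records is an assumption based on the "50 million+" figure in the slides; the exact total is not given. Under that assumption the counts work out to roughly 48%, 1.6%, and 0.15% of WorldCat respectively.

```python
# Assumption: the slides give only "50 million+" records, so 50,000,000 is approximate.
WORLDCAT_TOTAL = 50_000_000

counts = {
    "p-book records": 24_048_235,
    "digital item records": 795_630,
    "e-books with p-book equivalents": 76_375,
}

def share(count, total=WORLDCAT_TOTAL):
    """Percentage of the total represented by `count`."""
    return 100.0 * count / total

for label, count in counts.items():
    print(f"{label}: {count:,} = {share(count):.2f}% of total")
```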

Page 22: Digital Items in WorldCat

[Pie chart]
Documents 32%
Web Sites 35%
Reprints 7%
Interactive Learning Objects 7%
Software 7%
Other 1%
Theses 2%
Images 3%
Serials 6%

Page 23: Publication Dates of Digital Items With P-Book Equivalents in WorldCat

[Bar chart; bar values not preserved. X-axis: Publication Date (Pre-1800, 1800-1849, 1850-1899, 1900-1924, 1925-1949, 1950-1959, 1960-1969, 1970-1979, 1980-1989, 1990-1994, 1995-1999, 2000-). Y-axis: Number of Documents (0 to 12,000).]

Page 24: Publishers of Digital Items With P-Book Equivalents in WorldCat

Approximately 15,000 unique publishers

Approximately 150 publishers with > 25 records

Top 10 publishers
– Institute of Electrical and Electronics Engineers (IEEE)
– National Bureau of Economic Research
– US Government Printing Office
– Springer
– Inter-University Consortium for Political and Social Research
– PowerKids Press
– University of Virginia Library
– MIT Press
– Microsoft
– Broderbund Software and Books
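Figures like the ones above (unique publishers, publishers over a record threshold, top publishers) can be derived from a list of publisher names with a simple tally. The sketch below assumes publisher names are already normalized to one spelling each, which is rarely true of real catalog data; the function name and sample data are hypothetical.

```python
from collections import Counter

def publisher_summary(publishers, threshold=25, top_n=10):
    """Count unique publishers, those above a record threshold, and the top N."""
    counts = Counter(p.strip() for p in publishers if p and p.strip())
    return {
        "unique": len(counts),
        "over_threshold": sum(1 for n in counts.values() if n > threshold),
        "top": [name for name, _ in counts.most_common(top_n)],
    }

# Hypothetical sample: one publisher name per e-book record.
sample = ["IEEE"] * 30 + ["Springer"] * 26 + ["MIT Press"] * 3 + ["Example Press"]
summary = publisher_summary(sample, threshold=25, top_n=2)
print(summary)
```

In practice, variant spellings of the same publisher would need to be merged before counting, or the number of unique publishers will be overstated.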

Page 25: Discussion of Analysis

Small number of
– E-books with p-book equivalents
– Publishers with > 25 records for e-books with p-book equivalents

Recent publication dates for e-books with p-book equivalents

More Web sites than documents or reprints

Difficult to identify and categorize digital items
– Inconsistent cataloging policies and practices for digital items
– Inconsistent definitions for types of digital items

Page 26: Future Research

Establish accepted criteria for defining an e-book independent of p-books

Identify and compare type of library holdings and NATC subjects for p-books and e-books
– Identify electronic collection silos

Continue to collect these data to compare for trends

Identify types of content/materials that are better suited for either print or digital environment

Page 27: Questions and Discussion

OCLC Online Computer Library Center
