managing collections in the networked environment: new analytic approaches

43
Managing Collections in the Networked Environment: New Analytic Approaches OCLC Research Webinar 9 September 2010 Constance Malpas, OCLC Research Zack Lane, Columbia University Helen Look, University of Michigan Jacob Nadal, University of California, Los Angeles

Upload: mora

Post on 05-Jan-2016

23 views

Category:

Documents


0 download

DESCRIPTION

Managing Collections in the Networked Environment: New Analytic Approaches. OCLC Research Webinar 9 September 2010. Constance Malpas, OCLC Research Zack Lane, Columbia University Helen Look, University of Michigan Jacob Nadal, University of California, Los Angeles. Context. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Managing Collections in the Networked Environment:  New Analytic Approaches

Managing Collections in the Networked Environment: New Analytic Approaches

OCLC Research Webinar9 September 2010

Constance Malpas, OCLC ResearchZack Lane, Columbia UniversityHelen Look, University of MichiganJacob Nadal, University of California, Los Angeles

Page 2: Managing Collections in the Networked Environment:  New Analytic Approaches

Managing Collections in the Networked Environment: New Analytic Approaches

2

Context

• Making library data “work harder” • Decision support: where should limited

institutional resources be directed?• New skill sets, professional cohort emerging

• Highlight significant work at RLG partner institutions

• Identify shared research priorities, methodologies• Staffing and infrastructure requirements,

organizational development

Page 3: Managing Collections in the Networked Environment:  New Analytic Approaches

Managing Collections in the Networked Environment: New Analytic Approaches

3

Helen LookCollection AnalystUniversity of [email protected]

Zack LaneReCAP CoordinatorColumbia [email protected]

Jacob NadalPreservation [email protected]

Today’s Panelists

Page 4: Managing Collections in the Networked Environment:  New Analytic Approaches

Circulation Data at Columbia University Libraries

Zack Lane

ReCAP CoordinatorColumbia University

Page 5: Managing Collections in the Networked Environment:  New Analytic Approaches

Managing Collections in the Networked Environment: New Analytic Approaches

5

Project: Look at System-wide Circ Data

570827 567739

527171

505288

474218 469217 467272

0

100000

200000

300000

400000

500000

600000

FY03/04 FY04/05 FY05/06 FY06/07 FY07/08 FY08/09 FY09/10

Total Charges by FY

0

5000

10000

15000

20000

25000

30000

35000

40000

7/1

/20

03

9/1

/20

03

11

/1/2

00

3

1/1

/20

04

3/1

/20

04

5/1

/20

04

7/1

/20

04

9/1

/20

04

11

/1/2

00

4

1/1

/20

05

3/1

/20

05

5/1

/20

05

7/1

/20

05

9/1

/20

05

11

/1/2

00

5

1/1

/20

06

3/1

/20

06

5/1

/20

06

7/1

/20

06

9/1

/20

06

11

/1/2

00

6

1/1

/20

07

3/1

/20

07

5/1

/20

07

7/1

/20

07

9/1

/20

07

11

/1/2

00

7

1/1

/20

08

3/1

/20

08

5/1

/20

08

7/1

/20

08

9/1

/20

08

11

/1/2

00

8

1/1

/20

09

3/1

/20

09

5/1

/20

09

7/1

/20

09

9/1

/20

09

11

/1/2

00

9

1/1

/20

10

3/1

/20

10

5/1

/20

10

Charges by Patron Group: Monthly

GRD OFF REG VIS

0

50000

100000

150000

200000

250000

FY03/04 FY04/05 FY05/062 FY06/072 FY07/082 FY08/092 FY09/102

Charges to Patrons by Patron Group:From FY03/04-09/10

GRD OFF REG VIS

• Several data categories

• Many ways to slice

• Tip of the iceberg

Page 6: Managing Collections in the Networked Environment:  New Analytic Approaches

Managing Collections in the Networked Environment: New Analytic Approaches

6

Lots of Amazing (Granular) Data

• Main categories: accessions, retrieval, delivery and circulation

Page 7: Managing Collections in the Networked Environment:  New Analytic Approaches

Managing Collections in the Networked Environment: New Analytic Approaches

7

Monday Tuesday Wednesday

Thursday Friday Saturday Sunday0

10000

20000

30000

40000

50000

60000

Request Volume by WeekdayFY02-FY09 (total: 289,140)

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 230

5000

10000

15000

20000

25000

30000

35000

Request Volume by HourFY02-FY09 (total: 289,140)

Retrieval Rate by Publication Date and Language

0.00%

5.00%

10.00%

15.00%

20.00%

25.00%

30.00%

35.00%

40.00%

1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009

spa rus por pol ita hin ger fre eng ara

0

1000

2000

3000

4000

5000

6000Multi-dimensional Analyses

Page 8: Managing Collections in the Networked Environment:  New Analytic Approaches

Managing Collections in the Networked Environment: New Analytic Approaches

8

Circulation Analysis Project: Spring 2010

• Identify bright and capable intern: Steve Zweibel!• Locate data sets• Understand data sets• Working with Systems staff to improve data• Reformatting Data• Manipulating data with Excel 2003/2007 (Pivot tables)• Presenting data with Power Point• Rethink, rework and refine

Page 9: Managing Collections in the Networked Environment:  New Analytic Approaches

Managing Collections in the Networked Environment: New Analytic Approaches

9

Big Numbers: 18% decline in circ over 7 years

570827 567739

527171

505288

474218 469217 467272

0

100000

200000

300000

400000

500000

600000

FY03/04 FY04/05 FY05/06 FY06/07 FY07/08 FY08/09 FY09/10

Total Charges by FY

Page 10: Managing Collections in the Networked Environment:  New Analytic Approaches

Managing Collections in the Networked Environment: New Analytic Approaches

10

What’s to Gain?

• Data set is clean and compact• Categories apply to both on-site and off-site

collections• Low barrier of entry for library school intern• Everyone has access; no-one is looking• Improves understanding of circ system, patron

behavior, staff habits, data analysis tools and system-wide trends

• Staff somewhat surprised that circulation data analysis was system-wide not ReCAP-specific

Page 11: Managing Collections in the Networked Environment:  New Analytic Approaches

Managing Collections in the Networked Environment: New Analytic Approaches

11

Big Numbers: Revisited

0

5000

10000

15000

20000

25000

30000

35000

40000

7/1

/20

03

9/1

/20

03

11

/1/2

00

3

1/1

/20

04

3/1

/20

04

5/1

/20

04

7/1

/20

04

9/1

/20

04

11

/1/2

00

4

1/1

/20

05

3/1

/20

05

5/1

/20

05

7/1

/20

05

9/1

/20

05

11

/1/2

00

5

1/1

/20

06

3/1

/20

06

5/1

/20

06

7/1

/20

06

9/1

/20

06

11

/1/2

00

6

1/1

/20

07

3/1

/20

07

5/1

/20

07

7/1

/20

07

9/1

/20

07

11

/1/2

00

7

1/1

/20

08

3/1

/20

08

5/1

/20

08

7/1

/20

08

9/1

/20

08

11

/1/2

00

8

1/1

/20

09

3/1

/20

09

5/1

/20

09

7/1

/20

09

9/1

/20

09

11

/1/2

00

9

1/1

/20

10

3/1

/20

10

5/1

/20

10

Charges by Patron Group: Monthly

GRD OFF REG VIS

4 main patron groups: Grad Students, Faculty, Undergrads and Visitors

Page 12: Managing Collections in the Networked Environment:  New Analytic Approaches

Managing Collections in the Networked Environment: New Analytic Approaches

12

Zack’s Deep Thoughts

• Trend of total charges is downward due to use of e-resources

• Leveling off indicates that use of print copy is still strong and critical

• Faculty charges have increased due to more Grads serving as adjunct faculty (with OFF privileges)

• Grads peak a month before Undergrads because of course requirements (tilted towards written papers instead of tests)

Page 13: Managing Collections in the Networked Environment:  New Analytic Approaches

Managing Collections in the Networked Environment: New Analytic Approaches

13

What Else Did Steve Discover?

0

5000

10000

15000

20000

25000

30000

35000

40000

45000

7/1/

2003

9/1/

2003

11/1

/200

3

1/1/

2004

3/1/

2004

5/1/

2004

7/1/

2004

9/1/

2004

11/1

/200

4

1/1/

2005

3/1/

2005

5/1/

2005

7/1/

2005

9/1/

2005

11/1

/200

5

1/1/

2006

3/1/

2006

5/1/

2006

7/1/

2006

9/1/

2006

11/1

/200

6

1/1/

2007

3/1/

2007

5/1/

2007

7/1/

2007

9/1/

2007

11/1

/200

7

1/1/

2008

3/1/

2008

5/1/

2008

7/1/

2008

9/1/

2008

11/1

/200

8

1/1/

2009

3/1/

2009

5/1/

2009

7/1/

2009

9/1/

2009

11/1

/200

9

1/1/

2010

3/1/

2010

5/1/

2010

Renewals by Patron Group: Monthly

GRD OFF REG VIS

Patrons renew when they must renew

Page 14: Managing Collections in the Networked Environment:  New Analytic Approaches

Managing Collections in the Networked Environment: New Analytic Approaches

14

0

20000

40000

60000

80000

100000

120000

140000

160000

180000

200000

FY03/04 FY04/05 FY05/06 FY06/07 FY07/08 FY08/09 FY09/10

OFF by Charge Type: FY 03/04-09/10

OFF - Sum of Charges OFF - Sum of Renewals OFF - Sum of Recalls OFF - Sum of Holds

Patron Group Habits:

• Charge/Renewal ratios are consistent with staff perceptions

• Faculty are more likely to hold onto the books that they have than charge out new ones

• Undergraduates have little need to renew with extended loan period

• Grad students charge a lot of material and hold it for more than one term

• Visitors hold books longer since obtaining OPAC renewal permission

0

50000

100000

150000

200000

250000

FY03/04 FY04/05 FY05/06 FY06/07 FY07/08 FY08/09 FY09/10

REG by Charge Type: FY 03/04-09/10

REG - Sum of Charges REG - Sum of Renewals REG - Sum of Recalls REG - Sum of Holds

Page 15: Managing Collections in the Networked Environment:  New Analytic Approaches

Managing Collections in the Networked Environment: New Analytic Approaches

15

0

100000

200000

300000

400000

500000

600000

FY03/04 FY04/05 FY05/06 FY06/07 FY07/08 FY08/09 FY09/10

Onsite Vs. Offsite Charges

Offsite Onsite

Page 16: Managing Collections in the Networked Environment:  New Analytic Approaches

Managing Collections in the Networked Environment: New Analytic Approaches

16

Relevant to Other Issues

Access Services considering changes to Hold policy:

• Holds have limited duration

• Data indicates that most Holds expire

• Should Hold duration be extended?

• Should Holds (via OPAC) be eliminated?

Enter Circ data analysis…

Page 17: Managing Collections in the Networked Environment:  New Analytic Approaches

Managing Collections in the Networked Environment: New Analytic Approaches

17

3,661

5,342

5,9775,784

5,460

6,431 6,365

0

1000

2000

3000

4000

5000

6000

7000

FY03/04 FY04/05 FY05/06 FY06/07 FY07/08 FY08/09 FY09/10

Total Holds by FY

6,593

6,0436,262

10,191

11,065 11,105

11,825

0

2000

4000

6000

8000

10000

12000

14000

FY03/04 FY04/05 FY05/06 FY06/07 FY07/08 FY08/09 FY09/10

Total Recalls by FY

Overall Holds are half that of Recalls

Online Holds are one-eighth that of Recalls

0

2000

4000

6000

8000

10000

12000

FY03/04 FY04/05 FY05/06 FY06/07 FY07/08 FY08/09 FY09/10

Recalls: OPAC vs Everywhere Else

OPAC Everywhere Else

0

1000

2000

3000

4000

5000

6000

FY03/04 FY04/05 FY05/06 FY06/07 FY07/08 FY08/09 FY09/10

Holds: OPAC vs Everywhere Else

OPAC Everywhere Else

Page 18: Managing Collections in the Networked Environment:  New Analytic Approaches

Managing Collections in the Networked Environment: New Analytic Approaches

18

Moving Forward

• Bring data to staff; don’t expect staff to come to data

• Learn from the intern• Query, pressure and improve data

• Congratulations Steve Zweibel! Part-time professional cataloging position at NYU and part-time reference at Hunter College Health Sciences Library

Page 19: Managing Collections in the Networked Environment:  New Analytic Approaches

Post-digitization Use of Print Collections

Helen Look

Collection AnalystUniversity of Michigan

Page 20: Managing Collections in the Networked Environment:  New Analytic Approaches

Managing Collections in the Networked Environment: New Analytic Approaches

20

Background

• Study of post-digitization use of print collections at the University of Michigan

• University of Michigan digitization efforts

• HathiTrust Digital Library - http://www.hathitrust.org

Page 21: Managing Collections in the Networked Environment:  New Analytic Approaches

Managing Collections in the Networked Environment: New Analytic Approaches

21

Access to Institutional Resources

• Harmonizing data from different sources

• Working with different staff to gather the data

• Consulting with internal and external colleagues

Page 22: Managing Collections in the Networked Environment:  New Analytic Approaches

Managing Collections in the Networked Environment: New Analytic Approaches

22

Methodology

• Top 500 accessed titles in HathiTrust Digital Library by the University of Michigan community in 2009

• Title-level online usage was compared to title-level print usage

• Print circulation for the sample was compiled for 2008, 2009 and total circulation history

Page 23: Managing Collections in the Networked Environment:  New Analytic Approaches

Managing Collections in the Networked Environment: New Analytic Approaches

23

Low Usage of the Print

• 98% (489) of the titles had zero circulation

• 2% (11) of the titles circulated

• 2009 circulation for the 11 titles was equal to or less than the 2008 circulation

2%

98%

2009 Circulation

Circulated Titles

Non-Circu-lated Titles

Page 24: Managing Collections in the Networked Environment:  New Analytic Approaches

Managing Collections in the Networked Environment: New Analytic Approaches

24

Increased Discoverability of the Content

• 39% (193) of the titles had not circulated

• 61% (307) of the titles had circulated

• Hidden treasures

61%

39%

Historic Circulation

Circulated Titles

Non-Circu-lated Titles

Page 25: Managing Collections in the Networked Environment:  New Analytic Approaches

Managing Collections in the Networked Environment: New Analytic Approaches

25

Subject Distribution

Technology

Social Sciences

Science

Political Science

Philosophy, Psychology, Religion

Naval Science

Music

Military Science

Medicine

Law

Language and Literature

History

Geography, Anthropology, Recreation

General Works

Fine Arts

Education

Bibliography, Library Science, General Info Resources

Auxiliary Sciences of History

Agriculture

0% 2% 4% 6% 8% 10% 12% 14% 16%

13%

15%

11%

2%

6%

2%

1%

2%

1%

0%

13%

15%

0%

10%

3%

2%

1%

1%

2%

Page 26: Managing Collections in the Networked Environment:  New Analytic Approaches

Managing Collections in the Networked Environment: New Analytic Approaches

26

Patterns in the Overall Online Usage

<1 pa

gevi

ews

1 pa

gevi

ews

2 pa

gevi

ews

3 pa

gevi

ews

4 pa

gevi

ews

5 pa

gevi

ews

6 pa

gevi

ews

7 pa

gevi

ews

8 pa

gevi

ews

9 pa

gevi

ews

10 p

agev

iews

11 p

agev

iews

12 p

agev

iews

13 p

agev

iews

14 p

agev

iews

15 p

agev

iews

16 p

agev

iews

17 p

agev

iews

18 p

agev

iews

19 p

agev

iews

20+ p

agev

iews

0

5000

10000

15000

20000

25000

30000

35000

40000

45000

Page 27: Managing Collections in the Networked Environment:  New Analytic Approaches

Managing Collections in the Networked Environment: New Analytic Approaches

27

Lessons Learned

• Improved our understanding of the use of online and print materials made accessible through mass digitization

• Learned from the process about what data is available and where better metrics are needed

• Identified some potential patterns for further study

Page 28: Managing Collections in the Networked Environment:  New Analytic Approaches

Managing Collections in the Networked Environment: New Analytic Approaches

28

“The temptation to form premature theories upon insufficient data is the bane of our

profession.”

– Sherlock Holmes 

Page 29: Managing Collections in the Networked Environment:  New Analytic Approaches

Data-based Preservation Decision Making

Jacob Nadal

Preservation Officer UCLA Library

Page 30: Managing Collections in the Networked Environment:  New Analytic Approaches

Managing Collections in the Networked Environment: New Analytic Approaches

30

Preservation Theory and History:

Medicine, Zoos and Fortresses

• 20th century preservation was effectively local. We tried to protect or repair the items in the collection

– Rigid, comprehensive security and environmental standards

– Fortification of item (library binding, deacidification)– Replacement of weak items with hardened versions

(library editions, microfilm, facsimiles)

• Libraries are more like zoos than fortresses • Preservation was trying to deal with public

health problems in the metaphorical emergency room

Controversial assertion: What libraries call preservation is more like conservation at scale, and it’s still not to scale.

Page 31: Managing Collections in the Networked Environment:  New Analytic Approaches

Managing Collections in the Networked Environment: New Analytic Approaches

31

Preservation Theory at UCLA: Public Health, Flood Control & Habitats• Preservation works from the collection down• Conservation works from the item up• At UCLA, one strategy governs both approaches

• Every activity has a:– 1) Method of analysis or evidence-gathering– 2) Treatment proposal & outside review– 3) Hedge or fail-safe option

• We’re operating a dam or flood control channel, not manning a wall under siege

• You can make your LA River jokes now… ha, ha…• As our program matures, the watchword is

habitat:– Habitats are flexible and adaptable– Habitats are sustainable or not, depending on certain

pressures– Habitats have local versions of global types

Page 32: Managing Collections in the Networked Environment:  New Analytic Approaches

Managing Collections in the Networked Environment: New Analytic Approaches

32

The Los Angeles River

The concrete bottom reduces the effectiveness for flood control, creates a bad habitat for wildlife, and ruins its recreational value. All that effort for nothing!

The natural river is less work and functions better. A model worth emulating!

More about the River: http://www.lariver.org and http://folar.org/

Page 33: Managing Collections in the Networked Environment:  New Analytic Approaches

Managing Collections in the Networked Environment: New Analytic Approaches

33

From Theory towards Practice: Habitat, Sure, but Who Lives There?• UCLA, like all big RLs, has some really shabby

books.

• These “brittle books” are frustrating.– Repair is not the answer: little structural integrity

means they’re irreparable or require “heroic” conservation

– Reformatting is costly: fragile, poor contrast materials, so scanning has to be careful and high-quality

• And yet, we’re obligated to preserve certain things:

– Materials with high Los Angelocity– Scarce within the UC system, California, or the world– Signature collections, future classics, academic

emphases

• Everything else, we want you to do for us– kthanksnextslide!

Page 34: Managing Collections in the Networked Environment:  New Analytic Approaches

Managing Collections in the Networked Environment: New Analytic Approaches

34

From Practice to Practicalities

• Push decisions from the item to the network context

• Holdings review is the first step• Holdings data parsed into global (Worldcat), regional

(CA/350 miles of zip code 90095), and system (Worldcat Local/NGM)

• HathiTrust status checked (Portico, (C)LOCKSS, JSTOR to come?)

• These data are placed into a risk assessment model

• Series of automated recommendations are made

So, how does that dam/wall/habitat thing address the problem of lots of individual shabby books, in the context of a globally-intended collection of record?

Page 35: Managing Collections in the Networked Environment:  New Analytic Approaches

Managing Collections in the Networked Environment: New Analytic Approaches

35

Risk Assessment Model From Candace Yano (UC Berkeley/Ithaka)

Initial number of copies Survival probability1 36.6032%2 59.8085%3 74.5199%4 83.8464%5 89.7592%6 93.5076%7 95.8841%8 97.3906%9 98.3457%

10 98.9513%11 99.3351%

12 99.5785%13 99.7328%14 99.8306%15 99.8926%16 99.9319%17 99.9568%18 99.9726%19 99.9827%20 99.9890%21 99.9930%22 99.9956%23 99.9972%24 99.9982%25 99.9989%

26 99.9993%

12 copies is where the curve is asymptotic

26 is derived from past decisions by UCLA

Page 36: Managing Collections in the Networked Environment:  New Analytic Approaches

Managing Collections in the Networked Environment: New Analytic Approaches

36

Basic Scenario for Preservation Review

• Three outcomes:• Keep if [<12 global] OR [<3 CA] OR [0 UC] • Withdraw if [> 26 global]• Else Review

• Data is collected (point 1) then a proposed treatment gets external review by coll. managers (point 2) and all decisions are hedged by the network (point 3)

• “Keep” implies that preservation will see to it that the content remains in the collection

• “Withdraw” really means withdraw• “Review” means we need a genuine person to make a

decision (and people are both slow and idiosyncratic, so…)

Page 37: Managing Collections in the Networked Environment:  New Analytic Approaches

Managing Collections in the Networked Environment: New Analytic Approaches

37

Alternate Scenarios

• Decision making starts with the basic scenario. We’ll fine-tune that as we collect decision data

• Seeking best match between collection managers decisions and automated indicators

• After a designated review period, may use a more risk-tolerant scenario to decide on materials lingering in “review” status

• At present, we have a known unknown regarding artifactual value

• Conservation screens materials and routes to preservation review. Conservators are eagle-eyed about artifactual value

• Preservation officer reviews all “withdraws” and, for better or worse, yours truly has a mild case of bibliomania

Page 38: Managing Collections in the Networked Environment:  New Analytic Approaches

Managing Collections in the Networked Environment: New Analytic Approaches

38

The Hedgerow

#2 OR, with Hathi, Retain First

#3 AND, Retain First

#4 AND, with Hathi, Retain First

#1 OR, Retain First

Page 39: Managing Collections in the Networked Environment:  New Analytic Approaches

Managing Collections in the Networked Environment: New Analytic Approaches

39

The Long Tail

Page 40: Managing Collections in the Networked Environment:  New Analytic Approaches

Managing Collections in the Networked Environment: New Analytic Approaches

40

The Los Angeles Triangle

Inside this zone, we have a stewardship obligation, driven by preservation, a general good

Outside, we have options, driven by an institutional intention

Page 41: Managing Collections in the Networked Environment:  New Analytic Approaches

Managing Collections in the Networked Environment: New Analytic Approaches

41

Acknowledgements and Next Steps

• What made this possible• Annie Peterson – summer intern in the UCLA Library

preservation office, from UIUC GSLIS. What made it possible

• Willingness by all to try a “Cynefin” style of work -- gradual sense-making and continuous process improvement

• What would make it easier• In-house statistics expertise and research support• Longer stretches of uninterrupted time• Better serials data – Communal 583 + Local Holdings

Records

• What comes next• More of the same, to refine and test our process• Application to other activities: gifts & exchange,

replacements and preservation-driven acquisition, preservation survey & audit

Page 42: Managing Collections in the Networked Environment:  New Analytic Approaches

Managing Collections in the Networked Environment: New Analytic Approaches

42

For More Information

ReCAP Data Center (Columbia University)• Zack Lane & Colleen Major “

Impact Theories: Trends in Off-site Shelving Facility Use” (ACRL, 2008)

HathiTrust Digital Library• Helen Look "

Mass Digitization: Analyzing Online vs. Print Usage at a Large Academic Research Library" (ARL, 2010)

UCLA Library Preservation Blog• Jake Nadal and John Riemer “

Preservation Actions, MARC 21 Field 583, and Communal Local Holdings in OCLC WorldCat” (CONSER, 2009)

Page 43: Managing Collections in the Networked Environment:  New Analytic Approaches

Managing Collections in the Networked Environment: New Analytic Approaches

43

Questions, Comments?

Zack Lane - [email protected]

Helen Look - [email protected]

Jacob Nadal - [email protected]

http://www.oclc.org/research/events/webinars.htmhttp://itunes.apple.com/podcast/oclc-research-podcasts-webinars/id284764834