Data Reuse Experiences within Digital vs. Physical Zoological Collections

The world’s libraries. Connected.

University of Michigan Museum of Zoology (UMMZ), February 20, 2014

Ixchel M. Faniel, Ph.D., OCLC Research, [email protected]

Elizabeth Yakel, Ph.D., University of Michigan, [email protected]

• An Institute of Museum and Library Services (IMLS)-funded project led by Drs. Ixchel Faniel (PI) and Elizabeth Yakel (co-PI)

• Studying the intersection between data reuse and digital preservation in three academic disciplines to identify how contextual information about the data that supports reuse can best be created and preserved.

• Focuses on research data produced and used by quantitative social scientists, archaeologists, and zoologists.

• The intended audiences of this project are researchers who use secondary data and the digital curators, digital repository managers, data center staff, and others who collect, manage, and store digital information.

For more information, please visit http://www.dipir.org

Research Motivations & Questions

1. What are the significant properties of quantitative social science, archaeological, and zoological data that facilitate reuse?

2. How can these significant properties be expressed as representation information to ensure the preservation of meaning and enable data reuse?

Faniel & Yakel 2011

DIPIR Project: The Research Team

• Ixchel Faniel, OCLC Research (PI)

• Elizabeth Yakel, University of Michigan (Co-PI)

• Nancy McGovern, ICPSR/MIT

• Eric Kansa, Open Context

• William Fink, UM Museum of Zoology

Research Methodology

Sites: ICPSR, Open Context, UMMZ

Phase 1: Project Start up

• Staff interviews: ICPSR 10 (Winter 2011); Open Context 4 (Winter 2011); UMMZ 10 (Spring 2011)

Phase 2: Collecting and analyzing user data

• Interviews with data consumers: ICPSR 44 (Winter 2012); Open Context 22 (Winter 2012); UMMZ 27 (Fall 2012)

• Survey of data consumers: over 1,600 (Summer 2012)

• Web analytics of data consumers: server logs (ongoing)

• Observations of data consumers: 11 (✓ Fall 2013)

Phase 3: Mapping significant properties as representation information

Agenda

• Snapshot of Users

• Interviews

• Observations

• Discussion

Image: DIPIR Team

A Snapshot Of 40 Data Reusers

[Infographic: shares of the 40 reusers who are systematists, who study ecological trends, and who reuse data directly from colleagues, from online repositories and websites, from museums and archives, and from journal articles. Values shown: 65%, 90%, 95%, 27.5%, 35%, 20% (value-to-label pairing not preserved in the transcript).]

The Discovery Process

“I knew from prior experience which museums had large collections of material from the part of the world I was interested in.” (CAU19)

“… we started from that [author] paper and then added to it from other people’s work…So mostly from…reading other people’s papers.” (CAU22)

“I am a graduate student at [university], in Zoology and one of my committee members is an adjunct professor here, [name], so she noticed that I had genetic data for the same individuals that U of M has skull data for.” (CAU39)

“…that [aggregator repository] targets so many different collections that once you have access you know pretty much…You can identify very quickly what you need.” (CAU13)

Selection Criteria

Data coverage

Geographic precision

Matches another dataset

Availability of voucher specimen

Time period specimen collected

Sequence has been published

Results of pre-analysis

Relevant taxonomically

Condition of specimen

Location of repository

Availability of metadata

Physical variation of the species

Manner in which the specimen is preserved

Interviews

Image: DIPIR Team

Digital Data Selection Based On Locality

…often when it doesn’t meet my needs the most obvious reasons would be there’s just not enough data or it doesn’t cover…Like geographically it doesn’t cover the area I’m interested in well enough (CAU03).

…that’s the first filter…looking for specific species. And then for me, yeah, it’s been mostly about the geographic precision of the data, to say whether or not I can use that record for something. (CAU26).

Image: Microsoft Clipart

Digital Data Selection Based On Other Datasets

…we decide, okay, these georeferences have an error that is probably higher than, let’s say, five kilometers but our climate data is the resolution, the pixel size,…is maybe 4.5 kilometers. So, anything that is above that size of pixel that we have, we actually cannot use. (CAU14)
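The rule CAU14 describes can be sketched as a simple filter: keep a record only if its georeference error is no larger than the pixel size of the climate layer. This is a minimal illustration, not the interviewee's actual workflow. The `OccurrenceRecord` class, its field names, the sample records, and the choice to drop records with no stated error are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class OccurrenceRecord:
    """Hypothetical specimen record; field names are illustrative only."""
    catalog_number: str
    uncertainty_km: Optional[float]  # georeference error radius, None if unknown


def usable_for_climate_layer(records: List[OccurrenceRecord],
                             pixel_size_km: float) -> List[OccurrenceRecord]:
    """Keep records whose georeference error is no larger than one pixel."""
    return [
        r for r in records
        if r.uncertainty_km is not None and r.uncertainty_km <= pixel_size_km
    ]


# Made-up sample records, used only to exercise the filter.
records = [
    OccurrenceRecord("A-001", 0.5),
    OccurrenceRecord("A-002", 5.0),   # error larger than the pixel: dropped
    OccurrenceRecord("A-003", None),  # no stated error: dropped (an assumption)
    OccurrenceRecord("A-004", 4.5),   # exactly the pixel size: kept
]

kept = usable_for_climate_layer(records, pixel_size_km=4.5)
print([r.catalog_number for r in kept])
```

With the 4.5 kilometer pixel size mentioned in the quote, a record with a five kilometer error is excluded, which matches the selection decision the interviewee reports.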

I include it [the sequence] in my dataset, do the analyses I’m going to do and then based on the results of those analysis look to see how those data match with the data that I’ve collected. (CAU05)

Image: Microsoft Clipart

Trusting Digital Data

“I can sort of qualitatively assess what the quality of taxonomic data might be just by it being, having some mention of the museum record. I know [a] …museum worker who is often... I don't know about an expert in say, my group, but at least has access to the relevant literature to make good taxonomic decisions about those fishes from which they took the tissue.” (CAU02)

“I would go back to the literature to look at the paper it came from. I guess there is also to some degree the particular researchers that actually produced that sequence; I might actually know their reputations or what they kind of work on and trust it more or less.” (CAU12)

Trusting Digital Data

“A lot of times, it's just a matter of looking at what the Latin name is that they supply because I can't really make a decision based on the information that I'm given. If I had a picture, I could use that when I'm taking into account their ability to identify something. But the main way that I do it is by looking at the geography of where they claim a specimen is located.” (CAU17)

“Well, if there's a voucher specimen available then I can request that specimen from the museum where it's housed, re-examine it, confirm or deny that it is that particular species. If the voucher's there and it's the right species, then I have to go with it. If the voucher is not there, and I really question the identification…Because it's unreliable in my mind.” (CAU20)

Observations

Image: DIPIR Team

Specimen Selection Based On Condition

“It needs to be intact right? The skull needs to be intact. That isn't in the records usually, and I've gotten used to the idea that you just go and hope for the best, and figure that if they say they have 20, you might find six you could use. That would be a helpful thing to know.” (CAU34)

Image: DIPIR Team

Specimen Selection Based On Holotypes For Comparison

“[Many] holotypes from the past [are] deposited here in this collection. And then it's really useful to me, and important to make a comparison with those specimens that was the original description when the species already occur in the country. But to do that in the best comparisons, we need to compare morphological data with the new specimens that we already collected in the recent years.” (CAU29)

Image: DIPIR Team

Deciding To Visit UMMZ

“I think it’s because I was a student here so I know, I knew what was here. But I have to say, I worked on my dissertation in the same area, I worked on skull morphology, and so I learned as a graduate student that you go and find the museums that are most likely to have the specimens that you need.” (CAU34)

“And it's a good-sized collection. Especially in terms of university's collections, there are a lot of specimens here, good taxonomic diversity, and it's also close for us . . . I'm going to the Smithsonian next week, but that's a lot more expensive, a lot more time consuming.” (CAU36)

How Researchers Prepare For Their Visit To UMMZ

“Well, the crucial thing there is getting a copy of the data associated with the specimens that are here…an Excel spreadsheet that gave all the information about the tissues that are held here and the morphological specimens. Using that database, I was able to then select which species we need to study.” (CAU32)

Image: DIPIR Team

Interaction With Repository Staff

“In this case, I was fortunate to have [UMMZ staff], who took the initiative to go through the collections and find the most well-preserved specimens that he could . . . So, actually looking through the collection that was done by [UMMZ staff] and he brought out the specimens for me to use. So, that aspect was alleviated by the fact that he gave me a lot of help.” (CAU33)

Image: DIPIR Team

Discussion

Image: DIPIR Team

Discussion

• In a global age of online databases, people still need to see the actual specimens

• The condition and depth of the collection are important

• Aggregators vs. museum website vs. inventory system

• Having data accessible online is great, but at times it is just not sufficient

Discussion

• The discovery processes are similar but selection criteria are specific to research objectives

• Gaining trust in data about the specimen from a distance

Acknowledgements

• Institute of Museum and Library Services

• Partners: Nancy McGovern, Ph.D. (MIT), Eric Kansa, Ph.D. (Open Context), William Fink, Ph.D. (University of Michigan Museum of Zoology)

• OCLC Fellow: Julianna Barrera-Gomez

• Doctoral Students: Rebecca Frank, Adam Kriesberg, Morgan Daniels, Ayoung Yoon

• Master’s Students: Jessica Schaengold, Gavin Strassel, Michele DeLia, Kathleen Fear, Mallory Hood, Annelise Doll, Monique Lowe

• Undergraduates: Molly Haig

Questions?

Ixchel Faniel

[email protected]

Beth Yakel

[email protected]

http://www.dipir.org