
ISTeC Data Management Forum #3

Pat Burns

Dean of CSU Libraries & VP for IT

Friday, May 2, 2014

05/02/2014ISTeC DM Forum3 1


“A dangerous, foreboding, or deathlike influence or vapor.”


“We are drowning in information and starving for knowledge.” – Rutherford D. Rogers


Understand needs for data management

Understand support for data management
◦ Local IT infrastructure
◦ Central IT infrastructure: Globus
◦ Libraries’ Digital Repository, DM support
◦ Other, external

Develop an institutional approach/strategy?


[Diagram labels: Data, Information, Knowledge, Understanding, Wisdom; IT Systems: ‘Doing’]

Increasing access to data = Increasing research productivity?

Innovation!


The 4th Way of Doing Science?

◦ Analysis
◦ Physical experiment
◦ Numerical experiment
◦ Big Data


Agencies with > $100M in research funding must require that federally funded research be made publicly available, including data sets


Pubs & Data


[Diagram labels: Research (Grants); Journal Publication w/ Data Sets; Journal Subscription; Journal Access; Faculty; Libraries; IT: Networks, Storage, Access Mgmt.]


Working Data Sets

Scholarly Data Sets (linked to pubs)

Locality


Data sets, by themselves, are not “creative works,” and are therefore not copyrightable
◦ But publications and user’s manuals describing them are


“I’d sooner show someone my underwear, than share my data sets with them!”


Unfunded mandate for us!
◦ Faculty
◦ RAs
◦ Students
◦ IT
◦ Libraries


NSF is SERIOUS About Data Sharing


Required as of Jan. 18, 2011

Sharing data will speed research and economic development!


ARL Meeting, Oct. 2012
◦ Myron Gutmann, Assistant Director, National Science Foundation, Directorate for Social, Behavioral & Economic Sciences


Proposal Review → ‘Big’ Funding


Proposal Review → ‘Small’ Funding → Produce Data Set → Measure Usage → If High: ‘Big’ Funding


Pubs w/ Data

Processed Data & Data Representations

Data Collections & Structured DBs

Raw Data & Datasets


1. Data within published articles

2. Supplementary files to articles

3. Data referenced from articles & held elsewhere

4. Published datasets

5. Data on local disks


Discoverable
◦ Crawled by Google and others
Accessible
◦ Robust infrastructure, accessibility
Organized/searchable
◦ Metadata
◦ Catalogued
Persistent
◦ Persistent Digital Object Identifier (DOI)
◦ Preserved and transcoded as needed


> 25% failure rate of URLs!!


1. Federal agency systems: NIH PubMed Central
◦ NASA only other federal agency deploying a DR?
2. Disciplinary repositories
3. Local repositories: CSU’s DigiTool
◦ Data Management Plan templates
◦ Discoverability, accessibility, & preservation
◦ Usage stats
◦ Linked to pubs
4. Local file share files from a data store
◦ Granular access control via a common infrastructure, e.g. Globus


1. Where do I put my scholarly data?
◦ Wherever it’s free! Locality? Can I move it around?
◦ Repository: persistence/longevity
◦ Globus-enabled file share
2. How do I expose my data?
◦ PubMed Central, NASA repository
◦ Disciplinary repository
◦ Local repository (metadata)


Association of Research Libraries (ARL) SHARE initiative: local repositories
◦ “Shared Access Research Ecosystem”
◦ For preservation and access of scholarly works and data sets
◦ Not for storage of working files

CSU’s DigiTool Digital Repository (DR)


[Diagram labels: HPC Centers, Disciplinary Repositories; Internet; NASA?; CSU; ACNS Globus File Store; DigiTool DR Metadata; Local; Remote; Globus User Access Control]


If files stored in (any) digital repository
◦ ~OK for metadata and discoverability
If files stored elsewhere
◦ Metadata should be stored on a digital repository for discoverability, accessibility, citability, and persistence; and “point” to the data
If files stored locally
◦ Can use Globus to manage access granularly
◦ What about backup and preservation?


[Diagram: DigiTool storage, linked over the Internet. Local RAID at the Main Site (2X), off-campus RAID at the DR Site (2X), and Preservation RAID (2X). Total = 6X, where X = storage size.]
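The 6X total on this slide can be reproduced with a few lines of arithmetic. This is only a sketch: it assumes the three 2X factors in the diagram belong to the main site, the DR site, and the preservation copy (the labels suggest this mapping, but the slide does not state it), and the names `COPY_OVERHEAD` and `raw_storage_needed` are illustrative.

```python
# Overhead factors read off the diagram. Which copy each 2X belongs to
# is an assumption, not something the slide states outright.
COPY_OVERHEAD = {
    "main site (local RAID)": 2,
    "DR site (off-campus RAID)": 2,
    "preservation RAID": 2,
}


def raw_storage_needed(x_tb: float) -> float:
    """Raw capacity (TB) needed to hold a collection of size x_tb."""
    return x_tb * sum(COPY_OVERHEAD.values())


if __name__ == "__main__":
    for x in (1, 10, 50):
        print(f"X = {x:>3} TB -> raw storage = {raw_storage_needed(x):.0f} TB")
```

Under these assumptions, every terabyte deposited in the repository consumes six terabytes of raw disk, which is worth keeping in mind when reading the per-TB prices later in the talk.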


NSF likes:
◦ Managed jointly by Librarians and IT Professionals
◦ Connected to ultrahigh-speed research network
◦ The best metadata, standards,…
◦ Discoverability (crawled by all services), accessibility
◦ For preservation of scholarly data and pubs
Limitations
◦ Very limited access control (NSF likes this too)
◦ Does not support structured data (e.g. databases)
◦ Not for working data sets


1. Cultural change

2. More work – ugh!

3. Greater capital cost
◦ The cost of additional storage required could be extreme


It depends!


SSDs are required for performance, but expensive

Amazon announced ~80% cost reduction in storage - $10/TB-mo!
◦ We are exploring!!!


Big Data

Us


If “incidental,” can be included as a direct cost in the grant

Some storage for scholarly data sets “free” to CSU Faculty
◦ First 100 GByte free, thereafter $4k/TB

Storage of pubs+ always “free” to CSU faculty and students
◦ Assumes no “big” embedded data

Exploring Amazon as an option
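To compare the options on this slide, the pricing can be turned into a small calculation. This is a rough sketch, not CSU's actual billing model: it assumes the $4k/TB charge is a one-time fee applied only to the portion above the free 100 GByte, that 1 TB = 1000 GB, and that the Amazon figure is the ~$10/TB-month quoted earlier in the talk with no transfer or request fees. None of those assumptions is confirmed by the slides, and the function names are illustrative.

```python
FREE_GB = 100                  # slide: first 100 GByte free
CSU_USD_PER_TB = 4000          # slide: $4k/TB thereafter (assumed one-time)
AMAZON_USD_PER_TB_MONTH = 10   # slide: ~$10/TB-mo
GB_PER_TB = 1000               # assumption: decimal terabytes


def csu_cost(size_gb: int) -> float:
    """Charge for a scholarly data set under the assumed CSU model."""
    billable_gb = max(0, size_gb - FREE_GB)
    return billable_gb * CSU_USD_PER_TB / GB_PER_TB


def amazon_cost(size_gb: int, months: int) -> float:
    """Storage-only charge for keeping size_gb on Amazon for `months`."""
    return size_gb * AMAZON_USD_PER_TB_MONTH * months / GB_PER_TB


if __name__ == "__main__":
    for size_gb in (50, 1_100, 10_000):
        print(
            f"{size_gb:>6} GB: CSU ${csu_cost(size_gb):,.0f}, "
            f"Amazon 1 yr ${amazon_cost(size_gb, 12):,.0f}, "
            f"Amazon 5 yr ${amazon_cost(size_gb, 60):,.0f}"
        )
```

The two prices are not directly comparable (one is treated here as a one-time charge, the other recurs monthly), so the time horizon chosen for `months` drives the comparison.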


Game Changer!


For a single file
◦ Small: 1 GByte
◦ Medium: 10-100 GBytes
◦ Big: >100 GBytes
For a set of files
◦ 100X that of a single file?
How many files is ‘big’?


1 full rack stores ~350 TBytes usable & costs ~$150k (~$1,000/TB)


[Diagram labels: ACNS Storage; HPC & Other Devices; Local Storage]


[Network diagram, from CC-NIE Panel, 04/08/14, slide 40. Labels: BiSON; FRGP; Research Networks; Internet; Border 1; Border 2; Production LAN; Core 1; Core 2; Campus 100 Gig Core Routing; Cluster; DYNES Server & Storage, FDT; “Science DMZ”; Commodity Users & Researchers (typ.); Dynamic VLANs; 3 ea. 10 Gbps Wave; DYNES IDC; 10 Gbps Research Connections (typ.); ~ 40 or 100 Gig? Numbered callouts: 1 ✔, 2 ✔, 3 ✔, 4 ✖, 5 ~]


Later in this forum, you will hear more from our consummate experts


ORCID – the Open Researcher and Contributor ID

A unique numeric identifier to “Connect Research and Researchers”

Makes publication data gathering much easier

Complements the DOI

See http://orcid.org/
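An ORCID iD is written as 16 characters in four hyphenated groups, and the last character is a checksum (ISO 7064 MOD 11-2), so a mistyped iD can often be caught before it is used to gather publication data. A minimal sketch of that check follows; the function names are illustrative, and 0000-0002-1825-0097 is the sample iD used in ORCID's own documentation.

```python
def orcid_check_digit(base15: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base15:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if `orcid` has 16 characters and a matching check character."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16:
        return False
    if any(ch not in "0123456789" for ch in compact[:15]):
        return False
    return orcid_check_digit(compact[:15]) == compact[15]


if __name__ == "__main__":
    for candidate in ("0000-0002-1825-0097", "0000-0002-1825-0098"):
        print(candidate, is_valid_orcid(candidate))
```

The check only confirms that an iD is well formed. Whether it belongs to a particular researcher still has to be looked up at orcid.org.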


So much of what we call management consists of making it difficult for people to work.

– Peter Drucker

Make Work as Easy as Possible


Do we need a standing Data Management Committee?


Move in any direction, as long as it’s somewhat positive!


Sometimes it is better to journey hopefully than to arrive!!!