An Analysis and Characterization of DMPs in NSF Proposals from the University of Illinois RDAP14 Research Data Access & Preservation Summit March 26, 2014 William H. Mischo, Mary C. Schlembach, & Megan N. O’Donnell University of Illinois at Urbana-Champaign Iowa State University


Page 1

An Analysis and Characterization of DMPs in NSF Proposals from the University of Illinois

RDAP14 Research Data Access & Preservation Summit

March 26, 2014

William H. Mischo, Mary C. Schlembach, & Megan N. O’Donnell

University of Illinois at Urbana-Champaign

Iowa State University

Page 2

NSF Data Management Plans

• Data Management Plans (DMPs): required element in NSF proposals since January 2011

• July 2011: the Library, working with the campus Office of Sponsored Programs and Research Administration (OSPRA), began an analysis of DMPs in submitted NSF grant proposals

• To date: 1,600 grant proposals examined, 1,260 of them included in the analysis

Page 3

Reasons for Analysis

• What storage venues and mechanisms for sharing and reuse are being used?

• Are the PIs using local templates and local campus resources such as IDEALS?

Page 4

Follow-on

• Develop campus-wide infrastructure (Research Data Service - RDS)

• Assist in compliance with federal agencies

• Develop important partnerships with campus units (CITES, NCSA, Colleges) and national entities

• Develop best practices and standard approaches

Page 5

Analysis

• Analysis attempts to characterize and classify DMPs into categories

• A DMP can be assigned multiple categories

• 1,260 DMPs from July 2011 to November 2013
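
Because one DMP can carry several category labels, counts are tallied per label and percentages are taken against the number of DMPs, so the percentages can add up to more than 100%. A minimal sketch of such a tally follows; the three example records and the helper name `tally_categories` are hypothetical and are not taken from the study.

```python
from collections import Counter


def tally_categories(dmps):
    """Count how many DMPs carry each label, with the share of all DMPs."""
    counts = Counter()
    for labels in dmps:
        counts.update(set(labels))  # set(): a label counts once per DMP
    total = len(dmps)
    return {label: (n, round(100 * n / total, 1)) for label, n in counts.items()}


# Hypothetical records: each DMP is the set of categories assigned to it.
example_dmps = [
    {"PI Server", "Campus"},
    {"Campus", "Disciplinary", "Publication"},
    {"PI Website"},
]

for label, (n, pct) in sorted(tally_categories(example_dmps).items()):
    print(f"{label}: {n} ({pct}%)")
```

With multi-label records like these, the per-category percentages are not expected to sum to 100%.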

Page 6

Categories

• PI Server – Servers and workstations that the PIs (and their students/staff) use to store project data. Examples: laboratory servers/workstations, external hard drives, group computers

• PI Website – Websites edited or administered by the PI or a group they belong to. Examples: lab website, project website, wiki, PI’s website

Page 7

Categories

• Campus – Services located at, operated by, run by, or endorsed by Illinois. Examples: IDEALS, Netfiles and Box.net, NCSA, and Beckman Institute

• Department – Used when a department was specifically mentioned as providing a storage or hosting resource. Examples: departmental website, departmental server, departmental backup service, or a web address traced back to an academic department (also given the “Campus” label)

Page 8

Categories

• Remote – Services and sites not located on the Illinois campus. Examples: NASA, other campuses, collaborative projects, non-Illinois institutes

• Disciplinary – Disciplinary repositories. Examples: GenBank, arXiv, ICPSR, SEAD, Nanohub, and Dryad

• Cloud – Storage services using cloud technology. Examples: Google Drive, Google Code, Box.net, Amazon, Microsoft, Dropbox

Page 9

Categories

• Publication – Scholarly outputs. Examples: journal articles, workshops, and conference presentations/posters

• Analog – Physical records/data. Examples: lab notebooks, photographs, files

• Specimens – Physical specimens. Usually biological specimens or artifacts

Page 10

Categories

• Optical Disc – DVD, CD, and Blu-ray discs

• Not Specified – The DMP was not specific enough for us to categorize further

• No Data – The DMP indicated that the proposal will produce no data products

• Local Template Used – The DMP used a library-authored template

Page 11

ALL DMPs (n=1,260)

Category         Number   Percent
PI Server           503     39.9%
PI Website          529     41.9%
Campus              667     52.9%
Department          142     11.2%
Remote              353       28%
Disciplinary        275     21.8%
Publication         556     44.1%
Cloud                63        5%
Optical Disc         56        4%
Analog              131     10.4%
Specimens           111      8.8%
Not Specified        66      5.2%
Collaborative       164       13%
No Data             103      8.2%

Page 12

Data Venue and Risk

Risk = Risk of Loss/Corruption/Breach

Data Location                     Submitted Proposals (n=1260)   Risk             Funded Proposals (n=298)   Risk
PI Server/Website                 64%                            High             61%                        High
Departmental Server/Website       11.2%                          Medium to High   7%                         Medium to High
Campus-Wide Resource              52.9%                          Low              45%                        Low
IDEALS (Institutional Repos.)     21.9%                                           19.8%
NCSA                              4.3%                                            16.4%
Disciplinary Repository/Cloud     25.8%                          Medium to Low    21.4%                      Medium to Low
Remote Repository                 28%                            Medium to High   22.8%                      Medium to High
Optical Disk, Specimens, Analog   19.4%                          Out of Scope     11%                        Out of Scope

Page 13

Notables

• Funded: 298

• Used local template: 254

• Only 87 DMPs contained information about file types

• IDEALS: 275

• NCSA/XSEDE: 55

• Dryad: 22

• ICPSR: 17

• GenBank: 55

• arXiv: 61

Page 14

Analysis

• Any differences in storage venue or technologies between the unfunded proposals and the funded proposals?

• Any differences between the proposals from the first year and the more current proposals?

• Other differences in proposal categories between funded and unfunded proposals?

• 734 active NSF awards, $861.8 million

Page 15

Analysis: Funded vs. Not-funded

• IDEALS institutional repository: 62 funded, 197 not funded; chi-square: 0.17 (chi-square >= 3.84 is needed for significance at p < .05)

• Storing data on PI server or website: 183 funded, 569 not funded; chi-square: 0.7

• Disciplinary or Cloud: 67 funded, 241 not funded; chi-square: 0.85

• Remote storage: 68 funded, 267 not funded; chi-square: 3.01
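
The comparison against the 3.84 threshold can be sketched in a few lines. The sketch below takes the four reported statistics as given and does not recompute them. It assumes 1 degree of freedom, which is inferred from the 3.84 threshold (the critical value at alpha = .05 for 1 df) rather than stated on the slide. The function name `p_value_1df` is illustrative.

```python
import math

CRITICAL_VALUE = 3.84  # threshold quoted on the slide (alpha = .05, 1 df assumed)


def p_value_1df(chi_square):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    # For 1 df, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi_square / 2))


# Chi-square values as reported on the slide (funded vs. not funded).
reported = {
    "IDEALS": 0.17,
    "PI server or website": 0.7,
    "Disciplinary or Cloud": 0.85,
    "Remote storage": 3.01,
}

for venue, stat in reported.items():
    verdict = "significant" if stat >= CRITICAL_VALUE else "not significant"
    print(f"{venue}: chi-square={stat}, p={p_value_1df(stat):.3f}, {verdict}")
```

None of the four reported values reaches the threshold, which matches the slide's reading that storage venue shows no significant funded/not-funded difference.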

Page 16

Analysis

• Use of IDEALS: before August 2012 = 108; after (through November 2013) = 166; chi-square: 4.59, p < .05

• Use of Disciplinary or Cloud: before August 2012 = 121; after = 182; chi-square: 4.33, p < .05
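
The slides do not say how the chi-square values were computed, and they do not give the total number of DMPs in each period, so the reported figures cannot be recomputed from the transcript alone. As a generic illustration only, a Pearson chi-square for a 2x2 table (period by venue named or not named) could be sketched as below. The function name and all counts are hypothetical, and this is not necessarily the authors' procedure.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("every row and column needs a nonzero total")
    return n * (a * d - b * c) ** 2 / denominator


# Hypothetical counts, NOT from the study:
# rows = before / after a cutoff date,
# columns = DMPs that name a venue / DMPs that do not.
stat = chi_square_2x2(10, 20, 30, 40)
print(f"chi-square = {stat:.3f}; exceeds 3.84: {stat >= 3.84}")
```

A statistic computed this way has 1 degree of freedom, so it would be compared with the same 3.84 critical value used for p < .05.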

Page 17

Implications and Conclusions

1. No significant differences in storage venues between funded and unfunded proposals: naming IDEALS or a disciplinary repository carried no funding advantage.

2. However, more recent proposals include IDEALS and disciplinary repositories at a significantly higher rate. Why?

• What is the role of the library? The campus? The subject discipline?

• Connecting data to the literature is important