Collaborative Data Management at the University of California
DESCRIPTION
Webinar presented on December 5, 2012, by Joan Starr and Perry Willett of CDL/UC3, and Lisa Federer and Claudia Horning from UCLA. Part of the ACRL Digital Curation Interest Group (DCIG) Webinar Series.
TRANSCRIPT
1
Photo credit: http://www.flickr.com/photos/joanet/2994421437/ by Jo@net (Joan Campderrós‐i‐Canas)
2
3
Image credit: http://www.flickr.com/photos/vixon/116447718/ by barryegan (Vitor Leite)
Why should researchers bother with DATA CITATION? What is their motivation?
• To provide fair credit to those responsible: exposure
• To ensure scientific transparency and reasonable accountability for authors and stewards: transparency
• To aid in tracking the impact of the work: citation tracking
• To help data authors verify how their data are being used: verification
• To aid scientific reproducibility through direct, unambiguous connection to the precise data used in a particular study: scientific re‐use
Source: ESIP—Earth Science Information Partners (http://wiki.esipfed.org/index.php/Interagency_Data_Stewardship/Citations/provider_guidelines)
4
5
6
7
8
9
This is also a question of an almost perfect fit with our historic mission to preserve and protect our institution’s scholarly output.
472,000 in September
455,000 in May!
10
11
Image credit: http://www.flickr.com/photos/bleuman/6160605143/ by Yunchung Lee
As Sherry described to you, research has a life cycle.
12
13
Supporting the MANAGE AND SHARE STAGES are the Merritt curation and preservation repository and our long‐term identifier service, EZID.
BAKING DATA CURATION INTO THE COLLECTION PHASE: DATAUP. AND ENHANCING COLLECTION OF E‐SCIENCE IN THE WEB ENVIRONMENT: WEB ARCHIVING SERVICE, OR WAS.
To facilitate data publication, we are exploring this new Data Paper model.
We are engaged in a number of network‐level collaborations and partnerships, but these two have particular relevance to the data management space, with DataONE focused on distributed data networks and DataCite on persistent identifiers.
And lastly, we have partnered with UVA and many others to develop and launch an easy‐to‐use Data Management Plan Tool.
Let’s take a brief look at all of these things, and then we’ll talk about what this means to you.
14
Current work includes the Datashare project at UC San Francisco (UCSF). Datashare, as the name suggests, encourages researchers to share their data. See the Datashare website at http://datashare.ucsf.edu
15
In its capacity as a data management tool, Merritt can function in one of several ways: it can be a “dark” or inaccessible archive for important digital assets; it can serve as a “bright” archive with direct discovery and access; it can be the preservation back‐end for existing or new discovery and content management systems; or it can integrate with distributed data grids.
16
17
18
19
Preservation: Curation microservices and Merritt
To bake data curation into data creation: DCXL (Data Curation XL Plug‐In)
To enhance data sharing, collecting and gathering: WAS service
To facilitate data publication, we are exploring this new Data Paper model.
And behind many of these steps, the EZID service.
We are engaged in a number of network‐level collaborations and partnerships, but these two have particular relevance to the data management space, with DataONE focused on distributed data networks and DataCite on persistent identifiers.
And lastly, we have partnered with UVA, and many others to develop and launch an easy to use Data Management Plan Tool.
So let’s take a brief look at all of these things, and while I’m there, I’ll dive more deeply into EZID, which is the service I manage.
20
Nobody thinks of Excel as a preservation‐ready tool, but everybody uses it! The KEY IDEA in keeping this EASY here is: let them use the tools they are used to using. (Get out of the way of that elephant!)
Gordon & Betty Moore Foundation + Microsoft Research are funding this.
Our part is requirements gathering; Microsoft will do the development. It will be an open‐source plug‐in.
21
WAS allows curators to collect and manage web‐published content so that scholars can use the content for private research and/or publish the content for public access.
The archives contain eScience content as well as government documents, event captures, and archives for specific research communities, such as unique data sets, collections of sites not otherwise grouped together, and the sites resulting from grant activity.
22
PUBLIC OR PRIVATE
WAS provides tools for analyzing site change over time and allows keyword searching of archived sites. Publishing an archive is optional. As of this writing, there are 93 active archives, over half of which are publicly available.
23
24
We are exploring this idea with various partners and funders, potentially to encourage conventions for describing data so that it can stand alone when appropriate.
25
26
DataONE is an NSF‐funded virtual data center for biology, ecology, and environmental sciences.
DataONE has the overarching goal of building a new culture of data access and data sharing. This is an international collaboration working with scientists and librarians, as well as other stakeholders.
1. Engaging the scientist in the data curation process
2. Supporting the full data life cycle
3. Encouraging data stewardship and sharing
4. Promoting best practices
5. Engaging citizens
6. Developing domain agnostic solutions
27
28
How can EZID be in the business of issuing DataCite DOIs? California Digital Library was one of the founding members.
DataCite was indeed formed in 2009 by 10 libraries and research centers with a mission: “Helping you find, access, and reuse data.”
The number has now grown to 16. In addition there are 3 associate members, including the Korea Institute of Science and Technology Information and BGI, so there is a presence in Asia.
DataCite’s primary methodology for achieving this mission: issuing DOIs (Digital Object Identifiers) for datasets.
29
These are the factors driving the collaboration:
1. Institutions rely on soft funding… agencies have created a new demand, meet the demand or don’t get funded.
2. Approach is to work collaboratively to consolidate expertise and reduce costs
3. Libraries plus, plus
4. Provide an environment that allows researchers to focus on research
30
Image credit: http://content.cdlib.org/ark:/13030/kt667nc4xn/?query=service%20station&brand=calisphere, courtesy of Anaheim Public Library
31
Image credit: http://content.cdlib.org/ark:/28722/bk0007s853c/?query=tools&brand=calisphere, courtesy of UC Berkeley, Bancroft Library; United Aircraft Corporation: Joint War Production Drive Committee
DATA CURATION LEADS TO GOOD OUTCOMES FOR RESEARCHERS.
• They’ll be motivated to routinely deposit in stable public storage both data products (datasets and processing information) and the data papers that reward them with authorship credit.
• Data journals will spring up around disciplines, even if disciplinary data papers are scattered across geographically distributed repositories.
• Data products will be re‐used, annotated, corrected, and linked to from traditional publications.
32
Lots of work is going on with data at UCLA, but I’m going to focus on just a couple of projects.
33
DMPTool was developed in part at UCLA, and UCLA is second among the UCs in usage. There were more enrollees in DMPTool after a presentation to an administrator group than in the entire 4 months prior.
34
35
Carly Strasser visit from CDL established interest
Initial survey indicated researchers interested in many aspects of data management, especially data management plans
36
Initial results from the test indicate that researchers found the class useful
37
In summer 2012, UCLA was one of 7 libraries to receive funding to add an informationist to an existing NIH‐funded research team.
38
39
40
41
http://www.flickr.com/photos/sekihan/6100774057/ By sekihan
42
43
Image source: http://www.flickr.com/photos/ausnahmezustand/4752989186/
By ausnahmezustand
44