collaborative data management at the university of california

Photo credit: http://www.flickr.com/photos/joanet/2994421437/ by Jo@net (Joan Campderrós‐i‐Canas)


Image credit: http://www.flickr.com/photos/vixon/116447718/ by barryegan (Vitor Leite)

Why should researchers bother with DATA CITATION? What is their motivation?

• To provide fair credit to those responsible: exposure
• To ensure scientific transparency and reasonable accountability for authors and stewards: transparency
• To aid in tracking the impact of the work: citation tracking
• To help data authors verify how their data are being used: verification
• To aid scientific reproducibility through direct, unambiguous connection to the precise data used in a particular study: scientific re‐use

Source: ESIP—Earth Science Information Partners (http://wiki.esipfed.org/index.php/Interagency_Data_Stewardship/Citations/provider_guidelines)


This is also a question of an almost perfect fit with our historic mission to preserve and protect our institution’s scholarly output.

472,000 in September

455,000 in May!


Image credit: http://www.flickr.com/photos/bleuman/6160605143/ by Yunchung Lee

As Sherry described to you, research has a life cycle.


Supporting the MANAGE and SHARE stages are the Merritt curation and preservation repository and our long‐term identifier service, EZID.

Baking data curation into the collection phase: DATA‐UP. And enhancing collection of e‐science in the web environment: the Web Archiving Service, or WAS.

To facilitate data publication, we are exploring this new Data Paper model.

We are engaged in a number of network‐level collaborations and partnerships, but these two have particular relevance to the data management space, with DataONE focused on distributed data networks and DataCite on persistent identifiers.

And lastly, we have partnered with UVA and many others to develop and launch an easy‐to‐use Data Management Plan Tool.

Let’s take a brief look at all of these things, and then we’ll talk about what this means to you.


Current work includes the Datashare project at UC San Francisco (UCSF). Datashare, as the name suggests, encourages researchers to share their data. See the Datashare website at http://datashare.ucsf.edu


In its capacity as a data management tool, Merritt can function in one of several ways: it can be a “dark” or inaccessible archive for important digital assets; it can serve as a “bright” archive with direct discovery and access; it can be the preservation back‐end for existing or new discovery and content management systems; or it can integrate with distributed data grids.



• Preservation: Curation microservices and Merritt
• To bake data curation into data creation: DCXL (Data Curation XL Plug‐In)
• To enhance data sharing, collecting and gathering: WAS service
• To facilitate data publication, we are exploring this new Data Paper model.
• And behind many of these steps, the EZID service.

We are engaged in a number of network‐level collaborations and partnerships, but these two have particular relevance to the data management space, with DataONE focused on distributed data networks and DataCite on persistent identifiers.

And lastly, we have partnered with UVA, and many others to develop and launch an easy to use Data Management Plan Tool.

So let’s take a brief look at all of these things, and while I’m there, I’ll dive more deeply into EZID, which is the service I manage.


Nobody thinks of Excel as a preservation‐ready tool, but everybody uses it! The KEY IDEA in keeping this EASY here is: let them use the tools they are used to using. (Get out of the way of that elephant!)

Gordon & Betty Moore Foundation + Microsoft Research are funding this.

Our part is requirements gathering; MS will do development. Open source plug in.


WAS allows curators to collect and manage web‐published content so that scholars can use the content for private research and/or publish the content for public access.

The archives contain eScience content as well as government documents, event captures, and archives for specific research communities, such as unique data sets, collections of sites not otherwise grouped together, and the sites resulting from grant activity.


PUBLIC OR PRIVATE

WAS provides tools for analyzing site change over time and allows keyword searching of archived sites. Publishing an archive is optional. As of this writing, there are 93 active archives, over half of which are publicly available.


We are exploring this idea with various partners and funders, potentially to encourage conventions for describing data so that it can stand alone when appropriate.


DataONE is an NSF‐funded virtual data center for biology, ecology, and environmental sciences.

DataONE has the overarching goal of building a new culture of data access and data sharing. This is an international collaboration working with scientists and librarians, as well as other stakeholders.

1. Engaging the scientist in the data curation process
2. Supporting the full data life cycle
3. Encouraging data stewardship and sharing
4. Promoting best practices
5. Engaging citizens
6. Developing domain agnostic solutions


How can EZID be in the business of issuing DataCite DOIs? California Digital Library was one of the founding members.

DataCite was indeed formed in 2009 by 10 libraries and research centers with a mission: “Helping you find, access, and reuse data.”

The number has now grown to 16. In addition there are 3 associate members, including the Korea Institute of Science and Technology Information and BGI, so there is a presence in Asia.

DataCite’s primary methodology for achieving this mission: issuing DOIs (Digital Object Identifiers) for datasets.
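To make the idea of a dataset DOI concrete, here is a minimal Python sketch of how a DOI can be turned into a resolvable link and dropped into a citation string. It is an illustration only, not EZID's or DataCite's implementation: the DOI, creator, title, and repository name below are hypothetical, the syntax check is deliberately loose, and the citation layout only loosely follows a "Creator (Year): Title. Publisher. Identifier" pattern.

```python
import re
from urllib.parse import quote

# Hypothetical DOI used only for illustration; it does not identify a real dataset.
EXAMPLE_DOI = "10.1234/example.dataset.v1"

# Loose syntax check (an assumption, not an official rule): "10.", a numeric
# registrant code, a slash, then a non-empty suffix without whitespace.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}(\.\d+)*/\S+$")


def is_plausible_doi(doi: str) -> bool:
    """Return True if the string looks like a DOI under the loose check above."""
    return DOI_PATTERN.match(doi) is not None


def doi_to_url(doi: str) -> str:
    """Build a link to the public DOI resolver at doi.org."""
    if not is_plausible_doi(doi):
        raise ValueError(f"not a plausible DOI: {doi!r}")
    # Percent-encode characters that are unsafe in a URL path, keeping "/" intact.
    return "https://doi.org/" + quote(doi, safe="/")


def format_citation(creator: str, year: int, title: str, publisher: str, doi: str) -> str:
    """Assemble a simple data citation ending in the resolvable DOI link."""
    return f"{creator} ({year}): {title}. {publisher}. {doi_to_url(doi)}"


if __name__ == "__main__":
    print(format_citation("Doe, J.", 2012, "Example Dataset",
                          "Example Repository", EXAMPLE_DOI))
```

The point of the sketch is that the identifier, not the hosting location, goes into the citation: if the dataset moves, the resolver link stays the same.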


These are the factors driving the collaboration:

1. Institutions rely on soft funding… agencies have created a new demand, meet the demand or don’t get funded.

2. Approach is to work collaboratively to consolidate expertise and reduce costs

3. Libraries plus, plus

4. Provide an environment that allows researchers to focus on research


Image credit: http://content.cdlib.org/ark:/13030/kt667nc4xn/?query=service%20station&brand=calisphere, courtesy of Anaheim Public Library


Image credit: http://content.cdlib.org/ark:/28722/bk0007s853c/?query=tools&brand=calisphere, courtesy of UC Berkeley, Bancroft Library; United Aircraft Corporation: Joint War Production Drive Committee

DATA CURATION LEADS TO GOOD OUTCOMES FOR RESEARCHERS.

• They’ll be routinely motivated to deposit data products (datasets and processing information) in stable public storage, along with the data papers that reward them with authorship credit.

• Data journals will spring up around disciplines, even if disciplinary data papers are scattered across geographically distributed repositories.

• Data products will be re‐used, annotated, corrected, and easily linked to from traditional publications.


Lots of work is going on with data at UCLA, but I’m going to focus on just a couple of projects.


DMP Tool developed in part at UCLA – UCLA is second among UCs in usage. More enrollees in DMPTool after a presentation to an administrator group than in the entire 4 months prior.


Carly Strasser visit from CDL established interest

Initial survey indicated researchers interested in many aspects of data management, especially data management plans


Initial results from the test indicate that researchers found the class useful


In Summer 2012, UCLA was one of 7 libraries to receive funding to add an informationist to an existing NIH funded research team


http://www.flickr.com/photos/sekihan/6100774057/ By sekihan


Image source: http://www.flickr.com/photos/ausnahmezustand/4752989186/

By ausnahmezustand

