Jonathan Dudek, Philippe Mongeon, Josephine Bergmans, Ingeborg Meijer January - 2019

Which role does DataCite play in researchers’ data sharing and data (re)use practices?

Open Science Monitor Case Study

DataCite - Open Science Monitor Case Study

European Commission Directorate-General for Research and Innovation Directorate A — Policy Development and Coordination Unit A.2 — Open Data Policy and Science Cloud E-mail [email protected] [email protected] European Commission B-1049 Brussels

Manuscript completed in January 2019.

This document has been prepared for the European Commission however it reflects the views only of the authors, and the Commission cannot be held responsible for any use which may be made of the information contained therein.

More information on the European Union is available on the internet (http://europa.eu).

Luxembourg: Publications Office of the European Union, 2019

EN PDF ISBN 978-92-76-00855-2 doi: 10.2777/222101 KI-01-19-255-EN-N

© European Union, 2019. Reuse is authorised provided the source is acknowledged. The reuse policy of European Commission documents is regulated by Decision 2011/833/EU (OJ L 330, 14.12.2011, p. 39).

For any use or reproduction of photos or other material that is not under the EU copyright, permission must be sought directly from the copyright holders.

EUROPEAN COMMISSION

Which role does DataCite play in researchers’ data sharing and data (re)use practices?

Open Science Monitor Case Study

2019 Directorate-General for Research and Innovation EN

Table of contents

1 Introduction
2 Background
3 Drivers
4 Barriers
5 Impact
6 Lessons learnt
7 Policy conclusions
References

ACKNOWLEDGEMENTS

This case study is part of the Open Science Monitor, led by the Lisbon Council together with CWTS, ESADE and Elsevier.

Authors

Jonathan Dudek – CWTS Leiden University

Philippe Mongeon – CFA, Aarhus University

Josephine Bergmans – CWTS Leiden University

Ingeborg Meijer – CWTS Leiden University

Acknowledgements

The study team would like to thank the DataCite interviewee for contributing to this case study.

Disclaimer: The information and views set out in this study report are those of the author(s) and do not necessarily reflect the official opinion of the Commission. The Commission does not guarantee the accuracy of the data included in this case study. Neither the Commission nor any person acting on the Commission’s behalf may be held responsible for the use which may be made of the information contained therein.

STUDY ON OPEN SCIENCE: MONITORING TRENDS AND DRIVERS (Reference: PP-05622-2017)

WHICH ROLE DOES DATACITE PLAY IN RESEARCHERS’ DATA SHARING AND DATA (RE)USE PRACTICES?

In summary, datasets in DataCite are unevenly distributed, contributors to datasets are unevenly distributed, reuse is limited or not traceable, and where reuse takes place, it is mainly by the dataset producers. This suggests that, when one of the biggest organisations supporting research data provides only a very skewed insight into data production, curation and reuse, there is still a long way to go. Researchers in the field of ocean science mainly use it as a place to store data, but not necessarily to reuse or cite it. This is a cultural issue, relating to the credit and reward system in science. It also suggests a big gap between policy and practice.

1 Introduction

This case study analyses DataCite: an international not-for-profit organization founded in 2009, which aims to improve data citation by providing persistent identifiers (Digital Object Identifiers, DOIs) for research data. DataCite serves as a central point for organizing research data and making it accessible and citable. The general objective of this case study is to investigate to what extent DataCite influences researchers’ behavior. First, the background of the DataCite infrastructure is presented, followed by a reflection on the mission, drivers and challenges of DataCite. Then, we take a closer look at a subset of DataCite records in order to determine how researchers use such a facility in the production, sharing, and (re)use of datasets, using DataCite records and a bibliographic database such as Scopus.

The latter is done by analysing DataCite metadata for datasets from one data repository and originator (IFREMER) in one field of science (ocean science); the individual and institutional actors involved in producing the datasets and making them available; the frequency of (re)use of those datasets; and the relationship between the creators and the (re)users (i.e. by asking: how do entities creating and using datasets differ or converge?).

2 Background

Although methods for collecting and analyzing data are diverse, data are an indispensable part of doing science. Data sharing and open accessibility have been a topic of debate over the past couple of decades, especially where research is publicly funded. In the context of Open Science there is a strong focus on Open Access (OA) publishing of academic articles. However, not much attention is being paid to the promotion of data sharing and referencing in this context.

DataCite, as a non-profit organization, aims to improve data referencing by providing persistent identifiers (Digital Object Identifiers) for research data (and also other types of research artefacts). Originating from the idea that sharing data is a crucial part of transparent research processes, and that the acceleration of science can more easily be accomplished when data is made available and (re)used, DataCite started assigning DOIs to datasets about 10 years ago. It tries to meet the needs of the international research community by helping researchers to confidently locate, identify, and cite research data. To this end, DataCite supports data centers by providing workflows and an infrastructure to identify and cite datasets, supports publishers by enabling links from research papers to underlying data or other research artefacts, and supports funding agencies by helping them to understand the reach and impact of their funding.

DataCite sees itself as playing a key role in the entire (research) data infrastructure. This infrastructure is essential insofar as enabling the sharing and (re)use of data helps to save time and money and thus increases the efficiency of the scientific enterprise. DataCite is currently one of the most comprehensive sources available for open data, with over 16 million data records as of January 2019, of which more than 13 million have searchable metadata. Robinson-Garcia, Mongeon, Jeng and Costas (2017) found that 42% of the data on DataCite are categorized as datasets, 18% as text, 14% as images, and 7% as collections. In total, 762 data centers can be identified in DataCite. However, the distribution of records across data centers is uneven: 2% of the data centers cover more than 80% of all DataCite records. Figure 1 shows the top ten datacenters in terms of the number of datasets (based on the CWTS version of DataCite metadata, dating back to April 2018). It demonstrates how unequally datasets are distributed across datacenters, with half of all datasets recorded originating from only two datacenters.

Figure 1. Top ten of datacenters according to share of all datasets. N = 2,576,420
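The concentration described above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming the ten percentage values printed on the bars of Figure 1 are the rounded shares of the ten largest datacenters; the variable and function names are illustrative and not part of the study.

```python
# Rounded shares (in percent) of all datasets held by the ten largest
# datacenters, as printed on the bars of Figure 1 (N = 2,576,420).
# Assumption: the values are taken as-is from the chart labels.
top_ten_shares = [26, 24, 8, 7, 6, 4, 4, 3, 2, 2]


def cumulative_shares(shares):
    """Return running totals of the shares, largest share first."""
    running = 0
    totals = []
    for share in sorted(shares, reverse=True):
        running += share
        totals.append(running)
    return totals


cumulative = cumulative_shares(top_ten_shares)
top_two = cumulative[1]    # share held by the two largest datacenters
top_ten = cumulative[-1]   # share held by all ten datacenters combined

print(f"Top two datacenters: {top_two}% of all datasets")
print(f"Top ten datacenters: {top_ten}% of all datasets")
```

Because the inputs are rounded percentages, the totals are approximate; they are consistent with the statement that half of all datasets originate from two datacenters.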

Figure 2 displays a list of the data types provided by the top 20 data centers in DataCite. Different types of institutions provide data in DataCite, ranging from thematic repositories to research bodies, scientific publishers, and firms. There is also some inconsistency in the way data centers classify their records in DataCite. For instance, the data provided by Data-Planet is categorized as ‘other’. Looking at the subtype, Robinson-Garcia et al. (2017) found that the data is listed as ‘data sheets’, which could perhaps be considered as datasets. Quite a few data centers provide physical objects, and some others provide only text records.

[Figure 1 data, with values paired to labels in the order they appear in the chart. Share of all datasets per datacenter: Plutof. Data Management and Publishing Platform 26%; The Cambridge Crystallographic Data Centre 24%; PANGAEA 8%; Global Biodiversity Information Facility 7%; figshare Academic Research System 6%; Pitt Quantum Repository 4%; Partnership for Interdisciplinary Studies of Coastal Oceans (PISCO) 4%; Environmental Data Initiative 3%; NRCT Data Center 2%; DRYAD 2%.]

Figure 2. Type of data provided by the top 20 data centers in DataCite.

Reprinted from “DataCite as a novel bibliometric source: Coverage, strengths and limitations,” by Robinson-Garcia, N., Mongeon, P., Jeng, W., and Costas, R., 2017, Journal of Informetrics, 11(3), 846. Copyright 2017 by Elsevier Ltd.

Data centers in the USA have the most open data records in DataCite (47%), followed by the UK (19%), Estonia (13%) and Germany (13%). Data centers in most other countries account for a relatively small (rounded) percentage of open data records: Tanzania, Austria, and Uruguay are among the countries with the smallest percentages, as shown in Table 1.

Table 1. Countries of data centers and their Open Data records (rounded numbers).

Country of data center   Open Data records   % of records   Country of data center   Open Data records   % of records
USA                      1,728,481           47%            New Zealand              1,215               0%
United Kingdom           689,334             19%            Romania                  487                 0%
Estonia                  487,359             13%            Czech Republic           470                 0%
Germany                  476,326             13%            China                    219                 0%
Switzerland              359,776             10%            South Africa             98                  0%
Denmark                  135,827             4%             Belgium                  84                  0%
Canada                   80,383              2%             Ghana                    52                  0%

3 Drivers

DataCite is an international not-for-profit organization founded in 2009. It is a consortium of public research institutions, funding bodies and publishers worldwide whose mission is to promote open research data accessibility and tracking. Ten years ago, DataCite was established in response to the need for a single coordination point for persistent identifiers with which data can be found, identified and cited. New content types needed DOIs and recognition, and libraries wanted to deliver additional data services.

The main driving force behind DataCite is the need to increase transparency in research by making data available, thus showing the progress made in research. To pursue this goal, DataCite is funded through membership fees and grants.

The establishment of DataCite is not only about providing an infrastructure that allows people to work with data; it is also about improving the entire research process. Since DataCite can count views, downloads, and citations, it tries to play a role in developing data metrics, even though no credit is currently assigned for data sharing.

4 Barriers

One of the main challenges for DataCite is to continuously expand its database of members. It is important to find ways of collaboration with all new types of members, for instance individuals or commercial organisations, to represent them well, and to make sure that sufficient input is retrieved from all of them.

Another challenge is to set the right priorities in the development of the infrastructure. At the moment, the main focus is on improving a recently launched service that aims to provide insight into the relations between different persistent identifiers.

Being involved in developing data metrics is also important for DataCite. At the same time, it is not always easy to explain why costs are connected to the development of an infrastructure such as DataCite, even though people are aware of the importance of a data infrastructure.

The biggest challenge the scientific community faces in this context is that there is no reason for researchers to share data because they are not rewarded for this. As the existence of a data sharing culture in science is crucial, policies should be put in place that promote the development of such a culture and of a reward system for data sharing.

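Table 1 (continued).

Country of data center   Open Data records   % of records   Country of data center   Open Data records   % of records
Thailand                 54,090              1%             Unknown                  50                  0%
Netherlands              40,330              1%             Spain                    37                  0%
France                   6,373               0%             Poland                   23                  0%
Australia                4,731               0%             Hungary                  14                  0%
Ireland                  3,731               0%             Tanzania                 9                   0%
Sweden                   2,775               0%             Austria                  5                   0%
Italy                    2,096               0%             Uruguay                  1                   0%
Total distinct           3,704,098           100%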

One of the central elements that DataCite brings to the overall data sharing infrastructure is the attribution of DOIs for individual datasets. Still, it is also important that metadata related to DOIs is persistent and of usable quality, and that connections can be made between datasets and other forms of scientific output.

5 Impact

5.1 For Science

5.1.1 The case of the IFREMER data center

To provide an example of how datasets listed by DataCite relate to scientific work, we take a closer look at a subset of DataCite records. More specifically, we investigate how researchers use this infrastructure and reuse the data in it for their research.

Each dataset listed by DataCite originates from a so-called data center. Those data centers do not necessarily have to be entities exclusively dedicated to preserving data. Instead, the term includes data repositories, as well as libraries, research centers, and publishers.

For the purpose of this study, a single data center in the field of ocean science was selected so as to gain insights into the data sharing practices of one institution. The field of ocean science shows significant data-related activity, as observed from entries in DataCite; in addition, it is a field of global reach and of relevance to the Sustainable Development Goals.

It was also important for our case study to choose a data center for which there is some indication of data (re)use (i.e., references to the datasets in the scientific literature). A preliminary inquiry showed that datasets by the Institut Français de Recherche pour l’Exploitation de la Mer (IFREMER) received the most citations of all ocean science data centers, hence, it was selected.

IFREMER has been listed as a DataCite data center since 2013. It is a French research institute that manages oceanographic databases and designs and implements tools for the observation, experimentation and monitoring of the marine environment. IFREMER addresses societal challenges around climate change effects, marine biodiversity, pollution prevention, and seafood quality. It gives the scientific community access to the development, management and distribution of large research infrastructures, such as fleets, computational resources, data centers, testing facilities, and laboratories. (The Institute, 2018)

We collected all 186 IFREMER datasets included in the CWTS version of DataCite, which dates to April 2018. As a second source, metadata for IFREMER-datasets was retrieved manually to collect additional data on affiliations of authors of datasets, which are not included in metadata directly obtainable from DataCite. For a detailed discussion of metadata provided by DataCite, refer to Robinson-Garcia et al. (2017). IFREMER-datasets were registered with DataCite beginning in 2014; the years of publication of datasets and the times of registration in the DataCite records differ (see Figure 3).

Figure 3. IFREMER datasets per year of publication and per year of registration with DataCite.
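The gap between publication and registration can be made concrete from the bar labels of Figure 3. The sketch below assumes the labels are read in year order and that the fused label "6155" stands for 61 (2015) and 55 (2016); under that reading both series sum to the 186 datasets in the sample. All variable names are illustrative.

```python
# Years on the x-axis of Figure 3.
years = [2000, 2004, 2009, 2012, 2013, 2014, 2015, 2016, 2017, 2018]

# Number of datasets per publishing year, read from the bar labels.
published = [1, 1, 1, 1, 11, 23, 75, 31, 17, 25]

# Number of datasets per registering year.
# Assumption: the fused label "6155" is read as 61 (2015) and 55 (2016).
registered = [0, 0, 0, 0, 0, 20, 61, 55, 9, 41]

total_published = sum(published)
total_registered = sum(registered)

# Datasets with a publication year earlier than the first year in which
# any IFREMER dataset was registered with DataCite.
first_registration_year = min(y for y, n in zip(years, registered) if n > 0)
published_earlier = sum(
    n for y, n in zip(years, published) if y < first_registration_year
)

print(f"Published: {total_published}, registered: {total_registered}")
print(f"First registration year: {first_registration_year}")
print(f"Published before that year: {published_earlier} "
      f"({published_earlier / total_published:.0%})")
```

The check that both series add up to 186 is what supports the reading of the fused label; it does not replace the original figure.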

For 134 (72%) of the datasets metadata was provided in English; metadata for the remaining 52 (28%) datasets was in French. Most datasets (64%) included were labeled with a Creative Commons (CC) license. CC-licenses specify in which contexts and how intellectual work can legally be (re)used.¹ For the remaining datasets, licenses were not explicitly stated; however, verbal statements on (re)use possibilities of datasets were provided in almost all cases. Figure 4 shows the share of datasets by license type and language. License types are ordered from the least restrictive (CC0) to the most restrictive (BY-NC-ND).

Figure 4. Shares of datasets per CC-licensing type and language of datasets.

1 https://creativecommons.org/about/

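[Figure 3 data, read from the bar labels; the fused label "6155" is read as 61 and 55, so that each series sums to 186. Number of datasets per year:

Year               2000   2004   2009   2012   2013   2014   2015   2016   2017   2018
Publishing year    1      1      1      1      11     23     75     31     17     25
Registering year   0      0      0      0      0      20     61     55     9      41]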

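[Figure 4: grouped bar chart. x-axis: license type (CC0, BY, BY-NC, BY-NC-SA, BY-ND, BY-NC-ND, no license provided); y-axis: share of datasets (0% to 70%); series: total, EN datasets, FR datasets. Bar labels in extraction order: 1%, 44%, 8%, 9%, 2%, 4%, 33%; 1%, 46%, 7%, 6%, 3%, 6%, 31%; 0%, 25%, 6%, 12%, 0%, 0%, 58%. The assignment of labels to individual bars cannot be recovered with certainty from the extracted text.]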

5.2 For Science and Industry

In this section, we analyse the link between IFREMER datasets and publications in a bibliographic database such as Scopus. We first give an overview of the variety of contributors to datasets covered by DataCite and quantify their contributions. Subsequently, the frequency and the context of data (re)use are investigated. More specifically, we aim to determine to what extent creators/contributors and (re)users of datasets are identical or different.

Although IFREMER is specified as the data center of the datasets, not all datasets stemmed from IFREMER directly. Instead, various publishing entities as well as data repositories acted as further intermediaries. One of the most prominent institutions in this regard was SEANOE, a publisher of scientific data in the field of marine sciences. (About SEANOE, n.d.) Altogether, 103 (55%) datasets originated from this publisher. A large part of the other data contributors/publishers are IFREMER-related. For the publishing institutions of the remaining datasets, see Figure 5.

Figure 5. Numbers of datasets per contributor.

Authors are not necessarily affiliated with the institution serving as the publisher of a dataset. Since many datasets are the result of team efforts, author teams with very mixed affiliation backgrounds can be observed. Unsurprisingly, IFREMER is the most prominent affiliation, with a total of 123 datasets and a total of 133 authors affiliated with it or with a subsidiary organisation of IFREMER. Figures 6a and 6b respectively show the top ten institutional affiliations with the most authors and the top ten institutional affiliations with the most shared datasets. Overall, we find that the distribution of datasets is very skewed, with a very large share of the datasets originating from a single institution.

[Figure 5: bar chart. x-axis: Number of Datasets (0 to 120); y-axis: Publishing Institution. Institutions shown: SEANOE; IFREMER / IDM/SISMER; Sismer; IFREMER; Ifremer - Géosciences marines; Délégation Ifremer océan Indien; CATDS (CNES, IFREMER, LOCEAN, ACRI); CATDS (CNES, IFREMER, LOCEAN); CATDS (CNES, IFREMER, CESBIO); TAAF; Ifremer centre de Méditerranée; Ifremer / Dyneco-Vigies; Ifremer - Cellule d'Administration Quadrige²; Coriolis data centre.]

Figure 6a. Top ten affiliations of authors.

[Figure 6a data, with values paired to labels in the order they appear in the chart. Number of authors per affiliation: Ifremer 133; CNRS, IPGP 14; Laboratoire d’Océanographie de Villefranche (LOV) 13; LOCEAN 9; University of Tasmania 7; CNRS, France 6; Université de la Rochelle 6; Università Ca’ Foscari 4; IRD UMS-Imago 4; University of Bordeaux, UMR 5805 EPOC 3.]

Figure 6b. Top ten affiliations of datasets.

[Figure 6b data, with values paired to labels in the order they appear in the chart. Number of datasets per affiliation: Ifremer 123; CNRS, IPGP 55; LOCEAN 11; CNRS 8; Laboratoire d’Océanographie de Villefranche (LOV) 7; Université de la Rochelle 7; University of the Azores 6; CEFREM 4; MIO 4; ENSTA ParisTech 4.]

A total of 284 distinct authors was assigned to the datasets observed. Datasets usually are results of several contributing investigators: on average, a dataset has four authors. A few authors are highly prevalent among datasets, with three of them (co-)authoring more than 50 datasets.

5.2.1 Reuse

One of the missions of DataCite is to promote proper referencing of datasets in scholarly articles and other types of documents. Thus, we sought empirical evidence of usage of IFREMER datasets by looking at the cited references of all documents indexed in the Scopus database. Overall, we identified 43 such references in 12 distinct Scopus documents.

The results show that references to IFREMER datasets are quite rare, with an average of only 0.23 references per dataset. Furthermore, those few references are highly concentrated, with one single dataset out of the 12 cited datasets accumulating 30 out of 43 (70%) of all references.
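The two ratios in this paragraph follow directly from the counts reported in the study. A minimal sketch of the arithmetic, using only the reported figures (186 datasets, 43 references, 12 cited datasets, 30 references to the most-cited dataset); the variable names are illustrative:

```python
# Counts reported in the case study.
datasets_total = 186        # IFREMER datasets in the sample
references_total = 43       # references to those datasets found in Scopus
cited_datasets = 12         # datasets that received at least one reference
top_dataset_references = 30 # references received by the most-cited dataset

# Average number of references per dataset across the whole sample.
references_per_dataset = references_total / datasets_total

# Share of all references that go to the single most-cited dataset.
top_dataset_share = top_dataset_references / references_total

# Share of datasets for which no reference was found at all.
uncited_share = (datasets_total - cited_datasets) / datasets_total

print(f"References per dataset: {references_per_dataset:.2f}")
print(f"Share of references to the top dataset: {top_dataset_share:.0%}")
print(f"Share of datasets without any reference: {uncited_share:.0%}")
```

The last ratio is not stated in the text but follows from the same counts: the large majority of the sampled datasets have no traceable reference in Scopus.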

Previous work (Park, You, & Wolfram, 2018) has found that (re)used datasets are often not listed in the references, but are rather mentioned in the articles’ text or acknowledgements. A search for mentions of IFREMER datasets in abstracts of Scopus articles with the two keywords “dataset” and “IFREMER” returned 21 entries. The same keyword search in Web of Science acknowledgements returned 1000 entries. This shows that there is a potential for discovering mentions of datasets in abstracts and acknowledgement texts of publications beyond what can be captured based on formal citations in publications.
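A keyword search of this kind can be sketched as a simple filter over text fields. The example below is only an illustration of the idea: the abstracts are invented, the function name is hypothetical, and plain substring matching is a simplification of whatever query syntax Scopus or Web of Science actually applies.

```python
def mentions_all(text, keywords):
    """Return True if every keyword occurs in the text, ignoring case."""
    lowered = text.lower()
    return all(keyword.lower() in lowered for keyword in keywords)


# Hypothetical abstracts, invented for illustration only.
abstracts = {
    "doc-1": "We use a salinity dataset distributed by IFREMER.",
    "doc-2": "A new dataset of sea surface temperature is presented.",
    "doc-3": "IFREMER operates several research vessels.",
    "doc-4": "Datasets from Ifremer were combined with satellite data.",
}

keywords = ("dataset", "IFREMER")
hits = [doc_id for doc_id, text in abstracts.items()
        if mentions_all(text, keywords)]

print(hits)
```

A match only shows that both words occur in the same text; it does not establish that a specific IFREMER dataset was (re)used, which is why such counts indicate potential rather than confirmed reuse.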

We further provide an overview of the dataset (re)users and their relationship with the data producers/creators. Dataset usage here indicates the number of papers in the Scopus database in which a reference to an IFREMER dataset was made. In total, 208 different authors were found citing IFREMER datasets, originating from 77 different research organizations. The top ten of those organizations are listed in Figure 7.

Figure 7. Top ten affiliations of authors citing datasets.

We find that, just like the production of datasets, the (re)use of datasets is highly concentrated: of all organizations serving as affiliations, a small number is responsible for most of the identified instances of data (re)use. In this case, IFREMER leads the list, with a total of 36 affiliated authors (17% of all citing authors). Other users are a range of globally spread institutions (from China and Japan to the USA), consisting mainly of universities and global (NOAA) or national public institutes (e.g. Météo-France). Industrial partners are not shown.

A further analysis investigated the overlap of authors of datasets and citing authors. Nine out of the twelve datasets cited share at least one author with the publication it is cited by; of the 208 unique citing authors, 31 (15%) are also authors of datasets.
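The overlap measure can be expressed as a set intersection between citing authors and dataset authors. The sketch below first shows the operation on invented author identifiers and then reproduces the reported percentages from the reported counts; the function name and the identifiers are hypothetical, and real author matching would require name disambiguation that is not shown here.

```python
def overlap_share(citing_authors, dataset_authors):
    """Share of citing authors who also appear among the dataset authors."""
    citing = set(citing_authors)
    return len(citing & set(dataset_authors)) / len(citing)


# Hypothetical author identifiers, for illustration only.
toy_share = overlap_share({"a1", "a2", "a3", "a4"}, {"a3", "a4", "a5"})

# Percentages derived from the counts reported in the case study.
citing_authors_total = 208
reported_overlap = 31 / citing_authors_total   # citing authors who are dataset authors
ifremer_share = 36 / citing_authors_total      # citing authors affiliated with IFREMER
self_cited_datasets = 9 / 12                   # cited datasets sharing an author with a citing paper

print(f"Toy overlap: {toy_share:.0%}")
print(f"Citing authors who are also dataset authors: {reported_overlap:.0%}")
print(f"Citing authors affiliated with IFREMER: {ifremer_share:.0%}")
print(f"Cited datasets sharing an author with a citing paper: {self_cited_datasets:.0%}")
```

Read together, the three ratios show that a minority of citing authors are dataset authors, while most cited datasets are nonetheless cited by at least one of their own creators.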

[Figure 7 data, with values paired to labels in the order they appear in the chart. Number of citing authors per organization: ifremer 36; univ paris 06 16; ocean univ china 7; met office 7; univ washington 6; noaa 6; natl oceanog ctr 6; meteo france 6; univ toulouse 5; japan agcy marine earth sci & technol 4.]

5.3 For Society

DataCite’s impact on society remains indirect: it helps to make data available in the right manner. For instance, the Higgs boson (the Higgs particle helps explain why other particles get their mass) has a DataCite DOI. This knowledge is available to everyone. Furthermore, DataCite does not only have universities as members, but also, for instance, funding organizations and individuals. DataCite considers this beneficial because it means that different types of organizations are involved in its governance and strategy.

6 Lessons learnt

This case study provides insights into some of the challenges of providing a technical infrastructure for research data. Those challenges concern how openly available research data can be (re)used, but also how far the metadata available for this research data is useful for tracing the origins of datasets. Here we reflect on the technical limitations encountered, and on the policy consequences of the findings. Finally, in the light of the latter, we discuss the role of DataCite for Open Science.

DataCite still has some way to go in realizing its objectives. The aim of improving data referencing by providing persistent identifiers for research data is working to the extent that data centers do join DataCite, although the number of members could still be increased. Given the lack of a culture of citing data in academia, the question is whether DataCite will be found and used by scientists, especially if they do not know of its existence.

The case of IFREMER shows that the (re)use of datasets is highly concentrated: a small number of organizations is responsible for most of the identified instances of data (re)use. Furthermore, there is considerable overlap between the authors producing the datasets and the authors citing them.

The technical limitations encountered are the following. The biggest challenge in measuring and assessing output listed by DataCite is what we call a lack of data control. This study shows that information cannot be assembled in a consistent and complete manner for all datasets alike. Metadata obtainable from DataCite evidently depends on how it is made available by the providers of datasets, the publishers. This applies to the naming of entities of origin, to the acquisition of author names, and to the indication of open access licenses. Furthermore, affiliations of authors of datasets are not obtainable from DataCite, making it impossible to arrive at adequate conclusions at the level of organizations of origin.
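The naming problem can be illustrated with the publisher labels shown in Figure 5. The sketch below is not the method used in the study: it applies a naive, case-insensitive substring match (an assumption made purely for illustration) to show how differently one and the same organisation can be spelled, and where such a heuristic falls short.

```python
# Publisher labels as they appear in Figure 5.
publisher_labels = [
    "Coriolis data centre",
    "Ifremer - Cellule d'Administration Quadrige²",
    "Ifremer / Dyneco-Vigies",
    "Ifremer centre de Méditerranée",
    "TAAF",
    "CATDS (CNES, IFREMER, CESBIO)",
    "CATDS (CNES, IFREMER, LOCEAN)",
    "CATDS (CNES, IFREMER, LOCEAN, ACRI)",
    "Délégation Ifremer océan Indien",
    "Ifremer - Géosciences marines",
    "IFREMER",
    "Sismer",
    "IFREMER / IDM/SISMER",
    "SEANOE",
]


def mentions_ifremer(label):
    """Naive heuristic: does the label contain 'ifremer' in any casing?"""
    return "ifremer" in label.casefold()


matched = [label for label in publisher_labels if mentions_ifremer(label)]
unmatched = [label for label in publisher_labels if not mentions_ifremer(label)]

print(f"{len(matched)} labels mention IFREMER in some spelling")
print("Not matched:", unmatched)
```

Ten differently written labels are caught by the heuristic, yet "Sismer" is missed even though another label pairs SISMER with IFREMER. String matching alone is therefore not a reliable way to consolidate entities of origin.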

Starting from the assumption that not all (re)use of datasets leads to trackable and listed references, the second major limitation concerns the assessment of that (re)use. As long as a standardized extraction and collection of mentions of datasets beyond references is not feasible, (re)use cannot be captured completely. This applies to mentions of datasets in abstracts, in the acknowledgements of publications, or in publications’ text bodies. Consequently, reuse analysis currently does not give a complete picture, which limits insight into the interaction with (openly available) datasets.

In summary, datasets in DataCite are unevenly distributed, contributors to datasets are unevenly distributed, reuse is limited or not traceable, and where reuse takes place, it is mainly by the dataset producers. This suggests that, when one of the biggest organisations supporting research data provides only a very skewed insight into data production, curation and reuse, there is still a long way to go. Researchers in the field of ocean science mainly use it as a place to store data, but not necessarily to reuse or cite it. This is a cultural issue, relating to the credit and reward system in science. It also suggests a big gap between policy and practice.

7 Policy conclusions

At this point there is still a long way to go before we can set the necessary theoretical and empirical foundations of data production, sharing and (re)use indicators at a global (or European) scale. In this situation, policy makers can spur progress in the direction of both facilitated data (re)use and the evaluation of openly used research data by embarking on three specific actions.

Firstly, caution towards research evaluation on the mere basis of indicators of data sharing and (re)use should be maintained. The current state of access to metadata does not substantiate the creation of meaningful indicators.

Secondly, promoting the further development of data infrastructures is crucial, in particular support for the development of data referencing standards. The latter will hardly be achieved without consistent frameworks in which to organize datasets, and without ways of demonstrating the retrievability of meaningful metadata indicators to researchers and research organizations alike. Only then, we expect, will initiatives raising awareness of the adequate mentioning and referencing of (re)used datasets fall on fertile ground.

More generally, the promotion of data citation is of crucial importance. Most scholars do recognize that data is an important component of science. However, data citation and data sharing are not common practice in academia and industry. Since there is no reward system for data citation and data sharing, scholars have no extrinsic motivation to engage in them. DataCite tries to pave the way for scholars by providing an infrastructure for cited and shared data, and believes that by doing so more and more scholars and data centers will find it. Still, the lack of a culture of data sharing and (re)use remains an important issue. This cultural change could be started, for example, by providing PhD candidates with Open Science training, which could eventually be the beginning of a new way of thinking and working in academia.

In conclusion, both metadata heterogeneity and the missing trackability of dataset usage considerably impede progress towards open data. Fortunately, with DataCite as a major actor and infrastructure provider, these concerns have a clear addressee and can be perceived as drivers for the further development of this platform.

While improvements to the technological infrastructures and data quality are necessary and may help foster a culture of data sharing and (re)use, it may be useful to also take the opposite view: the lack of a data sharing and (re)use culture may itself contribute to the poor quality of data infrastructures. However, learning more about those mechanisms would require additional insights into specific fields such as the ocean sciences, with a focus on data sharing and (re)use practices.

The impact of a platform like DataCite in the scientific community is not easily grasped. Still, improvements in the structure and presentation of datasets may eventually lead to advances for open data in general and a greater recognition of the role played by DataCite. Such endeavors should thus be encouraged and supported if we are to live up to the idea of Open Science.


References

About SEANOE (n.d.). Retrieved from https://www.seanoe.org/html/about.htm

cOAlition S (2018). Making full and immediate Open Access a reality. Retrieved from https://www.nwo.nl/actueel/nieuws/2018/11/coalition-s-maakt-richtlijnen-voor-implementatie-plan-s-bekend.html

Park, H., You, S., & Wolfram, D. (2018). Informal data citation for data sharing and reuse is more common than formal data citation in biomedical fields. Journal of the Association for Information Science and Technology, 69(11), 1346–1354. https://doi.org/10.1002/asi.24049

Robinson-Garcia, N., Mongeon, P., Jeng, W., & Costas, R. (2017). DataCite as a novel bibliometric source: Coverage, strengths and limitations. Journal of Informetrics, 11(3), 841–854. https://doi.org/10.1016/j.joi.2017.07.003

Berghmans, S., Cousijn, H., Deakin, G., Meijer, I., Mulligan, A., Plume, A., de Rijcke, S., Rushforth, A., Tatum, C., van Leeuwen, T., & Waltman, L. (2017). Open Data, a researcher perspective. http://dx.doi.org/10.17632/bwrnfb4bvh.1

The Institute. (2018). Retrieved from https://wwz.ifremer.fr/en/The-Ins




This case study analyses DataCite, an international not-for-profit organization founded in 2009, which aims to improve data citation by providing persistent identifiers (Digital Object Identifiers, DOIs) for research data. DataCite serves as a central point for organising research data and making it accessible and citable. The general objective of this case study is to investigate to what extent DataCite influences researchers’ behaviour. First, the background of the DataCite infrastructure is presented, followed by a reflection on the mission, drivers and challenges of DataCite. Then, the study takes a closer look at a subset of DataCite records in order to determine how researchers use such a facility in the production, sharing, and (re)use of datasets, using DataCite records and a bibliographic database such as Scopus.

Studies and reports
