rdaeu russia_fg_1_july2014_final

The Research Data Alliance in Europe, an update… EXTREME SCALE SCIENTIFIC COMPUTING WORKSHOP Moscow – 30 June & 1 July 2014 Fabrizio Gagliardi BSC, Spain - ACM Europe Chair

Upload: research-data-alliance

Post on 11-Jun-2015


DESCRIPTION

Presentation to Russian and other computer scientists by Fabrizio Gagliardi (with many slides from H. Hanahoe and F. Berman)

TRANSCRIPT

Page 1

The Research Data Alliance in Europe, an update…

EXTREME SCALE SCIENTIFIC COMPUTING WORKSHOP Moscow – 30 June & 1 July 2014

Fabrizio Gagliardi BSC, Spain - ACM Europe Chair

Page 2

Introduction

Fabrizio Gagliardi, reborn at BSC, Spain
• After 30 years at CERN in Geneva
• Many EU projects
• And the last 8 years at Microsoft and Microsoft Research
• Long history of projects in Russia on Grid computing, Big Data, HPC and computer vision @ MSU, and the MSR HPC summer schools 2009-2012

Page 3

Big data, hype and HPC

“Big data” means different things to different people

(consider Satoshi’s previous talk)

• corporate data are not so big and demanding when compared to scientific data

• social data are large, but access is easy and processing is trivially parallel

• scientific data in new research domains, such as genetics, are a bigger challenge

• this is not true for all scientific data: CERN will produce 100 PB/year starting next year, but with easy access and simple processing models; still a very expensive game…
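To put the CERN figure in perspective, a back-of-the-envelope conversion to an average data rate follows. It assumes decimal units (1 PB = 10^15 bytes) and data spread evenly over a 365-day year, so it gives only a long-run average, not a peak rate:

$$\frac{100 \times 10^{15}\,\mathrm{B}}{365 \times 86\,400\,\mathrm{s}} = \frac{10^{17}\,\mathrm{B}}{3.1536 \times 10^{7}\,\mathrm{s}} \approx 3.2 \times 10^{9}\,\mathrm{B/s} \approx 3.2\,\mathrm{GB/s}$$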

Page 4

Horizon 2020: Research and Innovation

Horizon 2020 is the biggest EU Research and Innovation programme ever, with nearly €80 billion of funding available over 7 years (2014 to 2020).

This is in addition to the private investment that this money will attract. It promises more breakthroughs, discoveries and world-firsts by taking great ideas from the lab to the market.

Page 5

Research and Innovation

Research AND Innovation, not Research OR Innovation

• Research activities with innovation in mind
• Innovation should have job creation in mind
• But how do we take great ideas from the lab to the market? What can a research funder do? Which instruments do we have?

Page 6

job creation is important

The following slides are adapted from Joe McKendrick/Forbes, September 2012: http://www.smartplanet.com/blog/bulletin/7-new-types-of-jobs-created-by-big-data/682

7 new types of jobs created by Big Data

In today’s unforgiving global economy, those organizations that compete on analytics stand the best chance of outsmarting the competition. The only catch is, they need skilled professionals who know how to manage, mine and draw actionable insights from all the “Big Data” now streaming across enterprises.

Page 7

job creation is important

1. Data scientists: this emerging role is taking the lead in processing raw data and determining what types of analysis would deliver the best results

2. Data architects: organizations managing Big Data need professionals who will be able to build a data model, and plan out a roadmap of how and when various data sources and analytical tools will come online, and how they will all fit together

3. Data visualizers: organizations need professionals who can “harness the data and put it in context, in layman’s language, exploring what the data means and how it will impact the company”

Page 8

job creation is important

4. Data change agents: driving “changes in internal operations and processes based on data analytics.” They need to be good communicators and know how to apply statistics to improve quality on a continuous basis

5. Data engineer/operators: people who make the Big Data infrastructure hum on a day-to-day basis. “They develop the architecture that helps analyse and supply data in the way the business needs, and make sure systems are performing smoothly”

6. Data stewards: ensure that data sources are properly accounted for

7. Data virtualization/cloud specialists: organizations need professionals who can build, maintain and support a virtualized data service layer or cloud

Page 9

e-infrastructure building bridges

• network infrastructure, GÉANT
• HPC/distributed computing/software infrastructure
• scientific data infrastructure

Page 10

issues to be addressed (e-infrastructure)

The EC, in coordination with EU Member States, is treating research data as an infrastructure

As a valuable and strategic resource, research data raises at least three key issues to be addressed(*):
• How data can be networked
• How to envision and set up data governance on a global scale
• How the EU can play a leading role in helping to start and steer this global trend

(*) Fred Friend, Jean-Claude Guédon, Herbert Van de Sompel, “Beyond Sharing and Re-using: Toward Global Data Networking”

Page 11

Policy context

A Reinforced European Research Area Partnership for Excellence and Growth, COM(2012) 392 – July 2012

Towards better access to scientific information: boosting the benefits of public investments in research, COM(2012) 401 final – July 2012

Commission Recommendation on access to and preservation of scientific information, C(2012) 4890 final – July 2012

Horizon 2020 - Open Access to Scientific Publications - Pilot on research data

Data Management Plan

Open Science

Page 12

RESEARCH INFRASTRUCTURE (E-INFRASTRUCTURE HIGHLIGHTED) Work Programme 2014-2015

CALL 1: DEVELOPING NEW WORLD CLASS INFRASTRUCTURES
CALL 2: INTEGRATING AND OPENING RESEARCH INFRASTRUCTURES OF PAN-EUROPEAN INTEREST
CALL 3: E-INFRASTRUCTURES
CALL 4: SUPPORT TO INNOVATION, HUMAN RESOURCES, POLICY AND INTERNATIONAL COOPERATION FOR RESEARCH INFRASTRUCTURES

• DESIGN STUDIES
• SUPPORT TO PREPARATORY PHASE OF ESFRI PROJECTS
• SUPPORT TO THE INDIVIDUAL IMPLEMENTATION AND OPERATION OF ESFRI PROJECTS
• SUPPORT TO THE IMPLEMENTATION OF CROSS-CUTTING INFRASTRUCTURE SERVICES AND SOLUTIONS FOR CLUSTERS OF ESFRI AND OTHER RELEVANT RESEARCH INFRASTRUCTURE INITIATIVES IN A GIVEN THEMATIC AREA
• INTEGRATING AND OPENING EXISTING NATIONAL AND REGIONAL RESEARCH INFRASTRUCTURES OF PAN-EUROPEAN INTEREST
• MANAGING, PRESERVING AND COMPUTING WITH BIG RESEARCH DATA
• E-INFRASTRUCTURES FOR OPEN ACCESS
• TOWARDS GLOBAL DATA E-INFRASTRUCTURES: RESEARCH DATA ALLIANCE
• Pan-European High Performance Computing infrastructure and services
• Centres of Excellence for Computing applications
• Network of HPC Competence Centres for SMEs
• PROVISION OF CORE SERVICES ACROSS E-INFRASTRUCTURES
• RESEARCH AND EDUCATION NETWORKING – GEANT
• E-INFRASTRUCTURES FOR VIRTUAL RESEARCH ENVIRONMENTS (VRE)
• INNOVATION SUPPORT MEASURES
• INNOVATIVE PROCUREMENT PILOT ACTION IN THE FIELD OF SCIENTIFIC INSTRUMENTATION
• STRENGTHENING THE HUMAN CAPITAL OF RESEARCH INFRASTRUCTURES
• NEW PROFESSIONS AND SKILLS FOR E-INFRASTRUCTURES
• POLICY MEASURES FOR RESEARCH INFRASTRUCTURES
• INTERNATIONAL COOPERATION FOR RESEARCH INFRASTRUCTURES
• E-INFRASTRUCTURE POLICY DEVELOPMENT AND INTERNATIONAL COOPERATION
• NETWORK OF NATIONAL CONTACT POINTS

CALLS IN 2014: DEADLINES SEPT 2014 AND JAN 2015

INITIATIVES STARTING IN 2015 UNTIL 2018

Page 13

Fran Berman

Research Data Driving Solutions to Complex Scientific and Societal Challenges

Who is most at risk of contracting asthma?

How can we increase wheat yields?

How accurate is the Standard Model of Physics?

Image: Lucas Taylor

How can we best address energy needs and sustain the environment?

Image: Ceinturion, Wikipedia

Page 14

Fran Berman

Data-Sharing Driving Innovation Across Sectors and Communities

Page 15

Fran Berman

World-wide Efforts Focusing on Infrastructure to Support Research Data Sharing, Access, Use

• Science, Humanities, Arts Communities
• E-Infrastructure professionals, data analysts, data center staff, …
• Data Scientists
• Libraries, Archives, Repositories, Museums

Page 16

Fran Berman

Many Infrastructure Building Blocks Needed to Accelerate Progress

• Institutional Data Sharing Practice
• Data Access and Distribution Policy
• Data Discovery Tools
• Common Metadata Standards
• Digital Object Identifiers
• Data Citation Standards
• Data Analytics Algorithms
• Data Preservation Practice
• Data Scientists and Expert Support
• Sustainable Economic Models
• Curation Practice and Policy
• Auditing, Certification and Reporting Practice

Data Use and Re-use

Data Discovery and Data Sharing

Research Dissemination and Reproducibility

Data Access (now) and Preservation (later)

Page 17

Fran Berman

Research Data Alliance Created to Accelerate Development of Research Data Sharing Infrastructure Worldwide

RDA community efforts focus on building social, organizational and technical infrastructure to:
• reduce barriers to data sharing and exchange
• accelerate the development of coordinated global data infrastructure

RDA and RDA/US are supported in part by the National Science Foundation.

Page 18

Fran Berman

RDA Approach: CREATE ADOPT USE

RDA members come together as:

• Working Groups – 12-18 month efforts to build, adopt, and use specific pieces of infrastructure

• Interest Groups – longer-lived discussion forums that spawn Working Groups as specific pieces of needed infrastructure are identified.

Working Group efforts focus on the development and use of data sharing infrastructure:

• Code, policy, infrastructure, standards, or best practices that are adopted and used by communities to enable data sharing

• “Harvestable” efforts for which 12-18 months of work can eliminate a roadblock

• Efforts that have substantive applicability to groups within the data community, but may not apply to everyone

• Efforts for which working scientists and researchers can start today


Page 19

Fran Berman

Precipitous Growth

• First RDA organizational telecon: August 2012
• Global Data Planning Meeting: October 2012
• RDA Launch / First Plenary, Gothenburg, Sweden: March 2013
• RDA Second Plenary, Washington, DC: September 2013
• RDA Third Plenary, Dublin, Ireland: March 2014
• RDA Fourth Plenary, Amsterdam: September 2014

Milestones and figures shown along the timeline: First Working Groups and Interest Groups; 240 participants; First “neutral space” community meeting (Data Citation Summit); First Org. Partner Meet-up; First BOFs; 380 participants from 22 countries; First Organizational Assembly; 6 co-located events; 14 BOFs, 12 Working Groups, 22 Interest Groups; 497 participants; First Working Group exchange meeting


Page 20

Fran Berman

The RDA Community Today: Over 1850 members from 80+ countries (as of 6/14)

Membership by region (map courtesy traveltip.org): Austral-Pacific 4%, Asia 4%, Africa 2%, South America 1%


Page 21

Fran Berman

RDA Interest Groups (IG) and Working Groups (WG) by Focus (as of 6/14)

Domain Science-focused
• Toxicogenomics Interoperability IG
• Structural Biology IG
• Biodiversity Data Integration IG
• Agricultural Data Interoperability IG
• Wheat Data Interoperability WG
• Digital Practices in History and Ethnography IG
• Defining Urban Data Exchange for Science IG
• Geospatial IG
• Marine Data Harmonization IG
• RDA/CODATA Materials Data Infrastructure and Interoperability IG
• Research Data Needs of the Photon and Neutron Science Community IG

Data Stewardship-focused
• Research Data Provenance IG
• RDA/WDS Certification of Digital Repositories IG
• Preservation e-infrastructure IG
• Long-tail of Research Data IG
• RDA/WDS Publishing Data IG
• RDA/WDS Repository Audit and Certification Working Group
• Domain Repositories Interest Group

Reference and Sharing-focused
• Data Citation WG
• Standardization of Data Categories and Codes WG
• RDA/CODATA Legal Interoperability IG
• Data Description Registry Interoperability Working Group

Community Needs-focused
• Community Capability Model IG
• Engagement IG
• Development of Cloud Computing Capacity and Education in Developing World Research IG
• Ethics and Social Aspects of Data IG

Base Infrastructure-focused
• Data Foundation and Terminology WG
• Metadata Standards Directory WG
• Practical Policy WG
• PID Information Types WG
• Data Type Registries WG
• Data in Context IG
• Big Data Analytics IG
• Data Brokering IG
• Federated Identity Management IG
• Metadata IG
• PID Interest Group
• Service Management IG

Page 22

Fran Berman

RDA/US: Collaborate Globally, Contribute Locally

RDA/US Goals:
• Contribute to RDA “international” efforts and leadership
• Bring US efforts to the broader RDA community
• Build the RDA community within the US
• Leverage and implement RDA deliverables in the US to amplify impact
• Collaborate closely with other RDA “regions” on key programs and initiatives


NSF-supported RDA/US initiatives:
• Outreach (RDA RDA/US)
• RDA Deliverables Amplification
• Student / Early Career Engagement

RDA/US Steering Committee:
• Fran Berman, RPI
• Larry Lannom, CNRI
• Mark Parsons, RPI
• Beth Plale, IU

Map: RDA US membership (yellow states)

Page 23

The European plug-in to RDA …

• RDA Europe Forum – strategic advice
• RDA Europe Science Workshops – interaction & feedback from the target audience
• RDA Europe national & pan-European outreach – to engage new members & disseminate outputs
• RDA Europe policy report – to support European policy-makers & funders

RDA Europe, the European plug-in to the global RDA, supports the global RDA and brings the European voice to the table

Page 24

Europe as a Global Partner

• Societal challenges of our time transcend borders
• Data- and computing-intensive science is made up of global collaborations
• Research data are global
• Research Data Alliance: enabling data exchange at a global scale

Page 25

International

Domain initiatives are very important:
• Marine data sharing – Southern Ocean Observing System
• Genetic data sharing – Human Genome Project
• Astronomy – SKA
• CERN LHC

But domain initiatives will not necessarily enable bridges to be constructed across disciplines, time, and industry.

So the EC, the USA, and Australia committed resources to forming the Research Data Alliance.

Page 26

Relation to HPC

So far, RDA has not gained enough traction with the HPC, big data and computer science communities.

This needs to be addressed urgently, since the HPC community dealing with Big Data will need close interaction with application user communities, support from policy makers at national and international level and, of course, adequate financial support from the relevant funding agencies.

It is therefore important to work together… and to link with other relevant initiatives, such as NDS in the US (presented by Ed Seidel yesterday) and EUDAT in the EU.

Page 27

“We are taking our work beyond Europe's borders, to reach global scale. To make the scientific resources of the world work together, interoperating and open to discovery. For example we are working with partners like the US and Australia in the Research Data Alliance to make scientific progress broader, deeper and more workable”. Neelie Kroes, Vice-President of the European Commission responsible for the Digital Agenda - Open Access to science and data = cash and economic bonanza, 19 November 2013

Why a Research Data Alliance?

… So much to gain from collaboration …

Page 28

SAVE THE DATE

Page 29

Info:

[email protected]

Fran Berman

Page 30

Acknowledgments

Input to this presentation was kindly provided by Fran Berman and Hilary Hanahoe, and drawn from public presentations by EC officials.

However, the opinions expressed in this talk are my sole responsibility, as are any mistakes or omissions.

Thanks for your attention!

Page 31

Resources

Page 32

First RDA Infrastructure Deliverables Coming this Fall

Data Type Registries WG
• Deliverables: System of data type registries, formal model for describing types, working model of a registry
• Initial Adopters and Users: CNRI, International DOI Foundation, Deep Carbon Observatory

Practical Code Policies
• Deliverables: Survey of policies in production use, testbed of machine actionable policies, deployment of 5 policy sets, policy starter kits
• Initial Adopters and Users: RENCI, DataNet Federation Consortium, CESNET, Odum Institute, EUDAT

Persistent Identifier Information Types
• Deliverables: Minimal set of PID types, API
• Initial Adopters and Users: Data Conservancy, DKRZ

Language Codes
• Deliverables: Operationalization of ISO language categories for repositories
• Initial Adopters and Users: Language Archive, Paradisec

Data Foundations and Terminology
• Deliverables: Common vocabulary for data terms, formal definitions and open registry for data terms
• Initial Adopters and Users: EUDAT, DKRZ, Deep Carbon Observatory, CLARIN, EPOS

Metadata Standards
• Deliverables: Use cases and prototype directory of current metadata standards, starting from the DCC directory
• Initial Adopters and Users: JISC, DataOne

Page 33

Fran Berman

Next Steps for the RDA

• Continuing pipeline of infrastructure deliverables adopted and used to accelerate data sharing
• Increasing coordination of infrastructure
• Increasing cross-boundary collaborations between domains, sectors, organizations
• International and regional programs focusing on workforce, outreach, expansion of infrastructure impact
• New partners in the Organizational Assembly
• Focused strategy to support development of industry infrastructure for data sharing

More Infrastructure

Focus on Industry

Synergistic Programs

Effective Community

RDA/US is supported in part by the National Science Foundation.