rdaeu russia_fg_1_july2014_final
Presentation to Russian and other computer scientists by Fabrizio Gagliardi (with many slides from H. Hanahoe and F. Berman)
The Research Data Alliance in Europe, an update…
EXTREME SCALE SCIENTIFIC COMPUTING WORKSHOP
Moscow, 30 June & 1 July 2014
Fabrizio Gagliardi, BSC, Spain; ACM Europe Chair
Fabrizio Gagliardi, reborn at BSC, Spain
• After 30 years at CERN in Geneva
• Many EU projects
• And the last 8 years at Microsoft and Microsoft Research
• Long history of projects in Russia on grid computing, big data, HPC and computer vision @ MSU, and MSR HPC summer schools 2009-2012
Introduction
Big data, hype and HPC
“Big data” means different things to different people
(consider Satoshi’s previous talk)
• corporate data are not so big and demanding when compared to scientific data
• social data are large but access is easy and trivially parallel
• scientific data in new research domains like genetics are a bigger challenge
• this is not true for all scientific data: CERN will produce 100 PB/year starting next year, but with easy access and simple processing models; still a very expensive game…
Horizon 2020: Research and Innovation
Horizon 2020 is the biggest EU Research and Innovation programme ever, with nearly €80 billion of funding available over 7 years (2014 to 2020), in addition to the private investment that this money will attract. It promises more breakthroughs, discoveries and world-firsts by taking great ideas from the lab to the market.
Research and Innovation
Research AND Innovation, not Research OR Innovation
• Research activities with innovation in mind
• Innovation should have job creation in mind
• But how to take great ideas from the lab to the market? What can a research funder do? Which instruments do we have?
job creation is important
Following slides adapted from Joe McKendrick/Forbes, September 2012: http://www.smartplanet.com/blog/bulletin/7-new-types-of-jobs-created-by-big-data/682
7 new types of jobs created by Big Data
In today’s unforgiving global economy, those organizations that compete on analytics stand the best chance of outsmarting the competition. The only catch is, they need skilled professionals who know how to manage, mine and draw actionable insights from all the “Big Data” now streaming across enterprises.
1. Data scientists: this emerging role is taking the lead in processing raw data and determining what types of analysis would deliver the best results
2. Data architects: organizations managing Big Data need professionals who will be able to build a data model, and plan out a roadmap of how and when various data sources and analytical tools will come online, and how they will all fit together
3. Data visualizers: organizations need professionals who can “harness the data and put it in context, in layman’s language, exploring what the data means and how it will impact the company”
4. Data change agents: driving “changes in internal operations and processes based on data analytics.” They need to be good communicators and know how to apply statistics to improve quality on a continuous basis
5. Data engineer/operators: people who make the Big Data infrastructure hum on a day-to-day basis. “They develop the architecture that helps analyse and supply data in the way the business needs, and make sure systems are performing smoothly”
6. Data stewards: ensure that data sources are properly accounted for
7. Data virtualization/cloud specialists: the ability to build and maintain a virtualized data service layer; organizations need professionals who can also build and support these virtualized layers or clouds
network infrastructure, GÉANT
HPC/distributed computing/software infrastructure
scientific data infrastructure
e-infrastructure building bridges
issues to be addressed (e-infrastructure)
The EC, in coordination with EU Member States, is treating research data as an infrastructure
As a valuable and strategic resource, research data raises at least three key issues to be addressed (*):
• How data can be networked
• How to envision and set up data governance on a global scale
• How the EU can play a leading role in helping start and steer this global trend
(*) Fred Friend, Jean-Claude Guédon, Herbert Van de Sompel, “Beyond Sharing and Re-using: Toward Global Data Networking”
Policy context
A Reinforced European Research Area Partnership for Excellence and Growth, COM(2012) 392 – July 2012
Towards better access to scientific information: boosting the benefits of public investments in research, COM(2012) 401 final – July 2012
Commission Recommendation on access to and preservation of scientific information, C(2012) 4890 final – July 2012
Horizon 2020 - Open Access to Scientific Publications - Pilot on research data
Data Management Plan
Open Science
RESEARCH INFRASTRUCTURES (E-INFRASTRUCTURES HIGHLIGHTED): Work Programme 2014-2015
CALL 1: DEVELOPING NEW WORLD-CLASS INFRASTRUCTURES
• DESIGN STUDIES
• SUPPORT TO PREPARATORY PHASE OF ESFRI PROJECTS
• SUPPORT TO THE INDIVIDUAL IMPLEMENTATION AND OPERATION OF ESFRI PROJECTS
• SUPPORT TO THE IMPLEMENTATION OF CROSS-CUTTING INFRASTRUCTURE SERVICES AND SOLUTIONS FOR CLUSTERS OF ESFRI AND OTHER RELEVANT RESEARCH INFRASTRUCTURE INITIATIVES IN A GIVEN THEMATIC AREA
CALL 2: INTEGRATING AND OPENING RESEARCH INFRASTRUCTURES OF PAN-EUROPEAN INTEREST
• INTEGRATING AND OPENING EXISTING NATIONAL AND REGIONAL RESEARCH INFRASTRUCTURES OF PAN-EUROPEAN INTEREST
CALL 3: E-INFRASTRUCTURES
• MANAGING, PRESERVING AND COMPUTING WITH BIG RESEARCH DATA
• E-INFRASTRUCTURES FOR OPEN ACCESS
• TOWARDS GLOBAL DATA E-INFRASTRUCTURES: RESEARCH DATA ALLIANCE
• Pan-European High Performance Computing infrastructure and services
• Centres of Excellence for Computing applications
• Network of HPC Competence Centres for SMEs
• PROVISION OF CORE SERVICES ACROSS E-INFRASTRUCTURES
• RESEARCH AND EDUCATION NETWORKING: GÉANT
• E-INFRASTRUCTURES FOR VIRTUAL RESEARCH ENVIRONMENTS (VRE)
CALL 4: SUPPORT TO INNOVATION, HUMAN RESOURCES, POLICY AND INTERNATIONAL COOPERATION FOR RESEARCH INFRASTRUCTURES
• INNOVATION SUPPORT MEASURES
• INNOVATIVE PROCUREMENT PILOT ACTION IN THE FIELD OF SCIENTIFIC INSTRUMENTATION
• STRENGTHENING THE HUMAN CAPITAL OF RESEARCH INFRASTRUCTURES
• NEW PROFESSIONS AND SKILLS FOR E-INFRASTRUCTURES
• POLICY MEASURES FOR RESEARCH INFRASTRUCTURES
• INTERNATIONAL COOPERATION FOR RESEARCH INFRASTRUCTURES
• E-INFRASTRUCTURE POLICY DEVELOPMENT AND INTERNATIONAL COOPERATION
• NETWORK OF NATIONAL CONTACT POINTS
CALLS IN 2014, DEADLINES SEPT 2014 AND JAN 2015
INITIATIVES STARTING IN 2015 UNTIL 2018
Fran Berman
Research Data Driving Solutions to Complex Scientific and Societal Challenges
Who is most at risk of contracting asthma?
How can we increase wheat yields?
How accurate is the Standard Model of Physics?
Image: Lucas Taylor
How can we best address energy needs and sustain the environment?
Image: Ceinturion, Wikipedia
Data-Sharing Driving Innovation Across Sectors and Communities
World-wide Efforts Focusing on Infrastructure to Support Research Data Sharing, Access, Use
• Science, Humanities, Arts Communities
• E-Infrastructure professionals, data analysts, data center staff, …
• Data Scientists
• Libraries, Archives, Repositories, Museums
Institutional Data Sharing Practice
Data Access and Distribution Policy
Data Discovery Tools
Common Metadata Standards
Digital Object Identifiers
Data Citation Standards
Data Analytics Algorithms
Data Preservation Practice
Data Scientists and Expert Support
Sustainable Economic Models
Curation Practice and Policy
Auditing, Certification and Reporting Practice
Many Infrastructure Building Blocks Needed to Accelerate Progress
Data Use and Re-use
Data Discovery and Data Sharing
Research Dissemination and Reproducibility
Data Access (now) and Preservation (later)
Research Data Alliance Created to Accelerate Development of Research Data Sharing Infrastructure Worldwide
RDA community efforts focus on building social, organizational and technical infrastructure to:
• reduce barriers to data sharing and exchange
• accelerate the development of coordinated global data infrastructure
RDA and RDA/US are supported in part by the National Science Foundation.
RDA Approach: CREATE ADOPT USE
RDA Members come together as:
• Working Groups – 12-18 month efforts to build, adopt, and use specific pieces of infrastructure
• Interest Groups – longer-lived discussion forums that spawn Working Groups as specific pieces of needed infrastructure are identified.
Working Group efforts focus on the development and use of data sharing infrastructure
• Code, policy, infrastructure, standards, or best practices that are adopted and used by communities to enable data sharing
• “Harvestable” efforts for which 12-18 months of work can eliminate a roadblock
• Efforts that have substantive applicability to groups within the data community, but may not apply to everyone
• Efforts for which working scientists and researchers can start today
Precipitous Growth
• August 2012: First RDA organizational telecon
• October 2012: Global Data Planning Meeting
• March 2013, RDA Plenary 1 / Launch, Gothenburg, Sweden: first Working Groups and Interest Groups; 240 participants
• September 2013, RDA Plenary 2, Washington, DC: first “neutral space” community meeting (Data Citation Summit); first Organizational Partner meet-up; first BOFs; 380 participants from 22 countries
• March 2014, RDA Plenary 3, Dublin, Ireland: first Organizational Assembly; 6 co-located events; 14 BOFs, 12 Working Groups, 22 Interest Groups; 497 participants
• September 2014, RDA Plenary 4, Amsterdam: first Working Group exchange meeting
The RDA Community Today: Over 1850 members from 80+ countries (as of 6/14)
[Map of RDA members by region, courtesy traveltip.org; recoverable labels: Austral-Pacific 4%, Asia 4%, Africa 2%, South America 1%]
RDA Interest Groups (IG) and Working Groups (WG) by Focus (as of 6/14)
Domain Science - focused
• Toxicogenomics Interoperability IG
• Structural Biology IG
• Biodiversity Data Integration IG
• Agricultural Data Interoperability IG
• Wheat Data Interoperability WG
• Digital Practices in History and Ethnography IG
• Defining Urban Data Exchange for Science IG
• Geospatial IG
• Marine Data Harmonization IG
• RDA/CODATA Materials Data Infrastructure and Interoperability IG
• Research Data Needs of the Photon and Neutron Science Community IG
Data Stewardship - focused
• Research Data Provenance IG
• RDA/WDS Certification of Digital Repositories IG
• Preservation e-infrastructure IG
• Long-tail of Research Data IG
• RDA/WDS Publishing Data IG
• RDA/WDS Repository Audit and Certification Working Group
• Domain Repositories Interest Group
Reference and Sharing - focused
• Data Citation WG
• Standardization of Data Categories and Codes WG
• RDA/CODATA Legal Interoperability IG
• Data Description Registry Interoperability Working Group
Community Needs - focused
• Community Capability Model IG
• Engagement IG
• Development of Cloud Computing Capacity and Education in Developing World Research IG
• Ethics and Social Aspects of Data IG
Base Infrastructure - focused
• Data Foundation and Terminology WG
• Metadata Standards Directory WG
• Practical Policy WG
• PID Information Types WG
• Data Type Registries WG
• Data in Context IG
• Big Data Analytics IG
• Data Brokering IG
• Federated Identity Management IG
• Metadata IG
• PID Interest Group
• Service Management IG
RDA/US Goals:
• Contribute to RDA “international” efforts and leadership
• Bring US efforts to the broader RDA community
• Build the RDA community within the US
• Leverage and implement RDA deliverables in the US to amplify impact
• Collaborate closely with other RDA “regions” on key programs and initiatives
RDA/US: Collaborate Globally, Contribute Locally
NSF-supported RDA/US initiatives:
• Outreach (RDA RDA/US)
• RDA Deliverables Amplification
• Student / Early Career Engagement
RDA/US Steering Committee:
• Fran Berman, RPI
• Larry Lannom, CNRI
• Mark Parsons, RPI
• Beth Plale, IU
RDA US membership (yellow states)
The European plug-in to RDA …
• RDA Europe Forum: strategic advice
• RDA Europe Science Workshops: interaction & feedback from target audience
• RDA Europe national & pan-European outreach: to engage new members & disseminate outputs
• RDA Europe policy report: to support European policy-makers & funders
RDA Europe, the European plug-in to the global RDA, supports the global RDA and brings the European voice to the table
Europe as a Global Partner
• Societal challenges of our time transcend borders
• Data- and computing-intensive science is made of global collaborations
• Research data are global
• Research Data Alliance: enable data exchange at global scale
Domain initiatives are very important:
• Marine data sharing: Southern Ocean Observing System
• Genetic data sharing: Human Genome Project
• Astronomy: SKA
• CERN LHC
But domain initiatives will not necessarily enable bridges to be constructed across disciplines, time, and industry
So the EC, the USA, and Australia committed resources to forming the Research Data Alliance
International
RDA has so far not gained enough traction with the HPC, big data and computer science communities
This needs to be addressed urgently, since the HPC community dealing with Big Data will need close interaction with application user communities, support from policy makers at national and international level and, of course, adequate financial support from the relevant funding agencies
It is therefore important to work together… and to link with other relevant initiatives, such as NDS in the US (presented by Ed Seidel yesterday) and EUDAT in the EU
Relation to HPC
“We are taking our work beyond Europe's borders, to reach global scale. To make the scientific resources of the world work together, interoperating and open to discovery. For example we are working with partners like the US and Australia in the Research Data Alliance to make scientific progress broader, deeper and more workable”. Neelie Kroes, Vice-President of the European Commission responsible for the Digital Agenda, “Open Access to science and data = cash and economic bonanza”, 19 November 2013
Why a Research Data Alliance?
… So much to gain from collaboration …
SAVE THE DATE
Input to this presentation was kindly provided by Fran Berman and Hilary Hanahoe, and drawn from public presentations by EC officials
However, the opinions expressed in this talk, as well as any mistakes or omissions, are entirely my responsibility
Thanks for your attention!
Acknowledgments
Resources
First RDA Infrastructure Deliverables Coming this Fall
Data Type Registries WG
• Deliverables: System of data type registries, formal model for describing types, working model of a registry.
• Initial Adopters and Users: CNRI, International DOI Foundation, Deep Carbon Observatory
Practical Code Policies
• Deliverables: Survey of policies in production use, testbed of machine actionable policies, deployment of 5 policy sets, policy starter kits
• Initial Adopters and Users: RENCI, DataNet Federation Consortium, CESNET, Odum Institute, EUDAT
Persistent Identifier Information Types
• Deliverables: Minimal set of PID types, API
• Initial Adopters and Users: Data Conservancy, DKRZ
Language Codes
• Deliverables: Operationalization of ISO language categories for repositories.
• Initial Adopters and Users: Language Archive, Paradisec
Data Foundations and Terminology
• Deliverables: Common vocabulary for data terms, formal definitions and open registry for data terms
• Initial Adopters and Users: EUDAT, DKRZ, Deep Carbon Observatory, CLARIN, EPOS
Metadata Standards
• Deliverables: Use cases and prototype directory of current metadata standards starting from DCC directory
• Initial Adopters and Users: JISC, DataOne
Next Steps for the RDA
Continuing pipeline of infrastructure deliverables adopted and used to accelerate data sharing
Increasing coordination of infrastructure
Increasing cross-boundary collaborations between domains, sectors, organizations
International and regional programs focusing on workforce, outreach, expansion of infrastructure impact
New partners in the Organizational Assembly
Focused strategy to support development of industry infrastructure for data sharing
More Infrastructure
Focus on Industry
Synergistic Programs
Effective Community