e-Infrastructure landscape: A Researcher’s Viewpoint. Professor Carole Goble FREng, FBCS CITP, University of Manchester, UK. Open Middleware Infrastructure Institute; Software Sustainability Institute UK. RUGIT (Russell Universities Group IT Directors), Imperial College, London, 28 March 2012

Post on 16-Jul-2020


Page 1: e-Infrastructure landscape: A Researcher’s Viewpoint. Carol… · • Support any device and any operating system. • Availability of skilled and helpful staff when you need them

e-Infrastructure landscape:

A Researcher’s Viewpoint

Professor Carole Goble FREng, FBCS CITP

University of Manchester, UK

Open Middleware Infrastructure Institute

Software Sustainability Institute UK

RUGIT (Russell Universities Group IT Directors),

Imperial College, London, 28 March 2012

Page 2:

Why me?

• University-based e-Infrastructure researcher, developer and service provider: workflow management, catalogues, data/model/workflow repositories, VREs, data collection, ontologies, publishing

• Mixed team of CS researchers, scientific informaticians, software engineers and IT Services: 1 seconded from ITS, 2 working with us.

• e-Infrastructure provider to large-scale projects: biodiversity, astrophysics, astronomy, chemistry, genomics, digital document preservation, systems biology, social science …

• e-Infrastructure provider to long tail researchers: biology, genomics, chemistry, social science, archaeology, music, …

Page 3:

Why me?

• Open Middleware Infrastructure Institute UK and Software Sustainability Institute, ESNW e-Science Centre with ITS@Manchester.

• Chaired expert panel for BIS “Delivering the UK’s e-Infrastructure for research and innovation”, 2010

• Leading a Think Tank Tea Club for academic-led consultation on a strategy for e-Infrastructure for Research

• Chair UK User Forum for ELIXIR ESFRI

Page 4:

What is e-Infrastructure?

The integration of digitally-based technology, resources, facilities, and services combined with people and organizational structures needed to support modern, collaborative research (and teaching).

1. Data and Storage

2. Software (and Algorithms)

3. Hardware (Compute)

4. Networks

5. Security and authentication (BIS Report)

6. People (Collaboration, Skills, Capacity)

7. The Digital Library

Page 5:

e-Infrastructure Tiers

Researcher and Community Specific Applications

Visualisation tools, Simulation tools, Research Tools etc

Pan-Application and Possible Pan Community Specific Infrastructure

Often community specific. Maybe even developed by your researchers

R and Matlab, Specialist Data Management, Workflow Management, CDK Toolkit for Chemistry, LIMS, ELN, Catalogues and Repositories

Stuff we want to be there and buy into

Compute, Storage, Networks, Backup, Service Hosting, Library Services, File and Don’t Forget Data Store, Institutional Repository, Security Services

Tier labels in the diagram: Core Base; Specific; Community Specific; Commodity, maybe Customised

Page 6:

Fundamental Expectations of Institutional IT Support

• Lots and lots of storage.

• Backup and fast recovery.

• Fast and reliable network.

• Wireless everywhere.

• Compute of different kinds that works with my tools and not too many hoops, if any.

• Reliable Service hosting.

• Support any device and any operating system.

• Availability of skilled and helpful staff when you need them. Preferably known.

• Ability to use my community’s/lab’s e-infrastructure

Page 7:

Political Landscape

Page 8:

Landscape of UK e-infrastructure Reviews

• BIS, RCUK: Delivering the UK’s e-Infrastructure for research and innovation, 2010

• RCUK & RS: Review of e-Science, 2009

• OSI: Developing the UK’s e-Infrastructure for science and innovation, 2007

• BIS, RCUK, HEFCE, DEL, SFC: Report of the e-Infrastructure Advisory Group, 2011

• BIS: A Strategic Vision for UK e-Infrastructure, 2011

• Strategy for the UK Research Computing Ecosystem, 2011

• BIS: Technology and Innovation Futures Foresight Report, 2010

• JISC Review, 2010

Page 9:

Coordination

Capacity

Data

• Coordination across research councils.

• Changing behaviours to reward and enable reuse.

• Overcoming fragmentation.

http://www.rcuk.ac.uk/documents/research/esci/e-Infrastructurereviewreport.pdf

Page 10:

UK Research Computing Ecosystem

• Campus/Regional HPC, e.g. HPC-Wales

• Special purpose HPC, e.g. DiRAC

• National HPC, e.g. HECToR

• Public data analysis cloud

• Thematic petabyte data store

• Specialist databases

• Software development: HPC

• Enhance current DTCs

• Software development: E-science

• Open and accessible

• JANET

• Cyber-security

Page 11:

e-Infrastructure Leadership Council

• The additional £145M of capital investment in e-infrastructure recently announced by BIS builds on the initial outputs from the “A Strategic Vision for UK e-Infrastructure” report.

• Co-chaired by Dominic Tildesley and David Willetts

• A 10 year roadmap for investment for networks, data and storage, compute, software and algorithms, people and skills and security and authentication.

• BIS dominated. e-Infrastructure for business

• HPC and Hartree oriented. Strong commercial interest. Concerns about representation and vested interests.

• Keen on specialist centres

• “Today to Out-Compute is to Out-Compete”

Page 12:

Rear-guard Actions

• RCUK e-Infrastructure subgroup

– Funding councils, championed by Doug Kell (BBSRC)

• e-Infrastructure academic user forum

– Academic-led. Chairs: Prof David De Roure (Oxford) and Prof Peter Coveney (UCL)

– Community thought leadership group

– Concerns about the constitution of the e-Infrastructure Leadership Council

Page 13:

e-Infrastructure Academic User Forum

• e-Infrastructure academic user forum

– Concerns about the constitution of the e-Infrastructure Leadership Council

– Builds on earlier e-Science Forum and HPC-SIG

– A whole-community exercise

• New Digital Research conference – St Catherine's College, Oxford, 10-12 September.

– Showcasing successful digital research, tools and methods, and building community especially around big data, open data, open science and the next generation of digital researchers.

Page 14:

RCUK consultation for a Capital Investment Roadmap

• E-infrastructure investment

– Fast turnaround on university clusters: opportunistic in-year spend from other parts of government.

– The speed with which the community mobilised was seen very favourably.

– RUGIT now needs to articulate future capital needs through the RCUK consultation on capital for RCUK and HEIs, to be ready if more funding is found.

– Is RUGIT lined up with your HEIs?

– http://www.rcuk.ac.uk/research/Infrastructure/capconst/Pages/home.aspx

Page 15:

RUGIT Help?

• Participate in the Academic User Forum

• Lobby e-Leadership Council

• Remember it’s not all flippin’ HPC, and RUG institutions are where much of the research is done.

• Respond to the CIR

• Be prepared to respond to funding councils

Page 16:

EPSRC Software Strategy

• Software as an Infrastructure

– Survey, response, action plan

– http://www.epsrc.ac.uk/SiteCollectionDocuments/other/SoftwareAsAnInfrastructure.pdf

• Areas

– Identification of new areas and grand challenges

– Enabling and promoting collaboration

– Research and Development

– Training

– Career Path Support

– Joint funding models

– Supporting Innovation

– User Support

– Quality of Code

– Sustainability of Code

Page 17:

European e-Infrastructure

The mood is reuse and cooperation.

• e-IRG e-Infrastructure Reflection Group http://www.e-irg.eu

• SIENA: Standards and Interoperability for eInfrastructure Implementation Initiative http://www.sienainitiative.eu

• Horizon 2020 http://ec.europa.eu/research/horizon2020

• Still keen on Grid and HPC though have finally embraced Cloud

• GEANT: very successful network-level e-infrastructure.

• EGI: now well established European Grid e-infrastructure

– EGI faces challenges due to lack of standardisation, a focus on a small number of application domains and financing.

– Grid stuff is messy: National infrastructures, numerous middleware types, disconnect between EMI and EGI.

– Recent spin out for data – e.g. EUDAT (eudat.eu) though a bit archive focused

• PRACE: HPC infrastructure for Europe

– very successful community but struggling to transition to a cash contribution model from a resource contribution model.

Page 18:

European e-Infrastructure

• Usual obsession with Scale

– Exascale Challenge

• CRESTA, DEEP, Mont-Blanc

• combined funding of 25 million

• different aspects of the exascale challenge “using a co-design model spanning hardware, systemware and software applications”.

• Centres of Excellence

– European Software Centres of Excellence

Page 19:

Community focused e-Infrastructure

• Own services, catalogues, data management environments, datasets, tools and metadata standards, policies, licensing conventions….

• Large number of EU community specific e-Infra:

– BioVeL for BioDiversity

– SCAPE for Digital Libraries

– ScalaLife (Comp Bio), MAPPER, VPH…blah blah….

• Flagships, ESFRI projects, Innovative Medicines Initiative, e.g. OpenPHACTS

• National e.g. UKDA

• International community: e.g. BioStar, Open Bioinformatics Foundation, BioSharing, COMBINE, PubMed….

Page 20:

European Strategy Forum on Research Infrastructures

Page 21:

ELIXIR, http://www.elixir-europe.org/

• Data Access, Curation and Integration

– managing the data deluge

– integrating the data to reduce fragmentation of effort and research

– incorporating and exploiting new types of data

– maintaining open access to biological data in order to enhance competitiveness and innovation.

– radically enhancing Europe’s data infrastructure and making it more accessible.

– presenting users with a single, transparent interface to a world of resources

Hub and Spoke Model

European Bioinformatics Institute

1500 databases in public use. > 2000 web services

Page 22:

Ecology and BioDiversity

• DataONE Investigator Toolkit: Dryad data repository, Workflow tools, Excel tools, R and MATLAB libraries and tools, Specialist Data Management tools, Bibliography tools, remote file management tools, data management planning tools, software tools catalogue. http://www.dataone.org/investigator-toolkit

• BioVeL infrastructure: workflows, data management, web services, portals, catalogues, repositories, Excel tools. http://www.biovel.org

• CAMERA 2.0: portal, workflows, services, metadata management, data management… http://camera.calit2.net

Page 23:

Find, exchange and interlink, preserve, publish data, models, publications, SOPs & analyses.

User access control. Mix local and central stores. Post-project archiving & publication.

Gateway to public tools and resources.

Standards compliant.

Launch and validate models and analyses: JWS Online

Find experts, colleagues and peers

Group, Project, Consortium, Community Sharing

Personal Storage

http://www.sysmo-db.org

Page 24:

Facilitating/leveraging Institutional e-Infrastructure and know-how

• Many research teams have e-Infrastructure and technically skilled people. How do you leverage, enable and exploit them? And vice versa?

Page 25:

Commercial or Open Cloud-type e-Infrastructure

• PAYG Compute – Amazon Cloud and Azure

• Cluster computing – Condor

• Software repositories – Github, BitBucket

• Data Management – Dryad, DataVerse

• Data Commons – FigShare

• Specialist repositories & Catalogues – nanoHUB, myExperiment, OpenWetWare, BioModels

• Communication and Publishing – LinkedIn, YouTube, SlideShare, Twitter, Wikipedia, GoogleDocs, DropBox, GitHub, Wikis and blogs galore!

• Research management – VivoWeb

• Bibliographic tools – Mendeley, ReadCube, CiteULike, Google Citations

• Review and conference Management – EasyChair

• Specific social networks – ResearchGate, BioMedExpert, SciLink, OurSpace, BioCrowd

• LIMS and Lab Management – YourLabData, LabGuru

• Workflow management – Taverna, Triana, Kepler, Pegasus

• Research Cloud Services – e.g. Digital Science

DataONE Software Catalogue lists about 300 tools

Page 26:

http://www.digital-science.com/

Page 29:

Researcher’s e-infra dimensions, obligations & opportunities

• Personal, Group

• Project, Consortium

• Community

• Institutional

• National, International

• Cloud Commodity

• Open source

Assemble our own e-infrastructure

Page 30:

Flexibility Cherished above all things for Institutional e-Infrastructure.

One Size Does Not Fit All.

What the researchers want that actually fits how they work.

Page 31:

Where I Stopped

Due to lots and lots of discussion

Page 32:

Reflections on Research

and its impact on e-Infrastructure

Page 33:

Pan-Institutional, International Teams

Collective Intelligence

• Experimental scientists, Theoretical scientists, Scientific informaticians, Computational Scientists, Modellers, Specialist Tool developers, Service & resource providers, Infrastructure developers, System Administrators…

• Groups/Centres of different sizes

• “Long Tail” Postgrads/Citizens

• Planned/Emergent, Evolving

• Collective intelligence and crowd-sourcing

• Community datasets, services

• Curation and collective campaigns

Page 34:

Agile Science with Data

Research is often not beautifully planned.

[Diagram of the research cycle. Steps shown: Identify Problem, Investigate Prior Knowledge, Generate Hypothesis, Make Prediction, Make Observations, Perform Experiments, Test Hypothesis, Organise Data, Analyse Data, Devise New Experiments, Draw Conclusions, Validate & Compare Results, Pool Results, Communicate, Log]

Page 35:

Data

• Exascale data is being handled: SKA, LHC…

• Biggish local data (e.g. Next Gen Seq)

• Small local data (e.g. spreadsheets & wikis)

• Scruffy data – text, semi-structured, emergent…

• Online data sets - public or consortium

• Moving lots of data around. Securely.

• Network traffic. Local compute & caching. Licensing. Standard formats.

• Bottlenecks: integration & analysis, community specific curation & stewardship, preservation if needed

• Open Data

• Data Journals

http://www.elixir-europe.org

Page 36:

Analytics Platforms

• Research as a Service

• Virtualisation & Automation

• Software is e-Infrastructure

– EPSRC Software Strategy report

– Software Sustainability Institute UK

– Software Carpentry

– Improving software practices

• The. Cloud.

– Community and UK Cloud

– Putting datasets / services on The Cloud

Page 37:

Open innovation

platforms/architectures

• Open data, Open APIs, Open Licensing, Open Source, Open Standards

• Enabling: others innovate

• Researcher-centric

• Adoption ramps.

DropandCompute

Page 38:

Open (maybe even reproducible research)

• Open Data, Open Access, Open Source Software

– Elsevier climb-down, Royal Society review

– RCUK Public consultation, Research Council Mandates

– Open Access – who pays?

– Scientists publish. They do not generally share outside trusted collaborations.

• Transparency and provenance

– Possible re-computation on VM farms

• Stewardship

– Stewardship mandates and policies vs Practical Stewardship vs Community Stewardship

– Curation burden for reusability

Page 39:

Trend: New Publishing

• All scientific commodities

– Software, models, data, methods, know-how, articles, algorithms, services

• Integrated publishing

– Data+discussion+method

– Data journals, Software Journals…

• Credit! Software, Data, People

– People: ORCID, OpenID

– Software and Data journals

– Software citation? DataCite

– Data and Software management policies and practices. Altmetrics

Page 40:

People

• Income generators: academics and students

– their productivity should be highest priority IMHO

• Capability and Relationships

– Training for young and mid-career researchers/research technologists.

– Enable mixed skilled research teams to include research technologists and IT staff

– Bind ITS with research groups

– Comms channels – dog food

• Value and reward highly skilled research technologists within HE institutions with a career structure.

Google Tech Stop

Page 41:

How can RUGIT Help?

• Researchers’ needs

– Top down (alone) is just not going to wash

– Partnering with researchers

– Lots of exaflop machines won’t help most of us….

• Use, leverage and share what is already used

– Infrastructure already in your institutes used by your researchers. Or even developed by your students and researchers

– Researchers & students are using many external services. And will continue to do so.

– Community and commodity solutions.

• Getting out of the way….

– Enable, don’t hamper, esp. domain specific e-infrastructure.

Page 42:

How can RUGIT Help?

• Joining up

– Greater provision by institutions

– Sustainability through collectivism

– Inventing is costly

• There is no one size fits all

– Avoid reinventing wheels but there are many kinds of wheel - data and metadata.

• Rapid responsiveness

– Research opportunities often have narrow time windows. A perfect solution too late is no good. A good enough solution in time is perfect.

Page 43:

How can RUGIT Help?

• Be Open, Be Permeable. Be Flexible

– Presume that you will not be the developers of much of the “upper” infrastructure and the applications your users need.

– But you will need to be flexible enough to adapt to them.

– Future proof through openness.

• Research, Teaching, Life bleed together.

– Project and research management. Joined-up grant submission, paper writing, financial management, HR

– The Je-S / ROS / local systems mismatch is an example