Supporting Researchers in the Cloud
Dr. Ann Borda
Executive Director, VeRSI (Victorian eResearch Strategic Initiative)
Chief Executive Officer, VPAC (Victorian Partnership for Advanced Computing) and V3 Alliance
IDC Report – The Digital Universe in 2020
• From 2005 to 2020, the digital universe will grow by a factor of 300, from 130 exabytes to 40,000 exabytes, or 40 trillion gigabytes (more than 5,200 gigabytes for every man, woman, and child in 2020).
• From now until 2020, the digital universe will roughly double every two years.
• Only a tiny fraction of the digital universe has been explored for analytic value. By 2020, as much as 33% of the digital universe will contain information that might be valuable if analyzed.
• By 2020, nearly 40% of the information in the digital universe will be "touched" by cloud computing providers — meaning that a byte will be stored or processed in a cloud somewhere in its journey from originator to disposal.
• The amount of information individuals create themselves (writing documents, taking pictures, downloading music, etc.) is far less than the amount of information being created about them in the digital universe.
Source: www.emc.com/leadership/digital-universe/
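A quick back-of-the-envelope check shows that the IDC figures are mutually consistent. This check is not part of the IDC report, and it assumes decimal units (1 EB = 10^18 bytes, 1 GB = 10^9 bytes):

```latex
\begin{align*}
\text{growth factor} &= \frac{40{,}000\ \text{EB}}{130\ \text{EB}} \approx 308 \quad (\text{roughly } 300\times) \\
\text{implied doubling time (2005 to 2020)} &= \frac{15\ \text{yr} \cdot \ln 2}{\ln 308} \approx \frac{10.40}{5.73} \approx 1.8\ \text{yr} \\
\text{implied 2020 population} &= \frac{4 \times 10^{22}\ \text{bytes}}{5.2 \times 10^{12}\ \text{bytes per person}} \approx 7.7 \times 10^{9}
\end{align*}
```

An average doubling time of about 1.8 years is in line with the "about double every two years" claim. Likewise, 40,000 EB (40 trillion GB) shared among roughly 7.7 billion people gives the quoted 5,200 GB each.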
What is the cloud?
The word 'cloud' is now ubiquitous when discussing online technologies and services.
“Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.”
The U.S. National Institute of Standards and Technology (NIST), Special Publication 800-145, September 2011
Service Models - Definitions
Software-as-a-Service (SaaS): Applications served over the Internet, like Google Docs. Such applications frequently include collaboration or sharing features that would be more difficult to implement in desktop software.
Platform-as-a-Service (PaaS): Specialized APIs for building applications on the Internet, like Google App Engine.
Infrastructure-as-a-Service (IaaS): Low-level services for basic storage and computing, offered by a variety of providers such as Amazon Web Services, Windows Azure and Google Compute Engine.
Cloud Service Providers – Just a few!
1. Amazon
2. VMware
3. Microsoft Azure
4. SalesForce
5. Google
6. Rackspace
7. IBM
8. Citrix
9. Joyent
10. SoftLayer
11. OpenStack
12. Cisco
13. AT&T
14. GoGrid
15. Oracle
16. SAP
17. Dropbox
18. Verizon/Terremark
Challenges and Opportunities
Researchers as extreme “information workers”
Consumers and creators of information and knowledge
Open and closed access publishing of data and results
“Data sets are becoming the new instruments of research”
Dan Atkins, University of Michigan
Drivers
• Key technological drivers
• Moore's Law: the exponential increase in computing power and solid-state memory (faster, cheaper devices, higher-capacity storage, etc.)
• A dramatic increase in communication bandwidth
• Research process: coping with the data deluge
– Finding and accessing data
– Linking data
– Processing data
– Interpreting data
– Presenting results
• Increased international and cross-organisational collaboration - Doing what was previously impossible
Research Patterns
(Figure: Genomics; High Energy Physics; Astronomy)
*Humanities and social sciences (hybrid models across these types)
Based on a slide by Ewan Birney, EMBL, 2010
21st Century Research
(Figure: Organise Data, Discover Data, Publish Data, Use Data, Collect Data, Generate Data; shares of researcher's time: > 80% and < 20%)
A basic reality of research
Acknowledgement: Dr. Rhys Francis, Oct 2013
Reality of Data Use?
Researcher Infrastructures at a Glance
Compute, Data, Networks, Tools
Data scales can vary (most research isn't about big data)
Infrastructure Stack 2009-2013 (NCRIS & Super Science)
Infrastructure co-ordination: Australian eResearch Infrastructure Council (AeRIC), $1.5M
Shared access methods: Australian Access Federation (AAF), + $2M
National Capabilities, + $246M (better HPC modelling, larger data collections, extended bandwidth):
• NCI & Pawsey, + $156M
• Australian Research and Education Network (AREN), + $40M
• Research Data Storage Initiative (RDSI), + $50M
Research Integration, + $144.5M:
• Australian National Data Service (ANDS), + $75M: Research Data Commons (better data management, description and access)
• National eResearch Collaboration Tools and Resources (NeCTAR), + $69.5M: Digital Laboratories (better research tools, environments and workflows)
R. Francis, AeRIC, 2010
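The flattened slide does not spell out which programs sit under the two subtotals. That grouping can be read off the arithmetic, using the amounts in $M as given on the slide:

```latex
\begin{align*}
\text{National Capabilities:} \quad & \underbrace{156}_{\text{NCI \& Pawsey}} + \underbrace{40}_{\text{AREN}} + \underbrace{50}_{\text{RDSI}} = 246 \\
\text{Research Integration:} \quad & \underbrace{75}_{\text{ANDS}} + \underbrace{69.5}_{\text{NeCTAR}} = 144.5
\end{align*}
```

These are the only combinations of the listed amounts that reproduce $246M and $144.5M. The AAF ($2M) and AeRIC ($1.5M) allocations fall outside both subtotals.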
A National Stimulation Package!
• Do computational modelling, complete data analysis, visualise results: NCI, Pawsey
• Keep data and observations; describe, collect, share, find and re-use them: ANDS, RDSI
• Use new tools and apps, work remotely and collaborate in the cloud: NeCTAR
• Increased connectivity and bandwidth (AREN); single sign-on (AAF); high-reliance services, high-reliability servers
Acknowledgement: Rhys Francis, AeRIC, May 2012
Cloud – What’s in it for researchers?
Cloud computing is cost-competitive for a wide variety of research workloads and application scenarios
Some examples:
• 24/7 web applications
• Shared access to documents, data and tools for multiple collaborators
• Large-scale data processing
• "Bursty" or "spiky" CPU-intensive workloads
Humanities and the NeCTAR Research Cloud
• Researcher Lauren Gawne is a linguistics expert who has just completed a PhD thesis at the University of Melbourne on a linguistic description of Lamjung Yolmo, a Tibeto-Burman language of Nepal.
• “I can now search texts much quicker and modify them to suit my purposes. I don't really 'write code' as in create programs from scratch, but I'm quite happy to tinker with it… I am part of a generation who grew up with computers and I navigate html and xml when writing blogs and so forth.”
Software as a Service - NeCTAR Research Cloud
“By running TileMill on the NeCTAR servers I was able to work on the program through my browser, and it ran as effectively for me as for any other student in the workshop (even those with shiny new computers). I was also free to move between working on it at home, or from a computer in the office.
For the collaborative group map we could work together remotely, which definitely made it easier.”
Virtual Laboratories
• CSIRO - Virtual Geophysics Laboratory
• Genomics Virtual Laboratory
• University of Tasmania - Marine Virtual Laboratory
• The All Sky Virtual Observatory
• Climate and Weather Science Laboratory
• Humanities Networked Infrastructure (HuNI)
• The Characterisation Virtual Laboratory: research environments for exploring inner space
http://www.nectar.org.au/virtual-laboratories-1
eResearch Tools
• Macquarie University - UniCarbKB: e-infrastructure for glycomics
• University of Western Australia - cloud-based bioinformatics tools
• University of Queensland - OzTrack: tools for the storage, analysis and visualisation of animal tracking data
• Monash University - Bioscience Data Platform - TARDIS in the cloud
• Australian Synchrotron - tools for the Australian Synchrotron community
• Australian National University - Drishti and Voluminous - volume visualisation tools
• University of NSW - federated archaeological information management system
• Curtin University - Collaborative and Automated Tools for the Analysis of Marine Imagery and Video (CATAMI)
• Monash University - Geology from Geodynamics
• Queensland Cyber Infrastructure - Quadrant
• Centre of Excellence for Particle Physics - high-throughput computing for globally connected science
• University of Queensland - Aust-ESE project - tools to support collaborative authoring and management of electronic scholarly editions
• University of Adelaide - Submission, harmonisation and retrieval of ecological data (SHaRED)
• University of Melbourne - Human Variome Project, Australian node clinical and molecular data linkage tools
• Schizophrenia Research Institute - Extension and Enhancement of Systems for the Australian Schizophrenia Research Bank
• CSIRO - Cloud-based image analysis and processing toolbox
http://www.nectar.org.au/eresearch-tools
Data, its management and use: a common consideration among the stakeholders
(Figure: Data; Framework for access and use)
“As researchers move into the cloud, and the world grows information-rich, where are libraries (in the cloud)?”
“Use the cloud to go to them, rather than waiting for them to come to us?”
Some gaps:
– Research interactions often cross organisational boundaries and span multiple infrastructures
– Researchers are managing complex workflows
– What is the role of service providers such as libraries?
– What new skills and roles are needed to make things happen?
– Addressing barriers to uptake
– Conventional wisdom suggests we may need interoperability or a standard (or two), AND a lot of people working together.
Understanding Research Innovation
McCrindle Research – Emerging Research Methods http://www.mccrindle.com.au
Europeana Research
(Figure: Platform: Content & Data; Tools (“Portal”, “Annotation”, “API”, “SPARQL”); Services)
http://pro.europeana.eu/web/europeana-cloud
Helix Nebula
• http://helix-nebula.eu/
• CERN, EMBL, ESA
• Helix Nebula, the Science Cloud, will support the massive IT requirements of European scientists.
• The project aims to pave the way for the development and exploitation of a Cloud Computing Infrastructure, initially based on the needs of European IT-intense scientific research organisations, while also allowing the inclusion of other stakeholders’ needs (governments, businesses and citizens).
Linked Open Data Cloud
• http://linguistics.okfn.org/resources/llod/
Digital Curation
• Digital data curation involves a wide range of activities, many of which may be suitable for deployment within a cloud environment.
• These range from infrequent, resource-intensive tasks which will benefit from the ability to rapidly provision resources, to day-to-day collaborative activities which can be facilitated by networked cloud services.
E.g. Kindura project (duraspace.org): https://jiscinfonetcasestudies.pbworks.com/w/page/45197715/Kindura
Digital curation and the cloud White Paper 2012
FutureEverything: Apps for Europe
Apps for Europe is a support network that provides tools to transform ideas for data-based apps into viable businesses, and FutureEverything is a key member of that network.
It brings together a powerful European network of individuals and organisations who have been involved in open data programmes and in supporting promising ideas and helping them scale.
http://futureeverything.org
New Models of Delivery
• Complement the continuum of library/information services that are provided at local, faculty, institutional, state and national levels
– For researcher uptake, all levels of infrastructure must be in place, or at least services they can access
– Bring researcher champions on board
• People as Infrastructure
• Design everything with the user in mind
Joined-up Thinking
• Identify the most commonly requested services (plug-and-play services) and strategise with agility and in consultation…
• Libraries as aggregators, facilitators, evaluators, validators, ‘nodes’ …
• Flexible underlying (technical) services…. fronted by a collaborative (aka human help desk) shop front…
• Invest in accessible & coordinated user support & training
• Invest in awareness and outreach
Any Questions?
Dr. Ann Borda, V3 Alliance
Level 3, Thomas Cherry Building (201), The University of Melbourne 3010
Phone: 03 8344 8322 | Mobile: 0437 469 417 | Email: [email protected] | [email protected]
Copyright (c) 2013, V3 Alliance, Dr. Ann Borda
This work is licensed under a Creative Commons Attribution 2.5 Australia License. To view a copy of this license, visit: http://creativecommons.org/licenses/by/2.5/au/