rdsi project history 2.0
TRANSCRIPT
RDSI CREATING AUSTRALIA’S RESEARCH DATA CLOUD
Table of Contents
Chapter I: In the Beginning
Chapter II: Consulting the community
Chapter III: The data collections
Chapter IV: An integrated infrastructure
Chapter V: Moving data
Chapter VI: Accessing data
Chapter VII: Protecting data
Chapter VIII: Working with data
Chapter IX: Selecting solutions
Chapter X: Case studies
Chapter XI: Looking back and looking ahead
Chapter XII: The team
In the beginning…
Chapter I
…there was nothing
At the very beginning of this project there was nothing in the sector for collaborative storage. Really nothing. So it started from scratch, to persuade people that it was a good idea to store data, except not just in your university or in your research group.
– Dr Nick Tate
RDSI Project Director
No discoveries without data
RDSI is in my mind, entirely about the delivery of data and the amassing of the data. It is my belief that without bringing collections together, without making them accessible, you can’t make new discoveries.
‒ Prof Nathan Bindoff, Director TPAC and Professor of Physical Oceanography
A Major Gap
I was working for the Australian Research Council when we were doing projects on the cutting edge in research which crosses disciplinary boundaries. Like many at the same time, we were convinced the future of research would be shaped by the adoption of digital technologies, increased computational power, high performance computing, massive data storage, and tools to make it possible to link and re-use existing research data. After considerable federal investment and matching efforts from state governments and universities, there was one major gap: massive data stores to service the growing eResearch communities existing and coming into being.
‒ Prof Doug McEachern, RDSI Project Board
How it started
There is growing recognition that new ways to conduct research have emerged and are being validated across most research disciplines. Adding to traditional forms of research that rely on experiment, theory and testing hypotheses using data, it is now evident that researchers also:
» collect increasingly larger sets of data as a primary form of research; and
» use modelling tools to assist them in deriving patterns, perceptions and trends that can form the basis for establishing and confirming hypotheses.
Information and communications technology (ICT) is the cornerstone to such new approaches, providing the means not only for increasingly powerful computer-enabled simulation and modelling, but also the very avenue to manage and integrate the increasing volume and complexity of datasets and collections. Hence, ICT is not only a resource to administer and manage research but also to drive and innovate the ways in which research is conducted.
– Strategic Roadmap for Australian Research Infrastructure, 2008, p19
In 2008, the Government’s Strategic Roadmap for Australian Research Infrastructure highlighted the need to manage and integrate the increasing volume and complexity of research datasets and collections.
Research storage before RDSI
Part of my role here at QCIF is to help research groups with their storage needs. In the days before RDSI it caught me by a huge surprise that even at a major university it was quite hard for a research group to get the right kind of storage, to have some better way of collaborating than sending a spreadsheet across the globe.
I remember the first meeting I had with one professor at UQ who had come from Oxford. He said, ‘Look, I’m a centre director, I’m a professor, and I have spent six months just trying to get a little bit of storage. I’m worried I should be doing something else.’
‒ Graham Chene, Research Manager, QCIF
Research storage before RDSI
A retiring academic in Marine Science came to us at the University of Sydney library. He had spent his whole life collecting data from the beaches of Australia. He asked us, ‘What do I do with all of this? It’s in various formats, it’s valuable, but I don’t know what to do with it.’ We approached the university, but at that stage the view was that when research results were published, the research was finished. Why would they want to store the data and why would they want to share it?
‒ John Shipp, RDSI Project Board
Research storage before RDSI
We asked another researcher what she was doing to curate her data. She said, ‘Every week I download it onto CD. I make three copies: one for home, one for the office, and one I send to my mother in Perth.’
Even today I have a lingering fantasy that somewhere in Perth there is a little old lady with these CDs stacked up around her and one day they will cave in on top of her and she will be crushed by her daughter’s research. This is just imagination, but it’s an indication of what people were doing because they didn’t have facilities to curate their data properly.
‒ John Shipp, RDSI Project Board
2010: RDSI begins
Dates: 2010-2014
Funding: $50m
Program: Education Investment Fund (EIF) under the Super Science Initiative
Lead agent: The University of Queensland
Chair: Prof Max Lu, Deputy Vice-Chancellor (Research)
Project Director: Dr Nick Tate
The aim of the RDSI Project is that researchers will be able to use and manipulate significant collections of data that were previously either unavailable or difficult to access, and that there will be a consistent means of accessing this data.
The Project will be realised through the creation and development of data storage infrastructure accessed through a common infrastructure layer and provided by agencies within the sector, or commercial providers, or both.
Where should we put the data?
The question was, where should we put the data? This was stopping progress across research. People couldn’t keep storing it on their desktops, and there were no onshore cloud services at the time. That left us with two options: either one big data centre or a small number of sites around Australia. The problem with one site is then you have to put everything else around it, and you lose innovation. Several is better.
‒ Dr Rhys Francis, RDSI Board
Consulting the community
Chapter II
Opening a Dialogue
Timeline, June 2011 to May 2012:
» ReDS Strawman Consultation Series: 3 June – 1 July 2011
» DaSh Strawman Consultation Series: 17 June 2011
» Vendor Briefing: 1 Aug 2011
» ReDS Tinman Consultation: 28 Sep 2011
» DaSh Tinman Consultation: 11 Nov 2011
» ReDS Consultation Series: 9 Feb – 2 Mar 2012
Consulting the community
Because the Education Investment Fund has a restriction that you can’t use the funding for operation, we needed operational partners who could provide the operational working funds. We didn’t know if anybody would be interested in putting their hand up to do that. We were therefore consulting to see who would be interested, under what circumstances they might be interested, and what would be possible for them to achieve that would meet the project objectives and the needs of researchers. By doing that we were able to put together a plan.
‒ Dr Nick Tate, RDSI Project Director
From Straw Man to Tin Man
It was a large community engagement exercise. For each program, we would find out who might be interested, get them together for a workshop, bounce around ideas, and then consolidate the thinking into a document. We had a lot of good feedback—in some cases probably 100 individual contributions to the Straw Man documents. We would then develop a Tin Man document for each of the programs and work through those, to ultimately lead into the business plan for the project. It matured the thinking of the project quite quickly, and it didn’t feel like an invention being planted on the community from outside. This was something the community built itself.
‒ Dr Markus Buchhorn, Research Data Manager, RDSI Project
Building awareness
A moment stands out in my mind, from one of the early workshops. A lady was there who was researching children affected by asthma. She wasn’t sure if her data would be suitable because it was not a large dataset. I assured her that it was the potential for reuse, not the size, that was important. By the end of the workshop she was excited about the possibilities for enhancing her research.
‒ Mary Sharp, Infrastructure Advisory Panel
Node Development
The NoDe Development programme was designed to identify, strengthen and develop research data centres so that they were able to hold and process high data volumes. These data centres became Nodes of the RDSI project, and their operators were provided with establishment funding from this programme.
The Node Development programme funded the development of eight high capacity Nodes: six Primary Nodes located in Brisbane, Sydney, Canberra, Melbourne, Adelaide and Perth, with two additional Nodes in Townsville and Hobart.
Identifying the Nodes
RDSI went through a process of calling for proposals. We were all looking at what our individual contribution to the national infrastructure would be, and I remember in Queensland we focused on life sciences and eco sciences as being the main areas where we thought that there was a really significant body of research expertise.
‒ Rob Cook, CEO, QCIF
Responding to the call for Nodes
We put a proposal to our members that Intersect propose to become a Node of RDSI. Intersect has twelve university members, quite a lot, and the thing that is interesting is that all twelve quickly said, ‘Yes, this is a really good idea.’ So we responded to the call for proposals and we were one of the Nodes selected from the country.
‒ Dr Ian Gibson, CEO, Intersect Australia
Key to better research outcomes
When we think about the services we offer at eResearch SA, we think about what we can do to make researchers more effective. It’s not our job to produce research outcomes, but it is our job to enable researchers to create better research outcomes. RDSI has been key to that.
‒ Mary Hobson, CEO, eResearch SA
RDSI-funded Nodes
Timeline, January to December 2012, marking the following dates: 22/03/2012, 5/04/2012, 21/05/2012, 25/06/2012, 5/07/2012, 10/07/2012, 4/09/2012, 2/11/2012
The data collections
Chapter III
Research Data Services
The ReDS programme was designed to identify research data holdings of lasting value and importance and contribute funding to their development at the most appropriate Node. ReDS delivers storage services in support of significant data collections, research data sets and associated access tools which are aggregated into related holdings that add value to each other through co-location.
Selecting collections
The value of data is only realised when it’s used, so having data that is going to be reused was a critical part of whether it could be stored through the ReDS program. Every collection that has been given an allocation through the ReDS program meets certain criteria that indicate or demonstrate its national significance.
‒ Peter Hicks, ReDS Program Manager, RDSI Project
Merit criteria
Determining criteria by which research data might be assessed for merit was difficult across disciplines. You could assess a collection based on how many people it would be shared with. But what might be an appropriate audience size for sharing climate data would not be the same for sharing medical data or humanities data. It was very challenging to find merit criteria that would carry the same weight across all disciplines.
‒ Dr Frankie Stevens, Research Data Manager, RDSI Project
Data storage is a commitment
Many of the Nodes had experience with merit allocation processes used for high performance computing. But with HPC, the resources are used, and at the end of the cycle the whole process is repeated. When you try to think about that for data, for long-term storage, you’re not thinking about a resource that goes away in six months. You’re making a long-term commitment. And so the assessment of merit was a much bigger challenge.
‒ Dr Markus Buchhorn, Research Data Manager, RDSI Project
Identifying collections
The Intersect model is we have staff located on campus with our member universities, a team of roughly 15 people already talking to research groups, telling them about options and services available to them. So that team were now looking for collections that might benefit from being on the RDSI infrastructure.
‒ Dr Ian Gibson, CEO, Intersect Australia
Exposing science agency collections
At NCI we’ve worked very closely with the science agencies we are associated with (CSIRO, Geoscience Australia, and the Bureau of Meteorology) to expose the national and international collections that have been locked up inside those agencies. And the win-win is that the exposure to the national community is of value because these collections otherwise would not have been available, and the confluence of them enables transdisciplinary research. The advantage to the agencies is that the availability of the copy here puts that data in a rich computational environment, much richer than they have internally. That’s the nature of the win-win that’s been possible.
‒ Prof Lindsay Botten, Director, National Computational Infrastructure
The long tail of research data
Coordinated research groups like astronomers are able to collect their data and curate it, and eventually they will find the storage. But the humanities, medicine, the environmental sciences, they were less united in their purpose, and I always saw RDSI as a vehicle for bringing those people together and providing them the nascent infrastructure to store their data and make it available.
‒ John Shipp, RDSI Board
Data at RDSI Nodes
Last updated 14/12/2014
55 Petabytes of data available in over 70 Petabytes of storage
That these are huge numbers is beyond question; perhaps more astonishing is that this is an order of magnitude increase in just 4 years.
Even more encouraging is that this data is spread across every one of the 22 research disciplines. From Humanities to Radio Astronomy, no Field of Research has been left untouched.
Collections by Field of Research
Last updated 14/12/2014
About large allocations
The RDSI project allocated up to $9.4M of funds to support large collections at Nodes. Nodes were given the opportunity to submit funding proposals for collections that were too large to be funded under the initial ReDS agreement. Large collection proposals were evaluated by the RDSI Project Board, which decided that a total of 5 proposals would be funded.
Large allocations
I’m really pleased about the large allocations. Having that strategic view at the research community level of how storage is used to aggregate data from particular domains in a way that enables advanced research, is a critical outcome of the ReDS programme.
‒ Peter Hicks, ReDS Programme Manager, RDSI Project
A national medical research data storage facility
Recognising that there was a potential need to support health and medical research data collections, we put in a proposal to establish a national medical data storage facility using funds from the ReDS special allocation process. We put that proposal to the medical research community and we were overwhelmed with responses. Forty-seven institutions around Australia nominated major collections they would like to store. So what we have now is an opportunity to build a national medical research data storage facility. This is something that is quite uncommon globally. It will allow researchers to get the serendipitous second use outcomes and impacts from that data.
‒ Dr Ian Gibson, CEO, Intersect Australia
The large allocations
Murchison Widefield Array Data Archive: The Murchison Widefield Array (MWA) project is funded for operations over a two-year period, which commenced in July 2013. These data sets, of international importance, will assist a global Astronomy and Astrophysics community to do research into the main science goals of the MWA, which include: exploration of the early Universe and the search for signals from the first stars and galaxies; exploration of the transient and dynamic Universe; studies of the Earth's ionosphere; and the study of astrophysics related to objects in our galaxy and in the distant galaxies.
Participating RDSI Node is iVEC.
National Environmental Research Data Centre: The National Environmental Research Data Collection (NERDC) comprises international and national reference collections spanning five fields. This multidisciplinary confluence of collections: (a) spans the lithosphere, crust, biosphere, hydrosphere, troposphere, and the stratosphere; (b) encapsulates the complex interactions within, and amongst, these layers; and (c) will enable new, transdisciplinary approaches to research.
Participating RDSI Node is NCI.
National Genomics Data Storage Facility: Genomics is a critical and complex science for the understanding of living forms and the way they are impacted by the environment, for improving medicine, and understanding food amongst many other uses. The facility will store and make available large volumes of genome data generated at the leading national centres, as well as essential national and international genome libraries.
Participating RDSI Nodes are Intersect, VicNode and QCIF.
Australian Coordinated Characterisation Data Space: The ACCDS underpins national-scale research programs, in particular two recently established characterisation-intensive ARC Centres of Excellence: the ARC Centre of Excellence in Advanced Molecular Imaging (CAMI), which will develop innovative imaging technologies to explore the immune system; and the ARC Centre of Excellence for Integrative Brain Function (CIBF), which is tackling the challenging problems involved in understanding how the human brain works.
Participating RDSI Nodes are VicNode, Intersect and QCIF.
Australian National Medical Research Data Storage Facility: The foundation data sets for this collection represent major national assets supporting research into Australia’s most significant diseases including heart disease, mental illness, the major cancers, as well as the increasing problems of lifestyle diseases such as diabetes and obesity. Importantly, children’s health and the health of our aging population are both well supported in the foundation data sets of the ANMRDSF.
Participating RDSI Nodes are Intersect, VicNode and QCIF.
An integrated infrastructure
Chapter IV
Data Sharing Programme
The DaSh programme was designed to develop the DaSh Collaboration Network (DaShNet) and the DaSh Technical Architecture. The integration of these two major parts of the DaSh programme became the DaSh Technical Framework, which describes the network, data movement capabilities, security and identity matters, data access, cloud gateway access, test platform for the programme’s components and workflow automation capabilities for the RDSI-funded Nodes.
Why is it hard?
Why is it hard, relative to other programs? It’s hard because data has a narrative that’s different for every data stream. Every data-oriented organisation thinks their data is special and different from everybody else’s. Really it should be about revealing data, gathering it together, and developing tools to express and analyse it.
‒ Prof Nathan Bindoff, TPAC Director and Professor of Physical Oceanography
Innovative and ambitious
RDSI was an innovative and ambitious project. It seems like it ought to be straightforward, but it’s actually very hard to present datasets in a meaningful way to the world. It’s not a simple problem. RDSI has highlighted what’s possible.
‒ Jim McGovern, Infrastructure Advisory Panel
Building technical skills across the sector
One of the things the project has been able to do is to fund technical and data specialists at the Nodes. Many of the techniques and skills for storing, moving, and accessing large sets of data were new to the project and to the Nodes. Being able to invest in building up a community of people in the sector with these skills and capabilities has been one of the important contributions of RDSI.
‒ Viviani Paz, RDSI Project Manager
Moving data
Chapter V
The need for a fast network
One of the significant technical challenges to consider was moving data. We experienced this early in the project as the ARCS Data Fabric drew to a close and people were trying to move the Integrated Marine Observing System data from Perth to Hobart and Brisbane. The data was going at the speed of congealed porridge running down a hill. It was clear that the network capabilities (not just fast networks, but the techniques, the tools, the software for using them) weren’t in place. So the Project has solved a very serious challenge. Now that AARNet has put in the data transfer network, they are certifying that they’re getting 95 percent use of the capacity of a 10-gigabit link. That is enormous compared with what was there. It’s not just the bandwidth that’s changed.
‒ Dr Nick Tate, RDSI Project Director
The challenge in moving data
Without a doubt, the biggest challenge you have when moving data is that a lot of these datasets are built upon many files that are inherently small in nature, and they’re quite difficult to move over large distances. Even though you might have a lot of bandwidth available to you, you can find that 10 terabytes of data can take weeks if not months to transfer.
‒ Brendan Davey, Deputy Director, TPAC
Network enhancements, AARNet and NRN
Our goal was to ensure we could get multiple 10 gigabits of capacity into the RDSI sites. We were looking to get every Node connected to the AARNet backbone with capacity significantly over and above what a big university would have, so they could provide services to a community. And to do that we needed to ensure there were redundant fibre paths, and the appropriate network infrastructure on those fibres to deliver that capacity.
‒ Peter Elford, Network Program Manager (2013), RDSI Project
Connecting Nodes to Nodes, and Nodes to Researchers
The Data Sharing Network (DaShNet) is a reliable high-speed network service built over the new AARNet4 backbone network. It connects RDSI-funded Nodes to each other and to researchers around Australia. It can ultimately support up to 100 gigabits per second, significantly increasing data transfer rates across the country.
Accessing data
Chapter VI
Making it easy to find and access data
Mediaflux will enable the research community to have useful data management tools that are consistent across Australia, so that people will be interacting with research data in the same way, irrespective of where they’re located.
‒ Dr Frankie Stevens, Research Data Manager, RDSI Project
Mediaflux and the Nodes
Identity and access management
In Australia we already had the Australian Access Federation, a trust fabric for identity management. But one of the things that became very plain early on is that although it handles web access for everyone, it can’t yet do access via other methodologies. This is where the project got into territory where we were cutting new ground.
‒ Richard Northam, Node Development Manager, RDSI Project
Protecting data
Chapter VII
Securing the data
One of the toughest parts of looking at the security models for this project was to understand how the Nodes would collaborate and share responsibility for the data and data transfers. How they would manage the relationships with researchers to give them a level of comfort that the integrity of their data is being maintained. When your research data has been under your direct control and then it goes outside your own perimeter, there’s a concern that you don’t know what’s happening with it. So for me, it’s been a management job of perception more than anything else.
‒ Mark McPherson, RDSI Security Policy Manager
Will my data be safe?
One of our biggest challenges in getting people on board is to assure them that their data is going to be safe, it’s going to be secure, that they’ll have access to it, and their partners who use the data will have access to it.
‒ Brendan Davey, Deputy Director, TPAC
Removing the obstacles
We were initially concerned about whether researchers would adopt a central data storage facility at all. We drew up a list of 10 obstacles, and you could be pretty sure that if you started talking to a researcher about putting their data on RDSI, they’d start going through these 10 obstacles one by one without you prompting them. ‘I can’t touch it anymore, it’s insecure, you’re only here until the end of 2014, I might have to pay for it,’ and so on. One by one we’ve been able to make these obstacles insignificant.
‒ Rob Cook, CEO, QCIF
Working with data
Chapter VIII
Connecting with compute facilities
The Raijin supercomputer at NCI
“Data is a vital enabler of research. Big data can only be handled in a rich computational environment. It’s not the data alone. It’s not the compute alone. It’s the confluence of the two. People advance their research by being able to have well-managed, integrated collections of data where they can explore new ideas by having a confluence of different datasets available to them.”
‒ Prof Lindsay Botten, Director, National Computational Infrastructure
Services researchers need
RDSI has allowed us a platform to develop the services that researchers actually need. It has completely revolutionised not only our thinking but our outlook and our future.
‒ Mary Hobson, CEO, eResearch SA
Selecting solutions
Chapter IX
Vendor Panel
The Vendor Panel programme, implemented in partnership with the Council of Australian University Directors of Information Technology (CAUDIT), was created to facilitate the procurement of storage-related infrastructure, software and services. The purpose of the programme was twofold: firstly, to allow Nodes, universities and other authorised users to avoid lengthy tendering processes by using an appropriately constructed panel; and secondly, to support volume pricing across Nodes and the wider Higher Education and Research Sector.
An open mind towards solutions
In many of the research infrastructure projects we’ve seen before, there has been a focus on adopting only open source solutions or solutions developed by other researchers. We took the view that we should go with a completely open mind to the process. You can use commercial or open source or other not-for-profit infrastructure software. It doesn’t matter. The important thing is to pick the best that’s available at the time and to make sure it’s affordable, and that’s why we’ve been negotiating collectively. For example, the software we’re using in our data transfer network is a commercial product and by negotiating effectively, we’ve been able to acquire it at a price that allows the sector to do things which have not been possible in the past. With the Vendor Panel transferring to CAUDIT over the coming months, this is a legacy for the sector as a whole.
‒ Dr Nick Tate, RDSI Project Director
Vendors on the Panel
Twenty-two and counting
Last updated 14/12/2014
Saving Money for the Sector
CIOs have challenges in navigating procurement for IT services, testing the market, keeping up with suppliers. The Vendor Panel simplified storage options for the whole sector and was a catalyst for CIOs to open a dialogue with the research community. It took data storage down from being too complicated, too hard, to being a commodity product you can buy and use as you need. Probably we won’t realise the benefits for a few years, but it saved the sector an enormous amount of money.
‒ Peter Nikoletatos, RDSI Project Board
A rich set of proposals
I appreciated the opportunity to work alongside my colleagues in evaluating a rich set of proposals from a diverse group of respondents. The experience contributed to the expansion of my knowledge around the complex nature of research support structures. The professional and pragmatic driving force within RDSI brought great satisfaction in knowing that the effort would in time be a major enabler of Research within Australia.
‒ Rick Van Haeften, Infrastructure Advisory Panel and Independent Vendor Panel Evaluation Committee
Towards public cloud
This is potentially the last generation of serious storage the Nodes will own. The project has looked extensively at how public cloud could complement in-house storage and compute, and we’ve established agreements with Amazon Web Services to eliminate costs in moving research data in and out of the public cloud, to enable the Nodes to make some informed decisions about that in the future.
‒ Paul Campbell, Vendor Engagement Specialist, RDSI Project
A data storage ecosystem
When it comes to large-scale infrastructure, organisations for the foreseeable future will use a hybrid solution. They will have some capability internally, some from private cloud providers such as Intersect, and some from public cloud infrastructure. We are part of an ecosystem that allows the data to flow around these different parties which collectively provide an ongoing solution to data storage.
‒ Dr Ian Gibson, CEO, Intersect Australia
Case Studies
Chapter X
From July 2013, the RDSI Project began collecting use cases on how research groups across Australia are using collections stored at RDSI Nodes, and why RDSI-funded storage is important to their research. From high energy physics to the humanities, from climate to cancer research, researchers are discovering common needs around research data. They all need to preserve and store their data, access and share it with collaborators, bring disparate collections together to be analysed by common tools, and in many cases, reuse data that was collected by someone else or for a different purpose.
A major new human genome collection
The challenge:
The Garvan Institute of Medical Research is sequencing over 4000 healthy human genomes to create a Medical Genomics Reference Bank for researchers around the world.
How RDSI is helping:
The sequencing will generate 4.5 petabytes of data over the next 3 years. Storage through RDSI helps reduce costs to researchers and allows the data to be moved easily among Nodes for analysis.
The outcome:
Australian researchers are positioned to take a leading role in emerging genomics research through access to cost-effective genome sequencing and genomic collections of international importance.
Image courtesy of P. Morris, Garvan Institute
“We see this as providing Australia with a seat at the table and an opportunity to be amongst the world leaders in an area that’s emerging so rapidly.”
– A/Prof Marcel Dinger
Head of Clinical Genomics and Genome Informatics, Garvan Institute
Access to what was once inaccessible
The challenge:
Award-winning natural history cinematographer and marine scientist Richard Fitzpatrick has 20 years’ worth of film footage of the complex behaviours of ocean and terrestrial creatures.
How RDSI is helping:
Through RDSI storage, Richard is now able for the first time to make this collection of over 1 petabyte of data accessible and searchable by researchers everywhere.
The outcome:
The footage is being used by the Queensland Government to track turtle hatchling success rates at Raine Island and by JCU to study the ecology and biology of venomous box jellyfish.
“The fact that it’s now searchable is just huge. There are 5000 hours of footage, and now you can go in and chase stuff yourself. That in itself is monumental.”
– A/Prof Jamie Seymour, Director, Tropical Australian Venom Research Unit
Opening the door to use large datasets in a HPCenvironment
The challenge:
Geoscience Australia is bringing together 30
years of earth observation satellite images
into a Data Cube that creates a geographic
time machine, allowing scientists to apply
the data to big problems such as managing
flood and fire risk.
How RDSI is helping:
RDSI storage makes available data that was
previously locked within agencies. RDSI
storage within the NCI computational
environment opens the door for using HPC
to work with these large datasets.
The outcome:
The National Flood Risk Information Project
(NFRIP) is using the Data Cube to create a
portal showing areas of land where surface
water has been observed from satellites in
the past, to raise community awareness of
flood risks.
“We now have hundreds of terabytes of satellite data covering all of Australia going back 30 years, and we wanted to begin applying it to big problems.”
‒ Dr Adam Lewis
National Earth & Marine Observations Group, Geoscience Australia
71|
Preserving collections
The challenge:
Collections containing real examples of the
use of speech, language, and music were
stored in locations disparate from one
another. Accessibility was difficult, and
some collections were at risk of being lost.
How RDSI is helping:
A growing number of these collections have
been brought together by the human
communication science community, stored
through RDSI, and are now accessible
through the Alveo virtual laboratory, funded
by NeCTAR.
The outcome:
These collections are being preserved and
used in new ways by linguists, psychologists,
musicologists, and computational scientists.
72|
End-to-end research data management
The challenge:
The Australian Synchrotron needed to
protect, store, provide access to, and allow
researchers to share, reuse, and validate
data from Synchrotron beamline
experiments.
How RDSI is helping:
Technical staff from the Monash eResearch
Centre and VicNode worked with the
Synchrotron to develop a solution, which
uses RDSI storage to store and provide
access to the data. It also uses the NeCTAR
Research Cloud and DOIs from ANDS.
The outcome:
Store.Synchrotron is the first persistent,
open data store in the world for a
synchrotron. Thousands of datasets have
been stored in the permanent, accessible
archive.
“Ours is going to be the only system in the world where all of the primary data from the beamlines, every frame, will go into the store. And it will be there. This is an absolute world first.”
‒ Dr Tom Caradoc-Davies
Principal Scientist for Macromolecular Crystallography, Australian Synchrotron
73|
Data reuse and new collaborations
The challenge:
Dr Jeremy VanDerWal at JCU creates
models to study how climate change will
affect bird and animal populations. The
models were previously behind university
firewalls. Providing access to others was
difficult.
How RDSI is helping:
RDSI storage integrated with the NeCTAR
Research Cloud allows these models to be
available and easily run by other
researchers. As a result, use of the models
by other groups is growing rapidly.
The outcome:
A group of Natural Resource Management
Regions (NRMs) has found them so
beneficial they are funding Dr VanDerWal’s
group to include freshwater species
information.
“Previously my thinking was limited by the small amount of storage and computing that was available to me. I always had to summarise down and minimise the data. I don’t have to do that now. I don’t have to worry about the live disk limitation or the compute resources. Now I can keep doing the research as I’d like to see it done.”
‒ Dr Jeremy VanDerWal
Centre for Tropical Biodiversity and Climate Change, James Cook University
Looking back and looking ahead
Chapter XI
75|
Project Success
The RDSI Project set out to transform the way in which research data in Australia is stored and made available to its potential users. By any measure, the project has been successful in achieving this.
When the contract for the project was signed between the University of Queensland and the Commonwealth Government on Christmas Eve 2010, it was estimated that about 5 petabytes of research data were stored throughout the sector, much of it inaccessible to most researchers. By the end of the project in 2015, it is expected that over 55 petabytes of data will be available in over 70 petabytes of storage. Even more importantly, this data will be stored in facilities that are able to make it collaboratively available to researchers.
‒ Dr Nick Tate RDSI Project Director
76|
Are researchers finding it valuable?
Researchers are voting with their data. They’re bringing it to the Nodes, they’re putting it on. It’s a great leap forward.
‒ Peter Elford Director, Government Relations, AARNet
77|
Cultural change
In addition to putting the tin on the ground to store data, a key success of the RDSI Project has been to facilitate cultural change around collaboration and sharing.
‒ Brian Anker Chair, RDSI Project Board
78|
Supporting research activities
Coming from an IT background, I learned so much from working on this project about supporting research activities in the next decade. It’s not just about compute and store, it’s about collaboration. It’s about connecting, about access, about identity. It’s about protecting the work. It’s about curation of data. It’s about making it available for groups, not just for now but in the future. It’s just so big, it takes a while to get your head around it.
‒ Peter Nikoletatos
RDSI Project Board
79|
Fingers into the future
With a lot of projects there is a start, a finish, and you move onto the next thing. This thing really has fingers into the future in being able to act as a building block for other initiatives to build on.
‒ Brian Anker
Chair, RDSI Project Board
80|
Data access
Our views of data have changed as RDSI has evolved. When the project started, everyone was talking about data storage. But as you start storing the data, you realise the real problem is access. In the early days, the access mechanisms available were extremely primitive. Now we have the data stored, we have the Mediaflux tool to make data easy to find and access, we have Aspera for moving large quantities of data. These weren’t even really imagined when we started the RDSI project. As we go forward, people will begin to realise that the real value that’s been delivered by the system is the organisation of the data into a way that people can find it, access it, and manage it. That’s a big change that RDSI has made.
‒ Rob Cook
CEO, QCIF
81|
Thinking outside the box
A lot of people say to me, ‘This is great. I can now go to one location, I can have access to this dataset and to this other dataset right alongside, whereas before I’d have to go to multiple locations to get all the data I needed.’ And what I’m hearing in the wider research community is that people are starting to think outside the box. They can now suddenly combine two datasets from different disciplines together and potentially do new science. So it’s quite exciting.
‒ Brendan Davey Deputy Director, TPAC
82|
Focus on research
Once you’ve solved the problem of knowing how to move data around and knowing where to put it, you can start to focus on other things. And that’s really where RDSI will continue to change research in Victoria. You won’t need to focus on where to put the data and whether or not it’s a good idea to put it there. You can move on. As an operator, the best thing is when you do get people to use these services and they never, ever call you again. Because it means the service is humming away.
‒ Dr Steven Manos
Manager Research Services in ITS, The University of Melbourne
83|
The value of data
Researchers are beginning to realise the value of their data. A professor in pathology from The University of Melbourne was one of our first major consumers of data storage. We had gone in to speak to her. Their archiving solution was a set of hard drives on a shelf, and she wanted advice on a better solution. She said, ‘Well, I pay $70,000 a year in liquid nitrogen to preserve my physical tissue samples. Why wouldn’t I pay the equivalent to look after my digital assets?’ For her, the value proposition was obvious.
‒ Dr Steven Manos
Manager Research Services in ITS, The University of Melbourne
84|
A million-fold increase in scale
In 1996 I became the Chairman of the World Ocean Circulation Experiment Data Products Committee, which was all about assembling the data from this billion dollar international project. I had 12 organisations working for me and in those 12, probably 35 effective full-time staff delivering up data on a regular basis. To give you a sense of the scale, the final product in 2002 from this billion dollar experiment fit onto a single DVD. It was just 4.7 gigabytes, but we won accolades because that dataset, huge at the time, was delivered across the Internet. Now in 2014, we have over 30 petabytes of data approved for ingestion into the RDSI Nodes across Australia. So that’s more than a million-fold increase in data, with fewer people involved, and with the additional challenge that the datasets come from a much more diverse research community.
‒ Prof Nathan Bindoff
Director of TPAC and Professor of Physical Oceanography
85|
Putting in a chair lift
Sharing research data for everyone’s benefit involves taking a risk. You have to climb a hill. With the RDSI investment, the government has put in a chair lift to help the research sector get up the hill to see the benefit on the other side.
‒ Prof Liz Sonenberg
RDSI Project Board
86|
Enhanced research capacity
The success of RDSI is unparalleled and its legacy is a substantially enhanced research capacity. To keep Australia competitive in international research we need more assured funding to support not just the large capital investment needed to continually build our computational capacity and data storage but to fund the people to ensure these investments are worthwhile and that they serve the needs of our cutting edge researchers.
‒ Prof Doug McEachern
RDSI Project Board
87|
Moving out of the data ice age
An interesting lesson that wasn’t clear to me when we started is that we’re really at a very early stage of maturity in dealing with data. We didn’t realise that we were in the ice age; it was all frozen. You can get a real sense of excitement from recognising what people will be able to do with data when it becomes so easy to use. And it will, you know.
‒ Rob Cook
CEO, QCIF
88|
Evolution
I think the RDSI Project has taken us through a significant learning curve. Data is a tricky thing. It’s so multi-dimensional, and it has so many owners. Working with the multiplicity of interests is quite challenging. And so the place we’ve ended up is by evolution. You would not have been able to write it down on a piece of paper on Day One.
‒ Prof Lindsay Botten
Director, National Computational Infrastructure
89|
The pathway is real
It’s important to understand that it’s not just about where we have ended up. It’s about the pathway. The pathway is real.
‒ Dr Rhys Francis RDSI Project Board
90|
Passing the Baton
It has been quite a journey over the past 4 years as together we have created this extraordinary infrastructure for the research sector. The project has tackled a rich tapestry of issues and challenges, but with the help of all our stakeholders we now have a result to be proud of.
Researchers have made significant gains in the way they interact with their data by being able to concentrate on their research rather than worrying about the volume of data they are producing or the mechanisms to store that data.
We now pass the baton for continued development and support of this national infrastructure to the RDSI Node Operators, who will lead the next step in its evolution. We wish them luck and long-lasting sustainability.
‒ Dr Nick Tate RDSI Project Director
The team
Chapter XII
92|
Project Office Communications
Project Director
Dr Nick Tate
+61 7 3365 2019 | +61 412 674 010
Communications Officer
Asher Vennell
+61 408 517 376
Project Manager
Viviani Paz
+61 7 3365 2033 | +61 402 280 257
Storyteller
Patricia McMillan
+61 434 602 050
Office Manager
Toni Walkinshaw
+61 7 3365 2030 | +61 419 477 490
Solutions Specialist
Loretta Davis
+61 407 370 474
93|
Data Sharing (DaSh)
Vendor Engagement Specialist
Paul Campbell
+61 7 3878 2666 | +61 402 002 266
Security Policy Manager
Mark McPherson
+61 418 425 872
Node Development (NoDe)
NoDe Manager
Richard Northam
+61 417 044 625
Research Data Services (ReDS)
ReDS Programme Manager
Peter Hicks
+61 401 103 640
Research Data Manager
Dr Markus Buchhorn
+61 417 281 429
Research Data Manager
Dr Frankie Stevens
+61 435 657 730
94|
Interacting with the community
I’ve really enjoyed the breadth of interaction we’ve been able to have as a project team across the stakeholders – the research communities, the universities and science agencies, the Nodes. It’s been wonderful to be able to talk to all of those stakeholder groups. One of the things that’s been most interesting for me has been to see the different approaches those groups bring to data – how research data management might be viewed at the institutional level, versus the state level, versus a person working in a laboratory.
‒ Dr Frankie Stevens
Research Data Manager, RDSI Project
95|
From principle to practice
When I first started with RDSI, communications were focused on the overall brand awareness of the project. As the Nodes’ capabilities improved and they became operational, there was a fundamental shift towards their accomplishments. The metamorphosis from principle to practice has seen collective available data go from zero to over 16 petabytes.
‒ Asher Vennell Senior Communications Officer, RDSI Project
96|
Exceeding expectations
Having worked in distributed research environments, I had an understanding of research data needs. As I was welcomed into the project team it became apparent that RDSI was not only meeting data storage requirements, but exceeding all researcher expectations.
‒ Toni Walkinshaw Office Manager, RDSI Project
97|
From principle to practice
The early stages of a major project like this are hard work, and you don’t get to see the real value until later when people begin to use it. I had the opportunity towards the end of the project to talk with researchers who were adding data or using data via the Nodes. Their response to this new capability was overwhelming. They described how it was enabling new collaborations, giving them access to data that had previously been locked away, and fuelling new research they would not have been able to do without it.
‒ Patricia McMillan Storyteller, RDSI Project
98|
Nodes
Board Members
Brian Anker
Independent Chair
Dr Rhys Francis
Director - eResearch Futures P/L
Professor Doug McEachern
Former Pro Vice Chancellor Research and Innovation - The University of Western Australia
Professor Anton Middelberg
Deputy Vice Chancellor (Research) - The University of Queensland
Former Board Members
Peter Nikoletatos
Executive Director and Chief Information Officer - La Trobe University
John Shipp
Vice-President - Australian Library and Information Association
Professor Liz Sonenberg
Pro Vice-Chancellor (Research Collaboration) - The University of Melbourne
Professor Max Lu
Provost and Senior Vice-President - The University of Queensland
Professor Jill Trewhella
Deputy Vice-Chancellor (Research) - The University of Sydney
The end