Services for Sensitive Data and eBiobanks at University of Oslo · 2015-05-06
TRANSCRIPT
![Page 1: Services for Sensitive Data and eBiobanks at University of Oslo · 2015-05-06 · Services for Sensitive Data and eBiobanks at University of Oslo Gard Thomassen, PhD Head of Research](https://reader033.vdocuments.mx/reader033/viewer/2022060405/5f0f0f487e708231d4424a20/html5/thumbnails/1.jpg)
Services for Sensitive Data and eBiobanks at University of Oslo
Gard Thomassen, PhD
Head of Research Support Services Group Leader of the TSD project
University Center for Information Technology (USIT)
University of Oslo
Outline
• TSD promo :)
• What is sensitive data
• Laws and regulations
• TSD overview
• TSD nice-to-know
• TSD services
• TSD opportunities and
• Risk
Gard Thomassen,TSD 2.0
Computerworld 16/5-14
Norsk KreftGenom Konsortium
"Compared with the hardware we used until the move to TSD, which can probably be described as a moderately usable server with 64 cores, TSD gives us a theoretical speed-up of 30x. On top of this, we have optimized our analysis pipeline by parallelizing several steps. Previously, a sequencing analysis of 48 tumour/normal pairs would have taken at least two to three months to run. This week we ran the same analysis on TSD in two days and a few hours. In other words, a dramatic improvement, to put it mildly."
Prof. Eivind Hovig, NCGC
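As a rough check on the figures in the quote, the observed speed-up can be estimated from the two run times. The sketch below assumes 30-day months and reads "two days and a few hours" as anywhere between 48 and 60 hours; both are illustrative assumptions, not figures from the slide.

```python
# Rough speed-up estimate from the run times quoted above.
# Assumptions (not from the slide): a month is 30 days, and
# "a few hours" means somewhere between 0 and 12 hours.
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30


def speedup(old_hours: float, new_hours: float) -> float:
    """Return how many times faster the new run is than the old one."""
    return old_hours / new_hours


old_low = 2 * DAYS_PER_MONTH * HOURS_PER_DAY   # two months, in hours
old_high = 3 * DAYS_PER_MONTH * HOURS_PER_DAY  # three months, in hours
new_fast = 2 * HOURS_PER_DAY                   # exactly two days
new_slow = 2 * HOURS_PER_DAY + 12              # two days and twelve hours

lowest = speedup(old_low, new_slow)
highest = speedup(old_high, new_fast)
print(f"Observed speed-up: roughly {lowest:.0f}x to {highest:.0f}x")
```

Under these assumptions the observed range brackets the theoretical 30x mentioned in the quote.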
Teknisk ukeblad & e24, 5/5-14
Uniforum
What is sensitive data?
• Personal Data Act §2, point 8: race/ethnic data, political opinion, philosophical and religious beliefs, the fact that a person has been suspected of, charged with, indicted for or convicted of a criminal act, health, sex life and trade-union membership
• Biotechnology Act
• Health Registry Act
• And so on
System requirements
• Security, isolation and access control as given by law
• Large storage capacity
• Multi-tenant (multiple users)
• High performance computing (HPC) resource
• High bandwidth
• Easy to maintain and operate
• Easy to use and “practical” (also for audio and video)
• Some freedom within confined user space
• Accessible from anywhere through proper mechanisms
• A variety of software and public data sources must be available
• Windows and Linux support (server/host-side)
• Data collection services
• Data sharing services
Tough requirements, tough project
Services for Sensitive Data – TSD (Norwegian: Tjenester for Sensitive Data)
Started initial work with a pilot in 2009
Full-fledged services in production spring 2014
System outline
[Diagram. Components: Internet, Gateway, VM-server, HPC (Colossus), Storage. A secure encrypted network connects TSD to special high-volume data production sites. Cardinality labels: 1 (project), 1 (storage area), n, 1.]
Using TSD
[Diagram. Labels: User1 Study1, User2 Study1, GW, VM U1 S1, VM U2 S1, S1 TSD disk, TSD S1 DB, Colossus front end, Colossus, Colossus disk.]
Data import and export using TSD
[Diagram. Labels: “Sluice-server”, virtual “sluice-server”, virtual project-server, “Sluice HD”, Project HD, NFS mount, TSD; steps numbered 1 to 4.]
Data copied here by ssh+scp or web-drive (2-factor authentication); data is encrypted if sensitive
Data collection using TSD
[Diagram. Labels: “Nettskjema-minID” (Nettskjema home page), minID, Encrypted XML (PGP), Import mechanism, Project disk, Project VM, TSD.]
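Once the PGP layer has been removed, an import mechanism has to turn each XML submission into something a project can work with. The following is a minimal sketch of that last step only, assuming a hypothetical submission layout: the element and attribute names (`submission`, `answer`, `question`) are invented for illustration, since the slides do not show the actual Nettskjema format. PGP decryption is deliberately left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical, already-decrypted submission. The element and attribute
# names are illustrative assumptions, not the real Nettskjema schema.
DECRYPTED_XML = """
<submission form="demo-form" id="42">
  <answer question="age">37</answer>
  <answer question="smoker">no</answer>
</submission>
"""


def parse_submission(xml_text: str) -> dict:
    """Turn one decrypted XML submission into a plain dictionary."""
    root = ET.fromstring(xml_text.strip())
    answers = {
        answer.get("question"): (answer.text or "").strip()
        for answer in root.iter("answer")
    }
    return {"form": root.get("form"), "id": root.get("id"), "answers": answers}


record = parse_submission(DECRYPTED_XML)
print(record)
```

A flat dictionary per submission is easy to write to a project disk as CSV or load into a database; a real importer would also need to validate the input against the form definition.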
What TSD offers at present
• Secure storage
• Secure data analysis
• Linux or Windows hosts
• Secure import and export
• Web-based data harvesting
• HPC cluster
• Postgres DBs
HPC resource – Colossus
• At present about 1500 cores (~30 TFLOPS)
• No project users are to log in on any nodes
• One global job daemon to control data integrity (to ensure project data separation)
• $SCRATCH will be on a per-project basis and cleaned after each job finishes
• As similar to Abel (the non-sensitive HPC resource in Oslo) as possible
• Separate disk system for parallel file system
• Huge-mem nodes and InfiniBand interconnect
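The per-project $SCRATCH rule can be pictured as a directory that exists only for the lifetime of one job. Below is a minimal sketch of that idea, assuming a hypothetical `run_job` helper and a made-up project name `p01`; it illustrates the clean-up behaviour only and says nothing about how the Colossus job daemon is actually implemented.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def project_scratch(project: str):
    """Create a scratch directory for one job and remove it afterwards."""
    with tempfile.TemporaryDirectory(prefix=f"{project}-scratch-") as path:
        yield path  # the directory and its contents are deleted on exit


def run_job(project: str) -> str:
    """Hypothetical job that writes an intermediate file to scratch."""
    with project_scratch(project) as scratch:
        intermediate = os.path.join(scratch, "intermediate.dat")
        with open(intermediate, "w") as handle:
            handle.write("partial result")
    # Returning the path lets the caller confirm that it is gone.
    return scratch


used_path = run_job("p01")
print(os.path.exists(used_path))  # False: scratch was cleaned after the job
```

Tying the directory to a context manager means the clean-up also happens when the job fails with an exception, so no intermediate data is left behind for the next job.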
Practical things to remember
• How to get onboard
• Login
• Where is my data
• What is backed up
• What needs to be encrypted
• Where can I access TSD from
• How to get HPC access
• What does it cost
• How to use Nettskjema
• Where do I send my questions
Technical details
• KVM for virtualization (Red Hat Linux)
• Cerebrum as provisioning (a USIT application)
• AD system administration guided by the provisioning system (duplicated)
• FreeBSD firewall and gateway (duplicated)
• Integration with IDporten (Norwegian governmental eID system) for www-enquiries and applications
• Storage with separation between projects (Hitachi disk system and encrypted backup to tape)
• IPv6 on the inside (… and private IPv4)
• FreeRADIUS for 2-factor auth
• Separate console server (physical)
Security details
• OATH TOTP 2-factor authentication
  – Smart phones or programmable hardware tokens
• Import/export is under strict control
• No open connection to the internet
• Strong separation between projects (VLAN)
• Hardened FreeBSD gateway and firewall
• Encrypted backup, one key per project
• Sys-admins are single users (traceability)
• Sys-admins have to use same authentication process
• Hardware is physically separated from other UiO hardware
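For readers unfamiliar with OATH TOTP, the one-time code is an HMAC over the current 30-second time step, truncated to a few digits (RFC 6238, built on RFC 4226). The sketch below uses HMAC-SHA1, a 30-second step and 6 digits, which are the common defaults and an assumption here; the slides do not state the parameters TSD uses, and the `verify` helper with its one-step tolerance window is likewise illustrative.

```python
import hashlib
import hmac
import struct
import time


def totp(secret: bytes, at=None, step: int = 30, digits: int = 6) -> str:
    """Compute an RFC 6238 TOTP code (HMAC-SHA1) for the given Unix time."""
    now = time.time() if at is None else at
    counter = int(now // step)                      # number of elapsed time steps
    message = struct.pack(">Q", counter)            # 8-byte big-endian counter
    digest = hmac.new(secret, message, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                      # dynamic truncation (RFC 4226)
    number = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(number % 10 ** digits).zfill(digits)


def verify(secret: bytes, candidate: str, at=None, window: int = 1, step: int = 30) -> bool:
    """Accept codes from the current step and `window` steps either side."""
    now = time.time() if at is None else at
    return any(
        hmac.compare_digest(totp(secret, now + drift * step, step), candidate)
        for drift in range(-window, window + 1)
    )


# Shared secret from the RFC 6238 test vectors; a real token uses a random one.
SECRET = b"12345678901234567890"
print(totp(SECRET, at=59, digits=8))  # 94287082 (RFC 6238 test vector)
print(verify(SECRET, totp(SECRET)))   # True
```

The small tolerance window compensates for clock drift between the token and the server, at the cost of a slightly longer period in which an intercepted code remains valid.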
Future of TSD - main topics
• How to handle video and sound
  – harvesting
  – management
  – metadata
  – analysis
• Journal system for Psychologists (Univ of Umeå collaboration)
• Biobanks
• VMware and VDI infrastructure (BLAST or ThinLinc for Linux, PCoIP for Windows)
• Galaxy inside TSD in full scale
• Elixir helpdesk connected to TSD
• Running Docker containers
• Hosting of user-defined VMs -> no! at least not now
Risk-analysis
• System has been discussed with Datatilsynet – no major worries
• Risk analysis has been performed by USIT and no serious issues detected as of February 2015.
• OUS and AHUS and VVHF and several others are on board as users
• We have an advisory board for all changes
• Backup has been
Main collaborators on TSD
Collaborators
• Norwegian Storage Infrastructure (NorStore)
• Norwegian Genetics Analysis Platform (GenAp)
• Norwegian Dietary Registry (Medical Faculty)
• Institute of Psychology (Faculty of Social Sciences)
• Norwegian Cancer Sequencing Consortium (NCGC)
Reference group
Oslo University Hospital, NorStore, Regional Ethical Committee, National Institute of Public Health, Norwegian Cancer Registry, Research Network at OUS, Elixir Norway, NCGC, GenAP, Institute of Psychology
Capabilities enabled by TSD
• Large scale NGS research on human genomes
• Large scale medical imaging studies
• Large scale population studies with web-based data collection
• Off-site analysis of sensitive data
• Secure storage for verification of published research
• eBiobank hosting
• Electronic consent
Nordic collaboration opportunities
• Laws are fairly similar (Norway very strict)
• Difficult to exchange sensitive data for research
• We should learn from each other, as these systems demand very special IT knowledge
• Services development and system-administration know-how is non-sensitive and may be shared
• Building TSD raised many novel security questions in a university setting that others can learn from
• Large DBs/registries of health data may enable very interesting research in the future
• TSD is involved in the NeIC-based Tryggve project
• We are happy to collaborate!
People involved
• tsd-core@usit
• virt-core@usit
• storage-core@usit
• postgres-core@usit
• network-core@usit
• hpc-core@usit
• windows-core@usit
• unix-core@usit
• IT-security@usit
Project group / developers
• IT-dir Lars Oftedal
• Hans A. Eide
• Märtha Felton
Administration / associated