services for sensitive data and ebiobanks at university of oslo · 2015-05-06 · services for...


Services for Sensitive Data and eBiobanks at University of Oslo
Gard Thomassen, PhD
Head of Research Support Services
Group Leader of the TSD project
University Center for Information Technology (USIT)
University of Oslo

Outline
•  TSD promo
•  What is sensitive data
•  Laws and regulations
•  TSD overview
•  TSD nice-to-know
•  TSD services
•  TSD opportunities and
•  Risk

Gard Thomassen,TSD 2.0

Computerworld 16/5-14

Norwegian Cancer Genomics Consortium (Norsk KreftGenom Konsortium)
"Compared with the hardware we used until the move to TSD, which can probably be characterized as a moderately usable server machine with 64 cores, we can achieve a theoretical speed-up of 30x with TSD. On top of this, we have optimized our analysis pipeline by parallelizing several steps. Previously, a sequencing analysis of 48 tumor/normal pairs would have taken at least two to three months to run. This week we ran the same analysis on TSD in two days and a few hours. In other words, to put it mildly, a dramatic improvement."
Prof. Eivind Hovig, NCGC
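As a rough check on the figures in the quote, the observed speed-up can be estimated from the stated run times. The sketch below assumes a 30-day month and reads "two days and a few hours" as 2.25 days; both values are illustrative assumptions, not figures from the talk.

```python
# Rough estimate of the speed-up implied by the quoted run times.
DAYS_PER_MONTH = 30    # assumption: average month length
TSD_RUN_DAYS = 2.25    # assumption: "two days and a few hours"


def speedup(old_months: float, new_days: float = TSD_RUN_DAYS) -> float:
    """Ratio of the old run time (in months) to the new run time (in days)."""
    return old_months * DAYS_PER_MONTH / new_days


low = speedup(2)    # previous run time of two months
high = speedup(3)   # previous run time of three months
print(f"{low:.0f}x to {high:.0f}x")  # prints "27x to 40x"
```

Under these assumptions the observed improvement falls in the same range as the theoretical 30x mentioned in the quote.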

Teknisk ukeblad & e24, 5/5-14

Uniforum

What is sensitive data?

•  Personal Data Act §2, point 8
   –  race/ethnic data, political opinion, philosophical and religious beliefs, the fact that a person has been suspected of, charged with, indicted for or convicted of a criminal act, health, sex life and trade-union membership
•  Biotechnology Act
•  Health Registry Act
•  And so on


System requirements
•  Security, isolation and access control as given by law
•  Large storage capacity
•  Multi tenant (multiple users)
•  High performance computing (HPC) resource
•  High bandwidth
•  Easy to maintain and operate
•  Easy to use and “practical” (also for audio and video)
•  Some freedom within confined user space
•  Accessible from anywhere through proper mechanisms
•  A variety of software and public data-sources must be available
•  Windows and Linux support (server/host-side)
•  Data collection services
•  Data sharing services


Tough requirements, tough project

Services for Sensitive Data – TSD (Norwegian: Tjenester for Sensitive Data)

Initial work started with a pilot in 2009.
Full-fledged services went into production in spring 2014.


System outline

[Diagram; labels: Internet, Gateway, VM-server, HPC - Colossus, Storage, "Secure encrypted network to special high volume data production sites". Cardinality labels: 1 (project), 1 (storage area), n, 1.]


Using TSD

[Diagram; labels: User1 Study1, User2 Study1, GW (gateway), VM U1 S1, VM U2 S1, S1, TSD disk, TSD S1 DB, Front end Colossus, Colossus, Colossus disk.]

Data import and export using TSD

[Diagram; labels: “Sluice-server”, “Sluice HD”, Virtual “sluice-server”, Virtual project-server, Project HD, NFS mount, TSD, numbered steps 1 to 4. Annotation: data copied here by ssh+scp or web-drive (2-factor authentication); data encrypted if sensitive.]


Data collection using TSD

[Diagram; labels: “Nettskjema-minID”, Nettskjema home page, minID, Import mechanism, Encrypted XML (PGP), TSD, Project VM, Project disk.]

What TSD offers at present

•  Secure storage
•  Secure data analysis
•  Linux or Windows hosts
•  Secure import and export
•  Web-based data harvesting
•  HPC cluster
•  Postgres DBs

HPC resource – Colossus
•  At present about 1500 cores (~30 TFLOPS)
•  No project users are to log in on any nodes
•  One global job daemon to control data integrity (to ensure project data separation)
•  $SCRATCH will be on a per project basis and cleaned after each job finishes
•  As similar to Abel (the non-sensitive HPC resource in Oslo) as possible
•  Separate disk system for parallel file-system
•  Huge-mem nodes and Infiniband interconnect
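The per-project scratch policy above can be illustrated with a minimal sketch. It is not Colossus's actual mechanism, which the slides do not describe; the function name `run_job_with_scratch`, the project name `p01`, and the use of the system temporary directory as the scratch root are all hypothetical.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Optional


def run_job_with_scratch(project: str,
                         job: Callable[[Path], object],
                         base: Optional[str] = None) -> object:
    """Run `job` in a private scratch directory that is removed afterwards."""
    # Hypothetical layout: one scratch root per project.
    project_root = Path(base or tempfile.gettempdir()) / project
    project_root.mkdir(parents=True, exist_ok=True)
    # mkdtemp creates a directory readable only by the creating user.
    scratch = Path(tempfile.mkdtemp(prefix="job-", dir=project_root))
    try:
        return job(scratch)
    finally:
        # Clean up even if the job raised, so no data outlives the job.
        shutil.rmtree(scratch, ignore_errors=True)


seen = []


def demo_job(scratch: Path) -> str:
    """Toy job that writes an intermediate file and returns a result."""
    seen.append(scratch)
    tmp = scratch / "intermediate.txt"
    tmp.write_text("partial result")
    return tmp.read_text().upper()


result = run_job_with_scratch("p01", demo_job)
```

Putting the cleanup in a `finally` clause means intermediate files are removed whether the job succeeds or fails, which is the property the slide asks for.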


Practical things to remember

•  How to get onboard
•  Login
•  Where is my data
•  What is backed up
•  What needs to be encrypted
•  Where can I access TSD from
•  How to get HPC access
•  What does it cost
•  How to use Nettskjema
•  Where do I send my questions:
   –  tsd-contact@usit.uio.no
   –  tsd-drift@usit.uio.no

Technical details
•  KVM for virtualization (RedHat Linux)
•  Cerebrum as provisioning (a USIT application)
•  AD system administration guided by the provisioning system (duplicated)
•  FreeBSD firewall and gateway (duplicated)
•  Integration with IDporten (Norwegian governmental eID system) for www-enquiries and applications
•  Storage with separation between projects (Hitachi disk system and encrypted backup to tape)
•  IPv6 on the inside (… and private IPv4)
•  FreeRADIUS for 2-factor auth
•  Separate console server (physical)


Security details

•  OATH TOTP 2-factor authentication
   –  Smart phones or programmable hardware tokens
•  Import/export is under strict control
•  No open connection to the internet
•  Strong separation between projects (VLAN)
•  Hardened FreeBSD gateway and firewall
•  Encrypted backup, one key per project
•  Sys-admins are single users (traceability)
•  Sys-admins have to use same authentication process
•  Hardware is physically separated from other UiO hardware
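For readers unfamiliar with OATH TOTP, the sketch below shows how a time-based one-time code is derived and checked according to RFC 6238 (HMAC-SHA1, 30-second steps). It illustrates the standard algorithm only; how TSD generates, stores, and verifies secrets is not described in the slides, and the function names and the `window` parameter are assumptions.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret: bytes, at: Optional[float] = None,
         step: int = 30, digits: int = 6) -> str:
    """Compute an RFC 6238 TOTP code (HMAC-SHA1) for the given time."""
    now = time.time() if at is None else at
    counter = int(now) // step                      # number of elapsed time steps
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F                         # dynamic truncation (RFC 4226)
    value = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


def verify(secret: bytes, candidate: str, at: Optional[float] = None,
           window: int = 1, step: int = 30) -> bool:
    """Accept codes from the current step and `window` steps either side."""
    now = time.time() if at is None else at
    return any(
        hmac.compare_digest(totp(secret, now + i * step, step), candidate)
        for i in range(-window, window + 1)
    )


# RFC 6238 test secret; a real deployment would use a random per-user secret.
secret = b"12345678901234567890"
code = totp(secret, at=1000)
accepted = verify(secret, code, at=1030)   # one step later, still inside the window
```

The small acceptance window tolerates clock drift between the token (a phone or hardware token) and the server, at the cost of a slightly longer validity period per code.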


Future of TSD - main topics
•  How to handle video and sound
   –  harvesting
   –  management
   –  metadata
   –  analysis
•  Journal system for Psychologists (Univ of Umeå collaboration)
•  Biobanks
•  VMware and VDI infrastructure (BLAST or Thinlinc for Linux, PCoIP for Windows)
•  Galaxy inside TSD in full scale
•  Elixir helpdesk connected to TSD
•  Running Docker containers
•  Hosting of user-defined VMs -> no! at least not now

Risk analysis

•  The system has been discussed with Datatilsynet (the Norwegian Data Protection Authority): no major worries
•  Risk analysis has been performed by USIT and no serious issues detected as of February 2015
•  OUS and AHUS and VVHF and several others are on board as users
•  We have an advisory board for all changes
•  Backup has been

Main collaborators on TSD

Collaborators
•  Norwegian Storage Infrastructure (NorStore)
•  Norwegian Genetics Analysis Platform (GenAP)
•  Norwegian Dietary Registry (Medical Faculty)
•  Institute of Psychology (Faculty of Social Sciences)
•  Norwegian Cancer Genomics Consortium (NCGC)

Reference group
Oslo University Hospital, NorStore, Regional Ethical Committee, National Institute of Public Health, Norwegian Cancer Registry, Research Network at OUS, Elixir Norway, NCGC, GenAP, Institute of Psychology


Capabilities enabled by TSD

•  Large scale NGS research on human genomes
•  Large scale medical imaging studies
•  Large scale population studies with web-based data collection
•  Off-site analysis of sensitive data
•  Secure storage for verification of published research
•  eBiobank hosting
•  Electronic consent


Nordic collaboration opportunities
•  Laws are fairly similar (Norway very strict)
•  Difficult to exchange sensitive data for research
•  One should learn from each other, as these systems demand very special IT knowledge
•  Services development and system-administration know-how is non-sensitive and may be shared
•  Building TSD addressed many novel security questions in a university setting that others can learn from
•  Large DBs/registries of health data may enable very interesting research in the future
•  TSD is involved in the NeIC-based Tryggve project
•  We are happy to collaborate!


People involved

•  tsd-core@usit
•  virt-core@usit
•  storage-core@usit
•  postgres-core@usit
•  network-core@usit
•  hpc-core@usit
•  windows-core@usit
•  unix-core@usit
•  IT-security@usit

Project group / developers
•  IT-dir Lars Oftedal
•  Hans A. Eide
•  Märtha Felton

Administration / associated
