Building Virtual Research Organizations across International Borders
Internet2 Global Summit - 2015
Office of CyberInfrastructure and Computational Biology, NIAID/NIH
April 30, 2015
Chris Whalen - Int’l Program Manager OCICB
Scott Koranda - Spherical Cow Group
Office of Cyberinfrastructure and Computational Biology
▪ National Institute of Allergy and Infectious Diseases (NIAID)
• One of the 27 institutes and centers of the NIH; its mission is to conduct and support basic and applied research to better understand, treat, and ultimately prevent infectious, immunologic, and allergic diseases
▪ OCICB: IT service desk, infrastructure, software development, and computational bioscience consulting and support
▪ The OCICB International Program began in 1998 to support NIAID scientific collaborations in malaria research in West Africa
Overview: Virtual Research Organizations
1. The core services of a network supporting scientists
2. Importance of collaborative networks in the life sciences
3. Increasing and changing data types
4. The International Centers for Excellence in Research (ICER)
5. Changing network and support topologies for international science
6. Using trust federations for service delivery
7. The Virtual Research Organization
8. Using COmanage as a tool to support VROs
Core network services for life science (and any organization)
• User authentication, verification, and identity
• Email
• File sharing
• Voice over IP
• Video conferencing
• Data transfer
• Patch management
• Anti-malware management
• Intrusion detection
• Firewall
• Access to services: journals, databases, workflows
The traditional local and wide area network (LAN/WAN)
Collaborative Networks are fundamental to Science
▪ “The days of the lone scientist, immersed in their laboratory, locked in their disciplinary silo, narrowly focused on basic research problems is rapidly becoming a thing of the past. In their place, we see the emergence of a new breed of “Team Science”; where large, cross-disciplinary teams focus on complex, applied and translational problems.” Simon Williams
• http://blogs.nature.com/soapboxscience/2013/03/15/team-science-the-science-of-collaborative-research
▪ Technology needs to adapt to support these changes
Collaboration Example
Collaborative networks
▪ Medical Research Council (MRC) Centre for Genomics and Global Health, University of Oxford, Oxford, UK.
▪ Mahidol-Oxford Tropical Medicine Research Unit, Mahidol University, Bangkok, Thailand.
▪ Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK. Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK.
▪ National Institute of Allergy and Infectious Diseases, US National Institutes of Health, Bethesda, Maryland, USA.
▪ National Center for Parasitology, Entomology and Malaria Control, Phnom Penh, Cambodia.
▪ Department of Immunology and Medicine, US Army Medical Component, Armed Forces Research Institute of Medical Sciences (USAMC-AFRIMS), Bangkok, Thailand.
▪ USAMC-AFRIMS, Phnom Penh, Cambodia.
▪ Armed Forces Health Surveillance Center, Silver Spring, Maryland, USA.
▪ Navrongo Health Research Centre, Navrongo, Ghana.
▪ Department of Molecular Tropical Medicine and Genetics, Faculty of Tropical Medicine, Mahidol University, Bangkok, Thailand.
▪ Howard Hughes Medical Institute, University of Maryland School of Medicine, Baltimore, Maryland, USA.
▪ Shoklo Malaria Research Unit, Mae Sot, Tak, Thailand.
▪ Centre for Tropical Medicine, University of Oxford, Oxford, UK.
▪ Global Malaria Programme, World Health Organization, Geneva, Switzerland.
▪ Unité d'Immunologie Moléculaire des Parasites, Institut Pasteur, Paris, France.
▪ Oxford University Clinical Research Unit, Wellcome Trust Major Overseas Programme, Ho Chi Minh City, Vietnam.
▪ MRC Laboratories, Fajara, The Gambia.
▪ London School of Hygiene and Tropical Medicine, London, UK.
▪ Malaria Research and Training Center, Faculty of Pharmacy, University of Science, Techniques and Technologies of Bamako, Bamako, Mali.
▪ Institut de Recherche en Sciences de la Santé, Direction Régionale de l'Ouest, Bobo-Dioulasso, Burkina Faso.
▪ Menzies School of Health Research, Charles Darwin University, Darwin, Northern Territory, Australia.
▪ Department of Statistics, University of Oxford, Oxford, UK.
▪ Weatherall Institute of Molecular Medicine, University of Oxford, Oxford, UK.
Globalization of Research
Sample Sequencing: Boston
High Performance Computing: Chicago
Specimen collected and logged: Bamako
Patient (volunteer) demographics and labs: Bethesda
Data Management and Validation: Kampala
Bioinformaticians: Cape Town
Investigators: Oxford, Boston, Bethesda, Chicago, Bamako
Challenges: Global Scientific Collaboration Networks
(Image by Olivier H. Beauchesne)
Challenges: Research data formats and tools
▪ Text: patient demographics, lab results, GPS, etc.
▪ Genomics
• Next-generation sequencing
▪ Medical imaging
• CT, PET/CT, HRCT
• Ultrasound
▪ Proteomics
▪ High performance computing tools
▪ Flow cytometry
▪ Audit trails
Challenges: Science-supporting tools provided by the “cloud”
▪ Asana, Redbooth, and others for project management
▪ Dropbox, Google Drive, Box, and others for data synchronization
▪ Skype, Hangouts, Jabber, WebEx, GoToMeeting, etc. for voice and video
▪ Amazon Web Services, Rackspace, and Azure as examples of infrastructure for HPC and other uses in support of science
▪ Communication and collaboration, from email and IM to social media: Facebook, YouTube, Vimeo, Twitter, etc.
Proliferation of cloud services
"In some categories, the fragmentation of cloud services impedes collaboration across teams, introduces friction and creates cost inefficiencies. In addition, employees may not fully understand the risk of cloud services before using them in the workplace."
SkyHigh Networks Q1 2015, Cloud Adoption and Risk in Government Report
International Centers for Excellence in Research
• Faculty of Medicine, University of Bamako, Mali
– Malaria and its vector
– Lassa fever (and other hemorrhagic fevers such as Ebola)
– Human Immunodeficiency Virus
– Tuberculosis
– Leishmaniasis, filariasis
• Uganda Virus Research Institute, Entebbe, Uganda
– Human Immunodeficiency Virus
– Tuberculosis
– Malaria
• National Institute for Research in Tuberculosis, Chennai, India
– Human Immunodeficiency Virus
– Tuberculosis
Office of Cyberinfrastructure and Computational Biology International Support
From satellite to terrestrial bandwidth
Traffic Aggregation Model: VPN tunnels through the Internet
Disaggregation Model: Using cloud services with distributed networks
Distributed services – the LAN…inside-out
▪ Cloud platforms offer many of these services:
• Network infrastructure management tools
• Project management
• Service desk
• Service monitoring
• Anti-malware
• Application and operating system patch management
• File sharing
• Communications
The problem with moving services from the LAN to the cloud is Identity and Groups
▪ Every cloud service has its own user database and directory:
• Network infrastructure management tools
• Project management
• Service desk
• Service monitoring
• Anti-malware
• Application and operating system patch management
• File sharing
• Communications
▪ External collaborators need an account in each application they use as well
Identity Aggregation and Grouping
▪ Many applications now provide single sign-on using Facebook, Google, Twitter, and other identity providers
• Some problems with private (commercial) IdPs include commercial ownership, less assurance of credentials, and PRIVACY
▪ Academic and research institutions have created trust federations
• InCommon
• CANARIE
• eduGAIN
The eduGAIN Trust Interfederation
Solution: Leverage trust federations
▪ Most researchers are based at institutions that are members of an academic and research trust federation, such as InCommon and eduGAIN (an inter-federation)
▪ Use federated identity for multiple services (Service Providers, SPs) to build a collaboration platform at each ICER
▪ NIH sponsored the two African ICERs into InCommon so we can make each of them an Identity Provider (IdP) to other federated institutions and service providers
Collecting identities into collaborative organizations
▪ Security Assertion Markup Language (SAML) is the technology used for federation in the academic and research world
▪ SAML uses attributes
▪ InCommon requires no particular attributes to be released
▪ International Research and Scholarship (R&S) attributes:
• Email
• Name
• eduPersonPrincipalName (ePPN)
• eduPersonScopedAffiliation (ePSA, scoped affiliation)
• eduPersonTargetedID (ePTID, targeted unique ID)
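A minimal sketch of how an SP might check whether an IdP's release covers the bundle listed above, assuming the released attributes arrive as a dictionary keyed by standard eduPerson/LDAP attribute names. The rule set (ePPN as the identifier; `displayName`, or `givenName` plus `sn`, as the name; `mail` as the email) is a simplification for illustration, not the normative R&S specification, and `missing_requirements` is a hypothetical helper.

```python
"""Check a released attribute set against a simplified R&S-style bundle.

Assumption: this rule set is an illustrative simplification, not the normative
REFEDS/InCommon R&S entity category specification. ePSA and ePTID are treated
as optional and are therefore not checked.
"""
from typing import Dict, List, Set

# Each requirement is met if ANY one of its alternative attribute sets is present.
REQUIREMENTS: Dict[str, List[Set[str]]] = {
    "identifier": [{"eduPersonPrincipalName"}],
    "name": [{"displayName"}, {"givenName", "sn"}],
    "email": [{"mail"}],
}


def missing_requirements(released: Dict[str, List[str]]) -> List[str]:
    """Return the requirements the release does not satisfy, in a fixed order."""
    # Only count attributes that were actually released with at least one value.
    present = {name for name, values in released.items() if values}
    return [
        requirement
        for requirement, alternatives in REQUIREMENTS.items()
        if not any(alt <= present for alt in alternatives)
    ]


if __name__ == "__main__":
    full_release = {
        "eduPersonPrincipalName": ["jdoe@example.edu"],
        "givenName": ["Jane"],
        "sn": ["Doe"],
        "mail": ["jdoe@example.edu"],
        "eduPersonScopedAffiliation": ["faculty@example.edu"],
    }
    opaque_release = {"eduPersonTargetedID": ["a1b2c3"]}
    print(missing_requirements(full_release))    # []
    print(missing_requirements(opaque_release))  # ['identifier', 'name', 'email']
```

An IdP that releases only an opaque targeted ID fails every required group, which is the situation the later "IdP won't release" slides describe.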
SAML Single sign-on for InCommon SPs and IdPs
Creating Virtual Research Organizations (VRO)
African Virtual Research Organization Requirements
▪ Need to use the VRO infrastructure to provide organizational grouping for COTS software solutions
• SharePoint, Aspera, and SlipStream
▪ Use for bioinformatics HPC applications (Galaxy and command line)
▪ The Mali and Uganda ICER data centers have specific challenges
• Connectivity
– Frequent ISP service interruptions
– Low bandwidth
• Basic infrastructure
– Cooling
– Power
– Emergency power
Phase 1 VRO: Default configuration includes
• SharePoint site
– Versioning (backups) and antivirus
– Document library
– Image library
– Scientific participant directory (email, IM, SIP URI, phone, etc.)
– Calendar
– Wiki
• Mailing list (Sympa)
• File replication sync using Aspera
• Audit logging
The VRO needs to provide scientists with the tools for self-management and self-provisioning
▪ Let the scientists drive the use, the membership, and the management
• Reduce IT bureaucracy
• Support the use of the collaboration infrastructure for third-party applications
▪ Aggregate identities
• Science is a highly mobile profession
• Multiple institutions
• Open questions about the identity provider of last resort (social and Google)
▪ Privacy concerns about mixing federated identities with commercial identities
VRO Identity Lifecycle
What is a Collaboration Management Platform?
● A collection of applications and identity services to support a collaboration
● Based on federated identity and/or social identity
What is COmanage?
● An effort to design and build a Collaboration Management Platform
○ Technology
○ Reference materials
○ Best practices
● Funded out of the NSF “Bedrock” SDCI grant for 3 years (ending soon)
COmanage Domain
● Federated and social identity
● Identity lifecycle management
● Onboarding and offboarding workflows
● Attribute management
● Group management
● Provisioning / application integration
COmanage Registry
● An Open Source (Apache 2 licensed) Identity Registry for VOs and Collaborations
● One component of an Identity Management System designed specifically for collaboration across boundaries
● Internationally aware, I18n capable
COmanage: supporting NIAID Research Center
Problem: One Researcher Multiple Identities
One researcher may have multiple identities
▪ Grad students become postdocs become faculty
▪ Postdocs move from institution to institution
▪ Faculty may have joint appointments
Different campuses assert different identifiers to services
The researcher needs to be “seen” as the same individual by the service
Solution: Single Project Identifier for Services
▪ Leverage COmanage to create VO-specific identities
▪ A VO identity can be linked to multiple external identities
▪ COmanage autogenerates a project identifier during enrollment
▪ Configurable “recipes” for identifier details
▪ COmanage provisions the project identifier to an LDAP person record
▪ Person record indexed by one or more campus identifiers
▪ SAML Attribute Authority (AA) uses LDAP as its data store
▪ Services consume the unique VO or project identifier(s)
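The linking idea above can be sketched as a directory that stores one record per VO identifier and indexes it by every campus identifier linked to it, so an attribute query with any of the researcher's ePPNs returns the same VO identifier. This is a hypothetical in-memory stand-in for the COmanage Registry, LDAP, and Attribute Authority chain; `Directory`, `PersonRecord`, and the example identifiers are illustrative names, not COmanage or LDAP APIs.

```python
"""Link several campus identifiers to one VO (project) identifier.

Hypothetical in-memory stand-in for the COmanage Registry -> LDAP -> SAML
Attribute Authority chain; no real COmanage or LDAP API is used.
"""
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class PersonRecord:
    vo_id: str                                         # project identifier from enrollment
    campus_ids: Set[str] = field(default_factory=set)  # e.g. ePPNs asserted by campus IdPs


class Directory:
    """Plays the role of the LDAP store that the Attribute Authority reads."""

    def __init__(self) -> None:
        self._records: Dict[str, PersonRecord] = {}
        self._index: Dict[str, str] = {}  # campus identifier -> VO identifier

    def provision(self, record: PersonRecord) -> None:
        """Store a person record and index it by every linked campus identifier."""
        self._records[record.vo_id] = record
        for campus_id in list(record.campus_ids):
            self.link(record.vo_id, campus_id)

    def link(self, vo_id: str, campus_id: str) -> None:
        """Attach another external identity to an existing VO identity."""
        owner = self._index.get(campus_id)
        if owner is not None and owner != vo_id:
            raise ValueError(f"{campus_id} is already linked to {owner}")
        self._records[vo_id].campus_ids.add(campus_id)
        self._index[campus_id] = vo_id

    def attribute_query(self, campus_id: str) -> Optional[str]:
        """Return the VO identifier a service would receive for this login."""
        return self._index.get(campus_id)


if __name__ == "__main__":
    directory = Directory()
    directory.provision(PersonRecord("ICER_kora_7368", {"skoranda@campus1.example.edu"}))
    # The researcher moves institutions; the new campus identity is linked, not re-enrolled.
    directory.link("ICER_kora_7368", "koranda@campus2.example.edu")
    print(directory.attribute_query("koranda@campus2.example.edu"))  # ICER_kora_7368
```

Refusing to link a campus identifier that already belongs to someone else keeps the index one-to-one from campus identity to VO identity, which is what lets services treat the VO identifier as the person.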
(Diagram: Single Project Identifier for Services, showing Campus 1, Campus 2, the Service, the COmanage Registry, LDAP, and the SAML Attribute Authority, with ePPN and VO identifier flows)
Problem: IdP Won’t Release Name or Email
Research applications often require more than an opaque identifier
▪ Different from library or journal access
▪ Researchers want a record of with whom they are collaborating
▪ Legacy application integrations often need name and email
106 InCommon IdPs support the R&S entity category
▪ 106 is great!
▪ 106 of 395 is not so great!
▪ World-class research is happening everywhere
International collaboration and attribute release are even less certain
▪ In general, release from EU IdPs to SPs outside the EU is not currently possible
▪ Sweden and Switzerland support REFEDS R&S
▪ UK support is coming quickly
▪ Still evolving in Australia, Japan, and others…
eduGAIN “only” supports exchange of metadata
▪ Critically important piece of the full solution
▪ But requires more policy and technology layers
▪ Much work still to be done to support international collaboration
Solution: Collect Attributes During Enrollment
COmanage supports flexible enrollment or onboarding
▪ Multiple enrollment flows per VO (or VO unit)
▪ Point-and-click configuration
▪ Collect name, email, and almost any other attribute the VO needs
▪ Usually user asserted
▪ Flows can include email verification
▪ Invitation, self-service, and conscription flows are all supported
Expose the attributes to applications via LDAP and the SAML AA
▪ Use ePPN as the key to index the user
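A minimal sketch of an enrollment step that collects user-asserted name and email, keys the record by the ePPN from the federated login, and exposes attributes only after the email is confirmed. This is a hypothetical flow modeled on the bullets above, not the COmanage enrollment-flow implementation; `Petition`, `confirm_email`, and `exposed_attributes` are illustrative names, and sending the verification email is omitted.

```python
"""Enrollment sketch: collect user-asserted attributes, verify email, then release.

Hypothetical flow, not the COmanage implementation; email delivery is omitted.
"""
import hmac
import secrets
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Petition:
    eppn: str   # from the federated login; the key used to index the user
    name: str   # user asserted
    email: str  # user asserted; unverified until the token is confirmed
    token: str = field(default_factory=lambda: secrets.token_urlsafe(16))
    verified: bool = False


def confirm_email(petition: Petition, presented_token: str) -> bool:
    """Mark the email verified if the token from the emailed link matches."""
    # Constant-time comparison avoids leaking how much of the token matched.
    if hmac.compare_digest(petition.token, presented_token):
        petition.verified = True
    return petition.verified


def exposed_attributes(petition: Petition) -> Dict[str, str]:
    """Attributes to publish via LDAP / the SAML AA; nothing until verified."""
    if not petition.verified:
        return {}
    return {
        "eduPersonPrincipalName": petition.eppn,
        "displayName": petition.name,
        "mail": petition.email,
    }
```

Gating release on verification means a service that queries the AA never receives an email address the user has not proven they control.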
Not all SAML SPs can leverage the SAML (secondary) attribute query
▪ Shibboleth SP makes it easy
▪ SimpleSAMLphp makes it easy
▪ ADFS does not…
SharePoint is an important target application
▪ Works best when federation is managed by ADFS (as opposed to Shibboleth)
▪ Build a custom “shim” between ADFS and the Shibboleth SP to accomplish the secondary attribute query
▪ Working to be able to open source that code...
(Thanks to Chris Phillips from CANARIE for consultation on shim design)
Problem: Hard Service Requirement on Identifier
Research applications often have identifier constraints
▪ Legacy applications are especially difficult
▪ No notion of mapping external identifiers to an internal representation
▪ “Domesticating” these applications is difficult and time consuming
Command line and UNIX applications are an even greater challenge
▪ Bioinformatics apps often need a terminal session
▪ Gateways and portals are not always available, or are rejected by users
▪ SlipStream Appliance from BioTeam is a hybrid:
▪ Galaxy web front end for some apps
▪ Terminal session for others
Solution: Per-application Per-person Custom Identifiers
COmanage offers extensible, customizable identifiers
▪ Same functionality supporting the VO or project identifier
▪ May be (usually is) auto-generated
▪ Sequential or random
▪ Minimum and maximum ranges
▪ Uniqueness enforced or not
▪ May use name information as input
▪ Include prefixes, suffixes, …
Example: ICER_kora_7368
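A sketch of one identifier "recipe" that could yield values shaped like `ICER_kora_7368`: a prefix, the first four ASCII letters of the family name, and a random number within a range, with uniqueness enforced against identifiers already issued. The parameters (`ICER_` prefix, four-letter stem, range 1000 to 9999) are assumptions read off the example; in COmanage such recipes are configured in the Registry rather than written as code like this.

```python
"""Sketch of an identifier "recipe" producing values like ICER_kora_7368.

Hypothetical parameters (prefix, name-stem length, numeric range); in COmanage
such recipes are configured in the Registry, not written as code like this.
"""
import random
import re
import unicodedata
from typing import Optional, Set


def name_stem(family_name: str, length: int = 4) -> str:
    """Lowercase ASCII letters from the family name, truncated to `length`."""
    # NFKD splits accented letters (é -> e + accent) so the accent can be dropped.
    ascii_name = unicodedata.normalize("NFKD", family_name).encode("ascii", "ignore").decode("ascii")
    letters = re.sub(r"[^a-z]", "", ascii_name.lower())
    return letters[:length] or "user"  # fall back if nothing usable remains


def make_identifier(
    family_name: str,
    existing: Set[str],
    prefix: str = "ICER_",
    low: int = 1000,
    high: int = 9999,
    rng: Optional[random.Random] = None,
    max_tries: int = 1000,
) -> str:
    """Build prefix + name stem + random number in [low, high], unique within `existing`."""
    if rng is None:
        rng = random.Random()
    stem = name_stem(family_name)
    for _ in range(max_tries):
        candidate = f"{prefix}{stem}_{rng.randint(low, high)}"
        if candidate not in existing:  # enforce uniqueness
            existing.add(candidate)
            return candidate
    raise RuntimeError(f"no free identifier for stem {stem!r} after {max_tries} tries")
```

A sequential variant would replace the random draw with a counter per stem; random numbers avoid revealing enrollment order, at the cost of a retry loop when the range fills up.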
Solution: SSH Key Management and Provisioning
COmanage supports SSH key management and related provisioning
▪ Users upload one or more public SSH keys
▪ First authenticate using federated identity(ies)
▪ Provision keys to LDAP
▪ SSH server configured to read SSH keys from LDAP
▪ Also includes a “home directory” provisioner (experimental)
▪ Management of uid, gid, homeDirectory
One of a suite of COmanage provisioner plugins
▪ LDAP, Grouper, Changelog, GitHub, Homedir, ...
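One common way to have an SSH server read keys from a directory is OpenSSH's `AuthorizedKeysCommand`, which runs a helper and reads `authorized_keys` lines from its standard output. The sketch below is such a helper under stated assumptions: the helper path is hypothetical, a dictionary stands in for the LDAP search, and the `sshPublicKey` attribute name follows the common openssh-lpk schema convention rather than anything confirmed on the slide.

```python
"""Sketch of an sshd AuthorizedKeysCommand helper backed by the VO directory.

Assumed sshd_config (the helper path is hypothetical):
    AuthorizedKeysCommand /usr/local/bin/vo-ssh-keys %u
    AuthorizedKeysCommandUser nobody
sshd runs the helper and reads authorized_keys lines from its stdout. A dict
stands in for the LDAP search; the sshPublicKey attribute name is an assumption.
"""
import sys
from typing import Dict, List

ACCEPTED_KEY_TYPES = {"ssh-ed25519", "ssh-rsa", "ecdsa-sha2-nistp256"}

# Hypothetical directory entries keyed by the uid provisioned from COmanage.
# The key material below is a placeholder, not a real key.
DIRECTORY: Dict[str, Dict[str, List[str]]] = {
    "ICER_kora_7368": {
        "sshPublicKey": ["ssh-ed25519 AAAAPLACEHOLDERKEYMATERIAL kora@laptop"],
    },
}


def keys_for(uid: str, directory: Dict[str, Dict[str, List[str]]]) -> List[str]:
    """Return well-formed public keys of an accepted type for this uid."""
    keys = []
    for line in directory.get(uid, {}).get("sshPublicKey", []):
        parts = line.split()
        # Expect "type base64-blob [comment]"; skip anything else.
        if len(parts) >= 2 and parts[0] in ACCEPTED_KEY_TYPES:
            keys.append(" ".join(parts))
    return keys


if __name__ == "__main__" and len(sys.argv) == 2:
    # Unknown users produce no output, so no key matches from this source.
    for key in keys_for(sys.argv[1], DIRECTORY):
        print(key)
```

Because the keys live in the directory rather than in each host's `~/.ssh/authorized_keys`, revoking a key in COmanage takes effect on every host that queries the directory.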
Problem: Remote sites, low bandwidth, unreliable circuits
Researchers at Mali and Uganda sites
▪ Shibboleth IdPs local to each site
▪ Some services local to each site
▪ Need access even if the local site becomes disconnected
▪ Requires that SPs/services be able to query for attributes
Solution: Replicate LDAP and Attribute Authority
(See diagram on next slide)
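With a replica at each site, a service can try the site-local attribute source first and fall back to another copy only when the first is unreachable. A minimal sketch, assuming each endpoint is a callable that either returns attributes (or `None` for an unknown user) or raises on a network failure; the endpoint names and `query_with_failover` are hypothetical, and in practice each call would be a SAML attribute query or LDAP search.

```python
"""Sketch: query the site-local replica first, fall back to the central copy.

Endpoints are plain callables here (hypothetical); in practice each would be a
SAML attribute query or LDAP search against one replica of the directory.
"""
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Attributes = Dict[str, List[str]]
Lookup = Callable[[str], Optional[Attributes]]


def query_with_failover(
    principal: str, endpoints: Sequence[Tuple[str, Lookup]]
) -> Tuple[str, Optional[Attributes]]:
    """Return (endpoint name, attributes) from the first endpoint that answers."""
    errors = []
    for name, lookup in endpoints:
        try:
            return name, lookup(principal)
        except (ConnectionError, TimeoutError) as exc:
            # An unreachable replica is skipped; a "not found" (None) answer is not.
            errors.append(f"{name}: {exc}")
    raise ConnectionError("no attribute source reachable: " + "; ".join(errors))
```

Treating "not found" as a final answer rather than a reason to fail over keeps a stale or partial replica from being masked silently; only network errors move the query to the next copy.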
Problem: Some services integrate with one IdP only
Often seen with hosted services
▪ Assume the contract maps to one and only one security realm
▪ Consume SAML but only provide for integration with one login server
▪ Research projects cross organizational boundaries
▪ Still want to leverage federated identity
Solution: IdP/SP Proxy or Bridge
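The proxy or bridge pattern can be sketched as a component that acts as an SP toward many trusted IdPs and as the single IdP the hosted service is configured for, re-issuing each login under its own entityID. This is a simplified model under stated assumptions: there is no SAML parsing or signing, and `Assertion`, `Bridge`, and the entityIDs are hypothetical names for illustration.

```python
"""Sketch of the IdP/SP proxy (bridge) pattern.

The hosted service is configured with a single IdP: the proxy. The proxy acts
as an SP toward many federated IdPs and re-issues each login as its own
assertion. No SAML parsing or signing here; all names are hypothetical.
"""
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Assertion:
    issuer: str                 # entityID of whoever vouches for the user
    subject: str                # identifier for the user
    attributes: Dict[str, str]


class Bridge:
    def __init__(self, entity_id: str, trusted_idps: Iterable[str], vo_ids: Dict[str, str]):
        self.entity_id = entity_id
        self.trusted_idps = frozenset(trusted_idps)
        self.vo_ids = vo_ids  # campus identifier -> VO identifier

    def reissue(self, upstream: Assertion) -> Assertion:
        """Validate the upstream issuer, then issue a downstream assertion as the proxy."""
        if upstream.issuer not in self.trusted_idps:
            raise PermissionError(f"untrusted IdP: {upstream.issuer}")
        vo_id = self.vo_ids.get(upstream.subject)
        if vo_id is None:
            raise PermissionError(f"{upstream.subject} is not enrolled in the VO")
        # The service always sees one issuer and a stable VO identifier.
        return Assertion(self.entity_id, vo_id, dict(upstream.attributes))
```

The service keeps its "one login server" assumption, while trust decisions about which campus IdPs to accept move into the proxy.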
Problem: IdP won’t release global identifier
The common persistent non-targeted identifier is eduPersonPrincipalName
▪ Not all IdPs will release ePPN
▪ Usually released by REFEDS Research & Scholarship IdPs
Some only provide a per-SP targeted persistent identifier
▪ Goal is to prevent correlation across SPs and protect privacy
▪ Admirable goal when SPs are unrelated
▪ Research projects have collections of SPs
▪ Correlation is collaboration and is essential!
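A short sketch of why targeted identifiers block correlation: if an IdP derives the identifier from a hash over the SP's entityID, a local user key, and a secret salt, the same researcher gets a different, unlinkable value at every SP. The construction below is modeled loosely on computed persistent identifiers; the exact inputs, the `!` separator, and SHA-256 are assumptions, and real IdPs differ.

```python
"""Why per-SP targeted identifiers prevent correlation (and collaboration).

Sketch in the style of a computed persistent identifier: a hash over the SP's
entityID, a local user key, and an IdP-held secret salt. The exact inputs,
separator, and hash (SHA-256 here) are assumptions; IdPs differ.
"""
import base64
import hashlib


def targeted_id(sp_entity_id: str, local_user: str, salt: bytes) -> str:
    """Stable for one (SP, user) pair, unlinkable across SPs without the salt."""
    material = f"{sp_entity_id}!{local_user}!".encode("utf-8") + salt
    return base64.b64encode(hashlib.sha256(material).digest()).decode("ascii")
```

Two SPs in the same research project, say a wiki and an HPC portal, receive different values for the same researcher and cannot tell they are one person unless something shared, such as ePPN or a VO identifier, is also released.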
Virtual Organizations and Federated Identity
▪ VOs can greatly benefit from leveraging federated identity
▪ Higher Ed federations are especially attractive
▪ That’s where the users often are
▪ Trust model with many, many years of relationship building
▪ Barriers to adoption remain
▪ Wrong balance between privacy and fostering collaboration
▪ Assumptions about the relationship between IdP and SP
▪ Not all SPs are vendor SPs or campus SPs
The promise of Higher Ed identity federations to transform research and scholarship collaboration is enormous!
Acknowledgements
Michael Tartakovsky - CIO NIAID
Jeff Erickson - IAM Manager NIH
Heather Flanagan & Benn Oshrin
Ann West, Tom Scavo, John Krienke & InCommon
Ken Klingenstein & Internet2
Matthew Economou
Participants at ACAMP 2014 Internet2 Technical Exchange who put up with us from NIAID.