Uniting i2b2 and caGrid

Uniting i2b2.org and caGrid: National scale data sharing networks for Biomedical Informatics research. Rob Wynden – UCSF. A collaborative effort of UCSF, OSU, UCD, Rochester U, UPenn, U Washington, Wash U, and Partners HealthCare.


Post on 19-Jan-2016


TRANSCRIPT

Page 1: Uniting i2b2 and caGrid

Uniting i2b2.org and caGrid

National scale data sharing networks for Biomedical Informatics research

Rob Wynden – UCSF

A collaborative effort of UCSF, OSU, UCD, Rochester U, UPenn, U Washington, Wash U, and Partners HealthCare

Page 2: Uniting i2b2 and caGrid

Challenges

• Several challenges impede the task of launching an IDR (integrated data repository) and sharing that information for research purposes
– Data Governance and Standardization
– Meeting the needs of researchers
– Semantic Interoperability

Page 3: Uniting i2b2 and caGrid

Data Governance

• It is very difficult to get approval to import data into an IDR installation

• If we were also to require that data be encoded at the source in a particular standard format, then approval would be even more difficult

• Data translation during ETL (extract, transform, and load) is also hard because not all data needs to be so encoded, and data must often be translated into multiple standard formats

Page 4: Uniting i2b2 and caGrid

Meeting the needs of Researchers

• Researchers need data to be encoded in a format that is appropriate for their research specialty. No single data encoding is appropriate for all purposes

• Researchers will also require access to the source information in unmodified form for verification purposes

Page 5: Uniting i2b2 and caGrid

Semantic Interoperability

• For researchers within the same domain of study to share information and work together, that information must be encoded in a consistent format

• Each research institution has information encoded in a unique fashion, which depends on its particular mix of source software environments used in clinical care, clinical research and bench science.

Page 6: Uniting i2b2 and caGrid

Ontology Mapper

• The Ontology Mapper maps local data (which is usually not formally encoded) into formally encoded data based on ISO/IEC 11179 data models which have been checked into the caDSR (Data Standards Repository). (It is an Instance Mapper.)

• XML-based instance map definitions can be shared between institutions either under a Creative Commons license or, after purchase, under a commercial license.
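As a rough illustration of what an XML-based instance map might look like, the sketch below parses a small map and applies it to local values. The element names, attribute names and codes are hypothetical; the slides do not specify the actual Ontology Mapper XML format.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance map: element and attribute names are illustrative
# only and are NOT the real Ontology Mapper format.
MAP_XML = """
<instanceMap name="local-sex-to-standard" license="Creative Commons">
  <entry local="M" standard="Male"/>
  <entry local="F" standard="Female"/>
</instanceMap>
"""


def load_instance_map(xml_text):
    """Return a dict from local value to standard value."""
    root = ET.fromstring(xml_text)
    return {e.get("local"): e.get("standard") for e in root.findall("entry")}


def map_value(mapping, local_value):
    """Translate one local value; unmapped values yield None so the
    caller can fall back to the unmodified source record."""
    return mapping.get(local_value)


mapping = load_instance_map(MAP_XML)
translated = [map_value(mapping, v) for v in ["F", "M", "U"]]
```

Returning `None` for unmapped values keeps the untranslated source data distinguishable from translated data, in line with the requirement that researchers retain access to the unmodified source.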

Page 7: Uniting i2b2 and caGrid

Benefits of i2b2

• An open source translational informatics warehouse platform (an IDR)

• An active open source based user community

• Industry support (Sybase, HP, Sun …)

• A relatively easy platform into which to import source data regardless of its encoding

• Availability of a general purpose instance mapper for the translation of source data into standard encodings

Page 8: Uniting i2b2 and caGrid

Problems with i2b2 related to data sharing

• i2b2 lacks a mature data sharing capability that includes both general purpose semantic interoperability and security

• i2b2 cannot interoperate with other IDRs that are not on the same platform

Page 9: Uniting i2b2 and caGrid

Benefits of caGrid

• Developed as part of the caBIG translational informatics effort, caGrid is a mature data sharing network

• caGrid offers secure user authentication

• caGrid offers data sharing over a semantically interoperable network

• caGrid is platform agnostic and can be used to interconnect IDR environments regardless of the underlying technology (the design of caGrid is NOT specific to caBIG related systems)

• caGrid will eventually interoperate with Science Commons for accessing legal data access agreements

Page 10: Uniting i2b2 and caGrid

Problems with caGrid

• It is currently difficult to use caGrid on IDR projects. The caBIG project does not currently offer a general purpose IDR software environment

• It is currently difficult to translate data into a format suitable for publication over caGrid

• All caGrid based systems require that shared data be encoded in one or more standard formats, which usually do not match the formats of our data sources.

Page 11: Uniting i2b2 and caGrid

The best of both worlds

• By combining the advantages of i2b2.org and caGrid we will provide a comprehensive solution to national scale data sharing

• i2b2.org provides a relatively easy way of importing source data and translating that information into one or more standard formats

• caGrid supplies a secure and semantically interoperable national scale network.

Page 12: Uniting i2b2 and caGrid

CTSA Collaborative Development

• The effort to combine i2b2.org with caGrid is a collaborative effort involving several CTSA sites

• i2b2.org was first released as open source by Partners HealthCare, and its community includes many CTSA award sites, including Harvard Med, UCSF, UCD, U Washington, Cincinnati Children’s, UT Houston, Rochester, UPenn and others

Page 13: Uniting i2b2 and caGrid

Ontology Mapper Cell

• The Ontology Mapper Cell within i2b2 is a general purpose instance mapper which can translate messy local data into one or more standard formats. In other words, the Ontology Mapper maps local data into ontologies

• Maps will be created and annotated in a Protégé Prompt plug-in and can be shared over HL7 CTS II, either as open source or as commercially sold assets

• Maps contain routing information, provenance information and a scriptlet payload of SQL, Perl, SPARQL, Horn or R

• The Ontology Mapper Cell within i2b2 is a collaborative effort involving UCSF, UCD, Rochester, UPenn, and U Washington

• This has been a highly active collaborative effort which is now in an Alpha release cycle
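The three parts of a map named above (routing, provenance and a scriptlet payload) could be modelled roughly as follows. This is a minimal sketch: the class name, field names and example values are all assumptions made for illustration, not the Ontology Mapper Cell's actual data structures.

```python
from dataclasses import dataclass

# Scriptlet languages listed on the slide.
ALLOWED_LANGUAGES = {"SQL", "Perl", "SPARQL", "Horn", "R"}


@dataclass(frozen=True)
class InstanceMap:
    # Routing (hypothetical fields): where data comes from and where it goes.
    source_path: str
    target_element: str
    # Provenance (hypothetical fields): who authored the map, and where.
    author: str
    institution: str
    # Scriptlet payload: the translation logic and the language it is in.
    language: str
    scriptlet: str

    def __post_init__(self):
        if self.language not in ALLOWED_LANGUAGES:
            raise ValueError(f"unsupported scriptlet language: {self.language}")


# Illustrative values only.
example = InstanceMap(
    source_path="/Local/Demographics/Sex/",
    target_element="Person.sex",
    author="hypothetical author",
    institution="Example University",
    language="SQL",
    scriptlet="SELECT CASE local_code WHEN 'M' THEN 'Male' ELSE 'Female' END",
)
```

Keeping provenance next to the payload means a map shared between institutions carries its own record of origin.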

Page 14: Uniting i2b2 and caGrid

caGrid Cell

• The caGrid Cell is a development project which is a collaboration of OSU (Ohio State) and UCSF

• This component allows any i2b2 data mart which has been translated into standard format by the Ontology Mapper to share data over caGrid

• This system will allow i2b2 to share data (via a federated query) with any caGrid based data source, not just with other i2b2 instances
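The idea of a federated query can be sketched in a few lines: the same query is sent to every participating source and the results are kept per site. The code below is a toy illustration under that assumption only. It does not use or imitate the caGrid API, and the site names and records are invented.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]


def federated_query(
    sources: Dict[str, Callable[[], Iterable[Record]]],
    predicate: Callable[[Record], bool],
) -> Dict[str, List[Record]]:
    """Apply the same predicate at every source and keep results per site."""
    results: Dict[str, List[Record]] = {}
    for site, fetch in sources.items():
        results[site] = [record for record in fetch() if predicate(record)]
    return results


# Hypothetical sites: one i2b2 data mart and one non-i2b2 source, both
# already publishing data in the same standard encoding.
sources = {
    "site_a_i2b2": lambda: [{"sex": "Female"}, {"sex": "Male"}],
    "site_b_other": lambda: [{"sex": "Female"}],
}

matches = federated_query(sources, lambda r: r["sex"] == "Female")
counts = {site: len(rows) for site, rows in matches.items()}
```

The query only works across both sites because each one exposes the same standard encoding, which is what the Ontology Mapper step is meant to provide.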

Page 15: Uniting i2b2 and caGrid

Query Interfaces

• caGrid based query: Work is under way to create a caGrid based query interface for the HSDB project (Wash U)

• i2b2 based query: This environment will be implemented as a plug-in for the i2b2 SHRINE environment

Page 16: Uniting i2b2 and caGrid
Page 17: Uniting i2b2 and caGrid

Five pilot projects under way

• There are currently FIVE data sharing projects which have all based their architectures on this work

• HSDB (Human Studies Database – Ida Sim): The project for which this i2b2-caGrid architecture was first developed; it shares clinical research metadata. UCSF, Mayo Clinic, Wash U, UTSW, UCD

• QSN (The Quality Safety Network – Andy Auerbach): A national network of payer-derived and IDR-derived quality data. UCSF, Tufts, Northwestern, Kaiser, Michigan and 17 Payers

• STIRS (Cardiovascular Imaging Research Grid – Max Wintermark): UCSF, Georgetown, UCLA, Sutter Health Corp

• CHORI (Collab for Oral Health-Related Informatics – Joel White): UCSF, Harvard, UT Houston

• DBRD (Distributed Biobank for Rare Diseases – Jennifer Puck): UCSF, UT Southwestern, Emory, Duke

Total number of unique sites: 37

Number of sites already involved with the CTSA: 20 (almost all of these sites are heavily involved with at least one of these grid projects)

Page 18: Uniting i2b2 and caGrid

So how does it work?

• STEP 1

– First, data is loaded by ETL (extract, transform, load) into the i2b2 schema

– The i2b2 schema is based on Concept Table design, which is a derivative of fact table design.

– In concept table design each ‘name’ in the fact table is a hierarchical string of concepts

– This architecture can be used to import (ETL) source data in any encoding without requiring data standardization as a data governance task
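The concept table idea can be illustrated with a deliberately simplified fact table in which each row names its concept by a hierarchical path. The table and column names below are hypothetical and are not the actual i2b2 schema; the point is only that locally encoded data can be loaded as-is and still be selected by a branch of the hierarchy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Simplified, hypothetical fact table: one row per observation, with the
# concept expressed as a hierarchical path string.
conn.execute(
    "CREATE TABLE fact (patient_id INTEGER, concept_path TEXT, value TEXT)"
)
conn.executemany(
    "INSERT INTO fact VALUES (?, ?, ?)",
    [
        (1, "/Local/Labs/Chemistry/Glucose/", "5.4"),
        (1, "/Local/Demographics/Sex/", "F"),
        (2, "/Local/Labs/Hematology/Hemoglobin/", "13.1"),
    ],
)


def facts_under(connection, prefix):
    """Return every fact whose concept path lies under the given branch."""
    cursor = connection.execute(
        "SELECT patient_id, concept_path, value FROM fact "
        "WHERE concept_path LIKE ? ORDER BY patient_id, concept_path",
        (prefix + "%",),
    )
    return cursor.fetchall()


lab_facts = facts_under(conn, "/Local/Labs/")
```

Because the local codes ("F", raw lab values) are stored unchanged, nothing has to be standardized before import; translation can happen later.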

Page 19: Uniting i2b2 and caGrid

Concept Table Design

Page 20: Uniting i2b2 and caGrid

So how does it work?

• STEP 2

– As data is imported it is translated into one or more standard formats by the Ontology Mapper Cell.

– The Ontology Mapper uses data translation rules, shareable over HL7 CTS II, to translate local data into one or more standard formats. (It is a general purpose instance mapper.)

– One-to-one maps, aggregates and archetype generation are all supported.

– The Ontology Mapper then publishes data into a data mart. Ontology Mapper data marts are database Views which can be ‘materialized’ into physical data marts if required.
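The difference between a data mart published as a View and one that has been materialized can be shown with an in-memory SQLite database. This is a sketch under assumed, hypothetical table names and a one-to-one code map; it is not how the Ontology Mapper Cell itself is implemented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical local data and a one-to-one code map.
    CREATE TABLE local_fact (patient_id INTEGER, local_code TEXT);
    CREATE TABLE code_map (local_code TEXT PRIMARY KEY, standard_code TEXT);
    INSERT INTO local_fact VALUES (1, 'M'), (2, 'F'), (3, 'F');
    INSERT INTO code_map VALUES ('M', 'Male'), ('F', 'Female');

    -- The data mart as a View: translation happens at query time.
    CREATE VIEW mart_sex AS
      SELECT f.patient_id, m.standard_code
      FROM local_fact f JOIN code_map m ON m.local_code = f.local_code;

    -- The same mart 'materialized' into a physical table (a snapshot).
    CREATE TABLE mart_sex_materialized AS SELECT * FROM mart_sex;
    """
)

view_rows = conn.execute(
    "SELECT patient_id, standard_code FROM mart_sex ORDER BY patient_id"
).fetchall()

# A later import is visible through the View but not in the snapshot.
conn.execute("INSERT INTO local_fact VALUES (4, 'M')")
view_count = conn.execute("SELECT COUNT(*) FROM mart_sex").fetchone()[0]
snapshot_count = conn.execute(
    "SELECT COUNT(*) FROM mart_sex_materialized"
).fetchone()[0]
```

The View always reflects the current source data, while the materialized table trades freshness for a stable, physically separate copy.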

Page 21: Uniting i2b2 and caGrid
Page 22: Uniting i2b2 and caGrid

So how does it work?

• STEP 3

– The Ontology Mapper translates data into an ISO/IEC 11179 compliant data model

– The Ontology Mapper Cell then publishes that data as a data mart (a View within the underlying database) with permissions within i2b2 aligned with the study protocol

– Each data model is checked into the caDSR (data standards repository) to serve as a common standard reference

– The caGrid Cell then provides a grid data service (created with the Introduce tool) which automatically performs the EAV to object-relational transform needed for i2b2 based data to be interoperable over caGrid

– Data can then be queried via standard caGrid tools or via custom caGrid query environments if required (permissions are handled via Grid Grouper)

– Queries can be both intra- and inter-institutional
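The EAV (entity-attribute-value) to object transform mentioned above can be sketched as a pivot from narrow rows to typed objects. The `Patient` class, its fields and the sample rows are hypothetical; the actual caGrid Cell generates its service with the Introduce tool, which is not shown here.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Patient:
    # Hypothetical object model used only for this illustration.
    patient_id: int
    sex: Optional[str] = None
    birth_year: Optional[int] = None


def eav_to_objects(rows):
    """Pivot (entity, attribute, value) rows into one object per entity."""
    grouped = defaultdict(dict)
    for entity, attribute, value in rows:
        grouped[entity][attribute] = value

    patients = []
    for patient_id in sorted(grouped):
        attrs = grouped[patient_id]
        year = attrs.get("birth_year")
        patients.append(
            Patient(
                patient_id=patient_id,
                sex=attrs.get("sex"),
                birth_year=int(year) if year is not None else None,
            )
        )
    return patients


# EAV rows as they might come out of a fact-table style data mart.
eav_rows = [
    (1, "sex", "Female"),
    (1, "birth_year", "1970"),
    (2, "sex", "Male"),
]
patients = eav_to_objects(eav_rows)
```

Attributes missing for an entity simply stay `None`, which mirrors the sparse nature of EAV storage.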

Page 23: Uniting i2b2 and caGrid
Page 24: Uniting i2b2 and caGrid

Combining i2b2 and caGrid

• By combining these techniques we can achieve the goal of a national scale semantically interoperable data sharing network within the CTSA

• This is a national collaborative effort involving many CTSA and caBIG based sites around the country

• By working together as a team we are better equipped to achieve our goals of launching IDRs and sharing research information.

Page 25: Uniting i2b2 and caGrid

Thank you

• Questions please

• A collaborative effort of UCSF, OSU, UCD, Rochester U, UPenn, U Washington, Wash U, Partners HealthCare and many others. If you are interested in becoming a contributing member of this effort please contact [email protected]