introduction
DESCRIPTION
The eMERGE Network: Challenges and Lessons Learned Lin Gyi 1 , Cathy A. McCarty 2 , Rex L. Chisholm 3 , Chris G. Chute 4 , Paul K. Crane 5 , Gail Jarvik 5 , Iftikhar J. Kullo 4 , Eric Larson 5 , Daniel R. Masys 6 , Dan M. Roden 6 , Rongling Li 1 - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: Introduction](https://reader037.vdocuments.mx/reader037/viewer/2022103102/56815b81550346895dc98076/html5/thumbnails/1.jpg)
Widespread adoption of the electronic medical record (EMR), though expensive and logistically challenging, can potentially establish new frontiers in personalized medicine. The electronic Medical Records and Genomics (eMERGE) Network (www.gwas.net) is a consortium of five participating sites (Group Health Seattle, Marshfield Clinic, Mayo Clinic, Northwestern University, and Vanderbilt University) funded by the NHGRI to investigate synergies between EMR and genomic research. The goal of eMERGE is to conduct genome-wide association studies in approximately 19,000 individuals using EMR-derived phenotypes and DNA from linked biorepositories. While eMERGE is still underway, dissemination of important challenges and lessons learned from the network can benefit the scientific community. Challenges faced by eMERGE include development of EMR-based algorithms requiring expertise in clinical care, genetics/genomics, and biomedical informatics; implementation and validation of algorithms among different types of EMR data; informed consent or re-consent of patients for genomic research; and the return of incidental findings. The lessons learned include improving model consent language to better inform patients participating in genomic research, and designing EMR-based algorithms to be transportable to different institutions with varying data structures. However, a standardized EMR system would be more efficient for cost-effective research. The key strengths of eMERGE include its collaborative nature, potential for external Network initiatives, the ability to rapidly extract phenotypes from the EMR, transportability of algorithms, and cost-effectiveness for longitudinal clinical research. The eMERGE Network is uniquely poised to develop novel strategies for leveraging the EMRs in genomic research and thereby facilitate personalized medicine.
The eMERGE Network: Challenges and Lessons LearnedLin Gyi1, Cathy A. McCarty2, Rex L. Chisholm3, Chris G. Chute4, Paul K. Crane5, Gail Jarvik5, Iftikhar J. Kullo4, Eric Larson5, Daniel R. Masys6, Dan M.
Roden6, Rongling Li11National Human Genome Research Institute, Bethesda, MD, 2Marshfield Clinic Research Foundation, Marshfield, WI, 3Northwestern University, Chicago, IL, 4Mayo Clinic, Rochester, MN,
5Group Health Cooperative/University of Washington, Seattle, WA, 6Vanderbilt University, Nashville, TN
IntroductionThe electronic Medical Records and Genomics (eMERGE) Network is a five-member site national consortium formed to develop, disseminate, and apply approaches to research that combines DNA biorepositories with electronic medical record (EMR) systems for large-scale, high-throughput genetic research to identify genetic risk factors for clinical disease.
Specific Aims• Develop and validate electronic phenotyping algorithms for phenotype
classification in genomic research• Identify genetic variants related to complex traits through genome-wide
association (GWA) analyses• Develop, implement and evaluate the process of consent and community
consultation for genomic research• Develop best practices to protect patients to maximize data sharing, and to
benefit societyGenotyping
• Genotyping facilities: Broad Institute and Center for Inherited Disease Research (CIDR)
• Platforms: Illumina 1M for individuals of African American ancestry and lllumina 660W Quad for individuals of European ancestry and other race/ethnicity
• Quality control (QC): genotyping QC and centralized data cleaning QC
Phenotyping • Identification: electronic algorithms for structured data extraction (i.e. ICD-9
code), free-text data mining, and/or natural language processing (NLP)• Validation: manual chart review• Phenotypes: 6 primary site-specific phenotypes, 2 network phenotypes, 5
secondary network phenotypes, and approximately 776 network phenotypes identified by ICD-9 codes only
AcknowledgementsThe eMERGE Network was initiated and funded by NHGRI, in conjunction with additional funding from NIGMS through the following grants: U01-HG-004610 (Group Health Cooperative/University of Washington, PI: Eric Larson); U01-HG-004608 (Marshfield Clinic, PI: Cathy McCarty); U01-HG-04599 (Mayo Clinic, PI: Chris Chute); U01-HG-004609 (Northwestern University, PI: Rex Chisholm); U01-HG-04603 (Vanderbilt University, also serving as the Administrative Coordinating Center, PI: Dan Roden). Members of the Expert Scientific Panel include: Gerardo Heiss, PhD (University of North Carolina), Stan Huff, PhD (University of Utah) Howard McLeod, PhD (University of North Carolina, Chair), Jeff Murray, PhD (University of Iowa), and Lisa Parker, PhD (University of Pittsburgh).
Abstract
Network Structure
Sex Chromosome Anomalies
eMERGE Network
Phenotypes and Sample Sizes
Network Challenges• Phenotyping: development of EMR-based algorithms requiring expertise in clinical care, genetics/genomics, and biomedical
informatics; implementation and validation of algorithms among different types of EMR data• Genotyping: addressing quality control of pre-genotyping, genotyping and post-genotyping, data cleaning for site-specific data
and combined network data• Protecting human subjects: ensuring adequate human subject protections and addressing patients’ concerns regarding such
research• Return of incident findings: requirements for CLIA certification, re-consent, IRB approvals,
Lessons Learned:• Improved model consent language to better inform patients participating in genomic research based on EMRs; eMERGE’s model
consent language has been posted on NHGRI’s Informed Consent website at http://www.genome.gov/27526660• Developed methods to extract potentially identifiable clinical characteristics and modify them (by grouping or suppression) to
minimize threats to the confidentiality of a patient’s genomic information, while maximizing the EMR information preserved• Designed and improved EMR-based algorithms to be transportable to different institutions with varying data structures• Realized the key strengths of the eMERGE Network including its collaborative nature, potential for external Network initiatives,
the ability to rapidly extract phenotypes from the EMR, and cost-effectiveness for longitudinal research
Sex Chromosome
Anomaly
Site A
Site B
Site C
Site D
Site E* Total
XX/XO mosaic 3 2 0 4 - 9XO
(Turner’s Syndrome)
0 1 0 0 - 1
XXY/XY mosaic 1 0 0 0 - 1
XXY(Klinefelter’s Syndrome)
1 1 5 1 - 8
XX, large LOH blocks on X 1 9 0 1 - 11
XXX (normal phenotype) 0 1 0 0 - 1
XYY (not reportable) 0 0 0 1 - 1
LOH (Loss of heterozygosity) 2 0 0 12 - 14
*Site E’s data are not yet available.**Sex chromosome anomaly events recorded out of ~ 13,800 genotyped samples
Total: 46**
Institution PrimaryPhenotype
Network Phenotype*
Repository Size
GWA Study Size EMR Description Phenotyping Methods
Group Health, University of Washington
(GHC Biobank)
Alzheimer's Disease and Dementia
WBC (White Blood Cell
Count)~4000;
>96% EA3,370; 97% EA
Vendor-based EMR since 2004;20+ yrs pharmacy15+ yrs ICD9
Structured data extraction, Mining free-text via regular expressions, Manual chart review
Marshfield Clinic(Personalized
Medicine Research Project)
Cataracts and HDL-Cholesterol
Diabetic Retinopathy
~20,000;98% EA
3,968; 99% EA
Internally developed EMR since 1985;75% ppts have 20+ yrs medical history
Structured data extraction, NLP,Intelligent Character Recognition
Mayo Clinic Peripheral Arterial Disease
Red Blood Cell(RBC) indices
15,000;>96% EA
3,412; 99% EA
Internally developed EMR since 1995;40 yrs data extraction
Structured data extraction, NLP
Northwestern University,
(NUgene Project)Type 2 Diabetes Lipids & Height
9,200;12% AA
8% Hispanic
3,564; 52% AA
Vendor based Input and Output EMR since 2000;20+ yrs ICD9
Structured data extraction, text searches
Vanderbilt University
(BioVU)QRS Duration
PheWAS**(Phenome-Wide
Association Study)
100,000;11% AA
3,061; 16% AA
Internally developed EMR since 2000;35+ yrs medical history
Structured data extraction, NLP
eMERGE NetworkHypothyroidism and Resistant Hypertension ~20,000
The cross network phenotypes were chosen based on the importance of the scientific question, whether GWAS had been performed for the trait before, and the effort required to develop accurate electronic phenotyping algorithms
ICD9 = Ninth International Classification of Diseases; NLP = Natural Language Processing; EA = European Americans; AA = African AmericansStructured data extraction = retrieving data that have been stored in a predefined format*Network phenotyping is not limited to the repository size of the parent institution**Phenome-wide association study: using prevalent ICD9 codes to identify a significant amount of clinical phenotypes that may associated with select risk genetic markers
The genotyping facilities collected sex chromosome anomaly data from the eMERGE Network sites; the Return of Results Oversight committee is working to determine which of these genotypes are reportable and how they should be discussed.
Network Structure• The Steering Committee is the governing body
for the consortium and is composed of the Principal Investigators from each institution and the NIH Project Scientist
• An Expert Scientific Panel provides input to the NHGRI director about the progress and direction of the Network
• Vanderbilt University Medical Center is the site of the Administrative Coordinating Center (ACC) which provides support to the network including coordination, organization of committees, and support for the Expert Scientific Panel
• The Informatics group works to determine the validity, reliability, and comprehensiveness of EMR data for Genome-Wide Association Studies
• The Community Consultation and Consent (CC&C) group leverages community engagement activities at each site; each site has some form of a Community Advisory Board to provide guidance to study leadership
• The Genomics group facilitates the GWAs timeline and sample quality control for the network
• The Return of Results Oversight Committee is one of the focus areas of the CC&C working group. The committee focuses on the return of medically actionable results and works primarily with local investigators and IRBs