CONFIDENTIAL The GlaxoSmithKline group of companies 205878
Division : Worldwide Development
Information Type : Reporting and Analysis Plan (RAP)
Title : PGX7647 (205878): Exploratory Pharmacogenetic GWAS meta-analysis of efficacy response to GSK1278863 by using subjects with CKD from phase 2 studies; 116581, 116582, 112844, 113633, 113747, and 116099
Compound Number : GSK1278863
Effective Date : 16-AUG-2016
Description :
This document defines the analysis plan for eTrack study ID 205878. It describes the genetic analysis using data pooled from six clinical phase II studies and genome-wide germline genetic data to test for association between genetic variants and differential response to GSK1278863.
The study objective: Evaluate the effect of genome-wide genetic variation on differential response to GSK1278863
Keywords: Pharmacogenetics (PGx), analysis, genome-wide association study (GWAS), single nucleotide polymorphisms (SNPs), anemia, chronic kidney disease (CKD), GSK1278863, daprodustat, prolyl hydroxylase inhibitor (PHI), hemoglobin, reticulocyte, dose
Author’s Name and Functional Area:
12-AUG-2016 PAREXEL Genomic Medicine
12-AUG-2016 PAREXEL Genomic Medicine
Approved by:
16-AUG-2016 GSK Statistical Genetics, Target Sciences
Copyright 2016 the GlaxoSmithKline group of companies. All rights reserved. Unauthorised copying or use of this information is prohibited
Distribution List:
GSK Genetics, Target Sciences
GSK Genetics, Target Sciences
PAREXEL Genomic Medicine
PAREXEL Genomic Medicine
GSK Metabolic, Pulmonary Pathways and Cardiovascular
GSK QSci Clinical Statistics
GSK QSci Clinical Statistics
TABLE OF CONTENTS
1. GENETIC REPORTING & ANALYSIS PLAN SUMMARY
2. SUMMARY OF KEY INFORMATION
2.1. Introduction and Rationale
2.2. Study Objectives and Endpoints
2.3. Study Design
2.4. Statistical Hypotheses
3. SAMPLE SIZE CONSIDERATIONS AND POWER ESTIMATES FOR THE GENETIC ANALYSIS
4. GENETIC ANALYSIS POPULATIONS
5. CONSIDERATIONS FOR DATA ANALYSES
6. DATA HANDLING CONVENTIONS
7. GENETIC ANALYSES
7.1. Primary Analyses
7.1.1. Map of Analyses
7.2. General Genetic Analysis Conventions
8. REFERENCES
9. APPENDICES
9.1. APPENDIX 1: Data Display Standards & Handling Conventions
9.2. APPENDIX 2: Derived and Transformed Data
9.3. APPENDIX 3: Premature Withdrawals & Handling of Missing Data
9.3.1. Premature Withdrawals
9.3.2. Handling of Missing Data in Statistical Analysis
9.3.3. Handling of Missing Genetic Data
9.4. APPENDIX 4: Genotype/Subject Quality Control
9.4.1. Subject Quality Control
9.4.2. Genotype Quality Control
9.5. APPENDIX 5: Multiple Comparisons & Multiplicity
9.5.1. GWAS analysis
9.6. APPENDIX 6: Hardy-Weinberg (HW) Analysis
9.7. APPENDIX 7: Linkage Disequilibrium Analysis
9.8. APPENDIX 8: Characterizing Ancestry Using Principal Components Analysis
9.9. APPENDIX 9: Genotype Imputation
9.10. APPENDIX 10: Genomic Control
9.11. APPENDIX 11: Independent Signal Identification
9.12. APPENDIX 12: Reporting and Interpretation
9.13. APPENDIX 13: Abbreviations & Trade Marks
9.13.1. Abbreviations
9.13.2. Trademarks
9.14. APPENDIX 14: A list of genes involved in daprodustat metabolism, disposition or mode of action
1. GENETIC REPORTING & ANALYSIS PLAN SUMMARY
RAP Area: Description
Purpose: The purpose of this reporting and analysis plan (RAP) is to describe the planned analyses and output for the analysis of genetic associations with differential response to GSK1278863, using data from six phase II studies (PHI113633, PHI113747, PHI116581, PHI116582, PHI112844, and PHI116099) and a genome-wide association study (GWAS) approach.
Protocols: This is a non-interventional genetic study with no corresponding protocol. Samples used in this study were collected under studies PHI113633, PHI113747, PHI116581, PHI116582, PHI112844, and PHI116099.
Objective / Endpoint:
Objective: Evaluate the effect of genome-wide genetic variation on differential response to GSK1278863
Endpoints:
Change in Hemoglobin (g/dL) at 4 weeks
Change in Reticulocytes (10^12/L) at 4 weeks
Final dose (mg) at week 24
Study Design: The data used in this analysis will be derived from six phase II clinical studies: PHI113633, PHI113747, PHI116581, PHI116582, PHI112844, and PHI116099.
Analysis Populations:
PGx analysis population: Subjects who have given PGx consent and a PGx sample, have been successfully genotyped and passed the GWAS data quality control (QC), and have valid clinical endpoint data available.
Patients included in analyses of change from baseline endpoints will have received a fixed treatment dose of GSK1278863 for the 4 weeks being assessed.
Patients included in the analyses of final dose will have received 24 weeks of treatment.
Hypothesis: Null hypothesis: None of the genome-wide genetic variants analysed are associated with differential response to GSK1278863
Analyses: Clinical response endpoints and genome-wide (genotyped and imputed) variants
Linear regression of hemoglobin change from baseline at week 4
Linear regression of reticulocyte change from baseline at week 4
Linear regression of final dose at week 24
Analyses will be conducted using all subjects, adjusting for principal components in the model to account for ancestry, and will be conducted by study. Studies will be analyzed individually because they differ in several key factors, including genotyping platform, prior rhEPO treatment, and hemodialysis dependency. Meta-analyses will be conducted using the effect estimates obtained from each study.
2. SUMMARY OF KEY INFORMATION
2.1. Introduction and Rationale
GSK1278863 (daprodustat) is a novel small molecule agent that has demonstrated the ability to stimulate erythropoiesis in preclinical models. GSK1278863 inhibits prolyl hydroxylases [egg-laying deficiency protein nine-like protein (EGLNs)] associated with hypoxia inducible factor (HIF). Inhibition of EGLNs prevents breakdown of HIFα. This activity results in the accumulation of HIFα transcription factors, which leads to increased transcription of HIF responsive genes. This biological activity simulates components of the natural response to exposure to hypoxia. Erythropoiesis is thus induced by increased production of natural erythropoietin (EPO) and enhanced iron metabolism.
Currently, oral daprodustat is being studied for treatment of anemia associated with CKD. Topical GSK1278863 is being studied for treatment of diabetic foot ulcers (GlaxoSmithKline Document Number RM2008/00267/07).
A previous focused pharmacogenetic study (201099, also known as PGx7532) assessed relationships between genetic variants in CYP2C8 and ABCG2 and pharmacokinetic (PK) variability of GSK1278863 using dose normalized AUC. It also explored the effects of genetic variants in genes involved in drug absorption, distribution, metabolism and excretion (ADME), and in genes related to mode of action (MOA), on pharmacodynamic (PD) variability of GSK1278863 using change in hemoglobin (Hgb). These analyses used subjects from 4 Phase 1 and 4 Phase 2 studies (PHI116581, PHI116582, PHI112844, PHI115385, PHI113634, PHI114703, PHI116008, and PVD114272). No genetic markers met the pre-specified criteria for association in either the PK or the PD PGx analyses.
This pharmacogenetic (PGx) study has been undertaken to evaluate the effects of genome-wide genetic variation on differential response in subjects with anemia associated with CKD who received daprodustat in 6 Phase 2 studies. Note that 3 studies overlap with those used in the previous focused PGx analyses. One Phase 2 study used in the previous PGx analyses (PVD114272) was not included because it was conducted in subjects with peripheral artery disease.
2.2. Study Objectives and Endpoints
Objectives | Endpoints
Evaluate genetic effects on change in hemoglobin | Change in Hemoglobin (g/dL) at 4 weeks
Evaluate genetic effects on change in reticulocytes | Change in Reticulocytes (10^12/L) at 4 weeks
Evaluate genetic effects on dose selection | Final dose (mg) at week 24
2.3. Study Design
The data used in this analysis will be derived from phase II studies in which oral GSK1278863 was administered at the doses specified by each study design. A brief description of each study follows:
Study | ITT | PGx subjects* | Description
PHI116581 | 74 | 61 | A 4-week phase II study to evaluate the safety, efficacy and PK of GSK1278863 in subjects with anemia associated with CKD who were not taking rhEPO and were NDD
PHI116582 | 86 | 68 | A 4-week phase II study to evaluate the safety, efficacy and PK of switching subjects from a stable dose of rhEPO to GSK1278863 in HDD subjects with anemia associated with CKD
PHI112844 | 107 | 71 | A phase IIA study of safety, PK, and efficacy of 28-day repeat oral doses of GSK1278863 in anemic pre-dialysis and HDD patients
PHI113633 | 217 | 197 | A phase IIB study to evaluate the dose response relationship of GSK1278863 over the first 4 weeks of treatment and evaluate the safety and efficacy of GSK1278863 over 24 weeks in HDD subjects with anemia associated with CKD who switched from rhEPO
PHI113747# | 252 | 198 | A 24-week phase IIB study to evaluate the safety and efficacy of GSK1278863 in NDD subjects with anemia associated with CKD
PHI116099 | 99 | 76 | A 4-week phase II study to evaluate the safety, efficacy, and PK of GSK1278863 in Japanese HDD subjects with anemia associated with CKD
* PGx subjects with genotype data available.
# Starting GSK1278863 dose was based on data from previous studies, as well as baseline Hgb concentration.
2.4. Statistical Hypotheses
This PGx study is designed to evaluate genetic effects on differential response to GSK1278863 via an exploratory analysis using a hypothesis-free GWAS approach.
Null hypothesis: None of the genome-wide genetic variants analyzed are associated with differential response to GSK1278863
3. SAMPLE SIZE CONSIDERATIONS AND POWER ESTIMATES FOR THE GENETIC ANALYSIS
Note that these clinical studies were not designed with power considerations for genetic analyses. All analyses described herein are exploratory. Power calculations are not needed prior to this exploratory GWAS analysis where potential effect size estimates are unknown.
4. GENETIC ANALYSIS POPULATIONS
PGx analysis: Change in hemoglobin or reticulocytes
Definition / Criteria for PGx analysis population: Subjects have met all the criteria:
Provided PGx consent and a PGx sample
Successfully genotyped and passed QC
Treated with randomized, fixed dose of GSK1278863 for four weeks and had lab measurements of Hgb and/or reticulocytes available to determine change from baseline
Clinical studies included: PHI116581, PHI116582, PHI112844, PHI113633, PHI116099

PGx analysis: Final dose
Definition / Criteria for PGx analysis population: Subjects have met all the criteria:
Provided PGx consent and a PGx sample
Successfully genotyped and passed QC
Treated with GSK1278863 for 24 weeks and have final dose information available
Clinical studies included: PHI113633, PHI113747
5. CONSIDERATIONS FOR DATA ANALYSES
It is anticipated that after an initial review of the results of the analyses described here, there may be a need for additional follow-up analyses. These will be discussed, defined, and agreed to by the authors of this RAP and other relevant parties at that time. This RAP will not be updated to include any additional follow-up analyses, but these will be described in the PGx study report.
Genetic Variants: Genome-wide genetic variants included in this analysis are SNPs genotyped directly via Affymetrix Axiom® Biobank Genotyping Arrays with GSK’s modification v1 (PHI116581, PHI116582, PHI112844) or v2 (PHI113633, PHI113747, PHI116099), and variants imputed using haplotype reference panels from the 1000 Genomes (1000G) Project.
Variants with an imputation quality R^2 > 0.3 and minor allele frequency > 0.01 will be analyzed.
Type I Error: Due to the exploratory nature of this analysis, no correction will be made for multiple endpoints.
An analysis-wide type I error of 0.05 will be maintained, accounting for the number of genetic variants analyzed. The threshold for declaring significance for the meta-analysis results will be 5x10^-8 (the conventional genome-wide significance threshold; see APPENDIX 5).
Examination of Subgroups: No subgroup analyses have been planned at this time.
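As an illustration only, the variant inclusion rule stated in this section (imputation quality R^2 above 0.3 and minor allele frequency above 0.01) could be applied as in the sketch below. The `Variant` record, its field names, and the example values are hypothetical and are not taken from the study data or pipeline.

```python
from dataclasses import dataclass

R2_MIN = 0.3    # imputation quality threshold stated in this section
MAF_MIN = 0.01  # minor allele frequency threshold stated in this section


@dataclass
class Variant:
    variant_id: str   # hypothetical identifier
    r2: float         # imputation quality R^2
    alt_freq: float   # alternate allele frequency, between 0 and 1


def minor_allele_frequency(alt_freq: float) -> float:
    """The minor allele is whichever allele is rarer."""
    return min(alt_freq, 1.0 - alt_freq)


def passes_filter(v: Variant) -> bool:
    """True when the variant meets both inclusion thresholds."""
    return v.r2 > R2_MIN and minor_allele_frequency(v.alt_freq) > MAF_MIN


# Made-up example records
variants = [
    Variant("var_a", 0.95, 0.20),
    Variant("var_b", 0.25, 0.30),   # fails the R^2 threshold
    Variant("var_c", 0.80, 0.995),  # minor allele frequency 0.005, fails
]
kept = [v.variant_id for v in variants if passes_filter(v)]
```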
6. DATA HANDLING CONVENTIONS
Table 1 provides an overview of appendices within this RAP for outlining data handling conventions.
Table 1 Overview of Appendices
Section Component
9.1 APPENDIX 1: Data Display Standards & Handling Conventions
9.2 APPENDIX 2: Derived and Transformed Data
9.3 APPENDIX 3: Premature Withdrawals & Handling of Missing Data
9.4 APPENDIX 4: Genotype/Subject Quality Control
7. GENETIC ANALYSES
7.1. Primary Analyses
+++ Primary Statistical Analysis +++
Endpoint / Covariates / Model Specification
Endpoints:
Change in Hemoglobin (g/dL) at 4 weeks
Change in Reticulocytes (10^12/L) at 4 weeks
Final dose (mg) at week 24
Potential Covariates: The following variables, which are known or potential covariates of one or more of the endpoint variables, may be evaluated in the statistical analyses as independent variables and/or individually assessed for association with genetic marker genotypes
Treatment Arm
Age (years)
BMI (kg/m2) at Baseline
Weight (kg) at Baseline
Gender
Prior rhEPO dose Stratification Factor (Low, High)
Disease status (non-dialysis, hemodialysis)
Baseline measurements for the specified endpoints
Self-reported race and ethnicity
Ancestry principal components (will be defined from available genetic data)
Model:
The association analysis will use a normal linear model for continuous endpoints, after appropriate transformation(s), if necessary. Genome-wide common variants will be tested assuming an additive genetic model. Genotype dosage will be used for imputed genetic variants. The reported results will include the estimated genetic effect with its corresponding standard error, and P-value. The analyses will include ancestry principal components in the model to adjust for potential confounding due to population structure [Price, 2006], and may include other potential covariates as noted above. Genomic control methods will be used to assess and correct for test statistic inflation within each analysis due to residual uncorrected population structure and relatedness [Devlin, 1999]. Regional association plots (APPENDIX 11) may be generated to visualize genetic associations in regions of interest, including regions where significant associations are observed and in genes involved in daprodustat metabolism, disposition, or mode of action (APPENDIX 14).
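A minimal sketch of the per-variant test described above, assuming ordinary least squares with the genotype dosage as an additive term and ancestry principal components as covariates. The function name, the simulated data, and the use of a normal approximation for the P-value are assumptions made for illustration; the plan does not name the software or the exact test statistic.

```python
import math
import numpy as np


def additive_association(y, dosage, covariates):
    """OLS of the endpoint on dosage plus covariates; returns (beta, se, p) for dosage."""
    y = np.asarray(y, dtype=float)
    g = np.asarray(dosage, dtype=float)
    c = np.asarray(covariates, dtype=float).reshape(len(y), -1)
    X = np.column_stack([np.ones(len(y)), g, c])  # intercept, dosage, covariates
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    dof = len(y) - X.shape[1]
    sigma2 = float(resid @ resid) / dof           # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)         # covariance of the estimates
    beta = float(coef[1])
    se = math.sqrt(cov[1, 1])
    # Assumption: two-sided P-value from a normal approximation to the Wald statistic
    p = math.erfc(abs(beta / se) / math.sqrt(2.0))
    return beta, se, p


# Simulated example: true additive effect 0.5 per allele, two ancestry PCs
rng = np.random.default_rng(0)
n = 200
pcs = rng.normal(size=(n, 2))
dosage = rng.uniform(0.0, 2.0, size=n)  # imputed dosage lies between 0 and 2
y = 0.5 * dosage + 0.3 * pcs[:, 0] + rng.normal(scale=0.5, size=n)
beta, se, p = additive_association(y, dosage, pcs)
```

Using dosage rather than a hard genotype call lets imputation uncertainty flow into the effect estimate.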
Analysis Population
PGx analysis populations (detailed in Section 4)
Genetic Variants
Genome-wide variants (described in Section 5)
Effects to be Modeled (Main or Interaction Effect; Dominant/Additive/Recessive Genetic Model)
Main effect of genotype on endpoint with an additive genetic model. Additional genetic models may be explored if warranted. Only treated subjects will be analyzed; no interaction effects will be analyzed.
Meta-Analysis
A fixed effect inverse variance weighted meta-analysis of the effect size estimate from each underlying study analyzed will be conducted for each endpoint. We anticipate there will likely be heterogeneity between studies and this may be further explored, especially in any significant results.
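The fixed effect inverse variance weighted estimate described above can be sketched as follows. The function name and the example inputs are hypothetical. Cochran's Q is included as one common way to quantify between-study heterogeneity; the plan does not specify which heterogeneity measure will be used.

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse variance weighted meta-analysis of per-study effect estimates."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    # Two-sided P-value from a normal approximation
    p = math.erfc(abs(beta / se) / math.sqrt(2.0))
    # Cochran's Q: weighted squared deviations from the pooled estimate
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    return beta, se, p, q


# Two made-up studies with equal standard errors
beta, se, p, q = fixed_effect_meta([0.10, 0.30], [0.10, 0.10])
```

With equal standard errors the pooled estimate is the simple average of the study estimates, and the pooled standard error shrinks by a factor of the square root of the number of studies.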
Statement Regarding What Constitutes a Significant Result
This is an exploratory PGx analysis. No correction for multiple endpoints will be applied and p-values will be reported for each meta-analysis. Thresholds for declaring statistical significance are detailed in Section 5. Any findings of interest from this analysis will require further evaluation and confirmation in an independent dataset. When more than one genetic variant is associated with a given endpoint, the number of distinct association signals will be determined using established approaches, including physical map positions of the variants, conditional analyses, and linkage disequilibrium [Yang, 2011].
7.1.1. Map of Analyses
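Analysis # | Meta-Analysis # | Endpoint | Study | Genotypes* | N with Genotypes
1 | 1 | Final dose (mg) at week 24 | PHI113633 | BB2 | 197
2 | 1 | Final dose (mg) at week 24 | PHI113747 | BB2 | 198
3 | 2 | Change in Hemoglobin (g/dL) at 4 weeks | PHI116581 | BB1 | 61
4 | 2 | Change in Hemoglobin (g/dL) at 4 weeks | PHI116582 | BB1 | 71
5 | 2 | Change in Hemoglobin (g/dL) at 4 weeks | PHI112844 | BB1 | 68
6 | 2 | Change in Hemoglobin (g/dL) at 4 weeks | PHI113633 | BB2 | 197
7 | 2 | Change in Hemoglobin (g/dL) at 4 weeks | PHI116099 | BB2 | 76
8 | 3 | Change in Reticulocytes (10^12/L) at 4 weeks | PHI116581 | BB1 | 61
9 | 3 | Change in Reticulocytes (10^12/L) at 4 weeks | PHI116582 | BB1 | 71
10 | 3 | Change in Reticulocytes (10^12/L) at 4 weeks | PHI112844 | BB1 | 68
11 | 3 | Change in Reticulocytes (10^12/L) at 4 weeks | PHI113633 | BB2 | 197
12 | 3 | Change in Reticulocytes (10^12/L) at 4 weeks | PHI116099 | BB2 | 76
* Genotyped on the Affymetrix Axiom® Biobank Genotyping Arrays with GSK’s modification v1 (BB1) or v2 (BB2)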
7.2. General Genetic Analysis Conventions
Table 2 provides an overview of appendices within the RAP for outlining general genetic analysis conventions.
Table 2 Overview of Appendices
Section Component
Section 9.5 APPENDIX 5: Multiple Comparisons & Multiplicity
Section 9.6 APPENDIX 6: Hardy-Weinberg (HW) Analysis
Section 9.7 APPENDIX 7: Linkage Disequilibrium Analysis
Section 9.8 APPENDIX 8: Characterizing Ancestry Using Principal Components Analysis
Section 9.9 APPENDIX 9: Genotype Imputation
Section 9.10 APPENDIX 10: Genomic Control
Section 9.11 APPENDIX 11: Independent Signal Identification
Section 9.12 APPENDIX 12: Reporting and Interpretation
8. REFERENCES
Devlin B, Roeder K. (1999) Genomic Control for Association Studies. Biometrics 55(4): 997-1004
Dudbridge F, Gusnanto A. (2008) Estimation of significance thresholds for genomewide association scans. Genetic Epidemiology 32:227-34.
GlaxoSmithKline Document Number RM2008/00267/07 Study ID GSK1278863. Investigator's Brochure. Report Date 14-OCT-2015.
Howie B, Fuchsberger C, Stephens M, Marchini J, Abecasis GR. (2012) Fast and accurate genotype imputation in genome-wide association studies through pre-phasing. Nature Genetics. 44:955-9.
Kutalik Z, Johnson T, Bochud M, Mooser V, Vollenweider P, Waeber G, Waterworth D, Beckmann JS, Bergmann S. (2011) Methods for testing association between uncertain genotypes and quantitative traits. Biostatistics 12:1-17.
Li Y, Willer C, Sanna S, Abecasis G. (2009) Genotype imputation. Annu Rev Genomics Hum Genet. 10:387-406.
Manichaikul A, Mychaleckyj JC, Rich SS, Daly K, Sale M, Chen WM. (2010) Robust relationship inference in genome-wide association studies. Bioinformatics 26(22):2867-73.
McCarthy MI, Abecasis GR, Cardon LR, Goldstein DB, Little J, Ioannidis JP, Hirschhorn JN. (2008) Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nature Reviews Genetics 9(5):356-69
Novembre J, Johnson T, Bryc K, Kutalik Z, Boyko AR, Auton A, Indap A, King KS, Bergmann S, Nelson MR, Stephens M, Bustamante CD. (2008) Genes mirror geography within Europe. Nature 456:98-101.
Patterson N, Price AL, Reich D. (2006) Population structure and eigenanalysis. PLoS Genet. 2(12): e190.
Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. (2006) Principal components analysis corrects for stratification in genome-wide association studies. Nature Genetics 38(8):904-909.
Yang J, Ferreira T, Morris AP, Medland SE; Genetic Investigation of Anthropometric Traits (GIANT) Consortium; DIAbetes Genetics Replication And Meta-analysis (DIAGRAM) Consortium, Madden PA, Heath AC, Martin NG, Montgomery GW, Weedon MN, Loos RJ, Frayling TM, McCarthy MI, Hirschhorn JN, Goddard ME, Visscher PM. (2011) Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits. Nature Genetics 44(4):369-375
Zheng X, Shen J, Cox C, Wakefield J, Ehm M, Nelson M, Weir BS. (2014) HIBAG - HLA genotype imputation with attribute bagging. Pharmacogenomics Journal 14:192-200.
9. APPENDICES
Appendix Number Appendix Description
RAP Section 6: Data Handling Conventions
Section 9.1 APPENDIX 1: Data Display Standards & Handling Conventions
Section 9.2 APPENDIX 2: Derived and Transformed Data
Section 9.3 APPENDIX 3: Premature Withdrawals & Handling of Missing Data
Section 9.4 APPENDIX 4: Genotype/Subject Quality Control
RAP Section 7.2: General Genetic Analysis Conventions
Section 9.5 APPENDIX 5: Multiple Comparisons & Multiplicity
Section 9.6 APPENDIX 6: Hardy-Weinberg (HW) Analysis
Section 9.7 APPENDIX 7: Linkage Disequilibrium Analysis
Section 9.8 APPENDIX 8: Characterizing Ancestry Using Principal Components Analysis
Section 9.9 APPENDIX 9: Genotype Imputation
Section 9.10 APPENDIX 10: Genomic Control
Section 9.11 APPENDIX 11: Independent Signal Identification
Section 9.12 APPENDIX 12: Reporting and Interpretation
Other RAP Appendices
Section 9.13 APPENDIX 13: Abbreviations & Trade Marks
Section 9.14 APPENDIX 14: A list of genes involved in daprodustat metabolism, disposition or mode of action
9.1. APPENDIX 1: Data Display Standards & Handling Conventions
The number of patients included in each analysis population will be summarized by endpoint and may be further characterized by baseline data. In general, categorical data will be summarized using frequency counts and percentages, and continuous data will be summarized using means, standard deviations, and percentiles (e.g. minimum, 1st quartile, median, 3rd quartile, and maximum). Summaries will be calculated for each analysis population overall and, if appropriate, in relevant subgroups.
9.2. APPENDIX 2: Derived and Transformed Data
Should the distribution of any dependent variable deviate substantially from that assumed for a particular analysis method, an appropriate transformation will be applied or a robust method used.
Of note, continuous endpoints may have distributions where the errors are sufficiently close to normal that analyses with balanced explanatory groups, such as comparisons of treatment groups, may use a normal linear model (or special cases such as ANOVA or a t-test). Yet, when the same continuous endpoint is used for GWAS analyses, where millions of tests are conducted and where many explanatory variables are highly imbalanced (genetic variants with low minor allele frequency), even a slight departure from the assumed normality of association test statistics may be sufficient to generate substantial numbers of false positive associations (albeit at an extremely low rate per test), eclipsing the signal from any true positive associations. Therefore, GWAS analyses may require more aggressive transformations, such as a normal quantile transform, that were not required for the primary clinical analyses.
When significant associations with specific genetic variants are identified using aggressive transformations, additional analyses may be conducted using simpler transformations (such as logarithmic or square root), in order to obtain more clinically interpretable effect size estimates.
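One way the normal quantile transform mentioned in this appendix could be implemented is the rank-based sketch below. The function name and the rank offset of 0.5 are assumptions; other offsets are also in common use and the plan does not specify one.

```python
from statistics import NormalDist


def normal_quantile_transform(values, offset=0.5):
    """Rank-based inverse normal transform; tied values share their average rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    nd = NormalDist()
    # Map each rank to a probability strictly inside (0, 1), then to a z-score
    return [nd.inv_cdf((r - offset) / n) for r in ranks]


transformed = normal_quantile_transform([3.0, 1.0, 2.0])
```

The transform keeps the ordering of subjects but discards the original scale, which is why the appendix notes that simpler transformations may be used afterwards to obtain interpretable effect sizes.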
9.3. APPENDIX 3: Premature Withdrawals & Handling of Missing Data
9.3.1. Premature Withdrawals
Patients who withdrew consent for the optional genetics research component of the clinical studies prior to genetics consent reconciliation for this genetics study are not included in this analysis.
9.3.2. Handling of Missing Data in Statistical Analysis
Missing data points will not be imputed and subjects missing specific endpoint data will be excluded from those analyses.
9.3.3. Handling of Missing Genetic Data
The endpoint, covariates, key demographic/baseline variables, and time on study may be compared between the subsets of individuals included in the genetic analysis population and the subset of individuals excluded (due to lack of consent, or failure to provide a sample, or failure of genotyping or QC). Appropriate summary statistics for each variable may be inspected for any concerning imbalances. If any imbalances that may affect the analysis are identified, these factors may be explored further and/or accounted for in the analysis models.
9.4. APPENDIX 4: Genotype/Subject Quality Control
9.4.1. Subject Quality Control
Subjects will be excluded according to the following criteria: (i) subjects with arrays where genotyping failed, as identified in the manufacturer’s genotype calling software and following the manufacturer’s guidelines; (ii) subjects with low call rate (threshold to be determined based on the data); (iii) subjects for whom sex inferred from sex chromosome genotypes cannot be reconciled with sex recorded on the CRF (e.g. sample swap); (iv) subjects with identical genotypes (e.g. identical twins, multiple participation by the same individual, or sample plating errors); (v) subjects with a high degree of cryptic relatedness. Following subject exclusions and before the statistical analysis, SNP exclusions will be applied as part of genotype imputation as described in Section 9.4.2.
Cryptic relatedness refers to a situation where multiple individuals in a study sample are genetically related to one another, which if present to a substantial degree could bias analysis results. A robust algorithm for relationship inference [Manichaikul, 2010] will be used to check family relationships by estimating kinship coefficients for all pairwise relationships. For pairs of DNA samples that have a 3rd-degree or closer relationship, one sample in each pair will be excluded from the analysis.
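The relatedness exclusion could be sketched as below. Two assumptions are made here: the kinship cut-off of 0.0442, which is the lower bound for 3rd-degree relationships in the convention used by the KING software, and a greedy rule that drops the sample involved in the most related pairs first. The plan only states that one sample in each pair will be excluded; the sample IDs are made up.

```python
THIRD_DEGREE_MIN_KINSHIP = 0.0442  # assumed cut-off (KING convention for 3rd degree)


def samples_to_exclude(kinship_pairs, threshold=THIRD_DEGREE_MIN_KINSHIP):
    """kinship_pairs: iterable of (sample_a, sample_b, kinship_coefficient)."""
    related = [(a, b) for a, b, k in kinship_pairs if k >= threshold]
    excluded = set()
    while True:
        # Related pairs in which both samples are still retained
        remaining = [(a, b) for a, b in related
                     if a not in excluded and b not in excluded]
        if not remaining:
            break
        counts = {}
        for a, b in remaining:
            counts[a] = counts.get(a, 0) + 1
            counts[b] = counts.get(b, 0) + 1
        # Greedy choice (an assumption): drop the sample in the most pairs;
        # sorting first makes tie-breaking deterministic
        worst = max(sorted(counts), key=lambda s: counts[s])
        excluded.add(worst)
    return excluded


pairs = [("S1", "S2", 0.25), ("S2", "S3", 0.06), ("S4", "S5", 0.01)]
excluded = samples_to_exclude(pairs)
```

In this example S2 is related to both S1 and S3, so removing S2 alone resolves both pairs, while S4 and S5 fall below the cut-off and are both retained.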
9.4.2. Genotype Quality Control
Prior to genotype imputation (see Section 9.9), variants in each GWAS dataset will be excluded if they have low call rate, if they have poor calling metrics, if they show deviations from Hardy-Weinberg proportions within subgroups of any given ancestry, if they are monomorphic, or if they show gross and irreconcilable differences in alleles or allele frequency with reference panel genotypes from the HapMap or 1000 Genomes projects. After phasing and alignment, QC metrics will be examined to identify strand flip errors (e.g. correlation between measured and imputed genotype close to r=-1) and if necessary these variants will be removed. Post-imputation, there will be no missing genotype data. Variants will not be excluded post-imputation on the basis of minor allele frequency/count or imputation quality metrics, unless inspection of association statistic QQ and Manhattan plots suggests excess false positive associations [Kutalik, 2011].
9.5. APPENDIX 5: Multiple Comparisons & Multiplicity
9.5.1. GWAS analysis
The conventional P ≤ 5x10^-8 threshold for declaring genome-wide significance for common variants (MAF ≥ 1%) will be used [McCarthy, 2008; Dudbridge, 2008]. No correction will be made for multiple endpoints or for any exploratory analyses.
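For orientation, the conventional threshold corresponds to a Bonferroni correction of a 0.05 family-wise error rate across roughly one million independent common-variant tests. The count of one million is the usual rule of thumb and is an assumption here, not a number derived from this dataset.

```python
FAMILY_WISE_ALPHA = 0.05
ASSUMED_INDEPENDENT_TESTS = 1_000_000  # rule-of-thumb assumption, not study-specific

GENOME_WIDE_THRESHOLD = FAMILY_WISE_ALPHA / ASSUMED_INDEPENDENT_TESTS


def is_genome_wide_significant(p_value, threshold=GENOME_WIDE_THRESHOLD):
    """True when the P-value is at or below the genome-wide threshold."""
    return p_value <= threshold
```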
9.6. APPENDIX 6: Hardy-Weinberg (HW) Analysis
Hardy-Weinberg (HW) proportions is a historic term for the notion that alleles are inherited from each parent independently, and thus expected genotype frequencies can be predicted from allele frequencies. Departure from HW proportions can have several causes, including genotyping error, and admixture of subjects with different ancestries. HW analysis will be conducted for all genotyped variants and will be conducted within race and ethnicity groups that have sufficient sample sizes. For variants significantly associated with any endpoint, substantial evidence of departure from HW proportions will be investigated for possibility of genotyping error (e.g. by manual examination of cluster plots, and by examination of variants that should be in linkage disequilibrium with the focal variant).
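A minimal sketch of one way to test departure from HW proportions, assuming a 1 degree of freedom chi-square goodness-of-fit test on genotype counts. The plan does not state which test will be used (an exact test is a common alternative), and the function name and counts are made up. The function is not defined for monomorphic variants, where an expected count is zero.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 d.f.) of observed genotype counts against HW proportions."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail probability of a chi-square distribution with 1 d.f.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


chi2_ok, p_ok = hwe_chi_square(25, 50, 25)     # counts exactly at HW proportions
chi2_bad, p_bad = hwe_chi_square(40, 20, 40)   # strong deficit of heterozygotes
```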
9.7. APPENDIX 7: Linkage Disequilibrium Analysis
Linkage Disequilibrium (LD) measures the association between alleles at different loci. It can help in understanding if association signals in the same region are independent from each other or due to correlation among the variants. LD analysis (measured as D' and r^2) may be conducted for interesting variants, if appropriate, using subjects from the population of interest. Pairwise LD will be limited to variants located within a particular gene or gene region of interest.
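The two LD measures named above can be computed from haplotype and allele frequencies as in this sketch. It assumes phased haplotype frequencies are available; the function and argument names are hypothetical, and D' is reported as an absolute value.

```python
def ld_measures(p_ab, p_a, p_b):
    """Return (|D'|, r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and B at locus 2
    p_a, p_b: frequencies of allele A and allele B (both strictly between 0 and 1)
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r2


# Allele A (frequency 0.2) occurs only on haplotypes that carry allele B (frequency 0.5)
d_prime, r2 = ld_measures(p_ab=0.2, p_a=0.2, p_b=0.5)
```

The example gives D' of 1 with r^2 of 0.25, which shows why both measures are usually reported: D' reflects the absence of one haplotype, while r^2 reflects how well one variant predicts the other.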
9.8. APPENDIX 8: Characterizing Ancestry Using Principal Components Analysis
Principal component analysis (PCA) of large numbers of genetic variants (typically genome-wide) can be used to characterize ancestry for each genotyped subject [Price, 2006; Patterson, 2006; Novembre, 2008]. The principal components may be used as covariates in tests of genetic association (e.g. regression of an endpoint onto each individual genetic variant in turn), to correct for confounding due to population stratification [Price, 2006]. All subjects will be analyzed by study and PCs will be used to adjust for differences in ancestry. Further clustering based on the principal components may also be used to refine self-reported race and ethnicity to facilitate investigation of genetic effects specific to certain ancestry groups.
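A small sketch of how ancestry principal components could be derived from a genotype matrix, assuming allele counts coded 0, 1, 2 and a commonly used allele-frequency normalization followed by a singular value decomposition. The function name and the simulated two-group data are illustrative assumptions; real analyses would typically use dedicated software and LD-pruned variants.

```python
import numpy as np


def ancestry_pcs(genotypes, n_components=2):
    """genotypes: subjects x variants array of allele counts (0, 1, 2)."""
    g = np.asarray(genotypes, dtype=float)
    freq = g.mean(axis=0) / 2.0
    keep = (freq > 0) & (freq < 1)  # monomorphic variants carry no information
    g, freq = g[:, keep], freq[keep]
    # Center each variant and scale by its expected binomial standard deviation
    z = (g - 2.0 * freq) / np.sqrt(2.0 * freq * (1.0 - freq))
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_components] * s[:n_components]  # subject scores on the top PCs


# Simulated example: two groups of 20 subjects with different allele frequencies
rng = np.random.default_rng(1)
group_a = rng.binomial(2, 0.2, size=(20, 200))
group_b = rng.binomial(2, 0.8, size=(20, 200))
pcs = ancestry_pcs(np.vstack([group_a, group_b]))
pc1_a, pc1_b = pcs[:20, 0], pcs[20:, 0]
```

In this simulation the first component separates the two groups completely, which is the behaviour that makes the components useful as covariates for population structure.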
9.9. APPENDIX 9: Genotype Imputation
Genotype imputation for genetic variants that were not directly genotyped (“untyped variants”) will be performed using a cosmopolitan haplotype reference panel from the 1000 Genomes Project, and using Hidden Markov Model methods as implemented in MaCH and minimac [Li, 2009; Howie, 2012]. Subject and SNP exclusions will be applied prior to imputation. Directly genotyped variants that are not present in the reference panel will be converted to an imputation format and included in analysis.
Of note, for each genetic variant that was directly genotyped, called genotypes will typically not be available for a small fraction of subjects (e.g. <3% of subjects when the call rate QC threshold is 97%), and some genotypes will be called in error. Genotype imputation provides information about no-call genotypes, and may also alter a small fraction of called genotypes when likely errors can be detected using linkage disequilibrium with other genetic variants.
HLA genotype imputation will be performed using the HIBAG algorithm and published parameter estimates [Zheng, 2014].
9.10. APPENDIX 10: Genomic Control
Due to the presence of uncorrected population structure or subject relatedness, the association between genotype and endpoint may be confounded and association test statistics inflated. Under likely scenarios for population structure or subject relatedness, the genome-wide distribution of association chi-square statistics or standard errors may be inflated by a constant multiplicative factor [Devlin, 1999]. The genomic control inflation factor λ is defined as the ratio of the median of the empirically observed distribution of chi-square test statistics to the expected median (0.455 for a 1 d.f. test). The value of λ thus quantifies the extent of the test statistic inflation.
Due to the possibility of other causes of test statistic inflation for genetic variants that have low minor allele frequency or are poorly imputed, the genomic control inflation factor λ will be estimated using the empirical distribution of chi-square test statistics for a subset of genetic variants with minor allele frequency ≥1% and imputation efficiency ≥0.3.
Test statistics will be corrected genome-wide (including for genetic variants not used in the estimation of λ) by multiplying the SE by √λ and dividing the LRT statistic by λ. P-values will then be calculated using the (unchanged) effect size estimate along with the corrected SE and LRT statistic.
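The estimation and correction steps above can be sketched as follows for 1 d.f. tests. This is a minimal illustration under stated assumptions, not the analysis code: the function names, the dictionary keys (`chisq`, `maf`, `info`, with `info` standing in for imputation efficiency), and the use of a two-sided normal approximation for the Wald P-value are all choices made for this example.

```python
import math
from statistics import NormalDist, median

# Median of a 1 d.f. chi-square distribution (approximately 0.455),
# obtained as the square of the 75th percentile of a standard normal.
EXPECTED_MEDIAN_1DF = NormalDist().inv_cdf(0.75) ** 2

def estimate_lambda(variants, min_maf=0.01, min_info=0.3):
    """Estimate lambda from well-behaved variants only (MAF >= 1%, info >= 0.3)."""
    kept = [v["chisq"] for v in variants
            if v["maf"] >= min_maf and v["info"] >= min_info]
    return median(kept) / EXPECTED_MEDIAN_1DF

def gc_correct(beta, se, lrt, lam):
    """Apply genomic control to one variant; the effect estimate is left unchanged."""
    se_adj = se * math.sqrt(lam)      # SE multiplied by sqrt(lambda)
    lrt_adj = lrt / lam               # LRT statistic divided by lambda
    # Upper-tail P-values for a 1 d.f. chi-square via the complementary error function.
    p_wald = math.erfc(abs(beta / se_adj) / math.sqrt(2))
    p_lrt = math.erfc(math.sqrt(lrt_adj / 2))
    return se_adj, lrt_adj, p_wald, p_lrt
```

Note that `gc_correct` is applied to every variant, including those excluded from `estimate_lambda`, consistent with the genome-wide correction described above.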
9.11. APPENDIX 11: Independent Signal Identification
When more than one genetic variant is significantly associated with a given endpoint, the number of distinct association signals may be determined using physical map positions of the variants, conditional analyses, and/or linkage disequilibrium.
Significantly associated variants that are physically 200kb or closer, on the human reference sequence used for the analysis, will be recursively grouped into “associated regions”. This approach allows a potentially large number of significantly associated variants to be visualised in a smaller number of distinct regional association plots. Each region will be characterized by the number of significantly associated variants it contains, the characteristics of the index variant (defined as the variant with the smallest P-value in that associated region), and summary characteristics of the significantly associated variants in the region.
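A minimal sketch of this distance-based grouping is given below. It assumes the input is a list of already-significant variants given as `(chromosome, position, p_value)` tuples with chromosome labels as strings; the function name `group_regions` and the output fields are hypothetical. Because each variant is compared with its nearest neighbour after sorting, variants chain together, so a region can span more than 200kb in total.

```python
def group_regions(variants, max_gap=200_000):
    """Group significant variants into associated regions by physical distance.

    variants: iterable of (chrom, pos, p_value) tuples (illustrative format).
    """
    regions = []
    for v in sorted(variants, key=lambda v: (v[0], v[1])):
        chrom, pos, _ = v
        last = regions[-1][-1] if regions else None
        # Extend the current region when the next variant lies on the same
        # chromosome and within max_gap of the previous variant.
        if last is not None and last[0] == chrom and pos - last[1] <= max_gap:
            regions[-1].append(v)
        else:
            regions.append([v])

    return [
        {
            "chrom": r[0][0],
            "start": r[0][1],
            "end": r[-1][1],
            "n_variants": len(r),
            # Index variant: smallest P-value in the region.
            "index_variant": min(r, key=lambda v: v[2]),
        }
        for r in regions
    ]
```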
Conditional analyses will be conducted for multiple genetic variants, using the same model as used for analyses of single genetic variants. For each endpoint, forwards model selection may be used to obtain a fitted model in which each genetic variant is included only if it is significantly associated with the endpoint (at the given threshold), conditional on other genetic variants included in the model. All genetic variants associated with the endpoint in single variant analyses (at the given threshold) will be analysed in this way.
Genetic variants that are not significant (at a given threshold) in single variant analyses may be significant (at the same threshold) in conditional analyses if the co-occurring alleles have opposite directions of true effect on the endpoint. Such variants may be detected by analysis of linkage disequilibrium, which approximates conditional analyses of all variants genome-wide using both forwards and backwards model selection [Yang, 2011]. When this approach is used, the genetic variants identified will subsequently be included in an exact conditional analysis as described above.
9.12. APPENDIX 12: Reporting and Interpretation
Genetic associations will be summarized by regression model effect size estimates and standard errors, adjusted for covariates. Effect size estimates and confidence interval endpoints may be transformed from the analysis scale to an alternative scale to facilitate interpretation. Associations may be displayed using an appropriate plot or table of endpoint versus genotype (such as dotplot or boxplot for continuous endpoints). Manhattan plots and Quantile-Quantile (QQ) plots may be used to visualize P-values at the whole-genome scale. Results may be annotated by whether the genetic variant was typed or imputed, and a metric for quality of imputation. Genotype or endpoint categories may be combined to generate 2x2 contingency tables where calculation of genotype test sensitivity, specificity, and positive or negative predictive values may facilitate interpretation.
All determinations of statistical significance will be subject to assessment of the sensitivity of the results to deviations from modelling assumptions.
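For the 2x2 contingency table summary mentioned above, the four metrics follow directly from the cell counts. The sketch below is illustrative only: it assumes genotype has been collapsed to carrier versus non-carrier and the endpoint to responder versus non-responder, and the function and argument names are hypothetical.

```python
def genotype_test_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from a 2x2 table.

    tp: carriers with the outcome        fp: carriers without the outcome
    fn: non-carriers with the outcome    tn: non-carriers without the outcome
    Returns None for a metric whose denominator is zero.
    """
    def ratio(num, den):
        return num / den if den > 0 else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }
```

Sensitivity and specificity describe the genotype test itself, whereas the predictive values also depend on how common the outcome is in the analysed population.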
9.13. APPENDIX 13: Abbreviations & Trade Marks
9.13.1. Abbreviations
Abbreviation Description
ADME absorption, distribution, metabolism, and excretion
ANOVA analysis of variance
CI confidence interval
CKD chronic kidney disease
CRF case report form
DNA deoxyribonucleic acid
EGLN egg-laying deficiency protein nine-like protein
EPO erythropoietin
GSK GlaxoSmithKline
GWAS genome-wide association study
HDD hemodialysis-dependent
Hgb hemoglobin
HLA human leukocyte antigen
HW Hardy-Weinberg
ITT intent to treat
kb kilo base pairs
LD linkage disequilibrium
LRT likelihood ratio test
MOA mode of action
NDD non-dialysis-dependent
PCA principal component analysis
PD pharmacodynamic
PGx pharmacogenetics
PHI prolyl hydroxylase inhibitor
PK pharmacokinetic
QC quality control
QQ quantile-quantile
RAP reporting and analysis plan
rhEPO recombinant human erythropoietin
SE standard error
SNP single nucleotide polymorphism
9.13.2. Trademarks
Trademarks of the GlaxoSmithKline Group of Companies: None
Trademarks not owned by the GlaxoSmithKline Group of Companies: None
9.14. APPENDIX 14: A list of genes involved in daprodustat metabolism, disposition or mode of action
Category                      Gene
Directly targeted             EGLN1, EGLN2, EGLN3
Erythropoiesis pathway        HIF1A, EPAS1, HIF3A, HIF1AN, VHL, EPO, EPOR
Metabolism and disposition    CYP2C8, ABCG2