
CONFIDENTIAL The GlaxoSmithKline group of companies 205878

Division : Worldwide Development

Information Type : Reporting and Analysis Plan (RAP)

Title : PGX7647 (205878): Exploratory Pharmacogenetic GWAS meta-analysis of efficacy response to GSK1278863 by using subjects with CKD from phase 2 studies; 116581, 116582, 112844, 113633, 113747, and 116099

Compound Number : GSK1278863

Effective Date : 16-AUG-2016

Description :

This document defines the analysis plan for eTrack study ID 205878. It describes the genetic analysis using data pooled from six clinical phase II studies and genome-wide germline genetic data to test for association between genetic variants and differential response to GSK1278863.

The study objective: Evaluate the effect of genome-wide genetic variation on differential response to GSK1278863

Keywords: Pharmacogenetics (PGx), analysis, genome-wide association study (GWAS), single nucleotide polymorphisms (SNPs), anemia, chronic kidney disease (CKD), GSK1278863, daprodustat, prolyl hydroxylase inhibitor (PHI), hemoglobin, reticulocyte, dose

Author’s Name and Functional Area:

12-AUG-2016 PAREXEL Genomic Medicine

12-AUG-2016 PAREXEL Genomic Medicine

Approved by:

16-AUG-2016 GSK Statistical Genetics, Target Sciences

Copyright 2016 the GlaxoSmithKline group of companies. All rights reserved. Unauthorised copying or use of this information is prohibited

PPD

PPD

PPD


Distribution List:

GSK Genetics, Target Sciences

GSK Genetics, Target Sciences

PAREXEL Genomic Medicine

PAREXEL Genomic Medicine

GSK Metabolic, Pulmonary Pathways and Cardiovascular

GSK QSci Clinical Statistics

GSK QSci Clinical Statistics

PPD


TABLE OF CONTENTS

1. GENETIC REPORTING & ANALYSIS PLAN SUMMARY

2. SUMMARY OF KEY INFORMATION
2.1. Introduction and Rationale
2.2. Study Objectives and Endpoints
2.3. Study Design
2.4. Statistical Hypotheses

3. SAMPLE SIZE CONSIDERATIONS AND POWER ESTIMATES FOR THE GENETIC ANALYSIS

4. GENETIC ANALYSIS POPULATIONS

5. CONSIDERATIONS FOR DATA ANALYSES

6. DATA HANDLING CONVENTIONS

7. GENETIC ANALYSES
7.1. Primary Analyses
7.1.1. Map of Analyses
7.2. General Genetic Analysis Conventions

8. REFERENCES

9. APPENDICES
9.1. APPENDIX 1: Data Display Standards & Handling Conventions
9.2. APPENDIX 2: Derived and Transformed Data
9.3. APPENDIX 3: Premature Withdrawals & Handling of Missing Data
9.3.1. Premature Withdrawals
9.3.2. Handling of Missing Data in Statistical Analysis
9.3.3. Handling of Missing Genetic Data
9.4. APPENDIX 4: Genotype/Subject Quality Control
9.4.1. Subject Quality Control
9.4.2. Genotype Quality Control
9.5. APPENDIX 5: Multiple Comparisons & Multiplicity
9.5.1. GWAS analysis
9.6. APPENDIX 6: Hardy-Weinberg (HW) Analysis
9.7. APPENDIX 7: Linkage Disequilibrium Analysis
9.8. APPENDIX 8: Characterizing Ancestry Using Principal Components Analysis
9.9. APPENDIX 9: Genotype Imputation
9.10. APPENDIX 10: Genomic Control
9.11. APPENDIX 11: Independent Signal Identification
9.12. APPENDIX 12: Reporting and Interpretation
9.13. APPENDIX 13: Abbreviations & Trade Marks
9.13.1. Abbreviations
9.13.2. Trademarks
9.14. APPENDIX 14: A list of genes involved in daprodustat metabolism, disposition or mode of action


1. GENETIC REPORTING & ANALYSIS PLAN SUMMARY

RAP Area / Description

Purpose: The purpose of this reporting and analysis plan (RAP) is to describe the planned analyses and output for the analysis of genetic associations with differential response to GSK1278863, using data from six phase II studies (PHI113633, PHI113747, PHI116581, PHI116582, PHI112844, and PHI116099) and a genome-wide association study (GWAS) approach.

Protocols: This is a non-interventional genetic study with no corresponding protocol. Samples used in this study were collected under studies PHI113633, PHI113747, PHI116581, PHI116582, PHI112844, and PHI116099.

Objective / Endpoint:

Objective: Evaluate the effect of genome-wide genetic variation on differential response to GSK1278863

Endpoints:

Change in Hemoglobin (g/dL) at 4 weeks

Change in Reticulocytes (10^12/L) at 4 weeks

Final dose (mg) at week 24

Study Design: The data used in this analysis will be derived from six phase II clinical studies: PHI113633, PHI113747, PHI116581, PHI116582, PHI112844, and PHI116099.

Analysis Populations:

PGx analysis population: Subjects who have given PGx consent and a PGx sample, have been successfully genotyped and passed the GWAS data quality control (QC), and have valid clinical endpoint data available.

Patients included in analyses of change from baseline endpoints will have received a fixed treatment dose of GSK1278863 for the 4 weeks being assessed.

Patients included in the analyses of final dose will have received 24 weeks of treatment.

Hypothesis: Null hypothesis: None of the genome-wide genetic variants analysed are associated with differential response to GSK1278863

Analyses: Clinical response endpoints and genome-wide (genotyped and imputed) variants

Linear regression of hemoglobin change from baseline at week 4

Linear regression of reticulocyte change from baseline at week 4

Linear regression of final dose at week 24

Analyses will be conducted by study using all subjects, with principal components included in the model to account for ancestry. Studies will be analyzed individually as they differ by several key factors including genotyping platform used, prior rhEPO treatment, and hemodialysis dependency. Meta-analyses will be conducted using effect estimates obtained from each study analyzed.


2. SUMMARY OF KEY INFORMATION

2.1. Introduction and Rationale

GSK1278863 (daprodustat) is a novel small molecule agent that has demonstrated the ability to stimulate erythropoiesis in preclinical models. GSK1278863 inhibits prolyl hydroxylases [egg-laying deficiency protein nine-like protein (EGLNs)] associated with hypoxia inducible factor (HIF). Inhibition of EGLNs prevents breakdown of HIFα. This activity results in the accumulation of HIFα transcription factors, which leads to increased transcription of HIF responsive genes. This biological activity simulates components of the natural response to exposure to hypoxia. Erythropoiesis is thus induced by increased production of natural erythropoietin (EPO) and enhanced iron metabolism.

Currently, oral daprodustat is being studied for treatment of anemia associated with CKD. Topical GSK1278863 is being studied for treatment of diabetic foot ulcers (GlaxoSmithKline Document Number RM2008/00267/07).

A previous focused pharmacogenetic study (201099, also known as PGx7532) assessed relationships between genetic variants in CYP2C8 and ABCG2 and pharmacokinetic (PK) variability of GSK1278863 using dose-normalized AUC. It also explored the effects of genetic variants in genes involved in drug absorption, distribution, metabolism and excretion (ADME), or related to mode of action (MOA), on pharmacodynamic (PD) variability of GSK1278863 using change in hemoglobin (Hgb). These analyses used subjects from 4 Phase 1 and 4 Phase 2 studies (PHI116581, PHI116582, PHI112844, PHI115385, PHI113634, PHI114703, PHI116008, and PVD114272). No genetic markers met the pre-specified criteria for association in either the PK or the PD PGx analyses.

This pharmacogenetic (PGx) study has been undertaken to evaluate the effects of genome-wide genetic variation on differential response in subjects with anemia from CKD receiving daprodustat in 6 PII studies. Note that 3 of these studies overlap with those used in the previous focused PGx analyses. One Phase 2 study used in the previous PGx analyses (PVD114272) was not included as it was conducted in subjects with peripheral artery disease.

2.2. Study Objectives and Endpoints

Objectives | Endpoints
Evaluate genetic effects on change in hemoglobin | Change in hemoglobin (g/dL) at 4 weeks
Evaluate genetic effects on change in reticulocytes | Change in reticulocytes (10^12/L) at 4 weeks
Evaluate genetic effects on dose selection | Final dose (mg) at week 24


2.3. Study Design

The data used in this analysis will be derived from PII studies of oral GSK1278863 administered at the doses specified in each study design. A brief description of each study is provided below:

Study | ITT | PGx subjects* | Description
PHI116581 | 74 | 61 | A 4-week phase II study to evaluate the safety, efficacy and PK of GSK1278863 in subjects with anemia associated with CKD who were not taking rhEPO and were NDD
PHI116582 | 86 | 68 | A 4-week phase II study to evaluate the safety, efficacy and PK of switching subjects from a stable dose of rhEPO to GSK1278863 in HDD subjects with anemia associated with CKD
PHI112844 | 107 | 71 | A phase IIA study of safety, PK, and efficacy of 28-day repeat oral doses of GSK1278863 in anemic pre-dialysis and HDD patients
PHI113633 | 217 | 197 | A phase IIB study to evaluate the dose response relationship of GSK1278863 over the first 4 weeks of treatment and evaluate the safety and efficacy of GSK1278863 over 24 weeks in HDD subjects with anemia associated with CKD who switched from rhEPO
PHI113747# | 252 | 198 | A 24-week phase IIB study to evaluate the safety and efficacy of GSK1278863 in NDD subjects with anemia associated with CKD
PHI116099 | 99 | 76 | A 4-week phase II study to evaluate the safety, efficacy, and PK of GSK1278863 in Japanese HDD subjects with anemia associated with CKD

*PGx subjects with genotype data available.
#Starting GSK1278863 dose was based on data from previous studies, as well as baseline Hgb concentration.

2.4. Statistical Hypotheses

This PGx study is designed to evaluate genetic effects on differential response to GSK1278863 via an exploratory analysis using a hypothesis-free GWAS approach.

Null hypothesis: None of the genome-wide genetic variants analyzed are associated with differential response to GSK1278863

3. SAMPLE SIZE CONSIDERATIONS AND POWER ESTIMATES FOR THE GENETIC ANALYSIS

Note that these clinical studies were not designed with power considerations for genetic analyses. All analyses described herein are exploratory. Power calculations are not needed prior to this exploratory GWAS analysis where potential effect size estimates are unknown.


4. GENETIC ANALYSIS POPULATIONS

PGx analysis: Change in hemoglobin or reticulocytes

Definition / Criteria for PGx analysis population: Subjects have met all the criteria:

Provided PGx consent and a PGx sample

Successfully genotyped and passed QC

Treated with randomized, fixed dose of GSK1278863 for four weeks and had lab measurements of Hgb and/or reticulocytes available to determine change from baseline

Clinical studies included: PHI116581, PHI116582, PHI112844, PHI113633, PHI116099

PGx analysis: Final dose

Definition / Criteria for PGx analysis population: Subjects have met all the criteria:

Provided PGx consent and a PGx sample

Successfully genotyped and passed QC

Treated with GSK1278863 for 24 weeks and have final dose information available

Clinical studies included: PHI113633, PHI113747


5. CONSIDERATIONS FOR DATA ANALYSES

It is anticipated that after an initial review of the results of the analyses described here, there may be a need for additional follow-up analyses. These will be discussed, defined, and agreed to by the authors of this RAP and other relevant parties at that time. This RAP will not be updated to include any additional follow-up analyses, but these will be described in the PGx study report.

Genetic Variants: Genome-wide genetic variants included in this analysis are SNPs genotyped directly via Affymetrix Axiom® Biobank Genotyping Arrays with GSK’s modification v1 (PHI116581, PHI116582, PHI112844) or v2 (PHI113633, PHI113747, PHI116099), and imputed by using haplotype reference panels from the 1000 Genomes (1000G) Project.

Variants with an imputation quality R^2 > 0.3 and minor allele frequency > 0.01 will be analyzed.
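The inclusion rule above can be sketched as a simple filter. This is an illustrative sketch only: the `Variant` record and its field names are hypothetical and are not taken from any analysis dataset or software used in this study.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical per-variant summary record (field names are assumptions)."""
    variant_id: str
    imputation_r2: float  # imputation quality R^2
    maf: float            # minor allele frequency


def passes_analysis_filter(v: Variant, r2_min: float = 0.3, maf_min: float = 0.01) -> bool:
    """Return True if the variant meets both strict thresholds stated in Section 5."""
    return v.imputation_r2 > r2_min and v.maf > maf_min


variants = [
    Variant("var_a", imputation_r2=0.95, maf=0.20),   # kept
    Variant("var_b", imputation_r2=0.25, maf=0.20),   # dropped: poorly imputed
    Variant("var_c", imputation_r2=0.80, maf=0.005),  # dropped: too rare
]
analyzed = [v.variant_id for v in variants if passes_analysis_filter(v)]
```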

Type I Error: Due to the exploratory nature of this analysis, no correction will be made for multiple endpoints.

An analysis-wide type I error of 0.05 will be maintained, accounting for the number of genetic variants analyzed. The threshold for declaring significance for the meta-analysis results will be 5×10^-8 (the conventional genome-wide significance threshold; see APPENDIX 5).

Examination of Subgroups: No subgroup analyses have been planned at this time.


6. DATA HANDLING CONVENTIONS

Table 1 provides an overview of appendices within this RAP for outlining data handling conventions.

Table 1 Overview of Appendices

Section Component

9.1 APPENDIX 1: Data Display Standards & Handling Conventions

9.2 APPENDIX 2: Derived and Transformed Data

9.3 APPENDIX 3: Premature Withdrawals & Handling of Missing Data

9.4 APPENDIX 4: Genotype/Subject Quality Control

7. GENETIC ANALYSES

7.1. Primary Analyses

Primary Statistical Analysis

Endpoint / Covariates / Model Specification

Endpoints:

Change in hemoglobin (g/dL) at 4 weeks

Change in reticulocytes (10^12/L) at 4 weeks

Final dose (mg) at week 24

Potential Covariates: The following variables, which are known or potential covariates of one or more of the endpoint variables, may be evaluated in the statistical analyses as independent variables and/or individually assessed for association with genetic marker genotypes:

Treatment Arm

Age (years)

BMI (kg/m^2) at Baseline

Weight (kg) at Baseline

Gender

Prior rhEPO dose Stratification Factor (Low, High)

Disease status (non-dialysis, hemodialysis)

Baseline measurements for the specified endpoints

Self-reported race and ethnicity

Ancestry principal components (will be defined from available genetic data)

Model:

The association analysis will use a normal linear model for continuous endpoints, after appropriate transformation(s), if necessary. Genome-wide common variants will be tested assuming an additive genetic model. Genotype dosage will be used for imputed genetic variants. The reported results will include the estimated genetic effect with its corresponding standard error, and P-value. The analyses will include ancestry principal components in the model to adjust for potential confounding due to population structure [Price, 2006], and may include other potential covariates as noted above. Genomic control methods will be used to assess and correct for test statistic inflation within each analysis due to residual uncorrected population structure and relatedness [Devlin, 1999]. Regional association plots (APPENDIX 11) may be generated to visualize genetic associations in regions of interest, including regions where significant associations are observed and in genes involved in daprodustat metabolism, disposition, or mode of action (APPENDIX 14).
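A minimal sketch of the per-variant additive model described above is given here for illustration: the endpoint is regressed on genotype dosage plus covariates (for example ancestry principal components), and the dosage effect, its standard error, and a P-value are returned. The function names are hypothetical, the P-value uses a normal approximation to the t distribution (an assumption made to keep the sketch dependency-free), and production analyses would normally rely on dedicated GWAS software.

```python
import math
from statistics import NormalDist


def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def additive_association(y, dosage, covariates):
    """Ordinary least squares of y on intercept + dosage + covariates.

    dosage: expected allele count per subject (0..2), usable for imputed variants.
    covariates: one list of covariate values per subject (e.g. ancestry PCs).
    Returns (beta, se, p_value) for the dosage term.
    """
    x = [[1.0, d] + list(c) for d, c in zip(dosage, covariates)]
    n, p = len(y), len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(p)]
    coef = solve_linear(xtx, xty)
    resid = [yi - sum(c * v for c, v in zip(coef, r)) for r, yi in zip(x, y)]
    sigma2 = sum(e * e for e in resid) / (n - p)  # residual variance
    unit = [0.0] * p
    unit[1] = 1.0
    inv_dd = solve_linear(xtx, unit)[1]  # (X'X)^-1 entry for the dosage term
    se = math.sqrt(sigma2 * inv_dd)
    z = coef[1] / se
    p_value = 2.0 * NormalDist().cdf(-abs(z))  # normal approximation (assumption)
    return coef[1], se, p_value
```

Because dosage enters as a single numeric term, the same function handles directly genotyped variants (values 0, 1, 2) and imputed variants (fractional values).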


Analysis Population

PGx analysis populations (detailed in Section 4)

Genetic Variants

Genome-wide variants (described in Section 5)

Effects to be Modeled (Main or Interaction Effect; Dominant/Additive/Recessive Genetic Model)

Main effect of genotype on endpoint with an additive genetic model. Additional genetic models may be explored if warranted. Only treated subjects will be analyzed; no interaction effects will be analyzed.

Meta-Analysis

A fixed effect inverse variance weighted meta-analysis of the effect size estimate from each underlying study analyzed will be conducted for each endpoint. We anticipate there will likely be heterogeneity between studies and this may be further explored, especially in any significant results.
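The fixed effect inverse variance weighted estimator can be sketched as follows. The function name is hypothetical. Cochran's Q is included as one common way to quantify between-study heterogeneity; the RAP does not specify which heterogeneity measure will be used, so that part is an assumption.

```python
import math
from statistics import NormalDist


def fixed_effect_meta(betas, ses):
    """Inverse variance weighted fixed effect meta-analysis for one variant.

    betas, ses: per-study effect estimates and standard errors.
    Returns (pooled_beta, pooled_se, p_value, cochran_q).
    """
    weights = [1.0 / se ** 2 for se in ses]  # w_i = 1 / SE_i^2
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    p_value = 2.0 * NormalDist().cdf(-abs(beta / se))
    # Cochran's Q: weighted squared deviations from the pooled estimate;
    # under homogeneity it is approximately chi-square with k-1 d.f.
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    return beta, se, p_value, q


pooled_beta, pooled_se, pooled_p, q_stat = fixed_effect_meta([0.2, 0.4], [0.1, 0.1])
```

With two equally precise studies the pooled estimate is the simple average of the study estimates, and the pooled standard error shrinks by a factor of √2.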

Statement Regarding What Constitutes a Significant Result

This is an exploratory PGx analysis. No correction for multiple endpoints will be applied and p-values will be reported for each meta-analysis. Thresholds for declaring statistical significance are detailed in Section 5. Any findings of interest from this analysis will require further evaluation and confirmation in an independent dataset. When more than one genetic variant is associated with a given endpoint, the number of distinct association signals will be determined using established approaches, including physical map positions of the variants, conditional analyses, and linkage disequilibrium [Yang, 2011].

7.1.1. Map of Analyses

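Analysis # | Meta-Analysis # | Endpoint | Study | Genotypes* | N with Genotypes
1 | 1 | Final dose (mg) at week 24 | PHI113633 | BB2 | 197
2 | 1 | Final dose (mg) at week 24 | PHI113747 | BB2 | 198
3 | 2 | Change in Hemoglobin (g/dL) at 4 weeks | PHI116581 | BB1 | 61
4 | 2 | Change in Hemoglobin (g/dL) at 4 weeks | PHI116582 | BB1 | 71
5 | 2 | Change in Hemoglobin (g/dL) at 4 weeks | PHI112844 | BB1 | 68
6 | 2 | Change in Hemoglobin (g/dL) at 4 weeks | PHI113633 | BB2 | 197
7 | 2 | Change in Hemoglobin (g/dL) at 4 weeks | PHI116099 | BB2 | 76
8 | 3 | Change in Reticulocytes (10^12/L) at 4 weeks | PHI116581 | BB1 | 61
9 | 3 | Change in Reticulocytes (10^12/L) at 4 weeks | PHI116582 | BB1 | 71
10 | 3 | Change in Reticulocytes (10^12/L) at 4 weeks | PHI112844 | BB1 | 68
11 | 3 | Change in Reticulocytes (10^12/L) at 4 weeks | PHI113633 | BB2 | 197
12 | 3 | Change in Reticulocytes (10^12/L) at 4 weeks | PHI116099 | BB2 | 76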

*Genotyped on the Affymetrix Axiom® Biobank Genotyping Arrays with GSK’s modification v1 (BB1) or v2 (BB2)

7.2. General Genetic Analysis Conventions

Table 2 provides an overview of appendices within the RAP for outlining general genetic analysis conventions.


Table 2 Overview of Appendices

Section Component

Section 9.5 APPENDIX 5: Multiple Comparisons & Multiplicity

Section 9.6 APPENDIX 6: Hardy-Weinberg (HW) Analysis

Section 9.7 APPENDIX 7: Linkage Disequilibrium Analysis

Section 9.8 APPENDIX 8: Characterizing Ancestry Using Principal Components Analysis

Section 9.9 APPENDIX 9: Genotype Imputation

Section 9.10 APPENDIX 10: Genomic Control

Section 9.11 APPENDIX 11: Independent Signal Identification

Section 9.12 APPENDIX 12: Reporting and Interpretation


8. REFERENCES

Devlin B, Roeder K. (1999) Genomic Control for Association Studies. Biometrics 55(4): 997-1004

Dudbridge F, Gusnanto A. (2008) Estimation of significance thresholds for genomewide association scans. Genetic Epidemiology 32:227-234.

GlaxoSmithKline Document Number RM2008/00267/07 Study ID GSK1278863. Investigator's Brochure. Report Date 14-OCT-2015.

Howie B, Fuchsberger C, Stephens M, Marchini J, Abecasis GR. (2012) Fast and accurate genotype imputation in genome-wide association studies through pre-phasing. Nature Genetics. 44:955-9.

Kutalik Z, Johnson T, Bochud M, Mooser V, Vollenweider P, Waeber G, Waterworth D, Beckmann JS, Bergmann S. (2011) Methods for testing association between uncertain genotypes and quantitative traits. Biostatistics 12:1-17.

Li Y, Willer C, Sanna S, Abecasis G. (2009) Genotype imputation. Annual Review of Genomics and Human Genetics 10:387-406.

Manichaikul A, Mychaleckyj JC, Rich SS, Daly K, Sale M, Chen WM. (2010) Robust relationship inference in genome-wide association studies. Bioinformatics 26(22):2867-73.

McCarthy MI, Abecasis GR, Cardon LR, Goldstein DB, Little J, Ioannidis JP, Hirschhorn JN. (2008) Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nature Reviews Genetics 9(5):356-69

Novembre J, Johnson T, Bryc K, Kutalik Z, Boyko AR, Auton A, Indap A, King KS, Bergmann S, Nelson MR, Stephens M, Bustamante CD. (2008) Genes mirror geography within Europe. Nature 456:98-101.

Patterson N, Price AL, Reich D. (2006) Population structure and eigenanalysis. PLoS Genetics 2(12): e190.

Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. (2006) Principal components analysis corrects for stratification in genome-wide association studies. Nature Genetics 38(8):904-909.

Yang J, Ferreira T, Morris AP, Medland SE; Genetic Investigation of Anthropometric Traits (GIANT) Consortium; DIAbetes Genetics Replication And Meta-analysis (DIAGRAM) Consortium, Madden PA, Heath AC, Martin NG, Montgomery GW, Weedon MN, Loos RJ, Frayling TM, McCarthy MI, Hirschhorn JN, Goddard ME, Visscher PM. (2011) Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits. Nature Genetics 44(4):369-375


Zheng X, Shen J, Cox C, Wakefield J, Ehm M, Nelson M, Weir BS. (2014) HIBAG - HLA genotype imputation with attribute bagging. Pharmacogenomics Journal 14:192-200.

9. APPENDICES

Appendix Number Appendix Description

RAP Section 6: Data Handling Conventions

Section 9.1 APPENDIX 1: Data Display Standards & Handling Conventions

Section 9.2 APPENDIX 2: Derived and Transformed Data

Section 9.3 APPENDIX 3: Premature Withdrawals & Handling of Missing Data

Section 9.4 APPENDIX 4: Genotype/Subject Quality Control

RAP Section 7.2: General Genetic Analysis Conventions

Section 9.5 APPENDIX 5: Multiple Comparisons & Multiplicity

Section 9.6 APPENDIX 6: Hardy-Weinberg (HW) Analysis

Section 9.7 APPENDIX 7: Linkage Disequilibrium Analysis

Section 9.8 APPENDIX 8: Characterizing Ancestry Using Principal Components Analysis

Section 9.9 APPENDIX 9: Genotype Imputation

Section 9.10 APPENDIX 10: Genomic Control

Section 9.11 APPENDIX 11: Independent Signal Identification

Section 9.12 APPENDIX 12: Reporting and Interpretation

Other RAP Appendices

Section 9.13 APPENDIX 13: Abbreviations & Trade Marks

Section 9.14 APPENDIX 14: A list of genes involved in daprodustat metabolism, disposition or mode of action


9.1. APPENDIX 1: Data Display Standards & Handling Conventions

The number of patients included in each analysis population will be summarized by endpoint and may be further characterized by baseline data. In general, categorical data will be summarized using frequency counts and percentages, and continuous data will be summarized using means, standard deviations, and percentiles (e.g. minimum, 1st quartile, median, 3rd quartile, and maximum). Summaries will be calculated for each analysis population overall and, if appropriate, in relevant subgroups.


9.2. APPENDIX 2: Derived and Transformed Data

Should the distribution of any dependent variable deviate substantially from that assumed for a particular analysis method, an appropriate transformation will be applied or a robust method used.

Of note, continuous endpoints may have distributions where the errors are sufficiently close to normal that analyses with balanced explanatory groups, such as comparisons of treatment groups, may use a normal linear model (or special cases such as ANOVA or a t-test). Yet, when the same continuous endpoint is used for GWAS analyses, where millions of tests are conducted and where many explanatory variables are highly imbalanced (genetic variants with low minor allele frequency), even a slight departure from the assumed normality of association test statistics may be sufficient to generate substantial numbers of false positive associations (albeit at an extremely low rate), eclipsing the signal from any true positive associations. Therefore, GWAS analyses may require more aggressive transformations, such as a normal quantile transform, that were not required for the primary clinical analyses.

When significant associations with specific genetic variants are identified using aggressive transformations, additional analyses may be conducted using simpler transformations (such as logarithmic or square root), in order to obtain more clinically interpretable effect size estimates.
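For illustration, a normal quantile (rank-based inverse normal) transform can be sketched as below. The function name is hypothetical, and the rank offset of 0.5, i.e. quantile (rank - 0.5)/n, is an assumption; other offsets are also in common use and the RAP does not specify one.

```python
from statistics import NormalDist


def normal_quantile_transform(values, offset=0.5):
    """Map values to standard normal quantiles of their (tie-averaged) ranks."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the block of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0  # 1-based average rank for the block
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    nd = NormalDist()
    return [nd.inv_cdf((r - offset) / n) for r in ranks]


transformed = normal_quantile_transform([3.0, 1.0, 2.0])
```

The transform depends only on ranks, so extreme outliers in the original endpoint no longer have leverage, at the cost of effect sizes that are expressed in standard deviation units rather than clinical units.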


9.3. APPENDIX 3: Premature Withdrawals & Handling of Missing Data

9.3.1. Premature Withdrawals

Patients who withdrew consent for the optional genetics research component of the clinical studies prior to genetics consent reconciliation for this genetics study are not included in this analysis.

9.3.2. Handling of Missing Data in Statistical Analysis

Missing data points will not be imputed and subjects missing specific endpoint data will be excluded from those analyses.

9.3.3. Handling of Missing Genetic Data

The endpoint, covariates, key demographic/baseline variables, and time on study may be compared between the subsets of individuals included in the genetic analysis population and the subset of individuals excluded (due to lack of consent, or failure to provide a sample, or failure of genotyping or QC). Appropriate summary statistics for each variable may be inspected for any concerning imbalances. If any imbalances that may affect the analysis are identified, these factors may be explored further and/or accounted for in the analysis models.


9.4. APPENDIX 4: Genotype/Subject Quality Control

9.4.1. Subject Quality Control

Subjects will be excluded according to the following criteria: (i) subjects with arrays where genotyping failed, as identified in the manufacturer’s genotype calling software and following manufacturer’s guidelines; (ii) subjects with low call rate (threshold to be determined based on the data); (iii) subjects for whom sex inferred from sex chromosome genotypes cannot be reconciled with sex recorded on the CRF (e.g. sample swap); (iv) subjects with identical genotypes (e.g. identical twins, multiple participation for same individual or sample plating errors); (v) subjects with a high degree of cryptic relatedness. Following subject exclusions and before the statistical analysis, SNP exclusions will be applied as part of genotype imputation as described in Section 9.4.2.

Cryptic relatedness refers to a situation where multiple individuals in a study sample are genetically related to one another, which if present to a substantial degree could bias analysis results. A robust algorithm for relationship inference [Manichaikul, 2010] will be used to check family relationships by estimating kinship coefficients for all pairs of subjects. For pairs of DNA samples that have a 3rd-degree or closer relationship, one sample in each pair will be excluded from the analysis.
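The pairwise exclusion step can be sketched as below. Two elements are assumptions that do not come from this RAP: the kinship cut-off of 0.0442 (commonly quoted as the lower bound for 3rd-degree relationships in kinship-based inference) and the greedy rule of removing the subject involved in the most related pairs first. The function name and subject identifiers are hypothetical.

```python
def samples_to_exclude(kinship_pairs, threshold=0.0442):
    """Choose subjects to drop so that no related pair remains.

    kinship_pairs: dict mapping (subject_a, subject_b) -> estimated kinship.
    threshold: pairs with kinship above this value count as related (assumption).
    """
    related = [pair for pair, phi in kinship_pairs.items() if phi > threshold]
    excluded = set()
    while True:
        remaining = [(a, b) for a, b in related
                     if a not in excluded and b not in excluded]
        if not remaining:
            return excluded
        # Count how many unresolved related pairs each subject appears in.
        counts = {}
        for a, b in remaining:
            counts[a] = counts.get(a, 0) + 1
            counts[b] = counts.get(b, 0) + 1
        # Greedy choice (assumption): drop the most connected subject;
        # ties are broken alphabetically for reproducibility.
        worst = max(sorted(counts), key=lambda s: counts[s])
        excluded.add(worst)


dropped = samples_to_exclude({
    ("S1", "S2"): 0.25,   # first-degree range
    ("S2", "S3"): 0.10,   # second-degree range
    ("S4", "S5"): 0.01,   # unrelated
})
```

In this example removing S2 alone resolves both related pairs, so fewer subjects are lost than if one member of each pair were dropped independently.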

9.4.2. Genotype Quality Control

Prior to genotype imputation (see Section 9.9), variants in each GWAS dataset will be excluded if they have low call rate, if they have poor calling metrics, if they show deviations from Hardy-Weinberg proportions within subgroups of any given ancestry, if they are monomorphic, or if they show gross and irreconcilable differences in alleles or allele frequency with reference panel genotypes from the HapMap or 1000 Genomes projects. After phasing and alignment, QC metrics will be examined to identify strand flip errors (e.g. correlation between measured and imputed genotype close to r=-1) and, if necessary, these variants will be removed. Post-imputation, there will be no missing genotype data. Variants will not be excluded post-imputation on the basis of minor allele frequency/count or imputation quality metrics, unless inspection of association statistic QQ and Manhattan plots suggests excess false positive associations [Kutalik, 2011].


9.5. APPENDIX 5: Multiple Comparisons & Multiplicity

9.5.1. GWAS analysis

The conventional P ≤ 5×10^-8 threshold for declaring genome-wide significance for common variants (MAF ≥ 1%) will be used [McCarthy, 2008; Dudbridge, 2008]. No correction will be made for multiple endpoints or for any exploratory analyses.
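One common way to read this threshold, shown here as a worked equation, is as a Bonferroni-style correction of the 0.05 analysis-wide type I error. The figure of roughly one million effectively independent common-variant tests is an assumption used for illustration and is not stated in this RAP.

```latex
\alpha_{\text{per-variant}}
  = \frac{\alpha_{\text{analysis-wide}}}{M_{\text{eff}}}
  = \frac{0.05}{10^{6}}
  = 5 \times 10^{-8}
```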


9.6. APPENDIX 6: Hardy-Weinberg (HW) Analysis

Hardy-Weinberg (HW) proportions is a historic term for the notion that alleles are inherited from each parent independently, and thus expected genotype frequencies can be predicted from allele frequencies. Departure from HW proportions can have several causes, including genotyping error, and admixture of subjects with different ancestries. HW analysis will be conducted for all genotyped variants and will be conducted within race and ethnicity groups that have sufficient sample sizes. For variants significantly associated with any endpoint, substantial evidence of departure from HW proportions will be investigated for possibility of genotyping error (e.g. by manual examination of cluster plots, and by examination of variants that should be in linkage disequilibrium with the focal variant).
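As an illustration of how departure from HW proportions can be tested for one biallelic variant, the sketch below computes a 1 d.f. chi-square goodness-of-fit statistic from genotype counts. The function name is hypothetical, and the choice of the chi-square test is an assumption; the RAP does not name a specific test, and exact tests are often preferred when genotype counts are small.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 d.f.) of Hardy-Weinberg proportions.

    n_aa, n_ab, n_bb: observed genotype counts for a biallelic variant.
    Returns (chi_square, p_value).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square with 1 d.f.: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


in_equilibrium = hwe_chi_square(25, 50, 25)    # counts match p^2, 2pq, q^2
no_heterozygotes = hwe_chi_square(50, 0, 50)   # extreme heterozygote deficit
```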


9.7. APPENDIX 7: Linkage Disequilibrium Analysis

Linkage Disequilibrium (LD) measures the association between alleles at different loci. It can help in understanding if association signals in the same region are independent from each other or due to correlation among the variants. LD analysis (measured as D' and r^2) may be conducted for interesting variants, if appropriate, using subjects from the population of interest. Pairwise LD will be limited to variants located within a particular gene or gene region of interest.
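The two LD measures can be sketched from haplotype and allele frequencies as follows. The function name is hypothetical, and the sketch assumes that phased haplotype frequencies are available and that both variants are polymorphic; in practice haplotype frequencies are usually estimated from unphased genotypes by dedicated software.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D_prime, r_squared) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and B at locus 2.
    p_a, p_b: frequencies of allele A and allele B (both strictly between 0 and 1).
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r_squared


complete_ld = ld_stats(p_ab=0.5, p_a=0.5, p_b=0.5)     # identical allele patterns
no_ld = ld_stats(p_ab=0.25, p_a=0.5, p_b=0.5)          # independent loci
unequal_freq = ld_stats(p_ab=0.2, p_a=0.5, p_b=0.2)    # D' = 1 but r^2 < 1
```

The third example shows why both measures are reported: when allele frequencies differ, D' can equal 1 while r^2 stays well below 1, so r^2 is the more direct guide to whether one variant can stand in for another in association testing.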


9.8. APPENDIX 8: Characterizing Ancestry Using Principal Components Analysis

Principal component analysis (PCA) of large numbers of genetic variants (typically genome-wide) can be used to characterize ancestry for each genotyped subject [Price, 2006; Patterson, 2006; Novembre, 2008]. The principal components may be used as covariates in tests of genetic association (e.g. regression of an endpoint onto each individual genetic variant in turn), to correct for confounding due to population stratification [Price, 2006]. All subjects will be analyzed by study and PCs will be used to adjust for differences in ancestry. Further clustering based on the principal components may also be used to refine self-reported race and ethnicity to facilitate investigation of genetic effects specific to certain ancestry groups.
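A minimal sketch of deriving the leading ancestry principal component from a subject-by-variant genotype matrix is given below. It is illustrative only: the function name is hypothetical, the standardization of each variant by its allele frequency and the use of power iteration are assumptions, and real analyses would use established PCA software on genome-wide data and retain several components.

```python
import math


def top_principal_component(genotypes, n_iter=200):
    """Return the leading principal component score for each subject.

    genotypes: list of per-subject lists of allele counts (0, 1, 2).
    """
    n, m = len(genotypes), len(genotypes[0])
    z = [[] for _ in range(n)]
    for j in range(m):
        col = [row[j] for row in genotypes]
        p = sum(col) / (2.0 * n)  # allele frequency
        if p <= 0.0 or p >= 1.0:
            continue  # monomorphic variants carry no information
        sd = math.sqrt(2.0 * p * (1.0 - p))
        for i in range(n):
            z[i].append((col[i] - 2.0 * p) / sd)
    # Subject-by-subject cross-product matrix of standardized genotypes.
    grm = [[sum(a * b for a, b in zip(z[i], z[k])) for k in range(n)]
           for i in range(n)]
    # Power iteration from a non-constant start vector converges to the
    # eigenvector with the largest eigenvalue.
    v = [float(i + 1) for i in range(n)]
    for _ in range(n_iter):
        w = [sum(grm[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Two subjects from each of two genetically distinct groups (toy data).
pc1 = top_principal_component([[0, 0, 2], [0, 1, 2], [2, 2, 0], [2, 1, 0]])
```

In the toy data the first two subjects receive scores of one sign and the last two the opposite sign, which is the separation that makes such components useful as covariates.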


9.9. APPENDIX 9: Genotype Imputation

Genotype imputation for genetic variants that were not directly genotyped (“untyped variants”) will be performed using a cosmopolitan haplotype reference panel from the 1000 Genomes Project, and using Hidden Markov Model methods as implemented in MaCH and minimac [Li, 2009; Howie, 2012]. Subject and SNP exclusions will be applied prior to imputation. Directly genotyped variants that are not present in the reference panel will be converted to an imputation format and included in analysis.

Of note, for each genetic variant that was directly genotyped, called genotypes will typically not be available for a small fraction of subjects (e.g. <3% of subjects when the call rate QC threshold is 97%), and some genotypes will be called in error. Genotype imputation provides information about no-call genotypes, and may also alter a small fraction of called genotypes when likely errors can be detected using linkage disequilibrium with other genetic variants.

HLA genotype imputation will be performed using the HIBAG algorithm and published parameter estimates [Zheng, 2014].

9.10. APPENDIX 10: Genomic Control

Due to the presence of uncorrected population structure or subject relatedness, the association between genotype and endpoint may be confounded and association test statistics inflated. Under likely scenarios for population structure or subject relatedness, the genome-wide distribution of association chi-square statistics or standard errors may be inflated by a constant multiplicative factor [Devlin, 1999]. The genomic control inflation factor λ is defined as the ratio of the median of the empirically observed distribution of chi-square test statistics to the expected median (0.455 for a 1 d.f. test). The value of λ thus quantifies the extent of the test statistic inflation.

Due to the possibility of other causes of test statistic inflation for genetic variants that have low minor allele frequency or are poorly imputed, the genomic control inflation factor λ will be estimated using the empirical distribution of chi-square test statistics for a subset of genetic variants with minor allele frequency ≥1% and imputation efficiency ≥0.3.

Test statistics will be corrected genome-wide (including for genetic variants not used in the estimation of λ) by multiplying the SE by √λ and dividing the LRT statistic by λ. P-values will then be calculated using the (unchanged) effect size estimate along with the corrected SE and LRT statistic.
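The estimation and correction steps described in this appendix can be sketched as follows. This is an illustrative sketch under stated assumptions: the record fields (`maf`, `info`, `se`, `chisq`) and function names are hypothetical, each test is taken to have 1 d.f., and the P-value shown is derived from the corrected LRT statistic only.

```python
import math
from statistics import median

CHI2_1DF_MEDIAN = 0.455  # expected median of a 1 d.f. chi-square statistic


def estimate_lambda(variants, min_maf=0.01, min_info=0.3):
    """Estimate lambda from variants passing the MAF and imputation filters."""
    eligible = [v["chisq"] for v in variants
                if v["maf"] >= min_maf and v["info"] >= min_info]
    return median(eligible) / CHI2_1DF_MEDIAN


def apply_genomic_control(variants, lam):
    """Correct every variant, including those not used to estimate lambda."""
    corrected = []
    for v in variants:
        se_gc = v["se"] * math.sqrt(lam)   # SE multiplied by sqrt(lambda)
        chisq_gc = v["chisq"] / lam        # LRT statistic divided by lambda
        # Upper-tail probability of a 1 d.f. chi-square statistic.
        p_gc = math.erfc(math.sqrt(chisq_gc / 2.0))
        corrected.append({**v, "se_gc": se_gc, "chisq_gc": chisq_gc, "p_gc": p_gc})
    return corrected


# Hypothetical records: the last two fail the MAF or imputation filter,
# so they are excluded from estimation but are still corrected.
example_variants = [
    {"id": "v1", "maf": 0.20, "info": 0.9, "se": 0.1, "chisq": 0.91},
    {"id": "v2", "maf": 0.10, "info": 0.8, "se": 0.2, "chisq": 0.455},
    {"id": "v3", "maf": 0.30, "info": 1.0, "se": 0.1, "chisq": 7.682917641388248},
    {"id": "v4", "maf": 0.001, "info": 0.9, "se": 0.5, "chisq": 50.0},
    {"id": "v5", "maf": 0.20, "info": 0.1, "se": 0.4, "chisq": 40.0},
]
lam = estimate_lambda(example_variants)
gc_results = apply_genomic_control(example_variants, lam)
```

The effect size estimate itself is left untouched; only its SE and the test statistic change.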

9.11. APPENDIX 11: Independent Signal Identification

When more than one genetic variant is significantly associated with a given endpoint, the number of distinct association signals may be determined using physical map positions of the variants, conditional analyses, and/or linkage disequilibrium.

Significantly associated variants that are physically 200kb or closer, on the human reference sequence used for the analysis, will be recursively grouped into “associated regions”. This approach allows a potentially large number of significantly associated variants to be visualised in a smaller number of distinct regional association plots. Each region will be characterized by the number of significantly associated variants contained, the characteristics of the index variant (defined as the variant with the smallest P-value in that associated region), and summary characteristics of the significantly associated variants in the region.
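One way to read the recursive grouping rule is as chaining along each chromosome: after sorting by position, a variant joins the current region if it lies within 200kb of the region's most recent member. The sketch below illustrates that reading; the function name and the record fields (`chrom`, `pos`, `p`) are hypothetical.

```python
def group_into_regions(hits, max_gap=200_000):
    """Group significant variants into associated regions.

    hits: list of dicts with keys "chrom", "pos" (base pairs) and "p".
    A region may span more than max_gap overall, because membership is
    chained through neighbouring variants.
    """
    regions = []
    for hit in sorted(hits, key=lambda h: (h["chrom"], h["pos"])):
        if (regions
                and regions[-1]["chrom"] == hit["chrom"]
                and hit["pos"] - regions[-1]["end"] <= max_gap):
            regions[-1]["variants"].append(hit)
            regions[-1]["end"] = hit["pos"]
        else:
            regions.append({"chrom": hit["chrom"], "start": hit["pos"],
                            "end": hit["pos"], "variants": [hit]})

    for region in regions:
        # Index variant: smallest P-value within the region.
        region["index"] = min(region["variants"], key=lambda h: h["p"])
        region["n_variants"] = len(region["variants"])
    return regions


# Hypothetical significant variants.
example_hits = [
    {"id": "vA", "chrom": "1", "pos": 100_000, "p": 1e-9},
    {"id": "vB", "chrom": "1", "pos": 250_000, "p": 1e-12},
    {"id": "vC", "chrom": "1", "pos": 440_000, "p": 1e-8},
    {"id": "vD", "chrom": "1", "pos": 900_000, "p": 1e-10},
    {"id": "vE", "chrom": "2", "pos": 100_000, "p": 1e-9},
]
example_regions = group_into_regions(example_hits)
```

Here the first three variants form one region spanning 340kb with vB as its index variant, while vD and vE each form a region of their own.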

Conditional analyses will be conducted for multiple genetic variants, using the same model as used for analyses of single genetic variants. For each endpoint, forwards model selection may be used to obtain a fitted model in which each genetic variant is included only if it is significantly associated with the endpoint (at the given threshold), conditional on other genetic variants included in the model. All genetic variants associated with the endpoint in single variant analyses (at the given threshold) will be analysed in this way.

Genetic variants that are not significant (at a given threshold) in single variant analyses may be significant (at the same threshold) in conditional analyses if the co-occurring alleles have opposite directions of true effect on the endpoint. Such variants may be detected by analysis of linkage disequilibrium, which approximates conditional analyses of all variants genome-wide using both forwards and backwards model selection [Yang, 2011]. When this approach is used, the genetic variants identified will be subsequently included in an exact conditional analysis as described above.

9.12. APPENDIX 12: Reporting and Interpretation

Genetic associations will be summarized by regression model effect size estimates and standard errors, adjusted for covariates. Effect size estimates and confidence interval endpoints may be transformed from the analysis scale to an alternative scale to facilitate interpretation. Associations may be displayed using an appropriate plot or table of endpoint versus genotype (such as dotplot or boxplot for continuous endpoints). Manhattan plots and Quantile-Quantile (QQ) plots may be used to visualize P-values at the whole-genome scale. Results may be annotated by whether the genetic variant was typed or imputed, and a metric for quality of imputation. Genotype or endpoint categories may be combined to generate 2x2 contingency tables where calculation of genotype test sensitivity, specificity, and positive or negative predictive values may facilitate interpretation.
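For the 2x2 case, the four summary measures follow directly from the cell counts. The sketch below is illustrative; it assumes the genotype has been collapsed to carrier versus non-carrier and the endpoint to responder versus non-responder, and the function and argument names are hypothetical.

```python
def genotype_test_metrics(carrier_resp, carrier_nonresp,
                          noncarrier_resp, noncarrier_nonresp):
    """Summarize a 2x2 genotype-by-endpoint table.

    "Positive test" is taken to mean carrying the genotype of interest and
    "positive outcome" to mean being a responder (assumed labels).
    """
    tp, fp = carrier_resp, carrier_nonresp
    fn, tn = noncarrier_resp, noncarrier_nonresp
    return {
        "sensitivity": tp / (tp + fn),  # responders who are carriers
        "specificity": tn / (tn + fp),  # non-responders who are non-carriers
        "ppv": tp / (tp + fp),          # carriers who are responders
        "npv": tn / (tn + fn),          # non-carriers who are non-responders
    }


# Hypothetical counts, for illustration only.
example_metrics = genotype_test_metrics(30, 20, 10, 140)
```

With these hypothetical counts the sensitivity is 0.75, the specificity 0.875, the positive predictive value 0.6, and the negative predictive value about 0.93.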

All determinations of statistical significance will be subject to assessment of the sensitivity of the results to deviations from modelling assumptions.

9.13. APPENDIX 13: Abbreviations & Trade Marks

9.13.1. Abbreviations

Abbreviation   Description
ADME           absorption, distribution, metabolism, and excretion
ANOVA          analysis of variance
CI             confidence interval
CKD            chronic kidney disease
CRF            case report form
DNA            deoxyribonucleic acid
EGLN           egg-laying deficiency protein nine-like protein
EPO            erythropoietin
GSK            GlaxoSmithKline
GWAS           genome-wide association study
HDD            hemodialysis-dependent
Hgb            hemoglobin
HLA            human leukocyte antigen
HW             Hardy-Weinberg
ITT            intent to treat
kb             kilobase pairs
LD             linkage disequilibrium
LRT            likelihood ratio test
MOA            mode of action
NDD            non-dialysis-dependent
PCA            principal component analysis
PD             pharmacodynamic
PGx            pharmacogenetics
PHI            prolyl hydroxylase inhibitor
PK             pharmacokinetic
QC             quality control
QQ             quantile-quantile
RAP            reporting and analysis plan
rhEPO          recombinant human erythropoietin
SE             standard error
SNP            single nucleotide polymorphism

9.13.2. Trademarks

Trademarks of the GlaxoSmithKline Group of Companies: None

Trademarks not owned by the GlaxoSmithKline Group of Companies: None

9.14. APPENDIX 14: A list of genes involved in daprodustat metabolism, disposition or mode of action

Category                     Gene
Directly targeted            EGLN1, EGLN2, EGLN3
Erythropoiesis pathway       HIF1A, EPAS1, HIF3A, HIF1AN, VHL, EPO, EPOR
Metabolism and disposition   CYP2C8, ABCG2
