the impact of multifactorial genetic disorders on …angus/papers/pt_phd.pdf · 2007-01-17 · 4.19...

THE IMPACT OF MULTIFACTORIAL GENETIC DISORDERS ON LONG-TERM INSURANCE By Pradip Tapadar Submitted for the Degree of Doctor of Philosophy at Heriot-Watt University on Completion of Research in the School of Mathematical and Computer Sciences January 2007. This copy of the thesis has been supplied on the condition that anyone who consults it is understood to recognise that the copyright rests with its author and that no quotation from the thesis and no information derived from it may be published without the prior written consent of the author or the university (as may be appropriate).


THE IMPACT OF MULTIFACTORIAL GENETIC

DISORDERS ON LONG-TERM INSURANCE

By

Pradip Tapadar

Submitted for the Degree of

Doctor of Philosophy

at Heriot-Watt University

on Completion of Research in the

School of Mathematical and Computer Sciences

January 2007.

This copy of the thesis has been supplied on the condition that anyone who consults it is understood to recognise that the copyright rests with its author and that no quotation from the thesis and no information derived from it may be published without the prior written consent of the author or the university (as may be appropriate).


I hereby declare that the work presented in this thesis was carried out by myself at Heriot-Watt University, Edinburgh, except where due acknowledgement is made, and has not been submitted for any other degree.

Pradip Tapadar (Candidate)

Professor Angus S. Macdonald (Supervisor)

Date


Contents

Acknowledgements
Abstract
Introduction

1 Genetics and Insurance
1.1 Introduction
1.2 Genes
1.3 Genetic Disorders and Insurance
1.3.1 Huntington’s Disease
1.3.2 Alzheimer’s Disease
1.3.3 Cancer
1.3.4 Cardiovascular disease
1.4 Genetics and Insurance Regulations
1.5 The UK Biobank Project
1.6 A UK Biobank Simulation Model

2 A Model for Heart Attack
2.1 Specification of the Model
2.2 The Heart Attack Transition Intensity
2.3 Mortality After First Heart Attacks
2.3.1 Literature Review
2.3.2 Data
2.3.3 Fitting a Parametric Function
2.3.4 Discussion of the Fitted Model
2.4 Mortality Before First Heart Attacks

3 Gene-Environment Interaction
3.1 Definition of Strata: A Simple Example
3.2 A Sample Realisation of UK Biobank
3.3 Epidemiological Analysis
3.4 An Actuarial Investigation
3.5 Premium Rating for Critical Illness Insurance
3.5.1 A Critical Illness Model
3.5.2 Premium Rating for Critical Illness Insurance


4 UK Biobank Simulation Results
4.1 Varying the Genetic and Environment Model
4.2 Outcomes of 1,000 Simulations: The Base Scenario
4.3 A Measure of Confidence
4.4 Results
4.5 Conclusions

5 Adverse Selection and Utility Theory
5.1 Risk and Insurance
5.2 Underwriting Risk
5.3 Multifactorial Disorders
5.4 Literature Review
5.5 Adverse Selection
5.6 Utility of Wealth
5.7 Coefficients of Risk-aversion
5.8 Families of Utility Functions
5.9 Estimates of Absolute and Relative Risk-aversion

6 Adverse Selection in a 2-state Insurance Model
6.1 A Simple Gene-environment Interaction Model
6.2 Single Premiums
6.3 Threshold Premium
6.4 The Additive Epidemiological Model
6.5 Immunity From Adverse Selection
6.6 The Multiplicative Epidemiological Model

7 Adverse Selection in a Critical Illness Insurance Model
7.1 A Heart Attack Model
7.2 Threshold Premium for Critical Illness Insurance
7.3 Premium Rates for Critical Illness Insurance
7.4 High Relative Risks
7.5 Conclusions

8 Conclusions
8.1 UK Biobank Simulation Study
8.2 Adverse Selection Issues

A Epidemiology
A.1 Introduction
A.2 Measuring risks
A.3 Models of Disease Association
A.4 Relative Risk and Odds Ratio
A.5 Analysis of Grouped Data
A.6 Analysis of Matched Studies
A.7 Effects of Combined Exposures


B Numerical Methods
B.1 Differential Equations
B.1.1 Introduction
B.1.2 Euler Method
B.1.3 Runge-Kutta Method
B.2 Random Numbers
B.2.1 Introduction
B.2.2 Uniform Deviates
B.2.3 The Transformation Method
B.2.4 The Rejection Method

References


List of Tables

2.1 Survival probabilities after first heart attack.
2.2 Parameter estimates.
2.3 Odds of dying within first 30 days, one year and two years following a first heart attack.
2.4 Adjusted odds ratios and the corresponding 95% confidence intervals of dying within first 30 days, one year and two years following a first heart attack according to Goldberg et al. (1998).
3.5 The factor ρs, in Equation (3.14), for each gene-environment combination.
3.6 The multipliers ks × ρuv for each stratum.
3.7 The true relative risks for each stratum, relative to the baseline ge stratum.
3.8 The simulated life histories of the first 20 (of 500,000) individuals showing their genders, exposure to environmental factors, genotypes and the times and types of all transitions made within 10 years.
3.9 Number of individuals in each state at the end of the 10-year follow-up period.
3.10 Odds ratios with respect to the ge stratum as baseline, based on a 1:5 matching strategy using all cases and 5-year age groups. Approximate 95% confidence intervals are shown in brackets. There were no cases among females age 45–49 in stratum GE.
3.11 The age-adjusted odds ratios calculated for both males and females.
3.12 The estimated multipliers cs for each stratum.
3.13 28-Day mortality rates, $q^{m}_{01}(x) = 1 - p^{m}_{01}(x)$, for males following heart attacks.
3.14 The true critical illness insurance premiums for different strata as a percentage of those for stratum ge.
3.15 The actuary’s estimated critical illness insurance premiums for different strata as a percentage of those for stratum ge.
4.16 The model parameters for different scenarios. Odds ratios are also shown.
4.17 The correlation matrix of the strata-specific premium rates for males aged 45 and policy term 15 years under the Base scenario, all cases included.
4.18 The correlation matrix of the premium ratings for males aged 45 and policy term 15 years under the Base scenario, all cases included.


4.19 The measure of overlap O for CI insurance premium ratings for males aged 45, with policy term 15 years, for different scenarios.
4.20 The measure of overlap O for CI insurance premium ratings for females aged 45 with policy term 15 years, for different scenarios.
4.21 The measure of overlap O for CI insurance premium ratings for males aged 45, with policy term 15 years, for different scenarios and a 1:1 matching strategy.
4.22 The number of simulations rejected due to the inability to calculate the odds ratios for a 1:1 matching strategy.
6.23 The relative risk k above which persons in stratum ge with initial wealth W = £100,000 will not buy insurance, using ω = 0.5 and an additive model.
6.24 The proportions ω exposed to each low-risk factor above which persons in the baseline stratum will buy insurance at the average premium regardless of the relative risk k, using different utility functions.
6.25 The relative risk k above which persons in stratum ge with initial wealth W = £100,000 will not buy insurance, using ω = 0.9 and an additive model.
6.26 The relative risk k above which persons in stratum ge with initial wealth W = £100,000 will not buy insurance, using ω = 0.9 and a multiplicative model.
7.27 The premium rates of critical illness contracts of duration 15 years.
7.28 P† for males, which solves Z(P) = 0, for different combinations of utility functions and losses, using initial wealth W = £100,000.
7.29 P† for females, which solves Z(P) = 0, for different combinations of utility functions and losses, using initial wealth W = £100,000.
7.30 The population average premium rate for CI insurance, P0, as if heart attack risk were absent (λ12 = 0).
7.31 The relative risk k above which males of different ages in stratum ge with initial wealth W = £100,000 will not buy critical illness insurance policies of term 15 years, where ω = 0.9.
7.32 The relative risk k above which females of different ages in stratum ge with initial wealth W = £100,000 will not buy critical illness insurance policies of term 15 years, where ω = 0.9.
7.33 The loss L0 in £,000 above which adverse selection cannot occur. Initial wealth W = £100,000.
7.34 q, the probability that a healthy person aged x has a heart attack before age x + t, for policy duration t = 15 years.
7.35 The proportions ω exposed to each low-risk factor above which persons in the baseline stratum will buy insurance at the average premium regardless of the relative risk k, using different utility functions, for males purchasing CI insurance.
7.36 The proportions ω exposed to each low-risk factor above which persons in the baseline stratum will buy insurance at the average premium regardless of the relative risk k, using different utility functions, for females purchasing CI insurance.
A.37 List of odds ratios obtained from the 2 × 4 table in Figure A.33.


A.38 Other measures based on the 2 × 4 table in Figure A.33.


List of Figures

2.1 A 4-state heart attack model.
2.2 The transition intensity of all first heart attacks, by gender.
2.3 Subset of the model in Figure 2.1 to study survival after heart attacks.
2.4 The plots of the data from Table 2.1.
2.5 The plots of $f(t) = 1/(1 + t^{a})$ against t for values of a = 0.25, 0.50, 1.00, 2.00, 4.00.
2.6 The plots of survival probabilities, P22(x, x+t), against duration after heart attacks for age-groups <55, 55–64, 65–74, 75–84, ≥85 years.
2.7 The plots of transition intensities, λ24(x, t), against duration after heart attacks for age-groups <55, 55–64, 65–74, 75–84, ≥85 years.
2.8 Graphs of λ24(x, t), assigned to representative ages for each age group, and the force of mortality of the ELT15 life tables.
2.9 The plots of survival probabilities of men aged 50, 60, 70, 80 and 90 following ELT15.
2.10 The plots of survival probabilities of women aged 50, 60, 70, 80 and 90 following ELT15.
2.11 The plots of survival probabilities, of individuals aged 50, 60, 70, 80 and 90, over the first 30 days after a first heart attack.
2.12 The plots of survival probabilities of individuals aged 50, 60, 70, 80 and 90, who survived the first 30 days after a first heart attack.
2.13 4-state heart attack model - Grouping of states.
2.14 A 2-state mortality model.
2.15 The graph of the integrand in Equation 2.11.
2.16 The graph of the integrand in Equation 2.13.
2.17 Transition intensities of non-heart-attack deaths plotted along with ELT15 for both males and females.
3.18 A full critical illness model for gender s.
4.19 Scatter plots of CI insurance premium rates for strata gE, Ge and GE versus that of ge under the Base scenario for males aged 45 and policy term 15 years.
4.20 The scatter plots of the premium ratings Ge/ge and GE/ge versus gE/ge and the corresponding density plots for males aged 45 and policy term 15 years under the Base scenario, all cases included.
4.21 Marginal densities of premium ratings in the Base scenario (males) with different numbers of cases in the case-control study.


4.22 The empirical cumulative distribution function of the premium ratings gE/ge, Ge/ge and GE/ge for males aged 45 and policy term 15 years under the Base scenario.
4.23 Marginal densities of premium ratings in different scenarios (males), with 5,000 cases in the case-control study.
5.24 Utility of wealth for a risk averse individual.
6.25 A two state model.
7.26 A full critical illness model.
7.27 The ratio of heart attack transition intensity to total critical illness transition intensity, by gender.
A.28 A schematic diagram of a case-control study.
A.29 A 2-state model.
A.30 A 2 × 2 table for stratum k with corresponding probabilities.
A.31 A 2 × 2 table with data for stratum k.
A.32 The types of table for each case-control pair in a 1:1 matching.
A.33 A 2 × 4 table with data for stratum k.
B.34 The Exp(1) density and the majorising function with δ = 0.10.
B.35 The Exp(1) density and the majorising function with δ = 0.01.
B.36 The N(0,1) density and the majorising function with δ = 0.10.
B.37 The N(0,1) density and the majorising function with δ = 0.01.
B.38 Density estimates based on the simulated 50,000 random deviates from Exp(1).
B.39 Density estimates based on the simulated 50,000 random deviates from N(0,1).


Acknowledgements

First of all, I would like to thank my supervisor, Professor Angus Macdonald, for his continuous support, guidance and encouragement for the entire duration of this project. However busy, he always found time to meet regularly and discuss my work. I found his constructive criticisms and eye for technical detail absolutely invaluable throughout the course of the study. I would also like to thank Dr Delme Pritchard for his suggestions and advice for the first half of the thesis.

This work was carried out at the Genetics and Insurance Research Centre at Heriot-Watt University. I would like to thank the sponsors for funding, and members of the Steering Committee for helpful comments. It has also been a pleasure to work with my colleagues: Lu Li, Tunde Akodu, Laura MacCalman and Tushar Chatterjee.

My parents and my sister have encouraged and inspired me to pursue knowledge to the best of my abilities all through my life. Without their love, support and guidance, I would not have come this far. A special thanks to Bruce Porteous, a guide and a friend, for his enthusiastic support.

Finally, I dedicate this thesis to my wife, Vaishnavi, for her unfailing love, patience and support. At my insistence, she has had to endure reading numerous versions of the thesis at various stages of development. She provided me unconditional support, without which this thesis would not have been a reality. Thank you.


Abstract

Rapid advances in genetic epidemiology and the setting up of large-scale cohort studies, like the UK Biobank project, have shifted the focus from severe, but rare, single-gene disorders to less severe, but common, multifactorial disorders. This will lead to the discovery of genetic risk factors for common diseases of major importance in insurance underwriting. Given this backdrop, we have two specific aims for this thesis. In the first half of the thesis (also the subject matter of Macdonald et al. (2006)), we analyse the impact of results emerging out of UK Biobank on the insurance industry. In the second half (subject matter of Macdonald and Tapadar (2006)), we consider the related adverse selection issues.

The UK Biobank project is a large-scale investigation of the combined effects of genotype and environmental exposures on the risk of common diseases. It is intended to recruit 500,000 subjects aged 40–69, to obtain medical histories and blood samples from them at outset, and to follow them up for at least 10 years. This will have a major impact on our knowledge of multifactorial genetic disorders, rather than the rare but severe single-gene disorders that have been studied to date. The question arises, what use may insurance companies make of this knowledge, particularly if genetic tests can identify persons at different risk? We describe here a simulation study of the UK Biobank project. We specify a simple hypothetical model of genetic and environmental influences on the risk of heart attack. A single simulation of UK Biobank consists of 500,000 life histories over 10 years; we suppose that case-control studies are carried out to estimate age-specific odds ratios, and that an actuary uses these odds ratios to parameterise a model of critical illness insurance. From a large number of such simulations we obtain sampling distributions of premium rates in different strata defined by genotype and environmental exposure. We conclude that the ability of such a study reliably to discriminate between different underwriting classes is limited, and depends on large numbers of cases being analysed.
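As a minimal illustration of the estimation step mentioned above, the following Python sketch computes an odds ratio and an approximate 95% confidence interval from a single unmatched 2 × 2 table. It is a deliberate simplification: the function name `odds_ratio`, the Woolf (log-scale) interval and the counts are assumptions made for illustration and are not taken from the thesis, whose tables refer to matched designs (for example a 1:5 matching strategy) and age-specific estimates.

```python
import math


def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Return (estimate, lower, upper) for an unmatched 2x2 table.

    The interval is the Woolf approximation: a normal interval on the
    log odds ratio, transformed back to the odds-ratio scale.
    """
    a, b, c, d = exposed_cases, unexposed_cases, exposed_controls, unexposed_controls
    if min(a, b, c, d) <= 0:
        # Empty cells make the ratio or its standard error undefined.
        raise ValueError("all four cells must be positive")
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - 1.96 * se_log)
    upper = math.exp(math.log(estimate) + 1.96 * se_log)
    return estimate, lower, upper


# Hypothetical counts, chosen only to exercise the function.
estimate, lower, upper = odds_ratio(60, 40, 150, 350)
print(f"OR = {estimate:.2f}, approx. 95% CI ({lower:.2f}, {upper:.2f})")
```

The `ValueError` branch reflects the practical point that an odds ratio cannot be calculated when a cell is empty, which is one reason small numbers of cases limit what such a study can discriminate.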

As is the situation now in many countries, if genetic information continues to be treated as private, adverse selection becomes possible. But it should occur only if the individuals at lowest risk obtain lower expected utility by purchasing insurance at the average price than by not insuring. We explore where this boundary may lie, using a simple 2 × 2 gene-environment interaction model of epidemiological risk, in a simplified 2-state insurance model and in a more realistic model of heart-attack risk and critical illness insurance. Adverse selection does not appear unless purchasers are relatively risk-seeking (compared with a plausible parameterisation) and insure a small proportion of their wealth; or unless the elevated risks implied by genetic information are implausibly high. In many cases adverse selection is impossible if the low-risk stratum of the population is large enough. These observations are strongly accentuated in the critical illness model by the presence of risks other than heart attack, and the constraint that differential heart-attack risks must agree with the overall population risk. We find no convincing evidence that adverse selection is a serious insurance risk, even if information about multifactorial genetic disorders remains private.
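The purchase decision described above can be sketched numerically. The Python sketch below assumes a single-period setting, full cover of a loss L, logarithmic utility and no interest or expenses. These choices, the function names and the loss and probability figures are illustrative assumptions rather than the thesis's parameterisation; only the initial wealth of £100,000 echoes a value quoted in the list of tables.

```python
import math


def threshold_premium(wealth, loss, prob):
    """Largest premium P with U(W - P) >= prob*U(W - L) + (1 - prob)*U(W).

    Uses U = log, so the indifference premium has a closed form.
    """
    expected_utility_uninsured = (
        prob * math.log(wealth - loss) + (1.0 - prob) * math.log(wealth)
    )
    return wealth - math.exp(expected_utility_uninsured)


def buys_insurance(premium, wealth, loss, prob):
    """True if insuring at `premium` gives at least the uninsured expected utility."""
    return premium <= threshold_premium(wealth, loss, prob)


# Hypothetical lowest-risk individual.
wealth, loss = 100_000.0, 50_000.0
low_risk_prob = 0.01

limit = threshold_premium(wealth, loss, low_risk_prob)
fair_premium = low_risk_prob * loss  # actuarially fair price for this individual
print(f"fair premium = {fair_premium:.2f}, threshold premium = {limit:.2f}")
```

In this sketch a risk-averse low-risk person accepts any pooled premium up to `limit`, which exceeds their own fair premium; adverse selection would require the average premium to rise above that threshold.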


Introduction

Much of human genetics is concerned with studying the genetic contribution to diseases, and this leads to a profound distinction between the single-gene disorders and the multifactorial disorders.

(a) Single-gene disorders are caused, as their name suggests, by a defect in a single gene. Because most genes are inherited in a simple way according to Mendel’s laws, these diseases show characteristic patterns of inheritance from one generation to the next, known to geneticists and underwriters alike as a ‘family history’. Single-gene disorders are quite rare but often severe.

(b) Multifactorial disorders are (mostly) common diseases, such as coronary heart disease and cancers, whose onset or progression may be influenced by variations in several genes, acting in concert with environmental differences. The effect is likely to be quite slight, conferring an altered predisposition to the disease rather than a radically different risk.

Most genetic epidemiology has, until now, concentrated on single-gene disorders. One reason is that the clear patterns of Mendelian inheritance identified affected families long before the direct examination of DNA, and location of the relevant genes, became possible. So when these tools did emerge in the 1990s, geneticists knew where to look; affected families were studied, genes were identified, and the key epidemiological parameters were estimated. The parameter of most interest to actuaries is the age-related penetrance, which is the probability that a person who carries a risky version of the gene will have suffered onset of the disease by age x. It is entirely analogous to the life table probability ${}_{x}q_{0}$. (Often, the risky versions of the gene are called ‘mutations’, and a person carrying one is called a ‘mutation carrier’ or just ‘carrier’.)
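To make the analogy with a life table concrete, the sketch below computes a penetrance function from an onset intensity μ, using F(x) = 1 − exp(−∫₀ˣ μ(s) ds) and ignoring death before onset. The intensity `hypothetical_onset_intensity`, its parameters and the function names are illustrative assumptions, not estimates for any real disorder.

```python
import math


def penetrance(onset_intensity, age, steps=1000):
    """Probability of onset by `age`, given an onset intensity function.

    The integral of the intensity over [0, age] is approximated with the
    trapezium rule on `steps` equal sub-intervals.
    """
    if age <= 0:
        return 0.0
    width = age / steps
    area = 0.5 * (onset_intensity(0.0) + onset_intensity(age))
    area += sum(onset_intensity(i * width) for i in range(1, steps))
    return 1.0 - math.exp(-width * area)


def hypothetical_onset_intensity(age):
    """Made-up Gompertz-type intensity for a mutation carrier (assumption)."""
    return 0.0005 * math.exp(0.08 * age)


penetrance_by_age = {
    x: penetrance(hypothetical_onset_intensity, x) for x in (30, 50, 70)
}
for x, value in penetrance_by_age.items():
    print(f"penetrance by age {x}: {value:.3f}")
```

Read this way, the penetrance plays the same role for disease onset that ${}_{x}q_{0}$ plays for death in a life table: both are cumulative probabilities of the event by age x.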


Studies of affected families are by definition retrospective in nature; families are studied because they are known to be affected. Retrospective studies are subject to uncontrolled sources of bias, precisely because they are based on a non-randomly selected sample of the population; so they are, if possible, avoided in favour of prospective studies, in which a properly randomised sample of healthy subjects is followed forwards in time. Despite this health warning, retrospective studies of single-gene disorders have been carried out for reasons of convenience, cost and necessity: the ready availability of known affected families was convenient and made data collection relatively cheap; and the rarity of single-gene disorders made prospective studies impractical. Moreover, a prospective study would take many years to yield results. Another consequence of the rarity of most single-gene disorders is that most studies have had quite small sample sizes, but if the penetrance is high enough this is tolerable. These studies have successfully led to many gene discoveries and a lot of progress has been made in understanding a number of single-gene disorders.

Multifactorial disorders, which are influenced by more than one gene or by interactions between genes and the environment, are not so well-studied. Many disorders, including cancer, heart disease, diabetes and Alzheimer’s disease are believed to be caused or influenced by complex interactions between multiple genes, environment and lifestyle. The clear patterns of Mendelian inheritance are lost, and any familial clustering of disease that may be observed could just as easily be caused by shared environment as by shared genes. Therefore, there is no existing pool of known affected families that can be studied straightaway. And, because the influence of genetic variation may be slight (low penetrance) large samples will be needed to detect such influence with any reliability.

At the risk of oversimplifying a little, single-gene disorders represent the genetical research of the past, and multifactorial disorders represent the genetical research of the future. Progress will need studies that are large-scale, prospective, and long-term (and therefore very expensive). These studies must capture both genetic and environmental variation (and interactions) and relate them to the risks of common diseases. This is extremely ambitious.


The proposed UK Biobank project aims to achieve this. This project will recruit 500,000 individuals aged 40 to 69, chosen as randomly as possible from the general UK population, and collect data on them over a period of 10 years. We will discuss the main features of UK Biobank in Section 1.5. A key point is that UK Biobank aims only to collect data, not to analyse it. Its data will, in due course, be made available to researchers interested in particular genes and particular diseases, who will have to obtain separate funding for their studies in the usual way. This is sensible because it is impossible to predict at outset just what combinations of genes, environment and disease it will be most fruitful to study. Nevertheless, it is necessary to have in mind the kinds of statistical studies that may, in future, be carried out, so that UK Biobank can be set up to capture data of the correct form. The presumption is that most studies will be case-control studies. We outline the basics of case-control studies in Appendix A.

Given its size and significance, it is important to study the kind of results we

might expect to emerge out of UK Biobank. Our particular interest is in the impli-

cations of UK Biobank for insurance. There has been a lot of debate, often heated,

concerning genetics and insurance in the past 10 years, mainly focussed on single-

gene disorders. We refer to Daykin et al. (2003) or Macdonald (2004) as sources.

It seems plausible that awareness of genetic issues will be heightened by enrolling

500,000 people into a high-profile genetic study. Insofar as insurance questions arise,

answers obtained from past actuarial research, based on single-gene disorders, may

be wholly inapplicable. But, since the single-gene disorders provide all the easily

grasped examples and paradigms, there is a risk that these examples and paradigms

will be grafted onto UK Biobank, however inappropriately, by the media if not by

the genetics community. It will then be unfortunate that, by its nature, UK Biobank

will not provide the evidence to refute such errors for 5–10 years.

We devote the first half of the thesis to modelling UK Biobank itself, so that

before a single person has been recruited, or gene sequenced, we may quantify the

implications of its outcomes for insurance. We choose critical illness (CI) insurance

as the simplest type of coverage, because the insured event is generally onset of

disease, and we need not model post-onset events (although as we shall see in Section

2, this is not entirely true in parameterising the model). We choose heart attack

(myocardial infarction) as the disease of interest, because this will certainly be a

major target of studies using UK Biobank data. Our approach is simple: simulate

500,000 random life histories, given an assumed model of genetic and environmental

influences on the hazard rate of heart attack. Then we may analyse these simulated

data just as an epidemiologist or an actuary may be expected to.
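The simulation step can be sketched in a few lines of Python. This is only a minimal illustration under strong simplifying assumptions: the hazard is taken as constant over the follow-up period (the thesis itself uses age-dependent intensities in a multiple-state model), the gene and environment effects combine multiplicatively, and every numerical value and function name below is hypothetical rather than taken from the thesis.

```python
import random

# Hypothetical parameters, chosen only for illustration (not from the thesis).
BASE_HAZARD = 0.002          # annual heart-attack hazard in the low-risk stratum
GENE_RR = 1.3                # relative risk for the adverse genotype
ENV_RR = 1.5                 # relative risk for the adverse environment
P_GENE, P_ENV = 0.2, 0.3     # population frequencies of the adverse levels


def simulate_life(rng, follow_up_years=10.0):
    """Return (gene, env, onset) for one simulated participant."""
    gene = rng.random() < P_GENE
    env = rng.random() < P_ENV
    hazard = BASE_HAZARD * (GENE_RR if gene else 1.0) * (ENV_RR if env else 1.0)
    # With a constant hazard, the time to onset is exponentially distributed.
    time_to_onset = rng.expovariate(hazard)
    return gene, env, time_to_onset <= follow_up_years


def simulate_cohort(n, seed=0):
    """Simulate n independent life histories with a reproducible seed."""
    rng = random.Random(seed)
    return [simulate_life(rng) for _ in range(n)]
```

Calling `simulate_cohort(500_000)` would mimic the size of UK Biobank; a smaller `n` is enough to experiment with the idea.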

At this stage a further complication appears, one that is all too famil-

iar to actuarial researchers who have modelled single-gene disorders. Actu-

aries almost never have access to the original data upon which genetic stud-

ies are based. In the case of UK Biobank, Section 5.2 of the draft protocol

(www.ukbiobank.ac.uk/docs/draft protocol.pdf) says: “Data from the project

will not be accessible to the insurance industry or any other similar body.” This

means that actuarial researchers will have to rely on the published outcomes of

medical or epidemiological research projects that use the UK Biobank data. The

ideal, given the models actuaries typically use for pricing and reserving, would be

age-dependent onset rates or penetrances, corresponding to $\mu_x$ or $q_x$ in a life-table

study. Unfortunately, this is far in excess of what is usually published in a medical

study, because the questions addressed by such studies can often be answered by

much simpler statistics. And, it must be said, the estimation of $\mu_x$ or $q_x$ is very

demanding of the data. Since we expect case-control studies to be the most common

approach to UK Biobank, we must take account of this in analysing our simulated

data. We may not, realistically, assume that the actuary can analyse directly the

500,000 simulated life histories. Instead, an epidemiologist must first carry out a

case-control study and publish the results, which most often will be expressed as

odds ratios (see Appendix A). Then the actuary must take these odds ratios and,

using whatever approximate methods come to hand, estimate onset rates or pene-

trances suitable for use in an actuarial model. We will model this process, with two

results:

(a) We will be able to estimate the impact on CI insurance premiums of represen-

tative multifactorial modifiers of heart attack risk.

(b) Having simulated the data from a known model of our own choosing, we can

assess the seriousness of the errors that must be made, in parameterising an

actuarial model from published odds ratios rather than from the raw data.

As mentioned before, previous actuarial studies have done exactly that (see

Macdonald and Pritchard (2000) for an example), but only in the context of

relatively high penetrances. We will be interested to see if robust actuarial

modelling of relatively low-penetrance disorders is possible using published case-

control studies.
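To make the two-stage process concrete, the sketch below computes an odds ratio from a 2 × 2 case-control table, together with an approximate confidence interval, and then scales a baseline onset rate by that odds ratio. The last step relies on the rare-disease approximation (odds ratio taken as roughly equal to relative risk); it is only one possible "approximate method" and is an assumption of this sketch, not necessarily the method used later in the thesis. All function names are hypothetical.

```python
import math


def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds of exposure among cases divided by odds of exposure among controls."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


def odds_ratio_interval(exposed_cases, unexposed_cases,
                        exposed_controls, unexposed_controls, z=1.96):
    """Approximate interval from the standard error of log(OR) (Woolf's method)."""
    log_or = math.log(odds_ratio(exposed_cases, unexposed_cases,
                                 exposed_controls, unexposed_controls))
    se = math.sqrt(1 / exposed_cases + 1 / unexposed_cases
                   + 1 / exposed_controls + 1 / unexposed_controls)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


def approximate_onset_rate(baseline_rate, published_odds_ratio):
    """Assumed rare-disease approximation: treat the odds ratio as a relative risk."""
    return baseline_rate * published_odds_ratio
```

For example, `odds_ratio(30, 70, 20, 80)` evaluates to 12/7, roughly 1.71.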

The plan of the first half of the thesis is as follows. After a general introduction

in Section 1.1, we provide a basic overview of genetics in Section 1.2. Section 1.3

gives examples of a few well-known genetic disorders along with reviews of relevant

actuarial literature. Regulatory developments in the UK concerning genetics

and insurance are covered in Section 1.4. In Section 1.5 we describe the main

features of UK Biobank. In Section 1.6, we will introduce our general approach to

simulating UK Biobank. A specific multiple-state model representing heart attack

will be introduced in Section 2.1. The transition intensities underlying the model

will be developed in Sections 2.2–2.4.

In Section 3.1, we will hypothesise a simple 2 × 2 gene-environment interaction

model affecting the risk of heart attack. In Section 3.2, we present (in summary

form) a set of simulated UK Biobank data, namely 500,000 life histories. Then,

we analyse these simulated data, in two stages as described above. First, a model

epidemiologist will carry out a case-control study (actually, we will look at several

different case-control studies that may be carried out). This is presented in Section

3.3. Then, our model actuary will use these ‘published’ figures to construct CI

insurance models allowing for genetic variability and environmental exposures. The

actuarial investigation is discussed in Section 3.4. In Section 3.5, premium rates

based on these critical illness models will be calculated and compared for these

different subgroups.

Despite its great size, UK Biobank is essentially an unrepeatable single sample.

Any estimated quantity based upon its data is subject to the usual statistical sam-

pling error — and a premium rate is just such an estimated quantity. It is to be

hoped that the large samples available from UK Biobank will reduce sampling error

to a low level. In reality, the designers of UK Biobank can only estimate the statis-

tical power of representative case-control studies, which was certainly done before

choosing 500,000 lives as the sample size. We, however, with control over our sim-

ulated data, can assess directly the sampling properties of estimates based on UK

Biobank data. In particular, and of direct relevance to the criteria established in

the UK by the Genetics and Insurance Committee (GAIC), we can assess in statistical

terms the reliability of CI premium rates based on UK Biobank data. We do

this simply by repeating the simulation of 500,000 life histories as many times as

necessary, and constructing the empirical distributions of derived quantities such as

odds ratios and premiums. This is in Section 4.
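The idea of repeating the simulation to obtain an empirical distribution can be sketched as follows. This is an illustrative toy, not the procedure of Section 4: each replicate is a simple cohort with a single binary exposure, the odds ratio is computed from the whole cohort rather than from a sampled case-control study, and a 0.5 continuity correction (the Haldane-Anscombe correction) is added to every cell so that the ratio stays finite. All names and probabilities are assumptions.

```python
import random


def simulated_odds_ratio(rng, n, p_exposed, p_case_exposed, p_case_unexposed):
    """Simulate one cohort of n lives and return its estimated odds ratio."""
    # counts[exposed][case]; each cell starts at 0.5 (Haldane-Anscombe
    # correction) so the ratio stays finite even if a cell is empty.
    counts = [[0.5, 0.5], [0.5, 0.5]]
    for _ in range(n):
        exposed = rng.random() < p_exposed
        p_case = p_case_exposed if exposed else p_case_unexposed
        case = rng.random() < p_case
        counts[exposed][case] += 1
    return (counts[1][1] * counts[0][0]) / (counts[1][0] * counts[0][1])


def empirical_odds_ratios(replicates, n, p_exposed, p_case_exposed,
                          p_case_unexposed, seed=0):
    """Repeat the simulation and return the sorted odds-ratio estimates."""
    rng = random.Random(seed)
    return sorted(
        simulated_odds_ratio(rng, n, p_exposed, p_case_exposed, p_case_unexposed)
        for _ in range(replicates)
    )
```

Percentiles of the returned list (for instance the 2.5th and 97.5th) then give a direct picture of the sampling variability of the estimate.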

The second half of the thesis addresses the issues relating to adverse selection in

the context of multifactorial disorders. Insurance companies have developed sophis-

ticated underwriting techniques to cope with the problems of adverse selection. The

principle behind underwriting is to identify key risk factors that stratify applicants

into reasonably homogeneous groups, for each of which the appropriate premium

rate can be charged. The risk of death or ill health is affected by, among other

things, age, gender, lifestyle and genotype. However, the use of certain risk factors

is sometimes controversial. In particular, this is true of factors over which individu-

als have no control, such as genotype. As a result, in many countries a ban has been

imposed, or moratorium agreed, limiting the use of genetic information. In the UK,

GAIC is providing guidance to insurers on the acceptable use of genetic test results.

As discussed earlier, disorders caused by mutations in single genes, which may

be severe and of late onset, but are rare, have been quite extensively studied in

the insurance literature. One reason is that the epidemiology of these disorders is

relatively advanced, because biological cause and effect could be traced relatively

easily. The conclusion has been that single-gene disorders, because of their rarity, do

not expose insurers to serious adverse selection in large enough markets. However,

this conclusion need not be valid for multifactorial disorders. The vast majority

of the genetic contribution to human disease will arise from combinations of gene

varieties (called ‘alleles’) and environmental factors, each of which might be quite

common, and each alone of small influence but together exerting a measurable effect

on the molecular mechanism of a disease. Although the epidemiology of multifacto-

rial disorders is not very advanced, this should make progress in the next 5–10 years

through the very large prospective studies now beginning in several countries, like

the UK Biobank project. If these studies are successful in capturing both genetic

and environmental variations and interactions, and relate them to the risks of com-

mon diseases, the genetics and insurance debate will, in the fairly near future, shift

from single-gene to multifactorial disorders.

Any model used to study adverse selection risk must incorporate the behaviour

of the market participants. Most of those applied to single-gene disorders in the

past did so in a very simple and exaggerated way, assuming that the risk implied by

an adverse genetic test result was so great that its recipient would quickly buy life

or health insurance with very high probability. These assumptions were not based

on any quantified economic rationale, but since they led to minimal changes in the

price of insurance this probably did not matter. The same is not true if we try

to model multifactorial disorders. Then ‘adverse’ genotypes may imply relatively

modest excess risk but may be reasonably common, so the decision to buy insurance

is more central to the outcome.

Most research on adverse selection concentrates primarily on providing a proper

economic rationale for the impact, on the insurance market, of genetic tests for,

mainly, rare diseases. In this thesis, we try to bring together plausible quantitative

models for the epidemiology and the economic issues, in respect of more common

disorders, therefore affecting a much larger proportion of the insurer’s customer

base. We wish to find out under what circumstances adverse selection is likely to

occur.

The plan of the second half of the thesis is as follows. In Sections 5.1–5.3, we

provide background information on risk, insurance and underwriting. In Section

5.4, we review existing literature. Adverse selection in the context of multifactorial

disorders is defined in Section 5.5. A basic introduction to utility theory and estimates

of risk aversion are discussed in Sections 5.6–5.9.

In Chapter 6, we develop techniques to determine the conditions leading to adverse

selection for a 2×2 gene-environment interaction in a simple 2-state insurance

model. We study the effects of additive and multiplicative gene-environment

interactions in Sections 6.4 and 6.6 respectively. The role played by

population proportion in each risk category is studied in Section 6.5.
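The distinction between additive and multiplicative interaction can be illustrated with the usual epidemiological definitions in terms of relative risks. The sketch below assumes those textbook definitions; the exact parameterisation used in Chapters 6 and 7 may differ, and the function name is hypothetical.

```python
def joint_relative_risk(rr_gene, rr_env, interaction="multiplicative"):
    """Relative risk when both the adverse gene and adverse environment are present.

    Assumed textbook definitions (not necessarily those of the thesis):
      multiplicative: the two relative risks multiply;
      additive:       the two excess risks (RR - 1) add.
    """
    if interaction == "multiplicative":
        return rr_gene * rr_env
    if interaction == "additive":
        return rr_gene + rr_env - 1.0
    raise ValueError("interaction must be 'additive' or 'multiplicative'")
```

With relative risks of 1.5 and 2.0, the multiplicative model gives a joint relative risk of 3.0 and the additive model gives 2.5.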

In Chapter 7, we extend the results from the 2-state model to a CI insurance

model. We propose a simple model of a multifactorial disorder, with two genotypes

and two levels of environmental exposure, and either additive or multiplicative inter-

actions between them. These factors affect the risk of myocardial infarction (heart

attack), and therefore the theoretical price of CI insurance. The situation here is slightly

different from the 2-state insurance model, in that there are risks, other than heart

attack, which affect CI insurance. Conclusions and suggestions for further work are

in Chapter 8.

We have also provided two appendices at the end for background information.

Appendix A gives a brief overview of epidemiology and Appendix B provides intro-

ductions to some relevant numerical methods.

Chapter 1

Genetics and Insurance

1.1 Introduction

With the discovery of genes, we are closer than ever before to a clearer understanding

of our biological roots; our place in the history of evolution. As it turns out, the

essence of life is embedded in the genes. Genes contain all the information necessary

to create a life form out of a single cell. They are the units of heredity passed

down from one generation to the next. They shape our physical characteristics and

behavioural patterns. In short, genes are key to life, the reasons for our existence.

But they cannot work alone; the environment plays an active role too. It is increasingly

becoming obvious that it is the interplay between genes and the environment that

shapes what we are.

Genes thrive on diversity. We, human beings, are all distinct from one another

and not just mere clones, thus proving the existence of wide variations even within a

single species. But diversity also brings with it its own complications. For example,

although all variations of the same gene are supposed to perform the same function,

they all do it in slightly different ways. Inevitably, this leads to differences in their

performance. In particular, underperformance can produce unwanted side-effects in

the form of genetic disorders.

This has practical implications in all spheres of human life. Here we are inter-

ested in the impact of genetic disorders on the insurance industry. Insurance in its

basic form is a simple principle of cooperation, where each individual in a group

contributes a small amount towards a common fund, which can be used to support

the few who suffer losses; a small price to pay for guaranteed support in times of

misfortune. However, even in this basic set-up, it is clear that insurance cannot be

provided to all at the same price. There will always be heterogeneity in risk profiles,

where a few individuals will be exposed to a greater risk of loss compared to the

rest. Then it will be unfair on the low-risk individuals to ask them to subsidise

the high-risk group. So, charging risk-based insurance premiums seems a sound

alternative.
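A small numerical sketch may help to show the cross-subsidy that arises when a single pooled premium is charged to a heterogeneous group. The group sizes, loss probabilities and loss amount below are invented for illustration, and expenses, interest and profit loadings are ignored.

```python
def risk_based_premium(p_loss, loss):
    """Expected loss for one individual: the fair premium for their own risk."""
    return p_loss * loss


def pooled_premium(groups, loss):
    """Single premium that covers expected losses across all groups.

    groups is a list of (number_of_individuals, probability_of_loss) pairs.
    """
    total_lives = sum(n for n, _ in groups)
    expected_losses = sum(n * p * loss for n, p in groups)
    return expected_losses / total_lives


# Hypothetical example: 900 low-risk and 100 high-risk individuals.
GROUPS = [(900, 0.01), (100, 0.05)]
LOSS = 100_000
```

Here the pooled premium is about 1,400, whereas the risk-based premiums are about 1,000 for the low-risk group and 5,000 for the high-risk group, so each low-risk individual would subsidise the high-risk group by about 400.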

The reasoning appears perfectly logical when smokers are charged a higher life

insurance premium. It can be argued that individuals who smoke choose to do so

of their own free will, fully understanding the related health hazards. However, the

same logic becomes untenable when applied to an individual who has inherited a

“faulty” gene from his parents. It might be obvious that he faces a greater risk, but

is it fair to penalise him for his own misfortune?

The answer is far from straightforward. Apart from ethical and moral

issues, there are economic and political angles to it as well. Governments and

insurance regulators might find it difficult to let market economics take its own

course. But if they intervene, the outcome may not be entirely beneficial, as it will

ultimately be the general public who will pay for any market inefficiency.

Given this backdrop, our aim in this thesis will be to analyse the impact of gene-

environment interactions on insurance from the perspective of both insurers and

consumers. We ask, for what types of gene-environment interactions:

(a) Can an insurer justify charging different premiums for different groups?

(b) Does an insurer face the risk of adverse selection?

But, before tackling these questions, we will provide some background information

in the remainder of this chapter. In Section 1.2, we provide an overview of genetics.

In Section 1.3, we provide examples of a few well-known genetic disorders. In Section

1.4, we will give a brief history of how regulations on genetics and insurance have

been shaped in the UK. A brief description of the UK Biobank project is in Section 1.5.

An outline of our proposed model of UK Biobank, to analyse the results that might

come out of the project, is given in Section 1.6.

1.2 Genes

“ ..., as the earth and ocean were probably peopled with vegetable productions

long before the existence of animals; and many families of these animals long

before other families of them, shall we conjecture that one and the same kind

of living filaments is and has been the cause of all organic life?”

This is a bold conjecture by Erasmus Darwin, in his book Zoonomia (Darwin

(1794)), more than 60 years before his famous grandson Charles Darwin produced

his epic, On the Origin of Species (Darwin (1859)). The conjecture by Erasmus

Darwin has turned out to be fantastically close to what is reality. But to arrive there

we have to start with the theory proposed by his grandson Charles. Charles Darwin

coined the term “natural selection” which he used to mean that each individual

has to struggle to survive where resources are limited. Individuals with the “best”

characteristics will be more likely to survive and those desirable traits will be passed

down through generations and will eventually be dominant in the population over

time.

When Charles Darwin proposed his theory of natural selection, it was at odds

with the existing model of blending inheritance, which predicted that an offspring

is an average of its parents. This would mean that an offspring of a tall parent and

a short parent will be of medium height, who will then pass on the trait of medium

height to the next generation and so on. So the tall and short traits will be lost

in future generations, and this contradicted the theory of natural selection which

required accumulation of desirable traits.

At around the same time, Gregor Johann Mendel was conducting his revolution-

ary experiments on pea plants. He noticed that if he crossed two pure contrasting

traits, the next generation hybrids showed only one trait, the dominant one. And

if he crossed only hybrids, the recessive trait re-appeared in 25% of cases. Mendel

realised that offspring inherit a pair of traits, one from each parent, of which the

dominant trait is expressed. This was a profound observation, which Mendel pub-

lished in Mendel (1866). Unfortunately, his work remained largely unnoticed for

more than three decades before it was re-discovered in 1900.
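Mendel's 25% figure can be reproduced by enumerating the equally likely combinations of one allele from each parent. In the sketch below, the notation "A" for the dominant allele and "a" for the recessive allele is a convention assumed here, not one introduced in the text.

```python
from collections import Counter
from itertools import product


def cross(parent1, parent2):
    """Count offspring genotypes when each parent passes on one allele at random."""
    return Counter("".join(sorted(pair)) for pair in product(parent1, parent2))


def recessive_fraction(parent1, parent2, recessive="a"):
    """Fraction of offspring showing the recessive trait (two recessive alleles)."""
    offspring = cross(parent1, parent2)
    return offspring[recessive * 2] / sum(offspring.values())
```

Crossing two pure lines, `cross("AA", "aa")`, yields only hybrids "Aa", all showing the dominant trait; crossing two hybrids, `recessive_fraction("Aa", "Aa")`, gives 0.25.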

Following re-discovery, Mendel’s laws of inheritance and Darwin’s natural se-

lection were hotly debated among the scientific community. While Darwinism de-

manded variety, Mendelism offered stability instead. The marriage between the

two theories happened only when Hermann Joseph Muller showed that mutations could be induced by subjecting

fruit-flies to X-rays. Once the conflict was resolved, scientists started wondering how

inherited traits are passed between generations. The breakthrough was finally made

by Watson and Crick (1953), who discovered the molecular structure of nucleic acids

and unravelled the role of deoxyribonucleic acid (DNA) in heredity.

In the rest of this section, we will provide a very brief introduction to molecular

genetics. This is not meant to be a comprehensive review of the subject but only

an overview of the fundamental concepts. For detailed discussions, Lewin (2000),

Pasternak (1999), Strachan and Read (1999) and Sudbery (1998) are standard text-

books on human genetics. For a popular exposition, please refer to Ridley (1999).

Unless specific references are provided, all material in this section, and the next, is

obtained from the above-mentioned sources.

All living creatures are made up of cells. The cell is the structural and functional

unit of all living beings and is sometimes called the building block of life. Some

organisms, like bacteria, are unicellular, while other complex life-forms, such as

human beings, are multicellular. A human body has an estimated 100 trillion cells.

Cells are made up of a number of subcellular components. Except red blood cells,

all cells in a human body contain a membrane-enclosed organelle called the nucleus.

Other subcellular components, like ribosomes, remain suspended outside the nucleus

in a jelly-like material called cytoplasm.

Leaving out the red blood cells along with the egg and sperm cells, each human

cell nucleus contains 23 pairs of filaments called chromosomes. As mentioned above,

red blood cells do not have nuclei. The egg cells and the sperm cells contain only

one of each pair of chromosomes, i.e., they have 23 chromosomes instead of 23 pairs.

An offspring is produced by fertilisation of an egg cell by a sperm cell, whereby all

chromosomes become paired again.

Inside a chromosome there exists a paired molecule called DNA, with two long

strands of sugar and phosphate running parallel to each other. Embedded on each

strand is a sequence of nucleotides or bases, which come in four varieties – adenine

(A), cytosine (C), guanine (G) and thymine (T). The two strands of a DNA molecule

are structured in such a way that if nucleotide A is positioned in a particular location

of a strand, the opposite strand will have nucleotide T at the same location. Similarly

for C and G. Now, because A has a great affinity for T while C likes

to pair with G, the nucleotides on opposite strands form bonds between them and

are called base-pairs. This produces the well-known structure of a double helix,

where the two strands of DNA stay intertwined with each other. Note that, as the

sequence of nucleotides in one strand is a complementary copy of the other, the

whole double-stranded sequence is described by the sequence of only one, chosen by

convention.
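The pairing rule means that one strand determines the other, which is easy to express in code. The following minimal sketch pairs the bases position by position, as in the description above, and ignores the direction in which each strand is conventionally read; the names are hypothetical.

```python
# Base-pairing rule: A pairs with T, and C pairs with G.
PAIRING = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand):
    """Return the opposite strand, base by base, at the same positions."""
    return "".join(PAIRING[base] for base in strand)
```

Applying `complement` twice returns the original sequence, which is why a single strand suffices to describe the double-stranded molecule.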

The sequence of nucleotides in DNA contains vital information on how to synthe-

sise different types of proteins necessary for the existence of living creatures. Almost

everything in a human body is made of proteins or made by them. So an efficient

mechanism for protein synthesis is critical for survival. On one of the two strands of

a DNA molecule, each sequence of three consecutive nucleotides, e.g. ACT, CAG,

TTT, is called a codon. Except for a few codons (which are used as stop signals),

all codons correspond to particular amino acids, which are the building blocks of

any protein. There are 64 possible codons, whereas there are only 20 amino acids.

So there are multiple codons which refer to the same amino acid.
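The count of 64 codons follows from having four possible bases in each of three positions. The short sketch below enumerates them and removes the three stop signals of the standard genetic code (written here as DNA codons TAA, TAG and TGA), leaving 61 codons to be shared among 20 amino acids.

```python
from itertools import product

BASES = "ACGT"
STOP_CODONS = {"TAA", "TAG", "TGA"}  # the three stop signals, as DNA codons

# Every ordered triplet of bases: 4 * 4 * 4 = 64 codons.
ALL_CODONS = ["".join(triplet) for triplet in product(BASES, repeat=3)]
SENSE_CODONS = [codon for codon in ALL_CODONS if codon not in STOP_CODONS]


def average_codons_per_amino_acid(n_amino_acids=20):
    """Average number of codons available per amino acid."""
    return len(SENSE_CODONS) / n_amino_acids
```

On average each amino acid is therefore encoded by about three codons, although the actual allocation is uneven.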

There are large stretches of DNA which do not contain any useful information;

only a small fraction of the complete DNA sequence appears to encode proteins. A

gene is a region of DNA that contains the code for synthesising a particular protein.

Even within a gene there are sections of meaningless information called introns, in

between sections of actual code, called exons.

When a cell needs to manufacture a particular protein, appropriate signals are

generated to identify the gene containing the recipe for the protein in question.

Then a complementary copy of that section of DNA is made to form a new single

stranded molecule called messenger ribonucleic acid (mRNA). This process is called

transcription. mRNA is very similar to a single strand of a DNA, except that the

nucleotide T in DNA is replaced by the nucleotide uracil (U) in mRNA. After

transcription, mRNA is stripped of its introns and the exons are spliced together to form

a seamless code. The edited mRNA then moves out of the nucleus and approaches

a ribosome. Ribosomes translate the information contained in the mRNA into a

sequence of amino acids, which then folds up into a distinctive shape (depending

on the sequence) to form a protein. This is how a cell uses the code in DNA to

manufacture a protein it needs.
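Transcription and translation, as described above, can be mimicked with two small functions. This is a simplified sketch: it ignores splicing and strand direction, and the codon table is a deliberately tiny excerpt of the standard genetic code, included only so that the example is self-contained.

```python
# Complementary copy of a DNA template strand, with U in place of T.
TEMPLATE_TO_MRNA = {"A": "U", "T": "A", "C": "G", "G": "C"}

# Small excerpt of the standard genetic code; None marks a stop signal.
CODON_TABLE = {
    "AUG": "Met", "CAG": "Gln", "UUU": "Phe", "GGC": "Gly",
    "UAA": None, "UAG": None, "UGA": None,
}


def transcribe(template_strand):
    """Build the mRNA that is complementary to the DNA template strand."""
    return "".join(TEMPLATE_TO_MRNA[base] for base in template_strand)


def translate(mrna):
    """Read the mRNA three bases at a time until a stop signal is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid is None:
            break
        protein.append(amino_acid)
    return protein
```

For the template `"TACGTCAAAATT"`, `transcribe` gives `"AUGCAGUUUUAA"`, which `translate` reads as methionine, glutamine and phenylalanine before stopping.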

DNA can also replicate to produce two identical copies of itself. The technique

is similar to the one used for mRNA transcription. However, instead of working

only on a section of a particular strand, replication works on both strands of DNA

simultaneously. At first, the bonds between the base-pairs are broken to separate

the complementary strands. Simultaneously, two new strands are constructed with

appropriate nucleotides to form two identical double-stranded DNA molecules. This technique

is used to pass on genetic information from cell to cell (mitosis) and from generation

to generation (meiosis).

The discussion above depicts an idealised scenario. In reality, there are a number

of places where things can go wrong. For example, in the replication stage, one

nucleotide might get replaced by another by mistake. This can be critical if this

happens in the coding region of DNA. Unless the changed codon corresponds to the

same amino acid, the gene will not be able to synthesise the correct protein. This

can be disastrous depending on the function of the protein. Similar problems will

arise if one or more nucleotides are deleted from or inserted in the DNA sequence.

Any change to a DNA sequence is termed a mutation.
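Whether a single-nucleotide substitution matters depends on whether the new codon still encodes the same amino acid. The sketch below classifies a substitution using a very small excerpt of the standard genetic code; the labels "silent", "missense" and "nonsense" are standard genetics terms rather than terms used in this text, and the function name is hypothetical.

```python
# A deliberately tiny excerpt of the standard genetic code (DNA codons).
CODON_TO_AMINO_ACID = {
    "CAG": "Gln", "CAA": "Gln",   # glutamine
    "CAC": "His", "CAT": "His",   # histidine
    "TAG": "Stop",                # stop signal
}


def classify_substitution(original, mutated):
    """Classify the effect of replacing one codon with another."""
    before = CODON_TO_AMINO_ACID[original]
    after = CODON_TO_AMINO_ACID[mutated]
    if before == after:
        return "silent"      # same amino acid, so the protein is unchanged
    if after == "Stop":
        return "nonsense"    # premature stop, so the protein is truncated
    return "missense"        # a different amino acid is inserted
```

For example, changing CAG to CAA leaves glutamine in place, whereas changing CAG to CAC substitutes histidine.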

Although the consequences can be catastrophic, not all mutations are deleterious.

In fact, multiple variations of the same gene are quite common. These are called

alleles. The variations between alleles explain simple differences, like hair colours.

However, for a particular gene, one allele might produce a slightly different version of

a protein from the other alleles. This might turn out to be slightly better or worse at

performing a specific function. One might ask: why aren’t inefficient alleles getting

purged by natural selection? One answer might be that these alleles might be better

at doing other things. We have to wait until we fully understand the implications

of all interactions between different genes and the environment to truly appreciate

all the nuances of human genetics.

Let us now look at a few well-known genetic disorders.

1.3 Genetic Disorders and Insurance

1.3.1 Huntington’s Disease

Huntington’s disease (HD) or Huntington’s chorea is a rare neurological disorder. It

got its name from physician George Huntington who studied the disorder in detail

in his paper in 1872. HD can strike at an age less than twenty and the early

symptoms include a slight deterioration of the intellectual faculties. Gradually,

physical symptoms appear in the form of jerky, uncontrollable, random movements,

collectively known as chorea. Patients also exhibit slowing of thought process, speech

impairment and inability to learn new skills. They descend into deep depression,

with occasional hallucinations and delusions.

The disorder has been traced to a particular gene in chromosome 4. As is the case

for many genes, this gene also has a large number of alleles. The alleles differ from

each other in the number of occurrences of a single codon CAG in the middle of the

gene. The number of CAG repeats can vary from six to over a hundred depending

on the allele. Individuals with 35 or fewer CAG repeats are safe from HD. For genes

with more than 35 copies of CAG, the DNA replication process becomes unstable

and the number of repeats can increase in successive generations. Because of the

progressive increase in repeat lengths, the disorder tends to increase in severity as

it passes from one generation to the next, and to trigger earlier onsets. Also, the

disorder is a dominant trait, so even a single affected allele from a parent is enough

to trigger HD. For individuals with 39 CAG repeats, there is a 90% probability of

first symptoms appearing before age 75. However with 50 CAG repeats, onset of

HD, on average, is at age 27. The disorder is incurable and takes 15-25 years to run

its full course.
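Counting CAG repeats in a sequence is straightforward to sketch. The function below finds the longest uninterrupted run of CAG codons and compares it with the threshold of 35 repeats quoted above; it is an illustration on toy strings only, not a description of how genetic tests are actually performed, and the names are hypothetical.

```python
import re


def longest_cag_run(sequence):
    """Length, in repeats, of the longest uninterrupted run of CAG codons."""
    runs = re.findall(r"(?:CAG)+", sequence)
    return max((len(run) // 3 for run in runs), default=0)


def exceeds_safe_threshold(sequence, safe_max=35):
    """True if the repeat count is above the 'safe' limit quoted in the text."""
    return longest_cag_run(sequence) > safe_max
```

A toy sequence such as `"ATG" + "CAG" * 40 + "TGA"` has a run of 40 repeats and would therefore be flagged.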

The codon CAG corresponds to the amino acid glutamine. It is a necessary

ingredient for the production of a protein called huntingtin. However more than 39

CAG repeats produce a mutated form of the protein, which gradually accumulates

in neurone cells. This continuous aggregation causes the cells to die off in selected

regions of the brain and trigger HD.

Even before the actual discovery of the gene responsible for HD, it was obvious

that the disorder was hereditary in nature. Insurance companies offering health

insurance, like CI insurance, used family histories as an underwriting tool to protect

themselves from adverse selection. With the better understanding of the genetics

behind HD, insurance companies will be interested to find out if their underwriting

techniques could be improved further. This has been studied in detail in Gutierrez

and Macdonald (2004).

The authors first estimated the age-dependent rates of onset of HD for males and

females with different CAG repeats. They had to take into account the severity of

the symptoms that would lead to a successful CI insurance claim. Then the authors

calculated the net level CI premium rates for both sexes with 36-50 CAG repeats.

They found that insurance companies, following standard underwriting guidelines,

will be unable to insure individuals with very long CAG repeats. This is particularly

true for younger individuals and longer policy durations. For comparison purposes,

the authors have also calculated premiums based on family history alone.

The authors then investigated the cost of adverse selection in case of a moratorium

on the use of genetic test results and also possibly family history. They found that

moratoria on genetic test results can lead to an increase of premiums of about 0.1%,

while including family history in the moratoria will increase premiums by 0.35%.

The whole exercise was repeated for a life insurance model. Although the results

show a discernible increase in the risk of mortality with increase in CAG repeats, the

impact is less severe than that in the context of CI insurance. The cost of adverse

selection arising from a moratorium on the use of genetic tests for HD was found to

be negligible for life insurance.

1.3.2 Alzheimer’s Disease

Alzheimer’s Disease (AD) got its name from a German psychiatrist Dr Alois

Alzheimer. In 1901, he interviewed a patient, Mrs Auguste D, who showed signs

of dementia, a medical term for progressive decline in cognitive functions affecting

memory, language and problem solving. The patient died in 1906, and Dr Alzheimer

along with his colleagues examined her anatomy and neuropathology. He found de-

posits of plaques on the outside of the neurones and severance of the connections

between the neurones. These have been identified as classical pathological signs of

AD.

AD is a disorder of old age, rarely affecting people less than 60 years old. The

early symptoms include short-term memory loss with a tendency to become less en-

ergetic or spontaneous. With the progression of the disease, patients start forgetting

well-known skills or objects or persons. At a later stage, the patients find it difficult

to perform the simplest of tasks and require constant supervision.

The Apolipoprotein E (ApoE) gene on chromosome 19 has been identified as a

risk factor for development of AD. The ApoE gene has three alleles ε2, ε3 and ε4,

found in the general population in the proportions 0.09, 0.77 and 0.14 respectively.

Individuals with ε4 allele in their gene have a greater chance of developing AD; more

so if they have two ε4 alleles. In contrast, ε2 allele appears to have a protective effect

against AD.
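
As a rough illustration of what these allele frequencies imply, the sketch below computes the six possible ApoE genotype frequencies. It assumes Hardy-Weinberg equilibrium (random mating), which is our assumption and is not taken from the studies cited here; the names `ALLELE_FREQ` and `genotype_frequencies` are likewise hypothetical.

```python
from itertools import combinations_with_replacement

# Allele frequencies quoted in the text for the general population.
ALLELE_FREQ = {"e2": 0.09, "e3": 0.77, "e4": 0.14}


def genotype_frequencies(allele_freq):
    """Genotype frequencies under Hardy-Weinberg equilibrium (an assumption)."""
    freqs = {}
    for a, b in combinations_with_replacement(sorted(allele_freq), 2):
        p, q = allele_freq[a], allele_freq[b]
        # Homozygotes have frequency p^2, heterozygotes 2pq.
        freqs[a + "/" + b] = p * p if a == b else 2 * p * q
    return freqs


if __name__ == "__main__":
    for genotype, freq in genotype_frequencies(ALLELE_FREQ).items():
        print(f"{genotype}: {freq:.4f}")
```

Under this assumption, roughly 2% of the population would carry two ε4 alleles and about 26% would carry at least one.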

The difference between the alleles is that at two locations, two A nucleotides in

ε4 are replaced by two Cs in ε2. ε3 is intermediate. As these alleles produce slightly

different proteins, the protein derived from the ε4 allele appears to aid in the formation

of plaques in the neurones. Although the actual biochemical process is not well

understood, there is significant statistical evidence of a correlation.

Patients with AD can survive up to 15 years after the first symptoms are noticed.

This is of significant importance to the long-term care insurance market. Macdonald

and Pritchard (2000) and Macdonald and Pritchard (2001) are studies on the impact

of AD on long-term care insurance.

Macdonald and Pritchard (2000) proposed a multiple-state model for AD and

went on to estimate the transition intensities for different possible genotypes of

ApoE. Macdonald and Pritchard (2001) applied the model to calculate long-term

care insurance premiums. The authors found that insurers, if allowed to use ApoE

test results, would probably charge ratings of +25% and +50% for individuals with

one and two ε4 alleles in their genes, respectively. The authors also estimated the

cost of adverse selection if a moratorium is in place on using genetic test results.

They found that the cost will not exceed 5% of premiums and can probably be

ignored.

1.3.3 Cancer

The two genetic disorders, discussed so far – HD and AD, are commonly known

as single-gene disorders. For each, there is a strong link between the disorder and

mutations in a particular gene. However, it is important to remember that with

advances in genetic research, it is quite possible that links with other genes and

environmental factors will come to light in future. HD and AD are unusual in a

sense that most common disorders are much more complex in nature and arise out

of interactions between a number of genes along with environmental factors. Cancer

is one such common multifactorial genetic disorder.

As we saw in the discussion of genes, all cells contain the information necessary to
replicate themselves. However, disorderly cell replication can lead to cell proliferation
and, ultimately, to malignant tumours. A complex mechanism is in place to protect
against such an eventuality. Most notably, the tumour suppressor genes, or
anti-oncogenes, identify any irregularities and produce a dampening or repressive
effect on the cell division cycle. If such repairs prove futile, the genes promote
apoptosis, a kind of programmed cell death. Most tumour suppressor genes can
function even with one functional allele, i.e. both alleles of these genes must be
mutated before tumour suppression fails. In this section, we will consider two such
tumour suppressor genes: BRCA1 and BRCA2.

The BRCA1 gene is located on chromosome 17 and codes a protein which regu-

lates the cycle of cell division and inhibits uncontrolled growth of cells, in particular,

those that line the milk ducts in the breast. A large number of alleles of the BRCA1

gene have been identified, many of which are associated with an increased risk of

breast cancer. The BRCA2 gene, based on chromosome 13, has a function simi-

lar to the BRCA1 gene. Again a number of alleles of the BRCA2 gene have been

linked to increased risk of breast cancer. There are also studies which have linked

BRCA1 and BRCA2 genes with ovarian cancer. It is important to note here that

only about 5 to 10% of breast cancers are due to mutations in BRCA1 and BRCA2

genes, suggesting that most cases are sporadic in nature.

Macdonald et al. (2003a) studied the genetics of breast and ovarian cancer from

the perspective of a life and health insurance underwriter, who can only have access

to family histories (often incomplete) of prospective consumers. The authors devel-

oped a multiple-state model and estimated the transition intensities from UK pop-

ulation data. Using the model, they computed conditional probabilities of women

being BRCA1 and BRCA2 mutation carriers (individuals with alleles which possess

greater risk of breast and ovarian cancer) given the family history. The authors

found that these probabilities are very sensitive to the estimates of mutation fre-

quencies and penetrances. They concluded that it may not be appropriate to apply

risk estimates based on studies of high risk families to other groups.

Macdonald et al. (2003b) applied the model to CI insurance. The authors found

that if insurance underwriters had access to genetic test results, most BRCA1 and

BRCA2 mutation carriers will be uninsurable. On the other hand, if underwriting

is based on family history alone, only a few cases will exceed the usual underwriting

limits. If insurers were unable to use genetic test results or family history information

for underwriting, adverse selection was found to be significant in a small CI insurance

market, in case of high penetrances or if higher sums assured could be obtained.

1.3.4 Cardiovascular disease

In breast and ovarian cancer, the two genes involved accounted for a small number

of cases. Cancer can also be caused by mutations due to environmental factors,

like exposure to harmful radiation. So, we have gradually shifted our focus from

simple, but rare, single-gene disorders to complex, but relatively common, multi-

factorial disorders. In this section, we will discuss one more common disorder —

cardiovascular disease.

Cardiovascular disease is a class of disease that involves the heart and blood

vessels. In one common form, fatty deposits (plaques) in the blood vessels make

them narrow and restrict blood flow. The plaques can sometimes rupture, forming

blood clots that obstruct the artery and stop blood flow to the heart muscles. This

is commonly known as myocardial infarction or heart attack. A number of risk

factors have been identified for cardiovascular disease. Large-scale studies have

found evidence that tobacco smoking can significantly increase an individual’s risk

of heart attack. For an example of such a study, see Woodward (1999). Among

other risk factors, hypercholesterolemia, or elevated cholesterol levels in the blood

stream has been directly linked to heart attacks.

To understand hypercholesterolemia, we have to return to the ApoE gene on

chromosome 19. The function of the protein coded by the gene is to facilitate

transfer of fat and cholesterol from very low density lipoprotein (VLDL), which

carries fat and cholesterol from the liver to the cells that need them. If there is a

malfunction, much of the fat and cholesterol remains in the blood stream and forms plaques

on the walls of arteries, which can ultimately lead to heart attacks.

The efficiency with which the ApoE gene carries out its function depends on

its alleles. It has been found that individuals with two ε4 alleles or two ε2 alleles

are at a heightened risk of cardiovascular disease compared to those who have at

least one copy of the ε3 allele. Of course, a low cholesterol diet can reduce the risk

considerably. So again we can see that external intervention plays an important role

on the efficient functioning of genes.

Clearly, cardiovascular disease is a multifactorial disorder. The risk not only de-

pends on genetic factors (alleles of ApoE gene), but also environmental interactions

(smoking habits, dietary control etc.). We will study heart attack in much greater

detail in later chapters. In Chapter 2, we will develop a multiple-state model for

heart attack and estimate the transition intensities. In Chapter 3, we will show how

we can hypothesise a 2×2 gene-environment interaction based on this model. All

our subsequent analysis will be based on that model.

1.4 Genetics and Insurance Regulations

Insurance companies set premium rates based on the assumption that they have

access to all information relevant to the risk involved. If consumers can withhold

any information from an insurance company, there is a risk that the company will

face adverse selection. This is the basic principle behind underwriting insurance

risks. However, this is not the only consideration behind underwriting classifications.

There might be competitive pressure to charge different premium levels to different

groups of consumers. One such example is charging higher life insurance premiums

to smokers. It is unlikely that smokers, while purchasing insurance, will take into

account the adverse health effects of smoking and over-insure themselves to select

against an insurer. But once an insurer decides to charge differential premiums,

other insurers will have to follow suit, as then charging the average premium will

expose them to attracting only high risk consumers. For an in-depth discussion on

this topic, please refer to Macdonald (2004).

However, underwriting based on genetic test results is very different from the

smoker/non-smoker example. One’s own genes are a very private matter and dis-

crimination based on such information has both moral and social implications. At

the same time, the possibility of adverse selection cannot be ruled out altogether.

Given this dilemma, governments and insurance regulators in different countries

have adopted different approaches to deal with the issue. Sweden, for example,

does not allow the use of genetic test results or family histories for underwriting.

Developments in the UK have been particularly interesting, as the scientific basis

for underwriting has come under fierce scrutiny. We will briefly recount the main

milestones in this section. For a more detailed discussion, please refer to Macdonald

(2003).

In 1997, the Human Genetics Advisory Commission (HGAC) asked the UK Government

to impose a moratorium on the use of all genetic test results for insurance

underwriting purposes. The Government, instead, set up the Genetics and Insurance

Committee (GAIC), in 1999, to scrutinise the use of genetic tests in underwriting

on a case-by-case basis. In 2000, GAIC approved the use of genetic test results for

HD for life insurance contracts over £500,000. GAIC made it clear that insurance

companies could not ask individuals to undergo genetic tests for HD. Only if indi-

viduals have already been tested, can insurance companies ask for access to that

information. GAIC noted that it would actually enhance the access to insurance for

individuals with normal test results, but with family history of HD.

In the meantime, HGAC and other advisory bodies were merged to form the

Human Genetics Commission (HGC), which was particularly critical of the role

of GAIC. The Association of British Insurers (ABI), who were representing the

majority of UK insurers, also came in for some criticism. ABI advised its member

insurers that they could continue to use genetic test results unless their use had

been rejected by GAIC. Few agreed with this interpretation. The ABI then agreed

to restrict the use of test results to those that GAIC had approved.

In 2001, after more intense debate on the topic, the ABI withdrew all the ap-

plications it had made to GAIC (in respect of HD, breast and ovarian cancer and

AD) and agreed on a five year moratorium on the use of genetic test results. Under

the terms of the moratorium, customers will not be required to disclose the results

of predictive genetic tests for policies up to £500,000 for life insurance, £300,000

for health insurance and paying annual benefits of £30,000 for income protection

insurance. In 2005, the original moratorium was extended for five more years and

will be valid until 1 November, 2011.

The current Concordat and Moratorium on Genetics and Insurance, which came

into effect from 14 March, 2005, mentions that GAIC will continue to liaise with the

clinical genetics community, patient groups and experts in insurance and actuarial

science and monitor new developments relevant to genetics and insurance. In the

meantime, the UK Biobank project has been launched to analyse the impact of gene-

environment interaction on common multifactorial disorders. With rapid advances

in genetics, aided by such large-scale population studies, it is likely that new facts

and evidence will come to light with regularity. In particular, it is important to

analyse the kind of results that might come out of UK Biobank and its implications

for the insurance industry.

1.5 The UK Biobank Project

The website http://www.ukbiobank.ac.uk/ is the main source of information on

UK Biobank. In particular, it provides a draft protocol. There (Section 1.2) it is

stated that:

“The main aim of the study is to collect data to enable the investigation of the

separate and combined effects of genetic and environmental factors (including

lifestyle, physiological and environmental exposures) on the risk of common

multifactorial disorders of adult life.”

UK Biobank is a cohort study, meaning that a large number of people will be

recruited, as randomly as possible, and then followed over time. The main features

of the study design are as follows:

(a) The cohort will consist of at least 500,000 men and women recruited from the

UK general population.

(b) The chosen age range is 40 to 69 (note that earlier versions, including the draft

protocol referred to above, proposed an age range 45 to 69).

(c) The initial follow-up period is 10 years.

(d) Participants will be recruited through their local general practitioners. Partici-

pants are expected to come from a broad range of socio-economic backgrounds

and regions throughout the UK, with a wide range of exposures to factors of

interest.

(e) The project will be conducted through the UK National Health Service.

(f) UK Biobank is funded by the Department of Health, the Medical Research

Council, the Scottish Executive and The Wellcome Trust. The cost is approxi-

mately £40 million.

People registered with participating general practices will be requested to join

the study by completing a self-administered questionnaire, attending an interview,

undergoing examination by a research nurse and giving a blood sample, to enable

DNA extraction at a later date. The protocol assumes that DNA extraction would

be deferred and done as and when genotyping is required.

The Office for National Statistics will provide routine follow-up data regarding

cause-specific mortality and cancer incidence. Hospitalisation and general practice

records will provide data regarding incident morbidity. Every two years a subset

of 2,000 participants and every five years the entire cohort will be re-surveyed by

postal questionnaire to update exposure data and to ascertain self-reported incident

morbidity.

It is envisaged that the main study design for assessing the combined effect of

environment and genotype will consist of a series of case-control studies (see Ap-

pendix A) nested within the cohort. Options for the selection of controls include

an individually matched design or a panel of controls selected at random from the

cohort, probably weighted by age and sex. An important principle underlying the

design of the study and the statistical methods that will be applied is to minimise the

assumptions made about the underlying nature of the relationship between genetic

and environmental factors and the risk of disease.

As a comprehensive prospective study with biological samples, UK Biobank is

expected to contribute substantially to international knowledge regarding the com-

bined effects of genotype and exposure on the risk of disease. Its design means

that the study will provide a structure and resources for future research, and will

enable researchers to address current and unforeseen scientific questions. While UK

Biobank will collect and store the data, any analysis of the data in the future will

require further funding.

1.6 A UK Biobank Simulation Model

In this section we will outline how we plan to simulate the UK Biobank project.

We suppose that the study population is subdivided (or stratified) into sub-

groups with respect to: (a) different genotypes; (b) different levels of environmental

exposures; and (c) other relevant factors such as sex. Genotype defines discrete cat-

egories, and we suppose that any environmental exposures or other factors defined

on a continuous scale are grouped into discrete categories. Thus, we always have a

small number of discrete subgroups (or strata).

The life history of each participant will be represented by a multiple-state model,

with states and transitions defining onset and possibly progression of the disease of

interest. Some of the model parameters, namely the transition intensities, will be

different in different strata — most obviously those associated with the disease of

interest. These intensities are the key to the whole UK Biobank project, as well as

our study.

(a) The real-life epidemiologist wants to estimate them (or in practice, odds ra-

tios) from UK Biobank data, given a hypothesis about the effect of measured

exposures on the disease.

(b) The real-life actuary wants to take the estimated intensities (or in practice,

approximate them from published odds ratios) and use them in pricing and

reserving.

(c) We wish to specify hypothetical but plausible dependencies of these intensities,

on genotype and other exposures, so that we can observe our model epidemiol-

ogist and model actuary at work.
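
The odds ratios mentioned in (a) and (b) can be illustrated with a 2×2 table of cases and controls by exposure. The following is only a minimal sketch: the counts are invented for illustration and the function name `odds_ratio` is hypothetical.

```python
def odds_ratio(exposed_cases, exposed_controls, unexposed_cases, unexposed_controls):
    """Odds ratio of disease for exposed versus unexposed subjects."""
    odds_exposed = exposed_cases / exposed_controls
    odds_unexposed = unexposed_cases / unexposed_controls
    return odds_exposed / odds_unexposed


if __name__ == "__main__":
    # Invented counts, not UK Biobank data.
    print(odds_ratio(40, 60, 20, 80))
```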

The steps in simulating UK Biobank are then as follows.

(a) We choose the number of genotypes and the number of levels of environmental

exposure, and also the frequencies with which each appears in the population.

Thus we can model simple or complex genotypes and exposures, and allow them

to be more or less common or rare. These define the subgroups or strata. The

simplest example (used in the UK Biobank draft protocol) is to have two geno-

types and two levels of environmental exposure. We also choose the intensities

of onset of heart attack in each stratum to reflect the strength of the association

between stratum and the risk of heart attack.

(b) We randomly ‘create’ 500,000 individuals, each equally likely to be male or

female, and with ages uniformly distributed in the range 40 to 69, and allocated

to strata at random according to the chosen frequencies.

(c) The life history of each individual is modelled by simulating the times of any

transitions between states in the model, as governed by the intensities. We

record the times of any transitions taking place within the 10-year follow-up

period of UK Biobank.
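
The three steps above can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the model developed in later chapters: it assumes constant (age-independent) intensities so that waiting times are exponential, it ignores mortality after a heart attack, it reads "ages 40 to 69" as age last birthday, and every number and name in it (`STRATA`, `DEATH_INTENSITY`, `simulate_cohort`, the stratum labels) is hypothetical.

```python
import random
from collections import Counter

FOLLOW_UP_YEARS = 10.0

# Step (a): hypothetical 2x2 strata, (genotype, exposure) ->
# (population frequency, heart attack intensity per year). Invented numbers.
STRATA = {
    ("g", "e"): (0.72, 0.004),
    ("g", "E"): (0.18, 0.006),
    ("G", "e"): (0.08, 0.006),
    ("G", "E"): (0.02, 0.012),
}
DEATH_INTENSITY = 0.01  # invented; constant and equal for everyone


def simulate_cohort(n, seed=0):
    """Simulate n life histories over the follow-up period."""
    rng = random.Random(seed)
    strata = list(STRATA)
    weights = [STRATA[s][0] for s in strata]
    cohort = []
    for _ in range(n):
        # Step (b): sex, age and stratum drawn at random.
        sex = rng.choice("MF")
        age = rng.randint(40, 69)  # age last birthday (our reading of the text)
        stratum = rng.choices(strata, weights=weights)[0]
        # Step (c): competing risks with constant intensities, so the
        # waiting time to each transition is exponential.
        t_attack = rng.expovariate(STRATA[stratum][1])
        t_death = rng.expovariate(DEATH_INTENSITY)
        if min(t_attack, t_death) > FOLLOW_UP_YEARS:
            event, time = "censored", FOLLOW_UP_YEARS
        elif t_attack < t_death:
            event, time = "heart_attack", t_attack
        else:
            event, time = "death", t_death
        cohort.append((sex, age, stratum, event, time))
    return cohort


if __name__ == "__main__":
    # Use n = 500_000 for the cohort size quoted in the text.
    cohort = simulate_cohort(10_000)
    print(Counter((stratum, event) for _, _, stratum, event, _ in cohort))
```

Replacing the constant intensities with age-dependent ones would require a different sampling method for the transition times, for example inverting the integrated intensity numerically.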

We implicitly assume that the 500,000 participants are independent in the sta-

tistical sense, which is unlikely to be true. The sample is so large that some related

individuals are likely to be recruited by chance, but also the method of recruit-

ment (through general practices) guarantees some level of familial and geographical

clustering.

Chapter 2

A Model for Heart Attack

2.1 Specification of the Model

Heart attack, cancer and stroke are the three major illnesses generally covered under

a critical illness (CI) insurance contract. Other minor CIs, sometimes included in

the list, are:

(a) coronary artery bypass,

(b) major organ transplant,

(c) chronic kidney failure,

(d) multiple sclerosis, and

(e) total permanent disability.

Our main focus in this thesis will be on heart attacks. The objective is to build a sim-

ple but comprehensive model for heart attacks, which can then be used to represent

hypothetical, but plausible, multifactorial gene-environment interactions. We can

then subsequently analyse the impact of multifactorial disorders on CI insurance.

Hazards of heart attacks have been widely studied by a number of research pro-

grammes. The interested parties include clinical researchers, pharmaceutical indus-

tries, epidemiologists and also actuaries. However, as remits of these papers are

very different, it is difficult to develop a complete model of heart attacks from any

one of these reports. For example, Gutierrez and Macdonald (2003) gives transition

intensities or hazard rates of an individual suffering a heart attack. The authors

were investigating CI insurance and the subject of interest was the incidence of
different CIs. So, naturally, their analysis did not include post heart attack survivals.
On the other hand Capewell et al. (2000) investigates only survival after a heart
attack. In this chapter, our aim is to bring together all these results and develop a
multiple-state model, which will enable us to track individuals from their birth to
any incidence of heart attack and follow them up until they die.

[Figure 2.1: A 4-state heart attack model. States: 1 = Healthy, 2 = Heart Attack, 3 = Dead, 4 = Dead. Transitions: 1 → 2 with intensity λ12(x); 1 → 3 with intensity λ13(x); 2 → 4 with intensity λ24(x, t).]

We propose a simple 4-state heart attack model given in Figure 2.1. All individu-

als are assumed to start in State 1, the Healthy state. From there, they may have a

heart attack and move to State 2, or die and move to State 3. As our ultimate goal

is to apply the model for CI insurance, we are interested in first heart attacks only,

because this will trigger a claim under a CI policy, so any subsequent heart attacks

are ignored. The only possible transition from the Heart Attack state is death. It

is convenient to distinguish deaths occurring after a heart attack, so States 3 and 4

are separate.
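
The structure just described can be written down as a small transition table. The sketch below is only an illustration of the allowed moves in Figure 2.1; the names `STATES`, `TRANSITIONS` and `is_valid_path` are hypothetical.

```python
# States of the 4-state model; States 3 and 4 are both "Dead", kept separate
# so that deaths after a heart attack can be distinguished.
STATES = {1: "Healthy", 2: "Heart Attack", 3: "Dead", 4: "Dead"}

# Allowed transitions: origin state -> set of destination states.
TRANSITIONS = {1: {2, 3}, 2: {4}, 3: set(), 4: set()}


def is_valid_path(path):
    """Check that a life history starts in State 1 and uses only allowed moves."""
    if not path or path[0] != 1:
        return False
    return all(b in TRANSITIONS[a] for a, b in zip(path, path[1:]))
```

For instance, the history 1 → 2 → 4 is valid, whereas 1 → 4 is not, because death after a heart attack can only be reached through State 2.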

A basic introduction to multiple state models and transition intensities is given

in Appendix A. Please refer to Woodward (1999) and Breslow and Day (1980) for

a detailed discussion.

2.2 The Heart Attack Transition Intensity

Having specified the structure of the model, we now parameterise the transition
intensities. First we specify the heart attack transition intensity in the general
population, denoted λ12(x), separately for males and females. Gutierrez and
Macdonald (2003) fitted parametric functions to the transition intensities of all
major critical illnesses, including heart attacks. The authors used numbers of
first-ever cases of heart attacks between September 1991 and August 1992, taken
from McCormick et al. (1995). The exact exposed to risk was calculated and a
parametric function was fitted.

[Figure 2.2: The transition intensity of all first heart attacks, by gender. Horizontal axis: Age (years), 0 to 80; vertical axis: Transition Intensity, 0 to 0.02; curves: Male, Female.]

For males, it is given by:

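$$
\lambda_{12}(x) =
\begin{cases}
\exp(-13.2238 + 0.152568\,x) & \text{if } x \le 44, \\[4pt]
\dfrac{x - 44}{49 - 44} \times \lambda_{12}(49) + \dfrac{49 - x}{49 - 44} \times \lambda_{12}(44) & \text{if } 44 < x < 49, \\[4pt]
-0.01245109 + 0.000315605\,x & \text{if } x \ge 49,
\end{cases}
\tag{2.1}
$$

and for females, it is given by:

$$
\lambda_{12}(x) = \frac{0.598694}{\Gamma(15.6412)} \times 0.15317^{15.6412} \exp(-0.15317\,x)\, x^{14.6412}. \tag{2.2}
$$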

These intensities are shown in Figure 2.2.
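
For readers who wish to evaluate these intensities numerically, the following is a minimal Python sketch of equations (2.1) and (2.2). The function names are hypothetical, and the treatment of non-positive ages in the female formula (returning zero) is our own convention rather than part of the fitted model.

```python
import math


def lambda12_male(x):
    """Equation (2.1): male first heart attack intensity at age x."""
    if x <= 44:
        return math.exp(-13.2238 + 0.152568 * x)
    if x >= 49:
        return -0.01245109 + 0.000315605 * x
    # Linear interpolation between the values at ages 44 and 49.
    w = (x - 44) / (49 - 44)
    return w * lambda12_male(49) + (1 - w) * lambda12_male(44)


def lambda12_female(x):
    """Equation (2.2): female first heart attack intensity at age x."""
    if x <= 0:
        return 0.0  # formula gives 0 at x = 0; negative ages treated as 0 here
    a, b = 15.6412, 0.15317
    # Work on the log scale to avoid overflow in Gamma(a) and x ** (a - 1).
    log_density = a * math.log(b) - math.lgamma(a) - b * x + (a - 1) * math.log(x)
    return 0.598694 * math.exp(log_density)


if __name__ == "__main__":
    for age in (30, 44, 46.5, 49, 60, 80):
        print(age, lambda12_male(age), lambda12_female(age))
```

The female intensity is a scaled gamma density, so it is computed with `math.lgamma` on the log scale rather than by forming the large intermediate powers directly.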

2.3 Mortality After First Heart Attacks

We will now focus on what happens after an individual has experienced his or her
first heart attack. Figure 2.3 shows the part of the full 4-state model (Figure 2.1)
that we are interested in.

[Figure 2.3: Subset of the model in Figure 2.1 to study survival after heart attacks. Transition: 2 = Heart Attack → 4 = Dead, with intensity λ24(x, t).]

Here, the individuals who have had their first heart attack, start off from State

2. We then observe these individuals until their death, at which point they move on

to State 4. We are interested in the transition intensity from State 2 to State 4.

In Section 2.3.1, we will review a number of published articles on survival after

first heart attacks. In Section 2.3.2, we will identify a study which we believe to be

the most appropriate for our model. In Section 2.3.3, we will propose a parametric

form for λ24(x, t). And finally in Section 2.3.4, we will provide a discussion on the

fitted model and validate our model against other relevant data available in the

scientific literature.

2.3.1 Literature Review

There are a number of articles in published journals which study prognosis following

heart attacks. The articles vary widely in their scope and focus. There are articles

like Tunstall-Pedoe et al. (1999), which is an outstanding example of a population-

based study, but concentrates only on short-term survival after heart attacks. As

our interest lies in both short and long-term prognosis, we will review articles which

observe the study subjects over longer periods of time.

Capewell et al. (2000) describes a retrospective cohort study involving 117,718

patients admitted to hospital with heart attacks in Scotland between 1986 and 1995.

This is one of the largest population-based investigations which deals with both

short and long-term prognosis following a first heart attack. The study classifies

individuals according to:

(a) age groups <55, 55–64, 65–74, 75–84, ≥85,

(b) gender,

(c) deprivation categories and

(d) co-morbidity.

Case-fatality rates are aggregated for each of these groups. So it is not possible

to model the transition intensity in terms of all these variables. However, we are

only interested in modelling post heart attack mortality in terms of age and gender.

The case fatality rates appear to be higher for women. The authors have confirmed

that this apparent high case fatality rate is due to the fact that the average age of

women in the study was significantly higher than that of men.

From the published case fatality rates based on age-groups, it is clear that the

rates depend on:

(a) the age at first heart attack; and

(b) the duration of survival after suffering first heart attack.

So we will model λ24(x, t) as a function of x and t, where x is the age at first heart

attack and t is the survival duration post-first heart attack.
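
For reference, the standard relationship between a duration-dependent intensity and the probability of dying can be written as follows. The symbols S(x, t) and q(x, t) are introduced here only for illustration: they denote the probabilities of surviving, and of dying within, duration t after a first heart attack at age x. Under the assumption that death is the only exit from State 2, q(x, t) is the quantity that a case-fatality rate at duration t estimates.

$$
S(x, t) = \exp\left( -\int_0^t \lambda_{24}(x, s)\, ds \right), \qquad q(x, t) = 1 - S(x, t).
$$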

Goldberg et al. (1998) conducted a similar population-based investigation on pa-

tients admitted in all acute care hospitals in the Worcester, Massachusetts metropoli-

tan area (1990 census population of 437,000) between 1975 and 1995. A total of

8,070 patients were studied in the investigation.

The study classified individuals according to the study periods 1975–78, 1981–

84, 1986–88, 1990–91 and 1993–95, and uses the same age-groups as Capewell et al.

(2000). The results published include:

(a) odds of dying during hospitalisation, and after 1 year and 2 years following

hospital discharge for all age-groups as compared to patients < 55 years;

(b) trends in the odds of dying during hospitalisation, and after 1 year and 2 years

following hospital discharge for each age-group;

(c) 1-year and 2-year death rates of hospital survivors by age-group; and

(d) a graph of long-term survival rates among hospital survivors by age-group.

Brønnum-Hansen et al. (2001) studied patients registered during 1982–91 in 11

municipalities in the western part of Copenhagen County, Denmark. During the

study period, the average size of the population was 202,000 and a total of 3,926

first heart attacks were registered. The patients were classified according to gender,


two age-groups (30–59 and 60–74) and three study periods (1982–84, 1985–87 and

1988–91). The published figures include:

(a) a table of fatal and non-fatal heart attack cases for each age-group and gender

covering the full duration;

(b) a table of standardised mortality ratios (quotient of observed to expected num-

ber of deaths) and excess death rates (observed minus expected number of

deaths per 1,000 person-years) by age-group and gender; and

(c) separate graphs of short-term (≤28 days) and long-term (28 days to 15 years)

survival probabilities for men and women.

The authors point out that according to their findings the age-adjusted case-

fatality rates after a first heart attack do not differ between the sexes. This agrees

with the findings of Capewell et al. (2000).

Among these articles, Capewell et al. (2000) appears most relevant on three counts. First, the study population is Scottish, which makes its data the most appropriate for modelling heart attacks in the UK. Second, it has the largest study population, which lends substantial credibility to the published figures. Third, the figures published in this article are presented in a suitable format and can be readily used for parameterising λ24(x, t).

Most of the data in Goldberg et al. (1998) and Brønnum-Hansen et al. (2001)

are presented in the form of graphs, odds ratios, standardised mortality ratios and

excess death rates. Although results in these formats are not suitable for directly

parameterising transition intensities, they can still be used as an independent check

of λ24(x, t), which we will finally propose.

2.3.2 Data

As mentioned in the previous section, Capewell et al. (2000) provides case-fatality

rates for different age-groups at durations of 30 days, 1 year, 5 years and 10 years following

first heart attacks. We will represent the five age-groups <55, 55–64, 65–74, 75–84,

≥85 by single representative ages, namely, 50, 60, 70, 80 and 90 respectively. For our

calculations, we will transform the case-fatality rates into survival probabilities, by

subtracting the case-fatality rates from 1. The survival probabilities thus calculated

are given in Table 2.1.

Table 2.1: Survival probabilities after first heart attack.

Age     Representative   Duration after first heart attack
Range   Age              0 days   30 days   1 year   5 years   10 years
<55     50               1.000    0.949     0.921    0.834     0.737
55–64   60               1.000    0.880     0.827    0.672     0.528
65–74   70               1.000    0.771     0.677    0.465     0.312
75–84   80               1.000    0.641     0.499    0.255     0.133
≥85     90               1.000    0.545     0.351    0.123     0.052

Figure 2.4: The plots of the data from Table 2.1. [Plot: survival probability (0 to 1) against duration (0 to 10 years) after first heart attack; curves P22(50, 50 + t), P22(60, 60 + t), P22(70, 70 + t), P22(80, 80 + t) and P22(90, 90 + t).]

2.3.3 Fitting a Parametric Function

Based on the data, we will first parameterise the survival function following a first

heart attack. Let Pij(y, z) denote the conditional probability that a person is in

State j at age z given that he or she was in state i at age y. Table 2.1 gives the

survival probabilities P22(x, x + t) for specific values of x and t, where x denotes

the age at first heart attack and t denotes the survival duration after the first heart

attack.

To get an initial idea of the shape of the functions we are dealing with, we plot

the data of Table 2.1 in Figure 2.4. For ease of comparison, the data-points for each age-group are connected by straight lines.

Figure 2.5: The plots of f(t) = 1/(1 + t^a) against t for values of a = 0.25, 0.50, 1.00, 2.00, 4.00.

As an initial guess of a suitable functional form, consider functions of the form f_a(t) = 1/(1 + t^a). Figure 2.5 shows f_a(t) for a = 0.25, 0.50, 1.00, 2.00 and 4.00. Note that for all values of a, f_a(0) = 1, f_a(1) = 0.5 and f_a(+∞) = 0. The smaller the value of a, the steeper is the initial descent of f_a(t), but the flatter is the descent later.
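The stated properties of this family can be checked numerically. The following is a minimal sketch; the function name `f` and the evaluation points t = 0.1 and t = 10 are illustrative choices, not taken from the source.

```python
def f(a, t):
    """f_a(t) = 1 / (1 + t**a), for t >= 0 and a > 0."""
    return 1.0 / (1.0 + t ** a)


for a in (0.25, 0.50, 1.00, 2.00, 4.00):
    # t = 0.1 probes the initial descent, t = 10 the later descent.
    print(f"a={a:4.2f}  f(0)={f(a, 0.0):.3f}  f(0.1)={f(a, 0.1):.3f}  "
          f"f(1)={f(a, 1.0):.3f}  f(10)={f(a, 10.0):.4f}")
```

For small a the value at t = 0.1 is already well below 1 while the value at t = 10 is still well above 0; for large a the opposite holds.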

A quick glance at Figure 2.4 reveals that we require P22(x, x + t) for the older

ages to have both the initial and later descents steeper than that of the younger

ages. It is apparent that a better fit can be achieved by combining the properties of

fa(t) for both high and low values of a. So we propose an enhanced version of fa(t)

for parameterising P22(x, x+ t) as follows:

\[ P_{22}(x, x+t) = \frac{1}{1 + a_x \times t^{b_x} + c_x \times t^{d_x}}, \tag{2.3} \]

where, without loss of generality, we assume 0 < bx < 1 and dx > 1. Note that ax

and cx are scaling parameters.

Clearly by definition, P22(x, x) = 1. For each representative age, we have four

data-points (Table 2.1) and four parameters (ax, bx, cx and dx) to estimate. Solving these equations, we obtain the values of ax, bx, cx and dx, given in Table 2.2.

Table 2.2: Parameter estimates.

Age     Representative
Range   Age              a        b        c        d
<55     50               0.0684   0.1040   0.0174   1.1919
55–64   60               0.1686   0.0911   0.0406   1.2280
65–74   70               0.4001   0.1237   0.0770   1.3370
75–84   80               0.8564   0.1732   0.1476   1.5504
≥85     90               1.5181   0.2431   0.3309   1.6727

Given the parametric form of P22(x, x+ t), the transition intensities λ24(x, t) can

be derived using:

\[ \lambda_{24}(x, t) = -\frac{d}{dt} \log P_{22}(x, x+t). \tag{2.4} \]

For the derivation of the above expressions and the underlying assumptions, please

refer to Appendix A. Hence:

\[ \lambda_{24}(x, t) = \frac{a_x \times b_x \times t^{b_x - 1} + c_x \times d_x \times t^{d_x - 1}}{1 + a_x \times t^{b_x} + c_x \times t^{d_x}}. \tag{2.5} \]

Using the parameters from Table 2.2, the graphs of P22(x, x + t) and λ24(x, t)

are provided in Figures 2.6 and 2.7, respectively. The graphs of λ24(x, t) for x =

50, 60, 70, 80 and 90 and the transition intensity for both genders from ELT15 are

given in Figure 2.8.
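As a rough check on Equations 2.3 to 2.5, the sketch below evaluates the fitted survival function and intensity from the Table 2.2 parameters and compares the closed form of Equation 2.5 with a numerical derivative of Equation 2.4. The function names are illustrative, and treating 30 days as 30/365 of a year is an assumption.

```python
import math

# Table 2.2: (a, b, c, d) for each representative age at first heart attack.
PARAMS = {
    50: (0.0684, 0.1040, 0.0174, 1.1919),
    60: (0.1686, 0.0911, 0.0406, 1.2280),
    70: (0.4001, 0.1237, 0.0770, 1.3370),
    80: (0.8564, 0.1732, 0.1476, 1.5504),
    90: (1.5181, 0.2431, 0.3309, 1.6727),
}


def p22(age, t):
    """Equation 2.3: probability of surviving t years after a first heart attack."""
    a, b, c, d = PARAMS[age]
    return 1.0 / (1.0 + a * t ** b + c * t ** d)


def lambda24(age, t):
    """Equation 2.5: post-heart-attack mortality intensity (requires t > 0)."""
    a, b, c, d = PARAMS[age]
    numerator = a * b * t ** (b - 1.0) + c * d * t ** (d - 1.0)
    return numerator / (1.0 + a * t ** b + c * t ** d)


def lambda24_numeric(age, t, eps=1e-6):
    """Equation 2.4 by central difference, as a check on the closed form."""
    return -(math.log(p22(age, t + eps)) - math.log(p22(age, t - eps))) / (2.0 * eps)


DURATIONS = (30.0 / 365.0, 1.0, 5.0, 10.0)  # assumption: 30 days = 30/365 years
for age in sorted(PARAMS):
    fitted = [round(p22(age, t), 3) for t in DURATIONS]
    print(age, fitted, round(lambda24(age, 1.0), 4))
```

The printed survival probabilities can be read against the rows of Table 2.1; small differences are to be expected because the parameters are quoted to four decimal places.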

From the graphs, we observe that both P22(x, x + t) and λ24(x, t) differ signifi-

cantly between age-groups. To extend the definition of the transition intensity to all

ages x and durations 0 ≤ t ≤ 10, we first assign λ24(x, t) for each age-group to its rep-

resentative age. Then define λ24(x, t) = λ24(50, t) for x < 50, λ24(x, t) = λ24(90, t)

for x > 90, and interpolate linearly in x between the given values for 50 < x < 90.

Capewell et al. (2000) do not give survival rates more than 10 years after the first

heart attack. For survival rates after more than 10 years, to ensure that the force of

mortality does not drop below general population mortality, we take the maximum

of λ24(x, t) defined above and the general population mortality given by ELT15.
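The extension to all ages and durations can be sketched as follows. This is only an illustration under stated assumptions: `toy_population_mu` is a hypothetical Gompertz-type placeholder and not ELT15, the population mortality is evaluated at the attained age x + t, and Equation 2.5 is simply evaluated beyond 10 years before the floor is applied.

```python
import math

# Table 2.2: (a, b, c, d) for each representative age at first heart attack.
PARAMS = {
    50: (0.0684, 0.1040, 0.0174, 1.1919),
    60: (0.1686, 0.0911, 0.0406, 1.2280),
    70: (0.4001, 0.1237, 0.0770, 1.3370),
    80: (0.8564, 0.1732, 0.1476, 1.5504),
    90: (1.5181, 0.2431, 0.3309, 1.6727),
}


def lambda24_rep(rep_age, t):
    """Equation 2.5 at a representative age (requires t > 0)."""
    a, b, c, d = PARAMS[rep_age]
    numerator = a * b * t ** (b - 1.0) + c * d * t ** (d - 1.0)
    return numerator / (1.0 + a * t ** b + c * t ** d)


def toy_population_mu(age):
    """Hypothetical placeholder for the population force of mortality (not ELT15)."""
    return 0.00005 * math.exp(0.09 * age)


def lambda24(x, t, population_mu=None):
    """Intensity for any age x at first heart attack and duration t > 0."""
    ages = sorted(PARAMS)
    if x <= ages[0]:
        value = lambda24_rep(ages[0], t)       # flat below age 50
    elif x >= ages[-1]:
        value = lambda24_rep(ages[-1], t)      # flat above age 90
    else:
        lo = max(a for a in ages if a <= x)
        hi = min(a for a in ages if a > x)
        w = (x - lo) / (hi - lo)               # linear interpolation in x
        value = (1.0 - w) * lambda24_rep(lo, t) + w * lambda24_rep(hi, t)
    if t > 10.0 and population_mu is not None:
        # Beyond 10 years: never below the population mortality (assumed at age x + t).
        value = max(value, population_mu(x + t))
    return value


print(lambda24(55.0, 1.0), lambda24(50.0, 20.0, toy_population_mu))
```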

Figure 2.6: The plots of survival probabilities, P22(x, x + t), against duration after heart attacks for age-groups <55, 55–64, 65–74, 75–84, ≥85 years. [Plot: survival probability (0 to 1) against duration (0 to 10 years) after first heart attack; curves for x = 50, 60, 70, 80 and 90.]

Figure 2.7: The plots of transition intensities, λ24(x, t), against duration after heart attacks for age-groups <55, 55–64, 65–74, 75–84, ≥85 years. [Plot: transition intensity (axis marked 0.1, 1, 10) against duration (0 to 10 years) after first heart attack; curves for x = 50, 60, 70, 80 and 90.]

Figure 2.8: Graphs of λ24(x, t), assigned to representative ages for each age group, and the force of mortality of the ELT15 life tables. [Plot: transition intensity (axis marked 0.0001 to 1) against age (0 to 100 years); curves ELT15 Male, ELT15 Female and λ24(x, t) for x = 50, 60, 70, 80 and 90.]

2.3.4 Discussion of the Fitted Model

First, let us compare the survival probabilities of the fitted model with those of the

general population. The survival probabilities of men and women aged 50, 60, 70, 80

and 90 following ELT15 are shown in Figures 2.9 and 2.10. These can now be

compared with the P22(x, x+ t) given in Figure 2.6.

For all ages and all durations, P22(x, x + t) is lower than the corresponding survival probability derived from ELT15. However, at longer durations the curve of P22(x, x + t) is significantly flatter than that of ELT15. This seems to suggest that survival for a long duration after a first heart attack implies better overall health as compared to the general population.

We have also plotted P22(x, x + t) over the first 30 days following a first heart

attack in Figure 2.11. This can be compared with Fig 1 of Brønnum-Hansen et al.

(2001), which gives the graphs of survival probabilities for men and women combined

for all ages over three different time periods. Although not directly comparable, the

graphs show similar features.

Figure 2.12 shows the survival probabilities for hospital survivors calculated from

the P22(x, x + t). Again we find that these graphs show similar features when

compared with Fig 3 of Brønnum-Hansen et al. (2001) and Figure 2 of Goldberg et al. (1998).

Figure 2.9: The plots of survival probabilities of men aged 50, 60, 70, 80 and 90 following ELT15. [Plot: survival probability (0 to 1) against duration (0 to 10 years).]

Figure 2.10: The plots of survival probabilities of women aged 50, 60, 70, 80 and 90 following ELT15. [Plot: survival probability (0 to 1) against duration (0 to 10 years).]

Figure 2.11: The plots of survival probabilities, of individuals aged 50, 60, 70, 80 and 90, over the first 30 days after a first heart attack. [Plot: survival probability (0 to 1) against duration (0 to 0.08 years).]

Figure 2.12: The plots of survival probabilities of individuals aged 50, 60, 70, 80 and 90, who survived the first 30 days after a first heart attack. [Plot: survival probability (0 to 1) against duration (1 month to 10 years).]

Table 2.3: Odds of dying within first 30 days, one year and two years following a first heart attack.

Age       Duration
(years)   30 days   1 year   2 years
<55       1.00      1.00     1.00
55–64     2.35      2.19     2.12
65–74     4.49      4.09     3.80
75–84     7.04      6.34     5.73
≥85       8.92      8.21     7.28

Table 2.4: Adjusted odds ratios and the corresponding 95% confidence intervals of dying within first 30 days, one year and two years following a first heart attack according to Goldberg et al. (1998).

Age       Duration
(years)   30 days               1 year               2 years
<55       1.00 (–)              1.00 (–)             1.00 (–)
55–64     1.87 (1.30, 2.68)     1.78 (1.27, 2.51)    1.65 (1.25, 2.18)
65–74     4.00 (2.86, 5.60)     3.00 (2.18, 4.13)    2.83 (2.18, 3.68)
75–84     7.77 (5.55, 10.88)    4.55 (3.28, 6.30)    5.30 (4.05, 6.93)
≥85       11.67 (8.10, 16.81)   8.76 (6.12, 12.54)   10.57 (7.75, 14.42)

Finally, we calculate the odds of dying within the first 30 days, 1 year and 2 years

following a first heart attack. The numbers are given in Table 2.3. Most of these

fall within the 95% confidence intervals given in Tables II and IV of Goldberg et al.

(1998). For reference the relevant numbers are reproduced in Table 2.4.
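The calculation behind Table 2.3 can be sketched from the fitted P22(x, x + t). The code below computes, for each representative age, both the ratio of death probabilities and the ratio of odds relative to age 50. It is an observation from the arithmetic, not a statement in the source, that the tabulated entries seem to be reproduced by the probability ratio; the function names and the 30/365 conversion are assumptions.

```python
# Table 2.2: (a, b, c, d) for each representative age at first heart attack.
PARAMS = {
    50: (0.0684, 0.1040, 0.0174, 1.1919),
    60: (0.1686, 0.0911, 0.0406, 1.2280),
    70: (0.4001, 0.1237, 0.0770, 1.3370),
    80: (0.8564, 0.1732, 0.1476, 1.5504),
    90: (1.5181, 0.2431, 0.3309, 1.6727),
}
DURATIONS = {"30 days": 30.0 / 365.0, "1 year": 1.0, "2 years": 2.0}


def death_prob(age, t):
    """1 - P22(x, x + t) from Equation 2.3."""
    a, b, c, d = PARAMS[age]
    return 1.0 - 1.0 / (1.0 + a * t ** b + c * t ** d)


def probability_ratio(age, t, baseline=50):
    """Death probability relative to the baseline age."""
    return death_prob(age, t) / death_prob(baseline, t)


def odds_ratio(age, t, baseline=50):
    """Odds of death relative to the odds at the baseline age."""
    q, q0 = death_prob(age, t), death_prob(baseline, t)
    return (q / (1.0 - q)) / (q0 / (1.0 - q0))


for age in sorted(PARAMS):
    row = [(round(probability_ratio(age, t), 2), round(odds_ratio(age, t), 2))
           for t in DURATIONS.values()]
    print(age, row)  # (probability ratio, odds ratio) per duration
```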

Based on the discussion above, the proposed model appears to be consistent with

other relevant data relating to survival after first heart attack.

2.4 Mortality Before First Heart Attacks

Going back to the heart attack model proposed in Section 2.1, we have already

parameterised λ12(x) and λ24(x, t). In this section, we will complete the model by

parameterising λ13(x). This is the force of mortality affecting individuals who have not had a heart attack.

Figure 2.13: 4-state heart attack model - Grouping of states. [Diagram: State 1 = Healthy, State 2 = Heart Attack, State 3 = Dead, State 4 = Dead; transitions λ12(x) from State 1 to State 2, λ13(x) from State 1 to State 3, and λ24(x, t) from State 2 to State 4.]

To parameterise λ13(x), we make use of the mortality transition intensity affecting

all individuals in the general UK population. Mortality of the general UK population

is well studied and is analysed separately for males and females, and the latest

intensities are given in ELT15. To make use of ELT15 for our investigation, we need

to make the following observations.

The 4-state heart attack model introduced in Section 2.1 is reproduced in Figure

2.13. Note that individuals are alive in States 1 and 2; and they move to States

3 and 4 when they die. The grouping shown in Figure 2.13 using dashed lines produces a simple 2-state mortality model, given in Figure 2.14. Here, States 1 and

2 are combined to produce State 5, the Alive state, while States 3 and 4 are grouped

to form State 6, the Dead state. The resulting transition intensity from State 5

to State 6, λ56(x), is the force of mortality for the general population as given by

ELT15 for respective genders.

Recalling that the notation Pij(y, z) denotes the conditional probability that a

person is in State j at age z given that he or she was in state i at age y, the

probability of an individual dying before attaining age x in the 2-state mortality

model can be expressed as:

\[ P_{56}(0, x) = 1 - P_{55}(0, x) = 1 - \exp\left( -\int_0^x \lambda_{56}(s)\,ds \right). \tag{2.6} \]

Note that we can numerically compute the probability P56(0, x) for all ages x, as

the transition intensity λ56(x) is known and given by ELT15.

Figure 2.14: A 2-state mortality model. [Diagram: State 5 = Alive, State 6 = Dead; transition λ56(x) from State 5 to State 6.]

Going back to our original 4-state heart attack model, we can express the same

probability of dying, in terms of the transition intensities pertinent to that model.

We will assume that all individuals belong to State 1 when they are born. Note that,

according to the definitions of the states in the heart attack model, all individuals

are born healthy, as individuals who are alive and have not suffered a heart attack are classified as healthy. So in the 4-state heart attack model, the probability of a person dying before attaining age x is given by:

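\[ P_{56}(0, x) = P_{13}(0, x) + P_{14}(0, x) = \int_0^x \left[ P_{11}(0, z)\lambda_{13}(z) + \int_0^z P_{11}(0, y)\lambda_{12}(y)P_{22}(y, z)\lambda_{24}(y, z - y)\,dy \right] dz, \tag{2.7} \]

where

\[ P_{11}(0, z) = \exp\left[ -\int_0^z \left( \lambda_{12}(y) + \lambda_{13}(y) \right) dy \right], \text{ and} \tag{2.8} \]

\[ P_{22}(y, z) = \exp\left[ -\int_0^{z - y} \lambda_{24}(y, s)\,ds \right]. \tag{2.9} \]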

We see that λ13(x) is the only unknown function above. So now we can solve for λ13(x) numerically using the above equation. The iterative algorithm to solve for λ13(x) is outlined below.

(a) For a given age x, let us assume that λ13(y) is known for all y < x. Based on

this information, we will now solve for λ13(x).


(b) Set an initial guess for the value of λ13(x). The better the initial guess, the

faster will be the convergence to the solution. We have used simple linear

extrapolation based on the values of λ13(x− δ) and λ13(x−2δ) for a small value

of δ > 0.

(c) The approximate value of λ13(x) can then be used to calculate P11(0, x). We

can now calculate P13(0, x)+P14(0, x) using Equation 2.7, assuming that λ13(x)

and P11(0, x) are known quantities. P56(0, x) can be computed independently

and compared with the value of P13(0, x) + P14(0, x) thus obtained. Depending

on the magnitude and sign of the difference between these quantities, we can

refine our initial estimate of λ13(x). Repeat this step with improved estimates

λ13(x) until convergence is achieved.

(d) The above process can be used to calculate λ13(x) for different ages progressively,

starting from age 0. As a starting value, we have assumed λ13(0) = λ56(0).
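The age-by-age procedure in steps (a) to (d) can be illustrated with a simplified sketch. It is not the implementation described here: it uses the trapezoidal rule instead of Romberg Integration, bisection instead of a refined extrapolated guess, treats λ13 as constant on each small age step, and matches the equivalent survival identity P55(0, x) = P11(0, x) + P12(0, x) (equivalent to Equation 2.7 because P56 = 1 − P55 and P13 + P14 = 1 − P11 − P12). All intensities (`H12`, `M24`, `G13`) are hypothetical constants, chosen so that the true λ13 is known and the recovery can be checked.

```python
import math


def solve_lambda13(lam12, p22, surv_pop, max_age, h=0.1, tol=1e-12):
    """Recover lambda13 on a grid of step h, one age at a time."""
    n = int(round(max_age / h))
    ages = [k * h for k in range(n + 1)]
    p11 = [1.0]   # P11(0, 0) = 1: everyone starts in State 1
    lam13 = []    # value of lambda13 on each interval (x_{k-1}, x_k]
    for k in range(1, n + 1):
        x = ages[k]
        # Trapezoidal nodes y_0..y_{k-1} of P12(0, x); these use known values only.
        known = 0.0
        for j in range(k):
            weight = 0.5 if j == 0 else 1.0
            known += weight * p11[j] * lam12(ages[j]) * p22(ages[j], x)
        known *= h
        lam12_mid = 0.5 * (lam12(ages[k - 1]) + lam12(x))

        def mismatch(g):
            # Model probability of being alive at x, minus the population value.
            p11_x = p11[-1] * math.exp(-h * (lam12_mid + g))
            alive = p11_x * (1.0 + 0.5 * h * lam12(x)) + known  # P22(x, x) = 1
            return alive - surv_pop(x)

        lo, hi = 0.0, 10.0
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if mismatch(mid) > 0.0:   # too many survivors: lambda13 must be larger
                lo = mid
            else:
                hi = mid
        g = 0.5 * (lo + hi)
        lam13.append(g)
        p11.append(p11[-1] * math.exp(-h * (lam12_mid + g)))
    return ages, lam13


# Hypothetical constant intensities for a self-check (not the fitted model).
H12, M24, G13 = 0.01, 0.2, 0.02


def toy_surv(x):
    """Exact P55(0, x) = P11 + P12 for the constant toy intensities."""
    r = M24 - H12 - G13
    p11 = math.exp(-(H12 + G13) * x)
    p12 = H12 * math.exp(-M24 * x) * (math.exp(r * x) - 1.0) / r
    return p11 + p12


ages, est = solve_lambda13(lambda x: H12,
                           lambda y, z: math.exp(-M24 * (z - y)),
                           toy_surv, max_age=60.0)
print(min(est), max(est))  # both should be close to G13
```

In the thesis setting, `surv_pop` would come from ELT15 through Equation 2.6 and `lam12`, `p22` from Sections 2.2 and 2.3.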

In the above steps, we have to compute a number of integrals numerically, for

which we have used Romberg Integration. For a detailed discussion on Romberg

Integration see Press et al. (2002). The integration involving λ24(x, t) in Equation

2.7 requires special treatment. The section of the integral we are interested in is

given below:

\[ I = \int_0^z P_{11}(0, y)\lambda_{12}(y)P_{22}(y, z)\lambda_{24}(y, z - y)\,dy. \tag{2.10} \]

For convenience, we make a transformation u = z − y, which gives us the following

integral:

\[ I = \int_0^z P_{11}(0, z - u)\lambda_{12}(z - u)P_{22}(z - u, z)\lambda_{24}(z - u, u)\,du. \tag{2.11} \]

Recall from Section 2.3.3, that for all values of x and t ≤ 10, λ24(x, t) is of the form

\[ \lambda_{24}(x, t) = \frac{a_x \times b_x \times t^{b_x - 1} + c_x \times d_x \times t^{d_x - 1}}{1 + a_x \times t^{b_x} + c_x \times t^{d_x}}, \tag{2.12} \]

where 0 < bx < 1 and dx > 1. This implies that lim_{t→0+} λ24(x, t) = ∞. Also the

smaller the value of bx, the steeper is the initial descent of λ24(x, t). Convergence

is difficult to achieve for numerical integration of an unbounded function. If the

integral exists, it is easier to deal with a transformed integrand which is bounded within the required range. For the type of function given in Equation 2.12 we can use a transformation of the form w = u^α, where α < bx for all x. For our computations, we have chosen α = 0.05.

Figure 2.15: The graph of the integrand in Equation 2.11. [Plot: integrand against u, for u from 0 to 80.]

Using this transformation, Equation 2.11 becomes

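\[ I = \int_0^{z^{\alpha}} P_{11}(0, z - w^{1/\alpha})\,\lambda_{12}(z - w^{1/\alpha})\,P_{22}(z - w^{1/\alpha}, z)\,\lambda_{24}(z - w^{1/\alpha}, w^{1/\alpha})\,\frac{1}{\alpha}\,w^{\frac{1}{\alpha} - 1}\,dw. \tag{2.13} \]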

We show the effect of this transformation in Figures 2.15 and 2.16. Figure 2.15

gives the graph of the integrand in Equation 2.11 before the transformation and Fig-

ure 2.16 shows the graph of the integrand in Equation 2.13 after the transformation.

For both graphs, z has been set to 80.

From the figures we can see that the transformation has successfully converted

the unbounded function in Figure 2.15 to the bounded function in Figure 2.16. Now

we can successfully apply Romberg Integration to evaluate the transformed integral

in Equation 2.13.
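The effect of the substitution w = u^α can be seen on a simpler integrand with the same kind of singularity. The sketch below integrates g(u) = u^(b − 1) over [0, 1], whose exact value is 1/b, using a plain composite midpoint rule (a stand-in for Romberg Integration, chosen here because it never evaluates the integrand at u = 0). The choice of test integrand, the value b = 0.0911 (the smallest bx in Table 2.2) and the number of panels are illustrative assumptions.

```python
def midpoint(func, lo, hi, n):
    """Composite midpoint rule with n equal panels."""
    h = (hi - lo) / n
    return h * sum(func(lo + (i + 0.5) * h) for i in range(n))


B = 0.0911      # smallest b_x in Table 2.2
ALPHA = 0.05    # the value of alpha chosen in the text (ALPHA < B)


def g(u):
    """Unbounded near u = 0, like the leading term of lambda24(x, u)."""
    return u ** (B - 1.0)


def g_transformed(w):
    """Integrand after substituting u = w**(1/ALPHA); du = (1/ALPHA) w**(1/ALPHA - 1) dw."""
    u = w ** (1.0 / ALPHA)
    return g(u) * (1.0 / ALPHA) * w ** (1.0 / ALPHA - 1.0)


exact = 1.0 / B
direct = midpoint(g, 0.0, 1.0, 2000)
transformed = midpoint(g_transformed, 0.0, 1.0 ** ALPHA, 2000)
print(exact, direct, transformed)
```

With the same number of panels, the direct estimate misses a large part of the integral near u = 0, while the transformed integrand behaves like w^(B/ALPHA − 1), which is bounded because ALPHA < B.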

Using the techniques outlined above, we have obtained estimates of λ13(x) for

both males and females. They are given in Figure 2.17. For comparison, we

have also included the gender-specific forces of mortality given in ELT15.

Figure 2.16: The graph of the integrand in Equation 2.13. [Plot: transformed integrand against w, for w from 0 to 1.2.]

Figure 2.17: Transition intensities of non-heart-attack deaths plotted along with ELT15 for both males and females. [Two panels: transition intensity (axis marked 0.0001 to 1) against age (0 to 80 years); curves ELT15 and non-heart-attack deaths, for males and for females.]

Chapter 3

Gene-Environment Interaction

3.1 Definition of Strata: A Simple Example

The parameters of the heart attack model estimated above are supposed to apply

to the general population. However, the general population is divided into strata

according to genotype, environmental exposures and other factors, and we now

suppose that the intensity of heart attack λs12(x) depends on the stratum s.

In this chapter, we will introduce the simplest possible gene-environment inter-

actions into our model. We suppose that there is a single genetic locus with two

alleles, denoted G and g, therefore just two genotypes. Also, there are just two

levels of environmental exposures, denoted E and e (a simple example might be E

= ‘smoker’ and e = ‘non-smoker’). This simple model can be used as a stepping

stone to study higher dimensional multifactorial models. Note that the UK Biobank

draft protocol used the same assumptions in its examples, despite the fact that the

project aims to study complex multifactorial disorders. We will suppose that G and

E are adverse exposures, while g and e are beneficial. Therefore, we have four strata

for each sex (ge, gE, Ge and GE) and eight in total.

We must choose plausible values for the frequencies with which each stratum is

present in the population, and the stratum-specific heart attack intensities. Since,

unlike the study of single-gene disorders, we are considering common risk factors

for common diseases, let us assume that the probability that a person possesses

genotype G is 0.1, and the probability that a person has environmental exposure E

is also 0.1. Assuming independence, the four strata (for each sex) ge, gE, Ge and GE occur with frequencies 0.81, 0.09, 0.09 and 0.01 respectively.

Table 3.5: The factor ρs, in Equation (3.14), for each gene-environment combination.

      G     g
E     1.3   0.9
e     1.1   0.7

We will suppose that the heart attack intensity in each stratum is proportional

to the population average intensity. For stratum s, set:

\[ \lambda^{s}_{12}(x) = k \times \rho_s \times \lambda_{12}(x), \tag{3.14} \]

where λ12(x) is the population intensity given in Section 2.2 and k×ρs is the constant

of proportionality for each stratum. We suppose, for clarity, that ρs does not depend

on sex, but the constant k does. Again, noting that our interest is in genotypes of

modest penetrance, we choose the values of ρs given in Table 3.5. Then, we choose k

so that the strata-specific heart attack intensities are consistent, in aggregate, with

the population heart attack intensities, for males and females separately. Let the

proportion of the healthy population in stratum s at age x be ws(x). Then:

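\[ \lambda_{12}(x + t) = \frac{\sum_s w_s(x) \times \exp\left( -\int_0^t \lambda^{s}_{12}(x + y)\,dy \right) \times \lambda^{s}_{12}(x + t)}{\sum_s w_s(x) \times \exp\left( -\int_0^t \lambda^{s}_{12}(x + y)\,dy \right)}. \tag{3.15} \]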

Substituting Equation (3.14) in Equation (3.15), we get:

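\[ \lambda_{12}(x + t) = \frac{\sum_s w_s(x) \times \exp\left( -\int_0^t \lambda_{12}(x + y)\,dy \right)^{k\rho_s} k \times \rho_s \times \lambda_{12}(x + t)}{\sum_s w_s(x) \times \exp\left( -\int_0^t \lambda_{12}(x + y)\,dy \right)^{k\rho_s}}. \tag{3.16} \]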

From Equation (3.16) we see that k ought to depend on a specific choice of age

x and duration t. However, to keep the model simple we will assume that k is

constant and calculate it from Equation (3.16) for a representative choice of age and

duration. Given that the UK Biobank protocol proposes an age range of 40 to 69 and

a follow-up period of 10 years, we have chosen x = 60 and t = 5. If we assume that the weights ws(x) are equal to the population frequencies of each stratum, then for males k = 1.317274 and for females k = 1.316406. The constants of proportionality (k × ρs) in Equation (3.14) are given in Table 3.6 for future reference.

Table 3.6: The multipliers k × ρs for each stratum.

Stratum   ge      gE      Ge      GE
Male      0.922   1.186   1.449   1.712
Female    0.921   1.185   1.448   1.711

Table 3.7: The true relative risks for each stratum, relative to the baseline ge stratum.

Stratum   ge      gE      Ge      GE
Male      1.000   1.286   1.571   1.857
Female    1.000   1.286   1.571   1.857
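The stratum frequencies and the multipliers in Table 3.6 follow directly from the quantities above, and can be reproduced with a short sketch. The dictionary names are illustrative. The value `k_at_t0` is only a sanity check: setting t = 0 in Equation (3.16) makes every exponential equal to 1, so that k = 1/Σ ws ρs; it is not the value used in the text, which is computed at x = 60 and t = 5 and needs λ12(x) from Section 2.2.

```python
RHO = {"ge": 0.7, "gE": 0.9, "Ge": 1.1, "GE": 1.3}   # Table 3.5
P_G, P_E = 0.1, 0.1                                   # frequencies of G and of E

# Stratum frequencies under independence of genotype and environment.
FREQ = {
    "ge": (1 - P_G) * (1 - P_E),
    "gE": (1 - P_G) * P_E,
    "Ge": P_G * (1 - P_E),
    "GE": P_G * P_E,
}

K = {"Male": 1.317274, "Female": 1.316406}            # values quoted in the text

# Multipliers k * rho_s (Table 3.6) and relative risks rho_s / rho_ge.
multipliers = {sex: {s: k * RHO[s] for s in RHO} for sex, k in K.items()}
relative_risk = {s: RHO[s] / RHO["ge"] for s in RHO}

# Limiting value of k at duration t = 0, from Equation (3.16).
k_at_t0 = 1.0 / sum(FREQ[s] * RHO[s] for s in RHO)

for sex in K:
    print(sex, {s: round(m, 3) for s, m in multipliers[sex].items()})
print({s: round(r, 3) for s, r in relative_risk.items()}, round(k_at_t0, 6))
```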

Having formulated a relationship between strata and the risk of heart attack, we

now consider the quantities likely to be estimated by epidemiologists. We have the

advantage of being able to compute their true values, because we know the true

intensities. From now on, we define the baseline population as the most common

stratum, namely the gene-environment combination ge.

(a) The relative risk in stratum s, with respect to the baseline stratum ge, is denoted

rs and is:

\[ r_s = \frac{k \times \rho_s}{k \times \rho_{ge}} = \frac{\rho_s}{\rho_{ge}}. \tag{3.17} \]

The values of rs are given in Table 3.7.

(b) The odds ratio at age x in stratum s, with respect to the baseline stratum ge,

based on 1-year probabilities, is denoted ψs(x) and is given by:

\[ \psi_s(x) = \left( \frac{P^{s}_{12}(x, x + 1)}{1 - P^{s}_{12}(x, x + 1)} \right) \Bigg/ \left( \frac{P^{ge}_{12}(x, x + 1)}{1 - P^{ge}_{12}(x, x + 1)} \right) \tag{3.18} \]

where P^s_12(x, x + 1) is the conditional probability that a person in stratum s who was healthy at age x will suffer a heart attack before age x + 1.


We have verified (not shown here) that the odds ratios computed using Equa-

tion (3.18) do not vary significantly with age and are approximately equal to the

corresponding relative risks. The latter is not surprising, as we have used 1-year

probabilities to calculate the odds ratios.
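The closeness of the 1-year odds ratios to the relative risks can be illustrated with a small calculation. This is a sketch under simplifying assumptions: the population heart attack intensity is taken as a hypothetical constant `BASE` over the year, and competing mortality from State 1 is ignored, so that the 1-year probability is 1 − exp(−k ρs λ12).

```python
import math

RHO = {"ge": 0.7, "gE": 0.9, "Ge": 1.1, "GE": 1.3}   # Table 3.5
K_MALE = 1.317274                                     # value quoted in the text
BASE = 0.005   # hypothetical population heart attack intensity, constant over the year


def one_year_prob(stratum):
    """Probability of a heart attack within one year, ignoring competing mortality."""
    return 1.0 - math.exp(-K_MALE * RHO[stratum] * BASE)


def odds_ratio(stratum):
    """Equation (3.18) with ge as the baseline stratum."""
    p, p0 = one_year_prob(stratum), one_year_prob("ge")
    return (p / (1.0 - p)) / (p0 / (1.0 - p0))


for s in RHO:
    print(s, round(odds_ratio(s), 4), round(RHO[s] / RHO["ge"], 4))
```

Because the 1-year probabilities are small, the odds p/(1 − p) are close to the integrated intensities, and the ratio of odds stays close to ρs/ρge.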

For details on relative risks and odds ratios, see Appendix A, or Woodward (1999)

or Breslow and Day (1980).

3.2 A Sample Realisation of UK Biobank

With the parameterised model, we simulated the life histories of 500,000 people

recruited to UK Biobank and followed up for 10 years. The life histories of the first

20 people are shown in Table 3.8. Consider person No.2 in Table 3.8. He is a male

with the adverse allele G and is exposed to the beneficial environment e. He entered

the study in State 1 as a healthy individual at age 58.74. During the follow-up

period he had a heart attack at age 63.89 and moved to State 2. Finally, he died at

age 63.94 and moved to State 4. The numbers of people in each state at the end of

the 10-year follow-up period are given in Table 3.9.
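The structure of such a simulation can be sketched as follows. This is a toy illustration, not the simulation used here: `LAMBDA12` and `LAMBDA13` are hypothetical constants, entry ages are drawn uniformly between 40 and 70, strata are ignored, and the age at heart attack is mapped to the nearest representative age in Table 2.2. The time from heart attack to death is drawn by inverting the fitted P22(x, x + t) by bisection; within a 10-year follow-up the duration never exceeds 10 years, so Equation 2.3 suffices.

```python
import random

# Table 2.2: (a, b, c, d) for each representative age at first heart attack.
PARAMS = {
    50: (0.0684, 0.1040, 0.0174, 1.1919),
    60: (0.1686, 0.0911, 0.0406, 1.2280),
    70: (0.4001, 0.1237, 0.0770, 1.3370),
    80: (0.8564, 0.1732, 0.1476, 1.5504),
    90: (1.5181, 0.2431, 0.3309, 1.6727),
}
LAMBDA12 = 0.006    # hypothetical constant heart attack intensity
LAMBDA13 = 0.010    # hypothetical constant mortality before heart attack
FOLLOW_UP = 10.0    # years


def p22(age, t):
    """Equation 2.3, using the nearest representative age (a simplification)."""
    rep = min(PARAMS, key=lambda r: abs(r - age))
    a, b, c, d = PARAMS[rep]
    return 1.0 / (1.0 + a * t ** b + c * t ** d)


def time_to_death_after_attack(age, rng):
    """Inverse-transform sampling: find t with P22(age, age + t) = U by bisection."""
    u = rng.random()
    lo, hi = 0.0, 200.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if p22(age, mid) > u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def simulate_life(entry_age, rng):
    """Return [(age, state), ...] for one person over the follow-up period."""
    history = [(entry_age, 1)]
    t_attack = rng.expovariate(LAMBDA12)
    t_death = rng.expovariate(LAMBDA13)
    if min(t_attack, t_death) > FOLLOW_UP:
        return history                       # still healthy at the end
    if t_death < t_attack:
        history.append((entry_age + t_death, 3))
        return history
    attack_age = entry_age + t_attack
    history.append((attack_age, 2))
    t_post = time_to_death_after_attack(attack_age, rng)
    if t_attack + t_post <= FOLLOW_UP:
        history.append((attack_age + t_post, 4))
    return history


rng = random.Random(2007)
cohort = [simulate_life(rng.uniform(40.0, 70.0), rng) for _ in range(1000)]
for person in cohort[:5]:
    print([(round(age, 2), state) for age, state in person])
```

Each printed history has the same shape as a row of Table 3.8: an entry age in State 1 followed by the ages and destinations of any transitions.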

3.3 Epidemiological Analysis

With 500,000 simulated life histories, we can now carry out one or more typical

epidemiological analyses. Apart from the life histories, the following information is

available to the epidemiologist:

(a) the framework of the UK Biobank project;

(b) the structure of the 4-state Heart Attack model given in Section 2.1;

(c) the transition intensities given in Sections 2.2–2.4;

(d) the stratum to which each person is allocated; and

(e) the proportion ws(x) of individuals in each stratum at a particular age x, say

60.

The UK Biobank protocol suggests that the combined effect of environment and

genotype be analysed using matched case-control studies nested within the cohort.

Table 3.8: The simulated life histories of the first 20 (of 500,000) individuals showing their genders, exposure to environmental factors, genotypes and the times and types of all transitions made within 10 years.

ID   Sex   E/e   G/g   Age     State   Age     State   Age     State
1    M     e     g     41.10   1
2    M     e     G     58.74   1       63.89   2       63.94   4
3    M     e     g     52.27   1
4    M     e     g     68.39   1
5    F     e     G     60.94   1       63.81   2
6    M     e     g     62.49   1       68.18   3
7    M     e     g     55.50   1       61.57   3
8    F     e     G     58.95   1
9    M     e     g     65.67   1       69.58   3
10   M     e     g     49.79   1
11   F     E     g     45.43   1
12   F     e     g     57.58   1
13   F     e     g     59.68   1
14   F     E     g     55.14   1
15   F     e     g     42.93   1
16   M     e     g     56.23   1
17   F     e     g     62.84   1
18   M     e     g     62.29   1
19   F     e     g     43.69   1
20   M     e     g     45.16   1

Table 3.9: Number of individuals in each state at the end of the 10-year follow-up period.

Sex      G/g   E/e   State 1   State 2   State 3   State 4   Total
Male     G     E     1,871     126       356       115       2,468
Male     G     e     17,579    928       3,219     934       22,660
Male     g     E     17,588    775       3,236     702       22,301
Male     g     e     162,474   5,426     29,610    5,002     202,512
Female   G     E     2,178     70        214       52        2,514
Female   G     e     19,746    397       2,021     408       22,572
Female   g     E     19,811    367       2,095     330       22,603
Female   g     e     178,718   2,320     18,891    2,441     202,370
Total                419,965   10,409    59,642    9,984     500,000

In a case-control study, the first step is to define the cases and controls. Here, clearly,

the cases are persons who had first heart attacks during the study period.

In real studies, epidemiologists will face problems such as missing data and cost

constraints, and in most circumstances they will use only a subset of all cases for

their analysis. Here, we have no such problems, unless we choose to model them.

So, in the first instance, we will include all cases in the analysis. Later, we will

consider the more realistic possibility that a subset of all cases is used.

An appropriate matching strategy is particularly important for a matched case-

control study. Firstly, we match controls with cases by age. Suppose, for example,

that we are comparing stratum s with the baseline stratum ge, and that a case

entered the study at age x last birthday and had a heart attack at age x + t last

birthday. A matched control is a person chosen randomly from persons in these two

strata who also entered the study at age x last birthday and remained healthy at

least until age x+ t+ 1 last birthday. Once chosen as a control, that person cannot

be chosen as a control again. As controls are plentiful compared with cases, we will

match 5 controls to each case, called a 1:5 matching strategy. In Section 1.5, we

mentioned that the genotyping of individuals will be done as and when it is required.

So, it might be necessary to genotype a large number of people to ensure that enough

controls are available for a 1:5 case-control study. Other matching strategies with

fewer controls per case will obviously be cheaper to implement.

To calculate odds ratios, we need to group ages sensibly. Note that epidemi-

ological studies often use quite wide age groups, much wider than actuaries are

accustomed to using. We will use 5-year age bands as a reasonable compromise be-

tween accuracy and sample size. Note that the definition of the ages of the controls

needs to be adjusted appropriately to maintain consistency. The results are given

in Table 3.10. We can see no particular trend with respect to age, so we calculate

the age-adjusted odds ratio for each stratum (a weighted average of the age-specific

odds ratios, see the Mantel-Haenszel method described in Appendix A or Woodward

(1999)), shown in Table 3.11.
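As a reminder of how the age-adjusted figure is formed, the Mantel-Haenszel estimate pools the 2×2 tables of the individual age bands. The sketch below uses the standard Mantel-Haenszel formula; the function name, the tuple layout and any counts passed to it are hypothetical, and the sketch ignores the refinements needed for confidence intervals.

```python
def mantel_haenszel_or(tables):
    """Mantel-Haenszel pooled odds ratio over age bands.

    `tables` is an iterable of (a, b, c, d), one tuple per age band, where
      a = cases in stratum s,     b = cases in baseline stratum ge,
      c = controls in stratum s,  d = controls in baseline stratum ge.
    Within one band the odds ratio is (a * d) / (b * c).
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n == 0:
            continue  # an empty age band contributes nothing
        num += a * d / n
        den += b * c / n
    return num / den
```

With a single age band the function reduces to the ordinary odds ratio, and bands with more observations receive proportionally more weight in the pooled figure.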

Table 3.10: Odds ratios with respect to the ge stratum as baseline, based on a 1:5 matching strategy using all cases and 5-year age groups. Approximate 95% confidence intervals are shown in brackets. There were no cases among females aged 45–49 in stratum GE.

Males
Age      gE                     Ge                     GE
40–44    1.043 (0.527,2.065)    2.628 (1.561,4.423)    2.375 (0.712,7.917)
45–49    1.069 (0.816,1.400)    1.670 (1.317,2.118)    1.929 (0.940,3.959)
50–54    1.330 (1.117,1.583)    1.578 (1.336,1.865)    1.725 (1.121,2.654)
55–59    1.358 (1.168,1.579)    1.665 (1.448,1.914)    2.133 (1.486,3.062)
60–64    1.175 (1.020,1.352)    1.708 (1.507,1.935)    1.976 (1.417,2.753)
65–69    1.267 (1.116,1.438)    1.592 (1.416,1.789)    1.721 (1.251,2.368)
70–74    1.362 (1.179,1.574)    1.542 (1.348,1.764)    1.907 (1.334,2.726)
75–79    1.487 (1.160,1.907)    1.534 (1.187,1.983)    1.667 (0.910,3.052)

Females
Age      gE                     Ge                     GE
40–44    1.167 (0.301,4.520)    1.333 (0.463,3.836)    5.000 (0.313,79.942)
45–49    0.944 (0.523,1.702)    1.869 (1.139,3.067)    –
50–54    0.947 (0.659,1.361)    1.298 (0.929,1.814)    4.167 (1.800,9.644)
55–59    1.243 (0.967,1.597)    1.280 (0.999,1.641)    2.324 (1.282,4.211)
60–64    1.634 (1.343,1.988)    1.867 (1.538,2.267)    1.842 (1.112,3.053)
65–69    1.321 (1.111,1.571)    1.601 (1.359,1.887)    2.457 (1.637,3.689)
70–74    1.257 (1.045,1.511)    1.538 (1.296,1.825)    2.354 (1.528,3.626)
75–79    1.203 (0.893,1.620)    1.220 (0.896,1.659)    1.773 (0.788,3.986)

Table 3.11: The age-adjusted odds ratios calculated for both males and females.

Strata    gE                     Ge                     GE
Male      1.285 (1.209,1.365)    1.625 (1.536,1.719)    1.880 (1.620,2.182)
Female    1.298 (1.188,1.418)    1.538 (1.413,1.674)    2.250 (1.814,2.790)

We can compare the estimated age-adjusted odds ratios with the true odds ratios given in Table 3.7. The estimates are better for strata gE and Ge where the numbers of cases are higher than in stratum GE. However, all the true odds ratios lie within the 95% confidence intervals given in Table 3.11.

3.4 An Actuarial Investigation

The actuary starts with the model of Figure 2.1 in mind, and wishes to estimate the intensity $\lambda^s_{12}(x)$ for each stratum. We assume, realistically, that the best available data are the published odds ratios. The ‘estimation’ procedure, therefore, consists of finding a reasonably robust way to estimate transition intensities from odds ratios. There is no simple mathematical relationship, so approximations must be made.

Supposing that the actuary knows the rates of heart attack in the general population, $\lambda_{12}(x)$ (separately for males and females), a simple assumption is that the heart attack intensity for each stratum is proportional to $\lambda_{12}(x)$. In stratum $s$, define:

$$\gamma^s_{12}(x) = c_s(x) \times \lambda_{12}(x) \tag{3.19}$$

where $\gamma^s_{12}(x)$ is the actuary’s ‘estimate’ of $\lambda^s_{12}(x)$. Assuming that the odds ratios (denoted $\psi_s(x)$) are good approximations of the relative risks, which is reasonable as long as the age groups are not too broad, we have:

$$\psi_s(x) = \frac{\gamma^s_{12}(x)}{\gamma^{ge}_{12}(x)} = \frac{c_s(x)}{c_{ge}(x)} \tag{3.20}$$

which leads to:

$$c_s(x) = \psi_s(x) \times c_{ge}(x). \tag{3.21}$$

As observed from Table 3.10, the odds ratios do not appear to depend strongly on age. So we further assume that $c_s(x)$ is a constant $c_s$ for all ages (hence also $\psi_s(x)$ is a constant $\psi_s$), and therefore:

$$c_s = \psi_s \times c_{ge} \tag{3.22}$$

where $\psi_s$ is the age-adjusted odds ratio. Thus Equation (3.19) becomes:

$$\gamma^s_{12}(x) = c_{ge} \times \psi_s \times \lambda_{12}(x). \tag{3.23}$$

Table 3.12: The estimated multipliers $c_s$ for each stratum.

Stratum    ge       gE       Ge       GE
Male       0.918    1.179    1.492    1.726
Female     0.920    1.194    1.415    2.070

Now Equation (3.16) can be written:

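$$\lambda_{12}(x+t) = \frac{\sum_s w_s(x) \exp\left(-\int_0^t c_{ge}\,\psi_s\,\lambda_{12}(x+y)\,dy\right) c_{ge}\,\psi_s\,\lambda_{12}(x+t)}{\sum_s w_s(x) \exp\left(-\int_0^t c_{ge}\,\psi_s\,\lambda_{12}(x+y)\,dy\right)}. \tag{3.24}$$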

Let us assume that at age $x = 60$, the $w_s(x)$ are given by the population frequencies of the respective strata. Now we can solve Equation (3.24) for the multiplier $c_{ge}$ for a particular choice of age $x$ and any duration $t$. Then we can use Equation (3.22) to obtain $c_s$ for $s$ = gE, Ge and GE. We find (not shown here) that the results are very similar for different values of $t$. In Table 3.12, we show the ‘estimated’ $c_s$ for representative age $x = 60$ and duration $t = 5$, based on the age-adjusted odds ratios in Table 3.11. These values can be compared with the true constants of proportionality of the underlying model given in Table 3.6. They are in good agreement for strata $s$ = ge, gE and Ge. The agreement for stratum $s$ = GE is not so good, but it was based on a small number of cases, 241 males and 122 females.
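To make the solution step concrete, note that $\lambda_{12}(x+t)$ cancels from both sides of Equation (3.24), leaving a single equation in $c_{ge}$ that depends on the population intensity only through the cumulative hazard $\int_0^t \lambda_{12}(x+y)\,dy$. The sketch below solves that equation by bisection. It is an illustration under stated assumptions rather than the calculation behind Table 3.12: the weights are the Base population frequencies (0.81, 0.09, 0.09, 0.01), the odds ratios are the male values from Table 3.11, and the cumulative hazard of 0.04 is a hypothetical placeholder, not a value taken from the thesis.

```python
import math


def solve_c_ge(weights, odds_ratios, cum_hazard, lo=0.1, hi=5.0, tol=1e-12):
    """Solve Equation (3.24) for c_ge after cancelling lambda_12(x + t).

    weights      w_s(x) for each stratum (ge first)
    odds_ratios  psi_s for each stratum (psi_ge = 1)
    cum_hazard   integral of lambda_12(x + y) over 0 <= y <= t
                 (a hypothetical input in this sketch)
    """
    def excess(c):
        num = 0.0
        den = 0.0
        for w, psi in zip(weights, odds_ratios):
            surviving = w * math.exp(-c * psi * cum_hazard)
            num += surviving * c * psi
            den += surviving
        return num / den - 1.0

    # Plain bisection; assumes excess() changes sign on [lo, hi].
    f_lo = excess(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = excess(mid)
        if (f_mid < 0) == (f_lo < 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


weights = [0.81, 0.09, 0.09, 0.01]        # ge, gE, Ge, GE
psi_male = [1.0, 1.285, 1.625, 1.880]     # Table 3.11, males
c_ge = solve_c_ge(weights, psi_male, cum_hazard=0.04)  # 0.04 is hypothetical
c_s = [c_ge * psi for psi in psi_male]    # Equation (3.22)
```

When the cumulative hazard is zero the solution reduces to $c_{ge} = 1 / \sum_s w_s \psi_s$, and for small cumulative hazards the root moves only slightly, which is in line with the observation that the results are very similar for different values of $t$.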


3.5 Premium Rating for Critical Illness Insurance

3.5.1 A Critical Illness Model

The actuary will use the intensities γs12(x) ‘estimated’ in Section 3.4 to calculate CI

insurance premiums. Gutierrez & Macdonald (2003) obtained the following model

for critical illness insurance based on medical studies and population data. Full

references can be found in that paper. The structure of the model, as outlined

in the paper, is given in Figure 3.18. The relevant transition intensities are listed

below.

Figure 3.18: A full critical illness model for gender s. [Diagram: State 0 (Healthy), with transitions to State 1 (Heart Attack), State 2 (Cancer), State 3 (Stroke), State 4 (Other CI) and State 5 (Dead), governed by the intensities $\mu^s_{01}(x)$, $\mu^s_{02}(x)$, $\mu^s_{03}(x)$, $\mu^s_{04}(x)$ and $\mu^s_{05}(x)$ respectively.]

(a) For males, the age-dependent transition intensities governing the incidence of heart attack are given below:

$$\mu^m_{01}(x) = \begin{cases} \exp(-13.2238 + 0.152568x) & \text{if } x \le 44 \\[4pt] \dfrac{x-44}{49-44} \times \mu^m_{01}(49) + \dfrac{49-x}{49-44} \times \mu^m_{01}(44) & \text{if } 44 < x < 49 \\[4pt] -0.01245109 + 0.000315605x & \text{if } x \ge 49 \end{cases} \tag{3.25}$$

For females, the age-dependent transition intensities are:

$$\mu^f_{01}(x) = \frac{0.598694}{\Gamma(15.6412)} \times 0.15317^{15.6412} \exp(-0.15317x)\, x^{14.6412} \tag{3.26}$$

We also need the 28-day survival factors following heart attacks. This relates to the common contractual condition that payment depends on surviving for 28 days. Let $p^s_{01}(x)$ be the 28-day survival probabilities for gender $s$, and $q^s_{01}(x) = 1 - p^s_{01}(x)$. For females, at ages 20–80, $q^f_{01}(x) = 0.21$, and for males the values are given in Table 3.13.

Table 3.13: 28-Day mortality rates, $q^m_{01}(x) = 1 - p^m_{01}(x)$, for males following heart attacks.

age x    q^m_01(x)    age x    q^m_01(x)    age x    q^m_01(x)    age x    q^m_01(x)
20–39    0.15         47–52    0.18         58–59    0.21         65–74    0.24
40–42    0.16         53–56    0.19         60–61    0.22         75–79    0.25
43–46    0.17         57       0.20         62–64    0.23         80+      0.26

The 28-day mortality rates given in Table 3.13 can be compared against the survival probabilities obtained from Capewell et al. (2000) and given in Table 2.1. (Note that the odds ratios given in Table 2.3 are derived from the survival probabilities in Table 2.1.) As compared with the Capewell et al. (2000) data, the 28-day mortality rates in Table 3.13 appear slightly higher at younger ages and lower at older ages. However, to maintain consistency with the CI insurance model, we will use the rates in Table 3.13 to calculate the CI insurance premium rates.

(b) For males, the age-dependent transition intensities governing the incidence of cancer are given below:

$$\mu^m_{02}(x) = \begin{cases} \exp(-11.25 + 0.105x) & \text{if } x \le 51 \\[4pt] \dfrac{x-51}{60-51} \times \mu^m_{02}(60) + \dfrac{60-x}{60-51} \times \mu^m_{02}(51) & \text{if } 51 < x < 60 \\[4pt] -0.2591585 - 0.01247354x + 0.0001916916x^2 - 8.952933 \times 10^{-7}x^3 & \text{if } x \ge 60 \end{cases} \tag{3.27}$$

For females, the age-dependent transition intensities are:

$$\mu^f_{02}(x) = \begin{cases} \exp(-10.78 + 0.123x - 0.00033x^2) & \text{if } x < 53 \\[4pt] -0.01545632 + 0.0003805097x & \text{if } x \ge 53 \end{cases} \tag{3.28}$$

(c) For males, the age-dependent transition intensities governing the incidence of stroke are given below:

$$\mu^m_{03}(x) = \exp(-16.9524 + 0.294973x - 0.001904x^2 + 0.00000159449x^3) \tag{3.29}$$

For females, the age-dependent transition intensities are:

$$\mu^f_{03}(x) = \exp(-11.1477 + 0.081076x) \tag{3.30}$$

We need the 28-day survival factors following stroke. Let $p^s_{03}(x)$ be the 28-day survival probabilities for gender $s$, and $q^s_{03}(x) = 1 - p^s_{03}(x)$. For both males and females, $q^s_{03}(x) = 0.002x/0.9$.

(d) The transition intensities for other minor causes of critical illnesses amount to

15% of those arising from cancer, heart attack and stroke. So the aggregate rate

of critical illness claims, for gender s, is:

$$\mu^s(x) = 1.15\left(\mu^s_{01}(x) \times p^s_{01}(x) + \mu^s_{02}(x) + \mu^s_{03}(x) \times p^s_{03}(x)\right) \tag{3.31}$$

(e) Population mortality rates, given by English Life Tables No. 15 (ELT15) were

adjusted to exclude deaths which would have followed a critical illness insurance

claim.
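The intensities in items (a) to (d) translate directly into code. The sketch below is a partial transcription for illustration only: it covers the male and female heart attack intensities and the female cancer, stroke and aggregate intensities, and it omits the male cancer and stroke intensities, Table 3.13 and the adjusted ELT15 mortality of item (e). The function names are hypothetical, and the female 28-day heart attack mortality of 0.21 is applied without checking that the age lies in the stated range 20–80.

```python
import math


def mu01_male(x):
    """Male heart attack incidence, Equation (3.25)."""
    def early(a):
        return math.exp(-13.2238 + 0.152568 * a)

    def late(a):
        return -0.01245109 + 0.000315605 * a

    if x <= 44:
        return early(x)
    if x >= 49:
        return late(x)
    # linear blend between the values at ages 44 and 49
    return (x - 44) / (49 - 44) * late(49) + (49 - x) / (49 - 44) * early(44)


def mu01_female(x):
    """Female heart attack incidence, Equation (3.26), evaluated on the log scale."""
    log_term = (15.6412 * math.log(0.15317) - math.lgamma(15.6412)
                - 0.15317 * x + 14.6412 * math.log(x))
    return 0.598694 * math.exp(log_term)


def mu02_female(x):
    """Female cancer incidence, Equation (3.28)."""
    if x < 53:
        return math.exp(-10.78 + 0.123 * x - 0.00033 * x * x)
    return -0.01545632 + 0.0003805097 * x


def mu03_female(x):
    """Female stroke incidence, Equation (3.30)."""
    return math.exp(-11.1477 + 0.081076 * x)


def p03(x):
    """28-day survival probability after stroke (both sexes)."""
    return 1.0 - 0.002 * x / 0.9


def aggregate_ci_female(x):
    """Aggregate rate of CI claims for females, Equation (3.31)."""
    p01 = 1.0 - 0.21  # female 28-day survival after heart attack
    return 1.15 * (mu01_female(x) * p01
                   + mu02_female(x)
                   + mu03_female(x) * p03(x))
```

Evaluating the gamma-shaped formula through `math.lgamma` avoids forming the very large and very small intermediate factors in Equation (3.26) separately.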

3.5.2 Premium Rating for Critical Illness Insurance

We will assume that all intensities except those for heart attack are as given here. For heart attack, we use the intensities $\gamma^s_{12}(x)$. We compute expected present values by solving Thiele’s differential equations numerically, with a force of interest of $\delta = 0.044017$ (see Norberg (1995)).

Table 3.14: The true critical illness insurance premiums for different strata as a percentage of those for stratum ge.

                   Males, Term                     Females, Term
Stratum    Age     5       15      25      35      5       15      25      35
gE         45      112%    111%    109%    107%    103%    103%    104%    104%
           55      110%    108%    107%            104%    105%    105%
           65      107%    106%                    105%    106%
           75      106%                            106%
Ge         45      124%    121%    117%    115%    105%    107%    108%    108%
           55      119%    116%    114%            109%    110%    110%
           65      114%    112%                    111%    111%
           75      111%                            111%
GE         45      136%    131%    126%    122%    108%    110%    112%    112%
           55      129%    124%    121%            113%    115%    115%
           65      120%    118%                    116%    117%
           75      117%                            117%
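The numerical solution of Thiele’s differential equations mentioned above can be sketched as follows. This is a simplified illustration and not the program used for the tables: it treats a policy with only two exits from the healthy state (a CI claim and death), uses a fixed-step fourth-order Runge-Kutta scheme integrated backwards from the end of the term, and takes the two intensities as hypothetical callables supplied by the user. The level premium rate is the ratio of the expected present value (EPV) of a unit benefit to the EPV of a unit annuity payable while healthy.

```python
DELTA = 0.044017  # force of interest used in the text


def thiele_epv(x, n, claim_intensity, death_intensity,
               benefit, premium_rate, delta=DELTA, steps=2000):
    """Policy value at outset in the healthy state, by backward RK4.

    Thiele's equation for the healthy state, with V(n) = 0:
      dV/dt = delta*V + premium_rate - mu(x+t)*(benefit - V) + nu(x+t)*V
    where mu is the claim intensity and nu the death intensity.
    A negative premium_rate acts as an annuity paid to the policyholder.
    """
    def dv(t, v):
        mu = claim_intensity(x + t)
        nu = death_intensity(x + t)
        return delta * v + premium_rate - mu * (benefit - v) + nu * v

    h = n / steps
    v, t = 0.0, float(n)
    for _ in range(steps):
        k1 = dv(t, v)
        k2 = dv(t - h / 2, v - h / 2 * k1)
        k3 = dv(t - h / 2, v - h / 2 * k2)
        k4 = dv(t - h, v - h * k3)
        v -= h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t -= h
    return v


def level_premium(x, n, claim_intensity, death_intensity):
    """Level premium rate per unit sum assured, payable while healthy."""
    benefit_epv = thiele_epv(x, n, claim_intensity, death_intensity,
                             benefit=1.0, premium_rate=0.0)
    annuity_epv = thiele_epv(x, n, claim_intensity, death_intensity,
                             benefit=0.0, premium_rate=-1.0)
    return benefit_epv / annuity_epv
```

A premium rating for a stratum can then be formed as the ratio of two such premium rates, one computed with the stratum's intensities and one with those of stratum ge. With constant intensities the level premium rate equals the claim intensity, which gives a simple check on the implementation.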

Table 3.14 shows the true premiums for the strata $s$ = gE, Ge and GE, as a percentage of the premiums for stratum ge, for males and females and for different ages and terms. Here, ‘true’ means that they have been computed using the intensities of Chapter 2, not the actuary’s estimates. Table 3.15 then shows the corresponding premiums, again as a percentage of those charged for stratum ge, using the actuary’s estimates from Section 3.4. The results are similar to those in Table 3.14. Comparing the actuary’s estimates with the true CI insurance premiums, we can see that the estimates are very good for stratum gE. For stratum Ge, the estimates are also within ±2% of the true values. However, the estimates are not as accurate for females in stratum GE. As mentioned before, stratum GE had relatively few cases, resulting in high volatility of the estimated values.

Table 3.15: The actuary’s estimated critical illness insurance premiums for different strata as a percentage of those for stratum ge.

                   Males, Term                     Females, Term
Stratum    Age     5       15      25      35      5       15      25      35
gE         45      112%    110%    109%    107%    103%    104%    104%    104%
           55      110%    108%    107%            105%    105%    105%
           65      107%    106%                    106%    106%
           75      106%                            106%
Ge         45      126%    123%    119%    116%    105%    106%    108%    108%
           55      121%    117%    115%            108%    109%    109%
           65      115%    113%                    110%    110%
           75      112%                            111%
GE         45      137%    132%    126%    123%    111%    115%    118%    118%
           55      129%    124%    121%            119%    121%    121%
           65      121%    119%                    124%    124%
           75      117%                            125%

Chapter 4

UK Biobank Simulation Results

4.1 Varying the Genetic and Environment Model

In the last chapter, we estimated parameters of a heart attack model and the result-

ing CI insurance premiums, based on a simulated realisation of UK Biobank. The

underlying ‘true’ model (chosen by us) was particularly simple — two genotypes,

two environmental exposures and proportional hazards of heart attack — and by

great good luck, our model epidemiologist hit upon exactly the correct hypotheses

in fitting his/her model. So it is not surprising that he/she obtained good parameter

estimates, with the possible exception of those in respect of the smallest stratum,

GE.

In reality, the epidemiologist faces more difficult problems:

(a) There is likely to be more than one gene, many with more than two variants,

as candidates for influencing the disease.

(b) Similarly, there are likely to be several environmental exposures of interest.

(c) Model mis-specification is always possible (indeed, it may be the norm).

(d) On grounds of cost, the number of cases and the number of controls per case

may be limited.

(e) As mentioned earlier, UK Biobank is a single unrepeatable sample, hence sam-

pling error is present. Although 500,000 seems like a huge sample, it may not

be when smaller numbers of cases are sampled from within it.

In a simulation study, we are in a position to explore these problems. In particular, we can address (d) and (e) above, because we can replicate the entire UK Biobank dataset many times, and repeat the epidemiological and actuarial analyses using each realisation. Thus we can estimate the sampling distributions of parameter estimates and premium rates, while the analysis of the single realisation in Chapter 3 only gave us point estimates of the latter. (We did give approximate confidence

intervals of the estimated odds ratios, because they can be derived on theoretical

grounds. This is not possible for such a complicated function of the model parame-

ters as a premium rate, and simulation is one of the few practical approaches.) We

concentrate on this question in the rest of this thesis, because it is directly relevant

to the approach adopted by GAIC in the UK, and likely to be adopted by similar

bodies elsewhere, which demands that the reliability of prognoses based on genetic

information must be demonstrated if it is to be used in any way. In the case of

multifactorial disorders, we assume that this requirement is to be interpreted in the

statistical sense rather than as applying to individual applicants. Our exploration

of (a), (b) and (c) above will be the subject of a future paper.

In addition to simulating many replications of UK Biobank, we will consider the

effect of stronger or weaker genetic and environmental effects, and of more common

and less common adverse genotypes. We call each such variant of the underlying

model a ‘scenario’, which should not be confused with the simulation procedure

discussed above. We will hold each scenario fixed, and then simulate outcomes of

UK Biobank under those assumptions.

We have already introduced one set of assumptions in Section 3.1, which we will

refer to as our Base scenario. The details of all the scenarios are given in Table 4.16.

The parameters that must be specified are:

(a) The population frequency of each stratum (the same for males and females).

(b) The parameters k for each sex and ρs for each stratum. Although ρs does not

depend on sex, for convenience Table 4.16 shows the combined constants of

proportionality k × ρs for each sex.

Although the odds ratios are derived quantities rather than parameters, they are

also shown in Table 4.16 for convenience.

Table 4.16: The model parameters for different scenarios. Odds ratios are also shown.

                                              Penetrance              Frequency
Parameters     Stratum    Base        Low         High        Low         High
Population     ge         0.81        0.81        0.81        0.9025      0.64
Frequency      gE         0.09        0.09        0.09        0.0475      0.16
               Ge         0.09        0.09        0.09        0.0475      0.16
               GE         0.01        0.01        0.01        0.0025      0.04

ρs             ge         0.70        0.85        0.55        0.70        0.70
               gE         0.90        0.95        0.85        0.90        0.90
               Ge         1.10        1.05        1.15        1.10        1.10
               GE         1.30        1.15        1.45        1.30        1.30

k (Male)       All        1.317274    1.136603    1.568090    1.370745    1.221620
k (Female)     All        1.316406    1.136463    1.564821    1.370230    1.220385

k × ρs         ge         0.922       0.966       0.862       0.960       0.855
(Male)         gE         1.186       1.080       1.333       1.234       1.099
               Ge         1.449       1.193       1.803       1.508       1.344
               GE         1.712       1.307       2.274       1.782       1.588

k × ρs         ge         0.921       0.966       0.861       0.959       0.854
(Female)       gE         1.185       1.080       1.330       1.233       1.098
               Ge         1.448       1.193       1.800       1.507       1.342
               GE         1.711       1.307       2.269       1.781       1.587

Odds Ratio     ge         1.000       1.000       1.000       1.000       1.000
               gE         1.286       1.118       1.545       1.286       1.286
               Ge         1.571       1.235       2.091       1.571       1.571
               GE         1.857       1.353       2.636       1.857       1.857

The Low and High Penetrance scenarios assume smaller and larger differences, respectively, between the effects of the different strata, governed by $\rho_s$. The Low and High Frequency scenarios assume that the disadvantageous G genotype and E environment have population frequencies half (0.05) or double (0.2) those in the Base scenario (0.1), respectively.
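The derived rows of Table 4.16 can be reproduced from the basic parameters. The sketch below is a hedged illustration: it assumes that the G genotype and the E environment occur independently in the population (an assumption that reproduces the tabulated stratum frequencies), and it takes the odds ratio of stratum $s$ to be $\rho_s / \rho_{ge}$, which agrees with the tabulated odds ratios to three decimal places. The function and constant names are hypothetical.

```python
def stratum_frequencies(p_g, p_e):
    """Stratum frequencies, assuming G and E are independent."""
    return {
        "ge": (1 - p_g) * (1 - p_e),
        "gE": (1 - p_g) * p_e,
        "Ge": p_g * (1 - p_e),
        "GE": p_g * p_e,
    }


def derived_quantities(k, rho):
    """Combined constants k * rho_s and odds ratios relative to stratum ge."""
    multipliers = {s: k * r for s, r in rho.items()}
    odds_ratios = {s: r / rho["ge"] for s, r in rho.items()}
    return multipliers, odds_ratios


# Base scenario parameters from Table 4.16
BASE_RHO = {"ge": 0.70, "gE": 0.90, "Ge": 1.10, "GE": 1.30}
K_MALE_BASE = 1.317274

base_freq = stratum_frequencies(0.1, 0.1)
male_multipliers, base_odds = derived_quantities(K_MALE_BASE, BASE_RHO)
```

Calling `stratum_frequencies(0.05, 0.05)` or `stratum_frequencies(0.2, 0.2)` gives the Low and High Frequency columns in the same way.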

In Section 3.3, we noted that problems like missing values and cost constraints

might limit the number of cases that can be used for analysis. So we will also examine

the effect of limiting the number of cases used in the analysis. From Table 3.9,

around 20,000 individuals were eligible to be considered as cases (in that particular

realisation). For each scenario, we will show results based on 1,000, 2,500, 5,000 and

10,000 cases as well as those based on all cases.

4.2 Outcomes of 1,000 Simulations: The Base Scenario

We will make 1,000 simulations of UK Biobank. The outcomes will be the empirical

distributions of the parameters of the epidemiologist’s model, and of CI insurance

premium rates. Let us first consider the Base scenario, all cases included, for males

aged 45 taking out a CI insurance policy with term 15 years. Figure 4.19 shows

scatter plots of the CI insurance premium rates per unit sum assured for strata gE,

Ge and GE versus those of ge. More precisely, the outcome of the $i$th simulation is a drawing $p^i = (p^i_{ge}, p^i_{gE}, p^i_{Ge}, p^i_{GE})$ from the sampling distribution of the 4-dimensional random variable $P = (P_{ge}, P_{gE}, P_{Ge}, P_{GE})$, where $P_s$ is the premium rate in stratum $s$.

The scatter plots show clearly that the premium rate pairs (Pge,PgE) and

(Pge,PGe) are more strongly correlated than the pair (Pge,PGE). This is true, as

the correlation matrix given in Table 4.17 shows, but note that the scale of the

x-axis is greatly compressed compared with that of the y-axis. The reason they

are correlated is that, as outlined in Section 3.4, the actuary uses the three odds

ratios published by the epidemiologist, plus the overall population intensity of heart

attack, to obtain the heart attack intensities for the four strata, so the four premium

estimates are not independent. The reason that the correlations are negative is that the overall level of the four intensities is adjusted so that their aggregate effect is consistent with the general population. So, if the intensities in any of the strata are high, the intensities in the others will tend to fall to restore consistency with the aggregate intensity.

Figure 4.19: Scatter plots of CI insurance premium rates for strata gE, Ge and GE versus that of ge under the Base scenario for males aged 45 and policy term 15 years. [Plot: horizontal axis Pge; vertical axis PgE, PGe, PGE; markers * for (Pge, PgE), o for (Pge, PGe) and + for (Pge, PGE).]

Table 4.17: The correlation matrix of the strata-specific premium rates for males aged 45 and policy term 15 years under the Base scenario, all cases included.

Stratum    ge        gE        Ge        GE
ge          1.000
gE         −0.604     1.000
Ge         −0.656    −0.123     1.000
GE         −0.194    −0.057    −0.095     1.000

We also consider the premium rates for strata gE, Ge and GE as a proportion of those for stratum ge, namely $P_{gE}/P_{ge}$, $P_{Ge}/P_{ge}$ and $P_{GE}/P_{ge}$. These correspond to premium ratings, if we take the standard premium rate to be that of stratum ge, and we will refer to them as such. For brevity, define $R_s = P_s/P_{ge}$ to be the premium rating for stratum $s$ with respect to stratum ge. The correlation matrix of these premium ratings is given in Table 4.18 and the corresponding scatter plots are given in Figure 4.20. Both suggest correlations are small enough to neglect, which means that instead of always considering the full joint distribution of the premiums $P$, we can obtain all the information of interest by separate examination of the marginal distributions of the premium ratings. The densities of these marginal distributions are given in Figure 4.20. This immediately suggests a simple approach to the questions that GAIC must ask, because the reliability of the premium rating in each stratum, in terms of its distinguishability from the premium ratings in the other strata, is revealed by the degree to which its marginal density overlaps the marginal densities of the others. Presented with Figure 4.20, we might expect GAIC to agree that strata Ge and GE had premium ratings distinct from that of stratum gE, but to ask whether or not they had premium ratings reliably distinct from each other.

Table 4.18: The correlation matrix of the premium ratings for males aged 45 and policy term 15 years under the Base scenario, all cases included.

Stratum    RgE       RGe       RGE
RgE         1.000
RGe         0.095     1.000
RGE         0.013    −0.018     1.000

Figure 4.20: The scatter plots of the premium ratings Ge/ge and GE/ge versus gE/ge and the corresponding density plots for males aged 45 and policy term 15 years under the Base scenario, all cases included. [Scatter panel: horizontal axis RgE; vertical axis RGe, RGE; markers o for (RgE, RGe) and + for (RgE, RGE). Density panel: horizontal axis Premium Ratings; vertical axis Density; curves RgE, RGe, RGE.]

4.3 A Measure of Confidence

Our precise formulation of the question that GAIC might now ask is: are the

marginal empirical distributions of premium ratings in different strata sufficiently

different to support charging different premiums (when doing so is allowed)? Hence,

we need some kind of measure of confidence in distinguishing one stratum from

another in terms of CI insurance premium ratings.

Statisticians normally use non-parametric tests, like the Kolmogorov-Smirnov test, to check whether two underlying one-dimensional probability distributions differ from one another by comparing their empirical distribution functions. However, tests of this type are not suitable in a simulation exercise, because their power increases with the number of observations available for each distribution. In a simulation exercise, it is possible to generate a large number of estimates by repeating the experiment any number of times, thus artificially increasing the power of the test. As a consequence, the Kolmogorov-Smirnov test could not be used to distinguish one risk stratum from another. In the remainder of this section, we will suggest a simple alternative measure to achieve this.
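To illustrate why the number of simulations drives the outcome of such a test, the sketch below evaluates the standard large-sample critical value of the two-sample Kolmogorov-Smirnov statistic, c(α) × sqrt((n + m)/(nm)) with c(α) = sqrt(−ln(α/2)/2). The function name, the 5% level and the sample sizes are illustrative choices and are not taken from this thesis.

```python
import math

def ks_critical_value(n, m, alpha=0.05):
    """Large-sample rejection threshold for the two-sample KS statistic."""
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2.0))
    return c_alpha * math.sqrt((n + m) / (n * m))

# The threshold shrinks as the number of simulated estimates grows, so any
# fixed difference between two distributions is eventually declared significant.
for n in (100, 1000, 10000):
    print(n, round(ks_critical_value(n, n), 4))
```

With equal sample sizes the threshold falls in proportion to 1/sqrt(n), which is the sense in which repeating the simulation inflates the power of the test.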

Let X and Y be two continuous random variables with cumulative distribution functions FX and FY respectively. We can find u such that FX(u) + FY(u) = 1. If the ranges of X and Y overlap, u lies in both and is unique; otherwise any u that lies between their ranges will do. This can be rewritten as FX(u) = 1 − FY(u), or P[X ≤ u] = P[Y > u].

Without loss of generality, let us also assume that FX(u) ≥ FY(u). Let us define our measure of confidence to be 2 × FX(u) − 1, which gives a measure of the overlap of FX and FY. Denote this O(X, Y), or just O if the context is clear. If FX(u) = FY(u) = 0.5, then we are as unsure as we can be that FX and FY are distinct, and O = 0. As FX(u) increases to 1, the area of overlap decreases. If the ranges of X and Y do not overlap at all, FX(u) = 1 and we have high confidence in deciding that FX and FY are distinct; in this case O = 1. In this sense, O measures how confident the underwriter can be that the two distributions are different.
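The definition above can be sketched for two simulated samples. This is a minimal illustration, not the implementation used in this thesis: the function name is hypothetical, and because empirical distribution functions are step functions the crossing point is taken as the first pooled observation at which FX(u) + FY(u) reaches 1. At an exact crossing, 2 × FX(u) − 1 equals FX(u) − FY(u), and the absolute difference removes the need to decide in advance which sample plays the role of X.

```python
from bisect import bisect_right

def overlap_measure(x_sample, y_sample):
    """Empirical version of O(X, Y) = 2 * FX(u) - 1, where FX(u) + FY(u) = 1."""
    xs, ys = sorted(x_sample), sorted(y_sample)
    nx, ny = len(xs), len(ys)
    if nx == 0 or ny == 0:
        raise ValueError("both samples must be non-empty")
    for u in sorted(xs + ys):
        cx = bisect_right(xs, u)  # number of X observations <= u
        cy = bisect_right(ys, u)  # number of Y observations <= u
        # Integer form of FX(u) + FY(u) >= 1, which avoids rounding error.
        if cx * ny + cy * nx >= nx * ny:
            return abs(cx / nx - cy / ny)
    # Unreachable: at the largest pooled observation both CDFs equal 1.
    raise RuntimeError("no crossing point found")

print(overlap_measure([1, 2, 3], [10, 11, 12]))    # disjoint ranges
print(overlap_measure([1, 2, 3, 4], [1, 2, 3, 4])) # identical samples
```

Disjoint samples give the maximum value and identical samples the minimum, matching the two limiting cases described in the text.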

4.4 Results

In this section, we simulate 1,000 realisations of UK Biobank under each scenario outlined in Table 4.16. Our aim is to examine how reliably UK Biobank might identify differences in premium ratings, as a body like GAIC might require. This is measured by the three quantities O(RgE, RGe), O(RGe, RGE) and O(RgE, RGE). We have verified (not shown here) that these do not vary significantly by age or policy term, so in Table 4.19, we present results for a representative policy for males aged 45 and policy term 15 years.

Table 4.19: The measure of overlap O for CI insurance premium ratings for males aged 45, with policy term 15 years, for different scenarios.

Scenario          Cases    O(RgE, RGe)   O(RgE, RGE)   O(RGe, RGE)
Base              All      1.000         1.000         0.924
                  10,000   0.968         0.962         0.632
                  5,000    0.872         0.850         0.484
                  2,500    0.718         0.698         0.356
                  1,000    0.490         0.416         0.176
Low Penetrance    All      0.918         0.904         0.572
                  10,000   0.662         0.658         0.346
                  5,000    0.528         0.472         0.216
                  2,500    0.412         0.360         0.148
                  1,000    0.250         0.222         0.076
High Penetrance   All      1.000         1.000         0.992
                  10,000   1.000         0.998         0.844
                  5,000    0.984         0.970         0.692
                  2,500    0.906         0.886         0.540
                  1,000    0.688         0.658         0.354
Low Frequency     All      0.996         0.948         0.658
                  10,000   0.892         0.706         0.352
                  5,000    0.712         0.516         0.208
                  2,500    0.566         0.322         0.060
                  1,000    0.386         0.394         0.226
High Frequency    All      1.000         1.000         0.994
                  10,000   0.988         1.000         0.896
                  5,000    0.932         0.986         0.744
                  2,500    0.806         0.902         0.546
                  1,000    0.594         0.716         0.358

Note that it is impossible to calculate an odds ratio for a given age group unless

there is at least one case in that age group in each stratum. In some circumstances

some of the 1,000 simulations failed this criterion, and these were omitted from the

results in Table 4.19. Those affected were the Base and the Low Penetrance scenarios

with 1,000 cases (1 simulation omitted in each case) and the Low Frequency scenarios

with 2,500 and 1,000 cases (10 and 238 simulations omitted, respectively). We make

the following comments on Table 4.19:


(a) We saw in Section 4.3 that under the Base scenario, all cases included, the densities of RGe and RGE overlap over a small region. This qualitative observation is made more concrete by Table 4.19, which shows that O(RGe, RGE) = 0.924 in this case. By definition, O = 2 × P[RGe < x] − 1 at the crossing point x, so this means that there exists x such that P[RGe < x] = P[RGE > x] = 0.962, and we (or GAIC) may have high confidence in assigning these strata to different underwriting groups.

(b) Stratum GE is always the smallest, so the distribution of RGE is always the

most spread out. This is also evident from the scatter plots in Figure 4.20.

(c) We expect real case-control studies to use only a subset of cases, and Table 4.19

shows that the effect of this is very great. For example, in the Base scenario,

O(RGe, RGE) falls from 0.924 to 0.176 as the number of cases used falls from ‘All’

to 1,000. Figure 4.21 shows, for the Base scenario, the marginal densities with

different numbers of cases. The densities overlap considerably if the number of

cases is small (and bear in mind that 1,000 cases is not a very small investigation

by normal standards).

(d) Figure 4.22 shows the empirical distribution functions of the premium ratings

for males under the Base scenario. For each premium rating, we show the effect

of using different numbers of cases. For example, if only 1,000 cases were used,

there is about a 30% chance that underwriters would incorrectly assume RGE

to be 150% or higher. If instead 10,000 cases were used the chance of making

this error is very small.

(e) Figure 4.23 shows, for 5,000 cases, the effect of the different scenarios. Reduced frequency of the adverse genetic and environmental exposures, or reduced penetrance of the adverse genotype, both reduce the ability to discriminate between different underwriting classes. Changes in the opposite direction improve the discrimination. This qualitative observation is backed up in a more quantitative way by Table 4.19.
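Because O = 2 × FX(u) − 1 (Section 4.3), each value of O in Table 4.19 can be converted back into the probability FX(u) = P[X ≤ u] = P[Y > u] at the crossing point. The short sketch below does this for O(RGe, RGE) under the Base scenario for males; the function and variable names are illustrative only, and the O values are copied from Table 4.19.

```python
def crossing_probability(overlap):
    """Invert O = 2 * F - 1 to recover F = P[X <= u] = P[Y > u]."""
    return (1.0 + overlap) / 2.0

# O(RGe, RGE) for males under the Base scenario, copied from Table 4.19.
base_overlap = {"All": 0.924, "10,000": 0.632, "5,000": 0.484,
                "2,500": 0.356, "1,000": 0.176}

for cases, o in base_overlap.items():
    p = crossing_probability(o)
    print(f"{cases:>6} cases: O = {o:.3f}, crossing probability = {p:.3f}")
```

Read this way, O = 0.176 with 1,000 cases corresponds to a crossing probability of 0.588, only a little better than the value of 0.5 obtained when the two distributions cannot be told apart.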

Table 4.20 gives the corresponding results for females. When a fixed number of

cases is used the results are very similar to those for males. This is as expected,

as we assumed that the effects of genotype and environmental exposures were the

same for males and females, albeit acting on different baseline risks of heart attack.

Figure 4.21: Marginal densities of premium ratings in the Base scenario (males) with different numbers of cases in the case-control study. [Plots not reproduced. Five panels: Base scenario with All, 10,000, 5,000, 2,500 and 1,000 cases; each shows the densities of RgE, RGe and RGE, with Premium Ratings on the horizontal axis and Density on the vertical axis.]

Figure 4.22: The empirical cumulative distribution function of the premium ratings gE/ge, Ge/ge and GE/ge for males aged 45 and policy term 15 years under the Base scenario. [Plots not reproduced. Three panels, one each for RgE, RGe and RGE; each shows the cumulative distribution function for All, 10,000, 5,000, 2,500 and 1,000 cases.]

Figure 4.23: Marginal densities of premium ratings in different scenarios (males), with 5,000 cases in the case-control study. [Plots not reproduced. Five panels: Base, Low Frequency, High Frequency, Low Penetrance and High Penetrance, each with 5,000 cases; each shows the densities of RgE, RGe and RGE, with Premium Ratings on the horizontal axis and Density on the vertical axis.]


Table 4.20: The measure of overlap O for CI insurance premium ratings for females aged 45 with policy term 15 years, for different scenarios.

Scenario          Cases    O(RgE, RGe)   O(RgE, RGE)   O(RGe, RGE)
Base              All      0.990         0.990         0.734
                  10,000   0.958         0.948         0.626
                  5,000    0.850         0.844         0.494
                  2,500    0.728         0.706         0.378
                  1,000    0.466         0.488         0.244
Low Penetrance    All      0.778         0.762         0.402
                  10,000   0.680         0.646         0.302
                  5,000    0.528         0.506         0.222
                  2,500    0.392         0.326         0.122
                  1,000    0.238         0.198         0.078
High Penetrance   All      1.000         1.000         0.906
                  10,000   1.000         0.998         0.836
                  5,000    0.992         0.984         0.696
                  2,500    0.914         0.884         0.484
                  1,000    0.716         0.656         0.320
Low Frequency     All      0.932         0.800         0.436
                  10,000   0.896         0.676         0.298
                  5,000    0.748         0.486         0.192
                  2,500    0.552         0.340         0.134
                  1,000    0.406         0.374         0.218
High Frequency    All      0.998         1.000         0.922
                  10,000   0.994         1.000         0.884
                  5,000    0.922         0.986         0.756
                  2,500    0.814         0.914         0.576
                  1,000    0.598         0.678         0.348

However, when all cases are included, the values of O are smaller than those for males. This is because the lower incidence of heart attack among females results in fewer cases, and therefore in estimates with higher variances.

Until now, we have used a 1:5 matching strategy for all case-control studies; that is, five controls per case. However, cost constraints might dictate the use of fewer controls. In Table 4.21, we show the values of O for males if a 1:1 matching strategy is used. As expected, these are significantly lower under all scenarios.

Table 4.21: The measure of overlap O for CI insurance premium ratings for males aged 45, with policy term 15 years, for different scenarios and a 1:1 matching strategy.

Scenario          Cases    O(RgE, RGe)   O(RgE, RGE)   O(RGe, RGE)
Base              All      0.990         0.990         0.774
                  10,000   0.886         0.872         0.454
                  5,000    0.740         0.720         0.374
                  2,500    0.554         0.544         0.248
                  1,000    0.378         0.400         0.222
Low Penetrance    All      0.808         0.820         0.456
                  10,000   0.558         0.526         0.220
                  5,000    0.372         0.378         0.188
                  2,500    0.288         0.308         0.184
                  1,000    0.232         0.204         0.048
High Penetrance   All      1.000         1.000         0.908
                  10,000   0.988         0.978         0.680
                  5,000    0.898         0.902         0.494
                  2,500    0.762         0.742         0.366
                  1,000    0.548         0.480         0.222
Low Frequency     All      0.954         0.856         0.474
                  10,000   0.738         0.558         0.284
                  5,000    0.574         0.464         0.228
High Frequency    All      1.000         1.000         0.950
                  10,000   0.944         0.986         0.746
                  5,000    0.826         0.932         0.592
                  2,500    0.668         0.802         0.456
                  1,000    0.474         0.594         0.306

As we mentioned when discussing Table 4.19, we may find simulations under which the odds ratios cannot be calculated because of a lack of cases. Also, note that the calculation of odds ratios requires the existence of enough exposed controls. This is more demanding under a 1:1 matching strategy, as fewer controls are available than under a 1:5 matching strategy. At first sight this is surprising; it ought to be easier to find a smaller number of controls. This is true, but there is also a higher chance that one of the cells in the 2 × 2 table used to calculate the odds ratio will be empty (see Table A.31 in Appendix A). Table 4.22 shows the numbers of simulations rejected for this reason. The numbers are very high for the Low Frequency scenarios where 1,000 and 2,500 cases were used. The values of O based on the remaining simulations are not reliable and so these are not given in Table 4.21.


Table 4.22: The number of simulations rejected due to the inability to calculate the odds ratios for a 1:1 matching strategy.

                  Number of Cases
Scenario          All    10,000   5,000   2,500   1,000
Base              0      0        0       0       13
Low Penetrance    0      0        0       0       16
High Penetrance   0      0        0       0       36
Low Frequency     0      0        6       123     630
High Frequency    0      0        0       0       0
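The rejection rule and the effect of the matching ratio can be sketched with a plain 2 × 2 table. The code below is an illustration under stated assumptions, not the calculation set out in Appendix A: it uses the textbook cross-product odds ratio together with Woolf's approximation for the standard error of its logarithm, and the function name and all counts are hypothetical.

```python
import math

def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Return (odds ratio, standard error of its log), or None if any cell is empty."""
    a, b, c, d = exposed_cases, unexposed_cases, exposed_controls, unexposed_controls
    if min(a, b, c, d) <= 0:
        return None  # an empty cell: the simulation would be rejected
    estimate = (a * d) / (b * c)
    log_se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # Woolf's approximation
    return estimate, log_se

# Hypothetical counts: 1,000 cases matched to 5,000 or to 1,000 controls,
# with the same proportion of exposed controls in both designs.
or_1to5, se_1to5 = odds_ratio(30, 970, 100, 4900)
or_1to1, se_1to1 = odds_ratio(30, 970, 20, 980)
print(round(or_1to5, 3), round(se_1to5, 3))
print(round(or_1to1, 3), round(se_1to1, 3))
print(odds_ratio(30, 970, 0, 1000))  # no exposed controls
```

In this made-up example both designs give the same point estimate, but the 1:1 design has the larger standard error and, with only 20 expected exposed controls, is more likely to produce an empty cell in a given simulation.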

4.5 Conclusions

Earlier in this chapter, we asked the question: how well may UK Biobank distinguish

between different levels of risk associated with the influence of genes, environment

and their interactions on a given multifactorial disorder? Using a simple model

of heart attack as an example, we simulated the outcome of UK Biobank, each

simulation consisting of 500,000 life histories. Then we supposed that a model

epidemiologist carried out case-control studies using the UK Biobank data, and a

model actuary used the published odds ratios from these studies to parameterise a

pricing model for CI insurance.

We supposed that GAIC (in the UK) would approach the question of the reliability of any genetic test capable of detecting the genetic variation in terms of its ability to allocate tested individuals to distinct underwriting classes. From each simulation of UK Biobank we could estimate the premium rates of a representative CI insurance policy for each stratum defined by genotype and the environment, and for each sex. From a large number of such simulations, we could estimate the sampling distributions of premium ratings with respect to a chosen ‘standard’ underwriting class.

For simplicity, we used only two genotypes and two levels of environmental exposure (as in the examples in the UK Biobank protocol). We used proportional hazards of heart attack in different strata, and assumed that the model epidemiologist, in his/her analyses, hit upon the same model. Thus our results correspond to the simplest possible hypothesis that might be investigated using UK Biobank, and are free of model mis-specification on the part of the analyst, and of any noise, nuisance parameters, or missing or contaminated data.

The parameters we chose as our baseline represented genetic and environmental exposures that were fairly common (10% of the population with each adverse exposure) and had modest penetrance: the most adverse stratum (GE) and least adverse stratum (ge) had intensities of heart attack 30% higher and 30% lower than average, respectively. (For comparison, CI insurance underwriters typically might consider an extra premium to be appropriate once the assessed premium exceeds about 25% of the standard.) We also considered the effect of varying key parameters, as follows:

(a) The relative incidence rates of heart attack for each stratum.

(b) The population frequencies of each stratum.

(c) The number of cases used in the case-control study.

We defined a very simple measure of the extent to which two distributions overlap. We did not attempt to define a cut-off point, beyond which GAIC might deem a genetic test to be insufficiently reliable to be used in underwriting, but the results we obtained ranged across all values of this measure, showing that in some circumstances a genetic test would almost certainly be deemed reliable, and in other circumstances it would almost certainly be deemed unreliable.

On the basis of this simple model, we conclude that the ability of case-control

studies based on UK Biobank to identify distinct CI underwriting classes was

marginal. If a very large number of cases was used, quite reliable discrimination

was achieved, but this is a very expensive option. If a more realistic number of

cases was used — a few thousands — the power to discriminate quickly diminished.

In particular, it was clear that if the effects of the adverse genotype and adverse

environment were any less than we had assumed, the power to discriminate would

be rather poor.

This conclusion ought to bring comfort to those who are worried about insurers’

use of genetic information, and to insurers themselves. This is particularly important

during the 5 to 10 years that must pass before UK Biobank itself starts to yield

results. We have found no support for the idea that very large-scale genetic studies

like UK Biobank will lead to significant changes in underwriting practice.


Our study has been very simple and idealised in several respects mentioned above.

Most obviously, our genetic model is not truly multifactorial, although it does allow

for a basic environmental interaction. Further research is in hand to extend the

model to a more realistic, though still hypothetical, representation of a multifactorial

genetic contribution to heart attack. Our aim will be to find out whether this will

strengthen or weaken the discriminatory power of genetic tests, along the lines that

GAIC has pioneered for single-gene disorders. Another point that will repay further

study is the possibility of model mis-specification.


Chapter 5

Adverse Selection and Utility Theory

5.1 Risk and Insurance

An individual faces financial risk all the time. Whether it is the risk of losing one’s home to fire, flood or earthquake, or the loss of a steady source of income through failing health, an individual is constantly exposed to large financial risks. Although the probability of such an event is small, the resulting loss could be enormous and potentially catastrophic for an individual.

Facing an uncertain future, an individual might do nothing and gamble on the

risk event not happening. Or, the individual can purchase insurance and pass the

risk on to an insurer at an appropriate price. So, which of the two options should

an individual choose? Economic studies, like Pratt (1964), show that individuals

are generally risk averse. If affordable, an individual would not gamble and would

opt for insurance protection. Of course, the price of insurance plays an important

role. If the insurance premium is set at the actuarially fair price for the risk, it can

be shown that a risk-averse individual would always put a higher value on insurance

as against gambling with the risk. Pursuing this further, it can also be proved that

risk-averse individuals are actually willing to pay more than the fair price for the

risk cover, up to a certain maximum. For more details on rational behaviour in purchasing insurance coverage against a given risk, see Mossin (1968). This risk-averse nature of individuals provides the business incentive for insurers to operate in the market.
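The claim that a risk-averse individual will pay more than the fair price, up to a maximum, can be illustrated numerically. The sketch below assumes logarithmic utility and a single loss of known size and probability; the utility function, the wealth, the loss and the probability are all illustrative assumptions and are not taken from this thesis. The maximum premium P solves log(wealth − P) = expected log-utility of remaining uninsured.

```python
import math

def fair_premium(loss, prob):
    """Actuarially fair premium: the expected loss."""
    return prob * loss

def max_premium_log_utility(wealth, loss, prob):
    """Largest premium at which insuring is at least as good as staying uninsured."""
    expected_utility = (prob * math.log(wealth - loss)
                        + (1.0 - prob) * math.log(wealth))
    certainty_equivalent = math.exp(expected_utility)
    return wealth - certainty_equivalent

wealth, loss, prob = 100.0, 50.0, 0.1  # hypothetical values
fair = fair_premium(loss, prob)
maximum = max_premium_log_utility(wealth, loss, prob)
print(f"fair premium = {fair:.2f}, maximum acceptable premium = {maximum:.2f}")
```

Under these assumptions the maximum acceptable premium exceeds the fair premium, and the gap between the two is the margin that a risk-averse individual is prepared to pay for cover.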

While a solitary individual prefers to insure against risks, insurers are in the business of accepting risks. By pooling risks, an insurer can become virtually risk-neutral. Because a risk-averse individual is willing to pay more than the fair price, the insurer can charge a premium which will not only cover the expected cost of claims, but also contribute to its profit margin.

However, an insurer cannot charge an arbitrarily high price, for several reasons.

Firstly, beyond a certain maximum, even the risk-averse individuals will find the

price unattractive, which sets an upper limit for the premium that can be charged.

Moreover, in a competitive market, where individuals can choose between competing

products, they will always buy the cheapest one available, all else being equal. So,

competition ensures that insurance is sold at prices much lower than the upper limit

that the risk-averse individuals could have paid. In fact, in a competitive market,

the equilibrium position for all insurance companies is to charge the fair actuarial

price for the risk involved. Rothschild and Stiglitz (1976) provide a model for risk-neutral insurance firms in a competitive market.

What can we infer from all this? If the insurance companies can only charge

an actuarially fair premium, could the knowledge that the consumers would have

actually paid more be used under some other circumstances? As we will show here, the answer to that question is yes. In the remainder of the chapter, we will see

that in certain situations where insurers do not have access to consumers’ private

information, the upper limits for insurance premiums become relevant. We will

illustrate our results using CI insurance. But before proceeding further, we will

discuss the circumstances that lead to information asymmetry.

5.2 Underwriting Risk

Each individual is unique: their circumstances are different and so are their risk profiles. So, even if two individuals wish to purchase the same cover from the same insurer, they might find that they have to pay very different prices. Insurers would always want to charge a premium which is at least equal to the actuarially fair price, commensurate with the risk they are accepting. Although competition ensures that they cannot charge more than the fair price, they do not want to quote a lower price, as this would result in losses. So consumers with a higher risk profile would be expected to pay a higher premium than those with lower risks.

Charging an appropriate premium for a risk involves a good understanding of

how different factors contribute to the risk in question. The factors which have a

quantitative impact on the risk are identified and are commonly referred to as risk

factors. Different levels of exposures of these risk factors would indicate different

levels of risks. In other words, exposure levels of risk factors stratify an insurer’s

consumer base into a number of homogeneous groups of individuals. Appropriate

premiums can then be set for these groups of individuals based on their exposures

to these risk factors. As and when a potential consumer approaches the insurer

for cover, the individual’s exposure to the risk factors would dictate the premium

to be charged. This is, broadly, how underwriting strategies work for insurance

companies.

However, acquiring information on risk factors has its disadvantages. Firstly, there are costs associated with it. A piece of information is only useful for underwriting purposes if the advantages outweigh the cost of acquiring it. The risk factors which satisfy this economic criterion can then be used for underwriting purposes. As more and more risk factors come to light through medical research, this is also an evolving process. This is very relevant for recent developments in genetics, as the role of genes in an individual’s health becomes clearer. However, as of now, genetic tests are expensive and further research is needed to establish the relative efficiency of these tests as underwriting tools.

More importantly, though, there are ethical considerations in accessing private

information. Let us discuss this in the context of CI insurance. The risk of CI is

affected by, among other things, age, gender, lifestyle and genotype. Clearly, CIs are

more common at advanced ages. Medical research has also established differences

between CI incidences for males and females. The same is true for some lifestyle

factors like smoking habits. Use of these items of information for underwriting has


become standard and is widely practised in the industry.

However, the use of certain information on environmental exposures and genetic

test results has proved more controversial. Unlike smoking habits, there are some

environmental exposures which are beyond an individual’s control. And genes are

even more intrinsic to human beings, as individuals are born with them. Given

this backdrop, should insurers be allowed to use such information to discriminate

between individuals? In many countries, a ban has been imposed, or moratorium

agreed, limiting the use of genetic information. In the UK, GAIC is providing

guidance to insurers on the acceptable use of genetic information. As it stands now,

insurers are only allowed access to genetic test results for covers exceeding a certain

prescribed level.

Clearly, the regulators are now responsible for formulating policies on ethical

issues, while the insurers do not access genetic information for the majority of cases.

It is imperative here to understand the role of different types of genetic disorders

that might affect an individual’s health.

5.3 Multifactorial Disorders

We have discussed genetic disorders in detail in Chapter 1. In this section, we will

recount briefly the main issues.

Disorders caused by mutations in single genes, which may be severe and of late onset, but are rare, have been quite extensively studied in the insurance literature; see Macdonald and Pritchard (2000) for an example. One reason is that the epidemiology of these disorders is relatively advanced, because biological cause and effect could be traced relatively easily. The conclusion has been that single-gene disorders do not expose insurers to serious adverse selection, in large enough markets, because of their rarity.

The vast majority of the genetic contribution to human disease, however, will arise from combinations of gene varieties (called ‘alleles’) and environmental factors, each of which might be quite common, and each alone of small influence but together exerting a measurable effect on the molecular mechanism of a disease. Some combinations may be protective, others deleterious. These are the multifactorial disorders, and they are the future of genetics research. Their epidemiology is not very advanced, but should make progress in the next 5–10 years through the very large prospective studies now beginning in several countries. As discussed in earlier chapters, one of the largest is the Biobank project in the UK. UK Biobank will recruit 500,000 people aged 40 to 69 from the general population of the UK, and follow them up for 10 years. The aim is to capture both genetic and environmental variations and interactions, and relate them to the risks of common diseases. If successful, the outcome will be much better knowledge of the risks associated with complex genotypes. Thus the genetics and insurance debate will, in the fairly near future, shift from single-gene to multifactorial disorders.

5.4 Literature Review

Any model used to study adverse selection risk must incorporate the behaviour of

the market participants. Most of those applied to single-gene disorders in the past

did so in a very simple and exaggerated way, assuming that the risk implied by

an adverse genetic test result was so great that its recipient would quickly buy life

or health insurance with very high probability. These assumptions were not based

on any quantified economic rationale, but since they led to minimal changes in the

price of insurance this probably did not matter. The same is not true if we try

to model multifactorial disorders. Then ‘adverse’ genotypes may imply relatively

modest excess risk but may be reasonably common, so the decision to buy insurance

is more central to the outcome.

Information asymmetry and adverse selection have been considered before. Doherty & Thistle (1996) pointed out that, under symmetric information, the private value of information is negative and insurance deters people from taking diagnostic tests. This is because, from an individual’s perspective, before undertaking the test, the premium is a random variable and, being risk-averse, the individual will decide against testing and opt for an average premium instead. On the other hand, if the insurer cannot observe test results, acquiring information has a positive private value as it enables an informed choice to be made. However, as insurers adjust their premiums to guard against adverse selection, there is a loss of market efficiency. The authors used a general insurance model to show that insurers can only allow partial cover for the lowest risk group if positive (beneficial) test results cannot be reported. If reporting verifiable positive test results was allowed, the lowest risk group could buy full coverage at the lower price. However, uninformed individuals would pay the same higher premium charged to the high risk group. Assuming costless information, this provides an incentive for taking diagnostic tests. The authors concluded that the loss of efficiency in the insurance market should be weighed against the increased value of private information.

Hoy & Polborn (2000) analysed the same problem in a life insurance model. As life insurance companies do not share information, restricting insurance cover is not a viable option against adverse selection. Instead the authors proposed an income protection model, which they then used to compute an optimal insurance coverage. Under specific assumptions, they showed that for a fixed insurance premium, appetite for insurance cover increases with risk. The authors constructed scenarios where the effect, on welfare, of a new test could go either way. A Pareto worsening happens when very high risk individuals opt for insurance only when the test produces very bad news. This increases the average insurance premium for life insurance buyers and worsens everybody’s situation. On the other hand, if the individuals with positive (beneficial) test results have lower risk than the average life insurance buyer, then there is a Pareto improvement. The authors also investigated a third scenario under which individuals who go for tests gain and those who do not lose. As, currently, very few people have diagnostic genetic tests, individuals with bad news can only move insurance premiums by very small amounts in practice. However, the authors concluded that if tests become cheaper and widely available, testing could lead to either Pareto improvements or worsening.

Hoy & Witt (2005) applied the results from Hoy & Polborn (2000) to the specific

case of the BRCA1/2 breast cancer genes. They simulated the market for 10-year

term life insurance policies targeted at women aged 35 to 39. They stratified the

consumer base into 13 risk categories based on family background information. This

information is also available to insurers. Then within each risk group, they checked

the impact of test results for BRCA1/2 genes on welfare effects, using iso-elastic

utility functions. The authors showed that in the presence of a high risk group, and

in the presence of information asymmetry, the equilibrium insurance premium can

be as high as 297% of the population weighted probability of death.

All these papers assume that genetic tests carry very strong information about

risk. This is true of some single-gene disorders, but is much less likely to be

true of multifactorial disorders. They concentrate primarily on

providing a proper economic rationale for the impact, on the insurance market, of

genetic tests for, mainly, rare diseases. Here, we try to bring together plausible

quantitative models for the epidemiology and the economic issues, in respect of

more common disorders, therefore affecting a much larger proportion of the insurer’s

customer base. We wish to find out under what circumstances adverse selection is

likely to occur.

5.5 Adverse Selection

We suppose that individuals are risk-averse, have wealth W and aim to buy CI

insurance with sum assured L ≤ W . Their decision is governed by expected utility,

conditioned on the information available to them. Insurers, in a competitive market,

charge an actuarially fair premium P , equal to the expected present value of the

insured loss, conditioned on the information available to them. See for example

Hoy and Polborn (2000) for a similar market model. Because they are risk-averse,

individuals will be willing to pay a premium up to a maximum of P ∗ > P , provided

that they and the insurer have the same information. We can then consider the

effect of genetic information that is only available to applicants.

We propose a simple model of a multifactorial disorder, with two genotypes and

two levels of environmental exposure, and either additive or multiplicative interactions

between them. These factors affect the risk of myocardial infarction (heart

attack), and therefore the theoretical price of CI insurance. However, these price

differences are not very large. To begin with, the risk factors are not observable,

because the epidemiology is unknown, or the necessary genetic tests have not yet

been developed. Insurers therefore charge everyone the same premium, which is

the appropriate weighted average of the genotype and environment-specific premi-

ums. Subsequently, genetic tests that accurately predict the risk become available,

but only to individuals; insurers are barred from asking about genotype. Adverse

selection therefore becomes a possibility.

5.6 Utility of Wealth

Utility theory has its roots in the early works of the utilitarian philosophers, includ-

ing Bentham (1789) and Mill (1879). They proposed that people ought to desire

those things that will maximise their utility, where utility is measured in terms

of happiness or satisfaction gained from consumption of commodities. Among the

early applications, Daniel Bernoulli suggested the use of expected utility theory to

solve the St. Petersburg paradox. However, the first important breakthrough came

from Von Neumann and Morgenstern (1944), who used the assumptions of expected

utility maximisation in their formulation of game theory. The work of Nash (1950)

on optimum strategies for multiplayer games ushered in a new era and since then

utility theory has been at the forefront of economic research activity. For a full

exposition of utility theory, see Binmore (1991).

In this chapter, we will define utility functions as a measure of an individual’s

preference for wealth. In other words, an individual, hypothetically, assigns a value

U(w) to every amount of wealth w that she can possess. Figure 5.24 shows a

specimen utility function plotted against a person’s wealth. For this individual,

the utility of wealth, measured in terms of U(w), increases with wealth, w. This

is known as the non-satiation property, which states that more wealth is preferred

to less wealth. The other feature of the utility function is that it is concave, i.e.,

the rate of increase of U(w) slows down as the wealth goes up. In other words,

the marginal utility of wealth decreases as the wealth increases. For this individual

the value of an extra pound is greater when her existing wealth is £1 rather than

£1,000,000. This is known as the risk-aversion property.

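[Figure: utility plotted against wealth. The points W − L, W∗, W − qL and W are marked on the wealth axis, with the corresponding utilities u(W − L), u(W∗) = (1 − q)u(W) + qu(W − L), u(W − qL) and u(W) marked on the utility axis.]

Figure 5.24: Utility of wealth for a risk averse individual.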

Let us now formalise our definition of utility function in terms of the non-satiation

and risk-aversion properties. In mathematical terminology, the utility function for a

risk-averse individual is increasing and concave. Now, U(w) is concave on an interval

[a,b], if for any points w1 and w2 in [a,b] and for any α in (0,1), we have,

U[αw1 + (1 − α)w2] > αU(w1) + (1 − α)U(w2). (5.32)

If U(w) is twice-differentiable in [a,b], then a sufficient condition for

it to be strictly concave on that interval is that the second derivative U′′(w) < 0 for all

w in [a,b]. So a twice-differentiable utility function for a risk-averse individual has

the properties U ′(w) > 0 (non-satiation property) and U ′′(w) < 0 (risk-aversion

property).

From the above formulation, it is not readily obvious how concavity of utility

functions relates to risk-aversion. To understand the relationship, let us assume that

an individual with a concave increasing utility function U(w), has initial wealth W

from which he might lose L with probability q. The ultimate wealth is the random

variable X, where X = W − L with probability q and X = W with probability 1 − q.

The expected utility of this gamble from the individual’s perspective is:

E[U(X)] = qU(W − L) + (1− q)U(W ). (5.33)

If he chooses, he can insure the risk for premium P , and accept W−P with certainty.

He should do so if:

U(W − P ) > E[U(X)] = qU(W − L) + (1− q)U(W ). (5.34)

In particular he should insure if the premium is equal to the expected loss qL since:

U(W − qL) = U(q(W − L) + (1− q)W ) > qU(W − L) + (1− q)U(W ). (5.35)

The inequality of Equation 5.35 can also be verified from Figure 5.24, which shows

that an individual values certainty more than a gamble. He is more willing to forgo a

fixed loss of amount qL than to participate in the gamble. This is why the individual

is risk-averse.

In fact, we can see from Figure 5.24 that a risk-averse individual is willing to run

her wealth down further than that which is required for a fair actuarial premium.

If W ∗ is the amount of wealth for which:

U(W ∗) = (1− q)U(W ) + qU(W − L) (5.36)

then the individual will be ready to forgo a maximum of W − W∗ in order to avoid the

gamble. As this is greater than the fair actuarial premium of qL:

P ∗ = W −W ∗ = W − U−1[(1− q)U(W ) + qU(W − L)] (5.37)

is the maximum premium that this person will pay for insurance. So in a market

where competition drives insurers to charge the actuarially ‘fair’ premium qL, in-

surance will be bought, but this is not the limiting case; insurance will be bought

as long as the premium is less than P ∗.
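As a rough numerical illustration of Equations 5.35 to 5.37, the following Python sketch computes the maximum acceptable premium P∗ and compares it with the fair premium qL. The logarithmic utility and the values W = 100,000, L = 50,000 and q = 0.1 are assumptions chosen purely for illustration, and the function name max_premium is hypothetical.

```python
import math

def max_premium(U, U_inv, W, L, q):
    """Maximum acceptable premium P* = W - U^{-1}[(1-q)U(W) + qU(W-L)] (Equation 5.37)."""
    expected_utility = (1 - q) * U(W) + q * U(W - L)  # Equation 5.33
    return W - U_inv(expected_utility)

# Illustrative assumptions only: log utility, wealth, loss and loss probability.
W, L, q = 100_000.0, 50_000.0, 0.1
p_star = max_premium(math.log, math.exp, W, L, q)
fair_premium = q * L

# A risk-averse individual accepts any premium up to p_star, which exceeds q * L.
print(f"fair premium = {fair_premium:.2f}, maximum premium = {p_star:.2f}")
```

Any increasing concave utility with a known inverse could be passed in place of the logarithm; the gap between the two printed figures is the loading the individual would tolerate.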

5.7 Coefficients of Risk-aversion

Risk-aversion is the prerequisite for insurance. However, not all individuals have

the same attitude towards risk. Some individuals are more risk-averse than others.

In this section, we will define properties of utility functions which characterise an

individual’s attitude towards risk. For a comprehensive discussion on properties of

risk-averse utility functions, please refer to Pratt (1964).

Let us consider two utility functions U(w) and V (w), where for a > 0, V (w) =

aU(w) + b. In mathematical terminology, U(w) and V (w) are said to be related by

a positive affine transformation. How different are these two utility functions from

each other? If U(w) represents the utility function for a risk-averse individual, i.e.,

U ′(w) > 0 and U ′′(w) < 0, then so does the function V (w), i.e., V ′(w) > 0 and

V ′′(w) < 0. Now, assuming an initial wealth of W , if there is a risk of losing L with

probability q, how will decisions based on utility function U(w) be different from

those based on V (w)? Note that:

V −1[qV (W − L) + (1− q)V (W )] = V −1[a{qU(W − L) + (1− q)U(W )}+ b]

= U−1[qU(W − L) + (1− q)U(W )]. (5.38)

From Equations 5.37 and 5.38, we can see that the maximum premium payable under

both these utility functions is the same. So in a way, a positive affine transformation

has preserved the inherent characteristics of these utility functions.

To understand the underlying mechanics, let us define the absolute risk-aversion

function for a utility function U(w), as follows:

\[ A_U(w) = -\frac{U''(w)}{U'(w)}. \tag{5.39} \]

Clearly, for a positive affine transformation V (w) = aU(w) + b, we have:

\[ A_V(w) = -\frac{V''(w)}{V'(w)} = -\frac{aU''(w)}{aU'(w)} = -\frac{U''(w)}{U'(w)} = A_U(w). \tag{5.40} \]

So a positive affine transformation leaves the absolute risk-aversion functions unal-

tered. Conversely, if we assume AU(w) = AV (w) for two risk-averse utility functions

U(w) and V (w), then we have:

\[ \frac{V'(w)}{U'(w)} = \frac{V''(w)}{U''(w)}. \tag{5.41} \]

Let us now define:

\[ f(w) = \frac{V'(w)}{U'(w)}. \tag{5.43} \]

Taking derivatives of both sides:

\[ f' = \frac{V''}{U'} - \frac{V'U''}{(U')^2} = \frac{V'}{U'}\left[\frac{V''}{V'} - \frac{U''}{U'}\right]. \tag{5.44} \]

From Equation 5.41, f′ = 0, implying that V(w) = aU(w) + b where a > 0. So,

we can see that two utility functions have the same absolute risk-aversion function

only if they are related by a positive affine transformation. In other words, the absolute

risk-aversion function characterises a utility function up to a positive affine transformation.
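The step from f′ = 0 to the affine relationship can be written out explicitly. The following is a short worked sketch, assuming (as above) that both utility functions are strictly increasing:

```latex
\begin{align*}
f'(w) = 0 \;&\Rightarrow\; f(w) = \frac{V'(w)}{U'(w)} = a
  \quad \text{(a constant, with } a > 0 \text{ since } U' > 0 \text{ and } V' > 0\text{)} \\
&\Rightarrow\; V'(w) = a\,U'(w) \\
&\Rightarrow\; V(w) = a\,U(w) + b \quad \text{for some constant } b.
\end{align*}
```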

We will also introduce here a related quantity called the relative risk-aversion

function, defined as follows:

\[ R(w) = A_U(w)\,w = -\frac{U''(w)\,w}{U'(w)}. \tag{5.45} \]

5.8 Families of Utility Functions

We introduce two families of utility functions which we will use in examples through-

out the rest of the document.

(a) The Iso-Elastic utility functions are defined by:

\[ U_{I(\lambda)}(w) = \begin{cases} (w^{\lambda} - 1)/\lambda & \lambda < 1 \text{ and } \lambda \neq 0 \\ \log(w) & \lambda = 0. \end{cases} \tag{5.46} \]

The condition λ < 1 ensures concavity. Log-utility is the limiting case as λ→ 0.

The family gets its name, iso-elastic, from the property that scaling wealth by

a certain amount k produces a utility function which is just a positive affine

transformation of the original utility function. In mathematical notation, for

all k > 0, there exist some functions f(k) > 0 and g(k), which are independent

of wealth w, such that:

U(kw) = f(k)U(w) + g(k). (5.47)

It is easy to verify that this family of utility functions satisfies iso-elasticity:

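\[ U_{I(\lambda)}(kw) = \begin{cases} k^{\lambda}\,U_{I(\lambda)}(w) + (k^{\lambda} - 1)/\lambda & \lambda < 1 \text{ and } \lambda \neq 0 \\ U_{I(\lambda)}(w) + \log(k) & \lambda = 0. \end{cases} \tag{5.48} \]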

This property plays an important role, as we will see later that individuals with

iso-elastic utility functions put more emphasis on the proportion of loss under

risk than the actual amount of loss itself.

The absolute risk-aversion function of UI(λ)(w) is:

\[ A(w) = \frac{1-\lambda}{w} \tag{5.49} \]

and the relative risk-aversion function is constant, R(w) = R = 1 − λ. Hence

higher λ means less risk aversion.

(b) The Negative Exponential family of utility functions is parameterised by a con-

stant absolute risk-aversion function A(w) = A, as follows:

UN(A)(w) = − exp(−Aw), where A > 0. (5.50)

Clearly, a higher value of A implies more risk aversion.

The Negative Exponential utility functions possess the interesting property that

they are invariant under any translation of wealth. In other words, for all k > 0,

there exist some functions f(k) > 0 and g(k), which are independent of wealth

w, such that:

U(k + w) = f(k)U(w) + g(k). (5.51)

It is easy to verify that for Negative Exponential utilities,

UN(A)(k + w) = exp(−kA)UN(A)(w). (5.52)

We will see later that this property ensures that individuals with Negative Expo-

nential utility functions put all emphasis on the actual amount of loss, ignoring

completely their initial wealth.

The basic properties of these families of utility functions along with some simple

applications to portfolio optimisation are given in Norstad (1999).
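The properties of the two families can be checked numerically. The sketch below is an illustration only: the helper names are hypothetical, derivatives are approximated by central finite differences, and the wealth levels and step sizes are assumptions chosen to keep the arithmetic well-conditioned.

```python
import math

def iso_elastic(lam):
    """Iso-elastic utility of Equation 5.46 (lam < 1, log utility at lam = 0)."""
    if lam == 0:
        return math.log
    return lambda w: (w ** lam - 1) / lam

def negative_exponential(A):
    """Negative exponential utility of Equation 5.50 (A > 0)."""
    return lambda w: -math.exp(-A * w)

def absolute_risk_aversion(U, w, h):
    """Finite-difference estimate of A(w) = -U''(w)/U'(w) with step h."""
    first = (U(w + h) - U(w - h)) / (2 * h)
    second = (U(w + h) - 2 * U(w) + U(w - h)) / (h * h)
    return -second / first

# Iso-elastic: relative risk-aversion w * A(w) should be close to 1 - lam.
# Wealth is in arbitrary units here; for lam = -8 the finite differences
# lose precision at very large w.
lam, w = -8.0, 2.0
rel_iso = w * absolute_risk_aversion(iso_elastic(lam), w, h=1e-3)

# Negative exponential: absolute risk-aversion should be close to A at any wealth.
A = 9e-5
abs_neg = absolute_risk_aversion(negative_exponential(A), 100_000.0, h=100.0)

# Scaling property (Equation 5.48) and translation property (Equation 5.52).
U, k = iso_elastic(0.5), 3.0
scaling_gap = U(k * w) - (k ** 0.5 * U(w) + (k ** 0.5 - 1) / 0.5)
V, shift = negative_exponential(A), 10_000.0
translation_gap = V(shift + 100_000.0) - math.exp(-shift * A) * V(100_000.0)

print(rel_iso, abs_neg, scaling_gap, translation_gap)
```

The two gaps should be zero up to floating-point rounding, and the two risk-aversion estimates should be close to 9 and 9 × 10^−5 respectively.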

5.9 Estimates of Absolute and Relative Risk-aversion

To parameterise these utility functions, we need estimates of absolute or relative

risk-aversion coefficients. Eisenhauer and Ventura (2003) pointed out that past research

was inconclusive; estimates of average relative risk-aversion coefficients ranged from

less than 1 to well over 40. Hoy and Witt (2005) illustrated their model using

iso-elastic utilities with R = 0.5, 1 and 3. We will adopt a similar strategy, as

follows.

Eisenhauer and Ventura (2003) estimated the risk-aversion function based on a

thought experiment conducted by the Bank of Italy for its 1995 Survey of Italian

Households’ Income and Wealth. Under certain assumptions, they estimated that

a person with an average annual income of 46.7777 million lira had absolute risk-

aversion coefficient 0.1837, and relative risk-aversion coefficient 8.59.

Allowing for the sterling/lira exchange rate in 1995 (average £1 = 2570.60 lira

http://fx.sauder.ubc.ca/) and price inflation in the UK between July 1995 and

June 2006 (Retail Price Index 149.1 and 198.5, respectively) an average income of

46.7777 million lira in 1995 equates to about £24,226 in 2006, not very different

from the actual average of £25,810 (Jones (2005)).
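The conversion above is simple arithmetic and can be reproduced as follows. This is a sketch using only the exchange rate and Retail Price Index values quoted in the text; the variable names are illustrative.

```python
# Figures quoted in the text.
income_lira_1995 = 46.7777e6        # average annual income in lira, 1995
lira_per_pound_1995 = 2570.60       # average 1995 exchange rate
rpi_july_1995 = 149.1               # UK Retail Price Index, July 1995
rpi_june_2006 = 198.5               # UK Retail Price Index, June 2006

# Convert to 1995 pounds, then inflate to 2006 pounds.
income_gbp_1995 = income_lira_1995 / lira_per_pound_1995
income_gbp_2006 = income_gbp_1995 * rpi_june_2006 / rpi_july_1995

print(f"1995 income in pounds: {income_gbp_1995:,.0f}")
print(f"2006 equivalent in pounds: {income_gbp_2006:,.0f}")
```

The second figure should agree with the value of about £24,226 stated above.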

We need utility functions of wealth, so an estimate of the wealth-income ratio

is required. Estimates of this ratio in the literature are quite varied. According to

Treasury (2005) in the U.K., it varies between 5 and 7 for total wealth, and between

2 and 4 for net financial wealth.

The Inland Revenue in the U.K. also publishes figures on personal wealth distri-

bution http://www.hmrc.gov.uk/stats/personal wealth/menu.htm. Their lat-

est figure (for 2003) shows that 53% of the population has less than £50,000 and

83% has less than £100,000. As the distribution of wealth is positively skewed, we

will assume a total wealth of W = £100,000. This gives a wealth-income ratio of 4

which is consistent with the figures published by Treasury (2005).

(a) The absolute risk-aversion function depends on the unit of wealth. Given util-

ity functions U(w) and V (w) related by U(cw) = V (w) for some constant c,

their absolute risk-aversion functions are related by AU(cw) = AV (w)/c. Using

the exchange and inflation rates above, we suppose that a Briton in 2006 has

absolute risk-aversion coefficient 8.967 × 10^−5 ≈ 9 × 10^−5, denominated in 2006

pounds.

(b) The relative risk-aversion function does not depend on the unit of wealth and

so the estimate of 8.59 can be used without any adjustment. We will use a

rounded-off value of 9 henceforth.
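The effect of a change of units on both coefficients follows from the chain rule. A brief worked sketch, using the relation U(cw) = V(w) from item (a) and writing R_U and R_V for the relative risk-aversion functions of U and V (notation introduced here for illustration):

```latex
\begin{align*}
V(w) &= U(cw) \;\Rightarrow\; V'(w) = c\,U'(cw), \quad V''(w) = c^{2}\,U''(cw), \\
A_V(w) &= -\frac{V''(w)}{V'(w)} = -\frac{c^{2}\,U''(cw)}{c\,U'(cw)} = c\,A_U(cw), \\
R_V(w) &= w\,A_V(w) = cw\,A_U(cw) = R_U(cw).
\end{align*}
```

So the absolute coefficient scales with the unit of wealth, while the relative coefficient at the same real level of wealth is unchanged.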

The formulation of utility functions with non-constant relative risk-aversion is an

active area of research. Meyer and Meyer (2005) specified a form of marginal utility

function which gives decreasing relative risk-aversion. Xie (2000) proposed a power

risk-aversion utility function which can produce increasing, constant or decreasing

risk-aversion depending on its parameterisation. These specialised utility functions

are not yet in widespread use and we will not consider them further.

We will use the following utility functions for the purposes of illustration:

(a) Iso-elastic utilities with parameter λ = 0.5, 0 and −8, which correspond to

constant relative risk-aversion of 0.5, 1 and 9 respectively.

(b) Negative exponential utility with absolute risk-aversion coefficient

A = 9 × 10^−5.

Since iso-elastic utility with λ = −8 has absolute risk-aversion coefficient equal

to 9 × 10^−5 when wealth is £100,000, our assumption of W = £100,000 allows us

to compare the two utility functions.

Chapter 6

Adverse Selection in a 2-state Insurance Model

6.1 A Simple Gene-environment Interaction Model

We will illustrate the principles of underwriting long-term insurance in the presence

of a multifactorial disorder in the simple setting of the two-state continuous-time

model in Figure 6.25. We will also assume that all individuals have the same initial

wealth W and follow the same utility function of wealth U(w). The insured event

could be death or illness, and it is represented by transition from state A to state

B. The probability of transition is governed by the transition intensity λs(x), which

depends on age x, and the values of various risk factors which are labelled s (for

‘stratum’).

[Figure: two states, A and B, with a single transition from A to B at intensity λs(x).]

Figure 6.25: A two state model

The risk factors arise from a 2 × 2 gene-environment interaction model. That is,

there are two genotypes, denoted G and g, and two levels of environmental exposure,

denoted E and e. We assume that G and E are adverse exposures while g and e

are beneficial. Therefore, there are four risk groups or strata, that we label ge, gE,

Ge and GE. Let the proportion of the population at a particular age (at which

an insurance contract is sold) in stratum s be ws. The epidemiology is defined as

follows.

(a) We assume proportional hazards, so for each stratum s there is a constant ks

such that λs(x)/λge(x) = ks for all ages x. Clearly kge = 1.

(b) We assume symmetry between genetic and environmental risks, as follows:

(1) The probability of possessing the beneficial gene g is the same as the probability

of exposure to the beneficial environment e, each denoted ω. Assuming

independence, wge = ω^2, wgE = wGe = ω(1 − ω) and wGE = (1 − ω)^2.

(2) We assume that kgE = kGe = k.

(c) The gene-environment interaction is represented by either an additive or a mul-

tiplicative model, as follows:

(1) Additive Model: kGE = kGe + kgE − kge = 2k − 1.

(2) Multiplicative Model: kGE = kGe kgE / kge = k^2.

See Woodward (1999) for a discussion of additive and multiplicative models.

Therefore, the epidemiology is fully defined by the parameters λge(x), ω and k

along with the choice of interaction.
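The stratum weights and relative risks implied by assumptions (a) to (c) can be tabulated directly. The following Python sketch is illustrative only; the function name strata and the dictionary layout are hypothetical choices, not part of the model itself.

```python
def strata(omega, k, model):
    """Return (weights, relative_risks) for the four strata ge, gE, Ge and GE.

    omega: probability of each beneficial factor (g or e), assumed independent.
    k:     relative risk of strata gE and Ge with respect to ge.
    model: "additive" or "multiplicative" gene-environment interaction.
    """
    weights = {
        "ge": omega ** 2,
        "gE": omega * (1 - omega),
        "Ge": omega * (1 - omega),
        "GE": (1 - omega) ** 2,
    }
    if model == "additive":
        k_GE = 2 * k - 1
    elif model == "multiplicative":
        k_GE = k ** 2
    else:
        raise ValueError("model must be 'additive' or 'multiplicative'")
    relative_risks = {"ge": 1.0, "gE": k, "Ge": k, "GE": k_GE}
    return weights, relative_risks

weights_add, risks_add = strata(0.9, 1.5, "additive")
weights_mul, risks_mul = strata(0.9, 1.5, "multiplicative")
print(weights_add, risks_add["GE"], risks_mul["GE"])
```

The values ω = 0.9 and k = 1.5 in the usage lines are arbitrary inputs chosen for the example.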

6.2 Single Premiums

For simplicity, let the force of interest be δ = 0. (This is consistent with the

assumptions of Doherty and Thistle (1996), Hoy and Polborn (2000) and Hoy and

Witt (2005).) Then the single premium for an insurance contract of term n years,

with sum assured £1, sold to a person aged x who belongs to stratum s is:

\[ q_s = 1 - \exp\left[-\int_0^n \lambda_s(x+y)\,dy\right] = 1 - (1-q_{ge})^{k_s}. \tag{6.53} \]

If the proportion of insurance purchasers aged x is the same as the proportion

in the population, ws (for example if the stratum is not known to applicants or to

insurers) observation of claim statistics will lead the insurer to charge a weighted

average premium rate

\[ q = \sum_s w_s q_s = \sum_s w_s\left[1 - (1-q_{ge})^{k_s}\right] = 1 - \sum_s w_s (1-q_{ge})^{k_s} \tag{6.54} \]

per unit sum assured. Given our assumption that the ks can all be expressed as

simple functions of k, the stratum-specific and average premium rates can also be

expressed as qs(k) and q(k). In particular, a neat expression can be derived using the

assumption of an additive model along with symmetry between genetic and environ-

mental risks, set out in Section 6.1. Starting from Equation 6.54 and incorporating

these assumptions, we get:

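\begin{align*}
1 - q(k) &= \sum_s w_s (1-q_{ge})^{k_s} \\
&= \omega^2 (1-q_{ge}) + 2\omega(1-\omega)(1-q_{ge})^{k} + (1-\omega)^2 (1-q_{ge})^{2k-1} \\
&= (1-q_{ge})\left[\omega^2 + 2\omega(1-\omega)(1-q_{ge})^{k-1} + (1-\omega)^2 (1-q_{ge})^{2(k-1)}\right] \\
&= (1-q_{ge})\left[\omega + (1-\omega)(1-q_{ge})^{k-1}\right]^2. \tag{6.55}
\end{align*}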

Alternatively, given values of q, qge and ω, one can solve Equation 6.55 for k, using:

\[ k = 1 + \frac{\log\left[\left(\sqrt{\frac{1-q}{1-q_{ge}}} - \omega\right) \Big/ (1-\omega)\right]}{\log(1-q_{ge})}. \tag{6.56} \]
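As a consistency check on Equations 6.54 and 6.55, the sketch below computes the average premium rate q(k) both as a weighted sum over strata and in closed form, and then recovers k by rearranging Equation 6.55 directly. The function names and the input values are illustrative assumptions.

```python
import math

def average_rate_by_strata(k, q_ge, omega):
    """q(k) as the weighted sum of Equation 6.54, additive model with symmetry."""
    weights = [omega ** 2, omega * (1 - omega), omega * (1 - omega), (1 - omega) ** 2]
    relative_risks = [1.0, k, k, 2 * k - 1]
    return sum(w * (1 - (1 - q_ge) ** ks) for w, ks in zip(weights, relative_risks))

def average_rate_closed_form(k, q_ge, omega):
    """q(k) from the closed form of Equation 6.55."""
    return 1 - (1 - q_ge) * (omega + (1 - omega) * (1 - q_ge) ** (k - 1)) ** 2

def relative_risk_from_rate(q, q_ge, omega):
    """Solve Equation 6.55 for k, given the average rate q."""
    inner = (math.sqrt((1 - q) / (1 - q_ge)) - omega) / (1 - omega)
    return 1 + math.log(inner) / math.log(1 - q_ge)

# Round trip with arbitrary illustrative inputs.
k_in, q_ge, omega = 1.5, 0.1, 0.5
q_sum = average_rate_by_strata(k_in, q_ge, omega)
q_closed = average_rate_closed_form(k_in, q_ge, omega)
k_out = relative_risk_from_rate(q_closed, q_ge, omega)
print(q_sum, q_closed, k_out)
```

The inversion is only defined when the square-root term exceeds ω, that is, when the target rate q is attainable for some finite k.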

6.3 Threshold Premium

Suppose all individuals have initial wealth W and that the net effect of suffering the

insured event in the next n years is a loss of L. We assume partial insurance is not

possible, so that the individual insures against the full loss L or does not insure at

all. Define the loss ratio f = L/W. If no-one knows to which stratum they belong,

everyone will be willing to pay a single premium of up to:

P ∗ = W − U−1[q(k)U(W − L) + (1− q(k))U(W )]. (6.57)

However, someone who knows they are in stratum s will be willing to pay a single

premium of up to:

\[ P^*_s = W - U^{-1}\left[q_s(k)\,U(W-L) + (1-q_s(k))\,U(W)\right]. \tag{6.58} \]

P ∗s is smallest for stratum ge. So if the insurer, ignorant of the stratum, continues

to charge premium q(k)L, adverse selection will first appear if q(k)L > P ∗ge. That

is, if:

U(W − q(k)L) < qge(k)U(W − L) + (1− qge(k))U(W ). (6.59)

6.4 The Additive Epidemiological Model

Replace the inequality in Equation 6.59 with an equality and solve for k; this repre-

sents the relative risk (of each risk factor) with respect to stratum ge, above which

persons who know they are in stratum ge will cease to buy insurance. Doing this

with iso-elastic utility with λ ≠ 0 we obtain:

\[ (1 - q(k)f)^{\lambda} = q_{ge}(1-f)^{\lambda} + (1 - q_{ge}). \tag{6.60} \]

In the special case of logarithmic utility (iso-elastic utility with λ = 0) we obtain:

\[ 1 - q(k)f = (1-f)^{q_{ge}} \tag{6.61} \]

and under negative exponential utility:

\[ e^{q(k)AL} = q_{ge}\,e^{AL} + (1 - q_{ge}) \tag{6.62} \]

in which wealth W does not appear. As expected, risk preferences characterised by

different utility functions produce different values of q(k). Once q(k) is obtained

for a particular utility function, the value of k can be derived from Equation 6.56.

Specifically, we have solved Equations 6.60, 6.61 and 6.62 for certain values of baseline

risk qge and loss L, assuming an initial wealth of W = £100,000. Then using

ω = 0.5 (a uniform distribution across strata) and an additive model, we solve for

k. The results are in Table 6.23. We observe the following:

Table 6.23: The relative risk k above which persons in stratum ge with initial wealth W = £100,000 will not buy insurance, using ω = 0.5 and an additive model.

Utility                                        Loss L in £’000
Function  qge       10      20      30      40      50      60      70      80      90

I(0.5)    0.1    1.025   1.053   1.085   1.122   1.165   1.217   1.284   1.373   1.513
          0.2    1.024   1.050   1.081   1.116   1.158   1.209   1.274   1.364   1.506
          0.3    1.022   1.047   1.076   1.110   1.150   1.200   1.263   1.352   1.497
          0.4    1.021   1.044   1.072   1.103   1.142   1.189   1.251   1.339   1.486
          0.5    1.019   1.041   1.066   1.096   1.132   1.178   1.238   1.324   1.472

Log       0.1    1.051   1.110   1.180   1.264   1.368   1.504   1.691   1.976   2.524
          0.2    1.048   1.104   1.170   1.250   1.350   1.479   1.659   1.939   2.488
          0.3    1.045   1.098   1.160   1.235   1.330   1.453   1.626   1.898   2.451
          0.4    1.042   1.091   1.149   1.220   1.308   1.425   1.590   1.854   2.413
          0.5    1.039   1.084   1.138   1.203   1.286   1.395   1.551   1.805   2.372

I(−8)     0.1    1.598   2.755   4.947   8.831  15.950       –       –       –       –
          0.2    1.546   2.512   4.153   6.972  14.430       –       –       –       –
          0.3    1.498   2.322   3.664   6.148       –       –       –       –       –
          0.4    1.451   2.163   3.313   5.810       –       –       –       –       –
          0.5    1.405   2.023   3.035   6.107       –       –       –       –       –

N(9e-5)   0.1    1.566   2.504   3.917   5.793   8.036  10.574  13.428  16.739  20.862
          0.2    1.516   2.292   3.337   4.617   6.119   7.911  10.226  13.900       –
          0.3    1.468   2.126   2.963   3.972   5.204   6.857   9.812       –       –
          0.4    1.423   1.987   2.684   3.536   4.655   6.519       –       –       –
          0.5    1.379   1.864   2.457   3.206   4.305   7.636       –       –       –

(a) For low loss ratios, even small relative risks k will cause people in the base-

line stratum to opt against insurance. This is as expected as small losses are

relatively tolerable.

(b) As the loss ratio f increases, so does the relative risk at which adverse selection

appears. This is simply risk aversion at work.

(c) The higher the baseline risk qge for a given loss ratio f , the lower the relative

risk at which adverse selection appears. This is the result of a concave utility

function, as the fair actuarial price increases and depletes wealth.

(d) Lower risk-aversion under iso-elastic utility (λ = 0.5) means that smaller relative

risks would discourage members of the baseline stratum from buying insurance

at the average premium, and for higher risk-aversion (λ = −8) the reverse is

true.

(e) We have assumed here that everyone has the same utility function and that

partial insurance is not possible. This meant that in our model, individuals

either insure or decide not to insure. In reality, it is possible that individuals

would opt for partial insurance, which we ignore here to keep the model simple.

Comparing iso-elastic and negative exponential utilities, we see that the limiting

relative risks are broadly similar for smaller losses. For larger losses, however, iso-

elastic utility functions have much greater limiting relative risks. This is because

risk-aversion increases as wealth falls under iso-elastic utility, while for negative

exponential utility it is constant. As the fair actuarial premium for bigger losses

increases and depletes wealth, risk-aversion under iso-elastic utility climbs above

that under negative exponential utility, with the result shown.
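The threshold relative risks discussed in this section can be sketched in a few lines of Python. This is an illustrative reconstruction under the stated assumptions (additive model, symmetry, W = £100,000), not the code used to produce Table 6.23; the function names are hypothetical. Each threshold rate q(k) comes from Equations 6.60 to 6.62, and k is then recovered by rearranging Equation 6.55.

```python
import math

def rate_iso_elastic(q_ge, f, lam):
    """Average rate q(k) at which stratum ge is indifferent (Equations 6.60, 6.61)."""
    if lam == 0:
        return (1 - (1 - f) ** q_ge) / f
    rhs = q_ge * (1 - f) ** lam + (1 - q_ge)
    return (1 - rhs ** (1 / lam)) / f

def rate_neg_exponential(q_ge, A, L):
    """Average rate q(k) at which stratum ge is indifferent (Equation 6.62)."""
    return math.log(q_ge * math.exp(A * L) + (1 - q_ge)) / (A * L)

def threshold_k(q, q_ge, omega):
    """Relative risk k giving average rate q; None if no finite k exists."""
    base = math.sqrt((1 - q) / (1 - q_ge)) - omega
    if base <= 0:
        return None  # corresponds to a missing entry in the table
    return 1 + math.log(base / (1 - omega)) / math.log(1 - q_ge)

W, omega, q_ge = 100_000.0, 0.5, 0.1
L = 10_000.0
f = L / W
k_iso_half = threshold_k(rate_iso_elastic(q_ge, f, 0.5), q_ge, omega)
k_log = threshold_k(rate_iso_elastic(q_ge, f, 0.0), q_ge, omega)
k_iso_m8 = threshold_k(rate_iso_elastic(q_ge, f, -8.0), q_ge, omega)
k_neg_exp = threshold_k(rate_neg_exponential(q_ge, 9e-5, L), q_ge, omega)
k_missing = threshold_k(rate_iso_elastic(q_ge, 0.6, -8.0), q_ge, omega)
print(k_iso_half, k_log, k_iso_m8, k_neg_exp, k_missing)
```

The first four values correspond to the qge = 0.1, L = £10,000 column of Table 6.23, and the last call corresponds to one of its missing entries.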

6.5 Immunity From Adverse Selection

The missing entries in Table 6.23 mean that adverse selection never appears, what-

ever the relative risk k. Clearly, this must be related to the size of the high-risk

strata, and their ability, or otherwise, to move the average premium enough to affect

the baseline stratum. We may ask: given qge and f , is there some proportion wge

in the lowest risk stratum above which members of that stratum will always buy

insurance at the average premium rate? Begin by noting that:

\[ \lim_{k\to\infty} q(k) = \lim_{k\to\infty} \sum_s w_s\left[1 - (1-q_{ge})^{k_s}\right] = w_{ge}\,q_{ge} + \sum_{s \neq ge} w_s = 1 - w_{ge}(1-q_{ge}) \tag{6.63} \]

and that this limit is not a function of the ks and thus holds for additive and

multiplicative models. As a check, it can be easily verified from Equation 6.55,

that the limit is valid for additive models. Now, substituting this limiting value in

Equations 6.60 to 6.62, we can solve for wge as follows, for iso-elastic utility with

λ ≠ 0:

\[ w_{ge} = \frac{1}{1-q_{ge}}\left[1 - \frac{1 - \left(q_{ge}(1-f)^{\lambda} + (1-q_{ge})\right)^{1/\lambda}}{f}\right], \tag{6.64} \]

for logarithmic utility:

\[ w_{ge} = \frac{1}{1-q_{ge}}\left[1 - \frac{1 - (1-f)^{q_{ge}}}{f}\right] \tag{6.65} \]

and for negative exponential utility:

\[ w_{ge} = \frac{1}{1-q_{ge}}\left[1 - \frac{\log\left[q_{ge}\,e^{AL} + (1-q_{ge})\right]}{AL}\right]. \tag{6.66} \]

Values of ω = √wge are given in Table 6.24. Values of ω < 0.5 in Table 6.24

correspond to missing entries in Table 6.23. Table 6.24 shows just how uncommon

an adverse exposure has to be to avoid adverse selection.
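The closed forms in Equations 6.64 to 6.66 are straightforward to evaluate. The Python sketch below is an illustration under the same assumptions as the text (W = £100,000 where a loss ratio is needed), with hypothetical function names; it returns ω as the square root of wge.

```python
import math

def omega_iso_elastic(q_ge, f, lam):
    """Threshold omega under iso-elastic utility (Equations 6.64 and 6.65)."""
    if lam == 0:
        limiting_rate = (1 - (1 - f) ** q_ge) / f
    else:
        limiting_rate = (1 - (q_ge * (1 - f) ** lam + (1 - q_ge)) ** (1 / lam)) / f
    w_ge = (1 - limiting_rate) / (1 - q_ge)
    return math.sqrt(w_ge)

def omega_neg_exponential(q_ge, A, L):
    """Threshold omega under negative exponential utility (Equation 6.66)."""
    limiting_rate = math.log(q_ge * math.exp(A * L) + (1 - q_ge)) / (A * L)
    w_ge = (1 - limiting_rate) / (1 - q_ge)
    return math.sqrt(w_ge)

q_ge = 0.1
omega_a = omega_iso_elastic(q_ge, 0.1, 0.5)          # I(0.5), L = 10,000
omega_b = omega_iso_elastic(q_ge, 0.1, 0.0)          # Log, L = 10,000
omega_c = omega_iso_elastic(q_ge, 0.6, -8.0)         # I(-8), L = 60,000
omega_d = omega_neg_exponential(q_ge, 9e-5, 90_000)  # N(9e-5), L = 90,000
print(omega_a, omega_b, omega_c, omega_d)
```

The third value falls just below 0.5, which is consistent with the corresponding entry being missing when ω = 0.5 is assumed.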

Assuming ω = 0.5 is perhaps extreme; it means that half the population possess

a significant genetic risk factor (modulated by environment) yet to be discovered.

This is by no means impossible, but we might expect most as-yet unknown risk

factors to affect a smaller proportion of the population, simply because they are as-

yet unknown. So, we increase ω to 0.9, so that only 10% of individuals are exposed

to the adverse environment or possess the adverse gene. The relative risks k at

which adverse selection appears are given in Table 6.25. They are larger than in

Table 6.23 because the relative risk experienced by the smaller number of high-risk

individuals has to be much higher to have the same impact on the average premium.

Table 6.24: The proportions ω exposed to each low-risk factor above which persons in the baseline stratum will buy insurance at the average premium regardless of the relative risk k, using different utility functions.

Utility                                        Loss L in £’000
Function  qge       10      20      30      40      50      60      70      80      90

I(0.5)    0.1    0.999   0.997   0.996   0.994   0.991   0.989   0.985   0.981   0.974
          0.2    0.997   0.994   0.991   0.987   0.983   0.977   0.970   0.961   0.947
          0.3    0.996   0.992   0.987   0.981   0.974   0.966   0.955   0.941   0.919
          0.4    0.995   0.989   0.982   0.974   0.965   0.954   0.940   0.920   0.890
          0.5    0.993   0.986   0.978   0.968   0.956   0.942   0.924   0.899   0.860

Log       0.1    0.997   0.994   0.991   0.986   0.981   0.974   0.965   0.951   0.926
          0.2    0.995   0.989   0.981   0.973   0.962   0.949   0.932   0.906   0.859
          0.3    0.992   0.983   0.972   0.960   0.945   0.925   0.900   0.863   0.798
          0.4    0.989   0.977   0.963   0.947   0.927   0.902   0.870   0.823   0.743
          0.5    0.987   0.972   0.954   0.934   0.910   0.880   0.841   0.786   0.693

I(−8)     0.1    0.969   0.916   0.830   0.719   0.603   0.496   0.398   0.304   0.203
          0.2    0.943   0.857   0.747   0.632   0.525   0.431   0.345   0.264   0.176
          0.3    0.919   0.812   0.693   0.580   0.480   0.393   0.315   0.241   0.161
          0.4    0.897   0.776   0.653   0.543   0.448   0.367   0.294   0.225   0.150
          0.5    0.878   0.746   0.622   0.515   0.424   0.347   0.279   0.213   0.142

N(9e-5)   0.1    0.971   0.927   0.868   0.802   0.738   0.682   0.635   0.595   0.562
          0.2    0.946   0.875   0.797   0.723   0.660   0.607   0.564   0.528   0.498
          0.3    0.923   0.835   0.748   0.673   0.612   0.562   0.522   0.488   0.461
          0.4    0.903   0.802   0.712   0.637   0.577   0.530   0.492   0.460   0.434
          0.5    0.884   0.775   0.682   0.608   0.551   0.505   0.468   0.439   0.414


Table 6.25: The relative risk k above which persons in stratum ge with initial wealth W = £100,000 will not buy insurance, using ω = 0.9 and an additive model.

Utility                  Loss L in £’000
Function  q_ge   10     20      30     40     50     60     70     80      90
I(0.5)    0.1    1.126  1.269   1.433  1.625  1.855  2.140  2.511  3.033   3.899
I(0.5)    0.2    1.120  1.258   1.419  1.613  1.852  2.158  2.577  3.212   4.419
I(0.5)    0.3    1.113  1.246   1.404  1.599  1.847  2.180  2.668  3.502   5.689
I(0.5)    0.4    1.106  1.233   1.387  1.582  1.841  2.210  2.807  4.108   –
I(0.5)    0.5    1.099  1.218   1.367  1.562  1.833  2.250  3.055  –       –
Log       0.1    1.257  1.563   1.934  2.399  3.004  3.839  5.101  7.368   13.841
Log       0.2    1.246  1.546   1.923  2.418  3.107  4.170  6.164  13.981  –
Log       0.3    1.233  1.526   1.910  2.444  3.268  4.844  –      –       –
Log       0.4    1.220  1.504   1.894  2.482  3.555  8.317  –      –       –
Log       0.5    1.205  1.479   1.876  2.542  4.296  –      –      –       –
I(−8)     0.1    4.458  18.642  –      –      –      –      –      –       –
I(−8)     0.2    4.823  –       –      –      –      –      –      –       –
I(−8)     0.3    5.705  –       –      –      –      –      –      –       –
I(−8)     0.4    –      –       –      –      –      –      –      –       –
I(−8)     0.5    –      –       –      –      –      –      –      –       –
N(9e-5)   0.1    4.246  13.531  –      –      –      –      –      –       –
N(9e-5)   0.2    4.514  –       –      –      –      –      –      –       –
N(9e-5)   0.3    5.109  –       –      –      –      –      –      –       –
N(9e-5)   0.4    7.984  –       –      –      –      –      –      –       –
N(9e-5)   0.5    –      –       –      –      –      –      –      –       –


Table 6.26: The relative risk k above which persons in stratum ge with initial wealth W = £100,000 will not buy insurance, using ω = 0.9 and a multiplicative model.

Utility                  Loss L in £’000
Function  q_ge   10     20      30     40     50     60     70     80      90
I(0.5)    0.1    1.125  1.265   1.424  1.608  1.825  2.090  2.431  2.907   3.701
I(0.5)    0.2    1.119  1.255   1.412  1.598  1.825  2.115  2.511  3.119   4.315
I(0.5)    0.3    1.113  1.243   1.398  1.586  1.824  2.144  2.617  3.447   5.660
I(0.5)    0.4    1.106  1.230   1.381  1.571  1.822  2.181  2.773  4.086   –
I(0.5)    0.5    1.098  1.216   1.362  1.553  1.817  2.229  3.037  –       –
Log       0.1    1.254  1.549   1.899  2.328  2.880  3.645  4.839  7.107   13.706
Log       0.2    1.243  1.533   1.892  2.360  3.018  4.065  6.086  13.967  –
Log       0.3    1.231  1.516   1.884  2.399  3.212  4.805  –      –       –
Log       0.4    1.218  1.495   1.873  2.449  3.527  8.314  –      –       –
Log       0.5    1.203  1.472   1.859  2.521  4.288  –      –      –       –
I(−8)     0.1    4.223  18.561  –      –      –      –      –      –       –
I(−8)     0.2    4.723  –       –      –      –      –      –      –       –
I(−8)     0.3    5.676  –       –      –      –      –      –      –       –
I(−8)     0.4    –      –       –      –      –      –      –      –       –
I(−8)     0.5    –      –       –      –      –      –      –      –       –
N(9e-5)   0.1    4.024  13.391  –      –      –      –      –      –       –
N(9e-5)   0.2    4.410  –       –      –      –      –      –      –       –
N(9e-5)   0.3    5.073  –       –      –      –      –      –      –       –
N(9e-5)   0.4    7.981  –       –      –      –      –      –      –       –
N(9e-5)   0.5    –      –       –      –      –      –      –      –       –

6.6 The Multiplicative Epidemiological Model

Unlike Equation 6.55 for additive models, we cannot derive a neat expression for q(k)

in multiplicative models. However, the equations can easily be solved numerically.

Table 6.26 shows relative risks above which adverse selection appears, assuming

ω = 0.9 and a multiplicative model. They can be compared with the values in Table

6.25. We observe the following:

(a) The missing entries are the same as in the additive model. This is because the

limiting values of q(k) and ω do not depend on the model structure.

(b) The relative risk in stratum GE is higher in the multiplicative model (k² > 2k − 1 for k ≠ 1, since k² − (2k − 1) = (k − 1)²) so persons in the baseline stratum will be less tolerant towards any given value of k. This is why the values in Table 6.26 are smaller than those in Table 6.25.


(c) However, the differences between the additive and multiplicative models are not very large. If k ≈ 1, then k² ≈ 2k − 1, and for large values of ω (which arguably is most realistic) the impact of stratum GE is relatively small. In view of this, we will use only the additive model from now on.
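The comparison between the two model structures can be sketched numerically. The code below is an illustration only, not the thesis's own program: it assumes that relative risks act on the hazard, so that the loss probability in a stratum with relative risk k_s is 1 − (1 − q_ge)^{k_s}, and it finds by bisection the value of k at which a person in stratum ge becomes indifferent between insuring at the average premium and not insuring. The function names and the search bound k_max are hypothetical.

```python
import math


def average_risk(k, q_ge, omega, model="additive"):
    """Population average loss probability for relative risk parameter k."""
    k_GE = 2.0 * k - 1.0 if model == "additive" else k * k
    strata = [
        (omega ** 2, 1.0),                 # stratum ge
        (2.0 * omega * (1.0 - omega), k),  # strata gE and Ge combined
        ((1.0 - omega) ** 2, k_GE),        # stratum GE
    ]
    return sum(w * (1.0 - (1.0 - q_ge) ** r) for w, r in strata)


def threshold_k(q_ge, omega, W, L, utility, model="additive", k_max=1e3):
    """Smallest k at which stratum ge stops buying; None if there is none."""
    uninsured = q_ge * utility(W - L) + (1.0 - q_ge) * utility(W)

    def gap(k):
        # Positive while insuring at the average premium is still preferred.
        return utility(W - average_risk(k, q_ge, omega, model) * L) - uninsured

    if gap(k_max) > 0.0:
        return None  # no adverse selection, however large k becomes
    lo, hi = 1.0, k_max
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Square-root utility is equivalent to I(0.5) up to an affine transformation.
    for model in ("additive", "multiplicative"):
        print(model, threshold_k(0.1, 0.9, 100_000, 10_000, math.sqrt, model))
```

For q_ge = 0.1, L = £10,000 and square-root utility, the two thresholds should agree with the first I(0.5) entries of Tables 6.25 and 6.26, with the multiplicative value slightly below the additive one, as observation (b) describes.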


Chapter 7

Adverse Selection in a Critical Illness Insurance Model

7.1 A Heart Attack Model

We now model the specific example of CI insurance. We will focus on heart attack

risk, building upon the material developed in earlier chapters.

(a) We will use the CI insurance model developed by Gutierrez and Macdonald

(2003), which we have already seen in Section 3.5.1. To recap, the authors

parameterised the CI model shown in Figure 7.26, using medical studies and

population data. In particular, λ12(x) denotes the rate of onset of heart attacks in the general population (different for males and females).

(b) In Chapter 3, we assumed that a 2 × 2 gene-environment interaction affected

heart attack risk, with genotypes G and g, and environmental exposures E and

e, upper case representing higher risk. So there were four strata for each sex: ge, gE, Ge and GE. We showed that it is possible to postulate stratum-specific relative risks in a way that is consistent with the rate of onset in the general population. We will use a similar technique here.

Consider all healthy individuals aged x. If q denotes the probability that a healthy

person aged x has a heart attack before age x + t, it can be calculated from the

heart attack transition intensity of the general population as follows:


[Figure 7.26: A full critical illness model. Transitions lead from State 1 (Healthy) to State 2 (Heart Attack), State 3 (Cancer), State 4 (Stroke), State 5 (Other CI) and State 6 (Dead), with transition intensities λ12(x), λ13(x), λ14(x), λ15(x) and λ16(x) respectively.]

\[ q = 1 - \exp\left[ -\int_0^t \lambda_{12}(x + y)\,dy \right] \tag{7.67} \]

Now, for males and females separately, let c denote the relative risk in the baseline stratum ge with respect to the general population, and let k_s denote the relative risk in stratum s with respect to stratum ge (so that k_ge = 1), in both cases assumed to be constant at all ages (in other words, we assume a proportional hazards model). If we denote the rate of onset of heart attack in stratum s by λ^s_12(x), it is given by:

\[ \lambda^{s}_{12}(x) = c \times k_s \times \lambda_{12}(x). \tag{7.68} \]

Suppose that at age x, the proportion of healthy individuals who are in stratum s is w_s. In stratum s, let q_s be the probability that a healthy person aged x has a first heart attack before reaching age x + t. Then using Equations 7.67 and 7.68, we can show that:


\[ q_s = 1 - \exp\left[ -\int_0^t \lambda^{s}_{12}(x + y)\,dy \right] = 1 - (1 - q)^{c k_s}. \tag{7.69} \]

Equating the weighted average probability over all strata with the population probability, that is, q = Σ_s w_s q_s, we have:

\[ q = \sum_s w_s \left[ 1 - (1 - q)^{c k_s} \right]. \tag{7.70} \]

Given the relative risks, the population proportions and the estimated λ12(x), we can solve this for c, which fully specifies the stratum-specific intensities λ^s_12(x).
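Solving Equation 7.70 for c is a one-dimensional root-finding problem, because the right-hand side is zero at c = 0, is at least q at c = 1 (when every k_s ≥ 1), and increases with c. The sketch below uses plain bisection; it is illustrative only, and the helper names, the additive stratum layout with ω = 0.9 and the choice k = 2 in the example are assumptions made for the demonstration.

```python
def additive_strata(k, omega):
    """(weight, relative risk) pairs for strata ge, gE, Ge and GE,
    assuming the additive model with gene-environment symmetry."""
    return [
        (omega ** 2, 1.0),
        (omega * (1.0 - omega), k),
        (omega * (1.0 - omega), k),
        ((1.0 - omega) ** 2, 2.0 * k - 1.0),
    ]


def solve_c(q, strata, tol=1e-12):
    """Solve q = sum_s w_s [1 - (1 - q)^(c k_s)] for c by bisection."""

    def average(c):
        return sum(w * (1.0 - (1.0 - q) ** (c * k_s)) for w, k_s in strata)

    lo, hi = 0.0, 1.0  # average(0) = 0 < q, and average(1) >= q if all k_s >= 1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if average(mid) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    q = 0.059959  # heart attack probability for males aged 45 over 15 years
    c = solve_c(q, additive_strata(2.0, 0.9))
    print(c)
```

When k = 1 all strata coincide and the routine returns c = 1 (to within the tolerance), as it should.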

7.2 Threshold Premium for Critical Illness Insurance

To extend the two-state insurance model of Section 6.1 to the CI model with six

states, we make some simplifying assumptions.

(a) We will model gene-environment interactions affecting heart attack risk alone,

leaving other intensities unaffected. This is not completely realistic, since many

known risk factors for heart disease are also risk factors for other disorders.

(b) The heart attack transition intensity is different for males and females. Figure 7.27 shows the ratio λ12(x) / Σ_{j=2}^{5} λ1j(x) for both sexes. Heart attack is the predominant CI among middle-aged men, while among women, heart attack is

increasingly prominent from age 30 onwards, but cancer is the dominant CI at

all ages. The ratio for males stays significantly higher than the ratio for females,

except at very high ages. Hence we might expect adverse selection to appear at

different relative risk thresholds for the two sexes.

7.3 Premium Rates for Critical Illness Insurance

As examples, we model single-premium CI insurance contracts of duration 15 years sold to males and females aged 25, 35 and 45. First, assuming all transition intensities are as given by Gutierrez and Macdonald (2003), we compute the single


[Figure 7.27: The ratio of heart attack transition intensity to total critical illness transition intensity, by gender. Horizontal axis: age (years), 0 to 80. Vertical axis: ratio, 0 to 1. Series: male, female.]

Table 7.27: The premium rates of critical illness contracts of duration 15 years.

Age  Male      Female
25   0.013787  0.018746
35   0.048413  0.049715
45   0.136363  0.110434

premiums as expected present values (EPVs) of the benefit payments by solving

Thiele’s differential equations (see Norberg (1995)) numerically. Again for simplicity, we assume the force of interest δ = 0. Table 7.27 gives the CI premium rates

per unit sum assured for these contracts.

We make the same epidemiological assumptions as before, namely that k_gE = k_Ge = k; that an additive model (k_GE = 2k − 1) applies, and that w_ge = ω², w_gE = w_Ge = ω(1 − ω), and w_GE = (1 − ω)², where ω = 0.9 (the more realistic assumption); and also that initial wealth is W = £100,000. Given the relative risks, we obtain c and hence the heart attack intensity for each sex and stratum as in Section 7.1. This allows us to calculate stratum-specific premium rates.

Let Ps denote the single premium rate for unit CI insurance in stratum s. Note


that apart from the stratum-specific heart attack risk, P_s also covers the risk of all other CIs, which are assumed to be the same for all strata. Let P̄ denote the population average premium rate for unit CI insurance (the averaging being over all strata for a given gender). As before, since we are ignoring interest rates and profit margins, the various premium rates defined above are the same as the probabilities of the event insured against. Then define a function Z(P) of a premium P as follows:

\[ Z(P) = U(W - \bar{P}L) - \left[ P\,U(W - L) + (1 - P)\,U(W) \right]. \tag{7.71} \]

Note that Z(P_ge) < 0 is the condition under which adverse selection will appear, equivalent to Equation 6.59 of Section 6.3. Equivalently, let P† be the solution of Z(P) = 0; since Z(P) is increasing in P, P_ge < P† is the condition for adverse selection to appear. Tables 7.28 and 7.29 show P† for males and females respectively. It depends on the utility function but not on the epidemiological model.

For the 2-state model, Equation 6.59 was central in our analysis. Given (a) a model structure (additive or multiplicative), the baseline risk q_ge, and the proportion ω with low values of each risk factor, and (b) noting that the average risk q was an increasing function of the relative risk parameter k, we obtained a minimum value of k for which adverse selection first appears.
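Because Z(P) is linear in P once the population average premium is fixed, Z(P) = 0 can be solved in closed form: P† = [U(W) − U(W − P̄L)] / [U(W) − U(W − L)], where P̄ is the population average premium of Table 7.27. The sketch below evaluates this expression. It is illustrative only; the helper names are hypothetical, and the iso-elastic utility is written as w^λ/λ, dropping the additive constant, which does not change P† and avoids loss of floating-point precision when λ = −8.

```python
import math


def p_dagger(p_bar, W, L, utility):
    """Premium at which Z(P) = 0, given the population average premium p_bar."""
    return (utility(W) - utility(W - p_bar * L)) / (utility(W) - utility(W - L))


def iso_elastic(lam):
    """Iso-elastic utility I(lam), up to an additive constant (lam != 0)."""
    return lambda w: w ** lam / lam


def neg_exp(A):
    """Negative exponential utility N(A)."""
    return lambda w: -math.exp(-A * w)


if __name__ == "__main__":
    # Males aged 25: population average premium 0.013787 (Table 7.27),
    # initial wealth W = 100,000 and loss L = 10,000.
    for name, u in [("I(0.5)", iso_elastic(0.5)), ("Log", math.log),
                    ("I(-8)", iso_elastic(-8.0)), ("N(9e-5)", neg_exp(9e-5))]:
        print(name, round(p_dagger(0.013787, 100_000, 10_000, u), 6))
```

The four values printed can be compared with the age 25, L = 10 column of Table 7.28.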


Table 7.28: P† for males, which solves Z(P) = 0, for different combinations of utility functions and losses, using initial wealth W = £100,000.

Utility                 Loss L in £’000
Function  Age  10        20        30        40        50        60        70        80        90
I(0.5)    25   0.013438  0.013068  0.012674  0.012250  0.011788  0.011277  0.010695  0.010004  0.009102
I(0.5)    35   0.047229  0.045969  0.044622  0.043167  0.041577  0.039808  0.037788  0.035378  0.032216
I(0.5)    45   0.133321  0.130058  0.126534  0.122691  0.118448  0.113678  0.108172  0.101522  0.092679
Log       25   0.013095  0.012374  0.011620  0.010826  0.009980  0.009065  0.008055  0.006891  0.005423
Log       35   0.046062  0.043604  0.041019  0.038282  0.035353  0.032171  0.028636  0.024543  0.019348
Log       45   0.130316  0.123918  0.117108  0.109801  0.101879  0.093158  0.083326  0.071772  0.056865
I(−8)     25   0.008388  0.004503  0.002062  0.000773  0.000223  0.000045  0.000005  0.000000  0.000000
I(−8)     35   0.029922  0.016319  0.007596  0.002893  0.000849  0.000174  0.000021  0.000001  0.000000
I(−8)     45   0.087752  0.049912  0.024272  0.009674  0.002978  0.000642  0.000081  0.000004  0.000000
N(9e-5)   25   0.008554  0.004976  0.002733  0.001429  0.000719  0.000351  0.000167  0.000078  0.000036
N(9e-5)   35   0.030512  0.018032  0.010061  0.005349  0.002734  0.001356  0.000656  0.000312  0.000146
N(9e-5)   45   0.089459  0.055093  0.032069  0.017804  0.009517  0.004938  0.002504  0.001247  0.000613


Table 7.29: P† for females, which solves Z(P) = 0, for different combinations of utility functions and losses, using initial wealth W = £100,000.

Utility                 Loss L in £’000
Function  Age  10        20        30        40        50        60        70        80        90
I(0.5)    25   0.018273  0.017773  0.017239  0.016664  0.016038  0.015344  0.014554  0.013616  0.012389
I(0.5)    35   0.048500  0.047209  0.045827  0.044334  0.042702  0.040886  0.038813  0.036339  0.033093
I(0.5)    45   0.107899  0.105188  0.102269  0.099094  0.095600  0.091684  0.087179  0.081758  0.074580
Log       25   0.017809  0.016833  0.015811  0.014734  0.013586  0.012344  0.010971  0.009388  0.007390
Log       35   0.047304  0.044782  0.042131  0.039322  0.036315  0.033050  0.029420  0.025217  0.019880
Log       45   0.105398  0.100089  0.094460  0.088443  0.081945  0.074821  0.066825  0.057471  0.045463
I(−8)     25   0.011431  0.006150  0.002823  0.001060  0.000307  0.000062  0.000007  0.000000  0.000000
I(−8)     35   0.030745  0.016778  0.007814  0.002978  0.000875  0.000180  0.000021  0.000001  0.000000
I(−8)     45   0.070219  0.039438  0.018924  0.007438  0.002256  0.000479  0.000059  0.000003  0.000000
N(9e-5)   25   0.011657  0.006796  0.003740  0.001961  0.000989  0.000483  0.000231  0.000108  0.000050
N(9e-5)   35   0.031351  0.018539  0.010350  0.005506  0.002817  0.001397  0.000677  0.000322  0.000151
N(9e-5)   45   0.071593  0.043550  0.025029  0.013714  0.007231  0.003700  0.001849  0.000908  0.000439


Table 7.30: The population average premium rate for CI insurance, P0, as if heart attack risk were absent (λ12 = 0).

Age  Male      Female
25   0.009821  0.018326
35   0.031290  0.046485
45   0.092818  0.097947

We would like to do the same for the CI insurance model. However, there are

important differences between the two models.

(a) In the 2-state model we specified the baseline risk and relative risks, and these

determined the average risk. In the CI insurance model, we specify the average

risk (given by the population heart attack risk) and the relative risks, and these

determine the baseline risk, in the form of the relative risk c. Clearly increasing

the relative risk k will cause c to fall, hence also the premium Pge. To make this

dependence clear, we will write c(k) and Pge(k) in this section. It will also be

useful to note that the probability qge of a heart attack similarly depends on k,

and write qge(k).

(b) However, unlike in the 2-state model, Pge(k) has a lower bound, denoted P0,

given by the population average premium rate for CI insurance as if heart attack

risk were absent (λ12 = 0 and c = 0). These values are shown in Table 7.30.

They do not depend on the epidemiological model or the utility function. Clearly

P_ge(k) ≥ P0, no matter how high k becomes. Thus we have two possibilities: lim_{k→∞} P_ge(k) = P0 (equivalently lim_{k→∞} c(k) = 0); or lim_{k→∞} P_ge(k) > P0 (equivalently lim_{k→∞} c(k) > 0). We return to this point in Section 7.4.

(c) If P_ge(k) is a strictly decreasing function, which it is for the utility functions we are using, adverse selection is possible if lim_{k→∞} P_ge(k) < P†, and in such cases we can solve P_ge(k) = P† for the threshold value of k above which adverse selection will appear. Tables 7.31 and 7.32 show these values for the various utility functions and loss levels, for males and females respectively. The missing values correspond to combinations of parameters such that lim_{k→∞} P_ge(k) > P†, for which adverse selection will not appear.

(d) Another consequence of this is that there is a level of insured loss, that we


Table 7.31: The relative risk k above which males of different ages in stratum ge with initial wealth W = £100,000 will not buy critical illness insurance policies of term 15 years, where ω = 0.9.

Utility                 Loss L in £’000
Function  Age  10     20     30     40      50       60       70      80       90
I(0.5)    25   1.484  2.111  2.960  4.183   6.117    9.698    18.869  105.569  –
I(0.5)    35   1.376  1.846  2.450  3.262   4.420    6.226    9.509   17.715   93.578
I(0.5)    45   1.389  1.886  2.544  3.456   4.808    7.027    11.388  24.239   –
Log       25   2.062  3.783  7.068  15.883  122.410  –        –       –        –
Log       35   1.808  2.998  4.917  8.530   17.855   98.596   –       –        –
Log       45   1.843  3.138  5.339  9.794   23.063   765.192  –       –        –
I(−8)     25   –      –      –      –       –        –        –       –        –
I(−8)     35   –      –      –      –       –        –        –       –        –
I(−8)     45   –      –      –      –       –        –        –       –        –
N(9e-5)   25   –      –      –      –       –        –        –       –        –
N(9e-5)   35   –      –      –      –       –        –        –       –        –
N(9e-5)   45   –      –      –      –       –        –        –       –        –

Table 7.32: The relative risk k above which females of different ages in stratum ge with initial wealth W = £100,000 will not buy critical illness insurance policies of term 15 years, where ω = 0.9.

Utility                 Loss L in £’000
Function  Age  10      20      30      40      50  60  70  80  90
I(0.5)    25   –       –       –       –       –   –   –   –   –
I(0.5)    35   4.031   18.470  –       –       –   –   –   –   –
I(0.5)    45   2.293   4.710   10.770  52.668  –   –   –   –   –
Log       25   –       –       –       –       –   –   –   –   –
Log       35   15.856  –       –       –       –   –   –   –   –
Log       45   4.459   26.155  –       –       –   –   –   –   –
I(−8)     25   –       –       –       –       –   –   –   –   –
I(−8)     35   –       –       –       –       –   –   –   –   –
I(−8)     45   –       –       –       –       –   –   –   –   –
N(9e-5)   25   –       –       –       –       –   –   –   –   –
N(9e-5)   35   –       –       –       –       –   –   –   –   –
N(9e-5)   45   –       –       –       –       –   –   –   –   –


Table 7.33: The loss L0 in £’000 above which adverse selection cannot occur. Initial wealth W = £100,000.

               Utility Function
Gender  Age  I(0.5)  Log   I(−8)  N(9e-5)
Male    25   82.3    51.8  7.1    7.2
Male    35   92.3    62.6  9.2    9.5
Male    45   89.9    60.4  8.9    9.2
Female  25   8.9     4.5   0.5    0.5
Female  35   25.3    13.3  1.5    1.6
Female  45   43.4    23.9  2.9    2.9

denote L0, above which adverse selection cannot occur, because fixing L > L0

in Equation 7.71 and solving for P † yields a solution P † < Pge(k) for all k.

Table 7.33 gives the values of L0, for the usual utility functions and initial

wealth £100,000. The missing values in Tables 7.31 and 7.32 occur for losses

L > L0.

The general pattern of threshold relative risks for males given in Table 7.31 is

similar to that in Chapter 6; what is of most interest are their absolute values, since

we have tried to suggest plausible models for both the risk model and the utility

functions.

(a) For iso-elastic utility with λ = −8 and negative exponential utility with parameter A = 9 × 10^−5, we find no evidence at all of adverse selection.

(b) For all utility functions and at all loss levels, if adverse selection can appear, it

does so at higher levels of relative risk than under the two-state model. This is

because the impact of the gene and environment on heart attack risk is diluted by

the presence of the other CIs. Only for the lowest levels of loss are these relative

risks in the range that might be typical of relatively common multifactorial

disorders; by definition, we do not expect studies like UK Biobank to lead to

the discovery of hitherto unknown high risk genotypes.

(c) When adverse selection can appear, the relative risk threshold first decreases

and then increases with age. This is because among CIs the importance of heart

attack peaks at around age 45 as can be seen from Figure 7.27.


The threshold relative risks for females are given in Table 7.32. We observe the

following:

(a) The threshold relative risks are much higher than those for males, in all cases.

This is because heart attacks form a smaller proportion of all CIs for females,

so a larger increase in heart attack risk is needed to trigger adverse selection.

(b) As for males, at levels of absolute and relative risk-aversion that we regard as

most plausible (consistent with the Bank of Italy study) we find no evidence

that adverse selection is likely.

(c) In contrast to males, the threshold relative risks decrease with age. The reason

is clear from Figure 7.27; for females the relative importance of heart attack

increases with age.

(d) Adverse selection appears to be possible only for: (i) smaller losses; and (ii)

extremely low levels of risk aversion.

7.4 High Relative Risks

In Section 6.5, we considered relative risks that increased without limit, for the

simple 2-state insurance model. We saw that, even in this extreme case, if stratum

ge was large enough, adverse selection would not appear. In this section, we consider

high relative risks (of heart attack) in the CI insurance model.

We assume the heart attack rates in the general population λ12(x) are fixed at

their estimated values (Gutierrez and Macdonald (2003)). From Equation 7.70 we

obtain:

\[ 1 - q = 1 - \sum_s w_s \left[ 1 - (1 - q)^{c(k) k_s} \right] = w_{ge} (1 - q)^{c(k)} + \sum_{s \neq ge} w_s (1 - q)^{c(k) k_s}. \tag{7.72} \]

Differentiation shows the right-hand side to be a decreasing function of c and of each k_s (s ≠ ge), all other quantities held constant in each case. Also, if c = 1 the right-hand side is less than (1 − q) while if c = 0 it is greater than (1 − q). Hence, as we increase the k_s without limit, c must decrease, and being bounded below it


must have a limit. The limit could be zero or non-zero. We can easily see that if c

has a non-zero limit (necessarily positive) then the last term on the right-hand side

of Equation 7.72 vanishes and the limit must be:

\[ \lim_{\substack{k_s \to \infty \\ s \neq ge}} c(k) = 1 - \frac{\log w_{ge}}{\log(1 - q)} \tag{7.73} \]

which in turn implies (1 − q) < w_ge. On the other hand, if (1 − q) > w_ge, then c cannot have a non-zero limit, so the equation:

\[ \lim_{\substack{k_s \to \infty \\ s \neq ge}} \sum_{s \neq ge} w_s (1 - q)^{c(k) k_s} = (1 - q) - w_{ge} \tag{7.74} \]

holds. Since the left-hand side is finite, at least one of the products c k_s tends to a finite limit as the k_s → ∞. However, we have not specified here how the quantities k_s (s ≠ ge) jointly approach infinity, so the behaviour of c is not easy to analyse in general. It is greatly simplified if the k_s are simple functions of a single parameter k, which is the case in our assumed epidemiological model (in which case we again make explicit the dependence of c by writing c(k)). For example, under an additive model with symmetry between genetic and environmental risks, Equation 7.72 can be written as:

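\[ 1 - q = \omega^2 (1 - q)^{c(k)} + 2\omega(1 - \omega)(1 - q)^{c(k)k} + (1 - \omega)^2 (1 - q)^{c(k)(2k - 1)} = (1 - q)^{c(k)} \left[ \omega + (1 - \omega)(1 - q)^{c(k)(k - 1)} \right]^2 \tag{7.75} \]

therefore:

\[ k = 1 + \frac{\log\left[ (1 - q)^{(1 - c(k))/2} - \omega \right] - \log(1 - \omega)}{c(k) \log(1 - q)}. \tag{7.76} \]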

If ω² > (1 − q) then as k → ∞, the limiting value of c(k) is non-zero. Otherwise, when ω² < (1 − q), c(k) → 0, and Equation 7.76 yields the finite limiting value:

\[ \lim_{k \to \infty} c(k)\,k = \frac{\log\left[ (1 - q)^{1/2} - \omega \right] - \log(1 - \omega)}{\log(1 - q)}. \tag{7.77} \]

So, in summary:


Table 7.34: q, the probability that a healthy person aged x has a heart attack before age x + t, for policy duration t = 15 years.

Age  Male      Female
25   0.004743  0.000541
35   0.021454  0.004299
45   0.059959  0.017616

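\[ \lim_{k \to \infty} c(k) = \begin{cases} 0 & \text{if } w_{ge} \le (1 - q) \\ 1 - \dfrac{\log w_{ge}}{\log(1 - q)} & \text{if } w_{ge} > (1 - q). \end{cases} \tag{7.78} \]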
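The limiting behaviour summarised above can be checked numerically. The following sketch, with hypothetical function names, evaluates the limit in Equation 7.78, inverts Equation 7.75 for k as in Equation 7.76, and evaluates the limit of c(k)k in Equation 7.77. It assumes the additive model with gene-environment symmetry, and k_from_c is only defined where (1 − q)^{(1 − c)/2} > ω.

```python
import math


def limit_c(w_ge, q):
    """Limit of c(k) as k grows without bound (Equation 7.78)."""
    if w_ge <= 1.0 - q:
        return 0.0
    return 1.0 - math.log(w_ge) / math.log(1.0 - q)


def k_from_c(c, q, omega):
    """Relative risk k consistent with a given c (Equation 7.76)."""
    s = 1.0 - q
    numerator = math.log(s ** ((1.0 - c) / 2.0) - omega) - math.log(1.0 - omega)
    return 1.0 + numerator / (c * math.log(s))


def limit_ck(q, omega):
    """Limit of c(k) * k when omega^2 < 1 - q (Equation 7.77)."""
    s = 1.0 - q
    return (math.log(math.sqrt(s) - omega) - math.log(1.0 - omega)) / math.log(s)


if __name__ == "__main__":
    q = 0.059959  # males aged 45, from Table 7.34
    print(limit_c(0.9 ** 2, q))   # omega = 0.9: w_ge < 1 - q, so c(k) tends to 0
    print(limit_c(0.98, q))       # w_ge > 1 - q: c(k) has a positive limit
    print(k_from_c(0.5, q, 0.9))  # the k at which c(k) = 0.5
    print(limit_ck(q, 0.9))
```

With ω = 0.9 every value of q in Table 7.34 satisfies ω² < 1 − q, so c(k) tends to zero in all the cases tabulated in this chapter.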

We want to find out if the baseline stratum ge can ever be large enough that

adverse selection will never appear, no matter how large k becomes. Hence we want

to understand the behaviour of limk→∞ Pge(k) as a function of wge. Equation 7.78

shows that we must treat separately the cases wge ≤ (1 − q) and wge > (1 − q).

Values of q are given in Table 7.34. (Note that P0 + q ≠ P̄, where P̄ is the population average premium of Table 7.27, because in a competing risks model removing one cause of decrement increases the probabilities of the other decrements occurring.)

(a) If P0 > P † the result is trivial, since limk→∞ Pge(k) ≥ P0 for any value of wge,

and adverse selection can never occur.

(b) If P0 < P † adverse selection will occur if wge ≤ (1 − q), since then

limk→∞ Pge(k) = P0.

(c) The non-trivial case is P0 < P† and w_ge > (1 − q), since then lim_{k→∞} P_ge(k) > P0. We can show that lim_{k→∞} P_ge(k) is an increasing function of w_ge in this range, because the limit of the heart attack probability, lim_{k→∞} q_ge(k), is increasing in w_ge (use Equation 7.73 to write:

\[ \lim_{k \to \infty} q_{ge}(k) = \lim_{k \to \infty} \left[ 1 - (1 - q)^{c(k)} \right] = 1 - \frac{(1 - q)}{w_{ge}} \tag{7.79} \]

and differentiate). The function lim_{k→∞} P_ge(k) is continuous and increases from P0 to the population average premium P̄ as w_ge increases from (1 − q) to 1, the upper limit being attained when all the strata have collapsed into one, and c = 1. Since P† < P̄ for any concave utility function, the intermediate value theorem guarantees that there exists a


Table 7.35: The proportions ω exposed to each low-risk factor above which persons in the baseline stratum will buy insurance at the average premium regardless of the relative risk k, using different utility functions, for males purchasing CI insurance.

Utility                 Loss L in £’000
Function  Age  10     20     30     40     50     60     70     80     90
I(0.5)    25   1.000  1.000  0.999  0.999  0.999  0.998  0.998  0.998  –
I(0.5)    35   0.999  0.998  0.998  0.997  0.996  0.995  0.993  0.992  0.990
I(0.5)    45   0.998  0.995  0.993  0.990  0.987  0.984  0.980  0.975  –
Log       25   1.000  0.999  0.999  0.998  0.998  –      –      –      –
Log       35   0.999  0.997  0.995  0.994  0.992  0.990  –      –      –
Log       45   0.996  0.991  0.986  0.981  0.976  0.970  –      –      –
I(−8)     25   –      –      –      –      –      –      –      –      –
I(−8)     35   –      –      –      –      –      –      –      –      –
I(−8)     45   –      –      –      –      –      –      –      –      –
N(9e-5)   25   –      –      –      –      –      –      –      –      –
N(9e-5)   35   –      –      –      –      –      –      –      –      –
N(9e-5)   45   –      –      –      –      –      –      –      –      –

unique value of wge such that limk→∞ Pge(k) = P †; that is, such that adverse

selection can never appear if wge exceeds this value.

Tables 7.35 and 7.36 give the threshold values of ω = w_ge^{1/2} above which no adverse

selection takes place, in the additive model with gene-environment symmetry, for

males and females respectively. Missing values indicate that adverse selection will

never appear. When it is possible, the threshold value of ω ranges from 0.970 to 1

for males and 0.992 to 0.999 for females. As the relative risks in Tables 7.31 and

7.32 are based on ω = 0.9, this explains the missing values in those tables.

This pattern is quite unexpected. If adverse selection can occur, then a large

enough baseline stratum does confer immunity from it, but it has to be very large

indeed, all but a few percent of the population. But once the threshold is crossed,

adverse selection cannot appear at all, even if very few people are in the baseline

stratum. This had no counterpart in the 2-state model, and it is caused by the

presence of substantial other risks not affected by the gene-environment variants.


Table 7.36: The proportions ω exposed to each low-risk factor above which persons in the baseline stratum will buy insurance at the average premium regardless of the relative risk k, using different utility functions, for females purchasing CI insurance.

Utility                 Loss L in £’000
Function  Age  10     20     30     40     50  60  70  80  90
I(0.5)    25   –      –      –      –      –   –   –   –   –
I(0.5)    35   0.999  0.998  –      –      –   –   –   –   –
I(0.5)    45   0.998  0.996  0.994  0.992  –   –   –   –   –
Log       25   –      –      –      –      –   –   –   –   –
Log       35   0.998  –      –      –      –   –   –   –   –
Log       45   0.996  0.993  –      –      –   –   –   –   –
I(−8)     25   –      –      –      –      –   –   –   –   –
I(−8)     35   –      –      –      –      –   –   –   –   –
I(−8)     45   –      –      –      –      –   –   –   –   –
N(9e-5)   25   –      –      –      –      –   –   –   –   –
N(9e-5)   35   –      –      –      –      –   –   –   –   –
N(9e-5)   45   –      –      –      –      –   –   –   –   –

7.5 Conclusions

Until now, genetical research on information asymmetry and adverse selection has taken one of two routes: models of single-gene disorders, and work on the economic welfare effects of genetic testing. Single-gene disorders, by their very nature,

are often severe and it is a reasonable first approximation to assume that private

information about risk makes insurance purchase highly likely. This is not so for

multifactorial disorders, where adverse gene-environment interactions are expected

to be much more common and lead to more modest risk differences. On the other

hand, the economic welfare approach concentrates primarily on efficiency losses in

the insurance market, and may be less concerned with the epidemiology. In this

paper, we have represented multifactorial disorders using standard epidemiological

models and analysed circumstances leading to adverse selection, taking economic

factors into account in a simple way through expected utility.

Logarithmic utility, although popular, may not reflect all risk preferences very

well. In particular, Eisenhauer and Ventura (2003) showed that consumers’ risk-

aversion is normally much greater than implied by logarithmic utility. We therefore

used utilities with both realistic and traditional risk-aversion coefficients to illustrate


our results.

We used a simple 2 × 2 gene-environment interaction model, assuming that

information on status within the model was available only to the consumers and

not to the insurer. Competition leads insurers to charge actuarially fair premiums,

based on expected losses given the information they have. Adverse selection will not

occur as long as members of the least risky stratum (who know their status) can

still increase their expected utility by insuring at the average price.
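The criterion in the preceding paragraph can be illustrated with a minimal single-period sketch. It is not the model used in this chapter: it assumes only two risk strata (rather than the four of the 2 × 2 model), full cover of a single loss, and a constant-relative-risk-aversion utility in which `gamma = 1` gives logarithmic utility. The function names and all numerical values are hypothetical.

```python
import math


def crra_utility(wealth, gamma):
    """Constant-relative-risk-aversion utility; gamma = 1 is logarithmic utility."""
    if wealth <= 0:
        raise ValueError("wealth must be positive")
    if gamma == 1.0:
        return math.log(wealth)
    return wealth ** (1.0 - gamma) / (1.0 - gamma)


def pooled_premium(loss, q_low, rel_risk, prop_low):
    """Actuarially fair premium when the insurer cannot tell the strata apart.

    Assumes two strata only: a low-risk stratum with loss probability q_low
    (proportion prop_low of the population) and a high-risk stratum with loss
    probability rel_risk * q_low.
    """
    mean_loss_prob = q_low * (prop_low + (1.0 - prop_low) * rel_risk)
    return loss * mean_loss_prob


def low_risk_insures(wealth, loss, q_low, rel_risk, prop_low, gamma):
    """True if the low-risk stratum gains expected utility by insuring.

    Adverse selection does not arise (in this sketch) when this is True.
    """
    premium = pooled_premium(loss, q_low, rel_risk, prop_low)
    insured = crra_utility(wealth - premium, gamma)
    uninsured = (q_low * crra_utility(wealth - loss, gamma)
                 + (1.0 - q_low) * crra_utility(wealth, gamma))
    return insured >= uninsured
```

With wealth 1, loss 0.5, a low-risk loss probability of 0.01 and half the population in each stratum, `low_risk_insures` returns `False` under logarithmic utility when the high-risk stratum has relative risk 2, but `True` when `gamma` is raised to 8: greater risk aversion makes the pooled premium acceptable to the low-risk stratum.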

First, we studied a simple 2-state insurance model, with constant relative risks

in different risk strata defined by the gene-environment model and sex. We found

that adverse selection does not appear unless purchasers have relatively low risk aversion

(compared with what we think to be a plausible parameterisation) and insure only

a small proportion of their wealth; or unless the elevated risks implied by genetic

information are implausibly high, bearing in mind the nature of multifactorial risk.

In many cases adverse selection is impossible if the low-risk stratum is large enough,

these levels being quite compatible with plausible multifactorial disorders.

We applied the same gene-environment interaction model, assumed to affect the

risk of heart attacks, to CI insurance. As heart-attack risk is just part of the risk of

all CIs, the impact of the gene-environment risk factor was diluted, compared with

the 2-state insurance model where the total risk was influenced. Our results showed

complete absence of adverse selection at realistic risk-aversion levels, irrespective

of the stratum-specific risks. Moreover, the existence of risks other than of heart

attack, and the constraint that differential heart-attack risks be consistent with the

average population risk, introduced a threshold effect absent from the 2-state model.

When adverse selection was possible at all (low risk aversion, low loss ratios) only

an unfeasibly high proportion of the population in the low-risk stratum would avoid

it, but when the threshold was crossed adverse selection vanished no matter what

the size of the low-risk stratum.

The results from both 2-state and CI insurance models suggest that in circum-

stances that are plausibly realistic, private genetic information, relating to multifac-

torial risks, that is available only to customers does not lead to adverse selection.

This conclusion is strongest in the more realistic CI insurance model.

We have not considered what might happen if insurers were allowed access to

this genetic information. The opportunity would then exist to underwrite using

that information. If one believed that social policy is best served by solidarity, the

important question is whether insurers would find it worthwhile to use the genetic

information. Further research would be useful, to investigate the costs of acquiring

and interpreting genetic information relating to common diseases, compared with

the benefits in terms of possibly more accurate risk classification, in both cases in

the context of multifactorial risk.

Chapter 8

Conclusions

In Chapter 1, we set out our broad objectives for the thesis — to analyse how

gene-environment interactions in multifactorial disorders might affect current un-

derwriting practices of the insurance industry. Researchers have found that severe

single-gene disorders, due to their rarity, do not have a significant impact on in-

surance premiums. Equivalently, the extent of adverse selection was found to be

minimal. Multifactorial disorders, on the other hand, are much more common and

any medical development or breakthrough in this area is likely to have a major im-

pact on the insurance industry. With the setting up of large-scale cohort studies,

like the UK Biobank project, specifically to concentrate on multifactorial disorders,

this has become a real possibility. Given this backdrop, we tackled two fundamental

questions in this thesis:

(a) In the next 5–10 years, as results start emerging from UK Biobank, what will

be the impact of these on the insurance industry?

(b) Given the risk-averse nature of insurance purchasers, at what levels of gene-

environment interaction might an insurer face a realistic risk of adverse selec-

tion?

8.1 UK Biobank Simulation Study

In the first half of the thesis, we examined question (a). We chose heart attack as the

disorder of interest and hypothesised a simple 2×2 gene-environment interaction for

the risk of heart attack. This segregated the study population into four strata with

varying risk-profiles based on the impact of the respective genes and environmental

factors. As the rates of onset of heart attack are significantly different for males

and females, we analysed the results separately for each sex. Based on this model,

we randomly simulated 500,000 life histories to generate data similar to what is

expected to emerge out of the UK Biobank project. An epidemiological analysis was

then carried out on the simulated data using case-control studies. The results, thus

obtained, were then used in an actuarial model to calculate CI insurance premium

rates for all strata.

This led us to ask how reliable these estimates of premium rates are, given that

insurers would rely on them to justify discriminating between individuals with

different genes and with exposure to different environmental factors. In particular,

GAIC and other interested parties would want insurers to provide factual evidence

and rigorously demonstrate the justification of underwriting strategies based on

genetic information. So, we looked at the empirical distributions of the estimated

premium rates generated by simulating many replications of UK Biobank. We noted

that the strata-specific premium rates, as a proportion of the baseline premiums, are

uncorrelated and the extent of overlap of the empirical densities provided a measure

of reliability.

The main conclusions from the analysis are as follows:

(a) Our Base scenario assumptions reflected fairly common adverse genetic and en-

vironmental exposures with modest penetrances. This is what we would expect

for most common multifactorial disorders. We found that, if epidemiologists

opted for an extensive study, which included all heart attack cases in conjunc-

tion with a 1:5 matching strategy, reliable discrimination could be achieved.

However, this is also an expensive option and case-control studies with such

large numbers of cases and controls may not be economically viable. Case-

control studies with a few thousand cases coupled with a modest 1:1 matching

strategy, although realistic, quickly diminished the reliability of the estimates

and thus the power to discriminate.

(b) We also analysed the results by varying our assumptions of the frequencies and

penetrances of the adverse exposures. The reliability of the estimates reduced

substantially when the proportion of adverse traits in the study population was

halved. Case-control studies also became increasingly harder to carry out as the

number of cases decreased and suitable matching controls with adverse traits

became rarer. Reduced penetrances had a similar impact with reduced ability

to discriminate between different risk-categories; the problems being more acute

for case-control studies with fewer cases and controls.

(c) The results were similar for both males and females when the number of cases

used in the case-control study was fixed in advance. If all cases were to be

included in the study, the estimates of premium rates for females were less

reliable than those for males. This is because heart attacks are rarer for females

and as a result total numbers of cases were fewer.

To summarise, we found that, unless the “adverse” genetic and environmental

factors are abundant or have significant penetrances, the inherent variability of

estimates obtained from case-control studies would make it difficult for insurers to

justify charging different premiums for different risk-groups. This result should bring

comfort to the regulators and other groups who are concerned about insurers using

genetic information to discriminate against the unfortunate few.

While carrying out our analysis we have made a number of simplifying assump-

tions to keep the problem tractable. Further research needs to be carried out to

analyse the implications of relaxing these assumptions. In particular:

(a) We have assumed a 2×2 gene-environment interaction, which is the simplest

of multifactorial models. However, most common disorders are likely to involve

higher order gene-environment interaction with complex interplay between mul-

tiple genes and environmental factors. Extending the simple 2×2 model to

general higher order interactions should produce interesting results.

(b) Caution should also be exercised in interpreting the results because they are

based on some idealised assumptions. In particular, we have ignored the prob-

lems of model mis-specification altogether. In reality, there are a number of

places where this can go wrong. As UK Biobank is essentially an unrepeatable

exercise, epidemiologists will have access to a single set of observations, based

on which they will propose their models. It is thus highly unlikely that the

model will be “accurate”. A poor choice of epidemiological model would lead to

erroneous results and will inevitably have additional knock-on effects for stud-

ies based on these results. Mis-specification can also occur when an actuary

tries to develop his or her own model based on the results published by the

epidemiologists. The true implications of these need to be investigated further.

8.2 Adverse Selection Issues

In the second half of the thesis, we tackled the adverse selection issues in the con-

text of multifactorial disorders. In many countries, due to regulations or agreed

moratoria, genetic information is treated as private and insurers do not have access

to such information. This asymmetry of information can then lead to adverse se-

lection if individuals in the lowest risk-category find the average premium, charged

by the insurer, unacceptably high. Of course, this would depend on a number of

factors including the degree of risk-aversion of these individuals. Our objective was

to analyse the different factors, and the levels of these, which would lead to adverse

selection.

First, we assumed a 2×2 gene-environment interaction in a simple 2-state insur-

ance model. The factors of interest were:

(a) the baseline risk;

(b) the amount of loss insured as a proportion of total wealth;

(c) the proportion of individuals in the lowest risk-category; and

(d) the degree of absolute and relative risk-aversion.

For each of these factors, we analysed the levels of relative risks required to trigger

adverse selection. Our observations were:

(a) The higher the baseline risk, the lower is the level of relative risks of higher risk

strata at which adverse selection appears.

(b) As the amount of loss insured increased as a proportion of total wealth, higher

relative risks were required to trigger adverse selection.

(c) The more risk-averse the individuals, the higher are the relative risks required

for adverse selection.

(d) If the proportion of individuals in the lowest risk-category is high, relative risks

in other categories need to be large to move average premium rates so as to

trigger adverse selection. In fact, we found that if the lowest risk-category is

large enough, it is possible to achieve full immunity from adverse selection. Of

course, the levels at which this is attained depended on all other factors.

We then extended our results to a realistic example of a CI insurance model.

As in the UK Biobank simulation study, we hypothesised a 2×2 gene-environment

interaction on heart attack risk. We assumed that other illnesses covered under a CI

insurance contract remained unaffected by these genes and environmental factors.

The results from this model were along similar lines to those obtained from the

2-state insurance model. In particular:

(a) As the rates of onset of heart attack are different for males and females, we

analysed the impact separately. For females, the relative risks required for

adverse selection were substantially higher than those for males. This is because

heart attacks form only a small proportion of all CIs for females.

(b) The presence of other CIs diluted the impact of gene-environment interactions

on heart attack. As a result, the relative risks required for adverse selection were

generally much higher than those observed for the 2-state insurance model. In

fact, for individuals with empirical estimates of risk-aversion, adverse selection

did not appear at all.

(c) The existence of risks other than that of heart attack introduced a floor below

which CI insurance premiums could not fall even when risks of heart attack were

non-existent. This implied that when adverse selection was possible, immunity

from adverse selection was possible only at a very high proportion of population

in the lowest risk-category. Otherwise, adverse selection does not appear at all.

Results from both the 2-state insurance and CI insurance models confirm the key

message that under realistic assumptions, private genetic information does not lead

to adverse selection.

There are further research opportunities in a number of areas:

(a) As pointed out for the UK Biobank simulation model, extending the simple

2×2 gene-environment model to higher order interactions might also produce

interesting results on adverse selection issues.

(b) An insurer’s decision to use genetic test results, if permitted, will depend on

a number of issues including the actual cost of these tests. If the costs are as

high as they are now, it might not make economic sense to use genetic tests

for underwriting purposes. However, as tests become cheaper in future, the

balance might tilt in the other direction. It might be of interest to find out the

levels of cost at which genetic testing becomes an affordable underwriting tool.

(c) In our analysis, we made a simplifying assumption that all individuals wish

to insure the same amount of loss, as a proportion of wealth, irrespective of

their risk-profiles. This assumption can be relaxed, as Hoy and Polborn (2000)

showed that under certain assumptions, the appetite for cover increases with

risk. The techniques developed in this thesis can be extended to incorporate

these assumptions and analyse the situations where high-risk individuals could

opt for increased cover. Further research in this area might produce interesting

results.

Appendix A

Epidemiology

A.1 Introduction

Epidemiology is the study of diseases which tries to answer two fundamental ques-

tions:

(a) What causes a disease?

(b) Who is affected by a disease?

There might be a number of factors whose interplay manifests itself in the form

of a disease. With the advent of genetic knowledge, researchers have found that

diseases can have genetic causes. In other words, an individual with

a particular gene might have a higher or lower probability of contracting a disease.

This fact does not, however, diminish the role played by the environment on disease

susceptibility. For example, it is well documented that there are more smokers than

non-smokers among lung cancer patients. These factors, genetic and environmental,

which precipitate a disease are called risk factors and form the primary subject

matter of an epidemiological investigation.

The second question tries to ascertain the distribution of a disease. Instead of

looking at the population as a whole, it can be stratified into groups, the analysis of

which may show variability in disease susceptibility by strata. In an epidemiological

study, the usual stratifications are based on age, sex, social class, marital status,

racial group, occupation and geographical location. However, it is vital not to

overlook any other form of stratification which could explain the variation better.

To answer the questions posed above, epidemiologists collect, analyse and interpret

data from groups of individuals. The results thus obtained, and the

conclusions drawn from them, apply directly to the individuals from whom the data

is collected. However, it is natural to ask whether the results and conclusions can

be extended to a wider group. Of course, the ultimate goal of an epidemiological

study is to obtain results which can then be extended and be held to be valid for

the entire human population.

In practice, epidemiological investigations commence with an objective to obtain

results for a target population. For example, the UK Biobank protocol clearly states

that its objective is to investigate the risk of common multifactorial disorders of adult

life. So the target population here is the general population of the UK and, with a

wider focus, the entire human population.

Collecting data from the whole target population may not always be feasible. So

normally data is collected from a representative subset of the target population, the

study population. The UK Biobank project aims to collect data from a large cross-

section of individuals, at least 500,000 men and women, from the general population

of the United Kingdom.

Once the study population is identified, the focus shifts to the collection of ap-

propriate data for analysis. Ideally, each individual within the study population

should be followed up through time. Every instance of disease should be recorded

along with data on plausible risk factors. Such a detailed study, sometimes called

a cohort study, can then provide direct information on the sequence of events

demonstrating causality. Moreover, being so detailed, cohort studies can analyse

many diseases simultaneously.

However, cohort studies are often very expensive and time consuming. Also,

they are not ideal for studying rare diseases as they would require either a very

large study population or a very long time span.

For studying rare diseases, resources can be used more efficiently by employing

case-control studies. Unlike cohort studies, where we follow up every individual

within the study population prospectively for the entire duration of the study period,

in case-control studies individuals are chosen at the end of the study period according

to their disease status. This is why case-control studies are retrospective studies.

In case-control studies, the first step is to identify a number of cases, subjects

with the disease under consideration. The next step is to select a number of controls,

subjects who are free from the disease. Controls should be a representative sample

of those individuals in the study population who do not have the disease, but who would have had

the same chance as a case of being classified as a case had they become diseased. This

is best achieved by matching at the design stage. Matching will be discussed in

detail in a later section.

While selecting cases and controls, care needs to be taken that the definitions of

both cases and controls are precise and strictly adhered to during the course of the

investigation. The other important consideration is the possibility of bias that may

arise if the chance of having a particular risk factor among chosen cases is different

from that among all those with the disease in the study population. The same consideration for

bias needs to be given for controls.

The data from the cases and the controls are then analysed to determine the

effect of different risk factors on these two groups.

As is evident, case-control studies are quicker and cheaper than cohort studies. Resources are also

focused on studying the more interesting subjects, the cases, in great detail, which is all

the more crucial for rare diseases. In the UK Biobank project, it is envisaged that

analysis will take the form of case-control studies nested within the study population.

A schematic diagram for a case-control study is given in Figure A.28.
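The design described above, with cases identified first and controls then drawn from the disease-free members of the study population, can be sketched as follows. This is an illustrative simulation only: the cohort, the single matching variable (`age_band`), the exposure frequency and the risks are all hypothetical, and the matching rule is a simple assumption rather than the strategy discussed later in this appendix.

```python
import random


def simulate_cohort(n, rng, p_exposed=0.3, base_risk=0.05, rel_risk=2.0):
    """Hypothetical cohort: each subject has an age band, exposure and disease flag."""
    cohort = []
    for _ in range(n):
        age_band = rng.choice(["40-49", "50-59", "60-69"])
        exposed = rng.random() < p_exposed
        risk = base_risk * (rel_risk if exposed else 1.0)
        diseased = rng.random() < risk
        cohort.append({"age_band": age_band, "exposed": exposed, "diseased": diseased})
    return cohort


def nested_case_control(cohort, controls_per_case, rng, max_cases=None):
    """Select cases, then match each to disease-free controls in the same age band.

    Returns a list of (case, [controls]) pairs. Controls are used at most once;
    a case that cannot be fully matched is dropped.
    """
    cases = [s for s in cohort if s["diseased"]]
    if max_cases is not None and len(cases) > max_cases:
        cases = rng.sample(cases, max_cases)

    # Pool the healthy subjects by age band and shuffle each pool once.
    pools = {}
    for s in cohort:
        if not s["diseased"]:
            pools.setdefault(s["age_band"], []).append(s)
    for pool in pools.values():
        rng.shuffle(pool)

    matched = []
    for case in cases:
        pool = pools.get(case["age_band"], [])
        if len(pool) < controls_per_case:
            continue
        controls = [pool.pop() for _ in range(controls_per_case)]
        matched.append((case, controls))
    return matched
```

Calling `nested_case_control(cohort, controls_per_case=5, rng=rng, max_cases=100)` corresponds to a 1:5 matching strategy with a fixed number of cases; setting `max_cases=None` uses every case in the cohort.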

A.2 Measuring risks

Before we start analysing the data, let us clarify what we are trying to measure. In

simple terms, the goal is to measure the risk of a disease. So to start with we need

a formal definition of risk.

The risk of a disease can be defined as the probability of an individual becoming

newly diseased given that the individual has the particular attribute or risk-factor

in question.

[Figure A.28: A schematic diagram of a case-control study. Boxes labelled Target Population, Study Population, All, Diseased, Healthy, Cases and Controls, linked by arrows.]

We will introduce some notation here. Let S(t) be a stochastic process which

records an individual’s state at time t. Let us also denote Pij(s, t) as the conditional

probability that the study subject is in state j at time t, given that it was in state

i at time s. In mathematical notation:

\[
P_{ij}(s,t) = \mathrm{Prob}[\,S(t) = j \mid S(s) = i\,]. \tag{A.80}
\]

The conditional probability defined above is also known as a transition probabil-

ity. Using the transition probabilities, we can now define the transition intensity or

hazard rate, λij(t), as the instantaneous rate of change of probability at time t, of

moving from state i to state j, given that the subject is in state i at time t, i.e.,

\[
\lambda_{ij}(t) = \lim_{dt \to 0} \frac{P_{ij}(t,t+dt) - P_{ij}(t,t)}{dt}, \tag{A.81}
\]

which can also be written as:

\[
P_{ij}(t,t+dt) = P_{ij}(t,t) + \lambda_{ij}(t) \times dt + o(dt). \tag{A.82}
\]

The above definition can be simplified further by noting that a subject cannot remain

in two different states at any one particular instant of time, i.e.,

\[
P_{ij}(t,t) =
\begin{cases}
0 & \text{if } i \neq j, \\
1 & \text{if } i = j.
\end{cases}
\]

[Figure A.29: A 2-state model. State 1 = Healthy and state 2 = Diseased, with a single arrow from state 1 to state 2 labelled λ12(t).]

Using the fact that $\sum_j P_{ij}(s,t) = 1$ for all $t \geq s$, we can derive a useful relationship

between the transition intensities. If we sum both sides of Equation A.82 we

get,

\[
\sum_j P_{ij}(t,t+dt) = \sum_j P_{ij}(t,t) + \sum_j \lambda_{ij}(t) \times dt + o(dt), \tag{A.83}
\]

which leads to,

\[
\sum_j \lambda_{ij}(t) = 0, \quad \text{or equivalently,} \quad \lambda_{ii}(t) = -\sum_{j \neq i} \lambda_{ij}(t). \tag{A.84}
\]
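The step from Equation A.83 to Equation A.84 is brief, so the following sketch spells it out using only the definitions above, and assuming a finite number of states. Both sums of transition probabilities equal 1 and therefore cancel; dividing what remains by dt and letting dt tend to 0 gives the result, since the left-hand side does not depend on dt.

```latex
\begin{align*}
\sum_j P_{ij}(t, t+dt) = \sum_j P_{ij}(t, t) = 1
  &\;\Longrightarrow\; \sum_j \lambda_{ij}(t)\, dt + o(dt) = 0 \\
  &\;\Longrightarrow\; \sum_j \lambda_{ij}(t) = -\frac{o(dt)}{dt} \;\longrightarrow\; 0
     \quad \text{as } dt \to 0 .
\end{align*}
```

For example, in a model whose only transition is from state 1 to state 2, this gives $\lambda_{11}(t) = -\lambda_{12}(t)$ and $\lambda_{22}(t) = 0$.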

Before proceeding further, let us work our way through a simple model with two

states – Healthy and Diseased, where the names of the states refer to a particular

disease. Let us assume that an individual always starts off healthy. During the

course of the investigation, the individual can either stay healthy or contract the

disease and move on to the Diseased state. Once in the Diseased state, we will

assume that the individual cannot turn healthy again. Figure A.29 gives a pictorial

representation of this 2-state model.

The transition intensity, λ12(t), gives the instantaneous rate of change of proba-

bility at time t of being diseased for a subject who is healthy up to time t. Let us

now derive a direct relationship between P12(·) and λ12(·) as follows. Using basic

probability theory,

\[
P_{12}(s,t+dt) = P_{11}(s,t)\,P_{12}(t,t+dt) + P_{12}(s,t)\,P_{22}(t,t+dt). \tag{A.85}
\]

Using Equation A.82 and the fact that a subject cannot return to the Healthy state,

i.e. P22(t, t+ dt) = 1, we have:

\[
P_{12}(s,t+dt) = P_{11}(s,t)\left(P_{12}(t,t) + \lambda_{12}(t)\,dt + o(dt)\right) + P_{12}(s,t). \tag{A.86}
\]

Noting that P11(s, t) = 1 − P12(s, t) and P12(t, t) = 0, we can rewrite the above

equation as follows:

\[
P_{12}(s,t+dt) - P_{12}(s,t) = \left(1 - P_{12}(s,t)\right) \times \left(\lambda_{12}(t) \times dt + o(dt)\right). \tag{A.87}
\]

Dividing both sides by dt and letting dt → 0 leads to

\[
\frac{1}{1 - P_{12}(s,t)} \times \frac{d}{dt} P_{12}(s,t) = \lambda_{12}(t), \tag{A.88}
\]

which can be solved, noting the boundary condition of P12(s, s) = 0, to give:

\[
P_{12}(s,t) = 1 - \exp\left(-\int_s^t \lambda_{12}(u)\,du\right). \tag{A.89}
\]

If the disease is rare, or the time period t − s is short, we can use a Taylor se-

ries expansion to obtain the following approximate relationship between P12(·) and

λ12(·).

\[
P_{12}(s,t) \approx \int_s^t \lambda_{12}(u)\,du. \tag{A.90}
\]
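As a numerical check of Equations A.89 and A.90, the sketch below evaluates both for an arbitrary intensity function. The trapezoidal integration, the function names and the constant intensity used in the example are illustrative assumptions.

```python
import math


def cumulative_hazard(hazard, s, t, steps=1000):
    """Trapezoidal approximation to the integral of hazard(u) from s to t."""
    h = (t - s) / steps
    total = 0.5 * (hazard(s) + hazard(t))
    for i in range(1, steps):
        total += hazard(s + i * h)
    return total * h


def p12_exact(hazard, s, t):
    """Equation A.89: 1 - exp(-integrated intensity)."""
    return -math.expm1(-cumulative_hazard(hazard, s, t))


def p12_approx(hazard, s, t):
    """Equation A.90: the integrated intensity itself (rare disease or short period)."""
    return cumulative_hazard(hazard, s, t)


def constant_hazard(u):
    """Hypothetical constant intensity of 0.001 per year."""
    return 0.001
```

For this constant intensity over 10 years, the integrated intensity is 0.01 and the exact probability from Equation A.89 is about 0.00995, so the approximation overstates the probability by roughly 0.5%.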

Moving on to a general multiple-state model, we can derive similar relationships

between transition probabilities and transition intensities. We will start off from a

generalised version of Equation A.85.

\[
P_{ij}(s,t+dt) = \sum_k P_{ik}(s,t) \times P_{kj}(t,t+dt). \tag{A.91}
\]

Now using Equation A.82, as before, we have,

\[
P_{ij}(s,t+dt) = \sum_k P_{ik}(s,t) \times \left(P_{kj}(t,t) + \lambda_{kj}(t) \times dt + o(dt)\right), \tag{A.92}
\]

which yields,

\[
\frac{d}{dt} P_{ij}(s,t) = \sum_k P_{ik}(s,t) \times \lambda_{kj}(t). \tag{A.93}
\]

We will discuss ways to solve these differential equations in Appendix B.1.
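As a hedged illustration of what solving Equation A.93 involves, the sketch below integrates it with a simple forward Euler scheme. Euler's method is chosen here only for brevity and is not necessarily the method described in Appendix B.1. The function names and the constant intensity of 0.02 are hypothetical; the 2-state model is used so that the result can be compared with the closed form in Equation A.89.

```python
def euler_step(P, intensities, dt):
    """One forward Euler step of dP_ij/dt = sum_k P_ik * lambda_kj."""
    n = len(P)
    return [
        [P[i][j] + dt * sum(P[i][k] * intensities[k][j] for k in range(n))
         for j in range(n)]
        for i in range(n)
    ]


def transition_probabilities(intensity_matrix, s, t, steps=10000):
    """Approximate the matrix of P_ij(s, t), starting from P(s, s) = identity.

    intensity_matrix(u) must return a square list of lists whose rows sum to
    zero (Equation A.84).
    """
    n = len(intensity_matrix(s))
    P = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    dt = (t - s) / steps
    for m in range(steps):
        P = euler_step(P, intensity_matrix(s + m * dt), dt)
    return P


def two_state_intensities(u, lam12=0.02):
    """2-state model: Healthy (index 0) to Diseased (index 1), no recovery."""
    return [[-lam12, lam12], [0.0, 0.0]]
```

Calling `transition_probabilities(two_state_intensities, 0.0, 10.0)` returns a matrix whose entry `[0][1]` approximates $1 - \exp(-0.2)$, in line with Equation A.89. The same function accepts any number of states, provided each row of the intensity matrix sums to zero.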

A.3 Models of Disease Association

In the previous section, we have formulated the risk of a disease through transition

probabilities and transition intensities. Now we will use these concepts to develop

models for measuring the effects of risk factors on a particular disease.

A risk-factor can have a number of levels. Suppose we are interested in investigating

the effect of smoking on the risk of lung cancer. Smoking habits can be

classified according to the average number of cigarettes smoked per day. The higher

the number, the higher is the level of exposure to the risk-factor of smoking. In-

vestigations can then be performed to figure out how the risk of lung cancer differs

from one level of risk-factor to the other.

In the simplest situation, we can have two levels of a risk-factor where an individ-

ual is either exposed to the factor or not. In the lung cancer example, people can be

classified as smokers and non-smokers. Analysts will then investigate how smoking

increases the risk of lung cancer. Here we will concentrate primarily on this binary

set-up.

Initially we will develop models to study effects of one risk-factor at a time. To do

this, care needs to be taken that the results are not distorted by the effects of other

risk factors. One way to ensure this is to stratify the study population according

to the levels of these other possible risk factors, and then analyse the effect of

the risk-factor in question within each such stratum. Going back to the example of

investigating the effect of smoking on lung cancer, suppose we believe that age is also

a risk-factor. Following the strategy outlined above, the study population needs to

be stratified according to age-groups. We then examine the effect of smoking within

each such age-group.

Extending the notation from the previous section, let $\lambda^{uk}_{ij}$ denote the transition

intensity from state i to state j for exposure status u and stratum k. We will assume

that u can take values 1 or 0 depending on whether the individual is exposed to the

risk-factor or not.

One simple formulation is to study the excess risk or, more accurately, the excess

rate of risk, which is

\[
b^{k}_{ij} = \lambda^{1k}_{ij} - \lambda^{0k}_{ij}. \tag{A.94}
\]

In most studies, the risk-factor in question is not the sole contributor to the risk of

the disease. Suppose that the total risk of the disease is the combined effect of the

risk-factor and some other general factors. In Equation A.94, by subtracting the

transition intensity of the unexposed group from that of the exposed group, we are

trying to eliminate the effects of those other general factors.

If our stratification is precise, then the difference represents the true effect of the

risk-factor in question. It should also remain stable from stratum to stratum. This

leads to the following simplification of Equation A.94:

\[
b_{ij} = \lambda^{1k}_{ij} - \lambda^{0k}_{ij}, \quad \text{for all } k. \tag{A.95}
\]

The model in Equation A.95 is also known as the additive model.

An alternative model to study disease association is to study the ratios of tran-

sition intensities instead of the differences. The formulation is as follows:

\[
r^{k}_{ij} = \frac{\lambda^{1k}_{ij}}{\lambda^{0k}_{ij}}. \tag{A.96}
\]

The ratio in Equation A.96 is known as the relative risk. Again under the assumption

that the effect of the general factors cancels out and the ratios remain stable from

stratum to stratum, we get the multiplicative model:

\[
r_{ij} = \frac{\lambda^{1k}_{ij}}{\lambda^{0k}_{ij}}, \quad \text{for all } k. \tag{A.97}
\]

There is an interesting relationship between the additive and the multiplicative

model. If we take logarithms of both sides of Equation A.97, we get:

\[
\log r_{ij} = \log \lambda^{1k}_{ij} - \log \lambda^{0k}_{ij}. \tag{A.98}
\]

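| | Diseased | Healthy | Total |
|---|---|---|---|
| Exposed | $p \times P^{1k}_{ij}$ | $p \times Q^{1k}_{ij}$ | $p$ |
| Unexposed | $q \times P^{0k}_{ij}$ | $q \times Q^{0k}_{ij}$ | $q$ |
| Total | $p \times P^{1k}_{ij} + q \times P^{0k}_{ij}$ | $p \times Q^{1k}_{ij} + q \times Q^{0k}_{ij}$ | $1$ |

Figure A.30: A 2 × 2 table for stratum k with corresponding probabilities.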

Clearly, Equations A.95 and A.98 have the same structure, except for the scale.

This is why multiplicative models are sometimes also called log-linear models.

Another important fact to note here is that although all the models above are

specified in terms of the transition intensities, an equivalent formulation can be

achieved through transition probabilities. The relationship defined in Equation A.89

can be used for this purpose.
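The difference between the additive model (Equation A.95) and the multiplicative model (Equation A.97) can be seen in a small numerical sketch. The intensities below are hypothetical and are constructed to satisfy the multiplicative model with a relative risk of 2 in every stratum; the helper names are likewise assumptions.

```python
import math


def excess_rate(lam_exposed, lam_unexposed):
    """Equation A.94: difference of transition intensities within a stratum."""
    return lam_exposed - lam_unexposed


def relative_risk(lam_exposed, lam_unexposed):
    """Equation A.96: ratio of transition intensities within a stratum."""
    return lam_exposed / lam_unexposed


def is_stable(values, rel_tol=1e-9):
    """True if all values agree with the first, i.e. no variation across strata."""
    return all(math.isclose(v, values[0], rel_tol=rel_tol) for v in values)


# Hypothetical intensities for the unexposed group in three strata (e.g. age bands).
unexposed = [0.001, 0.002, 0.004]
# Exposed intensities built so that the multiplicative model holds with r = 2.
exposed = [2.0 * lam for lam in unexposed]

excess = [excess_rate(e, u) for e, u in zip(exposed, unexposed)]
ratios = [relative_risk(e, u) for e, u in zip(exposed, unexposed)]
log_differences = [math.log(e) - math.log(u) for e, u in zip(exposed, unexposed)]
```

Here the ratios and the log differences are constant across strata, while the excess rates are not, so these data fit the multiplicative model but not the additive one. On the log scale the multiplicative model has exactly the additive form, as in Equation A.98.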

A.4 Relative Risk and Odds Ratio

In the previous section, we introduced the concept of relative risk. In epidemiological

research, it has become the most frequently used measure for associating exposure

with disease. Here we will develop the concept further by introducing odds ratios.

Using notation similar to the one used for transition intensities in the previous

section, let us denote $P^{uk}_{ij}$ as the transition probability from state i to state j, for

an individual from stratum k and exposure status u. If we assume that p is the

proportion of individuals exposed to the risk-factor in question, we can draw up the

2 × 2 table in Figure A.30 for stratum k, where $q = 1 - p$ and $Q^{uk}_{ij} = 1 - P^{uk}_{ij}$.

If the study period is reasonably short or the disease under consideration is

relatively rare, we can use the approximation given in Equation A.90 to obtain

the following relationship:

$$\frac{\lambda^{1k}_{ij}}{\lambda^{0k}_{ij}} \approx \frac{P^{1k}_{ij}}{P^{0k}_{ij}}. \qquad \text{(A.99)}$$

Using this, along with the definition of relative risk, we get:


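          | Diseased | Healthy | Total
Exposed   | $a^{k}_{ij}$ | $b^{k}_{ij}$ | $m^{1k}_{ij}$
Unexposed | $c^{k}_{ij}$ | $d^{k}_{ij}$ | $m^{0k}_{ij}$
Total     | $n^{1k}_{ij}$ | $n^{0k}_{ij}$ | $N^{k}_{ij}$

Figure A.31: A 2 × 2 table with data for stratum k.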

$$r^{k}_{ij} = \frac{\lambda^{1k}_{ij}}{\lambda^{0k}_{ij}} \approx \frac{P^{1k}_{ij}}{P^{0k}_{ij}}. \qquad \text{(A.100)}$$

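Let us now define the odds ratio $\psi^{k}_{ij}$, for stratum $k$, as the ratio of the odds of disease in the exposed and non-exposed subgroups, i.e.,

$$\psi^{k}_{ij} = \left(P^{1k}_{ij} / Q^{1k}_{ij}\right) \div \left(P^{0k}_{ij} / Q^{0k}_{ij}\right) = \frac{P^{1k}_{ij}\, Q^{0k}_{ij}}{P^{0k}_{ij}\, Q^{1k}_{ij}}. \qquad \text{(A.101)}$$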

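Again based on the assumption that the study period is short or the disease is rare, we get $Q^{0k}_{ij} \approx Q^{1k}_{ij} \approx 1$. This leads to the following approximate relationship between $\psi^{k}_{ij}$ and $r^{k}_{ij}$:

$$\psi^{k}_{ij} = \frac{P^{1k}_{ij}\, Q^{0k}_{ij}}{P^{0k}_{ij}\, Q^{1k}_{ij}} \approx \frac{P^{1k}_{ij}}{P^{0k}_{ij}} \approx r^{k}_{ij}. \qquad \text{(A.102)}$$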
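The quality of the approximation in Equation A.102 can be checked numerically. The sketch below is illustrative only: the function names and the two pairs of transition probabilities are hypothetical values chosen for the comparison, not figures from any study.

```python
def relative_risk(p1, p0):
    """Ratio of transition probabilities, exposed over unexposed."""
    return p1 / p0


def odds_ratio(p1, p0):
    """Ratio of the odds of disease, exposed over unexposed (Equation A.101)."""
    return (p1 / (1.0 - p1)) / (p0 / (1.0 - p0))


# Hypothetical probabilities: a rare disease and a common one.
for label, p1, p0 in (("rare", 0.002, 0.001), ("common", 0.4, 0.2)):
    print(label, relative_risk(p1, p0), odds_ratio(p1, p0))
```

For the rare disease the two measures nearly coincide, while for the common one the odds ratio overstates the relative risk, which is why the rare-disease or short-study assumption matters.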

A.5 Analysis of Grouped Data

Using the theory developed above, let us now proceed to draw inference based on

actual data. Henceforth we will state most of the results without proof. For

details, please refer to Breslow and Day (1980) and Woodward (1999).

Suppose we are investigating the effect of a risk-factor on a particular disease. To

avoid distortion of results due to other risk-factors, the study population is stratified

into a number of strata. For each stratum of the study population, the data can be

summarised in a 2× 2 table, as given in Figure A.31.

From the data, we can obtain estimates of the transition probabilities as follows:


$$P^{1k}_{ij} = \frac{a^{k}_{ij}}{a^{k}_{ij} + b^{k}_{ij}}, \qquad P^{0k}_{ij} = \frac{c^{k}_{ij}}{c^{k}_{ij} + d^{k}_{ij}}. \qquad \text{(A.103)}$$

These can then be used to derive an estimate of relative risk as follows:

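$$r^{k}_{ij} = \frac{P^{1k}_{ij}}{P^{0k}_{ij}} = \frac{a^{k}_{ij} / (a^{k}_{ij} + b^{k}_{ij})}{c^{k}_{ij} / (c^{k}_{ij} + d^{k}_{ij})} = \frac{a^{k}_{ij} / m^{1k}_{ij}}{c^{k}_{ij} / m^{0k}_{ij}}. \qquad \text{(A.104)}$$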

Using a log transformation and normality assumption, the standard error can be

estimated by

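$$\mathrm{se}(\log_e r^{k}_{ij}) = \sqrt{\frac{1}{a^{k}_{ij}} - \frac{1}{a^{k}_{ij} + b^{k}_{ij}} + \frac{1}{c^{k}_{ij}} - \frac{1}{c^{k}_{ij} + d^{k}_{ij}}}. \qquad \text{(A.105)}$$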

The estimate and the estimated standard error can then be used to obtain approximate confidence intervals for $r^{k}_{ij}$. They can also be used to obtain p-values for testing hypotheses on $r^{k}_{ij}$.

Similarly, estimates can be obtained for the odds ratio:

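$$\psi^{k}_{ij} = \frac{P^{1k}_{ij}\, Q^{0k}_{ij}}{P^{0k}_{ij}\, Q^{1k}_{ij}} = \frac{\dfrac{a^{k}_{ij}}{a^{k}_{ij} + b^{k}_{ij}} \cdot \dfrac{d^{k}_{ij}}{c^{k}_{ij} + d^{k}_{ij}}}{\dfrac{c^{k}_{ij}}{c^{k}_{ij} + d^{k}_{ij}} \cdot \dfrac{b^{k}_{ij}}{a^{k}_{ij} + b^{k}_{ij}}} = \frac{a^{k}_{ij}\, d^{k}_{ij}}{b^{k}_{ij}\, c^{k}_{ij}}, \qquad \text{(A.106)}$$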

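$$\mathrm{se}(\log_e \psi^{k}_{ij}) = \sqrt{\frac{1}{a^{k}_{ij}} + \frac{1}{b^{k}_{ij}} + \frac{1}{c^{k}_{ij}} + \frac{1}{d^{k}_{ij}}}. \qquad \text{(A.107)}$$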

Again, approximate confidence intervals and p-values can be obtained for $\psi^{k}_{ij}$ using these equations.
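As a minimal sketch of how Equations A.104 to A.107 might be applied to a single stratum, the functions below return each estimate with an approximate confidence interval built on the log scale. The function names, the default multiplier z = 1.96 (a 95% normal interval) and the counts are assumptions made for illustration.

```python
import math


def relative_risk_ci(a, b, c, d, z=1.96):
    """Relative risk (A.104) with a log-scale confidence interval (A.105)."""
    rr = (a / (a + b)) / (c / (c + d))
    se = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    return rr, rr * math.exp(-z * se), rr * math.exp(z * se)


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio (A.106) with a log-scale confidence interval (A.107)."""
    psi = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return psi, psi * math.exp(-z * se), psi * math.exp(z * se)


# Hypothetical counts laid out as in Figure A.31:
# a = exposed diseased, b = exposed healthy,
# c = unexposed diseased, d = unexposed healthy.
print(relative_risk_ci(30, 70, 15, 85))
print(odds_ratio_ci(30, 70, 15, 85))
```

Exponentiating the interval for the logarithm keeps both limits positive, which an interval built directly on the ratio would not guarantee.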

Note that the marginal totals, $m^{uk}_{ij}$, are meaningless for case-control studies, as individuals are selected according to their disease status and not their exposure status. As a result, relative risks cannot be estimated for case-control studies. No such problem exists for the estimation of odds ratios, as the marginal totals cancel out. Moreover, if the disease is rare or if the study period is short, the odds ratios are good approximations to the relative risks. So for case-control studies we will concentrate only on the estimation of odds ratios.


Until now, we have calculated odds ratios separately for each stratum. However,

if we can assume that there is a common true odds ratio for each stratum and

the differences in the observed odds ratios are purely due to chance variation, the

estimate of the common odds ratio is given by the Mantel-Haenszel estimate:

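$$\psi_{ij} = \left( \sum_{k} \frac{a^{k}_{ij}\, d^{k}_{ij}}{N^{k}_{ij}} \right) \Bigg/ \left( \sum_{k} \frac{b^{k}_{ij}\, c^{k}_{ij}}{N^{k}_{ij}} \right). \qquad \text{(A.108)}$$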

The estimate of the standard error of loge ψij, as proposed by Robins et al. (1986),

has the following form:

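$$\mathrm{se}(\log_e \psi_{ij}) = \sqrt{ \frac{\sum_{k} U^{k}_{ij} W^{k}_{ij}}{2 \left( \sum_{k} W^{k}_{ij} \right)^{2}} + \frac{\sum_{k} U^{k}_{ij} X^{k}_{ij} + \sum_{k} V^{k}_{ij} W^{k}_{ij}}{2 \sum_{k} W^{k}_{ij} \sum_{k} X^{k}_{ij}} + \frac{\sum_{k} V^{k}_{ij} X^{k}_{ij}}{2 \left( \sum_{k} X^{k}_{ij} \right)^{2}} }, \qquad \text{(A.109)}$$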

where, for stratum k,

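$$U^{k}_{ij} = \frac{a^{k}_{ij} + d^{k}_{ij}}{N^{k}_{ij}}, \quad V^{k}_{ij} = \frac{b^{k}_{ij} + c^{k}_{ij}}{N^{k}_{ij}}, \quad W^{k}_{ij} = \frac{a^{k}_{ij}\, d^{k}_{ij}}{N^{k}_{ij}}, \quad X^{k}_{ij} = \frac{b^{k}_{ij}\, c^{k}_{ij}}{N^{k}_{ij}}. \qquad \text{(A.110)}$$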
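Equations A.108 to A.110 translate directly into a short routine. The following is a sketch under stated assumptions: the function name and the representation of each stratum as a tuple (a, b, c, d), laid out as in Figure A.31, are illustrative choices, and the counts are hypothetical.

```python
import math


def mantel_haenszel(tables):
    """Common odds ratio (A.108) and se of its logarithm (A.109, A.110).

    tables: list of (a, b, c, d) counts, one tuple per stratum.
    """
    sum_w = sum_x = sum_uw = sum_ux = sum_vw = sum_vx = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        u, v = (a + d) / n, (b + c) / n
        w, x = a * d / n, b * c / n
        sum_w += w
        sum_x += x
        sum_uw += u * w
        sum_ux += u * x
        sum_vw += v * w
        sum_vx += v * x
    psi = sum_w / sum_x
    variance = (sum_uw / (2 * sum_w ** 2)
                + (sum_ux + sum_vw) / (2 * sum_w * sum_x)
                + sum_vx / (2 * sum_x ** 2))
    return psi, math.sqrt(variance)


# Hypothetical data for two strata.
print(mantel_haenszel([(30, 70, 15, 85), (12, 38, 9, 61)]))
```

With a single stratum the routine reduces to Equations A.106 and A.107, which gives a convenient check on the implementation.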

A.6 Analysis of Matched Studies

In a case-control study, individuals are selected according to their disease status. If

cases and controls are chosen independently, there is a chance that the profiles of

the individuals in the control group will be different from that of the cases. This

difference will then feed into the analysis to distort the results.

Matching is a method which tackles this problem by choosing controls based on

the profiles of the cases. Matching uses the concept of stratification, introduced

in Section A.3, to subdivide the study population into a number of strata. The

cases are first classified according to the strata they come from. Controls are then

chosen in such a way that they have a distribution similar to that of the cases across

strata. This ensures that analysis can be done within each stratum, eliminating the

distortions arising out of the differences between strata.


Pair type                      | Exposed (cases, controls, total) | Unexposed (cases, controls, total) | Total (cases, controls, total)
No exposures                   | 0, 0, 0 | 1, 1, 2 | 1, 1, 2
One exposure (case exposed)    | 1, 0, 1 | 0, 1, 1 | 1, 1, 2
One exposure (control exposed) | 0, 1, 1 | 1, 0, 1 | 1, 1, 2
Two exposures                  | 1, 1, 2 | 0, 0, 0 | 1, 1, 2

Figure A.32: The types of table for each case-control pair in a 1:1 matching.

However, care needs to be taken to guard against over-matching. As an ex-

treme example, suppose that the study population is stratified for the risk-factor

in question. This will then result in the same distribution of cases and controls for

each exposure level. No conclusions can then be drawn from the analysis. So it

is important to leave aside the risk-factor in question while stratifying the study

population.

The simplest form of all matching is the 1:1 matching or pair matching. Here for

each case, a control is chosen from the same stratum irrespective of the exposure

status. A case-control pair can then be identified with one of the four possibilities

shown in Figure A.32.

If we assume that each case-control pair represents a stratum and that there exists

a common odds ratio for all strata, we can derive the Mantel-Haenszel estimate using

Equation A.108.

Let $t_u$ be the number of sets with $u$ exposures, and $m_u$ be the number of sets with $u$ exposures in which the case is exposed.

Using these notations in Equation A.108 we get,

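$$\psi_{ij} = \left( \sum_{k} \frac{a^{k}_{ij}\, d^{k}_{ij}}{N^{k}_{ij}} \right) \Bigg/ \left( \sum_{k} \frac{b^{k}_{ij}\, c^{k}_{ij}}{N^{k}_{ij}} \right) = \frac{t_0 \times \frac{0}{2} + m_1 \times \frac{1}{2} + (t_1 - m_1) \times \frac{0}{2} + t_2 \times \frac{0}{2}}{t_0 \times \frac{0}{2} + m_1 \times \frac{0}{2} + (t_1 - m_1) \times \frac{1}{2} + t_2 \times \frac{0}{2}} = \frac{m_1}{t_1 - m_1}. \qquad \text{(A.111)}$$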

In other words, the estimate is the ratio of the number of pairs in which only the case is exposed to the number of pairs in which only the control is exposed. Note that the sets where both case and control are exposed, or where both are unexposed, contain no extra information, so their terms vanish from Equation A.111.
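The reduction in Equation A.111 can be verified by treating every pair as its own stratum of size 2 and applying Equation A.108 directly. The sketch below does this; the function name, the encoding of a pair as two booleans and the pair counts are hypothetical.

```python
def mh_from_pairs(pairs):
    """Mantel-Haenszel odds ratio with each matched pair as a stratum.

    pairs: list of (case_exposed, control_exposed) booleans.
    """
    numerator = denominator = 0.0
    for case_exposed, control_exposed in pairs:
        a = 1 if case_exposed else 0       # exposed case
        b = 1 if control_exposed else 0    # exposed control
        c = 1 - a                          # unexposed case
        d = 1 - b                          # unexposed control
        numerator += a * d / 2             # N = 2 in every pair
        denominator += b * c / 2
    return numerator / denominator


# Hypothetical study: 12 pairs with only the case exposed, 4 with only the
# control exposed, 5 with both exposed and 9 with neither exposed.
pairs = ([(True, False)] * 12 + [(False, True)] * 4
         + [(True, True)] * 5 + [(False, False)] * 9)
print(mh_from_pairs(pairs))
```

Here $m_1 = 12$ and $t_1 = 16$, so the general formula returns $12 / 4$, and the 14 concordant pairs have no influence on the result.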

The standard error of the estimate can be derived using Equation A.109. However, when $t_1$ is small, Breslow and Day (1980) have provided a formula for an exact $100(1 - \alpha)\%$ confidence interval $(\psi_L, \psi_U)$, where

$$\psi_L = \frac{m_1}{(t_1 - m_1 + 1)\, F_{\alpha/2}\big(2(t_1 - m_1 + 1),\, 2 m_1\big)}, \qquad \psi_U = \frac{(m_1 + 1)\, F_{\alpha/2}\big(2(m_1 + 1),\, 2(t_1 - m_1)\big)}{t_1 - m_1}. \qquad \text{(A.112)}$$

Here Fα/2(ν1, ν2) denotes the upper 100(α/2) percentile of the F distribution

with ν1 and ν2 degrees of freedom.

As it is highly likely that for a rare disease, there are more controls available than

there are cases, it is possible to develop a design where each case can be matched

to a number of controls, say c. Increasing c increases the efficiency of the estimates

as the standard errors fall. However, for each increase in c, the marginal increase in

efficiency decreases. So, 1:c matching is rarely performed with c greater than 5.

For 1:c matching, using techniques similar to the one used for 1:1 matching, we

can derive the Mantel-Haenszel estimate of the odds ratio as follows:

$$\psi_{ij} = \frac{\sum_{u=1}^{c} (c + 1 - u)\, m_u}{\sum_{u=1}^{c} u\, (t_u - m_u)}. \qquad \text{(A.113)}$$

Miettinen (1970) gives an approximate formula for the standard error of loge ψij:


$$\mathrm{se}(\log_e \psi_{ij}) = \left[ \psi_{ij} \sum_{u=1}^{c} \frac{u\, (c + 1 - u)\, t_u}{(u \psi_{ij} + c + 1 - u)^{2}} \right]^{-0.5}. \qquad \text{(A.114)}$$
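A minimal sketch of the 1:c calculation follows. The function name and the use of dictionaries keyed by the number of exposures u are assumptions, and the counts are hypothetical. The standard error includes the factor u in the numerator of each term, in line with the generalised form in Equation A.117; for c = 1 this factor equals 1 and the result agrees with the pair-matched case.

```python
import math


def mh_one_to_c(t, m, c):
    """Odds ratio (A.113) and approximate se of its logarithm for 1:c matching.

    t[u]: number of sets with u exposures, for u = 1, ..., c.
    m[u]: number of those sets in which the case is exposed.
    """
    numerator = sum((c + 1 - u) * m[u] for u in range(1, c + 1))
    denominator = sum(u * (t[u] - m[u]) for u in range(1, c + 1))
    psi = numerator / denominator
    information = psi * sum(
        u * (c + 1 - u) * t[u] / (u * psi + c + 1 - u) ** 2
        for u in range(1, c + 1)
    )
    return psi, information ** -0.5


# Hypothetical 1:2 matched study.
print(mh_one_to_c({1: 20, 2: 10}, {1: 12, 2: 8}, c=2))
```

With c = 1, t = {1: 16} and m = {1: 12}, the routine gives the pair-matched estimate 12 / 4 with standard error $\sqrt{1/12 + 1/4}$.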

Sometimes in a 1:c matching, it is possible that data from a few controls may

not be available. In this situation, a case can be matched against a number of

controls which is not fixed but can vary between 1 and c. This then becomes a

1:variable matching design. Using similar techniques, estimates of the odds ratio

can be obtained.

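Let $v$ denote the number of controls that are matched with any one case, where $v = 1, 2, \cdots, c$, and let $t^{(v)}_u$ and $m^{(v)}_u$ be the counterparts of $t_u$ and $m_u$ for the sets with $v$ controls. The Mantel-Haenszel estimate is given by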

$$\psi_{ij} = \frac{\sum_{v=1}^{c} \sum_{u=1}^{v} T^{(v)}_{u}}{\sum_{v=1}^{c} \sum_{u=1}^{v} B^{(v)}_{u}}, \qquad \text{(A.115)}$$

where,

$$T^{(v)}_{u} = \frac{(v + 1 - u)\, m^{(v)}_{u}}{v + 1}, \qquad B^{(v)}_{u} = \frac{u \left( t^{(v)}_{u} - m^{(v)}_{u} \right)}{v + 1}. \qquad \text{(A.116)}$$

Also, Equation A.114 can be generalised to obtain the standard error of ψij in

Equation A.115.

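$$\mathrm{se}(\log_e \psi_{ij}) = \left[ \psi_{ij} \sum_{v=1}^{c} \sum_{u=1}^{v} \frac{u\, (v + 1 - u)\, t^{(v)}_{u}}{(u \psi_{ij} + v + 1 - u)^{2}} \right]^{-0.5}. \qquad \text{(A.117)}$$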

The most general of all matching strategies is the many:many matching design.

Here a variable number of controls are matched against a variable number of cases.

Although conceptually more difficult, similar techniques can be used to derive the

Mantel-Haenszel estimate of the odds ratio.

Suppose that $m^{(rs)}_{uk}$ is the number of matched sets with $r$ cases and $s$ controls in which there are $u$ exposures to the risk-factor, $k$ of which are exposed cases. Then

$$\psi_{ij} = \frac{\sum T^{(rs)}_{uk}}{\sum B^{(rs)}_{uk}}, \qquad \text{(A.118)}$$

where,


$$T^{(rs)}_{uk} = \frac{k\, (s - u + k)\, m^{(rs)}_{uk}}{r + s}, \qquad B^{(rs)}_{uk} = \frac{(u - k)(r - k)\, m^{(rs)}_{uk}}{r + s}. \qquad \text{(A.119)}$$

The standard error of the estimate can be derived using Equation A.109.

A.7 Effects of Combined Exposures

Until now, we have looked at models to study the effect of one particular risk-factor

at a time. However, in reality, all human diseases are caused by the combined

interactions of a number of risk factors. In Section A.1, we have briefly touched

upon gene-environment interactions, which study the combined effects of genetic

and environmental factors precipitating a disease. In this section, we will develop

models to analyse the effects of combined exposures on a disease.

Suppose we are interested in two risk factors A and B. Extending the notation developed in Section A.3, let $\lambda^{uvk}_{ij}$ be the transition intensity from state $i$ to state $j$ with exposure level $u$ of risk-factor A and exposure status $v$ of risk-factor B, for stratum $k$. As before, in the binary set-up, $u$ and $v$ can take values 1 or 0 depending on the exposure status. In a similar way, we can extend the notation of relative risk to $r^{uvk}_{ij}$ and odds ratio to $\psi^{uvk}_{ij}$.

Using this notation, in the binary set-up, for stratum k, we can define:

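$$r^{11k}_{ij} = \frac{\lambda^{11k}_{ij}}{\lambda^{00k}_{ij}}, \quad r^{10k}_{ij} = \frac{\lambda^{10k}_{ij}}{\lambda^{00k}_{ij}}, \quad r^{01k}_{ij} = \frac{\lambda^{01k}_{ij}}{\lambda^{00k}_{ij}}, \quad \text{and} \quad r^{00k}_{ij} = \frac{\lambda^{00k}_{ij}}{\lambda^{00k}_{ij}} = 1. \qquad \text{(A.120)}$$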

Recall the definition of excess rate of risk in Section A.3. Based on the same

concept, for two risk factors, we can define three types of excess rates of risk, as

follows:

$\lambda^{11k}_{ij} - \lambda^{00k}_{ij}$ : When exposed to both A and B.

$\lambda^{10k}_{ij} - \lambda^{00k}_{ij}$ : When exposed to A but unexposed to B.

$\lambda^{01k}_{ij} - \lambda^{00k}_{ij}$ : When exposed to B but unexposed to A. (A.121)


Now let us assume that the effect of risk-factor A is independent of the effect of

the risk-factor B, or in other words, there is no interaction between the risk factors.

Independence or non-interaction between risk factors can be interpreted in a number

of ways. One possible formulation is to assume that the joint effect of risk factors

A and B is additive, i.e.,

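$$\left( \lambda^{11k}_{ij} - \lambda^{00k}_{ij} \right) = \left( \lambda^{10k}_{ij} - \lambda^{00k}_{ij} \right) + \left( \lambda^{01k}_{ij} - \lambda^{00k}_{ij} \right), \qquad \text{(A.122)}$$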

which simplifies to:

$$\lambda^{11k}_{ij} = \lambda^{10k}_{ij} + \lambda^{01k}_{ij} - \lambda^{00k}_{ij}. \qquad \text{(A.123)}$$

Dividing both sides of Equation A.123 by $\lambda^{00k}_{ij}$ gives:

$$r^{11k}_{ij} = r^{10k}_{ij} + r^{01k}_{ij} - 1. \qquad \text{(A.124)}$$

An alternative characterisation for the joint association is the multiplicative or

the log-linear model. Here we assume that the log transformation of the transition

intensities are additive. Under this formulation, Equation A.122 transforms into:

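$$\left( \log(\lambda^{11k}_{ij}) - \log(\lambda^{00k}_{ij}) \right) = \left( \log(\lambda^{10k}_{ij}) - \log(\lambda^{00k}_{ij}) \right) + \left( \log(\lambda^{01k}_{ij}) - \log(\lambda^{00k}_{ij}) \right), \qquad \text{(A.125)}$$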

which simplifies to:

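$$\log \frac{\lambda^{11k}_{ij}}{\lambda^{00k}_{ij}} = \log \frac{\lambda^{10k}_{ij}}{\lambda^{00k}_{ij}} + \log \frac{\lambda^{01k}_{ij}}{\lambda^{00k}_{ij}}, \qquad \text{(A.126)}$$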

which when re-written in terms of relative risks, is:

$$\log(r^{11k}_{ij}) = \log(r^{10k}_{ij}) + \log(r^{01k}_{ij}), \qquad \text{(A.127)}$$

or equivalently,

$$r^{11k}_{ij} = r^{10k}_{ij} \times r^{01k}_{ij}. \qquad \text{(A.128)}$$


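A | B | Cases | Controls
+ | + | $a^{k}_{ij}$ | $b^{k}_{ij}$
+ | − | $c^{k}_{ij}$ | $d^{k}_{ij}$
− | + | $e^{k}_{ij}$ | $f^{k}_{ij}$
− | − | $g^{k}_{ij}$ | $h^{k}_{ij}$

Figure A.33: A 2 × 4 table with data for stratum k.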

So in the above model, the independence or non-interaction of risk factors implies

a multiplicative combination for the joint effect.

Earlier in Section A.5, we have seen that in case-control studies, although relative

risks cannot be estimated directly, odds ratios can be calculated and used as good

approximations of relative risks. So we will use odds ratios, instead of relative risks,

to analyse the effects of combined exposures in case-control studies.

To study the effects of two risk factors A and B, the data can be summarised in

a 2 × 4 table, as given in Figure A.33, where ‘+’ implies exposure and ‘−’ implies

non-exposure.

Table A.37 lists all possible odds ratios that can be calculated from the data given in Figure A.33. The first odds ratio, $\psi^{11k}_{ij}$, measures the joint effect of the risk factors A and B. The next two odds ratios, $\psi^{10k}_{ij}$ and $\psi^{01k}_{ij}$, measure the effect of one risk-factor at a time. The remaining four odds ratios, $\psi^{1*k}_{ij}$, $\psi^{0*k}_{ij}$, $\psi^{*1k}_{ij}$ and $\psi^{*0k}_{ij}$, stratify the population based on the exposure level of one risk-factor and then measure the effect of the other risk-factor. The asterisk, in the notation of these last four odds ratios, denotes the risk-factor for which the effect is being measured. For example, $\psi^{1*k}_{ij}$ is the odds ratio measuring the effect of exposure to B, for those who are already exposed to A.

Table A.37: List of odds ratios obtained from the 2 × 4 table in Figure A.33.

Notation | Formula | Main Information
$\psi^{11k}_{ij}$ | $a^{k}_{ij} h^{k}_{ij} / (b^{k}_{ij} g^{k}_{ij})$ | Effect of joint exposures versus none.
$\psi^{10k}_{ij}$ | $c^{k}_{ij} h^{k}_{ij} / (d^{k}_{ij} g^{k}_{ij})$ | Effect of exposure to A alone versus none.
$\psi^{01k}_{ij}$ | $e^{k}_{ij} h^{k}_{ij} / (f^{k}_{ij} g^{k}_{ij})$ | Effect of exposure to B alone versus none.
$\psi^{1*k}_{ij}$ | $a^{k}_{ij} d^{k}_{ij} / (b^{k}_{ij} c^{k}_{ij})$ | Effect of exposure to B, given exposed to A.
$\psi^{0*k}_{ij}$ | $e^{k}_{ij} h^{k}_{ij} / (f^{k}_{ij} g^{k}_{ij})$ | Effect of exposure to B, given unexposed to A.
$\psi^{*1k}_{ij}$ | $a^{k}_{ij} f^{k}_{ij} / (b^{k}_{ij} e^{k}_{ij})$ | Effect of exposure to A, given exposed to B.
$\psi^{*0k}_{ij}$ | $c^{k}_{ij} h^{k}_{ij} / (d^{k}_{ij} g^{k}_{ij})$ | Effect of exposure to A, given unexposed to B.

The odds ratios, $\psi^{11k}_{ij}$, $\psi^{10k}_{ij}$ and $\psi^{01k}_{ij}$, can also be used to measure the deviation of the data from both additive and multiplicative models. The first two measures, given in Table A.38, provide direct checks on deviation from these models. The case only odds ratio gives an alternative measure to check departure from the multiplicative model. The control only odds ratio estimates exposure dependencies in the underlying population. A discussion on these last two measures is given in Khoury and Flanders (1996).

For a general discussion on the use of 2× 4 tables for measuring combined expo-

sures, please refer to Botto and Khoury (2001).

Table A.38: Other measures based on the 2 × 4 table in Figure A.33.

Other measures | Formula
Multiplicative interaction | $\psi^{11k}_{ij} / (\psi^{10k}_{ij}\, \psi^{01k}_{ij})$
Additive interaction | $\psi^{11k}_{ij} - (\psi^{10k}_{ij} + \psi^{01k}_{ij} - 1)$
Case only odds ratio | $a^{k}_{ij} g^{k}_{ij} / (c^{k}_{ij} e^{k}_{ij})$
Control only odds ratio | $b^{k}_{ij} h^{k}_{ij} / (d^{k}_{ij} f^{k}_{ij})$
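The measures in Tables A.37 and A.38 can be computed together from one 2 × 4 table. The sketch below assumes the cell layout of Figure A.33 (a, b for exposure to both factors through to g, h for exposure to neither); the function name, the dictionary keys and the counts are hypothetical.

```python
def combined_exposure_measures(a, b, c, d, e, f, g, h):
    """Odds ratios and interaction measures from a 2 x 4 table."""
    psi11 = a * h / (b * g)   # joint exposure versus none
    psi10 = c * h / (d * g)   # A alone versus none
    psi01 = e * h / (f * g)   # B alone versus none
    return {
        "psi11": psi11,
        "psi10": psi10,
        "psi01": psi01,
        "multiplicative_interaction": psi11 / (psi10 * psi01),
        "additive_interaction": psi11 - (psi10 + psi01 - 1),
        "case_only_or": a * g / (c * e),
        "control_only_or": b * h / (d * f),
    }


# Hypothetical counts for a single stratum.
measures = combined_exposure_measures(40, 10, 30, 20, 20, 20, 10, 50)
for name, value in measures.items():
    print(name, value)
```

Algebraically, the multiplicative interaction equals the case only odds ratio divided by the control only odds ratio, so the case only odds ratio measures departure from the multiplicative model when the two exposures are independent in the underlying population.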


Appendix B

Numerical Methods

B.1 Differential Equations

B.1.1 Introduction

In this section, we will briefly describe how the transition intensities introduced

earlier can be used to formulate a set of differential equations which can be solved

for the transition and occupation probabilities. For details, please refer to Press et al.

(2002). Here we will consider a general n-state model. Using the same definitions

and notations defined in the previous chapter, we have the following set of equations,

commonly referred to as the Kolmogorov forward equations:

$$\frac{d}{dt} P_{ij}(s, t) = \sum_{k} P_{ik}(s, t)\, \lambda_{kj}(t), \qquad \text{(B.129)}$$

or in matrix notation,

P′(s, t) = P(s, t)×Λ(t), (B.130)

with the boundary condition P(s, s) = I.

With arbitrary Λ(t), defined by typical life history events, we can only solve these

equations numerically and not explicitly. We now discuss some numerical methods

of solving differential equations.


B.1.2 Euler Method

The formula for the Euler method is:

P(s, t+ h) = P(s, t) + h×P′(s, t) (B.131)

which advances a solution from t to t + h. However, this method advances the

solution through an interval of length h using derivative information only at the

beginning of that interval. Although the method converges, it is inefficient and

asymmetric and is not normally recommended.

The Euler method can easily be improved upon by making use of an intermediate

solution to achieve greater accuracy. A simple approach is to find a solution at the

mid-point of the interval and to then obtain the solution at the end of the interval

as illustrated below.

Define:

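$$\mathbf{K}_1 = h \times \mathbf{P}'(s, t) = h \times \mathbf{P}(s, t) \times \mathbf{\Lambda}(t), \qquad \text{(B.132)}$$

$$\mathbf{K}_2 = h \times \left\{ \mathbf{P}(s, t) + \tfrac{1}{2} \mathbf{K}_1 \right\} \times \mathbf{\Lambda}(t + \tfrac{1}{2} h), \qquad \text{(B.133)}$$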

leading to:

P(s, t+ h) = P(s, t) + K2 +O(h3) (B.134)

This method is sometimes referred to as the midpoint method and can be further

refined to give the fourth-order Runge-Kutta method which is outlined in the next

section.
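The Euler and midpoint steps can be compared on a model with a known solution. The sketch below uses a two-state (alive, dead) model with a Gompertz intensity, for which the survival probability is available in closed form. The parameter values, function names and step count are hypothetical, and Python is used here purely for brevity.

```python
import math

B_HYP, C_HYP = 5e-5, 0.09  # hypothetical Gompertz parameters


def intensity_matrix(t):
    """Lambda(t) for the two-state model; each row sums to zero."""
    mu = B_HYP * math.exp(C_HYP * t)
    return [[-mu, mu], [0.0, 0.0]]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def scale(X, w):
    return [[w * x for x in row] for row in X]


def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def euler_step(P, t, h):
    # Equation B.131: use the derivative at the start of the interval only.
    return add(P, scale(matmul(P, intensity_matrix(t)), h))


def midpoint_step(P, t, h):
    # Equations B.132 to B.134: re-evaluate the derivative at the mid-point.
    K1 = scale(matmul(P, intensity_matrix(t)), h)
    P_mid = add(P, scale(K1, 0.5))
    K2 = scale(matmul(P_mid, intensity_matrix(t + 0.5 * h)), h)
    return add(P, K2)


def solve(step, s, t_end, n_steps):
    h = (t_end - s) / n_steps
    P = [[1.0, 0.0], [0.0, 1.0]]  # boundary condition P(s, s) = I
    for i in range(n_steps):
        P = step(P, s + i * h, h)
    return P


exact = math.exp(-(B_HYP / C_HYP)
                 * (math.exp(C_HYP * 60.0) - math.exp(C_HYP * 30.0)))
euler = solve(euler_step, 30.0, 60.0, 300)[0][0]
midpoint = solve(midpoint_step, 30.0, 60.0, 300)[0][0]
print(abs(euler - exact), abs(midpoint - exact))
```

With the same step length, the midpoint error should be markedly smaller than the Euler error, at the cost of one extra evaluation of the intensities per step.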

B.1.3 Runge-Kutta Method

By far the most often used method is the classical fourth-order Runge-Kutta formula.

The steps are outlined below.

Define:


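$$\begin{aligned}
\mathbf{K}_1 &= h \times \mathbf{P}'(s, t) = h \times \mathbf{P}(s, t) \times \mathbf{\Lambda}(t) \\
\mathbf{K}_2 &= h \times \left\{ \mathbf{P}(s, t) + \tfrac{1}{2} \mathbf{K}_1 \right\} \times \mathbf{\Lambda}(t + \tfrac{1}{2} h) \\
\mathbf{K}_3 &= h \times \left\{ \mathbf{P}(s, t) + \tfrac{1}{2} \mathbf{K}_2 \right\} \times \mathbf{\Lambda}(t + \tfrac{1}{2} h) \\
\mathbf{K}_4 &= h \times \left\{ \mathbf{P}(s, t) + \mathbf{K}_3 \right\} \times \mathbf{\Lambda}(t + h)
\end{aligned} \qquad \text{(B.135)}$$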

leading to:

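$$\mathbf{P}(s, t + h) = \mathbf{P}(s, t) + \tfrac{1}{6} \mathbf{K}_1 + \tfrac{1}{3} \mathbf{K}_2 + \tfrac{1}{3} \mathbf{K}_3 + \tfrac{1}{6} \mathbf{K}_4 + O(h^5) \qquad \text{(B.136)}$$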

For any multiple state model, the transition intensities will form the fundamental

building blocks. So in almost all circumstances we will be able to define a set of dif-

ferential equations specifying the problem and numerical solutions can be computed

using the Runge-Kutta method.
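A sketch of the fourth-order Runge-Kutta step of Equations B.135 and B.136 is given below for a three-state model (healthy, ill, dead). The constant intensities, function names and step count are hypothetical; constant intensities are chosen only because the exact occupancy probabilities are then known and can be used as a check.

```python
import math


def intensity_matrix(t):
    """Hypothetical constant intensities: healthy (0), ill (1), dead (2)."""
    l01, l02, l12 = 0.02, 0.01, 0.05
    return [[-(l01 + l02), l01, l02],
            [0.0, -l12, l12],
            [0.0, 0.0, 0.0]]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def scale(X, w):
    return [[w * x for x in row] for row in X]


def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def rk4_step(P, t, h):
    K1 = scale(matmul(P, intensity_matrix(t)), h)
    K2 = scale(matmul(add(P, scale(K1, 0.5)), intensity_matrix(t + 0.5 * h)), h)
    K3 = scale(matmul(add(P, scale(K2, 0.5)), intensity_matrix(t + 0.5 * h)), h)
    K4 = scale(matmul(add(P, K3), intensity_matrix(t + h)), h)
    increment = add(add(scale(K1, 1 / 6), scale(K2, 1 / 3)),
                    add(scale(K3, 1 / 3), scale(K4, 1 / 6)))
    return add(P, increment)


def solve(s, t_end, n_steps):
    h = (t_end - s) / n_steps
    n = len(intensity_matrix(s))
    P = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for step in range(n_steps):
        P = rk4_step(P, s + step * h, h)
    return P


P = solve(0.0, 20.0, 80)
print(P[0])  # occupancy probabilities for someone healthy at time 0
```

For these intensities the exact values are $P_{00}(0, 20) = e^{-0.6}$ and $P_{01}(0, 20) = e^{-0.6} - e^{-1}$, and the numerical solution should agree with them closely.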

B.2 Random Numbers

B.2.1 Introduction

Generation of random numbers from a particular distribution forms one of the most

important tasks in a simulation exercise. This topic is covered in many textbooks

on numerical analysis. So this section is not meant to be an exhaustive discussion

on this topic. Rather the aim will be to provide a documentation of the methods

that we are going to use. For a fuller treatment of the topic, please refer to Press

et al. (2002).

In the next section, we will give a brief introduction to the generation of random

numbers from a uniform distribution. Then we will move on to other distributions

of interest, from which random numbers can be generated using suitable transfor-

mations. In the final section, we will outline a method that can be used for any

general continuous distribution.


B.2.2 Uniform Deviates

Standard libraries of all major programming languages provide random number

generators. In our case, we will concentrate primarily on C++, as all our programs

will be written in that programming language. C++ has inherited from the ANSI

C library a pair of routines, srand() and rand() for initialising and then generating

random numbers. The random number generator is initialised with a seed and

a sequence of random numbers can be generated based on that seed. Note that

the same initialising value of seed will always return the same sequence of random

numbers.

The rand() function of C++ is a linear congruential generator, which can generate

a sequence of integers I1, I2, . . . each between 0 and m−1 by the recurrence relation

Ij+1 = aIj + c (mod m). Here m is called the modulus, and a and c are positive

integers called the multiplier and the increment respectively. ANSI C requires that

m be at least 32768, which is nevertheless too small an integer for any large scale

simulation exercise.

In Press et al. (2002), there is detailed discussion on efficient routines for random

number generation, salient features of which are listed below.

ran0 This routine is a simple linear congruential generator, which is satisfactory

for the majority of applications. However, it is not recommended because of

the presence of subtle serial correlations.

ran1 The routine uses the same algorithm as ran0. However, it shuffles the output

to remove low-order serial correlation. The routine ran1 passes those statistical

tests that ran0 is known to fail. However, it is 30% slower than ran0. This

routine is recommended for general use.

ran2 The ran2 routine uses a long period random number generator with the shuf-

fle. It is recommended for generating more than 100,000,000 random numbers

in a single calculation, as it has a longer period than ran1. However, this

routine is only half as fast as ran0.

For our simulation exercise, we would need to generate a lot of random numbers.

So we will use ran2 for our simulation exercise. In Press et al. (2002), there is also


a discussion on ran4 which generates “extremely” good random deviates. However,

it is only half as fast as ran2 and we will not describe it here. Unlike rand() of C++

library which generates integers, ran0, ran1 and ran2 produce uniform random

deviates between 0.0 and 1.0 (exclusive of the endpoint values). Similar to the

rand() function all these random number generators require a seed to initiate the

sequence. If a seed is not provided, the seed will automatically be set to the time

of the machine clock.

B.2.3 The Transformation Method

In the last section, we have seen how we can generate uniform deviates using the

ran2 routine. Now we will see how we can use randomly generated uniform deviates

to produce random deviates from a specific distribution.

Let us first look at a simple discrete distribution — the Bernoulli distribution.

Let Y ∼ Ber(p), i.e. P [Y = 1] = p and P [Y = 0] = 1− p. The following steps can

be used to generate random deviates from this distribution.

(a) Generate a random deviate x from a U(0, 1) distribution.

(b) If x < p produce 1, else produce 0 as the required random deviate from Ber(p).

Random deviates from Bin(n, p) can be produced by adding n independent

Ber(p) random deviates.

Random deviates from the Multinomial(n, p1, p2, . . . , pn) can be generated as

follows:

(a) Generate a random deviate x from a U(0, 1) distribution.

(b) If $x \le p_1$ produce 1, else if $\sum_{j=1}^{k-1} p_j < x \le \sum_{j=1}^{k} p_j$ produce $k$ as the required random variate.
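The three discrete recipes above can be sketched as follows. Python's `random.Random` stands in for the uniform generator (the text uses ran2), and the function names are hypothetical.

```python
import random


def bernoulli(p, rng):
    """Return 1 with probability p, else 0."""
    return 1 if rng.random() < p else 0


def binomial(n, p, rng):
    """Sum of n independent Bernoulli(p) deviates."""
    return sum(bernoulli(p, rng) for _ in range(n))


def categorical(probs, rng):
    """Return k (1-based) when x falls in the k-th cumulative interval."""
    x = rng.random()
    cumulative = 0.0
    for k, p in enumerate(probs, start=1):
        cumulative += p
        if x <= cumulative:
            return k
    return len(probs)  # guards against rounding in the cumulative sum


rng = random.Random(12345)  # fixed seed, so the sequence is reproducible
print(bernoulli(0.3, rng), binomial(10, 0.3, rng),
      categorical([0.2, 0.5, 0.3], rng))
```

Only one uniform deviate is consumed per categorical draw, whatever the number of categories.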

For continuous distributions, let us first consider U(a, b), a simple generalisation

of U(0, 1). We know that if we define Y = a + (b − a)X where X ∼ U(0, 1), then

Y ∼ U(a, b). So if we generate x from U(0, 1) and define y = a + (b − a)x, then y

is a random deviate from U(a, b). So we see that a simple linear transformation of

the U(0, 1) produces random deviates from U(a, b) distribution.


The next distribution of interest is the exponential distribution, Exp(λ). Here we use the following transformation: Y = −(1/λ) log(1−X). If X ∼ U(0, 1), then Y ∼ Exp(λ).

Note that in both the examples above, we have made use of the fact that for any continuous random variable Y, F(Y) ∼ U(0, 1), where F(·) is the cumulative distribution function of the random variable Y. In other words, X = F(Y) ∼ U(0, 1). So, Y = F⁻¹(X) has the cumulative distribution function F(·).

A general method for producing random deviates from any random variable with cumulative distribution function F(·) requires the following steps:

(a) Generate a random deviate x from a U(0, 1) distribution.

(b) Find y, such that, F (y) = x.

(c) Produce y as a random deviate from F (·).
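The three steps can be sketched in two ways: with a closed-form inverse, as for the exponential distribution, and with a bisection search when the inverse is not available explicitly. The function names and the search bounds are assumptions, and `random.Random` again stands in for the uniform generator.

```python
import math
import random


def exponential_deviate(lam, rng):
    """Closed-form inverse of F(y) = 1 - exp(-lam * y)."""
    return -math.log(1.0 - rng.random()) / lam


def inverse_by_bisection(F, x, lo, hi, tol=1e-10):
    """Find y with F(y) = x, assuming F is continuous and non-decreasing
    and that F(lo) <= x <= F(hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if F(mid) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def exponential_cdf(y):
    return 1.0 - math.exp(-2.0 * y)


rng = random.Random(7)
x = rng.random()
print(inverse_by_bisection(exponential_cdf, x, 0.0, 50.0),
      -math.log(1.0 - x) / 2.0)  # the two inverses should agree
```

The bisection search needs many evaluations of F for each deviate, which is the cost that motivates tabulating F when it is expensive to compute.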

The result above can be used to generate random deviates from a general distri-

bution if the cumulative distribution function for that distribution can be inverted.

However, most distributions that we will be interested in rarely have a cumulative

distribution function that can be inverted easily. Of course, an iterative method can

be used to act as a substitute.

However, the algorithm above will not be efficient if F (y) is not easy to compute.

If this is the case then it is advisable to tabulate the values of F (y) at appropriately

short-spaced y’s, and use linear interpolation at intermediate points. Note that the

shorter the spacing between tabulated y’s the greater the accuracy but the larger

the space requirement.

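As an example, let us assume that we know the age-dependent transition intensity λ(x) for a particular hazard. Suppose we are interested in generating the waiting time T for an individual aged a to make the relevant transition. We know that the distribution function of T is given by

$$F(t) = 1 - \frac{\exp\left( -\int_{0}^{t} \lambda(s)\, ds \right)}{\exp\left( -\int_{0}^{a} \lambda(s)\, ds \right)}, \qquad \text{(B.137)}$$

where t ≥ a is measured on the age scale, so that the ratio equals $\exp\left( -\int_{a}^{t} \lambda(s)\, ds \right)$ and the waiting time from age a is t − a.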

Unless we have a very simple form for λ(·), we will have to perform numerical

integration each time we need F (t). As this is inefficient and time consuming, we

evaluate the values F (t1), F (t2), . . . where tj+1 = tj + δ, δ being a small positive

number, say 0.01, and then store these values for ready reference.


Now following the algorithm outlined above, generate a uniform random variate

x and find t such that F (t) = x. One can choose an efficient search algorithm which

can minimise the search for the correct t. As we are searching within a bounded

interval the Bisection method can be used. The t thus obtained gives us the required

waiting time.

There is an important point to note here. Many of the transition intensities

that we will be working with may not have the property F (∞) = 1. This means

that there is a probability 1 − F (∞) that an individual will not make a transition

at all. This can be taken into account by generating a Bernoulli random variate

Y ∼ Ber(F(∞)), where Y = 0 will indicate that the individual will never make

the transition and Y = 1 will indicate otherwise. So the above algorithm will only

be implemented if Y = 1, as searching for a value of t is only required if a transition

is made.
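The tabulation, search and Bernoulli steps can be combined as in the sketch below. Several assumptions are made for illustration: the intensity is a hypothetical bell-shaped function of age whose integral is finite, so that F(∞) < 1; F is expressed in terms of the waiting time t from age a, i.e. F(t) = 1 − exp(−∫ λ(s) ds) with the integral taken from a to a + t; the hazard beyond the tabulated horizon is treated as negligible; and, given that a transition occurs, the uniform deviate is rescaled to (0, F(∞)) before the search. The standard-library `bisect` module performs the bisection search on the table.

```python
import bisect
import math
import random


def intensity(age):
    """Hypothetical onset intensity, peaking at age 50."""
    return 0.02 * math.exp(-((age - 50.0) / 10.0) ** 2)


def tabulate_F(a, horizon, delta=0.01):
    """Tabulate F at waiting times 0, delta, 2 * delta, ..., horizon."""
    n = int(round(horizon / delta))
    times = [j * delta for j in range(n + 1)]
    F = [0.0]
    hazard = 0.0
    for j in range(1, n + 1):
        # Trapezium rule for the cumulative hazard over one grid interval.
        hazard += 0.5 * delta * (intensity(a + times[j - 1])
                                 + intensity(a + times[j]))
        F.append(1.0 - math.exp(-hazard))
    return times, F


def waiting_time(times, F, rng):
    """Return a waiting time, or None if the transition never happens."""
    F_inf = F[-1]                     # stands in for F(infinity)
    if rng.random() >= F_inf:         # Y = 0: no transition at all
        return None
    x = rng.random() * F_inf          # uniform on (0, F(infinity))
    j = bisect.bisect_left(F, x)      # first grid point with F[j] >= x
    if j == 0:
        return times[0]
    weight = (x - F[j - 1]) / (F[j] - F[j - 1])
    return times[j - 1] + weight * (times[j] - times[j - 1])


times, F = tabulate_F(a=30.0, horizon=70.0)
rng = random.Random(2007)
print(F[-1], [waiting_time(times, F, rng) for _ in range(5)])
```

The table is built once and reused for every simulated life, so each deviate costs only a binary search and one linear interpolation.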

For a Normal distribution, the cumulative distribution function is not easily

invertible. So a different transformation known as the Box-Muller transformation

is usually used to produce standard normal deviates. Consider the transformation

between two random deviates x1, x2 from U(0, 1) and two quantities y1, y2,

$$y_1 = \sqrt{-2 \ln x_1}\, \cos 2\pi x_2 \qquad \text{(B.138)}$$

$$y_2 = \sqrt{-2 \ln x_1}\, \sin 2\pi x_2 \qquad \text{(B.139)}$$

It can be shown that y1, y2 are independent random deviates from the N(0, 1)

distribution.
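Equations B.138 and B.139 can be sketched as follows. The function name is hypothetical, and `random.Random` stands in for the uniform generator. Because that generator can return exactly 0, the first deviate is taken as 1 minus the raw value so that the logarithm is always defined.

```python
import math
import random


def box_muller(rng):
    """Return two independent N(0, 1) deviates from two uniform deviates."""
    x1 = 1.0 - rng.random()   # lies in (0, 1], so log(x1) is finite
    x2 = rng.random()
    radius = math.sqrt(-2.0 * math.log(x1))
    angle = 2.0 * math.pi * x2
    return radius * math.cos(angle), radius * math.sin(angle)


rng = random.Random(42)
print(box_muller(rng))
```

Each call consumes two uniform deviates and yields two normal deviates, so no random numbers are wasted.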

B.2.4 The Rejection Method

The rejection method is a powerful, general technique for generating random de-

viates from a distribution whose density function p(·) is known and computable.

The rejection method does not require that the cumulative distribution function be

readily computable, much less the inverse of that function, which was required for

the transformation method described in the previous section.

The rejection method involves the following steps:

(a) Find a majorising function M(·), for which M(x) > p(x) for all x.


(b) Calculate the area $A$ under the majorising function $M(\cdot)$, i.e. $A = \int_{-\infty}^{\infty} M(s)\, ds$.

(c) Generate a random deviate $x_1$ from $U(0, A)$.

(d) Find $y$, such that $x_1 = \int_{-\infty}^{y} M(s)\, ds$.

(e) Generate a random variate x2 from U(0, 1).

(f) If x2 < p(y)/M(y), produce y as the required random deviate from the distri-

bution with density function p(·); otherwise return to step (c).
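Steps (a) to (f) can be illustrated on a simple case. The sketch below assumes a hypothetical target density p(x) = 6x(1 − x) on [0, 1] and a constant majorising function of height 1.6 on the same interval (the maximum of p is 1.5), for which the integral in step (d) inverts trivially. It is not the step-function construction developed in the remainder of this section.

```python
import random

M_HEIGHT = 1.6  # constant majorising function on [0, 1]; exceeds max p = 1.5


def p(x):
    """Hypothetical target density on [0, 1]."""
    return 6.0 * x * (1.0 - x) if 0.0 <= x <= 1.0 else 0.0


def rejection_deviate(rng):
    area = M_HEIGHT * 1.0                  # step (b): area under M
    while True:
        x1 = rng.uniform(0.0, area)        # step (c)
        y = x1 / M_HEIGHT                  # step (d): invert the integral of M
        x2 = rng.random()                  # step (e)
        if x2 < p(y) / M_HEIGHT:           # step (f): accept, else try again
            return y


rng = random.Random(3)
print([rejection_deviate(rng) for _ in range(5)])
```

The expected number of trials per accepted deviate equals the area A, here 1.6, which is why a majorising function that hugs p(·) closely is preferable.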

As we have already seen how to generate uniform random deviates, the main

issue here is to obtain an appropriate majorising function. There are many different

ways one can define a majorising function and suitability of the majorising function

will also depend on the shape of p(·). Also, apart from the fact that the M(·) needs

to have the property that M(x) > p(x) for all x, it should also be easy to invert $\int_{-\infty}^{y} M(s)\, ds$. Here, we will propose a general method of producing the majorising function for any density function p(·).

Our aim will be to find a step function M(x) which will provide an upper envelope

for p(x). For this, we need to start off from any x = x0, such that 0 < p(x0) < ∞.

Given x0, we move on to x1, such that M(x) on this interval is a constant and

exceeds p(x) for all x in that interval and the area under M(x) does not exceed a

pre-specified positive number. Once p(x) becomes smaller than a set tolerance level,

it is assumed that the tail of the distribution is reached and is approximated by an

exponential function. These steps are followed on both sides of x0 to +∞ and −∞.

The full algorithm is outlined below.

First find $x = x_0$ such that $0 < p(x_0) < \infty$. This will be the starting point for setting our majorising function M(·). We set $M(x_0) = p(x_0)$. Now our algorithm will set the values of M(·), first for $x > x_0$ and then for $x < x_0$. At each step it will be required to calculate p′(x), for which any simple numerical differentiation method can be used.

So for $x > x_0$ do the following:

1. Find $x_{+n}$ from $x_{+(n-1)}$, $n = 1, 2, \ldots$, so that the area $|x_{+n} - x_{+(n-1)}| \times p(x_{+(n-1)})$ equals a pre-defined small value δ > 0.

2. Depending on the values of $p'(x_{+(n-1)})$ and $p'(x_{+n})$ do one of the following:

(a) If $p'(x_{+(n-1)}) < 0$ and $p'(x_{+n}) < 0$, then set $M(x_{+(n-1)}) = p(x_{+(n-1)})$.

(b) If $p'(x_{+(n-1)}) > 0$ and $p'(x_{+n}) > 0$, then set $M(x_{+(n-1)}) = p(x_{+n})$.

(c) If $p'(x_{+(n-1)}) > 0$ and $p'(x_{+n}) < 0$, then set $M(x_{+(n-1)})$ as the minimum of the following two terms:

• $p(x_{+(n-1)}) + |x_{+n} - x_{+(n-1)}| \times p'(x_{+(n-1)})$

• $p(x_{+n}) + |x_{+n} - x_{+(n-1)}| \times p'(x_{+n})$.

(d) Else set $M(x_{+(n-1)})$ as the maximum of $p(x_{+(n-1)})$ and $p(x_{+n})$.

3. If $p(x_{+n}) < \varepsilon$ for a pre-specified small ε > 0, then

(a) If $p'(x_{+(n-1)}) < 0$, set

$$ M(x) = p(x_{+n}) \times e^{-(x - x_{+n})}, \qquad x > x_{+n} \tag{B.140} $$

(b) If $p'(x_{+(n-1)}) > 0$, set

$$ M(x) = \begin{cases} p(x_{+(n-1)}) \times e^{(x - x_{+(n-1)})} & x_{+(n-1)} < x \le x_{+n} \\ 0 & x > x_{+n} \end{cases} \tag{B.141} $$

and stop. Else continue.
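
For a density that is decreasing to the right of the starting point, only steps 1, 2(a) and 3(a) are ever triggered, and the construction reduces to a short loop. The following Python sketch covers that special case only, using the Exp(1) density. The function names, the tolerance `eps = 1e-6` and the check at the end are assumptions made for this illustration; the general algorithm also needs a numerical derivative to select among cases (a) to (d).

```python
import bisect
import math


def p(x: float) -> float:
    """Exp(1) density."""
    return math.exp(-x) if x >= 0.0 else 0.0


def build_right_envelope(x0: float, delta: float, eps: float):
    """Steps 1, 2(a) and 3(a) for a density that is decreasing on x > x0.

    Returns the break points x_{+0}, ..., x_{+n} and the constant heights of M
    on each interval [x_{+(k-1)}, x_{+k}); beyond x_{+n} the tail (B.140) applies.
    """
    points = [x0]
    heights = []
    while True:
        left = points[-1]
        heights.append(p(left))                 # step 2(a): M = p at the left end
        points.append(left + delta / p(left))   # step 1: width * p(left) = delta
        if p(points[-1]) < eps:                 # step 3: the tail has been reached
            return points, heights


def envelope(x: float, points, heights) -> float:
    """Evaluate M(x) for x >= points[0]: the steps first, then the tail (B.140)."""
    if x > points[-1]:
        return p(points[-1]) * math.exp(-(x - points[-1]))
    k = min(bisect.bisect_right(points, x) - 1, len(heights) - 1)
    return heights[k]


points, heights = build_right_envelope(x0=1.0, delta=0.10, eps=1e-6)
areas = [(points[k + 1] - points[k]) * heights[k] for k in range(len(heights))]
print(f"{len(heights)} steps, last break point at x = {points[-1]:.2f}")
```

Because every step has area δ, the steps become wider as the density falls, and the last step before the tail can be very wide.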

Similarly for $x < x_0$ do the following:

1. Find $x_{-n}$ from $x_{-(n-1)}$, $n = 1, 2, \ldots$, so that the area $|x_{-n} - x_{-(n-1)}| \times p(x_{-(n-1)})$ equals a pre-defined small value δ > 0.

2. Depending on the values of $p'(x_{-(n-1)})$ and $p'(x_{-n})$ do one of the following:

(a) If $p'(x_{-(n-1)}) < 0$ and $p'(x_{-n}) < 0$, then set $M(x_{-n}) = p(x_{-n})$.

(b) If $p'(x_{-(n-1)}) > 0$ and $p'(x_{-n}) > 0$, then set $M(x_{-n}) = p(x_{-(n-1)})$.

(c) If $p'(x_{-(n-1)}) < 0$ and $p'(x_{-n}) > 0$, then set $M(x_{-n})$ as the minimum of the following two terms:

• $p(x_{-(n-1)}) + |x_{-n} - x_{-(n-1)}| \times p'(x_{-(n-1)})$

• $p(x_{-n}) + |x_{-n} - x_{-(n-1)}| \times p'(x_{-n})$.

(d) Else set $M(x_{-n})$ as the maximum of $p(x_{-(n-1)})$ and $p(x_{-n})$.

3. If $p(x_{-n}) < \varepsilon$ for a pre-specified small ε > 0, then

(a) If $p'(x_{-(n-1)}) > 0$, set

$$ M(x) = p(x_{-n}) \times e^{(x - x_{-n})}, \qquad x < x_{-n} \tag{B.142} $$

(b) If $p'(x_{-(n-1)}) < 0$, set

$$ M(x) = \begin{cases} p(x_{-(n-1)}) \times e^{-(x - x_{-(n-1)})} & x_{-n} < x \le x_{-(n-1)} \\ 0 & x < x_{-n} \end{cases} \tag{B.143} $$

and stop. Else continue.

For values of x where M(·) is not defined above, define M(x) = M(y) where y is

the largest value less than x for which M(·) is defined.

It is easy to verify that the integral of M(·) defined above is easily invertible, and that M(x) > p(x) wherever x does not belong to the tail region. For the tails, we will assume that the scaled exponential function majorises p(x). The exponential approximation of the tails is satisfactory for most distributions which will be of interest to us. However, this approach is not adequate for dealing with distributions with fat tails.
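
To illustrate why the step part of the majorising function is easy to invert, the sketch below solves $x_1 = \int M(s)\,ds$ for a step function by accumulating the areas of the steps and interpolating linearly inside the step that contains $x_1$. The function name and the three-step example are hypothetical; the exponential tails, which can be inverted in closed form, are left out.

```python
import bisect
from itertools import accumulate


def invert_step_area(x1: float, points: list[float], heights: list[float]) -> float:
    """Return y such that the area under the step function on [points[0], y] is x1.

    The step function equals heights[k] > 0 on [points[k], points[k + 1]).
    """
    widths = [points[k + 1] - points[k] for k in range(len(heights))]
    cumulative = list(accumulate(w * h for w, h in zip(widths, heights)))
    if not 0.0 <= x1 <= cumulative[-1]:
        raise ValueError("x1 must lie between 0 and the total area")
    # Index of the step in which the cumulative area first exceeds x1.
    k = min(bisect.bisect_right(cumulative, x1), len(heights) - 1)
    area_before = cumulative[k - 1] if k > 0 else 0.0
    # Within a step the area grows linearly, at rate heights[k].
    return points[k] + (x1 - area_before) / heights[k]


# Hypothetical three-step function with areas 0.5, 0.5 and 0.25.
points = [0.0, 1.0, 3.0, 4.0]
heights = [0.5, 0.25, 0.25]
y = invert_step_area(0.75, points, heights)
print(y)  # 2.0, since the area on [0, 2] is 0.5 + 1 * 0.25 = 0.75
```

The search uses cumulative areas rather than assuming that every step has area δ, because in cases (b) to (d) the height of M on an interval can exceed the value of p used to fix its width.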

Now that we have obtained M(·) for a general p(·), we can use the rejection method to generate random deviates from the general distribution with density function p(·).

As an example, let us consider the Exp(1) distribution. If we start from $x_0 = 1$ and set δ = 0.10, Figure B.34 shows how the majorising function M(x) will provide an upper envelope for the exponential density p(x). Now if we change δ to 0.01, the new majorising function M(x) is given in Figure B.35. Clearly, with δ = 0.01, the majorising function is a very close approximation of the Exp(1) density function.

Figure B.34: The Exp(1) density and the majorising function with δ = 0.10. [Plot of density against x for 0 ≤ x ≤ 6; curves: majorising function, Exponential(1) density.]

The important point to note here is that the only difference between simulating random deviates with δ = 0.10 and with δ = 0.01 lies in the efficiency of the method. It is quicker to compute the majorising function if δ is large. However, this might mean generating a significantly larger number of uniform deviates to get a single random deviate from the target distribution. On the other hand, a small δ means a significant amount of time spent on computing M(x), but more efficiency is achieved in terms of the actual generation of random deviates. But since M(x) need only be computed once, the following rule of thumb can be used: to generate a large number of random deviates, use a small δ.
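
The trade-off can be quantified through the area A under the majorising function: since p(·) integrates to one, a proposal is accepted with probability 1/A, so on average A proposals (and 2A uniform deviates) are needed per accepted deviate. The sketch below computes A for the Exp(1) density under simplifying assumptions that are not taken from the thesis: the envelope is built to the right of $x_0 = 0$ only, the tolerance is ε = 10⁻⁶, and the function name is hypothetical.

```python
import math


def envelope_area(delta: float, x0: float = 0.0, eps: float = 1e-6) -> tuple[int, float]:
    """Number of steps and total area A of the envelope for Exp(1) on x >= x0."""
    x, steps = x0, 0
    while math.exp(-x) >= eps:
        x += delta / math.exp(-x)   # each step has height exp(-x) and area delta
        steps += 1
    tail_area = math.exp(-x)        # integral of the tail (B.140) from x to infinity
    return steps, steps * delta + tail_area


for delta in (0.10, 0.01):
    steps, area = envelope_area(delta)
    print(f"delta = {delta:.2f}: {steps} steps, A = {area:.3f}, "
          f"acceptance probability = {1.0 / area:.3f}")
```

A smaller δ gives more steps to compute and store, but an area closer to one and therefore fewer rejected proposals.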

For the N(0, 1) distribution, a similar exercise leads to the majorising functions

given in Figures B.36 and B.37.

The density estimates based on the simulated 50,000 random deviates obtained

from the Exp(1) and N(0, 1) distributions using the Rejection method with δ = 0.01

are given in Figures B.38 and B.39 respectively.

Figure B.35: The Exp(1) density and the majorising function with δ = 0.01. [Plot of density against x for 0 ≤ x ≤ 6; curves: majorising function, Exponential(1) density.]

Figure B.36: The N(0, 1) density and the majorising function with δ = 0.10. [Plot of density against x for −4 ≤ x ≤ 4; curves: majorising function, Normal(0,1) density.]

Figure B.37: The N(0, 1) density and the majorising function with δ = 0.01. [Plot of density against x for −4 ≤ x ≤ 4; curves: majorising function, Normal(0,1) density.]

Figure B.38: Density estimates based on the simulated 50,000 random deviates from Exp(1). [Plot of density against x for 0 ≤ x ≤ 6.]

Figure B.39: Density estimates based on the simulated 50,000 random deviates from N(0, 1). [Plot of density against x for −4 ≤ x ≤ 4.]
