
Development Of A Software Tool For Risk Assessment In Genetic Disorders

Yusman Yusof

MSc. in Control Systems Engineering: August 2001


ABSTRACT

This dissertation describes the development of a prototype software tool that automatically calculates risks for diseases with autosomal recessive inheritance. The process involves understanding genetic risk calculation as well as software design and development. To calculate genetic risk, genetic counsellors consider a variety of data, including family history and disease characteristics. However, given the large amount of data involved, manual processing is not only tedious but also error prone.

The key element of the project is the translation of the manual process of risk calculation into program logic. The laws of probability, together with Bayesian conditional probability, are used for the genetic risk calculation. The software was developed in Java, while the database design is based on the relational database model. This approach was chosen to make the software platform independent, so that it can later be upgraded to a web-based client-server system.

The current software provides a general framework that allows further development of automatic risk assessment software. The next phase of development should concentrate on risk calculation for consanguinity cases, on the posterior probability of a person being affected after an individual has been screened, and on similar extensions. In addition, utility features should be included to make the software a better supporting tool for genetic risk assessment.


EXECUTIVE SUMMARY

INTRODUCTION/BACKGROUND

Genetic counsellors begin the risk assessment process by collecting a variety of information, including family history and disease characteristics. This information is then analysed and the risk for various genetic disorders is calculated. Given the large amount of information involved, the manual process of risk calculation is not only tedious but also error prone.

AIMS AND OBJECTIVES

The aim of the project is to develop a software tool that automatically calculates the risk for autosomal recessive disorders.

ACHIEVEMENTS

The software developed so far is considered a prototype because it only supports risk calculations for certain cases of autosomal recessive disorders.

CONCLUSIONS AND RECOMMENDATIONS

The current prototype provides a general framework that allows further development of automatic risk assessment software. The prototype will evolve into fully functional software as enhancements and improvements are made.


ACKNOWLEDGEMENTS

Upon the completion of my dissertation, I would like to thank my project supervisor, Dr. R. F. Harrison, who has been generous with his time and ideas. I have been most influenced by his ideas.

In addition, I would like to thank Dr. Ann Dalton for her exhaustive introduction to medical genetics, especially the practical side of genetic counselling and risk calculation.

Finally, I owe a great debt to my family and friends, for reminding me why it all matters, and to my parents, for sheltering me from the travails of the real world and surrounding me with so much love, support and hope.

TABLE OF CONTENTS

Chapter 1 - Introduction
1.1. A brief introduction
1.2. Structure of the dissertation
1.3. Related work

Chapter 2 - Risk Assessment
2.1. Genetic risk: a brief introduction
2.2. The estimation of risks
2.2.1. Laws of probability
2.2.2. Bayes’ theorem
2.2.3. Risk calculation: Autosomal recessive inheritance

Chapter 3 - Software Development
3.1. Evolutionary development
3.2. Software Design
3.2.1. System Overview
3.2.2. Sub-system modules
3.3. Essential program logic
3.3.1. Risk Calculation
3.3.2. Drawing pedigree: Connecting elements
3.3.3. Data structure

Chapter 4 - Automatic risk calculation
4.1. Software test results
4.1.1. Risk to the offspring of a healthy sibling
4.1.2. Risks to the extended family
4.1.3. Risk to the offspring of an affected homozygote

Chapter 5 - Conclusions and Recommendations
5.1. Conclusions
5.2. Further Development and enhancement
5.2.1. Risk calculation
5.2.2. Software enhancement

Bibliography

LIST OF FIGURES

Figure 2.1 Symbols used in drawing a pedigree
Figure 2.2 Autosomal recessive inheritance: carrier risk to the healthy sibling for someone with an autosomal recessive disorder
Figure 2.3 Carrier risk using Hardy-Weinberg equilibrium
Figure 2.4 Risk to the healthy sibling of someone with cystic fibrosis for having an affected child (Disease incidence is assumed to be 1/2500)
Figure 2.5 The probability of being a carrier in relatives of someone with an autosomal recessive disorder
Figure 2.6 Risk to extended family (Disease incidence is assumed to be 1/2500)
Figure 2.7 Risk to the offspring of someone with an autosomal recessive disorder who marries a healthy unrelated individual
Figure 3.1 The system overview
Figure 3.2 The sub-system component for computing genetic risk
Figure 3.3 Graphics elements properties: Basic symbols used in pedigree to represent individuals
Figure 3.4 Graphics elements properties: Connecting children and parent
Figure 3.5 Pedigree and its data structure representation
Figure 4.1 Software output (Risk to the offspring of a healthy sibling): Carrier risk values
Figure 4.2 Software output (Risk to the offspring of a healthy sibling): Risk of being affected for individual 6
Figure 4.3 Software output (Risks to the extended family): Carrier risk values
Figure 4.4 Software output (Risks to the extended family): Risk of being affected for individual 5
Figure 4.5 Software output (Risks to the extended family): This relates to the same situation as in Figure 4.4, but the ‘at risk’ half sibling already has two healthy full siblings
Figure 4.6 Software output (Risks to the extended family): Carrier risk probability for various members of extended family
Figure 4.7 Software output (Risk to the offspring of an affected homozygote): Risk to the offspring of someone with an autosomal recessive disorder
Figure 4.8 Software output (Risk to the offspring of an affected homozygote): Carrier risk probability for various members of extended family
Figure 4.9 Software output (Risk to the offspring of an affected homozygote): Risk to the offspring of someone with an autosomal recessive disorder who marries a healthy first cousin
Figure 5.1 Healthy parents have two children each with a different autosomal recessive disorder
Figure 5.2 Database connection and implementation architecture
Figure 5.3 Database design: Entity relationship diagram for the system database tables (refer to Elmasri and Navathe [16] for information on the ER diagram notation)

LIST OF TABLES

Table 2.1 Hardy-Weinberg equilibrium: disease frequency
Table 5.1 Database design: Individual table with attributes
Table 5.2 Database design: Pedigree table with attributes
Table 5.3 Database design: Marriage table with attributes


Chapter 1 - Introduction

1.1. A brief introduction

This dissertation describes the development of a prototype software tool that automatically calculates risks for diseases with autosomal recessive inheritance. The process involves understanding genetic risk calculation as well as software design and development. To compute genetic risk, genetic counsellors consider a variety of data, including family history and disease characteristics. However, processing all of this information manually is a tedious and error-prone task.

The software uses the same kind of reasoning that people use to compute risk. The key difference is that, while people can easily be overwhelmed by the amount of work involved, the program is able to carry out a complete analysis. The availability of user-friendly automatic risk assessment software will be a valuable aid to genetic counsellors.

The project is centred on translating the manual process of calculating genetic risk into program logic. The software is designed using an object-oriented approach and the program is written in Java, while the database design is based on the relational database model. This approach was chosen to make the software platform independent, so that it could later be enhanced into a web-based client-server system.


1.2. Structure of the dissertation

This section briefly describes the contents of this dissertation and the order in which the information is presented and discussed.

Chapter 1 introduces the project and reviews previous work on the problem.

Chapter 2 introduces probability theory and outlines the details of genetic risk assessment.

Chapter 3 reviews the steps taken in developing the software and includes the software design.

Chapter 4 compares the risk assessment results produced by the software with manual calculations.

Chapter 5 draws conclusions from the project work, describes the important features that still need to be added to the software and suggests enhancements for future use.


1.3. Related work

Pathak and Perlin [4] describe software for genetic counselling that makes use of patients' DNA linked-marker data. Although this practice increases confidence in the risk assessment, Young [3] notes:

The increasing use of linked DNA markers and the availability of DNA mutation analysis often serve to complicate, rather than simplify, risk calculations which require careful consideration and a relatively high level of numerical competence if the provision of incorrect information is to be avoided.

The use of DNA markers is more complex and is at odds with the real situation during information gathering in genetic counselling, where in practice it is almost impossible to obtain a person's DNA information. Family members may be reluctant to be examined, the individuals involved may be dead with no DNA information available, or the data collection may involve distant relatives who are difficult to test. Therefore, for simplicity, the software considers only classical data, which is already sufficient for genetic risk assessment and has proven to be reliable.

A computer program, RISK-XLR, which calculates the genetic risk of carrier status for an X-linked recessive condition, has been written by Rivas and Martens [7]. Although the purpose of this project is to develop automatic risk assessment for autosomal recessive conditions, the technique used is very similar because both X-linked and autosomal recessive conditions are Mendelian disorders. The technique incorporates family information and carrier test results and uses Bayesian conditional probability. The implementation of Bayesian conditional probability in RISK-XLR provides the groundwork for the programming logic of the software developed in this project.

An essential feature of automatic risk assessment software is the ability to link the collected data properly; in other words, the collected data are to be represented by a pedigree. A pedigree is a diagram that helps genetic counsellors to visualize family information (number of sons, daughters and granddaughters with normal sons, relatives, etc.). Previous work for this purpose has been done by Damme [5]. He suggested that each patient in the patient database be given a unique identification number. To link patient records, additional tables have to be created in which information such as parent-child relationships is stored. His method of linking the patient database for genetic counselling purposes leads to the design of the data structure and database for the automatic risk assessment software.
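The linking idea can be illustrated with a minimal in-memory sketch. It is not Damme's schema or the data structure of the software described here; the class name `PedigreeLinks` and all of its methods are hypothetical, and the two maps merely stand in for a patient table keyed by a unique identification number and an additional table of parent-child links.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory analogue of a patient table plus a parent-child link table. */
class PedigreeLinks {
    // "Patient table": unique identification number -> label for the individual.
    private final Map<Integer, String> individuals = new LinkedHashMap<>();
    // "Link table": child id -> ids of that child's recorded parents.
    private final Map<Integer, List<Integer>> parentsOf = new LinkedHashMap<>();

    void addIndividual(int id, String label) {
        if (individuals.putIfAbsent(id, label) != null) {
            throw new IllegalArgumentException("Duplicate id: " + id);
        }
    }

    void linkParentChild(int parentId, int childId) {
        if (!individuals.containsKey(parentId) || !individuals.containsKey(childId)) {
            throw new IllegalArgumentException("Unknown individual");
        }
        parentsOf.computeIfAbsent(childId, k -> new ArrayList<>()).add(parentId);
    }

    /** Returns the recorded parents of a child (empty if none are recorded). */
    List<Integer> parents(int childId) {
        return new ArrayList<>(parentsOf.getOrDefault(childId, new ArrayList<>()));
    }

    /** Returns every individual that lists parentId as a parent. */
    List<Integer> children(int parentId) {
        List<Integer> result = new ArrayList<>();
        for (Map.Entry<Integer, List<Integer>> entry : parentsOf.entrySet()) {
            if (entry.getValue().contains(parentId)) {
                result.add(entry.getKey());
            }
        }
        return result;
    }
}
```

Keeping the relationships in a separate structure, rather than inside each individual's record, mirrors the suggestion of additional link tables and lets a pedigree be traversed in either direction.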


Chapter 2 - Risk Assessment

2.1. Genetic risk: a brief introduction

To understand genetic risk, one must first know where the problem arises. The following is a definition of genetic counselling by Harper in his valuable book Practical Genetic Counselling (1999) (see Bibliography).

Genetic counselling is the process by which patients or relatives at risk of a disorder that may be hereditary are advised of the consequences of the disorder, the probability of developing or transmitting it and of the ways in which this may be prevented, avoided or ameliorated [6].

A second definition, by Young, describes genetic risk as follows.

Genetic risk refers to the probability that patients may either develop a genetic disorder or transmit a genetic disease to their children [3].

Comparing these two definitions, it is clear that genetic risk assessment is a sub-process of genetic counselling. Genetic counselling is the process in which genetic counsellors collect a variety of data, including family history and disease characteristics. From this information genetic counsellors calculate the genetic risk, i.e. they determine the likelihood of a person inheriting the disease or transmitting it to others in the family.


To assist genetic counsellors, a special diagram, the pedigree, is used to represent the gathered family information. The symbols used in a pedigree are illustrated in Figure 2.1.

Figure 2.1 Symbols used in drawing a pedigree

[Figure 2.1 legend: male; female; mating; parents and 1 boy, 1 girl in birth order; dizygotic (two-egg) twins; monozygotic (one-egg) twins; sex unspecified; number of children of sex indicated; affected individuals; heterozygotes for autosomal recessive; carrier of sex-linked recessive; death; abortion or stillbirth (sex unspecified); propositus; method of identifying persons in a pedigree (here the propositus is child 2 in generation II, or II.2); consanguineous marriage (marriage of blood relatives).]

Having collected the information and transformed it into a pedigree representation, genetic counsellors proceed with the risk assessment and finally advise the counselee on the outcome.

2.2. The estimation of risks

The correct estimation and interpretation of risk is important in genetic counselling; it is comparable to a doctor prescribing the correct dosage of medicine to a patient. An incorrect or inaccurate estimation or interpretation could cause counselees considerable distress.

Problems in medical genetics do not have an absolute ‘yes’ or ‘no’ answer; information associated with genetic risk is expressed entirely in terms of probabilities and odds. The following sub-sections discuss the details of genetic risk calculation.

2.2.1. Laws of probability

The starting point in genetic risk calculation is a grasp of the basic laws of probability. Two basic laws are used in genetic risk calculation: the law of addition and the law of multiplication.

2.2.1.1. Law of addition

In probability, this law applies when no two events have any outcomes in common; in other words, when the events are mutually exclusive. The probability of the union of such events is obtained by summing the probabilities of the individual events. For example, if the probability that a newborn child is male equals 0.5 and the probability that a newborn child is female equals 0.5, then the probability that the newborn will be either male or female equals 0.5 + 0.5 = 1.0.

2.2.1.2. Law of multiplication

This law applies to independent events, i.e. events for which the occurrence of one does not affect the probability of the other. For example, if a pregnant woman is expecting dizygotic twins, the probability that both twins are male equals the product of the probabilities that each twin is male, i.e. ½ × ½ = ¼.
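The two laws can be written as a pair of small helper functions. This is only an illustrative sketch: the class `ProbabilityLaws` and its method names are hypothetical, and the caller is assumed to pass events that really are mutually exclusive (for `union`) or independent (for `joint`).

```java
/** Hypothetical helpers illustrating the laws of addition and multiplication. */
class ProbabilityLaws {
    /** Law of addition: probability that any one of several mutually exclusive events occurs. */
    static double union(double... mutuallyExclusive) {
        double sum = 0.0;
        for (double p : mutuallyExclusive) {
            sum += p;
        }
        return sum;
    }

    /** Law of multiplication: probability that all of several independent events occur. */
    static double joint(double... independent) {
        double product = 1.0;
        for (double p : independent) {
            product *= p;
        }
        return product;
    }
}
```

With the figures used above, `ProbabilityLaws.union(0.5, 0.5)` gives 1.0 for a newborn being either male or female, and `ProbabilityLaws.joint(0.5, 0.5)` gives 0.25 for both dizygotic twins being male.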

2.2.2. Bayes’ theorem

In genetic risk calculation, Bayes’ theorem offers a method for considering all possibilities or events and then modifying the probability of each by incorporating information that sheds light on which is most likely (Young, 1999). In practice the theorem provides an extremely effective means of quantifying genetic risks because it shows how new information can properly be used to update or revise an existing set of risks.

Using the gathered family data and disease characteristics, the initial or ‘prior’ probability of each event, such as being a carrier or not being a carrier, is calculated. This calculation is based on ‘anterior’ information taken from the ancestral family history. The probability is then modified by conditional probabilities derived from ‘posterior’ information, which is based on observations such as the results of carrier tests. The use of ‘posterior’ information to modify the ‘prior’ probability is known as Bayes’ theorem.

The following steps describe the details of Bayes’ theorem. If

1. P(A) is the prior probability of an event A occurring,

2. P(NA) is the prior probability of event A not occurring,

3. P(O|A) is the conditional probability of observation O occurring if A occurs, and

4. P(O|NA) is the conditional probability of observation O occurring if A does not occur,

then the overall probability of event A given that O is observed equals

$$P(A \mid O) = \frac{P(A) \times P(O \mid A)}{[P(A) \times P(O \mid A)] + [P(NA) \times P(O \mid NA)]}$$

Or, in tabular form:

Probability               Event A occurs      Event A does not occur
Prior                     P(A)                P(NA)
Conditional (O occurs)    P(O|A)              P(O|NA)
Joint                     P(A) × P(O|A)       P(NA) × P(O|NA)


The posterior probability of event A occurring equals

$$P(A \mid O) = \frac{P(A) \times P(O \mid A)}{[P(A) \times P(O \mid A)] + [P(NA) \times P(O \mid NA)]}$$

The posterior probability of event A not occurring equals

$$P(NA \mid O) = \frac{P(NA) \times P(O \mid NA)}{[P(A) \times P(O \mid A)] + [P(NA) \times P(O \mid NA)]}$$

Example 2.1

Figure 2.2 Autosomal recessive inheritance: carrier risk to the

healthy sibling for someone with an autosomal recessive

disorder

In Figure 2.2 the woman’s brother is affected with a rare autosomal recessive disease. A carrier detection test is available; assume that the mutation screen picks up 80 per cent of the disease mutations.


The Bayesian calculation of the carrier risk for the woman is:

Probability                         Carrier                  Not a carrier
Prior                               2/3                      1/3
Conditional: normal on screening    0.20                     1.0
Joint                               2/3 × 0.20 = 0.1333      1/3 × 1.0 = 0.3333
Posterior                           0.1333 / (0.1333 + 0.3333) = 0.2857

The above calculation shows that the woman’s posterior risk of being a carrier is reduced from its ‘prior’ value of 2/3 (0.6667) to 0.2857. This example illustrates the importance of Bayesian reasoning in genetic risk calculation.
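The Bayesian update in Example 2.1 can be reproduced with a few lines of Java. This is a minimal sketch of the formula above, not the code of the software described in this dissertation; the class `BayesCarrierRisk` and the method `posterior` are hypothetical names, and P(NA) is taken to be 1 − P(A).

```java
/** Hypothetical helper applying Bayes' theorem to two complementary events. */
class BayesCarrierRisk {
    /**
     * Posterior probability of event A given observation O.
     *
     * @param priorA       P(A), the prior probability of A; P(NA) is taken as 1 - P(A)
     * @param obsGivenA    P(O|A), probability of the observation if A occurs
     * @param obsGivenNotA P(O|NA), probability of the observation if A does not occur
     */
    static double posterior(double priorA, double obsGivenA, double obsGivenNotA) {
        double jointA = priorA * obsGivenA;                // P(A) x P(O|A)
        double jointNotA = (1.0 - priorA) * obsGivenNotA;  // P(NA) x P(O|NA)
        double total = jointA + jointNotA;
        if (total == 0.0) {
            throw new IllegalArgumentException("Observation has zero probability");
        }
        return jointA / total;
    }

    public static void main(String[] args) {
        // Example 2.1: prior 2/3, the screen misses 20% of carriers, non-carriers always test normal.
        double risk = posterior(2.0 / 3.0, 0.20, 1.0);
        System.out.println("Posterior carrier risk: " + risk); // approximately 0.2857
    }
}
```

Passing the conditional probabilities as plain parameters keeps the function independent of any particular test; a different screen sensitivity only changes the second argument.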

2.2.3. Risk calculation: Autosomal recessive inheritance

Autosomal recessive inheritance is a mode of disease inheritance from one individual to another in a family. Put simply, a person may inherit the disease when a family member is affected by it and there are carriers in the family who can transmit it; that is, autosomal recessive diseases are transmitted and inherited through family relationships.


In general, the calculation of genetic risk involves investigating the members of a pedigree. It is a process of identifying obligatory and possible carriers, as well as individuals who are affected or could potentially be affected by a disease. These conditions are represented by the carrier-risk and affected-risk probability values that each individual holds.

2.2.3.1. Carrier detection

One of the major tasks in risk calculation is to identify those individuals who, while apparently healthy themselves, have a high risk of transmitting a genetic disorder. There are two types of carriers: obligatory and possible. Obligatory carriers for autosomal recessive disorders include all children and parents of an affected individual, while other relatives are possible carriers.

Identifying carriers in a family is important because the risk of someone being a carrier determines the carrier risk of others in the family. To calculate the probability that someone will be affected by a disease, the carrier risks of both parents are multiplied together and then multiplied by 1/4, the chance that both parents pass on the gene if they are carriers. The details of the carrier risk calculation are explained in the following sections.

2.2.3.2. Population risk: Hardy-Weinberg equilibrium

The Hardy-Weinberg equilibrium is a fundamental principle and is often used in risk calculation (the basis of the Hardy-Weinberg equilibrium and the variations from it are covered fully in genetics textbooks and are not given here). This concept is used to estimate the disease frequency for a population, and it enables the carrier frequency of an autosomal recessive disorder to be determined if the disease incidence is known. Thus, if a population is in Hardy-Weinberg equilibrium:

Phenotype                                                Frequency
Abnormal homozygotes (= disease frequency, affected)     q²
Normal homozygotes                                       p²
Heterozygotes (carriers)                                 2pq

p = frequency of the normal gene; q = frequency of the abnormal gene; p + q = 1

Table 2.1 Hardy-Weinberg equilibrium: disease frequency

In risk calculation, the carrier risk for someone with no family history equals

2pq, provided that the partner is unrelated.

Example 2.2

Figure 2.3 Carrier risk using Hardy-Weinberg equilibrium

Referring to Figure 2.3, the population is assumed to be in Hardy-Weinberg equilibrium; therefore the carrier risk of individual 1 equals 2pq.
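As a sketch of how the population carrier risk follows from Table 2.1, the snippet below derives q from the disease incidence q² and returns 2pq. The class `HardyWeinberg` and its method names are hypothetical, and the population is assumed to be in Hardy-Weinberg equilibrium.

```java
/** Hypothetical helper for population carrier risk under Hardy-Weinberg equilibrium. */
class HardyWeinberg {
    /** q, the frequency of the abnormal gene, from the disease incidence q^2. */
    static double abnormalGeneFrequency(double diseaseIncidence) {
        if (diseaseIncidence < 0.0 || diseaseIncidence > 1.0) {
            throw new IllegalArgumentException("Incidence must lie between 0 and 1");
        }
        return Math.sqrt(diseaseIncidence);
    }

    /** Carrier (heterozygote) frequency 2pq, with p = 1 - q. */
    static double carrierFrequency(double diseaseIncidence) {
        double q = abnormalGeneFrequency(diseaseIncidence);
        double p = 1.0 - q;
        return 2.0 * p * q;
    }
}
```

For a disease incidence of 1/2500, the cystic fibrosis figure assumed in this chapter, `HardyWeinberg.carrierFrequency(1.0 / 2500.0)` gives q = 0.02, p = 0.98 and a carrier risk of about 0.0392.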

2.2.3.3. Risks to the offspring of a healthy sibling

The probability that a healthy sibling of someone with an autosomal recessive disorder is a carrier equals 2/3: of the four equally likely gene combinations that two carrier parents can pass on, one produces an affected child, and two of the remaining three produce a carrier. The risk that his or her own child could be affected is calculated by multiplying the independent probabilities that the sibling and the partner are carriers, and then multiplying by 1/4.

Example 2.3

The healthy brother (individual 1) of someone with cystic fibrosis wishes to know the probability that his first child will be affected. Assume that the man’s partner is not a blood relative and has no family history of cystic fibrosis, and that the disease incidence is 1/2500.

Figure 2.4 Risk to the healthy sibling of someone with cystic

fibrosis for having an affected child (Disease incidence is

assumed to be 1/2500)


Detailed calculation:

∴ Disease incidence = q² = 1/2500; q = 1/50 = 0.02 and p = 49/50 = 0.98

Individual 1 (Father):

Healthy sibling of someone with an autosomal recessive disorder

∴ Carrier risk = 2/3

Individual 2 (Mother):

Assuming that the population is in Hardy-Weinberg equilibrium

∴ Carrier risk = 2pq = 2 × 49/50 × 1/50 = 2 × 0.98 × 0.02 = 0.0392

Individual 3:

Risk = Father carrier risk × Mother carrier risk × the chance that both parents pass on the gene if they are carriers

∴ Risk of being affected = 2/3 × 0.0392 × 1/4 = 0.00653
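The final step of Example 2.3 reduces to a single product, sketched below. The class `OffspringRisk` and the method `affectedRisk` are hypothetical names, and the sketch assumes that both parents are unaffected and that their carrier risks are independent.

```java
/** Hypothetical helper: risk that a child of two possible carriers is affected. */
class OffspringRisk {
    /** Chance that both parents pass on the abnormal gene, given that both are carriers. */
    static final double BOTH_PASS_ON_GENE = 0.25;

    static double affectedRisk(double fatherCarrierRisk, double motherCarrierRisk) {
        return fatherCarrierRisk * motherCarrierRisk * BOTH_PASS_ON_GENE;
    }
}
```

With the values of Example 2.3, `OffspringRisk.affectedRisk(2.0 / 3.0, 0.0392)` evaluates to approximately 0.00653. The same function covers a parent who is an obligatory carrier by passing a carrier risk of 1.0.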


2.2.3.4. Risks to the extended family

The parents and children of a patient with an autosomal recessive disorder are obligatory carriers, while second-degree relatives (uncles, aunts, nephews, nieces, half-sibs, grandparents) have a 50 per cent chance of being carriers (Figure 2.5).

Figure 2.5 The probability of being a carrier in relatives of someone with an autosomal recessive disorder

The likelihood of other family members being carriers is reduced by 50 per cent for each degree of relationship removed from the parents, so it is relatively simple to estimate the probability of any relative being a carrier if their closeness to the patient is known.
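The halving rule can be sketched as a one-line function. The class `RelativeCarrierRisk` and its parameter are hypothetical; the sketch assumes the count starts at 0 for the obligatory carriers (parents and children of the patient) and that the partner's side of the family contributes no additional risk. Healthy full siblings of the patient are not covered by this rule, since their carrier risk is 2/3.

```java
/** Hypothetical helper for the 50-per-cent-per-degree rule described above. */
class RelativeCarrierRisk {
    /**
     * @param degreesRemovedFromParents 0 for obligatory carriers (parents, children),
     *                                  1 for second-degree relatives, 2 for third-degree, ...
     */
    static double carrierRisk(int degreesRemovedFromParents) {
        if (degreesRemovedFromParents < 0) {
            throw new IllegalArgumentException("Degree must not be negative");
        }
        return Math.pow(0.5, degreesRemovedFromParents);
    }
}
```

For example, `RelativeCarrierRisk.carrierRisk(1)` returns 0.5 for an uncle, aunt or grandparent, and `RelativeCarrierRisk.carrierRisk(2)` returns 0.25 for a relative one degree further removed.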


Example 2.4

Figure 2.6 Risk to extended family (Disease incidence is

assumed to be 1/2500)

II.1 had cystic fibrosis (CF). There is no family history for I.3. Assume CF affects 1 person in 2500 in the population. The risk that foetus II.2 will have CF is calculated as follows.

Detailed calculation:

∴ Disease incidence = q² = 1/2500; q = 1/50 = 0.02 and p = 49/50 = 0.98

Individual I.2 (Mother):

The parents and children of a patient with an autosomal recessive disorder are obligatory carriers

∴ Carrier risk = 1.0

Individual I.3 (Father):

No family history; the population is assumed to be in Hardy-Weinberg equilibrium

∴ Carrier risk = 2pq = 2 × 49/50 × 1/50 = 2 × 0.98 × 0.02 = 0.0392

Individual II.2:

Risk = Father carrier risk × Mother carrier risk × the chance that both parents pass on the gene if they are carriers

∴ Risk of being affected = 0.0392 × 1 × 1/4 = 0.0098

2.2.3.5. Risk to the offspring of an affected parent

The risk that a child born to an affected parent will also be affected equals half the probability that the unaffected parent is a carrier.

Example 2.5

Figure 2.7 Risk to the offspring of someone with an autosomal

recessive disorder who marries a healthy unrelated individual

Referring to Figure 2.7, the probability that the first child will be affected by the disease is calculated as follows:

Individual II.1 (Mother):

Healthy unrelated individual; the population is assumed to be in Hardy-Weinberg equilibrium

∴ Carrier risk = 2pq

Individual II.2 (Father):

The individual who is affected with the disease

∴ Carrier risk = 1.0

The child:

Risk = Father carrier risk × Mother carrier risk × the chance that the unaffected parent passes on the gene if he/she is a carrier

∴ Risk of being affected = 1.0 × 2pq × ½
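Example 2.5 can be sketched in the same style. The class `AffectedParentRisk` and its method are hypothetical names; the affected parent is treated as certain to pass on the abnormal gene, so only the unaffected parent's carrier risk and the ½ chance of transmission remain.

```java
/** Hypothetical helper: risk to the child of one affected and one unaffected parent. */
class AffectedParentRisk {
    /** Chance that a carrier parent passes on the abnormal gene. */
    static final double CARRIER_PASSES_ON_GENE = 0.5;

    static double affectedRisk(double unaffectedParentCarrierRisk) {
        // The affected parent contributes a factor of 1.0.
        return 1.0 * unaffectedParentCarrierRisk * CARRIER_PASSES_ON_GENE;
    }
}
```

As an illustration only (Example 2.5 does not state a disease incidence), a partner with the population carrier risk of 0.0392 used in the earlier examples would give `AffectedParentRisk.affectedRisk(0.0392)`, which is approximately 0.0196.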


Chapter 3 - Software Development

3.1. Evolutionary development

In this project an evolutionary development method [11] is used to develop the software. This method was chosen because the project involves working with genetic counsellors to explore their requirements and deliver a final system. Development starts by prototyping the parts of the risk assessment process that are well understood. The software can then evolve as new features proposed by the genetic counsellors are added.

3.2. Software Design

3.2.1. System Overview

The software's main function is to automate genetic risk calculation. To perform this task, the software accepts inputs such as family relationships, disease parameters and individual disease data. The output of the software is the risk value of interest.

Figure 3.1 The system overview

[Figure 3.1 diagram: the inputs (family relationships, individual disease data, disease parameter) feed the Genetic Risks Computation System, whose outputs are the probability that an individual is liable to transmit the disease and the probability that an individual is affected with the disease.]

Manual calculation of genetic risk is prone to error and inaccuracy because of the amount of work involved in processing the data. The software uses the same kind of reasoning that people use to compute genetic risk. Figure 3.1 summarizes the software's input and output requirements.

3.2.2. Sub-system modules

Figure 3.2 The sub-system component for computing genetic

risk

In a software development project, the software is divided into modules so that they can be developed and tested independently. At a later stage of development these modules are combined to make up the complete software. Figure 3.2 shows the sub-system modules of the current software, which are explained below.

1. User interface: this component acts as the interaction point between the genetic counsellor and the software. Inputs containing information on family relationships, disease parameters and individual disease data are fed to the system through the user interface. Missing data are replaced by default parameters pre-programmed into the system, to ensure that the system behaves as intended and that the calculated risk assessment values are accurate.

2. Family relationships data structure: this module handles the family relationships data that are fed into the software via the user interface. It is responsible for linking and mapping the family information so that it reflects the actual pedigree. The data representation is explained in detail in the next section.

3. Database management: the family relationships data held in computer memory are volatile and temporary. This module manages the storage of data to, and retrieval of data from, the database. Without this module, the data that have been manipulated and analysed cannot be stored. This module is not available in the current software, but the database design and suggestions for implementing it are described in the ‘Further development and enhancement’ section of Chapter 5.

4. Risk probability calculation: this module performs the risk probability calculation. Rules to determine the probability of someone being affected with a disease and the probability of being a carrier are embedded in this module. These rules are explained in the next section.

5. Disease inheritance flow rules: for autosomal recessive inheritance, the risk of someone being affected with a disease depends on knowledge of the family history of both ancestors and descendants. This module contains the rules for tracking the flow of disease inheritance, i.e. it identifies the individuals to whom the disease may be transmitted and assigns a suitable carrier risk value to each of them.

6. System output: the main function of this module is to read the family relationship information from memory and display it on the GUI (graphical user interface) in the form of a pedigree. This feature is important because genetic counsellors use the pedigree as an aid during risk assessment. Furthermore, the pedigree shows how one person is related to another (marriage, siblings, parents, cousins etc.).

3.3. Essential program logic

To appreciate the program logic, an understanding of object-oriented programming and design is required, and knowledge of the Java programming language is essential. The brief explanation given here only provides an overall view of the system algorithm; to understand the internal workings of the software, it is necessary to read the source code.

3.3.1. Risk Calculation

3.3.1.1. Important key points

The genetic risk calculation is implemented using the rules for autosomal recessive inheritance. The key points that are translated into program logic are:

1. The carrier risk for someone with no family history equals 2pq, where q is the frequency of the disease allele and p = 1 - q, provided that the partner is unrelated.

2. The probability that a healthy sibling of someone with an autosomal recessive disorder is a carrier equals 2/3.

3. The probability that any two unaffected individuals will have a child with an autosomal recessive disorder equals the product of the probabilities that each is a carrier multiplied by 1/4.

4. The probability that a member of the extended pedigree is a carrier of an autosomal recessive disorder is halved for each degree of relationship removed from the parents.

5. The carrier risk for someone who is a heterozygote or is affected with the disease equals 1.0.

6. The probability that a child born to an affected parent will also be affected equals half the probability that the unaffected parent is a carrier.
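The carrier-risk key points (1, 2, 4 and 5) can be sketched as a handful of static helpers. This is only an illustrative sketch, not the code of the thesis software: the class name `CarrierRulesSketch` and all method and constant names are hypothetical, and the incidence of the disease is assumed to equal q² (Hardy-Weinberg proportions).

```java
/** Hypothetical helper class illustrating key points 1, 2, 4 and 5. */
final class CarrierRulesSketch {
    private CarrierRulesSketch() {}

    /** Key point 2: healthy sibling of an affected individual. */
    static final double HEALTHY_SIBLING_RISK = 2.0 / 3.0;

    /** Key point 5: known heterozygote or affected individual. */
    static final double KNOWN_CARRIER_RISK = 1.0;

    /**
     * Key point 1: carrier frequency 2pq for someone with no family history.
     * Assumes the disease incidence equals q squared.
     */
    static double populationCarrierRisk(double incidence) {
        if (incidence < 0.0 || incidence > 1.0) {
            throw new IllegalArgumentException("incidence must lie in [0, 1]");
        }
        double q = Math.sqrt(incidence);
        double p = 1.0 - q;
        return 2.0 * p * q;
    }

    /**
     * Key point 4: starting from the parents of an affected individual
     * (carrier risk 1.0), halve the risk for each degree of relationship removed.
     */
    static double extendedFamilyRisk(int degreesFromParents) {
        if (degreesFromParents < 0) {
            throw new IllegalArgumentException("degrees must not be negative");
        }
        return KNOWN_CARRIER_RISK * Math.pow(0.5, degreesFromParents);
    }
}
```

With an incidence of 1 in 2500, `populationCarrierRisk` gives about 0.0392, and `extendedFamilyRisk(2)` gives the 1 in 4 risk of a first cousin.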

3.3.1.2. Determining the risk of being a carrier

Determining whether someone is a carrier is straightforward. If someone is affected with a disease, the parents and children of that individual may be carriers too. The carrier risk value assigned to each individual depends on key points 1, 2, 4 and 5 described in the previous section.

For monozygotic twins, the risk of being a carrier is the same for both twins and there should be complete concordance, i.e. either both twins are carriers or neither is. Since dizygotic twins are genetically no more alike than ordinary siblings, their carrier risk values are assigned in the same way as for ordinary siblings.

3.3.1.3. Determining the risk of being affected

For every individual in the pedigree, the risk of being affected by a disease is:

Risk of being affected = A × B × C

A: the father's carrier risk
B: the mother's carrier risk
C: the chance of being affected with the disease, given the parents' status

Referring to the previous section, the values of A and B are determined by key points 1, 2, 4 and 5, while the value of C is determined by key points 3 and 6 (1/4 when both parents are unaffected, 1/2 when one parent is affected).

For monozygotic twins, the risk of being affected is the same for both twins and there should be complete concordance, i.e. either both twins have the disorder or neither does. There is no increased risk of the disease for subsequent, non-twin children. Since dizygotic twins are genetically no more alike than ordinary siblings, the risks are essentially the same as for ordinary siblings.
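The product A × B × C can be written as a single function. The sketch below is illustrative only; the class, method and constant names are hypothetical, and the two values of C are those given by key points 3 and 6.

```java
/** Hypothetical sketch of "risk of being affected = A x B x C". */
final class AffectedRiskSketch {
    private AffectedRiskSketch() {}

    /** C when both parents are unaffected (key point 3). */
    static final double BOTH_PARENTS_UNAFFECTED = 0.25;

    /** C when one parent is affected (key point 6). */
    static final double ONE_PARENT_AFFECTED = 0.5;

    /** Multiplies the father's risk (A), the mother's risk (B) and the chance C. */
    static double riskOfBeingAffected(double fatherCarrierRisk,
                                      double motherCarrierRisk,
                                      double chance) {
        checkProbability(fatherCarrierRisk);
        checkProbability(motherCarrierRisk);
        checkProbability(chance);
        return fatherCarrierRisk * motherCarrierRisk * chance;
    }

    private static void checkProbability(double value) {
        if (value < 0.0 || value > 1.0) {
            throw new IllegalArgumentException("probability must lie in [0, 1]: " + value);
        }
    }
}
```

For instance, a father at the population risk of 0.0392 and a mother at 2/3 give `riskOfBeingAffected(0.0392, 2.0 / 3.0, BOTH_PARENTS_UNAFFECTED)`, about 0.00653, which is the situation of Example 4.1.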


3.3.1.4. Online carrier risk computation

In the manual process, carrier risks for individuals in a pedigree are calculated offline, i.e. after the pedigree diagram of the family related to the propositus has been completely drawn. In this software, however, the carrier risk of every individual is calculated online: whenever new information is added, or family or disease information is changed, the software automatically updates the carrier risk probabilities of the individuals related to the person whose data were added or updated.
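One way to picture the online behaviour is a model that recomputes carrier risks inside every mutating method. The sketch below is a deliberately small, hypothetical illustration covering a single nuclear family; the names `OnlineRiskSketch`, `Family` and `Member` are assumptions, and giving a new child the population risk before any sibling is affected is a simplification.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: carrier risks are refreshed on every change. */
final class OnlineRiskSketch {
    static final class Member {
        final String name;
        boolean affected;
        double carrierRisk;

        Member(String name, double carrierRisk) {
            this.name = name;
            this.carrierRisk = carrierRisk;
        }
    }

    static final class Family {
        final Member father;
        final Member mother;
        final List<Member> children = new ArrayList<Member>();
        private final double populationRisk;

        Family(double populationRisk) {
            this.populationRisk = populationRisk;
            this.father = new Member("father", populationRisk);
            this.mother = new Member("mother", populationRisk);
        }

        /** Adding a person triggers an immediate update. */
        Member addChild(String name) {
            Member child = new Member(name, populationRisk);
            children.add(child);
            updateCarrierRisks();
            return child;
        }

        /** Changing disease information triggers an immediate update. */
        void markAffected(Member child) {
            child.affected = true;
            updateCarrierRisks();
        }

        private void updateCarrierRisks() {
            boolean anyAffected = false;
            for (Member child : children) {
                if (child.affected) {
                    anyAffected = true;
                }
            }
            if (!anyAffected) {
                return; // no family history: keep the current risks
            }
            father.carrierRisk = 1.0; // parents of an affected child are carriers
            mother.carrierRisk = 1.0;
            for (Member child : children) {
                child.carrierRisk = child.affected ? 1.0 : 2.0 / 3.0;
            }
        }
    }
}
```

The point of the design is that callers never ask for a recalculation; the risks are already up to date when the pedigree is next displayed.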

3.3.2. Drawing pedigree: Connecting elements

On the system GUI, elements are drawn and connected together so that they represent a complete pedigree. To connect elements together, connection properties are assigned to each individual element, as shown in Figure 3.3.

Figure 3.3 Graphics elements properties: Basic symbols used in pedigree to represent individuals. [Diagram: each symbol measures 20 × 20 and is annotated with its centre point and its position.]

In Figure 3.3 the dimension of each element is 20 × 20. The centre point of the element acts as the reference point for connecting one element to another, while the position property determines the location at which the element is displayed on the GUI.

Figure 3.4 Graphics elements properties: Connecting children and parent. [Diagram labels: (x, y); (x, y + 40), parent link position; variable length; children link centre position; twins relationship centre point; marriage link position.]

Figure 3.4 shows the connection properties used to connect parents and children. Elements are connected to each other by relationships, which are represented by lines. A pedigree has four types of relationship: marriage, parent, sibling and twins. There are therefore four types of line, one for each type of relationship, and each line has its own centre point that acts as a reference point for connection. The rules for connecting elements in a pedigree are as follows:

1. Married individuals are connected to their partner using the marriage relationship line.

2. Siblings are connected together using the children relationship line.

3. To connect the parents and their children, a connection line is drawn from the centre point of the marriage line to the centre point of the children relationship line.

4. When there are twins among the children, the twins are connected together using the twins relationship line. The centre point of this connection is then connected to the children relationship line.
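Rule 3 can be illustrated with a little coordinate arithmetic. The sketch below is not the drawing code of the software: it assumes, hypothetically, that an element's position is its top-left corner, that points are `{x, y}` integer pairs, and that the centre of a relationship line is the midpoint of the two element centres it joins.

```java
/** Hypothetical geometry helpers for connecting pedigree elements. */
final class PedigreeLayoutSketch {
    private PedigreeLayoutSketch() {}

    /** Element dimension taken from Figure 3.3. */
    static final int SIZE = 20;

    /** Centre of an element whose position (assumed top-left corner) is (x, y). */
    static int[] centre(int x, int y) {
        return new int[] { x + SIZE / 2, y + SIZE / 2 };
    }

    /** Midpoint of the straight line joining two points. */
    static int[] midpoint(int[] a, int[] b) {
        return new int[] { (a[0] + b[0]) / 2, (a[1] + b[1]) / 2 };
    }

    /**
     * Rule 3: the parent link runs from the centre of the marriage line
     * to the centre of the children relationship line.
     * Returns {x1, y1, x2, y2}.
     */
    static int[] parentLink(int[] husbandCentre, int[] wifeCentre,
                            int[] firstChildCentre, int[] lastChildCentre) {
        int[] marriageCentre = midpoint(husbandCentre, wifeCentre);
        int[] childrenCentre = midpoint(firstChildCentre, lastChildCentre);
        return new int[] { marriageCentre[0], marriageCentre[1],
                           childrenCentre[0], childrenCentre[1] };
    }
}
```

The resulting four numbers could be passed directly to a line-drawing call such as `java.awt.Graphics.drawLine(x1, y1, x2, y2)`.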

3.3.3. Data structure

Since the software was developed in Java, an object-oriented approach was used to design its data structure. In the software code there are objects representing various kinds of information. The most important are those that represent the family relationship information and pedigree data:

1. Person: contains information about a person, such as age and disease information, and pointers to Marriage object instances (linking the person to his/her parents and to his/her own marriage).

2. Marriage: stores marriage information, namely pointers to the Person object instances (the male and female individuals) and to the Children object instance.

3. Children: holds pointers to Person object instances.

Figure 3.5 Pedigree and its data structure representation. [Diagram: a pedigree shown alongside its representation as linked Person, Marriage and Children objects.]

Figure 3.5 shows an example of using this data structure to represent family information. A Marriage object instance contains pointers to the Person object instances (male and female) and to the Children object instance resulting from that relationship. The Children object instance holds several pointers, each pointing to an individual. The linkage between these object instances represents a pedigree.
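A minimal Java sketch of these three objects is given below. The class names Person, Marriage and Children come from the description above; everything else (the wrapper class `PedigreeModelSketch`, the field names and the `addChild` helper) is an assumption made for illustration and is not taken from the source code of the software.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the Person / Marriage / Children linkage. */
final class PedigreeModelSketch {
    static final class Person {
        final String name;        // assumed field, used only for illustration
        int age;
        boolean affected;         // disease information, simplified to a flag
        double carrierRisk;
        Marriage parentsMarriage; // link to the parents; null for a founder
        Marriage ownMarriage;     // link to the person's own marriage; null if none

        Person(String name) {
            this.name = name;
        }
    }

    static final class Marriage {
        final Person male;
        final Person female;
        final Children children = new Children();

        Marriage(Person male, Person female) {
            this.male = male;
            this.female = female;
            male.ownMarriage = this;
            female.ownMarriage = this;
        }

        /** Registers a child and links it back to this marriage. */
        void addChild(Person child) {
            child.parentsMarriage = this;
            children.persons.add(child);
        }
    }

    static final class Children {
        final List<Person> persons = new ArrayList<Person>();
    }
}
```

Because every Person points to the marriage of his/her parents, and every Marriage points to its Children, a pedigree can be walked upwards towards ancestors or downwards towards descendants from any individual.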


Chapter 4 - Automatic risk calculation

4.1. Software test results

To verify that the automatic risk calculation is correct, the output of the software is compared with the results of manual calculation. The following subsections compare the results for different cases of autosomal recessive disorder.

4.1.1. Risk to the offspring of a healthy sibling

Example 4.1

(Refer to Figure 4.1) Individual 1 had Cystic Fibrosis (CF). There is no family history of CF in the family of individual 5. Assume that CF affects 1 person in 2500 in the population.

The risk of being affected for individual 6:

Manual calculation:

1. The probability that the healthy mother (4) is a carrier equals 2/3, because her brother is affected with CF.

2. The probability that the father (5) is a carrier equals the frequency of carriers in the general population. An incidence of 1 in 2500 gives q² = 1/2500, so q = 0.02 and p = 0.98, and 2pq = 2(0.02)(0.98) = 0.0392.

3. The probability that individual 6 will be affected equals the product of the probabilities that the parents are carriers multiplied by 1/4, the chance that both will pass on the gene if they are carriers. This gives a risk of 2/3 × 0.0392 × 1/4, which equals 0.00653.

Automatic calculation result:

Figure 4.1 Software output (Risk to the offspring of a healthy

sibling): Carrier risk values


Figure 4.2 Software output (Risk to the offspring of a healthy

sibling): Risk of being affected for individual 6

4.1.2. Risks to the extended family

Example 4.2

(Refer to Figure 4.3) Individual 1 had Cystic Fibrosis (CF). There is no family history of CF in the family of individual 4. Assume that CF affects 1 person in 2500 in the population.

Manual calculation:

1. The probability that the healthy mother (3) is a carrier equals 1.0, because the parents of someone with an autosomal recessive disorder must be carriers.

2. The probability that the father (4) is a carrier equals the frequency of carriers in the general population, i.e. 2pq = 2(0.02)(0.98) = 0.0392.

3. The probability that individual 5 will be affected equals the product of the probabilities that the parents are carriers multiplied by 1/4, the chance that both will pass on the gene if they are carriers. This gives a risk of 1 × 0.0392 × 1/4, which equals 0.0098.

Figure 4.3 Software output (Risks to the extended family): Carrier risk values

Figure 4.4 Software output (Risks to the extended family): Risk of being affected for individual 5

Example 4.3

See Example 4.2. This relates to the same situation as in Figure 4.4, but the ‘at risk’ half sibling already has two healthy full siblings.

Manual calculation:

Probability                         Both 3 and 4 are carriers    4 is not a carrier
Prior                               0.0392                       0.9608
Conditional (2 healthy children)    (3/4)^2 = 9/16               1
Joint                               0.02205                      0.9608

1. The probability that both parents are carriers equals $\frac{0.02205}{0.02205 + 0.9608} = 0.02243$.

2. The probability that individual 5 will be affected equals 0.02243 × 0.25 = 0.005608. Clearly this value is less than the one derived in Example 4.2.
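The Bayesian table of this example can be reproduced with a few lines of Java. The sketch below is illustrative; the class and method names are hypothetical. It assumes, as the table does, that each child of two carriers is healthy with probability 3/4 and that healthy children are certain when the parents are not both carriers.

```java
/** Hypothetical sketch of the prior / conditional / joint / posterior table. */
final class BayesTableSketch {
    private BayesTableSketch() {}

    /**
     * Posterior probability that both parents are carriers,
     * given the prior and the number of healthy children already born.
     */
    static double posteriorBothCarriers(double priorBothCarriers, int healthyChildren) {
        if (healthyChildren < 0) {
            throw new IllegalArgumentException("healthyChildren must not be negative");
        }
        // Conditional probabilities of observing the healthy children.
        double conditionalIfCarriers = Math.pow(0.75, healthyChildren);
        double conditionalIfNot = 1.0;

        // Joint probabilities (prior x conditional) for the two columns.
        double jointCarriers = priorBothCarriers * conditionalIfCarriers;
        double jointNot = (1.0 - priorBothCarriers) * conditionalIfNot;

        // Posterior: normalise the "both carriers" column.
        return jointCarriers / (jointCarriers + jointNot);
    }

    /** Risk to the next child: posterior multiplied by 1/4. */
    static double riskToNextChild(double priorBothCarriers, int healthyChildren) {
        return posteriorBothCarriers(priorBothCarriers, healthyChildren) * 0.25;
    }
}
```

Calling `posteriorBothCarriers(0.0392, 2)` gives approximately 0.02243, and `riskToNextChild(0.0392, 2)` approximately 0.0056, in agreement with the manual calculation.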

Figure 4.5 Software output (Risks to the extended family): This relates to the same situation as in Figure 4.4, but the ‘at risk’ half sibling already has two healthy full siblings

Example 4.4

Figure 4.6 demonstrates the capability of the software to determine the probability of being a carrier for various members of the extended family. The probability that a member of the extended pedigree is a carrier of an autosomal recessive disorder is halved for each degree of relationship removed from the parents.

Figure 4.6 Software output (Risks to the extended family): Carrier risk probability for various members of extended family

4.1.3. Risk to the offspring of an affected homozygote

Example 4.5

(Refer to Figure 4.7) The father of individual 6 is healthy, while the mother is affected with an autosomal recessive disease. Assume that the disease affects 1 person in 2500 in the population.

Manual calculation:

The probability that individual 6 will be affected equals 1.0 × 2pq × 1/2 = 1.0 × 2(0.02)(0.98) × 1/2 = 0.0196.

Figure 4.7 Software output (Risk to the offspring of an affected homozygote): Risk to the offspring of someone with an autosomal recessive disorder

Example 4.6

(Refer to Figure 4.9) An individual with a rare autosomal recessive disorder marries an unaffected first cousin.

Manual calculation:

1. The probability of heterozygosity in the first cousin equals 1 in 4.

2. The probability of the first child being affected equals 1.0 × 1/4 × 1/2 = 1/8 = 0.125.

Figure 4.8 Software output (Risk to the offspring of an affected homozygote): Carrier risk probability for various members of extended family

Figure 4.9 Software output (Risk to the offspring of an affected homozygote): Risk to the offspring of someone with an autosomal recessive disorder who marries a healthy first cousin

Chapter 5 - Conclusions and Recommendations

5.1. Conclusions

Genetic counselling is the process by which individuals and/or their relatives at risk of a transmissible disorder are advised of the consequences of the disorder, of the likelihood of developing it and transmitting it to their offspring, and of the ways in which it may be prevented. The genetic counselling process involves risk calculations, and genetic counsellors can easily be overwhelmed by the amount of work needed to carry out the calculations and produce a complete analysis. Risk assessment plays an important role in genetic counselling because the correctness of the advice given to the counselee depends on the calculated risk. An inaccurate risk estimate undermines the credibility of the genetic counsellor and may cause the counselee serious distress.

Developing this software requires knowledge of both genetic risk assessment techniques and software engineering. Within the limited time available, enough information was gathered in both areas to start the project and develop a prototype risk assessment software. Development started with gathering user requirements and understanding the flow of the genetic counselling process and its risk calculation. The software was then designed and developed on the basis of those requirements. Design and development involved identifying the software development approach and the design methodology, and selecting the programming tool and language.

Because of time constraints and workload, the scope was reduced from full-scale risk assessment software to a prototype that can solve simple genetic disorder problems and serve as a framework for further enhancement. Several important features have to be included at the next stage of development; they are described in the next section.

The objective of the project is to automate the risk assessment task, so the manual process of risk calculation has to be translated into program logic. Since risk assessment starts with collecting family information, the first module written was the one that handles user input. Much time was spent designing the data structure and writing the program for this module. To check that the information represented by the data structure is correct, a user-friendly GUI (Graphical User Interface) was also written. The GUI accepts inputs and draws the stored information in the form of a pedigree.

After the GUI and the data input module were completed, the next stage was to write the program for risk calculation. The rules used by genetic counsellors to calculate genetic risk and to assign carrier risks to related individuals in the family were programmed into the system. These rules do not cater for all types of genetic disorder, only for specific cases of autosomal recessive disorder.

Although the outcome of this project is a prototype, the software can be evolved by adding new features to it. Recommendations for enhancements and improvements are given in the next section. The most important achievement I would like to highlight is that, through this project, I have gained a great deal of knowledge of software development, especially object-oriented design and development and Java programming. I also gained experience in the practical use of probability theory to support decision making, as shown by the use of Bayesian conditional probability in risk assessment.

5.2. Further Development and enhancement

5.2.1. Risk calculation

This software does not solve the risk assessment for all cases of autosomal recessive disease inheritance. Young [3], in his book Introduction to Risk Calculation in Genetic Counselling, explains this topic in detail. The other cases that need to be tackled are described in the following subsections.

5.2.1.1. Consanguinity

Consanguinity, or marriage between close relatives, is a common and important problem in genetic counselling. The presence of consanguinity influences the risks when an inherited disorder is present in the family. In this software, where consanguinity is present, the automatically calculated risk value is correct when at least one person in the family is affected with the disease. However, for a consanguineous couple with no family history of autosomal recessive disease, the calculated risk of their child being affected is not accurate.

To assess the risk for the child of a consanguineous relationship, two further variables need to be taken into the calculation: the coefficient of inbreeding and the coefficient of relationship. The coefficient of inbreeding relates to the child of a consanguineous relationship and indicates the probability that the child will be homozygous for a specific gene derived from a common ancestor. The coefficient of relationship relates to the consanguineous couple and indicates the proportion of genes which, on average, they would be expected to share by descent from a common ancestor.
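As a rough illustration of how these two coefficients might enter a future version of the calculation, the sketch below uses standard population-genetics relations that are not part of the current software: the coefficient of relationship is the sum of (1/2)^n over the connecting paths, the coefficient of inbreeding of the child is half the coefficient of relationship of the parents, and the risk to the child with no family history is q² + Fpq. All names are hypothetical, and the helper assumes that the common ancestors are not themselves inbred and that every connecting path has the same length.

```java
/** Hypothetical sketch of the consanguinity coefficients. */
final class ConsanguinitySketch {
    private ConsanguinitySketch() {}

    /**
     * Coefficient of relationship of a couple linked through the given number
     * of common ancestors, each by a path with the given number of steps.
     */
    static double coefficientOfRelationship(int commonAncestors, int stepsPerPath) {
        return commonAncestors * Math.pow(0.5, stepsPerPath);
    }

    /** Coefficient of inbreeding of the child: half the parents' relationship. */
    static double coefficientOfInbreeding(double relationshipOfParents) {
        return relationshipOfParents / 2.0;
    }

    /**
     * Risk that the child is affected when there is no family history:
     * q^2 + F * p * q, with disease allele frequency q and p = 1 - q.
     */
    static double childRiskNoFamilyHistory(double q, double inbreeding) {
        double p = 1.0 - q;
        return q * q + inbreeding * p * q;
    }
}
```

For first cousins (two common grandparents, four steps per path) the coefficient of relationship is 1/8 and the coefficient of inbreeding of their child is 1/16. With q = 0.02 the risk rises from 0.0004 for an unrelated couple to about 0.0016.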

5.2.1.2. The use of screening result for carrier detection

The carrier risks presently used in the software are based either on the assumption that an individual may be a carrier because someone in the family is affected with the disease or, for an individual with no family history, on the carrier frequency (2pq).

When a carrier detection test has been carried out on an individual, the conditional probability derived from the test result, combined with the ‘prior’ probability, will significantly modify the ‘posterior’ probability of the individual being a carrier and make the risk assessment more accurate and reliable. It is therefore important to include this feature in the next stage of software development.
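A possible shape for this feature is sketched below for the simplest case, a negative test result. The sketch is an assumption about how the feature could look, not a description of existing code; the names are hypothetical, the test sensitivity is an input that would have to come from the laboratory, and false positives are assumed not to occur.

```java
/** Hypothetical sketch: updating a carrier risk after a negative screening test. */
final class CarrierScreeningSketch {
    private CarrierScreeningSketch() {}

    /**
     * Posterior carrier probability after a negative test.
     *
     * @param prior       carrier probability before testing
     * @param sensitivity proportion of carriers that the test detects
     */
    static double posteriorAfterNegativeTest(double prior, double sensitivity) {
        if (prior < 0.0 || prior > 1.0 || sensitivity < 0.0 || sensitivity > 1.0) {
            throw new IllegalArgumentException("arguments must lie in [0, 1]");
        }
        // Joint probability: carrier and missed by the test.
        double jointCarrier = prior * (1.0 - sensitivity);
        // Joint probability: non-carrier and (always) testing negative.
        double jointNonCarrier = (1.0 - prior) * 1.0;
        return jointCarrier / (jointCarrier + jointNonCarrier);
    }
}
```

With a prior of 0.0392 and an assumed sensitivity of 90%, the posterior carrier probability falls to roughly 0.004, about a tenth of the prior.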


5.2.1.3. Siblings with different autosomal disorders

Figure 5.1 Healthy parents have two children each with a

different autosomal recessive disorder.

The above figure shows the case where parents have two children, each with a different autosomal recessive disorder. This system assumes that the affected children of a couple all have the same disorder, so the risk calculation does not consider the case of someone whose siblings have different disorders. The procedure for calculating the risk in this type of case is explained in detail by Young in his previously mentioned book [3].

5.2.1.4. Separate mutations consideration

In the current software, the risk calculation assumes that the disorder is so rare that the possibility of heterozygosity in family members due to inheritance of a different mutation is low enough to be ignored. However, for diseases as common as cystic fibrosis, allowance should be made for the possibility that a separate mutation is segregating independently in the family, and a different approach to risk calculation needs to be used.

5.2.2. Software enhancement

5.2.2.1. Database module

The current software does not allow pedigree data to be stored for future reference and analysis because the database management module is not available. A suggested architecture for implementing this module is shown in Figure 5.2.

Figure 5.2 Database connection and implementation architecture

Based on this architecture, to store data the database management module has to read the family relationship data structure and convert it into database table format, and vice versa when retrieving data from the database. Before any data can be stored or retrieved, the system must first be connected to the database; this can be done using either Microsoft ODBC (Open Database Connectivity) or the Java JDBC API. The connection type chosen determines the kind of program code that needs to be written and added to the database management module.
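If the JDBC route were taken, storing one individual might look roughly like the sketch below. It uses only standard `java.sql` interfaces; the class name, the method signature and the exact spelling of the table and column names are assumptions based on the proposed INDIVIDUAL table, and only a few of its attributes are shown.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/** Hypothetical sketch of the "store" half of the database management module. */
final class IndividualStoreSketch {
    private IndividualStoreSketch() {}

    /** Parameterised INSERT; the column names are assumed spellings. */
    static final String INSERT_SQL =
            "INSERT INTO INDIVIDUAL (ID, Sex, Status, CarrierProbability, PedigreeID) "
            + "VALUES (?, ?, ?, ?, ?)";

    /** Writes one individual through an already open connection. */
    static int save(Connection connection, int id, char sex, char status,
                    double carrierProbability, int pedigreeId) throws SQLException {
        try (PreparedStatement statement = connection.prepareStatement(INSERT_SQL)) {
            statement.setInt(1, id);
            statement.setString(2, String.valueOf(sex));    // 'M' or 'F'
            statement.setString(3, String.valueOf(status)); // 'H', 'N' or 'A'
            statement.setDouble(4, carrierProbability);
            statement.setInt(5, pedigreeId);
            return statement.executeUpdate();               // number of rows written
        }
    }
}
```

The `Connection` would typically be obtained from `java.sql.DriverManager.getConnection(url, user, password)`, with a JDBC URL for whichever database system is chosen. Retrieval would be the mirror image: run a SELECT and rebuild the Person, Marriage and Children objects from the rows.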

[Figure 5.2 diagram labels: risk assessment system; family relationship data structure; database management; Microsoft ODBC / Java JDBC API; database system (MySQL, Microsoft Access etc.).]

Figure 5.3 Database design: Entity relationship diagram for the system database tables (refer to Elmasri and Navathe [16] for information on the ER diagram notation). [ER diagram: entities PEDIGREE, INDIVIDUAL and MARRIAGE, linked by the relationships HAS, HAVE, INVOLVE IN and PARENT OF with 1:N cardinalities; the attributes shown are those listed in Tables 5.1 to 5.3, with Lname, Minit and Fname grouped under Name and X and Y grouped under Graphic properties.]

The table format for the data stored in the database will be based on the ER diagram in Figure 5.3. The tables and their attributes are described below.

Table: INDIVIDUAL
Contains information on individuals, i.e. disease information, sibling and parents.

Attribute             Description
ID                    Primary key.
Age                   Age.
Sex                   Indicated by a character: ‘M’ for male, ‘F’ for female.
Sibling ID            The ID of the next younger sibling.
Relationship Type     The relationship between the person and his/her next younger sibling: N for normal, M for monozygotic twins, D for dizygotic twins.
Lname                 Last name.
Minit                 Initial of middle name.
Fname                 Forename.
Suspected Carrier     A boolean flag indicating whether the person is suspected to be a disease carrier.
Carrier Probability   The person’s carrier risk probability.
Status                Disease information status: H for heterozygote (carrier), N for normal, A for affected.
Pedigree ID           Foreign key from the PEDIGREE table.
Marriage ID           Foreign key from the MARRIAGE table; links the person to his or her spouse.
Parent Marriage ID    Foreign key from the MARRIAGE table; identifies the person’s parents.
X                     The X position of the element on the GUI, for pedigree drawing purposes.
Y                     The Y position of the element on the GUI, for pedigree drawing purposes.

Table 5.1 Database design: Individual table with attributes

Table: PEDIGREE
Stores the pedigree information.

Attribute             Description
Pedigree ID           Primary key.
Date Created          Date when the pedigree was created.
Creator               The person who drew the pedigree.
Comment               Comments made by the author on the pedigree or the analysis.

Table 5.2 Database design: Pedigree table with attributes

Table: MARRIAGE
Data on marriages between individuals.

Attribute             Description
Marriage ID           Primary key.
Male ID               The husband.
Female ID             The wife.
Pedigree ID           Foreign key from the PEDIGREE table.

Table 5.3 Database design: Marriage table with attributes

5.2.2.2. Risk calculation for other Mendelian inheritance disorders

Autosomal recessive inheritance is not the only pattern of disease transmission that falls under the category of Mendelian inheritance disorders. The others are autosomal dominant and X-linked recessive. Risk calculation for Mendelian inheritance disorders starts with collecting family history information and drawing up the family tree or pedigree. The next step is to identify the individuals who are affected with the disease and then to determine whether the pattern of transmission in the family is X-linked recessive, autosomal dominant or autosomal recessive.

The facilities to draw up a pedigree are already available in this software, and with some minor modifications and additions to the code, modules that handle risk calculation for autosomal dominant and X-linked recessive disorders can be included as part of the system. The inclusion of both types of disease inheritance would make the system a considerably more useful instrument for assisting genetic counsellors during genetic counselling.

BIBLIOGRAPHY

1. Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann.

2. Hodge, S. E. (1998). A simple unified approach to Bayesian risk calculations. Journal of Genetic Counselling, 7, 235-261.

3. Young, I. D. (1999). Introduction to Risk Calculation in Genetic Counselling. Oxford University Press.

4. Pathak, K. D. et al (1994). Automatic computation of Genetic Risk. Proceedings of the 10th Conference on Artificial Intelligence for Applications, 164-170.

5. Damme, V. J. (1992). An automatic surveillance of genetic risk in primary care linkage of routinely used individual patients records. Proceedings of the 7th World Congress on Medical Genetics, 756-60 vol. 1.

6. Harper, P. S. (1999). Practical Genetic Counselling (5th Edition). Butterworth Heinemann.

7. Rivas, M. L. et al (1987). Risk XLR: Microcomputer Based Genetic Risk Program for X-Linked Recessive Traits. Proceedings of the 11th Annual Symposium on Computer Applications in Medical Care, 193-198.

8. Emery, A. E. H. and Mueller, R. F. (1992). Elements of Medical Genetics (8th Edition). Churchill Livingstone.

9. Hayter, A. J. (1996). Probability and Statistics for Engineers and Scientists. PWS Publishing Company.

10. Pressman, R. S. (2000). Software Engineering: A Practitioner’s Approach (European Adaptation). McGraw Hill.

11. Sommerville, I. (2001). Software Engineering (6th Edition). Addison-Wesley.

12. Shtern, V. (2000). C++: A Software Engineering Approach. Prentice Hall PTR.

13. Horton, I. (2001). Beginning Java 2. Wrox Press.

14. Castagnetto, J., Rawat, H., Schumann, S., Scollo, C. and Veliath, D. (2000). Professional PHP Programming. Wrox Press.

15. MySQL Reference Manual (2000). TcX AB, Detro HB and MySQL Finland AB.

16. Elmasri, R. and Navathe, S. B. (1994). Fundamentals of Database Systems (2nd Edition). Addison Wesley.
