Development Of A Software Tool For Risk
Assessment In Genetic Disorders
Yusman Yusof
MSc. in Control Systems Engineering: August 2001
ABSTRACT
This dissertation describes the development of a prototype software tool that
automatically calculates risks for autosomal recessive diseases. The work involves
understanding genetic risk calculation as well as software design and development.
To calculate genetic risk, genetic counsellors consider a variety of data, including
family history and disease characteristics. Given the large amount of data involved,
manual processing is not only tedious but also error prone.
The key element of the project is the translation of the manual process of risk
calculation into program logic. The genetic risk calculation is based on the laws of
probability and on Bayesian conditional probability. The software was developed in
Java, and the database design is based on the relational database model. This
approach was chosen to make the software platform independent, so that it can later
be upgraded to a web-based client-server system.
The current software provides a general framework that allows further development
of automatic risk assessment software. The next phase of development should
emphasize risk calculation for consanguinity cases and the posterior probability of
a person being affected after that individual has been screened, among other cases.
In addition, utility features should be added to make the software a better
supporting tool for genetic risk assessment.
EXECUTIVE SUMMARY
INTRODUCTION/BACKGROUND
Genetic counsellors begin the risk assessment process by collecting a variety of
information, including family history and disease characteristics. This information
is then analysed and the risk for various genetic disorders is calculated. Given the
large amount of information involved, the manual process of risk calculation is not
only tedious but also error prone.
AIMS AND OBJECTIVES
The aim of the project is to develop a software tool that automatically calculates
the risk for autosomal recessive disorders.
ACHIEVEMENTS
So far, the developed software is considered a prototype because it only allows risk
calculations for certain cases of autosomal recessive disorder.
CONCLUSIONS AND RECOMMENDATIONS
The current prototype provides a general framework that allows further development
of automatic risk assessment software. The prototype will evolve into fully
functional software as enhancements and improvements are made.
ACKNOWLEDGEMENTS
Upon the completion of my dissertation, I would like to thank my project
supervisor, Dr. R. F. Harrison, who has been generous with his time and
ideas. I have been most influenced by his ideas.
In addition, I would like to thank Dr. Ann Dalton for her exhaustive introduction
to medical genetics, especially the practical side of genetic counselling and
risk calculation.
Finally, I owe a great debt to my family and friends for reminding me why it all
matters, and to my parents for sheltering me from the travails of the real world
and surrounding me with so much love, support and hope.
TABLE OF CONTENTS
Chapter 1 - Introduction
1.1. A brief introduction
1.2. Structure of the dissertation
1.3. Related work
Chapter 2 - Risk Assessment
2.1. Genetic risk: a brief introduction
2.2. The estimation of risks
2.2.1. Laws of the probability
2.2.2. Bayes’ theorem
2.2.3. Risk calculation: Autosomal recessive inheritance
Chapter 3 - Software Development
3.1. Evolutionary development
3.2. Software Design
3.2.1. System Overview
3.2.2. Sub-system modules
3.3. Essential program logic
3.3.1. Risk Calculation
3.3.2. Drawing pedigree: Connecting elements
3.3.3. Data structure
Chapter 4 - Automatic risk calculation
4.1. Software test results
4.1.1. Risk to the offspring of a healthy sibling
4.1.2. Risks to the extended family
4.1.3. Risk to the offspring of an affected homozygote
Chapter 5 - Conclusions and Recommendations
5.1. Conclusions
5.2. Further Development and enhancement
5.2.1. Risk calculation
5.2.2. Software enhancement
Bibliography
LIST OF FIGURES
Figure 2.1 Symbols used in drawing a pedigree
Figure 2.2 Autosomal recessive inheritance: carrier risk to the healthy sibling for someone with an autosomal recessive disorder
Figure 2.3 Carrier risk using Hardy-Weinberg equilibrium
Figure 2.4 Risk to the healthy sibling of someone with cystic fibrosis for having an affected child (Disease incidence is assumed to be 1/2500)
Figure 2.5 The probability of being a carrier in relatives of someone with an autosomal recessive disorder
Figure 2.6 Risk to extended family (Disease incidence is assumed to be 1/2500)
Figure 2.7 Risk to the offspring of someone with an autosomal recessive disorder who marries a healthy unrelated individual
Figure 3.1 The system overview
Figure 3.2 The sub-system component for computing genetic risk
Figure 3.3 Graphics elements properties: Basic symbols used in pedigree to represent individuals
Figure 3.4 Graphics elements properties: Connecting children and parent
Figure 3.5 Pedigree and its data structure representation
Figure 4.1 Software output (Risk to the offspring of a healthy sibling): Carrier risk values
Figure 4.2 Software output (Risk to the offspring of a healthy sibling): Risk of being affected for individual 6
Figure 4.3 Software output (Risks to the extended family): Carrier risk values
Figure 4.4 Software output (Risks to the extended family): Risk of being affected for individual 5
Figure 4.5 Software output (Risks to the extended family): This relates to the same situation as in Figure 4.4, but the ‘at risk’ half sibling already has two healthy full siblings
Figure 4.6 Software output (Risks to the extended family): Carrier risk probability for various members of extended family
Figure 4.7 Software output (Risk to the offspring of an affected homozygote): Risk to the offspring of someone with an autosomal recessive disorder
Figure 4.8 Software output (Risk to the offspring of an affected homozygote): Carrier risk probability for various members of extended family
Figure 4.9 Software output (Risk to the offspring of an affected homozygote): Risk to the offspring of someone with an autosomal recessive disorder who marries a healthy first cousin
Figure 5.1 Healthy parents have two children each with a different autosomal recessive disorder
Figure 5.2 Database connection and implementation architecture
Figure 5.3 Database design: Entity relationship diagram for the system database tables (refer to Elmasri and Navathe [16] for information on the ER diagram notation)
LIST OF TABLES
Table 2.1 Hardy-Weinberg equilibrium: disease frequency
Table 5.1 Database design: Individual table with attributes
Table 5.2 Database design: Pedigree table with attributes
Table 5.3 Database design: Marriage table with attributes
Chapter 1 - Introduction
1.1. A brief introduction
This dissertation describes the development of a prototype software tool that
automatically calculates risks for autosomal recessive diseases. The work involves
understanding genetic risk calculation as well as software design and development.
To compute genetic risk, genetic counsellors consider a variety of data, including
family history and disease characteristics. Processing all of this information
manually is a tedious and error-prone task.
The software uses the same kind of reasoning that people use to compute risk.
The key difference is that, while people can easily be overwhelmed by the amount of
work involved, the program is able to carry out a complete analysis. The availability
of user-friendly automatic risk assessment software will be a valuable aid to genetic
counsellors.
The project centres on translating the manual process of calculating genetic risk
into program logic. The software is designed using an object-oriented approach and
the program is written in Java, while the database design is based on the relational
database model. This approach was chosen to make the software platform independent,
so that it could later be enhanced into a web-based client-server system.
1.2. Structure of the dissertation
This section briefly describes the contents of this dissertation and the order in
which the material is presented and discussed.
Chapter 1 introduces the project and reviews previous work on the problem.
Chapter 2 introduces probability theory and outlines the details of genetic risk
assessment.
Chapter 3 reviews the steps taken in developing the software and presents the
software design.
Chapter 4 compares the risk assessment results produced by the software with
manual calculations.
Chapter 5 draws conclusions from the project work, describes the important
features that still need to be added to the software and suggests enhancements for
future use.
1.3. Related work
Pathak and Perlin [4] describe software for genetic counselling that makes use of
DNA linked marker data from patients. Although this practice increases confidence
in the risk assessment, Young [3] notes that:
The increasing use of linked DNA markers and the availability of DNA mutation
analysis often serve to complicate, rather than simplify, risk calculations which
require careful consideration and a relatively high level of numerical
competence if the provision of incorrect information is to be avoided.
The use of DNA markers is more complex and is at odds with the situation that
genetic counsellors actually face when gathering information, where it is often
almost impossible to obtain a person's DNA information. Family members may be
reluctant to be examined, the individuals involved may be dead with no DNA
information available, or the data collection may involve distant relatives who are
difficult to test. Therefore, for simplicity, the software considers only classical
data, which is already sufficient for genetic risk assessment and has proven to be
reliable.
RISK-XLR, a computer program that calculates the genetic risk of carrier status
for an X-linked recessive condition, was written by Rivas and Martens [7]. Although
the purpose of this project is to develop automatic risk assessment for autosomal
recessive conditions, the technique is similar because both X-linked and autosomal
recessive conditions are Mendelian disorders. The technique incorporates family
information and carrier test results and uses Bayesian conditional probability. The
implementation of Bayesian conditional probability in RISK-XLR provides the
groundwork for the program logic of the software developed here.
An essential feature of automatic risk assessment software is the ability to link
the collected data properly; in other words, the collected data are to be
represented by a pedigree. A pedigree is a diagram that helps genetic counsellors to
visualize family information (number of sons, daughters and granddaughters with
normal sons, relatives, etc.). Previous work for this purpose was done by Damme
[5], who suggested that each patient in the patient database be given a unique
identification number. To link patient records, additional tables are created in
which information such as parent and child relationships is stored. His method of
linking the patient database for genetic counselling purposes informs the design of
the data structure and database for the automatic risk assessment software.
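The linking idea described above can be illustrated with a minimal Java sketch. It is not the data structure used in this project or in [5]; the class name PedigreeLinks and all of its methods are hypothetical and serve only to show how unique identification numbers and a separate parent-child relation table allow family relationships to be derived.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: individuals keyed by a unique ID, linked by a parent-child table. */
final class PedigreeLinks {
    // Individual records, keyed by unique identification number.
    private final Map<Integer, String> individuals = new HashMap<>();
    // Relation table: child ID -> IDs of that child's parents.
    private final Map<Integer, List<Integer>> parentsOf = new HashMap<>();

    void addIndividual(int id, String name) {
        individuals.put(id, name);
    }

    void addParentChild(int parentId, int childId) {
        if (!individuals.containsKey(parentId) || !individuals.containsKey(childId)) {
            throw new IllegalArgumentException("Both individuals must be registered first");
        }
        parentsOf.computeIfAbsent(childId, k -> new ArrayList<>()).add(parentId);
    }

    List<Integer> parents(int childId) {
        return parentsOf.getOrDefault(childId, new ArrayList<>());
    }

    /** Two distinct individuals are full or half siblings if they share a parent. */
    boolean shareParent(int a, int b) {
        if (a == b) {
            return false;
        }
        for (int p : parents(a)) {
            if (parents(b).contains(p)) {
                return true;
            }
        }
        return false;
    }
}
```

Keeping the relationships in a separate table, rather than inside each individual record, mirrors the idea of additional linkage tables and lets new relationship types be added without changing the individual records.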
Chapter 2 - Risk Assessment
2.1. Genetic risk: a brief introduction
To understand genetic risk, one must know where the problem comes from. The
following definition of genetic counselling is given by Harper in his valuable book
Practical Genetic Counselling (1999) (see Bibliography).
Genetic counselling is the process by which patients or relatives at risk of a
disorder that may be hereditary are advised of the consequences of the disorder, the
probability of developing or transmitting it and of the ways in which this may be
prevented, avoided or ameliorated [6].
A second definition, by Young, describes genetic risk as follows.
Genetic risk refers to the probability that patients may either develop a genetic
disorder or transmit a genetic disease to their children [3].
Comparing these two definitions, it is clear that genetic risk calculation is a
sub-process of genetic counselling. During genetic counselling, genetic counsellors
collect a variety of data, including family history and disease characteristics.
From this information, genetic counsellors calculate the genetic risk, i.e. they
determine the probability of a person inheriting the disease or transmitting it to
others in the family.
To assist genetic counsellors, a special diagram called a pedigree is used to
represent the gathered family information. The symbols used in the pedigree are
illustrated in Figure 2.1.
Figure 2.1 Symbols used in drawing a pedigree
Symbols shown in the figure: male; female; mating; parents and 1 boy, 1 girl in
birth order; dizygotic (two-egg) twins; monozygotic (one-egg) twins; sex
unspecified; number of children of sex indicated; affected individuals;
heterozygotes for autosomal recessive; carrier of sex-linked recessive; death;
abortion or stillbirth (sex unspecified); propositus; methods of identifying persons
in a pedigree (here the propositus is child 2 in generation II, or II.2);
consanguineous marriage (marriage of blood relatives).
Having collected the information and transformed it into a pedigree
representation, genetic counsellors proceed with the risk assessment and finally
advise the counselee on the outcome.
2.2. The estimation of risks
The correct estimation and interpretation of risk is important in genetic
counselling; it is similar to the situation where a doctor must prescribe the correct
dosage of medicine to a patient. An incorrect or inaccurate estimation or
interpretation might cause counselees considerable distress.
Medical genetics problems do not provide an absolute ‘yes’ or ‘no’ solution;
information associated with genetic risks works entirely in terms of probabilities
and odds. The following sub-sections discuss the details of genetic risk
calculations.
2.2.1. Laws of the probability
The starting point in genetic risk calculation is a grasp of the basic laws of
probability. Two basic laws are used in genetic risk calculation: the law of addition
and the law of multiplication.
2.2.1.1. Law of addition
This law applies when no two events have any outcomes in common, in other words
when the events are mutually exclusive. The probability of the union of the events
is then obtained by summing the probabilities of the individual events. For example,
if the probability that a newborn child is male equals 0.5 and the probability that
a newborn child is female equals 0.5, then the probability that the newborn will be
either male or female equals 0.5 + 0.5 = 1.0.
2.2.1.2. Law of multiplication
This law applies to independent events, i.e. events for which the occurrence of one
does not affect the probability of the occurrence of the other. The probability that
both events occur equals the product of their individual probabilities. For example,
if a pregnant woman is expecting dizygotic twins, the probability that both twins
are male equals the product of the probabilities that each twin is male, i.e.
½ × ½ = ¼.
2.2.2. Bayes’ theorem
In genetic risk calculation, Bayes’ theorem offers a method for considering all
possibilities or events and then modifying the probability of each by incorporating
information that sheds light on which is most likely (Young, 1999). In practice the
theorem provides an extremely effective means of quantifying genetic risks because
it shows how new information can properly be used to update or revise an existing
set of risks.
Using the gathered family data and disease characteristics, the initial or ‘prior’
probability of each event, such as being a carrier or not being a carrier, is
calculated. This calculation is based on ‘anterior’ information taken from the
ancestral family history. The prior probability is then modified by conditional
probabilities derived from ‘posterior’ information, which is based on observations
such as the results of carrier tests. Bayes’ theorem describes how the ‘posterior’
information is used to modify the ‘prior’ probability.
The following steps describe the details of Bayes’ theorem:
1. If P(A) is the prior probability of an event A occurring and
2. P(NA) is the prior probability of event A not occurring and
3. P(O|A) is the conditional probability of observation O occurring if A
occurs and
4. P(O|NA) is the conditional probability of observation O occurring if A
does not occur, then the overall probability of event A given that O is
observed equals
$$P(A \mid O) = \frac{P(A) \times P(O \mid A)}{[P(A) \times P(O \mid A)] + [P(NA) \times P(O \mid NA)]}$$
Or in the tabular form,
Probability               Event A occurs      Event A does not occur
Prior                     P(A)                P(NA)
Conditional (O occurs)    P(O|A)              P(O|NA)
Joint                     P(A) × P(O|A)       P(NA) × P(O|NA)
The posterior probability of event A occurring equals
$$P(A \mid O) = \frac{P(A) \times P(O \mid A)}{[P(A) \times P(O \mid A)] + [P(NA) \times P(O \mid NA)]}$$
The posterior probability of event A not occurring equals
$$P(NA \mid O) = \frac{P(NA) \times P(O \mid NA)}{[P(A) \times P(O \mid A)] + [P(NA) \times P(O \mid NA)]}$$
Example 2.1
Figure 2.2 Autosomal recessive inheritance: carrier risk to the
healthy sibling for someone with an autosomal recessive
disorder
In Figure 2.2 the woman’s brother is affected with a rare autosomal
recessive disease. A carrier detection test is available; assume that the
mutation screen picks up 80 per cent of the disease mutations.
The Bayesian calculation of the carrier risk for the woman is:
Probability                         Carrier                 Not a carrier
Prior                               2/3                     1/3
Conditional: normal on screening    0.20                    1.0
Joint                               2/3 × 0.20 = 0.1333     1/3 × 1.0 = 0.3333
Posterior                           0.1333 / (0.1333 + 0.3333) = 0.2857
The above calculation shows that, after a normal screening result, the woman’s
posterior risk of being a carrier is reduced from its ‘prior’ value of 2/3 (0.6667)
to 0.2857. This example illustrates the importance of Bayesian reasoning in genetic
risk calculation.
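The calculation in Example 2.1 can be expressed as a short Java sketch. It is an illustration only, not the code of the software described in this dissertation; the class name BayesCarrierRisk and the method posterior are hypothetical, and the sketch assumes that P(NA) = 1 − P(A).

```java
/** Hypothetical sketch of the Bayesian update used in Example 2.1. */
final class BayesCarrierRisk {

    /**
     * Posterior probability of event A given that observation O occurred.
     *
     * @param prior      P(A), the prior probability of A; P(NA) is taken as 1 - P(A)
     * @param condIfA    P(O|A), probability of the observation if A occurs
     * @param condIfNotA P(O|NA), probability of the observation if A does not occur
     */
    static double posterior(double prior, double condIfA, double condIfNotA) {
        double jointA = prior * condIfA;               // P(A) x P(O|A)
        double jointNotA = (1.0 - prior) * condIfNotA; // P(NA) x P(O|NA)
        return jointA / (jointA + jointNotA);
    }

    public static void main(String[] args) {
        // Prior carrier risk 2/3; a carrier screens normal with probability 0.20,
        // a non-carrier always screens normal.
        double risk = posterior(2.0 / 3.0, 0.20, 1.0);
        System.out.println("Posterior carrier risk = " + risk); // about 0.2857
    }
}
```

The same method also yields the posterior probability of not being a carrier, as one minus the returned value.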
2.2.3. Risk calculation: Autosomal recessive inheritance
Autosomal recessive inheritance is a mode by which a disease is inherited from one
individual to another within a family. Put simply, a person may inherit the disease
when a family member is affected by it and there are carriers in the family who can
transmit it; that is, autosomal recessive diseases are transmitted and inherited
through family relationships.
In general, the calculation of genetic risk involves investigating the members of a
pedigree. The process identifies obligatory and possible carriers, as well as
individuals who are affected or who may become affected by the disease. These
conditions are represented by the carrier risk and affected risk probability values
that each individual holds.
2.2.3.1. Carrier detection
One of the major tasks in risk calculation is to identify those individuals who,
while apparently healthy themselves, have a high risk of transmitting a genetic
disorder. There are two types of carriers: obligatory carriers and possible carriers.
Obligatory carriers for autosomal recessive disorders include all children and
parents of an affected individual, while other relatives are possible carriers.
Identifying carriers in a family is important because the risk of one person being
a carrier determines the carrier risk of others in the family. The probability that
a child will be affected by a disease is the product of the carrier risks of the two
parents, multiplied by 1/4, the chance that both parents pass on the gene if they
are carriers. The details of the carrier risk calculation are explained in the
following sections.
2.2.3.2. Population risk: Hardy-Weinberg equilibrium
The Hardy-Weinberg equilibrium is a fundamental principle that is often used in
risk calculation (the basis of the Hardy-Weinberg equilibrium and deviations from it
are covered fully in genetics textbooks and are not given here). The principle is
used to estimate the disease frequency in a population, and it enables the carrier
frequency of an autosomal recessive disorder to be determined if the incidence of
the disorder is known. Thus, if a population is in Hardy-Weinberg equilibrium:
Phenotype                                              Frequency
Abnormal homozygotes (= disease frequency/affected)    q²
Normal homozygotes                                     p²
Heterozygotes (carriers)                               2pq
p = frequency of the normal gene; q = frequency of the abnormal gene; p + q = 1
Table 2.1 Hardy-Weinberg equilibrium: disease frequency
In risk calculation, the carrier risk for someone with no family history equals
2pq, provided that the partner is unrelated.
Example 2.2
Figure 2.3 Carrier risk using Hardy-Weinberg equilibrium
Referring to Figure 2.3, it is assumed that the population is in Hardy-Weinberg
equilibrium; therefore the carrier risk of individual 1 equals 2pq.
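As an illustration of how the population carrier risk follows from the disease incidence, the sketch below derives q from q² and returns 2pq. The class name HardyWeinberg and its method are hypothetical, and the incidence of 1/2500 is used only as a sample value; the sketch assumes that the population is in Hardy-Weinberg equilibrium.

```java
/** Hypothetical sketch: carrier frequency from disease incidence under Hardy-Weinberg. */
final class HardyWeinberg {

    /**
     * Returns the carrier frequency 2pq for a disorder whose incidence equals q^2.
     *
     * @param diseaseIncidence frequency of affected individuals (q^2), between 0 and 1
     */
    static double carrierFrequency(double diseaseIncidence) {
        if (diseaseIncidence < 0.0 || diseaseIncidence > 1.0) {
            throw new IllegalArgumentException("Incidence must lie between 0 and 1");
        }
        double q = Math.sqrt(diseaseIncidence); // frequency of the abnormal gene
        double p = 1.0 - q;                     // frequency of the normal gene
        return 2.0 * p * q;
    }

    public static void main(String[] args) {
        // Sample incidence of 1 in 2500: q = 0.02, p = 0.98, so 2pq is about 0.0392.
        System.out.println("Carrier risk = " + carrierFrequency(1.0 / 2500.0));
    }
}
```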
2.2.3.3. Risks to the offspring of a healthy sibling
The probability that a healthy sibling of someone with an autosomal recessive
disorder is a carrier equals 2/3. The risk that his or her own child will be affected
is calculated by multiplying the independent probabilities that the sibling and his
or her partner are carriers and then multiplying by 1/4.
Example 2.3
The healthy brother (individual 1) of someone with cystic fibrosis wishes
to know the probability that his first child will be affected. Assume that
the man’s partner is not a blood relative and has no family history of cystic
fibrosis, and that the disease incidence is 1/2500.
Figure 2.4 Risk to the healthy sibling of someone with cystic
fibrosis for having an affected child (Disease incidence is
assumed to be 1/2500)
Detailed calculation:
Disease incidence = q² = 1/2500 ∴ q = 1/50 = 0.02 and p = 49/50 = 0.98
Individual 1 (Father):
The healthy sibling of someone with an autosomal recessive disorder
∴ Carrier risk = 2/3
Individual 2 (Mother):
Assuming that the population is in Hardy-Weinberg equilibrium
∴ Carrier risk = 2pq = 2 × 49/50 × 1/50 = 2 × 0.98 × 0.02 = 0.0392
Individual 3:
Risk = Father carrier risk × Mother carrier risk × the chance that both
parents pass on the gene if they are carriers
∴ Risk of being affected = 2/3 × 0.0392 × 1/4 = 0.00653
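The arithmetic of Example 2.3 can be reproduced with the following Java sketch. It is a minimal illustration under the assumptions of the example and is not taken from the software itself; the class name OffspringRisk and the method affectedRisk are hypothetical.

```java
/** Hypothetical sketch of the offspring risk calculation in Example 2.3. */
final class OffspringRisk {

    // Chance that two carrier parents both pass on the abnormal gene.
    private static final double BOTH_TRANSMIT = 0.25;

    /** Risk that a child is affected, given the carrier risk of each parent. */
    static double affectedRisk(double fatherCarrierRisk, double motherCarrierRisk) {
        return fatherCarrierRisk * motherCarrierRisk * BOTH_TRANSMIT;
    }

    public static void main(String[] args) {
        double q = Math.sqrt(1.0 / 2500.0);        // abnormal gene frequency, 0.02
        double motherRisk = 2.0 * (1.0 - q) * q;   // population carrier risk 2pq, 0.0392
        double fatherRisk = 2.0 / 3.0;             // healthy sibling of an affected person
        // Expected to be about 0.00653, as in the manual calculation.
        System.out.println("Risk of being affected = " + affectedRisk(fatherRisk, motherRisk));
    }
}
```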
2.2.3.4. Risks to the extended family
The parents and children of a patient with an autosomal recessive disorder are
obligatory carriers, while second-degree relatives (uncles, aunts, nephews, nieces,
half-sibs, grandparents) have a 50 per cent chance of being a carrier (Figure 2.5).
Figure 2.5 The probability of being a carrier in relatives of
someone with an autosomal recessive disorder
The likelihood of other family members being carriers is reduced by 50 per cent for
each further degree of relationship removed from the parents, so that it is
relatively simple to estimate the probability of any relative being a carrier if
their closeness to the patient is known.
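The halving rule can be sketched in Java as follows. This is a simplified illustration with hypothetical names (RelativeCarrierRisk, carrierRisk); it treats degree 1 as the parents and children of the patient and deliberately leaves out healthy siblings, whose carrier risk of 2/3 is handled separately in section 2.2.3.3.

```java
/** Hypothetical sketch: carrier risk by degree of relationship to an affected patient. */
final class RelativeCarrierRisk {

    /**
     * Carrier risk for a relative at the given degree of relationship.
     * Degree 1 means parents and children (obligatory carriers, risk 1.0);
     * each further degree halves the risk. Healthy siblings are not covered here.
     */
    static double carrierRisk(int degree) {
        if (degree < 1) {
            throw new IllegalArgumentException("Degree of relationship must be at least 1");
        }
        return Math.pow(0.5, degree - 1);
    }

    public static void main(String[] args) {
        for (int degree = 1; degree <= 3; degree++) {
            System.out.println("Degree " + degree + ": carrier risk = " + carrierRisk(degree));
        }
    }
}
```

Under this rule a second-degree relative such as an uncle or half-sib has a risk of 1/2, and a relative one degree further removed has a risk of 1/4.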
Example 2.4
Figure 2.6 Risk to extended family (Disease incidence is
assumed to be 1/2500)
II.1 had cystic fibrosis (CF). There is no family history for I.3. Assume that
CF affects 1 person in 2500 in the population. The risk that foetus II.2
will have CF is calculated as follows.
Individuals labelled in the pedigree: I.1, I.2, I.3 (generation I); II.1, II.2
(generation II).
Detailed calculation:
Disease incidence = q² = 1/2500 ∴ q = 1/50 = 0.02 and p = 49/50 = 0.98
Individual I.2 (Mother):
The parents and children of a patient with an autosomal recessive
disorder are obligatory carriers
∴ Carrier risk = 1.0
Individual I.3 (Father):
Assuming that the population is in Hardy-Weinberg equilibrium
∴ Carrier risk = 2pq = 2 × 49/50 × 1/50 = 2 × 0.98 × 0.02 = 0.0392
Individual II.2:
Risk = Father carrier risk × Mother carrier risk × the chance that both parents
pass on the gene if they are carriers
∴ Risk of being affected = 0.0392 × 1 × 1/4 = 0.0098
2.2.3.5. Risk to the offspring of an affected parent
The risk that a child born to an affected parent will also be affected equals half
the probability that the unaffected parent is a carrier.
Example 2.5
Figure 2.7 Risk to the offspring of someone with an autosomal
recessive disorder who marries a healthy unrelated individual
Referring to Figure 2.7, the probability that the first child will be affected
by the disease is calculated as follows.
Individual II.1 (Mother):
Assuming that the population is in Hardy-Weinberg equilibrium
∴ Carrier risk = 2pq
Individual II.2 (Father):
The individual who is affected with the disease
∴ Carrier risk = 1.0
The child:
Risk = Father carrier risk × Mother carrier risk × the chance that the
unaffected parent passes on the gene if he/she is a carrier
∴ Risk of being affected = 1.0 × 2pq × ½
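Example 2.5 is stated symbolically. The sketch below evaluates it numerically under the additional assumption, borrowed from the earlier examples and not part of Example 2.5 itself, that the disease incidence is 1/2500. The class and method names are hypothetical.

```java
/** Hypothetical sketch: risk to the child of an affected parent and a healthy unrelated partner. */
final class AffectedParentRisk {

    /**
     * The affected parent always passes on the abnormal gene, so the risk equals
     * half the probability that the unaffected parent is a carrier.
     */
    static double affectedRisk(double unaffectedParentCarrierRisk) {
        return 1.0 * unaffectedParentCarrierRisk * 0.5;
    }

    public static void main(String[] args) {
        // Assumed incidence of 1/2500 (illustrative only): q = 0.02, 2pq = 0.0392.
        double q = Math.sqrt(1.0 / 2500.0);
        double partnerCarrierRisk = 2.0 * (1.0 - q) * q;
        // Half of 0.0392, i.e. about 0.0196.
        System.out.println("Risk of being affected = " + affectedRisk(partnerCarrierRisk));
    }
}
```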
Chapter 3 - Software Development
3.1. Evolutionary development
In this project an evolutionary development method [11] is used to develop the
software. This method was chosen because the nature of the project is to work with
genetic counsellors to explore their requirements and deliver a final system.
Development starts with a prototype of those parts of the risk assessment process
that are understood. The software can later evolve as new features proposed by the
genetic counsellors are added.
3.2. Software Design
3.2.1. System Overview
The main function of the software is to automate genetic risk calculation. To
perform this task, the software accepts inputs such as family relationships, disease
parameter and individual disease data. The output of the software is the risk value
of interest.
Figure 3.1 The system overview
Genetic Risks Computation System
Input: family relationships; individual disease data; disease parameter
Output: probability that an individual is liable to transmit the disease;
probability that an individual is affected with the disease
Manual calculation of genetic risk is prone to error and inaccuracy because of the
amount of work involved in processing the data. The software uses the same kind of
reasoning that people use to compute genetic risk. Figure 3.1 summarizes the input
and output requirements of the software.
3.2.2. Sub-system modules
Figure 3.2 The sub-system component for computing genetic risk
Components shown in the figure: genetic counsellor; user interface; database
management; family relationships data structure; disease inheritance flow rules;
risk probability calculation; system output.
In a software development project, software is divided into modules so that they
can be developed and tested independently. At a later stage of development these
modules are combined to make up the complete software. Figure 3.2 shows the
sub-system modules of the current software, which are explained below.
1. User interface: this component acts as the interaction point between the
genetic counsellor and the software. Inputs containing information on
family relationships, disease parameter and individual disease data are
fed to the system through the user interface. Missing data are replaced by
default parameters that are pre-programmed into the system. This
ensures that the system behaves as intended, so that the calculated
risk assessment values are accurate.
2. Family relationships data structure: this module handles the family
relationships data that are fed into the software via the user interface.
It is responsible for linking and mapping the family information so that
it reflects the actual pedigree. A detailed explanation of the data
representation can be found in the next section.
3. Database management: the family relationships data held in computer
memory are volatile and temporary. This module manages the storage of
data in, and their retrieval from, the database. Without it, data that
have been manipulated and analysed cannot be stored. This module is not
available in the current software, but the database design and suggestions
for implementing it are described in the ‘Further development and
enhancement’ section in Chapter 5.
4. Risk probability calculation: this module performs the risk probability
calculation. Rules to determine the probability of someone being
affected with a disease and the probability of being a carrier are
embedded in this module. These rules are explained in the next section.
5. Disease inheritance flow rules: for autosomal recessive inheritance, the
risk of someone being affected with a disease depends on knowledge of the
family history of both ancestors and descendants. This module contains
the rules for tracing the flow of disease inheritance, i.e. it identifies
the individuals to whom the disease may be transmitted and assigns a
suitable carrier risk value to each of them.
6. System output: the main function of this module is to read the family
relationship information from memory and display it on the GUI
(graphical user interface) in the form of a pedigree. This feature is
important because genetic counsellors use the pedigree as an aid during
risk assessment. Furthermore, the pedigree shows the relationship of
one person to another (marriage, siblings, parents, cousins etc.).
3.3. Essential program logic
To appreciate the program logic, an understanding of object-oriented programming
and design is required, and knowledge of the Java programming language is essential.
The brief explanation given here is only intended to provide an overall view of the
system algorithm. Reading the software source code is necessary to understand the
internal working mechanism of the software.
3.3.1. Risk Calculation
3.3.1.1. Important key points
The genetic risk calculation is implemented using the rules for autosomal recessive
inheritance. The key points that are translated into program logic are:
1. The carrier risk for someone with no family history equals 2pq,
provided that the partner is unrelated. Here q is the frequency of
the disease gene in the population and p = 1 - q.
2. The probability that a healthy sibling of someone with an
autosomal recessive disorder is a carrier equals 2/3.
3. The probability that any two unaffected individuals will have a
child with an autosomal recessive disorder equals the product of
the probabilities that each is a carrier multiplied by 1/4.
4. The probability that a member of the extended pedigree is a
carrier of an autosomal recessive disorder is halved for each
degree of relationship removed from the parents.
5. The carrier risk for someone who is heterozygous or affected
with the disease equals 1.0.
6. The probability that a child born to an affected parent will also
be affected equals half the probability that the unaffected parent
is a carrier.
3.3.1.2. Determining the risk of being a carrier
Determining whether someone is a carrier is straightforward. If someone is affected
with a disease, the parents and children of that individual may be carriers too. The
carrier risk value assigned to an individual depends on key points 1, 2, 4 and 5
described in the previous section.
For monozygotic twins, the carrier risk is the same for both twins and there should
be complete concordance, i.e. either both twins are carriers or neither is. Since
dizygotic twins are genetically no more alike than ordinary siblings, their carrier
risk values are assigned in the same way as for ordinary siblings.
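The carrier-risk rules above (key points 1, 2, 4 and 5) can be sketched in Java as follows. This is only an illustration, not the dissertation's source code: the class and method names are hypothetical, and the gene frequency q is assumed to be the square root of the disease incidence, as in the worked examples of Chapter 4.

```java
/** Hypothetical sketch of the carrier-risk rules (key points 1, 2, 4 and 5). */
final class CarrierRiskRules {

    /** Key point 2: healthy sibling of someone with the disorder. */
    static final double HEALTHY_SIBLING_OF_AFFECTED = 2.0 / 3.0;

    /** Key point 5: known heterozygote or affected individual. */
    static final double KNOWN_CARRIER_OR_AFFECTED = 1.0;

    private CarrierRiskRules() { }

    /**
     * Key point 1: carrier frequency 2pq for someone with no family history.
     * Assumption: q is the square root of the disease incidence.
     */
    static double populationCarrierRisk(double incidence) {
        if (incidence <= 0.0 || incidence >= 1.0) {
            throw new IllegalArgumentException("incidence must lie strictly between 0 and 1");
        }
        double q = Math.sqrt(incidence);
        double p = 1.0 - q;
        return 2.0 * p * q;
    }

    /**
     * Key point 4: the risk is halved for each degree of relationship removed
     * from the parents of the affected person (whose own risk is 1.0).
     */
    static double extendedFamilyCarrierRisk(int degreesRemovedFromParents) {
        if (degreesRemovedFromParents < 0) {
            throw new IllegalArgumentException("degrees must not be negative");
        }
        return KNOWN_CARRIER_OR_AFFECTED * Math.pow(0.5, degreesRemovedFromParents);
    }
}
```

For example, `CarrierRiskRules.extendedFamilyCarrierRisk(2)` gives 0.25, which matches the 1 in 4 carrier risk used for a first cousin in Example 4.6.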
3.3.1.3. Determining the risk of being affected
For every individual in the pedigree, the risk of being affected by a disease is:
Risk of being affected = A × B × C
A – the father's carrier risk
B – the mother's carrier risk
C – the chance of being affected with the disease.
Referring to the previous section, the values of A and B are determined by key
points 1, 2, 4 and 5, while the value of C is determined by key points 3 and 6.
For monozygotic twins, the risk of being affected is the same for both twins and
there should be complete concordance, i.e. either both twins have the disorder or
neither does. The presence of twins does not increase the risk of the disease for
subsequent children. Since dizygotic twins are genetically no more alike than
ordinary siblings, their risks are essentially the same as for ordinary siblings.
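The product A × B × C can be written as a short Java sketch. The names below are hypothetical. The value of C for two affected parents is not covered by the key points and is included only as an assumption that follows from recessive inheritance.

```java
/** Hypothetical sketch of "risk of being affected = A x B x C". */
final class AffectedRisk {

    private AffectedRisk() { }

    /** Factor C: the chance of being affected, chosen from the parents' status. */
    static double transmissionChance(boolean fatherAffected, boolean motherAffected) {
        if (fatherAffected && motherAffected) {
            // Assumption, not one of the key points: two affected parents
            // can only pass on the disease gene.
            return 1.0;
        }
        if (fatherAffected || motherAffected) {
            return 0.5;   // key point 6: one affected parent
        }
        return 0.25;      // key point 3: two unaffected parents
    }

    /** A = father's carrier risk, B = mother's carrier risk, C = transmission chance. */
    static double riskOfBeingAffected(double fatherCarrierRisk,
                                      double motherCarrierRisk,
                                      double transmissionChance) {
        return fatherCarrierRisk * motherCarrierRisk * transmissionChance;
    }
}
```

With A = 0.0392, B = 2/3 and two unaffected parents, the sketch reproduces the manual figure of about 0.00653 from Example 4.1.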
3.3.1.4. Online carrier risk computation
The manual process of calculating carrier risk for individuals in a pedigree is done
off line, i.e. after the pedigree diagram of the family related to the propositus has
been completely drawn. In this software, however, the carrier risk for every
individual is calculated online: whenever new information is added or changes are
made to the family or disease information, the software automatically updates the
carrier risk probability of the individuals who are related to the person whose data
have been added or updated.
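One way to picture the online update is a data model in which recording a diagnosis immediately re-derives the carrier risks of close relatives. The sketch below is a simplified assumption, not the software's actual design: `OnlinePerson` and its methods are hypothetical names, and only parents and siblings are updated.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: carrier risks are refreshed as soon as data change. */
final class OnlinePerson {
    final String name;
    boolean affected;
    double carrierRisk;
    final List<OnlinePerson> parents = new ArrayList<>();
    final List<OnlinePerson> siblings = new ArrayList<>();

    OnlinePerson(String name, double populationCarrierRisk) {
        this.name = name;
        this.carrierRisk = populationCarrierRisk;   // key point 1 as the starting value
    }

    /** Links the children to both parents and to each other. */
    static void addChildren(OnlinePerson father, OnlinePerson mother, OnlinePerson... children) {
        for (OnlinePerson child : children) {
            child.parents.add(father);
            child.parents.add(mother);
            for (OnlinePerson other : children) {
                if (other != child) {
                    child.siblings.add(other);
                }
            }
        }
    }

    /** Recording a diagnosis triggers the update of related individuals. */
    void markAffected() {
        affected = true;
        carrierRisk = 1.0;                           // key point 5
        for (OnlinePerson parent : parents) {
            parent.raiseCarrierRisk(1.0);            // parents of an affected child must be carriers
        }
        for (OnlinePerson sibling : siblings) {
            if (!sibling.affected) {
                sibling.raiseCarrierRisk(2.0 / 3.0); // key point 2
            }
        }
    }

    private void raiseCarrierRisk(double risk) {
        carrierRisk = Math.max(carrierRisk, risk);
    }
}
```

A fuller version would continue the propagation to the extended family using key point 4.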
3.3.2. Drawing pedigree: Connecting elements
On the system GUI, elements are drawn and connected together so that they
represent a complete pedigree. To connect elements together, connection
properties are assigned to each individual element, as shown in Figure 3.3.
Figure 3.3 Graphics elements properties: Basic symbols used in
pedigree to represent individuals
In Figure 3.3 the dimension of each element is 20 × 20. The centre point of the
element acts as the reference point for connecting one element to another, while
the position property determines the location of the element displayed on the GUI.
Figure 3.4 Graphics elements properties: Connecting children
and parent
Figure 3.4 shows the connection properties for connecting parents and children.
Elements are connected to each other by relationships that are represented by lines.
A pedigree has four types of relationship: marriage, parent, sibling and twins. There
are therefore four types of line, one for each relationship, and each line has its own
centre point that acts as a reference point for connection. The rules for connecting
elements in a pedigree are as follows:
1. Married individuals are connected to their partner using the marriage
relationship line.
2. Siblings are connected together using the children relationship line.
3. To connect the parents and their children, a connection line is drawn
from the centre point of the marriage line to the centre point of the
children relationship line.
4. When there are twins among the children, the twins are connected
together using the twins relationship line. The centre point of this
connection is then connected to the children relationship line.
[Figure 3.4 labels: (x, y); (x, y + 40); parent link position; marriage link position;
children link centre position; twins relationship centre point; variable length
(marked on two lines).]
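The reference points used by these rules can be computed with a little integer arithmetic. The following Java sketch uses hypothetical names and assumes that an element's position is its top-left corner, which the text does not state.

```java
/** Hypothetical sketch of the reference points used to connect pedigree elements. */
final class PedigreeGeometry {

    /** Each element is 20 x 20 (Figure 3.3). */
    static final int ELEMENT_SIZE = 20;

    private PedigreeGeometry() { }

    /** Centre point of an element; (x, y) is assumed to be its top-left corner. */
    static int[] centreOf(int x, int y) {
        return new int[] { x + ELEMENT_SIZE / 2, y + ELEMENT_SIZE / 2 };
    }

    /** Centre point of a relationship line drawn between two end points. */
    static int[] midpoint(int[] from, int[] to) {
        return new int[] { (from[0] + to[0]) / 2, (from[1] + to[1]) / 2 };
    }

    /** Rule 3: the parent link runs from the marriage centre to the children line centre. */
    static int[][] parentLink(int[] marriageLineCentre, int[] childrenLineCentre) {
        return new int[][] { marriageLineCentre, childrenLineCentre };
    }
}
```

For a husband at (100, 50) and a wife at (200, 50), the marriage line runs between the centres (110, 60) and (210, 60), so the parent link starts at (160, 60).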
3.3.3. Data structure
Figure 3.5 Pedigree and its data structure representation
[Figure 3.5: a pedigree shown alongside its representation as linked Person, Marriage
and Children objects.]
Because the software was developed in Java, an object-oriented approach is used to
design its data structure. The source code contains objects representing various
kinds of information; the most important are those that represent the family
relationship information and pedigree data. These objects are:
1. Person: contains information about a person such as age, disease
information and pointers to Marriage object instances (the link to the
person's parents and to his/her own marriage information).
2. Marriage: stores marriage information such as pointers to the Person
object instances (male and female individuals) and to the Children
object instance.
3. Children: holds pointers to the Person object instances.
Figure 3.5 shows an example of this data structure being used to represent
family information. A Marriage object instance contains pointers to the Person
object instances (male and female), and it also points to the Children object
instance resulting from that relationship. The Children object instance holds
several pointers, one to each child. The linkage between these object
instances represents a pedigree.
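A minimal Java sketch of the three objects is given below. It is not the dissertation's source code: the field and method names are hypothetical, and only the links described in the text are modelled. The status codes are borrowed from Table 5.1.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a person in the pedigree. */
final class Person {
    final String name;
    int age;
    char status = 'N';            // H = heterozygote, N = normal, A = affected (Table 5.1)
    Marriage parentsMarriage;     // link to the marriage of this person's parents
    Marriage ownMarriage;         // link to this person's own marriage

    Person(String name) {
        this.name = name;
    }
}

/** Hypothetical sketch of a marriage: two persons and the children of the couple. */
final class Marriage {
    final Person male;
    final Person female;
    final Children children = new Children();

    Marriage(Person male, Person female) {
        this.male = male;
        this.female = female;
        male.ownMarriage = this;
        female.ownMarriage = this;
    }

    /** Adds a child and links the child back to this marriage. */
    void addChild(Person child) {
        child.parentsMarriage = this;
        children.members.add(child);
    }
}

/** Hypothetical sketch of the children object: references to several persons. */
final class Children {
    final List<Person> members = new ArrayList<>();
}
```

Starting from any Person, the parents are reached through `parentsMarriage` and the siblings through `parentsMarriage.children.members`, which is how the linked objects stand in for the drawn pedigree.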
Chapter 4 - Automatic risk calculation
4.1. Software test results
To ensure that the automatic risk calculation is correct, the risk values output by
the software are compared with manual calculations. The following sub-sections
compare the results for different cases of autosomal recessive disorder.
4.1.1. Risk to the offspring of a healthy sibling
Example 4.1
(Refer to Figure 4.1) Individual 1 had Cystic Fibrosis (CF). There is no
family history of CF in the family of individual 5. Assume that CF
affects 1 person in 2500 in the population.
The risk of being affected for individual 6:
Manual calculation:
1. The probability that the healthy mother (4) is a carrier equals 2/3,
because her sibling is affected with CF.
2. The probability that the father (5) is a carrier equals the
frequency of carriers in the general population. With a disease
frequency of 1 in 2500, q = 0.02 and p = 0.98, so 2pq =
2(0.02)(0.98) = 0.0392.
3. The probability that individual 6 will be affected equals the
product of the probabilities that the parents are carriers multiplied
by 1 in 4, the chance that they will both pass on the gene if they
are carriers. This gives a risk of 2/3 × 0.0392 × 1/4, which
equals 0.00653.
Automatic calculation result:
Figure 4.1 Software output (Risk to the offspring of a healthy
sibling): Carrier risk values
Figure 4.2 Software output (Risk to the offspring of a healthy
sibling): Risk of being affected for individual 6
4.1.2. Risks to the extended family
Example 4.2
(Refer to Figure 4.3) Individual 1 had Cystic Fibrosis (CF). There is no
family history of CF in the family of individual 4. Assume that CF
affects 1 person in 2500 in the population.
The risk of being affected for individual 5:
Manual calculation:
1. The probability that the healthy mother (3) is a carrier equals 1.0,
because the parents of someone with an autosomal recessive
disorder must be carriers.
2. The probability that the father (4) is a carrier equals the
frequency of carriers in the general population, i.e. 2pq =
2(0.02)(0.98) = 0.0392.
3. The probability that individual 5 will be affected equals the
product of the probabilities that the parents are carriers multiplied
by 1 in 4, the chance that they will both pass on the gene if they
are carriers. This gives a risk of 1 × 0.0392 × 1/4, which
equals 0.0098.
Automatic calculation result:
Figure 4.3 Software output (Risks to the extended family):
Carrier risk values
Figure 4.4 Software output (Risks to the extended family):
Risk of being affected for individual 5
Example 4.3
See Example 4.2. This relates to the same situation as in
Figure 4.4, but the ‘at risk’ half sibling already has two healthy
full siblings.
The risk of being affected for individual 5:
Manual calculation:
Probability                        Both 3 and 4 are carriers    4 is not a carrier
Prior                              0.0392                       0.9608
Conditional (2 healthy children)   (3/4)^2 = 9/16               1
Joint                              0.02205                      0.9608
1. The probability that both parents are carriers equals
0.02205 / (0.02205 + 0.9608) = 0.02243
2. The probability that individual 5 will be affected equals
0.02243 × 0.25 = 0.005608. Clearly this value is less than the
one derived in Example 4.2.
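The Bayesian table above translates directly into code. The sketch below is an illustration with hypothetical names. It assumes, as the table does, that each child of two carriers is healthy with probability 3/4 and that children of a non-carrier couple are always healthy.

```java
/** Hypothetical sketch of the Bayesian update used in Example 4.3. */
final class BayesianCarrierUpdate {

    private BayesianCarrierUpdate() { }

    /**
     * Posterior probability that both parents are carriers, given the prior
     * probability and the number of healthy children they already have.
     */
    static double posteriorBothCarriers(double prior, int healthyChildren) {
        if (prior < 0.0 || prior > 1.0 || healthyChildren < 0) {
            throw new IllegalArgumentException("prior must be in [0, 1], children >= 0");
        }
        // Joint probability: both carriers AND all observed children healthy.
        double jointCarriers = prior * Math.pow(0.75, healthyChildren);
        // Joint probability: not both carriers (healthy children are then certain).
        double jointNotCarriers = (1.0 - prior) * 1.0;
        return jointCarriers / (jointCarriers + jointNotCarriers);
    }

    public static void main(String[] args) {
        double posterior = posteriorBothCarriers(0.0392, 2);
        System.out.println("Posterior that both parents are carriers: " + posterior);
        System.out.println("Risk to the next child: " + posterior * 0.25);
    }
}
```

With a prior of 0.0392 and two healthy children, the posterior agrees with the manual value of about 0.02243.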
Automatic calculation result:
Figure 4.5 Software output (Risks to the extended family): This
relates to the same situation as in Figure 4.4, but the ‘at risk’
half sibling already has two healthy full siblings
Example 4.4
Figure 4.6 demonstrates the software's capability to determine the
probability of being a carrier for various members of the extended family.
The probability that a member of the extended pedigree is a carrier of an
autosomal recessive disorder is halved for each degree of relationship
removed from the parents.
Automatic calculation result:
Figure 4.6 Software output (Risks to the extended family):
Carrier risk probability for various members of extended
family
4.1.3. Risk to the offspring of an affected homozygote
Example 4.5
(Refer to Figure 4.7) Individual 6 has a healthy father and a mother who
is affected with an autosomal recessive disease. Assume that the
disease affects 1 person in 2500 in the population.
Manual calculation:
The probability that individual 6 will be affected equals
1.0 × 2pq × 1/2 = 1.0 × 2(0.02)(0.98) × 1/2 = 0.0196
Automatic calculation result:
Figure 4.7 Software output (Risk to the offspring of an affected
homozygote): Risk to the offspring of someone with an
autosomal recessive disorder
Example 4.6
(Refer to Figure 4.9) An individual with a rare autosomal recessive
disorder marries an unaffected first cousin.
Manual calculation:
1. The probability of heterozygosity in the first cousin equals 1 in 4.
2. The probability of the first child being affected equals
1.0 × 1/4 × 1/2 = 1/8 = 0.125
Automatic calculation result:
Figure 4.8 Software output (Risk to the offspring of an affected
homozygote): Carrier risk probability for various members of
extended family
Figure 4.9 Software output (Risk to the offspring of an affected
homozygote): Risk to the offspring of someone with an
autosomal recessive disorder who marries a healthy first cousin
Chapter 5 - Conclusions and Recommendations
5.1. Conclusions
Genetic counselling is the process by which individuals and/or their relatives at risk
of a transmissible disorder are advised of the consequences of the disorder, of the
likelihood of developing it and transmitting it to offspring, and of the ways in
which it may be prevented. The genetic counselling process involves risk
calculations, and genetic counsellors can easily be overwhelmed by the amount of work
needed to carry out the calculations and produce a complete analysis. Risk
assessment plays an important role in genetic counselling because the correctness of
the advice given to the counselee depends on the calculated risk. An inaccurate risk
estimate undermines the counsellor's credibility and may cause the counselee
serious distress.
Developing the software requires knowledge of both genetic risk assessment
techniques and software engineering. Within the limited time available, enough
information was gathered on both areas to get the project started and to develop a
prototype risk assessment software. The software development started with
gathering user requirements and understanding the flow of the genetic counselling
process and its risk calculation. The software was then designed and developed on
the basis of those requirements. Software design and development involved
identifying the software development approach and the software design methodology,
and selecting the programming tool and language.
Because of time constraints and workload, the scope of development was reduced
from a full-scale risk assessment software to a prototype that can solve simple
genetic disorder problems and serve as a framework for further enhancement.
Several important features have to be added in the next stage of development;
they are described in the next section.
The project objective is to automate the risk assessment task, so the manual
process of risk calculation has to be translated into computer program logic. Since
the actual process of risk assessment starts with collecting family information, the
first module written was the one that handles user inputs. Much time was spent on
designing the data structure and writing the program for this module. To ensure that
the information represented by the data structure is correct, a user-friendly GUI
(Graphical User Interface) was also written. The GUI accepts inputs and draws the
stored information in the form of a pedigree.
After the GUI and the data input module were completed, the next stage of
development was to write the program for risk calculation. The rules used by
genetic counsellors to calculate genetic risk and to assign carrier risk to related
individuals in the family were programmed into the system. These rules do not
cater for all types of genetic disorder, only for specific cases of autosomal
recessive disorder.
Although the outcome of this project is a prototype, the software can be evolved
by adding new features to it. Recommendations for enhancements and improvements
are given in the next section. The achievement I would most like to highlight is
that, through this project, I have gained a great deal of knowledge of software
development, especially object-oriented software design and development and Java
programming. I have also gained experience in the practical use of probability
theory to support decision making, as shown by the use of Bayesian conditional
probability in risk assessment.
5.2. Further Development and enhancement
5.2.1. Risk calculation
This software does not solve the risk assessment for all cases of autosomal recessive
disease inheritance. Young [3], in his book Introduction to Risk Calculation in Genetic
Counselling, explains this topic in detail. The other cases that need to be tackled
are explained in the following sub-sections.
5.2.1.1. Consanguinity
Consanguinity, or marriage between close relatives, is a common and important
problem in genetic counselling. The presence of consanguinity influences the risks
when an inherited disorder is present in the family. In this software, where
consanguinity is present, the automatic calculation of the risk value is correct when
at least one person in the family is affected with the disease. However, for
consanguineous couples with no history of autosomal recessive disease in the
family, the calculated risk of their child being affected is not accurate.
To tackle the risk assessment for the child of a consanguineous relationship,
two other variables need to be taken into the calculation: the coefficient of
inbreeding and the coefficient of relationship. The coefficient of inbreeding relates
to the child of a consanguineous relationship and indicates the probability that the
child will be homozygous for a specific gene derived from a common ancestor. The
coefficient of relationship relates to the consanguineous couple and indicates the
proportion of genes which, on average, they would be expected to share by descent
from a common ancestor.
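How the two coefficients could enter the calculation is sketched below. This is not part of the current software and is not taken from Young [3]; it uses the standard population-genetics relations F = r/2 (for parents who are not themselves inbred) and P(affected) = Fq + (1 - F)q², and all names are hypothetical.

```java
/** Hypothetical sketch of a risk calculation for consanguineous couples with no family history. */
final class ConsanguinityRisk {

    private ConsanguinityRisk() { }

    /**
     * Coefficient of inbreeding F of the child, from the coefficient of
     * relationship r of the parents (assumes the parents are not inbred).
     */
    static double inbreedingCoefficient(double coefficientOfRelationship) {
        return coefficientOfRelationship / 2.0;
    }

    /**
     * Probability that the child is homozygous for the disease gene:
     * F * q (identical by descent) + (1 - F) * q^2 (by chance).
     */
    static double riskOfAffectedChild(double geneFrequency, double inbreedingCoefficient) {
        double q = geneFrequency;
        double f = inbreedingCoefficient;
        return f * q + (1.0 - f) * q * q;
    }
}
```

For first cousins (r = 1/8, so F = 1/16) and q = 0.02, the sketch gives a risk of 0.001625, compared with q² = 0.0004 for an unrelated couple.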
5.2.1.2. The use of screening result for carrier detection
The carrier risks presently used in the software are based either on the assumption
that the individual may be a carrier because someone in the family is affected with
the disease or, for an individual with no family history, on the population carrier
frequency (2pq).
When a carrier detection test has been carried out on the individual, the conditional
probability derived from the test result is combined with the ‘prior’ probability to
give the ‘posterior’ probability of the individual being a carrier. This can change
the carrier risk significantly and makes the risk assessment value more accurate and
reliable. It is therefore important to include this feature in the next stage of
software development.
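As an illustration of how a screening result could modify the carrier risk, the sketch below applies Bayes' rule to a negative test. The detection rate is a hypothetical parameter, the test is assumed to give no false positives, and the names are not taken from the software.

```java
/** Hypothetical sketch of a carrier-risk update after a negative screening test. */
final class ScreeningUpdate {

    private ScreeningUpdate() { }

    /**
     * Posterior carrier risk after a negative test result.
     * detectionRate is the assumed fraction of true carriers the test detects.
     */
    static double posteriorAfterNegativeTest(double priorCarrierRisk, double detectionRate) {
        if (priorCarrierRisk < 0.0 || priorCarrierRisk > 1.0
                || detectionRate < 0.0 || detectionRate > 1.0) {
            throw new IllegalArgumentException("probabilities must lie in [0, 1]");
        }
        // Carrier who is missed by the test.
        double jointCarrier = priorCarrierRisk * (1.0 - detectionRate);
        // Non-carrier: a negative result is certain (no false positives assumed).
        double jointNonCarrier = (1.0 - priorCarrierRisk) * 1.0;
        double total = jointCarrier + jointNonCarrier;
        if (total == 0.0) {
            throw new IllegalArgumentException("a negative result is impossible for these inputs");
        }
        return jointCarrier / total;
    }
}
```

For a healthy sibling with a prior of 2/3 and an assumed detection rate of 90%, a negative result lowers the carrier risk to about 1/6.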
5.2.1.3. Siblings with different autosomal disorders
Figure 5.1 Healthy parents have two children each with a
different autosomal recessive disorder.
The above figure shows the case where the parents have two children, each with a
different autosomal recessive disorder. The system assumes that the children of a
couple can only have the same disorder, so the risk calculation does not consider
someone whose siblings have different disorders. The procedure for calculating
the risk in this type of case is explained in detail by Young [3].
5.2.1.4. Separate mutations consideration
In the current software, the risk calculation assumes that the disorder is so rare
that the possibility of heterozygosity in family members due to inheritance of a
different mutation is low enough to be ignored. For more common diseases such as
cystic fibrosis, however, allowance should be made for the possibility that a
separate mutation is segregating independently in the family, and a different
approach to risk calculation needs to be used.
5.2.2. Software enhancement
5.2.2.1. Database module
The current software does not allow pedigree data to be stored for future reference
and analysis because the database management module is not available. A suggested
architecture for implementing this module is shown in Figure 5.2.
Figure 5.2 Database connection and implementation
architecture
Based on this architecture, to store data the database management module has to
read the family relationship data structure and convert it into database table
format, and do the reverse when retrieving data from the database. Before
performing any storage or retrieval task, the system must first be connected to
the database; this can be done using either Microsoft ODBC (Open Database
Connectivity) or the Java JDBC API. The connection type chosen determines the
program code that needs to be written and added to the database management
module.
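If the JDBC route were chosen, the storage step might look like the sketch below. It is only an outline under stated assumptions: the `IndividualRow` class, the column names and the choice of five columns are hypothetical (loosely based on Table 5.1), and the caller is assumed to supply an open `java.sql.Connection`.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/** Hypothetical flat view of one person, ready to be written to the INDIVIDUAL table. */
final class IndividualRow {
    final int id;
    final int age;
    final char sex;                  // 'M' or 'F'
    final char status;               // 'H', 'N' or 'A'
    final double carrierProbability;

    IndividualRow(int id, int age, char sex, char status, double carrierProbability) {
        this.id = id;
        this.age = age;
        this.sex = sex;
        this.status = status;
        this.carrierProbability = carrierProbability;
    }
}

/** Hypothetical sketch of the storage side of a database management module. */
final class DatabaseManagementSketch {

    // Column names are assumptions; the real schema would follow Figure 5.3.
    static final String INSERT_SQL =
        "INSERT INTO INDIVIDUAL (ID, AGE, SEX, STATUS, CARRIER_PROBABILITY) VALUES (?, ?, ?, ?, ?)";

    private DatabaseManagementSketch() { }

    /** Converts the in-memory data into database table format (one value per column). */
    static Object[] toParameters(IndividualRow row) {
        return new Object[] {
            row.id, row.age, String.valueOf(row.sex), String.valueOf(row.status),
            row.carrierProbability
        };
    }

    /** Writes one row through an already opened JDBC connection. */
    static void save(Connection connection, IndividualRow row) throws SQLException {
        try (PreparedStatement statement = connection.prepareStatement(INSERT_SQL)) {
            Object[] parameters = toParameters(row);
            for (int i = 0; i < parameters.length; i++) {
                statement.setObject(i + 1, parameters[i]);   // JDBC parameters are 1-based
            }
            statement.executeUpdate();
        }
    }
}
```

Keeping the conversion in `toParameters` separate from `save` means the mapping from data structure to table format can be checked without a database connection.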
[Figure 5.2: within the risk assessment system, the family relationship data structure is
linked to the database management module, which connects through Microsoft ODBC / Java
JDBC API to the database system (MySQL, Microsoft Access etc.).]
Figure 5.3 Database design: Entity relationship diagram for the
system database tables (refer to Elmasri and Navathe [16] for
information on the ER diagram notation)
[Figure 5.3: ER diagram with three entities, INDIVIDUAL, MARRIAGE and PEDIGREE, linked
by the relationships PARENT OF, INVOLVE IN, HAVE and HAS, each of cardinality 1:N. The
attributes of each entity are listed in Tables 5.1 to 5.3.]
The table format for the data stored in the database is based on the ER
diagram in Figure 5.3. The tables and their attributes are described below.
Table: INDIVIDUAL
Contains information on individuals, i.e. disease information, siblings and parents.
Attribute             Description
ID                    Primary key.
Age                   Age.
Sex                   ‘M’ for male, ‘F’ for female.
Sibling ID            The ID of the next younger sibling.
Relationship Type     The relationship between the person and his/her next younger
                      sibling: N – normal, M – monozygotic twins, D – dizygotic twins.
Lname                 Last name.
Minit                 Initial of middle name.
Fname                 Forename.
Suspected Carrier     Boolean flag indicating whether the person is suspected to be
                      a disease carrier.
Carrier Probability   The person’s carrier risk probability.
Status                Disease information status: H – heterozygote (carrier),
                      N – normal, A – affected.
Pedigree ID           Foreign key from the pedigree table.
Marriage ID           Foreign key from the marriage table. Links the person to his or
                      her spouse.
Parent Marriage ID    Foreign key from the marriage table. Identifies the person's
                      parents.
X                     The X position of the element on the GUI, for pedigree drawing.
Y                     The Y position of the element on the GUI, for pedigree drawing.
Table 5.1 Database design: Individual table with attributes
Table: PEDIGREE
Stores the pedigree information.
Attribute             Description
Pedigree ID           Primary key.
Date Created          Date when the pedigree was created.
Creator               The person who drew the pedigree.
Comment               Comments made by the author on the pedigree or the analysis.
Table 5.2 Database design: Pedigree table with attributes
Table: MARRIAGE
Data on marriages between individuals.
Attribute             Description
Marriage ID           Primary key.
Male ID               The husband.
Female ID             The wife.
Pedigree ID           Foreign key from the pedigree table.
Table 5.3 Database design: Marriage table with attributes
5.2.2.2. Risk calculation for other Mendelian inheritance disorders
Autosomal recessive is not the only pattern of disease transmission in the category
of Mendelian inheritance disorders. Others include autosomal dominant and
X-linked recessive. The calculation of risk for Mendelian inheritance disorders
starts with collecting family history information and drawing up the family tree or
pedigree. The next step is to identify the individuals who are affected with the
disease and then to determine whether the pattern of transmission in the family is
X-linked recessive, autosomal dominant or autosomal recessive.
The facility to draw up a pedigree is already available in this software. With
some modification and additions to the source code, modules to handle risk
calculation for autosomal dominant and X-linked recessive disorders can be included
in the system. Including both types of inheritance would make the system a more
complete tool for assisting genetic counsellors during genetic counselling.
BIBLIOGRAPHY
1. Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems. Morgan
Kaufmann.
2. Hodge, S. E. (1998). A simple unified approach to Bayesian risk calculations.
Journal of Genetic Counselling, 7, 235-261.
3. Young, I. D. (1999). Introduction to Risk Calculation in Genetic Counselling.
Oxford University Press.
4. Pathak, K. D. et al. (1994). Automatic computation of Genetic Risk. Proceedings
of the 10th Conference on Artificial Intelligence for Applications, 164-170.
5. Damme, V. J. (1992). An automatic surveillance of genetic risk in primary
care linkage of routinely used individual patients records. Proceedings of the
7th World Congress on Medical Genetics, 756-60 vol. 1.
6. Harper, P. S. (1999). Practical Genetic Counselling (5th Edition). Butterworth
Heinemann.
7. Rivas, M. L. et al. (1987). Risk XLR: Microcomputer Based Genetic Risk
Program for X-Linked Recessive Traits. Proceedings of the 11th Annual
Symposium on Computer Applications in Medical Care, 193-198.
8. Emery, A. E. H. and Mueller, R. F. (1992). Elements of Medical Genetics (8th
Edition). Churchill Livingstone.
9. Hayter, A. J. (1996). Probability and Statistics for Engineers and Scientists.
PWS Publishing Company.
10. Pressman, R. S. (2000). Software Engineering: A Practitioner’s Approach
(European Adaptation). McGraw Hill.
11. Sommerville, I. (2001). Software Engineering (6th Edition). Addison-Wesley.
12. Shtern, V. (2000). C++: A Software Engineering Approach. Prentice Hall
PTR.
13. Horton, I. (2001). Beginning Java 2. Wrox Press.
14. Castagnetto, J., Rawat, H., Schumann, S., Scollo, C. and Veliath, D. (2000).
Professional PHP Programming. Wrox Press.
15. MySQL Reference Manual (2000). TcX AB, Detro HB and MySQL Finland
AB.
16. Elmasri, R. and Navathe, S. B. (1994). Fundamentals of Database Systems
(2nd Edition). Addison Wesley.