
SS 2008 lecture 5

Biological Sequence Analysis

V5 Forward genetics – molecular markers

Review of lecture V4 ...

- The Arabidopsis genome contains a large number of gene duplications

- More than 50% of the genome is duplicated

- The TAIR website contains the full sequence of the Columbia ecotype


Reverse genetics

The reverse genetics approach tries to identify the function of a particular gene by studying the effects of a manipulation of its sequence.

Possible manipulations:

- random insertions or deletions
- site-directed mutagenesis (point mutations)
- gene knockout (yeast or mouse)
- RNA silencing

After an alteration, the method attempts to find a possible phenotype that may have resulted from this sequence change.

If variations become observable, conclusions can be drawn about the normal underlying function of the mutated gene.

Modifying the sequence of a gene requires sequence information retrieved from genome sequencing, EST sequencing or transcript profiling projects.

M.Sc. thesis S. Pfeifer


After choosing a specific target sequence, select mutations that inactivate the gene or disrupt its function and thus hopefully lead to a visible mutant phenotype.

Main advantage of reverse genetic studies: the gene concerned is known beforehand.

Regrettably, the mutations used often result in reduced function (thus gain-of-function mutations cannot be identified), and redundant pathways cannot be discovered.

In addition, only a small portion of the mutations exhibit informative phenotypes, and even fewer display morphological changes that provide a direct clue about gene function.


Forward genetics

Instead, one often uses the forward genetics (also called classical genetics) approach to discover the function(s) of a gene.

This approach
- allows gain-of-function mutations to be considered,
- allows the identification of genes acting within a common pathway as well as genes encoding interacting proteins, and
- is not restricted to any tissue type.

Because of its wide range of applications, this method is often the preferred strategy in functional studies.


Meiosis

Meiosis can be divided into the first and the second meiosis.

First meiosis: segregation of the homologous chromosomes from each other and division of the diploid cell into two haploid cells, each containing one of the segregates.

Second meiosis: separation of each chromosome's sister strands (the chromatids) and segregation of the DNA into two sets of strands (each containing one of each homologue). It further divides both haploid, duplicated cells to produce 4 gametes, which can fuse with other haploid cells during fertilisation to create a new diploid cell, or zygote.


Meiosis terms

A zygote is a cell that is the result of fertilization.

The haploid number is the number of chromosomes in a gamete of an individual.

Diploid cells have two homologous copies of each chromosome, usually one from the mother and one from the father.

Plants and some algae switch between a haploid and a diploid or polyploid state, with one of the stages emphasized over the other.


Zygosity describes the similarity or dissimilarity of DNA between homologous chromosomes at a specific allelic position or gene. Every gene in a diploid organism has two alleles at the gene's locus. These alleles are defined as dominant or recessive, depending on the phenotype resulting from the two alleles.

An organism is called homozygous at a specific locus when it carries two identical copies of the gene affecting a given trait on the two corresponding homologous chromosomes (e.g., the genotype is PP or pp when P and p refer to different possible alleles of the same gene).

An organism is heterozygous at a locus or gene when it has different alleles occupying the gene's position on each of the homologous chromosomes. In diploid organisms, the two different alleles were inherited from the organism's two parents. For example, a heterozygous individual would have the allele combination Pp.


Meiosis

During the pairing of the homologous chromosomes in the first meiosis (the synapsis), the two copies of each chromosome pair come physically close.

A process named recombination or crossover can occur if the homologous chromosome arms undergo breakage and an exchange of DNA segments, resulting in gametic chromosomes consisting of material from both members of the chromosome pair.


The crossover directly affects the inheritance pattern of the genes involved, as it determines whether two genes will remain linked and be inherited together or whether they will be separated and inherited independently.

Meiosis thus not only ensures proper chromosome disjunction but also contributes to genetic diversity among the gametes.

Because recombination events give insight into the distance between two genes, they can assist map-based cloning approaches.

Map-based cloning relies on these frequent genetic exchange events of meiotic recombination: a randomly occurring recombination separates two closely adjacent markers less frequently than two markers that are further apart.

In general, the crossover probability between two markers increases monotonically as the distance between the two markers increases along the chromosome.
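The monotonic relation between marker distance and crossover probability can be illustrated with a mapping function. The slides do not name one, so the following is only a sketch that assumes Haldane's model (crossovers occur independently along the chromosome); the function name and the example distances are chosen for illustration.

```python
import math


def haldane_recombination_fraction(distance_morgans: float) -> float:
    """Recombination fraction r for a map distance d (in Morgans).

    Assumption: Haldane's mapping function, r = 0.5 * (1 - exp(-2d)),
    which models crossovers as independent events (no interference).
    """
    if distance_morgans < 0:
        raise ValueError("map distance must be non-negative")
    return 0.5 * (1.0 - math.exp(-2.0 * distance_morgans))


# Illustrative distances in centimorgans (1 cM = 0.01 Morgan).
distances_cm = [1, 10, 50, 100]
fractions = [haldane_recombination_fraction(d / 100.0) for d in distances_cm]

for d, r in zip(distances_cm, fractions):
    print(f"{d:>4} cM -> r = {r:.3f}")
```

Under this model the recombination fraction grows with distance but never exceeds 0.5, the value expected for unlinked markers.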


Map-based cloning

Outcrossing: the practice of introducing unrelated genetic material into a breeding line.


Marker in F2 generation


Bulk analysis of mutation effect

The mutation created in Ler is linked to markers ciw 1 and nga 280, because both markers show only the Ler-specific band in the pooled mutant sample. In contrast, all other markers used show approximately the same ratio of Col and Ler amplification in both lanes. This indicates that the mutation is not linked to these loci. Lukowitz et al. [2000].

Below: schematic representation of the marker positions used in the mapping experiment. Open circles: centromeres. Right: gel of PCR products for these markers. In each panel, the left lane shows the result for the heterozygous control sample and the right lane that for a pooled mutant sample. Bands specific for the Ler ecotype are marked with an asterisk.
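The reasoning behind the bulk analysis can be sketched in a few lines: a marker is a candidate for linkage when the pooled mutant sample shows (almost) only the Ler band. The marker names, band intensities and the 0.9 threshold below are invented for illustration and are not measurements from Lukowitz et al.

```python
def ler_fraction(col_intensity: float, ler_intensity: float) -> float:
    """Share of the total band signal that comes from the Ler-specific band."""
    total = col_intensity + ler_intensity
    if total <= 0:
        raise ValueError("no band signal for this marker")
    return ler_intensity / total


def linked_markers(pool: dict, threshold: float = 0.9) -> list:
    """Markers whose pooled mutant lane is dominated by the Ler band.

    The threshold is an assumption; in practice it depends on gel
    quantification and pool size.
    """
    return [
        name
        for name, (col, ler) in pool.items()
        if ler_fraction(col, ler) >= threshold
    ]


# Hypothetical (Col, Ler) band intensities in arbitrary units.
example_pool = {
    "marker_A": (0.0, 1.0),    # only the Ler band: candidate for linkage
    "marker_B": (0.05, 0.95),
    "marker_C": (0.5, 0.5),    # both bands equally strong: unlinked
    "marker_D": (0.45, 0.55),
}

print(linked_markers(example_pool))
```

Markers flagged this way only indicate a chromosomal region; finer mapping would then score individual plants rather than the pool.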


SNPs

In most organisms, SNPs comprise the largest set of sequence variants.

SNP: a position at which a single nucleotide is replaced by one of the other three nucleotides between members of a species (see Figure below).

SNPs are divided into transitions (substitutions between the purines A and G or between the pyrimidines C and T) and transversions (substitutions between a purine and a pyrimidine).

In Arabidopsis both kinds are equally abundant in the genome (see Table).
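The distinction between transitions and transversions is easy to express in code. The following is a minimal sketch; the function name is chosen for illustration.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Classify a single-nucleotide substitution as transition or transversion."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("expected one of A, C, G, T")
    if ref == alt:
        raise ValueError("reference and alternative base are identical")
    # Transition: both bases are purines or both are pyrimidines.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


for ref, alt in [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T")]:
    print(ref, "->", alt, classify_substitution(ref, alt))
```

Of the twelve possible substitutions, four are transitions and eight are transversions.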


SNPs can be found
- in intergenic regions (frequency 1 SNP per 3.5 kb),
- in coding areas of genes (frequency 1 SNP per 2.2 kb), and
- in non-coding areas of genes (frequency 1 SNP per 3.1 kb).

SNPs falling within coding regions are of particular interest. Owing to the redundancy of the genetic code, not every such change necessarily results in a different amino acid.
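Whether a coding SNP changes the amino acid can be checked against the codon table. The sketch below assumes the standard genetic code and a codon given on the coding strand; the function name and the example codons are illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in the order T, C, A, G
# (TTT, TTC, TTA, TTG, TCT, ...); "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def is_synonymous(codon: str, position: int, new_base: str) -> bool:
    """True if replacing codon[position] by new_base keeps the amino acid."""
    codon = codon.upper()
    mutated = codon[:position] + new_base.upper() + codon[position + 1:]
    return CODON_TABLE[codon] == CODON_TABLE[mutated]


# GCT and GCC both encode alanine: a silent change.
print(is_synonymous("GCT", 2, "C"))
# GAT encodes aspartate, GAA glutamate: the amino acid changes.
print(is_synonymous("GAT", 2, "A"))
```

Changes at the third codon position are the most likely to be silent, which is one reason why not every coding SNP affects the protein.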


Molecular markers: SNPs


RFLPs – Restriction fragment length polymorphisms

RFLPs [Botstein et al., 1980] were one of the first types of DNA markers to be developed. They exploit the fact that different accessions have almost identical genomes but always differ at a few nucleotides (owing to base substitutions, insertions, deletions or sequence rearrangements during evolution).

Idea: use these variations to distinguish between ecotypes.

Employ restriction endonucleases that recognise specific nucleotide sequences in the DNA and cleave the DNA at these (or adjacent) sites.

Some restriction enzymes and their recognition sites (arrows indicate the cut site). Some enzymes recognise not just one particular sequence but also allow variation of certain nucleotides within their recognition site, e.g. N stands for any nucleotide. Source: Restriction Enzyme Database (REBASE) [2007]
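Recognition sites that contain ambiguity codes such as N can be located by translating them into a regular expression. This is only a sketch: the function name and the test sequence are invented, while the two sites used (GAATTC for EcoRI, GANTC for HinfI) are the commonly documented ones.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "M": "[AC]", "K": "[GT]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def site_positions(sequence: str, recognition_site: str) -> list:
    """0-based start positions of all (possibly overlapping) site matches."""
    pattern = "".join(IUPAC[base] for base in recognition_site.upper())
    # The lookahead keeps the match zero-width, so overlapping sites are found.
    return [m.start() for m in re.finditer(f"(?=({pattern}))", sequence.upper())]


toy_sequence = "TTGAATTCAAGACTCGG"  # made-up sequence
print(site_positions(toy_sequence, "GAATTC"))  # EcoRI site
print(site_positions(toy_sequence, "GANTC"))   # HinfI site, N = any nucleotide
```

Only one strand is searched here, which is sufficient for palindromic sites such as these two.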


Effect of RFLPs


RFLPs – Restriction fragment length polymorphisms

After a cut, the fragments obtained may differ in size (due to insertions or deletions), and the number of pieces produced may also vary (when a base change alters a recognition site in the sequence) between dissimilar accessions (see Fig.).
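Both effects, a change in fragment size and a change in fragment number, can be reproduced with a toy digest. The sequences below are invented; the sketch assumes a linear molecule and EcoRI, which cuts G^AATTC one base into its site.

```python
def digest(sequence: str, site: str = "GAATTC", cut_offset: int = 1) -> list:
    """Fragment lengths after cutting a linear sequence at every site.

    cut_offset is the position of the cut within the site
    (1 for EcoRI, which cuts between G and AATTC).
    """
    sequence = sequence.upper()
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    bounds = [0] + cuts + [len(sequence)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


# Three hypothetical accessions of the same region.
accession_a = "AAAA" + "GAATTC" + "CCCCCC" + "GAATTC" + "TTTT"
accession_b = "AAAA" + "GAATTC" + "CCCCCC" + "GAGTTC" + "TTTT"     # site lost by a base change
accession_c = "AAAA" + "GAATTC" + "CCCCCCGGG" + "GAATTC" + "TTTT"  # 3 bp insertion

for name, seq in [("a", accession_a), ("b", accession_b), ("c", accession_c)]:
    print(name, digest(seq))
```

Accession b yields fewer fragments than accession a, whereas accession c yields the same number of fragments with one of them longer.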


Cleaved amplified polymorphic sequence (CAPS) markers

CAPS markers detect single base changes that create or remove a recognition site for a restriction enzyme in one of a pair of alleles.
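The band patterns expected from a CAPS marker can be sketched as follows. The amplicon sequences are invented, the enzyme is assumed to be EcoRI (G^AATTC), and each amplicon is assumed to contain at most one recognition site.

```python
def caps_bands(amplicons: list, site: str = "GAATTC", cut_offset: int = 1) -> list:
    """Distinct band sizes on a gel after digesting the PCR products.

    amplicons holds the amplified alleles of one plant: two identical
    sequences for a homozygote, two different ones for a heterozygote.
    Assumption: at most one recognition site per amplicon.
    """
    bands = set()
    for amplicon in amplicons:
        position = amplicon.upper().find(site)
        if position == -1:
            bands.add(len(amplicon))  # allele without a site stays uncut
        else:
            bands.add(position + cut_offset)
            bands.add(len(amplicon) - position - cut_offset)
    return sorted(bands, reverse=True)


# Hypothetical alleles that differ by a single base inside the site.
allele_with_site = "ACGTACGT" + "GAATTC" + "TTGCA"
allele_without_site = "ACGTACGT" + "GAGTTC" + "TTGCA"

print(caps_bands([allele_with_site, allele_with_site]))
print(caps_bands([allele_without_site, allele_without_site]))
print(caps_bands([allele_with_site, allele_without_site]))
```

The heterozygote shows the uncut band together with both cleavage products, which is what makes such a marker co-dominant.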


Molecular markers: short sequence repeats

Simple sequence repeats, SSRs (also called short tandem repeats, STRs, simple sequence length polymorphisms, SSLPs, or microsatellites), are highly polymorphic loci consisting of short sequence motifs, 2-4 bp long, that are repeated multiple times and embedded in DNA of unique sequence.

Minisatellites (also named variable number tandem repeats, VNTRs) are similar to SSRs, but their repeated sequence is longer (about 10-100 base pairs). Both often arise from tandem duplications or from slipped-strand mispairing (slippage) occurring during replication or DNA repair on a single DNA double helix.


Microsatellites

Microsatellites can be classified according to the number of nucleotides in the repeat unit. Mononucleotide and dinucleotide repeat elements are quite common; longer repeat units become increasingly unlikely.

Alternative classification:
- perfect repeats, containing a single uninterrupted repeat element flanked on both sides by non-repeated sequences, and
- imperfect repeats, with two or more runs of the same repeat unit interrupted by short stretches of other sequences.

Besides these simple perfect repeats (such as (CA)n) and simple imperfect repeats (for example (CA)nGT(CA)m), compound perfect repeats (for instance (AC)n(TC)m) and compound imperfect repeats (such as (CA)nA(AC)mA(GA)o) also arise in the genome of most organisms.
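Perfect simple repeats can be located with a regular expression that uses a backreference to the repeat unit. This is a sketch only: the minimum of four repeats, the maximum unit length of 4 bp and the example sequence are assumptions made for illustration, and imperfect or compound repeats are not handled.

```python
import re


def find_ssrs(sequence: str, min_repeats: int = 4, max_unit: int = 4) -> list:
    """Perfect simple repeats as (start, unit, repeat_count) tuples.

    The lazy quantifier makes the shortest possible unit win, so a run
    of A's is reported as a mononucleotide repeat, not as (AA)n.
    """
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_repeats - 1))
    hits = []
    for match in pattern.finditer(sequence.upper()):
        unit = match.group(1)
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits


toy_sequence = "GGTCACACACACATTGAAAAAAGC"  # made-up sequence
for start, unit, count in find_ssrs(toy_sequence):
    kind = {1: "mono", 2: "di", 3: "tri", 4: "tetra"}[len(unit)]
    print(f"position {start}: ({unit}){count}, {kind}nucleotide repeat")
```

Comparing the repeat counts found at the same locus in two accessions would reveal the length polymorphisms that make these loci useful as markers.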


Molecular markers: microsatellites

As early as 1984, Tautz and Renz showed that all possible types of perfect simple sequence repeats composed of only one or two nucleotides are present to at least some extent in eukaryotic genomes, and that one can expect to encounter at least one simple sequence stretch every 10 kb of DNA sequence.

In 1994, Bell and Ecker examined mono- and dinucleotide repeats longer than 20 nucleotides in the Arabidopsis accessions Columbia and Landsberg erecta. Most of them display polymorphisms between these ecotypes owing to variation in the number of repeat units.

In 2000, this result was confirmed by a study of Lukowitz et al. showing that there is a likelihood of 40% that such DNA segments are polymorphic between different accessions.
