sequence optimization for synthetic genes using genetic algorithms
Post on 10-Feb-2016
Sequence Optimization for Synthetic Genes Using Genetic Algorithms
David Sigfredo Angulo (1), Rob Vogelbacher (1), Benjamin R. Capraro (2), Tobin Sosnick (2), Shohei Koide (2)
(1) School of Computer Science, Telecommunications and Information Systems, DePaul University
(2) Department of Biochemistry and Molecular Biology, The University of Chicago
Introduction
• Genetic Algorithms:
– Use ideas based on the biology of genes
– Create software that uses such a stochastic means to search through large search spaces
– The resulting algorithm has nothing to do with genes
• Designing Genes
– This search space is huge
– REALLY NOVEL IDEA:
• Use Genetic Algorithms based on genes to design genes!!
Outline
• Short Biology Tutorial
• DNA Sequence Generation
– Why is the problem difficult?
• IBG Gene Designer
– Genetic Algorithm (GA) solution
– Heuristics and Fitness Evaluation
First
• Before the problem can be described
– Must give some background biochemistry principles
• Tutorial outline
– DNA
– Codons
– Protein
• Synthetic genes
– What are they and what are they used for?
– Restriction Enzymes
– Expressing Proteins using Vectors
Transcription/Translation
[Figure: Central Dogma of Molecular Biology. DNA → RNA (transcription, by RNA polymerase) → Protein (translation, by ribosomes)]
DNA
• Deoxyribonucleic acid
• Strand backbone is made of sugar & phosphate molecules
• Strands connected by nitrogen-containing nucleotide bases
• Two strands join, making a double helix
• Each strand is made of nucleotides joined together
[Figure: levels of DNA packing in a chromosome]
• 2 nm: short region of DNA double helix
• 11 nm: "beads on a string" form of chromatin
• 30 nm: chromatin fiber of packed nucleosomes
• 300 nm: section of chromosome in an extended form
• 700 nm: condensed section of chromosome
• 1100 nm: entire mitotic chromosome
DNA
Four Nucleotides: A, G, T, C
DNA: Base Pairing
Short Biology Tutorial
• Tutorial outline
– DNA
– Codons
– Protein
– Restriction Enzymes
– Expressing Proteins using Vectors
DNA Sequence Generation: Codon to Amino Acid Translation
http://campus.queens.edu/faculty/jannr/Genetics/images/codon.jpg
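The codon-to-amino-acid lookup shown in the table can be sketched in a few lines of Python. This is only an illustration, not code from IBG GeneDesigner: the names CODON_TABLE and translate are hypothetical, and the table is the standard genetic code written in T, C, A, G order with '*' marking stop codons.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG order ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Maps each of the 64 codons (e.g. "ATG") to a one-letter amino acid.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding DNA strand (5'->3') until the first stop codon."""
    dna = dna.upper()
    protein = []
    # Ignore any trailing bases that do not form a complete codon.
    for i in range(0, len(dna) - len(dna) % 3, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)
```

For example, translate("ATGGCCTAA") reads ATG (M), GCC (A), then stops at TAA.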
Proteins: AA Chains
Proteins
• Amino acid chains fold into complex 3D structures
• Functional properties depend on 3D structure
• Usefulness depends on functional properties
– E.g. designing drugs
Designed/Expressed Proteins Extremely Useful
• Designed Proteins
– Can be used to study protein structure
– Can be used to study effects of other proteins
• Can be designed to “knock out” other proteins
• Can be designed to “block” the action of other proteins
• Expressed proteins
– Expressed in cow’s milk or chicken eggs
– Can manufacture drugs on large scales in this way
• E.g. insulin
Synthetic Genes
• DNA sequences
– “Backtranslated” from a novel protein or amino acid sequence
[Figure: DNA → RNA (transcription, by RNA polymerase) → Protein (translation, by ribosomes)]
• We’ll put the DNA for our designed protein into an organism (a vector)
• Then that vector will make (express) our protein
• But how do we get the DNA into an organism?
Restriction Enzyme Digests
• Watson and Crick, 1953
• Took 20 years to be able to do anything with DNA
• H. Smith (and others) made a discovery that allowed manipulation and deciphering of DNA
• Discovery was that bacteria produce enzymes that introduce breaks in double-stranded DNA molecules whenever they encounter a specific string of nucleotides
• These enzymes are called Restriction Enzymes
• Restriction Enzymes can be used as precise scissors
– They let biologists cut (and paste) portions of DNA
EcoRI
• EcoRI was one of the first Restriction Enzymes discovered
– "Eco" because it was isolated from E. coli (Escherichia coli)
– "R" because it came from strain RY13
– "I" because it was the first Restriction Enzyme identified in that strain
– Now over 300 Restriction Enzymes known
• EcoRI cleaves (restricts, digests) DNA
– Between the G and A nucleotides
– Only when it encounters them in the string 5'-GAATTC-3'
– This is called the restriction site
[Figure: the GAATTC site before and after cleavage by EcoRI]
Before: 5'-GAATTC-3'
        3'-CTTAAG-5'
After:  5'-G     AATTC-3'
        3'-CTTAA     G-5'
Sticky Ends
• Many restriction enzymes cut in such a way that some single-stranded DNA is left at both ends
• These nucleotide sequences
– Are complementary to each other
– Are 5'-AATT-3' in the case of EcoRI
– Can base pair with other nucleotides in a sequence
– Thus, are called "sticky ends"
– Can temporarily hold two DNA strands together
– The enzyme ligase will permanently join those strands
– This is called ligation
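The EcoRI cut and the resulting sticky ends described above can be sketched as a simple string operation. This is a rough illustration on the top strand only; the function names find_sites and digest are hypothetical, and real digests act on double-stranded DNA.

```python
ECORI_SITE = "GAATTC"
CUT_OFFSET = 1  # EcoRI cuts between the G and the first A of its site


def find_sites(dna: str, site: str = ECORI_SITE) -> list:
    """Return the start index of every occurrence of the site."""
    dna = dna.upper()
    positions = []
    start = dna.find(site)
    while start != -1:
        positions.append(start)
        start = dna.find(site, start + 1)
    return positions


def digest(dna: str, site: str = ECORI_SITE, cut_offset: int = CUT_OFFSET) -> list:
    """Return the top-strand fragments left after cutting at every site."""
    dna = dna.upper()
    fragments = []
    previous = 0
    for position in find_sites(dna, site):
        cut = position + cut_offset
        fragments.append(dna[previous:cut])
        previous = cut
    fragments.append(dna[previous:])
    return fragments
```

Every fragment downstream of a cut starts with AATT, which corresponds to the single-stranded overhang that gives EcoRI its sticky ends.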
Gene Synthesis: On the Lab Bench
• Initial Sequence Construction
– Oligonucleotides (short strands of DNA) are defined with complementary overlapping sites
• The “sticky ends”
– Assembly PCR
• Oligonucleotides and polymerase are mixed and placed in a thermocycler
• Creates contiguous DNA sequence from component oligos
Gene Synthesis: On the Lab Bench (cont)
• After PCR, generated DNA sequence cut with restriction enzymes
• Expression host's plasmid cut with restriction enzymes
• Synthetic gene inserted into plasmid and plasmid repaired
• Expression Vectors
– Host organisms used to express the synthetic genes (make the protein)
– Typically E. coli
• Possibly chickens or cows
• Expression vector can now express protein coded for by synthetic gene
– A bit more complicated than described above!!!
DNA Sequence Generation: Gene Insertion
DNA Sequence Generation: The Computational Problem
• Why is the problem difficult?
– Conflicting goals
• Avoiding restriction sites
• Maximizing codon preference
• Thus, cannot use deterministic algorithm
– Degeneracy (redundancy) of the DNA code
– 64 codons, 20 (21) amino acids (see next slide)
• Several synonymous codons are translated into the same amino acid
• Synonymous codons per AA vary from one to six (about three on average)
• Huge number of possible DNA sequences
– On average about 2^n for a protein of n amino acids
– Codon Preference
• Varying levels of tRNA assembly components in organisms
• Codon usage for a particular AA greatly influences protein expression
– (continued)
DNA Sequence Generation: Codon to Amino Acid Translation
http://campus.queens.edu/faculty/jannr/Genetics/images/codon.jpg
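To get a feel for how degeneracy inflates the search space, the number of DNA sequences that encode a given protein can be counted directly from the codon table. The sketch below assumes the standard genetic code; SYNONYMS and count_encodings are hypothetical names introduced for illustration.

```python
from collections import Counter
from itertools import product
from math import prod

BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG order ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
# Number of synonymous codons for each amino acid (stop codons excluded).
SYNONYMS = Counter(aa for aa in CODON_TABLE.values() if aa != "*")


def count_encodings(protein: str) -> int:
    """Count the distinct DNA sequences that translate to this protein."""
    return prod(SYNONYMS[aa] for aa in protein.upper())
```

The count is the product of the per-residue codon counts, so it multiplies by up to six with every added amino acid. In this table the 20 standard amino acids share 61 sense codons.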
DNA Sequence Generation: The Computational Problem (cont)
• Why is the problem difficult?
– (continued)
– Restriction Enzymes
• The vector will contain many restriction enzymes
– If these cut up our DNA, we won’t express our proteins
– We must design the DNA string using synonymous codons so that there are no restriction sites
• Helpful to include some other restriction sites
– We must design the DNA string using synonymous codons so that these are included
– (continued)
DNA Sequence Generation: The Computational Problem (cont)
• Why is the problem difficult?
– (continued)
– mRNA Secondary Structure
• In prokaryotes, mRNA can fold into complex shapes
• This inhibits protein creation
– Oligonucleotide generation
• Want a specific melting temperature so that the complex folding doesn’t take place
• The “sticky ends” must have the same melting temperature so that they will bind together
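As a rough illustration of comparing melting temperatures across overlap regions, the sketch below uses the Wallace rule, Tm ≈ 2·(A+T) + 4·(G+C) in °C, a common back-of-the-envelope estimate for short oligonucleotides. The slides do not say which Tm formula GeneDesigner uses, so this choice and the names wallace_tm and tm_spread are assumptions.

```python
def wallace_tm(oligo: str) -> float:
    """Estimate melting temperature (deg C) with the Wallace rule."""
    oligo = oligo.upper()
    at_count = oligo.count("A") + oligo.count("T")
    gc_count = oligo.count("G") + oligo.count("C")
    return 2.0 * at_count + 4.0 * gc_count


def tm_spread(overlaps) -> float:
    """Return the gap between the highest and lowest Tm among the overlaps."""
    temperatures = [wallace_tm(overlap) for overlap in overlaps]
    return max(temperatures) - min(temperatures)
```

A small spread suggests the overlaps should anneal under similar conditions; a design tool could penalize candidate oligo sets whose spread exceeds a chosen threshold.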
IBG GeneDesigner: Our Solution
• IBG GeneDesigner
IBG GeneDesigner: Genetic Algorithm
• Uses a Genetic Algorithm for sequence optimization
– Tournament selection model
– Uniform and single-point crossover (behind the scenes; not user selectable at present)
– Mutation causes codon “wobbling”
– Sequence “fitness” determined by heuristic evaluation
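The ingredients listed above (tournament selection, uniform and single-point crossover, synonymous-codon mutation) can be combined into a minimal GA sketch. This is not the GeneDesigner implementation: all names, the parameter defaults, and the choice to pick a crossover operator at random are assumptions made for illustration, and the fitness function is supplied by the caller.

```python
import random
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG order ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
# Amino acid -> list of codons that encode it.
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def random_gene(protein, rng):
    """Backtranslate a protein by picking a random codon per residue."""
    return [rng.choice(SYNONYMS[aa]) for aa in protein]


def tournament(population, fitness, rng, k=3):
    """Pick k random individuals and return the fittest one."""
    return max(rng.sample(population, k), key=fitness)


def single_point(a, b, rng):
    """Splice the two parents at one random codon boundary."""
    if len(a) < 2:
        return list(a)
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def uniform(a, b, rng):
    """Take each codon from either parent with equal probability."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def mutate(gene, rng, rate=0.05):
    """Swap some codons for synonymous ones (the codon 'wobble')."""
    return [
        rng.choice(SYNONYMS[CODON_TABLE[c]]) if rng.random() < rate else c
        for c in gene
    ]


def evolve(protein, fitness, generations=50, pop_size=40, seed=0):
    """Return the best DNA string found for the protein."""
    rng = random.Random(seed)
    population = [random_gene(protein, rng) for _ in range(pop_size)]
    for _ in range(generations):
        children = []
        for _ in range(pop_size):
            a = tournament(population, fitness, rng)
            b = tournament(population, fitness, rng)
            crossover = rng.choice([single_point, uniform])
            children.append(mutate(crossover(a, b, rng), rng))
        population = children
    return "".join(max(population, key=fitness))
```

Because every operator works on whole codons and mutation only swaps synonyms, every individual still encodes the same protein; only the fitness score changes. A toy call such as evolve("MEFEFW", lambda g: -"".join(g).count("GAATTC")) searches for an encoding with as few EcoRI sites as possible.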
IBG GeneDesigner: Fitness Evaluation
• GeneDesigner heuristics
– Manipulation of nucleotide percentages/ratios to reduce mRNA secondary structure formation
– Inclusion and exclusion of restriction sites
• Restriction sites requested for inclusion should only occur once
– Matching of codon preference
– Oligonucleotide generation
• Fitness determined by melting points, start and end nucleotide
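One plausible way to combine heuristics like these is a weighted score with one term per goal. The sketch below is an assumption about the general shape only: the slides do not give GeneDesigner's scoring formula, and the weights, the gc_target default, and the codon_weights table are hypothetical inputs. The oligonucleotide terms are left out.

```python
def gc_fraction(dna: str) -> float:
    """Fraction of bases that are G or C."""
    return (dna.count("G") + dna.count("C")) / len(dna)


def count_site(dna: str, site: str) -> int:
    """Count occurrences of a restriction site, including overlapping ones."""
    return sum(
        1 for i in range(len(dna) - len(site) + 1) if dna.startswith(site, i)
    )


def fitness(dna, avoid=(), include=(), codon_weights=None,
            gc_target=0.4, weights=(1.0, 1.0, 1.0, 1.0)):
    """Score a candidate gene; higher is better."""
    dna = dna.upper()
    w_gc, w_avoid, w_include, w_codon = weights
    # Penalize distance from the desired G/C ratio.
    score = -w_gc * abs(gc_fraction(dna) - gc_target)
    # Penalize every occurrence of a site that must be excluded.
    score -= w_avoid * sum(count_site(dna, s) for s in avoid)
    # Requested sites should occur exactly once.
    score -= w_include * sum(abs(count_site(dna, s) - 1) for s in include)
    # Reward preferred codons, averaged over the gene.
    if codon_weights:
        codons = [dna[i:i + 3] for i in range(0, len(dna) - 2, 3)]
        score += w_codon * sum(codon_weights.get(c, 0.0) for c in codons) / len(codons)
    return score
```

Adjusting the weights shifts the balance between the conflicting goals, for example between avoiding a restriction site and using the most preferred codon at that position.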
IBG GeneDesigner: Future Work
• Algorithm parameters
– Systematically manipulate GA parameters to identify default values for sequence optimization
• Population size
• Number of generations
• Mutation rate
• Convergence criteria
– Modify heuristic weighting scheme
• Selection models
– Experiment with alternative selection models (roulette wheel, elitism, limited population replacement)
IBG GeneDesigner: Future Work
• Move algorithm to ECJ architecture
– Use the Strength Pareto multi-objective optimization algorithm
• Create web-based version of application
• Explore island model effects on optimization
Results
• IBG GeneDesigner was used to generate a nucleotide sequence for the SH3 domain of α-spectrin¹.
• The codon optimization option was set for expression in E. coli with a 40% G/C bias.
• We also used the application to generate four assembly PCR template oligonucleotide sequences to produce the protein coding sequence flanked by desired restriction enzyme recognition sites.
• The calculated Tm values of the three overlapping regions were within 1.6 °C of each other
– Promoting similar annealing behavior between strands.
– Success of the reaction was confirmed by DNA sequencing of a pUC19 expression vector containing the PCR product cloned between restriction sites included in the gene design.
• Summary: Protein Made!!!
Input: Protein Sequence, Vector, Restriction Enzymes
Input: Flanking Sequences
Input: Algorithm Parameters and Fitness Scores
Output: Generation of Oligonucleotides
Acknowledgements
• Graduate student who did much of the coding
• Rob Vogelbacher
• University of Chicago undergraduate who used it to build a protein
• Benjamin R. Capraro
• His advisor
• Tobin Sosnick
• Our collaborator at University of Chicago
• Shohei Koide