IST 444 Bioinformatics
High Throughput Genomic DNA Sequencing and
Bioinformatics
The Human Genome Project
• The human genome is now officially sequenced. That was a big job. How did they do it?
• Is there anything that a knowledge of bioinformatics tells us that we should watch out for in the human genome sequence?
What is DNA Sequencing?
• A DNA sequence is the order of the bases on one strand.
• By convention, we order the DNA sequence from 5’ to 3’, from left to right.
• Often, only one strand of the DNA sequence is written, BUT usually both strands have been sequenced as a check.
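A minimal sketch of the two-strand check, assuming plain A/C/G/T strings (the example sequence is made up, not taken from the slides). If both strands are written 5' to 3', one should be the reverse complement of the other:

```python
# Illustrative only: the sequence used here is hypothetical.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq: str) -> str:
    """Return the opposite strand, also written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))

top_strand = "GATTACA"                           # read 5' -> 3'
bottom_strand = reverse_complement(top_strand)   # opposite strand, 5' -> 3'

# Sequencing both strands gives a consistency check:
# the second read should be the reverse complement of the first.
strands_agree = reverse_complement(bottom_strand) == top_strand
print(bottom_strand, strands_agree)
```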
Two Methods of DNA Sequencing
• Maxam-Gilbert Method, in which a DNA sequence is end-labeled with [P-32] phosphate and chemically cleaved to leave a signature pattern of bands.
• Sanger Method, in which a DNA sequence is annealed to an oligonucleotide primer, which is then extended by DNA polymerase using a mixture of dNTP and ddNTP (chain terminating) substrates.
Sanger Method is a Form of DNA Synthesis
• The DNA to be sequenced acts as a template for the enzymatic synthesis of a new DNA strand starting at a defined primer.
• Polymerases used are Pol I type polymerases.
• Incorporation of a dideoxynucleotide blocks further synthesis of the new DNA strand.
How the Reaction Works
• If the DNA is double stranded, the reaction is started by heating until the two strands of DNA separate.
• Lower the temperature and the primer sticks to its intended location by H bonds.
• DNA polymerase starts elongating the primer.
• If allowed to go to completion, a new strand of DNA would be the result.
How the Reaction Works
• If we start with a billion identical pieces of template DNA, we'll get a billion new copies of one of its strands.
• We run the reactions, however, in the presence of a dideoxyribonucleotide.
• A dideoxynucleotide is just like a regular nucleotide, except it has no 3' hydroxyl group - once it's added to the end of a DNA strand, there's no way to continue elongating it.
Original Sanger Sequencing
• A mixture of dNTPs and a single ddNTP is used in the reaction tubes.
• We start with 4 different reaction tubes, each with all four dNTPs and ONLY one ddNTP (ddA, ddC, ddG, or ddT) at about 1%.
• The key to this is that MOST of the nucleotides are regular ones, and just a fraction of them are dideoxy nucleotides....
An Example of a T tube:
• MOST of the time when a 'T' is required to make the new strand, the enzyme will get a good one and there's no problem.
• MOST of the time after adding a T, the enzyme will go ahead and add more nucleotides.
• However, 1% of the time, the enzyme will get a dideoxy-T, and that strand can never again be elongated.
• It eventually breaks away from the enzyme, a dead end product.
Original Sanger Sequencing
• Sooner or later ALL of the copies will get terminated by a dideoxy-T.
• But each time the enzyme makes a new strand, the place it gets stopped will be random.
• In millions of starts, there will be strands stopping at every possible T along the way.
Specific Primers Start the Sequence
• ALL of the strands we make started at one exact position.
• ALL of them end with a T. There are billions of them ... many millions at each possible T position.
• To find out where all the T's are in our newly synthesized strand, all we have to do is find out the sizes of all the terminated products!
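The T-tube idea can be sketched as a small simulation. This is a toy model, not a description of the real chemistry: the new strand, the 20-base primer length, the molecule count, and the exact 1% dideoxy fraction are all assumptions chosen for illustration.

```python
import random

def t_tube_fragment_lengths(new_strand, primer_len, dd_fraction=0.01,
                            n_molecules=20_000, seed=0):
    """Simulate the T tube and return the set of terminated fragment lengths."""
    rng = random.Random(seed)
    lengths = set()
    for _ in range(n_molecules):
        for i, base in enumerate(new_strand):
            # At each T, a small fraction of the time a dideoxy-T is added
            # and this copy can never be elongated again.
            if base == "T" and rng.random() < dd_fraction:
                lengths.add(primer_len + i + 1)  # primer plus bases added so far
                break
    return lengths

new_strand = "GATTCTAGTCAT"   # hypothetical newly synthesized sequence
primer_len = 20               # hypothetical primer length

lengths = t_tube_fragment_lengths(new_strand, primer_len)

# The sizes of the terminated products give the T positions (1-based,
# counted from the first base added after the primer).
t_positions = sorted(length - primer_len for length in lengths)
print(t_positions)
```

Running the same simulation for ddA, ddC, and ddG and merging the four size ladders would give the full sequence.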
Non-Radioactive DNA Labels
• Add a chemical tag to each ddNTP that can emit a fluorescent color when excited by a laser.
• We can add a different dye to each ddNTP and each is excited by a different laser wavelength.
• Run the reactions in only one tube, not 4 tubes!
• This is easier and faster. A big contribution to high throughput sequencing.
Automated DNA Sequencing
• We don't even have to 'read' the sequence from the gel - the computer does that for us!
• This is a plot of the colors detected in one 'lane' of a gel (one sample), scanned from smallest fragments to largest.
• The computer even interprets the colors by printing the nucleotide sequence across the top of the plot.
• This is just a fragment of the entire file, which would span around 700 or so nucleotides of accurate sequence.
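How the computer "reads" the colors can be sketched roughly as follows. The peak intensities are invented, and the rule used here (call the strongest of the four dye channels, or N if it is not clearly ahead) is a simplifying assumption, not the algorithm of any real sequencer software.

```python
# Hypothetical data: one entry per peak, ordered from smallest fragment to
# largest, with the signal measured in each of the four dye channels.
peaks = [
    {"A": 0.05, "C": 0.02, "G": 0.90, "T": 0.03},
    {"A": 0.85, "C": 0.04, "G": 0.06, "T": 0.05},
    {"A": 0.03, "C": 0.05, "G": 0.02, "T": 0.90},
    {"A": 0.45, "C": 0.40, "G": 0.05, "T": 0.10},  # two dyes nearly tied
    {"A": 0.02, "C": 0.92, "G": 0.03, "T": 0.03},
]

def call_bases(peaks, min_ratio=2.0):
    """Call one base per peak; emit 'N' when the top channel is not clearly strongest."""
    calls = []
    for channels in peaks:
        ranked = sorted(channels.items(), key=lambda kv: kv[1], reverse=True)
        (best_base, top), (_, second) = ranked[0], ranked[1]
        calls.append(best_base if top >= min_ratio * second else "N")
    return "".join(calls)

called = call_bases(peaks)
print(called)
```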
Automated DNA Sequence Readouts
The Biological Basis of DNA Sequencing Technology
• Virtually all DNA sequencing (both automated and manual) relies on the Sanger method
– DNA replication with dideoxy chain termination
– separation of the resulting molecules by polyacrylamide gel electrophoresis.
• The DNA fragment to be sequenced must first be cloned into a vector (plasmid or lambda).
• Then the cloned DNA must be copied in a test tube (in vitro) by a DNA polymerase enzyme to obtain a sufficient quantity to be sequenced.
Sample DNA Sequence from ABI sequencer
Automated sequencing machines, particularly those made by PE Applied Biosystems, use 4 colors, so they can read all 4 bases at once.
Challenges of DNA Sequencing
• One technician with an automated DNA sequencer can produce over 20 KB of raw sequence data per day.
• The real challenge of DNA sequencing is in the analysis of the data
J. Craig Venter
• Proposed a whole-genome shotgun sequencing method to NIH in 1991. Proposal rejected.
• Sets up The Institute for Genomic Research (TIGR) in 1992 (private and non-profit)
• TIGR publishes the first complete genome sequence in 1995 (Haemophilus influenzae)
• Forms Celera Genomics in 1998 to sequence human genome in three years (private, for-profit)
• The Sequence of the Human Genome is published in Science. February 2001
• Venter departs Celera. 2002
HGP Sequencing Strategy
• Clone-based physical mapping
• Digest genome and make Bacterial Artificial Chromosomes (BACs, 150,000 bp each)
• Digest BACs to create fingerprints
• Organize BACs to form contigs
• Select BAC clones for sequencing
• Shear BACs and shotgun clone
• Sequence clones and assemble overlaps
Celera Sequencing Strategy
• Whole-genome shotgun sequencing of five individuals with 5-fold coverage
• Computer assembles overlapping sequences to form contigs
• Contigs are assembled into scaffolds
• Scaffolds are mapped to the genome by two or more Sequence Tagged Site (STS) markers
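To get a feel for what 5-fold coverage means, here is a back-of-the-envelope sketch. The genome size (about 3 billion bp) and the 500 bp read length are round-number assumptions, and the uncovered-fraction estimate uses the idealized Lander-Waterman model, in which reads land uniformly at random; real shotgun data do not behave this cleanly.

```python
import math

def reads_needed(genome_bp, read_bp, coverage):
    """Number of reads so that total sequenced bases = coverage x genome size."""
    return math.ceil(coverage * genome_bp / read_bp)

def expected_fraction_uncovered(coverage):
    """Idealized model: chance that a given base is hit by no read is e^(-coverage)."""
    return math.exp(-coverage)

genome_bp = 3_000_000_000   # assumed approximate human genome size
read_bp = 500               # assumed typical read length

n_reads = reads_needed(genome_bp, read_bp, coverage=5)
uncovered = expected_fraction_uncovered(5)
print(n_reads, round(uncovered, 4))
```

Under these assumptions, 5-fold coverage takes about 30 million reads and still leaves a fraction of a percent of bases unsampled, which is one reason gaps remain in a draft assembly.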
Technology Breakthroughs
• Development of Expressed Sequence Tag (EST) method to discover and map human genes
• Development of Bacterial Artificial Chromosomes (BACs) to clone large DNA fragments
• Development of an automated high-throughput capillary DNA sequencer in 1998 (Applied Biosystems ABI PRISM 3700 DNA Analyzer)
• Development of powerful computers and software to analyze sequence data
Genome Questions
• Has every base in our genome been sequenced?
• What is the total number of genes and where are they located?
• How many genes have an unknown function?
• What percent of our DNA encodes genes and what is the remainder?
• Do we share DNA sequences with other organisms?
• How much sequence variation is there between individuals?
Genome Sequencing: HTG, GSS, (WGS)
[Figure: flow diagram with the labels: Whole BAC insert (or genome); shredding; cloning, isolating; sequencing; GSS division or trace archive; assembly; Draft Sequence (HTG division)]
GSS Division: Genome Survey Sequences
• Genomic equivalent of ESTs
• BAC and other first pass surveys
• BAC end sequences
• Whole Genome Shotgun (some)
• RAPIDS and other anonymous loci
[Figure: Genomic Clone (BAC) with its T7 end and SP6 end labeled]
Working Draft Sequence
gaps
Limitations of the technology
• Sequences can only be determined in approximately 400-800 base pair chunks known as “reads.”
– This is due to both the biochemistry of the DNA polymerase enzyme and the resolution of polyacrylamide gel electrophoresis.
– Most genes contain many thousands of bp, and many modern sequencing projects are intended to produce complete sequences of large genomic regions (millions of bp).
Assembly of Contigs
• As a result, all sequencing projects must involve the division of the target DNA into a set of overlapping ~500 bp fragments, and then the assembly of these fragments into complete sequences (contigs).
– Contig = contiguous sequenced region
• Assembly of overlapping fragments is a computational problem
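As a rough illustration of why assembly is a computational problem, here is a toy greedy assembler. It assumes error-free reads from one strand and exact suffix/prefix overlaps, which real data never satisfy; the reads are invented, and this is not the algorithm used by any production assembler.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-n:] == b[:n]:
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap."""
    reads = list(reads)
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # no overlaps left: the remaining pieces are separate contigs
        merged = reads[i] + reads[j][n:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads

# Hypothetical reads, given in no particular order.
contigs = greedy_assemble(["CGTTAGC", "ATGGCGTA", "CGTACGTT"])
print(contigs)
```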
Contig Assembly Problems
1) The 500 bp reads of sequence data have errors of both incorrectly determined bases and insertions/deletions
2) The error rate is highest at the beginning and ends of the reads - precisely the regions that must be overlapped.
3) Some sequence from cloning vectors is often included at the ends of sequence reads
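One common response to problem 2 is to trim the unreliable ends of each read before looking for overlaps. The sketch below assumes each base comes with a Phred-style quality score and uses a cutoff of 20; the read, the scores, and the cutoff are all illustrative assumptions.

```python
def trim_low_quality_ends(seq, quals, min_q=20):
    """Keep the span between the first and last base whose quality is >= min_q."""
    good = [i for i, q in enumerate(quals) if q >= min_q]
    if not good:
        return ""  # nothing in the read is trustworthy
    return seq[good[0]:good[-1] + 1]

# Hypothetical read with poor quality at both ends.
read = "GGACGTTAGCAA"
quals = [6, 9, 25, 30, 35, 40, 38, 33, 28, 22, 10, 4]

trimmed = trim_low_quality_ends(read, quals)
print(trimmed)
```

Vector sequence at the ends (problem 3) needs a separate step, since vector bases can be read with high quality.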
Sequence Assembly Algorithms
• Different than similarity searching
• Look for ungapped overlaps at end of fragments
– (method of Wilbur and Lipman, SIAM J. Appl. Math. 44; 557-567, 1984)
• High degree of identity over a short region
• Want to exclude chance matches, but not be thrown off by sequencing errors
• Vector removal uses similar approach, but less stringent
– should recognize small regions of identity and tolerate more mismatches
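The requirements above (ungapped, at the ends, long enough to exclude chance matches, tolerant of a few errors) can be sketched with a brute-force scan. This is not the Wilbur and Lipman method cited on the slide, just a simple stand-in; the reads and the thresholds are assumptions.

```python
def end_overlap(a, b, min_len=8, max_mismatches=1):
    """
    Longest ungapped overlap between the end of read a and the start of read b,
    allowing up to max_mismatches. Returns (overlap_length, mismatches).
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        mismatches = sum(x != y for x, y in zip(a[-n:], b[:n]))
        if mismatches <= max_mismatches:
            return n, mismatches
    return 0, 0  # no acceptable overlap of at least min_len bases

# Hypothetical reads: the last 10 bases of read_a match the first 10 of read_b
# except for one miscalled base.
read_a = "TTGACCATGCAGGTCA"
read_b = "ATGCAGCTCATTGG"

result = end_overlap(read_a, read_b)
print(result)
```

Raising `min_len` guards against chance matches; a vector-screening pass might instead lower `min_len` and raise `max_mismatches`, in the less stringent spirit described above.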
Celera Innovation: Clone End Tracking
• Create 3 libraries with 2, 10, and 50 KB inserts
• Use information from clone ends: distance and orientation
– Can span some gaps between contigs and determine the size of gaps
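The gap-size arithmetic can be sketched as below. The numbers are invented, and the function assumes the simplest case: both contigs are already oriented the same way, contig A comes first, and the insert size is known exactly (in practice it is only approximate).

```python
def estimate_gap(insert_size, end_a_start, contig_a_len, end_b_end):
    """
    Estimate the gap between contig A and contig B from one clone whose
    two end reads land in different contigs.

    end_a_start: position in contig A where the clone's first end read begins
    end_b_end:   position in contig B (counted from B's start) where the
                 clone's other end read finishes
    """
    covered_in_a = contig_a_len - end_a_start  # part of the clone inside A
    covered_in_b = end_b_end                   # part of the clone inside B
    return insert_size - covered_in_a - covered_in_b

# Hypothetical 10 KB clone: one end starts at 46,000 in a 50,000 bp contig,
# the other end finishes 3,500 bp into the next contig.
gap = estimate_gap(insert_size=10_000, end_a_start=46_000,
                   contig_a_len=50_000, end_b_end=3_500)
print(gap)
```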
Overlap at ends, not internal
Software determines strategy
Based on their faith in the speed and reliability of sequence analysis/assembly software, researchers have generally taken one of three different approaches to planning sequencing projects:
• Ordered sub-cloning
• Primer walking
• Shotgun sequencing
Ordered cloning
People who don't trust software generally put a lot of time into dividing large pieces of DNA into small ordered overlapping fragments
– This strategy requires much more initial cloning work in the laboratory
– but it minimizes the number of actual sequencing reads required to complete a project
– It is easy to assemble the reads since it is known how they should fit together to form the final contig
Primer Walking
• Make a new primer from the end of each new sequence read
• It requires very fast and accurate analysis of sequence reads since each step uses information from the previous read
– Skips sub-cloning step entirely since all sequencing reactions can be done on one large clone
– Expensive to make a lot of PCR primers but the price of primer synthesis keeps dropping & there is an economy of scale
• Assembly problems are minimized since both the order and the amount of overlap of reads are known
Shotgun Sequencing
• Shotgun sequencing takes maximum advantage of the speed and low cost of automated sequencing
• relies totally on software to assemble a jumble of essentially random sequence reads into a coherent and accurate contig
• TIGR demonstrated “proof of concept” on the genomes of Haemophilus influenzae, Methanococcus jannaschii, and Mycoplasma genitalium
• Celera Genomics demonstrated the ability to shotgun sequence the entire human genome (?)
Human Genome Assembly
• The HGP vs. Celera race to sequence the entire human genome was a classic battle of different strategies
• The HGP used an ordered cloning approach
– Breaking the genome into mapped BAC clones, then shotgun sequencing the BACs
• Celera used a modified shotgun method
– Random clones of various sizes (size selected libraries)
– Plus relative mapping of clone ends (they must be located in the assembly at the correct distance and orientations)
– Created custom software to handle the assembly
– Celera did make use of the “scaffold” built by the HGP
Other Large Sequencing Projects
• Phylogenetic identification/analysis
– medical studies of bacteria
– environmental samples
• EST sequencing - differential expression
• cDNA studies
– alternate splicing
– full length transcripts
• Genotyping
– score known alleles
– identify new mutations
Automation
• The "pipeline" approach:
– Vector removal
– Assembly of identical and/or overlapping fragments
– Identify genes
• Lookup on genome if fully sequenced organism
– Or genome contigs for partially sequenced organisms
• BLAST search of GenBank for similar genes
• Lookup in specialized database of "predicted genes"
– i.e. ENSEMBL
• Project specific analysis
• differentials between sets
• Phylogenetics
DATABASE!!
• What these projects all share is a need to keep track of a lot of data
– Hundreds to thousands of sequences
– Many fields of information about each one
» Organism, library, plate ID for each clone
» the sequence itself
» cluster/contig membership
» best BLAST hit (accession #, e-value, alignment)
» genome position
• Can't keep track just using folders and text files on your hard drive
• Design the database to include all possible fields (it’s a lot harder to add info later)
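A minimal sketch of such a database, using Python's built-in sqlite3 module. The table name, column names, and the sample row are hypothetical; they simply mirror the fields listed above, and a real project would likely split them across several related tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database for the demo

# One row per sequenced clone read; columns follow the fields listed above.
conn.execute("""
    CREATE TABLE clone_read (
        read_id         INTEGER PRIMARY KEY,
        organism        TEXT NOT NULL,
        library         TEXT,
        plate_id        TEXT,
        sequence        TEXT NOT NULL,
        contig_id       TEXT,      -- cluster/contig membership
        blast_accession TEXT,      -- best BLAST hit
        blast_evalue    REAL,
        blast_alignment TEXT,
        genome_position TEXT
    )
""")

# Hypothetical example record.
conn.execute(
    "INSERT INTO clone_read (organism, library, plate_id, sequence, contig_id, blast_evalue) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    ("Homo sapiens", "lib_A", "P001", "ATGGCGTACGTTAGC", "contig_7", 1e-30),
)

rows = conn.execute(
    "SELECT plate_id, contig_id FROM clone_read WHERE organism = ?",
    ("Homo sapiens",),
).fetchall()
print(rows)
```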
Computer tools for sequencing
• A wide variety of different software tools have been created to aid DNA sequencing projects.
– Each genome project lab has built its own custom software
» UNIX
» Based on a particular workflow design
» PHRED, PHRAP, and Consed
– Many packages for the individual investigator - included in most “comprehensive” molecular biology products: MacVector, LaserGene, DNA*, etc.
– I will focus on the assembly tools in GCG.
The GCG Fragment Assembly System
• GCG has a complete set of programs that allow data entry, and assembly of overlapping nucleotide sequence fragments into one contig
– SEQED: a single sequence editor
– GELSTART: creates fragment assembly projects
– GELENTER: adds sequences (reads) to an assembly project, input of new sequences from keyboard, digitizer, or import of existing text files
– GELMERGE: assembles individual sequences into contigs, can automatically remove vector sequences
– GELASSEMBLE: multiple sequence editor for viewing and editing contigs, allows manual alignment of fragments, insertion/deletion of gaps, and changing of individual bases
– GELDISASSEMBLE: breaks up contigs into individual sequences within a project
– GELVIEW: displays contigs as a schematic display of overlapping fragments
SeqLab has a Chromatogram viewer
Other Chromatogram Viewers
• Applied Biosystems has a free viewer/editor program for sequence chromatograms
– It is called EditView and it is a Macintosh only program (does not work in System 9.1 and newer)
http://cancer-seqbase.uchicago.edu/documents/EditView.hqx
• There are a couple of viewers for Windows machines
– ABIView is free from David H. Klatte
http://bioinformatics.weizmann.ac.il/software/abiview/abiinfo.html
– Chromas is $50 shareware from Conor McCarthy, Technelysium Pty Ltd in Australia
http://www.technelysium.com.au/chromas.html
The Genome Sequencing Era
[Figure: timeline with year marks 1996 to 2002, showing these milestones]
• First microbial genome: H. influenzae
• First eukaryote genome: Yeast
• E. coli
• First multicellular animal: C. elegans
• Fruit fly
• First higher plant: Arabidopsis
• First mammal: Homo sapiens
• Malaria: mosquito and parasite
• First fish: Fugu
• Mouse and tunicate
• Running totals marked on the timeline: 18 microbial genomes, 40 microbial genomes, 100 microbial genomes
Complex Genomes Jan. 2003
• Chordates
– Human
– Mouse
– Rat
– Pufferfish
– Sea squirt (Ciona)
• Arthropods
– D. melanogaster
– D. simulans
– A. gambiae
• Higher plants
– Arabidopsis
– Rice
• Fungi
– Aspergillus terreus
Coming soon …
• In progress
– purple sea urchin
– zebrafish
• NHGRI’s Priority Organisms
– Chicken
– Cow
– Dog
– Chimpanzee
– Honeybee
– Tetrahymena
– Oxytricha
– Several fungi
• Over 100 bacterial genomes
Controversy and Issues
• Does human DNA sequence information belong to everyone?
• Should publication require the release of all data?
• Did Celera use public information to complete the human sequence?
• Should a gene or life form be patented?
• Should personal genetic information be protected from public release?