![Page 1: Human Genome. Human Genome Contents: 3200 Mb Genes: 1200 Mb –Genes 48 Mb –Related 1152 Mb: Pseudogenes, Gene Fragments, Introns Intergenic DNA 2000 Mb](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649f4f5503460f94c71aeb/html5/thumbnails/1.jpg)
Human Genome
![Page 2: Human Genome. Human Genome Contents: 3200 Mb Genes: 1200 Mb –Genes 48 Mb –Related 1152 Mb: Pseudogenes, Gene Fragments, Introns Intergenic DNA 2000 Mb](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649f4f5503460f94c71aeb/html5/thumbnails/2.jpg)
Human Genome Contents: 3200 Mb
• Genes: 1200 Mb
– Genes 48 Mb
– Related 1152 Mb: Pseudogenes, Gene Fragments, Introns
• Intergenic DNA 2000 Mb
– Interspersed Repeats 1400 Mb
– Microsatellite (short tandem repeats) 90 Mb
• Telomeres: End Sequences
• Centromeres
• Single Nucleotide Polymorphisms
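The sizes on this slide can be turned into fractions of the whole genome with a few lines of arithmetic. The sketch below only uses the numbers listed above; the labels and the helper name `percent_of_genome` are illustrative choices, not part of the slide.

```python
# Sizes in megabases (Mb), copied from the slide. The labels are illustrative.
GENOME_MB = 3200

COMPONENTS_MB = [
    ("Genes and related sequences", 1200),
    ("Genes", 48),
    ("Gene-related sequences", 1152),
    ("Intergenic DNA", 2000),
    ("Interspersed repeats", 1400),
    ("Microsatellites", 90),
]


def percent_of_genome(size_mb, genome_mb=GENOME_MB):
    """Return the share of the genome (in percent) taken up by size_mb."""
    return 100.0 * size_mb / genome_mb


for label, size_mb in COMPONENTS_MB:
    print(f"{label:30s} {size_mb:5d} Mb  {percent_of_genome(size_mb):6.2f} %")
```

On these numbers, the 48 Mb of genes is 1.5% of the genome, while intergenic DNA makes up 62.5%.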
![Page 3: Human Genome. Human Genome Contents: 3200 Mb Genes: 1200 Mb –Genes 48 Mb –Related 1152 Mb: Pseudogenes, Gene Fragments, Introns Intergenic DNA 2000 Mb](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649f4f5503460f94c71aeb/html5/thumbnails/3.jpg)
Chromosomes
• Much shorter than the DNA they contain (the DNA is tightly packaged)
• Histones: DNA binding proteins
• Two copies (sister chromatids) held together at the centromere
• Telomere: Terminal region
• Any two humans differ in about 0.1% of their DNA sequence
![Page 4: Human Genome. Human Genome Contents: 3200 Mb Genes: 1200 Mb –Genes 48 Mb –Related 1152 Mb: Pseudogenes, Gene Fragments, Introns Intergenic DNA 2000 Mb](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649f4f5503460f94c71aeb/html5/thumbnails/4.jpg)
![Page 5: Human Genome. Human Genome Contents: 3200 Mb Genes: 1200 Mb –Genes 48 Mb –Related 1152 Mb: Pseudogenes, Gene Fragments, Introns Intergenic DNA 2000 Mb](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649f4f5503460f94c71aeb/html5/thumbnails/5.jpg)
Donors
• HGP:
– Opportunity advertised near labs
– First come; First Taken
– 5-10 samples for every one used
– No link between donor and sample
• Celera: 5 subjects (three men; two women)
– One Asian; One African-American; One Hispanic; Two Caucasians
– Craig Venter
![Page 6: Human Genome. Human Genome Contents: 3200 Mb Genes: 1200 Mb –Genes 48 Mb –Related 1152 Mb: Pseudogenes, Gene Fragments, Introns Intergenic DNA 2000 Mb](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649f4f5503460f94c71aeb/html5/thumbnails/6.jpg)
Basic Technology
• Physical Mapping
• Cloning
• Shotgun Sequencing
• Computational Sequence Reassembly
![Page 7: Human Genome. Human Genome Contents: 3200 Mb Genes: 1200 Mb –Genes 48 Mb –Related 1152 Mb: Pseudogenes, Gene Fragments, Introns Intergenic DNA 2000 Mb](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649f4f5503460f94c71aeb/html5/thumbnails/7.jpg)
STS
• High Resolution, Rapid, Simple
• 100 - 500 bp
• Collection of overlapping fragments
• Each point represented multiple times in random fragments
• Sequence must be known
• Unique in chromosome under study
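The length and uniqueness requirements can be checked mechanically. The following is a minimal sketch, assuming the chromosome is available as a plain string and ignoring the reverse strand; the function names and the toy sequence are hypothetical, and the 100-500 bp bounds come from the slide.

```python
def count_occurrences(sequence, marker):
    """Count occurrences of marker in sequence, including overlapping ones."""
    count = 0
    start = sequence.find(marker)
    while start != -1:
        count += 1
        start = sequence.find(marker, start + 1)
    return count


def is_usable_sts(chromosome, marker, min_len=100, max_len=500):
    """A candidate STS must have an allowed length and occur exactly once."""
    has_valid_length = min_len <= len(marker) <= max_len
    return has_valid_length and count_occurrences(chromosome, marker) == 1


# Toy data: the length bounds are relaxed so the example stays readable.
toy_chromosome = "ACGTTGCAACGTAGGCTA"
print(is_usable_sts(toy_chromosome, "TTGCA", min_len=4, max_len=10))  # unique
print(is_usable_sts(toy_chromosome, "ACGT", min_len=4, max_len=10))   # occurs twice
```

A real check would also search the reverse complement and the rest of the genome; this sketch only covers the single-chromosome case named on the slide.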
![Page 8: Human Genome. Human Genome Contents: 3200 Mb Genes: 1200 Mb –Genes 48 Mb –Related 1152 Mb: Pseudogenes, Gene Fragments, Introns Intergenic DNA 2000 Mb](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649f4f5503460f94c71aeb/html5/thumbnails/8.jpg)
Physical Mapping
• A set of clone fragments whose position relative to each other is known
• Restriction Maps: Relative locations of Restriction Sites
• Fluorescence in situ hybridization (FISH): Marker locations mapped by hybridizing probe to chromosomes
• Sequence Tagged Sites (STS): Positions of short sequences mapped by PCR or hybridization analysis of genome fragments
• Expressed Sequence Tags (EST): short sequences from cDNA clones
![Page 9: Human Genome. Human Genome Contents: 3200 Mb Genes: 1200 Mb –Genes 48 Mb –Related 1152 Mb: Pseudogenes, Gene Fragments, Introns Intergenic DNA 2000 Mb](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649f4f5503460f94c71aeb/html5/thumbnails/9.jpg)
Genome cut into fragments
Cloned as library in vector (red)
![Page 10: Human Genome. Human Genome Contents: 3200 Mb Genes: 1200 Mb –Genes 48 Mb –Related 1152 Mb: Pseudogenes, Gene Fragments, Introns Intergenic DNA 2000 Mb](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649f4f5503460f94c71aeb/html5/thumbnails/10.jpg)
Hybridisation mapping:
1. Pick clones into a grid
2. Hybridise to probe 1
3. Hybridise to probe 2
4. Build contigs
In this case, two clones hybridised to both probes and thus they are predicted to overlap. Those hybridising to only one probe are predicted to extend out to the left or right.
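The contig-building step can be sketched in code. This is a simplified illustration, assuming the hybridisation results are recorded as a set of positive probes per clone and that any two clones sharing a probe overlap; the clone names, probe names, and function names are all hypothetical.

```python
from itertools import combinations


def predicted_overlaps(hits):
    """Return sorted pairs of clones that hybridised to a common probe."""
    return sorted(
        (a, b) for a, b in combinations(sorted(hits), 2) if hits[a] & hits[b]
    )


def build_contigs(hits):
    """Group clones into contigs (connected components of the overlap graph)."""
    neighbours = {clone: set() for clone in hits}
    for a, b in predicted_overlaps(hits):
        neighbours[a].add(b)
        neighbours[b].add(a)

    contigs, seen = [], set()
    for clone in sorted(hits):
        if clone in seen:
            continue
        stack, members = [clone], set()
        while stack:
            current = stack.pop()
            if current in members:
                continue
            members.add(current)
            stack.extend(neighbours[current] - members)
        seen |= members
        contigs.append(sorted(members))
    return contigs


# Toy grid result: c2 and c3 are positive for both probes.
hits = {
    "c1": {"probe1"},
    "c2": {"probe1", "probe2"},
    "c3": {"probe1", "probe2"},
    "c4": {"probe2"},
    "c5": set(),
}
print(build_contigs(hits))
```

Here c1 and c4 share no probe, so they are joined only through the double-positive clones c2 and c3, which matches the left/right extension described above. Clone c5 hybridised to neither probe and stays outside the contig.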
![Page 11: Human Genome. Human Genome Contents: 3200 Mb Genes: 1200 Mb –Genes 48 Mb –Related 1152 Mb: Pseudogenes, Gene Fragments, Introns Intergenic DNA 2000 Mb](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649f4f5503460f94c71aeb/html5/thumbnails/11.jpg)
Fingerprinting: Digest clones and run on gel
Overlap by shared bands
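Counting shared bands between two fingerprints might look like the sketch below. It assumes each clone's fingerprint is a list of fragment sizes in bp and that two bands match when their sizes agree within a relative tolerance. The 2% tolerance, the threshold of three shared bands, and the function names are illustrative assumptions, not the scoring used by any particular tool.

```python
def shared_bands(bands_a, bands_b, tolerance=0.02):
    """Count bands of A that match a not-yet-used band of B within tolerance."""
    remaining = sorted(bands_b)
    shared = 0
    for size in sorted(bands_a):
        for index, other in enumerate(remaining):
            if abs(size - other) <= tolerance * max(size, other):
                shared += 1
                del remaining[index]  # each band of B may be matched only once
                break
    return shared


def likely_overlap(bands_a, bands_b, min_shared=3, tolerance=0.02):
    """Call two clones overlapping if they share at least min_shared bands."""
    return shared_bands(bands_a, bands_b, tolerance) >= min_shared


# Toy fingerprints (fragment sizes in bp).
clone_a = [300, 1200, 2500, 4100, 8000]
clone_b = [1210, 2500, 4100, 6000, 9500]
clone_c = [500, 700, 15000]

print(shared_bands(clone_a, clone_b), likely_overlap(clone_a, clone_b))
print(shared_bands(clone_a, clone_c), likely_overlap(clone_a, clone_c))
```

A fixed threshold is the simplest possible rule; it ignores the chance that unrelated clones share bands by coincidence, which grows with the number of bands per clone.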
![Page 12: Human Genome. Human Genome Contents: 3200 Mb Genes: 1200 Mb –Genes 48 Mb –Related 1152 Mb: Pseudogenes, Gene Fragments, Introns Intergenic DNA 2000 Mb](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649f4f5503460f94c71aeb/html5/thumbnails/12.jpg)
Assembly of Contiguous DNA Sequence
• Shotgun Approach
• Contigs: Result of joining overlapping sequences
• Scaffold: Result of connecting contigs by linking across the gaps between them
• BAC: Bacterial artificial chromosome vector; carries inserts of 100-200 kb
![Page 13: Human Genome. Human Genome Contents: 3200 Mb Genes: 1200 Mb –Genes 48 Mb –Related 1152 Mb: Pseudogenes, Gene Fragments, Introns Intergenic DNA 2000 Mb](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649f4f5503460f94c71aeb/html5/thumbnails/13.jpg)
Regional mapping
![Page 14: Human Genome. Human Genome Contents: 3200 Mb Genes: 1200 Mb –Genes 48 Mb –Related 1152 Mb: Pseudogenes, Gene Fragments, Introns Intergenic DNA 2000 Mb](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649f4f5503460f94c71aeb/html5/thumbnails/14.jpg)
Regional mapping
![Page 15: Human Genome. Human Genome Contents: 3200 Mb Genes: 1200 Mb –Genes 48 Mb –Related 1152 Mb: Pseudogenes, Gene Fragments, Introns Intergenic DNA 2000 Mb](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649f4f5503460f94c71aeb/html5/thumbnails/15.jpg)
Minimal tiling path selected for sequencing.
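The slide does not say how the tiling path is chosen. One standard way to pick the fewest clones that still cover a region is a greedy interval cover, sketched below under the assumption that each clone has known start and end coordinates on the physical map. The clone names, coordinates, and function name are hypothetical.

```python
def minimal_tiling_path(clones, region_start, region_end):
    """Greedy interval cover: pick the fewest clones spanning the region.

    clones maps a clone name to its (start, end) position on the map.
    """
    ordered = sorted(clones.items(), key=lambda item: item[1][0])
    path = []
    covered_to = region_start
    index = 0
    while covered_to < region_end:
        best_name, best_end = None, covered_to
        # Among clones starting inside the covered part, take the one
        # that reaches furthest to the right.
        while index < len(ordered) and ordered[index][1][0] <= covered_to:
            name, (_, end) = ordered[index]
            if end > best_end:
                best_name, best_end = name, end
            index += 1
        if best_name is None:
            raise ValueError(f"no clone covers map position {covered_to}")
        path.append(best_name)
        covered_to = best_end
    return path


# Toy map in kb.
clones = {
    "A": (0, 150),
    "B": (50, 200),
    "C": (100, 300),
    "D": (180, 320),
    "E": (290, 450),
    "F": (310, 460),
}
print(minimal_tiling_path(clones, 0, 450))
```

In this toy map the path is A, C, E; clones B, D and F are redundant for coverage and would not be sent for sequencing.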
Regional mapping
![Page 16: Human Genome. Human Genome Contents: 3200 Mb Genes: 1200 Mb –Genes 48 Mb –Related 1152 Mb: Pseudogenes, Gene Fragments, Introns Intergenic DNA 2000 Mb](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649f4f5503460f94c71aeb/html5/thumbnails/16.jpg)
Restriction fragment fingerprinting
- BAC clones are grown in 96-well format
- Hind III digest
- 1% agarose
- Molecular weight marker every 5th lane
- Gel size range: ~300 bp to >20 kbp
![Page 17: Human Genome. Human Genome Contents: 3200 Mb Genes: 1200 Mb –Genes 48 Mb –Related 1152 Mb: Pseudogenes, Gene Fragments, Introns Intergenic DNA 2000 Mb](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649f4f5503460f94c71aeb/html5/thumbnails/17.jpg)
Contig assembly
Clone A B C D E F G
FPC*: Overlap identification by restriction pattern similarities; facilitated contig assembly
*Sanger Centre: C. Soderlund, I. Longden and R. Mott
All restriction fragments within a clone selected for the tiling path must be verified by their presence in overlapping clones.
Figure legend: vector fragments; insert fragments
![Page 18: Human Genome. Human Genome Contents: 3200 Mb Genes: 1200 Mb –Genes 48 Mb –Related 1152 Mb: Pseudogenes, Gene Fragments, Introns Intergenic DNA 2000 Mb](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649f4f5503460f94c71aeb/html5/thumbnails/18.jpg)
BCM-HGSC
![Page 19: Human Genome. Human Genome Contents: 3200 Mb Genes: 1200 Mb –Genes 48 Mb –Related 1152 Mb: Pseudogenes, Gene Fragments, Introns Intergenic DNA 2000 Mb](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649f4f5503460f94c71aeb/html5/thumbnails/19.jpg)
Shotgun Sequencing I: RANDOM PHASE
• BAC Clone: 100-200 kb
• Sheared DNA: 1.0-2.0 kb
• Sequencing Templates
• Random Reads
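How many random reads are needed to cover a BAC can be explored with a small simulation. The sketch below places reads uniformly at random on a clone and compares the covered fraction with the standard Poisson approximation 1 - e^(-c), where c is the average depth. The 500 bp read length and the read counts are assumptions for illustration; the slide gives only the clone and fragment sizes.

```python
import math
import random


def simulate_coverage(clone_len, read_len, n_reads, seed=0):
    """Place reads at random and return the fraction of bases covered."""
    rng = random.Random(seed)
    depth = [0] * clone_len
    for _ in range(n_reads):
        start = rng.randrange(clone_len - read_len + 1)
        for position in range(start, start + read_len):
            depth[position] += 1
    covered = sum(1 for d in depth if d > 0)
    return covered / clone_len


def expected_fraction_covered(clone_len, read_len, n_reads):
    """Poisson approximation: 1 - exp(-c), with c the average read depth."""
    average_depth = n_reads * read_len / clone_len
    return 1.0 - math.exp(-average_depth)


# Assumed numbers: a 150 kb BAC, 500 bp reads, 2400 reads (8x average depth).
CLONE_LEN, READ_LEN, N_READS = 150_000, 500, 2400
print("simulated:", simulate_coverage(CLONE_LEN, READ_LEN, N_READS))
print("expected: ", expected_fraction_covered(CLONE_LEN, READ_LEN, N_READS))
```

Even at high average depth a few bases stay uncovered, which is one reason a separate finishing phase follows the random phase.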
![Page 20: Human Genome. Human Genome Contents: 3200 Mb Genes: 1200 Mb –Genes 48 Mb –Related 1152 Mb: Pseudogenes, Gene Fragments, Introns Intergenic DNA 2000 Mb](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649f4f5503460f94c71aeb/html5/thumbnails/20.jpg)
Shotgun Sequencing II: ASSEMBLY
• Consensus Sequence
• Gap
• Low Base Quality
• Single-Stranded Region
• Mis-Assembly (Inverted)
![Page 21: Human Genome. Human Genome Contents: 3200 Mb Genes: 1200 Mb –Genes 48 Mb –Related 1152 Mb: Pseudogenes, Gene Fragments, Introns Intergenic DNA 2000 Mb](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649f4f5503460f94c71aeb/html5/thumbnails/21.jpg)
Shotgun Sequencing III: FINISHING
• Consensus Sequence
• Gap
• Low Base Quality
• Single-Stranded Region
• Mis-Assembly (Inverted)
![Page 22: Human Genome. Human Genome Contents: 3200 Mb Genes: 1200 Mb –Genes 48 Mb –Related 1152 Mb: Pseudogenes, Gene Fragments, Introns Intergenic DNA 2000 Mb](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649f4f5503460f94c71aeb/html5/thumbnails/22.jpg)
Shotgun Sequencing III: FINISHING
• Consensus Sequence
• Gap
• Single-Stranded Region
• Mis-Assembly (Inverted)
![Page 23: Human Genome. Human Genome Contents: 3200 Mb Genes: 1200 Mb –Genes 48 Mb –Related 1152 Mb: Pseudogenes, Gene Fragments, Introns Intergenic DNA 2000 Mb](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649f4f5503460f94c71aeb/html5/thumbnails/23.jpg)
Shotgun Sequencing III: FINISHING
• Consensus Sequence
• Gap
• Mis-Assembly (Inverted)
![Page 24: Human Genome. Human Genome Contents: 3200 Mb Genes: 1200 Mb –Genes 48 Mb –Related 1152 Mb: Pseudogenes, Gene Fragments, Introns Intergenic DNA 2000 Mb](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649f4f5503460f94c71aeb/html5/thumbnails/24.jpg)
Shotgun Sequencing III: FINISHING
• Consensus
• Mis-Assembly (Inverted)
![Page 25: Human Genome. Human Genome Contents: 3200 Mb Genes: 1200 Mb –Genes 48 Mb –Related 1152 Mb: Pseudogenes, Gene Fragments, Introns Intergenic DNA 2000 Mb](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649f4f5503460f94c71aeb/html5/thumbnails/25.jpg)
Shotgun Sequencing III: FINISHING
High Accuracy Sequence: < 1 error / 10,000 bases
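The accuracy target can be restated on the Phred quality scale, which is widely used for base-call quality and defines Q = -10 log10(P) for an error probability P. The slide does not mention Phred scores, so this conversion is added context rather than part of the original material; the helper names and the 150 kb clone size are illustrative.

```python
import math


def phred_quality(error_probability):
    """Convert a per-base error probability into a Phred quality score."""
    return -10.0 * math.log10(error_probability)


def error_probability(phred_score):
    """Convert a Phred quality score back into an error probability."""
    return 10.0 ** (-phred_score / 10.0)


def expected_errors(n_bases, per_base_error):
    """Expected number of wrong bases in a sequence of n_bases."""
    return n_bases * per_base_error


target = 1 / 10_000  # finishing standard from the slide
print("Phred score for the target:", phred_quality(target))
print("Expected errors in a 150 kb BAC:", expected_errors(150_000, target))
```

An error rate of 1 in 10,000 corresponds to a Phred score of 40, and a 150 kb clone finished to that standard is expected to contain about 15 wrong bases.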
![Page 26: Human Genome. Human Genome Contents: 3200 Mb Genes: 1200 Mb –Genes 48 Mb –Related 1152 Mb: Pseudogenes, Gene Fragments, Introns Intergenic DNA 2000 Mb](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649f4f5503460f94c71aeb/html5/thumbnails/26.jpg)
Whole Genome Shotgun Sequencing
• Whole Genome: 3,000 Mb
• Sheared DNA: 1.0-2.0 kb
• Sequencing Templates
• Random Reads
![Page 27: Human Genome. Human Genome Contents: 3200 Mb Genes: 1200 Mb –Genes 48 Mb –Related 1152 Mb: Pseudogenes, Gene Fragments, Introns Intergenic DNA 2000 Mb](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649f4f5503460f94c71aeb/html5/thumbnails/27.jpg)
Whole Genome Shotgun Sequencing: Assembly
• Consensus Sequence
• Gap
• Low Base Quality
• Single-Stranded Region
• Mis-Assembly (Inverted)
![Page 28: Human Genome. Human Genome Contents: 3200 Mb Genes: 1200 Mb –Genes 48 Mb –Related 1152 Mb: Pseudogenes, Gene Fragments, Introns Intergenic DNA 2000 Mb](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649f4f5503460f94c71aeb/html5/thumbnails/28.jpg)
Whole Genome Shotgun Sequencing: Assembly
• Consensus Sequence
• Gap
• Low Base Quality
![Page 29: Human Genome. Human Genome Contents: 3200 Mb Genes: 1200 Mb –Genes 48 Mb –Related 1152 Mb: Pseudogenes, Gene Fragments, Introns Intergenic DNA 2000 Mb](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649f4f5503460f94c71aeb/html5/thumbnails/29.jpg)
Random fragmentation of the genome produces good sampling of its sequence space. Overlaps are identified, and subassembly of sequence takes place after cloning into a universal vector.
![Page 30: Human Genome. Human Genome Contents: 3200 Mb Genes: 1200 Mb –Genes 48 Mb –Related 1152 Mb: Pseudogenes, Gene Fragments, Introns Intergenic DNA 2000 Mb](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649f4f5503460f94c71aeb/html5/thumbnails/30.jpg)
Digested into Random Fragments
![Page 31: Human Genome. Human Genome Contents: 3200 Mb Genes: 1200 Mb –Genes 48 Mb –Related 1152 Mb: Pseudogenes, Gene Fragments, Introns Intergenic DNA 2000 Mb](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649f4f5503460f94c71aeb/html5/thumbnails/31.jpg)
Cloned into Vector
![Page 32: Human Genome. Human Genome Contents: 3200 Mb Genes: 1200 Mb –Genes 48 Mb –Related 1152 Mb: Pseudogenes, Gene Fragments, Introns Intergenic DNA 2000 Mb](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649f4f5503460f94c71aeb/html5/thumbnails/32.jpg)
Sequenced from known ends of plasmid (vector)
![Page 33: Human Genome. Human Genome Contents: 3200 Mb Genes: 1200 Mb –Genes 48 Mb –Related 1152 Mb: Pseudogenes, Gene Fragments, Introns Intergenic DNA 2000 Mb](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649f4f5503460f94c71aeb/html5/thumbnails/33.jpg)
Assembled into contigs. Gaps and single-stranded regions are identified and targeted for new sequencing. Double-barreled: each insert is sequenced from both ends, which reads both strands.
![Page 34: Human Genome. Human Genome Contents: 3200 Mb Genes: 1200 Mb –Genes 48 Mb –Related 1152 Mb: Pseudogenes, Gene Fragments, Introns Intergenic DNA 2000 Mb](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649f4f5503460f94c71aeb/html5/thumbnails/34.jpg)
In the gaps:
![Page 35: Human Genome. Human Genome Contents: 3200 Mb Genes: 1200 Mb –Genes 48 Mb –Related 1152 Mb: Pseudogenes, Gene Fragments, Introns Intergenic DNA 2000 Mb](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649f4f5503460f94c71aeb/html5/thumbnails/35.jpg)
![Page 36: Human Genome. Human Genome Contents: 3200 Mb Genes: 1200 Mb –Genes 48 Mb –Related 1152 Mb: Pseudogenes, Gene Fragments, Introns Intergenic DNA 2000 Mb](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649f4f5503460f94c71aeb/html5/thumbnails/36.jpg)
Whole-Genome Shotgun Sequencing
• Speed-up: Assembled Correctly?
• Avoid up-front mapping
• Huge amount of computer time to identify overlaps
• Have to reference a map
• Repeats are a problem:
– Leave out sequence between repeats
– Missing Reference End Sequence means Error
![Page 37: Human Genome. Human Genome Contents: 3200 Mb Genes: 1200 Mb –Genes 48 Mb –Related 1152 Mb: Pseudogenes, Gene Fragments, Introns Intergenic DNA 2000 Mb](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649f4f5503460f94c71aeb/html5/thumbnails/37.jpg)
![Page 38: Human Genome. Human Genome Contents: 3200 Mb Genes: 1200 Mb –Genes 48 Mb –Related 1152 Mb: Pseudogenes, Gene Fragments, Introns Intergenic DNA 2000 Mb](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649f4f5503460f94c71aeb/html5/thumbnails/38.jpg)
HGP
• Isolate large fragments in BACs with framework of landmark-based physical map
• Sequence on clone-by-clone basis
• Time-Consuming subcloning of random fragments and physical mapping
![Page 39: Human Genome. Human Genome Contents: 3200 Mb Genes: 1200 Mb –Genes 48 Mb –Related 1152 Mb: Pseudogenes, Gene Fragments, Introns Intergenic DNA 2000 Mb](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649f4f5503460f94c71aeb/html5/thumbnails/39.jpg)
![Page 40: Human Genome. Human Genome Contents: 3200 Mb Genes: 1200 Mb –Genes 48 Mb –Related 1152 Mb: Pseudogenes, Gene Fragments, Introns Intergenic DNA 2000 Mb](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649f4f5503460f94c71aeb/html5/thumbnails/40.jpg)
Sequence Reassembly
• Phrap
• Shortest Covering Superstring
• Map Assembly
• Overlap: Finding overlapping fragments
• Layout: ordering fragments
• Consensus: Sequences from layout
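The overlap, layout, and consensus steps can be illustrated with a greedy heuristic for the shortest superstring problem. This is a teaching sketch, not the algorithm used by Phrap or any production assembler: it assumes error-free reads from one strand, so the consensus step reduces to concatenating the merged reads. The function names and toy reads are hypothetical.

```python
def overlap_length(left, right, min_overlap=3):
    """Length of the longest suffix of left that is a prefix of right."""
    longest_possible = min(len(left), len(right))
    for length in range(longest_possible, min_overlap - 1, -1):
        if left.endswith(right[:length]):
            return length
    return 0


def greedy_assemble(reads, min_overlap=3):
    """Repeatedly merge the pair of fragments with the longest overlap."""
    fragments = list(dict.fromkeys(reads))  # drop exact duplicates
    # Drop reads fully contained in another read.
    fragments = [
        f for f in fragments if not any(f != g and f in g for g in fragments)
    ]
    while len(fragments) > 1:
        best_len, best_i, best_j = 0, None, None
        # Overlap step: score every ordered pair of fragments.
        for i, left in enumerate(fragments):
            for j, right in enumerate(fragments):
                if i != j:
                    length = overlap_length(left, right, min_overlap)
                    if length > best_len:
                        best_len, best_i, best_j = length, i, j
        if best_len == 0:
            break  # no overlaps left: the remaining fragments are contigs
        # Layout and consensus step: join the best pair into one fragment.
        merged = fragments[best_i] + fragments[best_j][best_len:]
        fragments = [
            f for k, f in enumerate(fragments) if k not in (best_i, best_j)
        ] + [merged]
    return fragments


reads = ["GATTACAG", "ACAGCCTG", "CCTGAAT"]
print(greedy_assemble(reads))
```

With these three reads the result is a single contig, GATTACAGCCTGAAT. Repeats longer than the reads break this approach, because the longest overlap is then no longer guaranteed to be the true one.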
![Page 41: Human Genome. Human Genome Contents: 3200 Mb Genes: 1200 Mb –Genes 48 Mb –Related 1152 Mb: Pseudogenes, Gene Fragments, Introns Intergenic DNA 2000 Mb](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649f4f5503460f94c71aeb/html5/thumbnails/41.jpg)
![Page 42: Human Genome. Human Genome Contents: 3200 Mb Genes: 1200 Mb –Genes 48 Mb –Related 1152 Mb: Pseudogenes, Gene Fragments, Introns Intergenic DNA 2000 Mb](https://reader036.vdocuments.mx/reader036/viewer/2022062407/56649f4f5503460f94c71aeb/html5/thumbnails/42.jpg)