Lectures 3 and 4
Genome projects, second- and third-generation genome sequencing, and application of genome sequencing in predicting disease genes
Sucheta Tripathy
Genome Sequencing Projects, Genome Size, Application of Sequence Information for Identification of Disease Genes
Complete Genome Sequencing
• Whole genome shotgun sequencing
• BAC end sequencing
• Chromosome walking
• End sealing
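Whole genome shotgun sequencing breaks the genome into random fragments, sequences them, and reassembles the reads from their overlaps. The toy sketch below illustrates only the reassembly idea with a greedy suffix/prefix merge of error-free reads. The function names and the `min_len` parameter are hypothetical; real assemblers (overlap-layout-consensus or de Bruijn graph based) must also cope with sequencing errors and repeats, which this sketch ignores.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that matches a prefix of `b`
    (at least `min_len` long); 0 if there is none."""
    start = 0
    while True:
        # Look for the first min_len bases of b somewhere in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # Accept only if everything in a from that point on is a prefix of b.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of reads with the largest overlap.
    Returns the resulting contigs (a single string if fully assembled)."""
    contigs = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    # Drop reads fully contained in another read; they add no new sequence.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: the remaining contigs stay separate
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Simulated "shotgun" reads: 7-base windows taken every 3 bases of a tiny genome.
genome = "ATGGCGTGCAATC"
reads = [genome[i:i + 7] for i in range(0, len(genome) - 6, 3)]
print(greedy_assemble(reads))  # ['ATGGCGTGCAATC']
```

Greedy merging is the simplest possible strategy; it can misassemble genomes with repeats longer than the reads, which is one reason real projects add BAC end sequencing and gap-closing steps.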
Reference: http://en.wikipedia.org/wiki/File:Genome_Sizes.png
Cost of Genome Sequencing
Next-generation sequencing methods
• 454 sequencing (2006)
• Principle of pyrophosphate detection (1985, 1988)
• Illumina (Solexa) genome sequencing (2007)
• Applied Biosystems ABI SOLiD System (2007)
• Helicos single-molecule sequencing (HeliScope, 2007)
• Pacific Biosciences single-molecule real-time (SMRT) technology (2010)
• Sequenom for nanotechnology-based sequencing
• BioNanomatrix nanofluidics
• RNAP technology
Ref: http://www.ncbi.nlm.nih.gov/books/NBK20261/
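In pyrosequencing (the chemistry behind 454), nucleotides are flowed over the template one type at a time; each incorporation releases pyrophosphate, which is converted into a light signal roughly proportional to the number of bases added in that flow. The idealized simulation below shows why a homopolymer run such as `GG` produces a single flow with a signal of 2, and how the read is decoded back from the flowgram. The flow order `TACG`, the integer signals and the function names are simplifying assumptions; real flowgrams are noisy intensities, which is why homopolymer-length errors are typical of this platform.

```python
from itertools import cycle


def simulate_flowgram(read: str, flow_order: str = "TACG") -> list[tuple[str, int]]:
    """Idealized pyrosequencing: for each nucleotide flow, the signal equals
    the number of bases incorporated (the homopolymer length, or 0)."""
    if set(read) - set(flow_order):
        raise ValueError("read contains bases that never appear in the flow order")
    flows = []
    pos = 0
    for nucleotide in cycle(flow_order):
        if pos >= len(read):
            break
        count = 0
        # Extend through the whole homopolymer run in one flow.
        while pos < len(read) and read[pos] == nucleotide:
            count += 1
            pos += 1
        flows.append((nucleotide, count))
    return flows


def decode_flowgram(flows: list[tuple[str, int]]) -> str:
    """Rebuild the read by repeating each flowed nucleotide by its signal."""
    return "".join(nucleotide * count for nucleotide, count in flows)


flows = simulate_flowgram("TTAGGC")
print(flows)  # [('T', 2), ('A', 1), ('C', 0), ('G', 2), ('T', 0), ('A', 0), ('C', 1)]
print(decode_flowgram(flows))  # TTAGGC
```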
Sequencing methods
Ref: http://www.wellcome.ac.uk/Education-resources/Teaching-and-education/Animations/DNA/WTX056046.htm
Ref: http://www.wellcome.ac.uk/Education-resources/Teaching-and-education/Animations/DNA/WTX056051.htm
Ref: http://www.wellcome.ac.uk/Education-resources/Teaching-and-education/Animations/DNA/WTDV026689.htm
Ion Torrent
SOLiD Sequencing
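SOLiD sequencing interrogates each base twice with two-base (dinucleotide) probes, so a read is reported in "color space": a known starting base followed by one of four colors for each adjacent pair of bases. With A, C, G, T coded as 0 to 3, the standard SOLiD color of a pair equals the bitwise XOR of the two codes, which the sketch below uses to encode and decode a read. The leading base here is simply the first base of the sequence (an assumption for illustration; in real reads it comes from the primer), and the function names are hypothetical. Because decoding is cumulative, one wrong color changes every downstream base, which is why color-space reads are normally aligned in color space.

```python
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASE = "ACGT"


def encode_colorspace(seq: str) -> str:
    """First base, then one color (0-3) per adjacent pair: code(a) XOR code(b)."""
    colors = "".join(str(BASE_CODE[a] ^ BASE_CODE[b]) for a, b in zip(seq, seq[1:]))
    return seq[0] + colors


def decode_colorspace(cs: str) -> str:
    """Walk the colors from the known first base to recover the sequence."""
    base = cs[0]
    bases = [base]
    for color in cs[1:]:
        base = CODE_BASE[BASE_CODE[base] ^ int(color)]
        bases.append(base)
    return "".join(bases)


cs = encode_colorspace("ATGGCA")
print(cs)                     # A31031
print(decode_colorspace(cs))  # ATGGCA
# One miscalled color (second color 1 -> 2) corrupts every base after it:
print(decode_colorspace("A32031"))  # ATCCGT
```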
Microbial Genome Sequencing
• GOLD (Genomes OnLine Database) [http://www.genomesonline.org/cgi-bin/GOLD/index.cgi]
• INSDC (International Nucleotide Sequence Database Collaboration) [http://www.insdc.org/]
• EMBL collaboration [http://www.ebi.ac.uk/embl/Contact/collaboration.html]
• JGI – IMG [http://img.jgi.doe.gov/]
• Broad [http://www.broadinstitute.org/]
• TIGR [http://www.jcvi.org/]
• WashU [http://genome.wustl.edu/]
• VBI at Virginia Tech [www.vbi.vt.edu]
Human Genome Project
• October 1990: Human Genome Project started
• 2000: first draft announced (papers published in 2001)
• 2003: finished sequence (project completed)
• NHGRI solicited pilot proposals for ENCODE
• 2005: GWAS (about 90% of associated variants lie outside coding regions)
• 2007: first ENCODE (pilot) report published
• RFAs were sought for the full ENCODE project
• 2012: ENCODE results published
What happens next?
You have 10 million characters: what do you do with them?
• Locate genes
• Determine the function of each gene
  – by similarity search
  – by domain search
  – by predicting signal peptides
  – by locating transmembrane regions
Ref: http://www.nature.com/nature/journal/v406/n6797/pdf/406799a0.pdf
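One classic way to locate transmembrane regions is a hydropathy plot: average a per-residue hydrophobicity score over a sliding window and flag long, strongly hydrophobic stretches. The sketch below uses the Kyte-Doolittle scale with a 19-residue window and a 1.6 cut-off, the values commonly quoted for spotting membrane-spanning helices. It is an illustrative substitute, not necessarily the method used in this lecture, the function names are hypothetical, and dedicated predictors (for example HMM-based ones such as TMHMM) are considerably more accurate.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every window; entry i covers residues i .. i+window-1."""
    scores = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def predict_tm_segments(protein: str, window: int = 19,
                        threshold: float = 1.6) -> list[tuple[int, int]]:
    """Return 0-based, end-exclusive (start, end) spans covered by windows
    scoring above `threshold`; overlapping windows are merged."""
    segments: list[tuple[int, int]] = []
    for i, score in enumerate(hydropathy_profile(protein, window)):
        if score > threshold:
            if segments and i <= segments[-1][1]:
                segments[-1] = (segments[-1][0], i + window)  # extend current span
            else:
                segments.append((i, i + window))  # start a new span
    return segments


# Toy protein: a 25-residue leucine stretch flanked by lysines.
toy = "K" * 10 + "L" * 25 + "K" * 10
print(predict_tm_segments(toy))  # [(5, 40)]
```

The reported span is wider than the leucine run because every passing window is included whole; real tools refine the boundaries.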
Genome Annotation
Example input: ATGAAGATAGACAGCATACTAGCAGCATAGAATAGATAAGAGATAGAAATAGAATAAATATAAGAGAGA
• Run 6-frame translation
• Run BLASTP against nr. Match found? Yes: product found. No: run hmmsearch
• hmmsearch match found? Yes: product found. No: unknown gene (hypothetical protein)
• Product found: pathway analysis
• Other analysis: repeat finding, miRNA finding, tRNAscan, etc.
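Six-frame translation, the first step of the annotation flow, translates the three reading frames of the sequence and the three frames of its reverse complement, because a gene may lie on either strand in any frame. A minimal sketch using the standard genetic code (NCBI translation table 1) on the example sequence from the slide follows; `*` marks stop codons and `X` any codon containing a non-ACGT character. The frame labels `+1` to `-3` and the function names are assumed conventions. In a real pipeline these translations (or ORFs extracted from them) would then be searched with BLASTP and hmmsearch.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons in order TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str, offset: int = 0) -> str:
    """Translate codons starting at `offset`; '*' = stop, 'X' = unknown codon."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(offset, len(seq) - 2, 3))


def six_frame_translation(seq: str) -> dict[str, str]:
    """Frames +1..+3 on the given strand, -1..-3 on the reverse complement."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {f"+{k + 1}": translate(seq, k) for k in range(3)}
    frames.update({f"-{k + 1}": translate(rc, k) for k in range(3)})
    return frames


EXAMPLE = ("ATGAAGATAGACAGCATACTAGCAGCATAGAATAGATAAGAGATAGAAATAGAATAAATATAAGA"
           "GAGA")
for frame, protein in six_frame_translation(EXAMPLE).items():
    print(frame, protein)  # frame +1 is MKIDSILAA*NR*EIEIE*I*ER
```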
Genome Sizes
Gametic nuclear DNA content, represented as mass in pg (picograms) or as length in megabases (Mb)
• 1 pg = 10^-12 g
• 1 Mb = 10^6 bases
• 1 pg = 978 Mb
Ref: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1669731/
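The conversion factors above make it easy to move between mass and length units. A minimal sketch, using the 978 Mb per pg factor from the slide; the function names are hypothetical and the 2.5 pg input in the usage line is an arbitrary example, not a measured genome size.

```python
MB_PER_PG = 978          # 1 pg of DNA corresponds to about 978 Mb (slide value)
BASES_PER_MB = 10 ** 6   # 1 Mb = 10^6 bases


def pg_to_mb(pg: float) -> float:
    """Convert DNA mass in picograms to length in megabases."""
    return pg * MB_PER_PG


def mb_to_pg(mb: float) -> float:
    """Convert length in megabases to DNA mass in picograms."""
    return mb / MB_PER_PG


def pg_to_bases(pg: float) -> float:
    """Convert DNA mass in picograms to a number of bases."""
    return pg_to_mb(pg) * BASES_PER_MB


# Arbitrary example: a gametic (1C) nucleus containing 2.5 pg of DNA.
print(pg_to_mb(2.5))   # 2445.0
print(mb_to_pg(489))   # 0.5
```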
Genome Sizes
• Database of Genome Sizes (DOGS): http://www.cbs.dtu.dk/databases/DOGS/
• Plant genome size database: http://www.kew.org/genomesize/homepage.html
• Mammalian genome size database: http://www.unipv.it/webbio/dbagsdb.htm
• Animal genome size database: www.genomesize.com
• Fungal genome size database: www.zbi.ee/fungal-genomesize
Ref: http://www.kew.org/genomesize/homepage.html
Ref: http://www.genomesize.com/
Ref: http://www-3.unipv.it/webbio/dbagsh.htm
Ref: http://www.zbi.ee/fungal-genomesize/
Identifying Human Disease Genes
Ref: http://www.ncbi.nlm.nih.gov/books/NBK7561/
• Before 1980, very few genes were recognized
• Reverse genetics: know the gene product, go back to the gene, and do positional cloning
• Genetic redundancy: multiple genes have the same function
Identification of genes through protein product
1000 Genomes Project
• Genomes of 1,092 individuals sequenced
• 14 populations
• Low-coverage whole-genome sequencing combined with exome sequencing
• 38 million SNPs, 1.4 million short insertions and deletions, 14,000 large deletions
Ref: http://www.nature.com/nature/journal/v491/n7422/full/nature11632.html