![Page 1: Computational Methods to study Sequencing data](https://reader036.vdocuments.mx/reader036/viewer/2022062517/56813e65550346895da86f0e/html5/thumbnails/1.jpg)
Computational Methods to study Sequencing data
-Meenakshi Sharma
![Page 2: Computational Methods to study Sequencing data](https://reader036.vdocuments.mx/reader036/viewer/2022062517/56813e65550346895da86f0e/html5/thumbnails/2.jpg)
Outline
• Bioinformatics
• Genomics
• Motivation
• Challenges
• Next-Generation-Sequencing Pipeline
  – Sequencing
  – Mapping
  – Assembly
  – Blast
![Page 3: Computational Methods to study Sequencing data](https://reader036.vdocuments.mx/reader036/viewer/2022062517/56813e65550346895da86f0e/html5/thumbnails/3.jpg)
Introduction
• Biology
• Computer Science
• Data Mining
• Statistics
• Applied Mathematics
• Applied Chemistry
• Applied Physics
[Figure: Applied Sciences. Computer Science and Biology combine into Bioinformatics.]
![Page 4: Computational Methods to study Sequencing data](https://reader036.vdocuments.mx/reader036/viewer/2022062517/56813e65550346895da86f0e/html5/thumbnails/4.jpg)
Definition
• Definition of bioinformatics by the NIH Biomedical Information Science and Technology Initiative Consortium (BISTIC) Definition Committee, National Institutes of Health, released on July 17, 2000:
“Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.”
![Page 5: Computational Methods to study Sequencing data](https://reader036.vdocuments.mx/reader036/viewer/2022062517/56813e65550346895da86f0e/html5/thumbnails/5.jpg)
Genomics
• Determine the complete DNA sequence of all genetic material in an organism
• Analyze and compare the entire genomes of one or more species
• Genome: the complete set of genetic material of an organism, including all of its genes
![Page 6: Computational Methods to study Sequencing data](https://reader036.vdocuments.mx/reader036/viewer/2022062517/56813e65550346895da86f0e/html5/thumbnails/6.jpg)
Genome
![Page 7: Computational Methods to study Sequencing data](https://reader036.vdocuments.mx/reader036/viewer/2022062517/56813e65550346895da86f0e/html5/thumbnails/7.jpg)
Motivation
• Study gene and genome organization
• Study protein structure and function
• Study metabolic pathways
• Study ecology and environment
• Find potential pathogens
![Page 8: Computational Methods to study Sequencing data](https://reader036.vdocuments.mx/reader036/viewer/2022062517/56813e65550346895da86f0e/html5/thumbnails/8.jpg)
Challenges
![Page 9: Computational Methods to study Sequencing data](https://reader036.vdocuments.mx/reader036/viewer/2022062517/56813e65550346895da86f0e/html5/thumbnails/9.jpg)
Challenges
![Page 10: Computational Methods to study Sequencing data](https://reader036.vdocuments.mx/reader036/viewer/2022062517/56813e65550346895da86f0e/html5/thumbnails/10.jpg)
Challenges
• Knowledge acquisition and knowledge management
• Methods for information and knowledge processing
  – Information retrieval
  – Statistical data analysis
  – High-performance and large-scale computing
  – Applications of new devices and emerging hardware technologies
  – Visualization of data and knowledge
• Legal issues, policy issues, history, ethics
![Page 11: Computational Methods to study Sequencing data](https://reader036.vdocuments.mx/reader036/viewer/2022062517/56813e65550346895da86f0e/html5/thumbnails/11.jpg)
Next-Generation-Sequencing Pipeline
• Sequencing (sample preparation). Output: reads
• Quality analysis (statistics). Output: quality plots
• Assembly. Output: contigs
• Mapping. Output: coverage
• Blast. Output: list of organisms matched
![Page 12: Computational Methods to study Sequencing data](https://reader036.vdocuments.mx/reader036/viewer/2022062517/56813e65550346895da86f0e/html5/thumbnails/12.jpg)
Sequencing
[Figure: healthy tissue and infected tissue each go through library preparation and an Illumina sequencer, producing one set of reads per sample.]
Reads from healthy sample: ATGCGACTCACCATGGCGACTAGGGCAATTATGTAG
Reads from infected sample: ATGGGTGAATTCATGCGGACTTCGCGTATGATCCGA
![Page 13: Computational Methods to study Sequencing data](https://reader036.vdocuments.mx/reader036/viewer/2022062517/56813e65550346895da86f0e/html5/thumbnails/13.jpg)
Mapping
[Figure: reads from the healthy and infected samples are aligned to reference sequences (labeled NC_000018), and a coverage value is recorded for each reference position.]
Reference sequences: ATGATGATGATGATGCGACTCTACCGGCGTA and ATGATGATGATGATACTTCGCGTTCTCGCGTA
Aligned read: ATGCGACTC
Per-position coverage rows: 0000000 2 2 1 5 0 0000000000 3 … and 0000000 10 20 12 45 10 0000000000 10 …
Reads from healthy sample: ATGCGACTCACCATGGCGACTAGGGCAATTATGTAG
Reads from infected sample: ATGGGTGAATTCATGCGGACTTCGCGTATGATCCGA
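The mapping step can be sketched in a few lines of Python. This is a minimal illustration, not the tool used in the presentation: it places reads by exact string match only, whereas real read mappers tolerate mismatches and gaps. The function name `map_reads` and the second and third example reads are assumptions made for the example; the reference and the first read come from the slide.

```python
from typing import List


def map_reads(reference: str, reads: List[str]) -> List[int]:
    """Return per-position coverage of `reference` by exactly matching reads."""
    coverage = [0] * len(reference)
    for read in reads:
        # Record every exact occurrence of the read, including overlapping ones.
        start = reference.find(read)
        while start != -1:
            for pos in range(start, start + len(read)):
                coverage[pos] += 1
            start = reference.find(read, start + 1)
    return coverage


# Reference and first read are taken from the slide; the other reads are
# hypothetical substrings chosen so that they overlap.
reference = "ATGATGATGATGATGCGACTCTACCGGCGTA"
reads = ["ATGCGACTC", "CGACTCTAC", "GGCGTA"]
coverage = map_reads(reference, reads)
print(coverage)
```

Positions covered by more than one read get a count above 1, and positions no read touches stay at 0, which is the shape of the coverage rows shown on the slide.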
![Page 14: Computational Methods to study Sequencing data](https://reader036.vdocuments.mx/reader036/viewer/2022062517/56813e65550346895da86f0e/html5/thumbnails/14.jpg)
Comparing coverages in 2 samples
[Figure: coverage value plotted for healthy tissue and for infected tissue.]
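One common way to compare two coverage tracks is a per-position log2 fold change. The sketch below is an assumption about how such a comparison could be done, not the method behind the slide's plot. The function name `log2_fold_change`, the pseudocount, and the threshold are illustrative, and the assignment of the slide's two coverage rows to "healthy" and "infected" is a guess. It also assumes both samples were sequenced to a similar depth, so no normalization is applied.

```python
import math
from typing import List


def log2_fold_change(healthy: List[float], infected: List[float],
                     pseudocount: float = 1.0) -> List[float]:
    """Per-position log2(infected / healthy), with a pseudocount to avoid division by zero."""
    if len(healthy) != len(infected):
        raise ValueError("coverage tracks must have the same length")
    return [math.log2((i + pseudocount) / (h + pseudocount))
            for h, i in zip(healthy, infected)]


# Values echo the coverage rows on the mapping slide (sample labels assumed).
healthy = [0, 0, 2, 2, 1, 5, 0]
infected = [0, 0, 10, 20, 12, 45, 10]
changes = log2_fold_change(healthy, infected)

# Flag positions where infected coverage is at least twice the healthy coverage.
enriched = [pos for pos, lfc in enumerate(changes) if lfc >= 1.0]
print(enriched)
```

Positions with zero coverage in both samples give a fold change of 0 and are not flagged.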
![Page 15: Computational Methods to study Sequencing data](https://reader036.vdocuments.mx/reader036/viewer/2022062517/56813e65550346895da86f0e/html5/thumbnails/15.jpg)
Assembly
[Figure: overlapping short reads from each sample are merged into longer contigs.]
Short reads: ATGCGA TGCGAG TGCGAT TGCGAG and ATGAAA TGAAAA TGAAAA GAAATA
Reads:
ATGCGACTCACCATGGCGACTAGGGCAATTATGTAG…
ATGGGTTTATTCATGTCGACTTGTCAGATGATCTAA…
Contigs:
ATGCGAACCATGACTAGATTATGTTTCGCGAACTCCCTATCGAGATTATGTTTCGCGAATGTTTCGCGAGGTGT…
ATGGGTATTCATGTCTTTGTATGATCTAATGGGTAATGGTGTGTATGATCTA…
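The idea of merging overlapping reads into contigs can be illustrated with a greedy overlap merge. This is a toy sketch under stated assumptions: production assemblers typically use overlap graphs or de Bruijn graphs and handle sequencing errors, which this code does not. The names `overlap` and `greedy_assemble`, the `min_len` parameter, and the three example reads (hypothetical fragments of the first contig on the slide) are introduced for illustration only.

```python
from typing import List


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b` (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: List[str], min_len: int = 3) -> List[str]:
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]


# Hypothetical overlapping reads cut from the start of the slide's first contig.
reads = ["CATGACTAG", "ATGCGAACC", "GAACCATGA"]
contigs = greedy_assemble(reads)
print(contigs)
```

Greedy merging can pick the wrong join when two reads overlap a third equally well, which is one reason real assemblers use graph-based methods instead.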
![Page 16: Computational Methods to study Sequencing data](https://reader036.vdocuments.mx/reader036/viewer/2022062517/56813e65550346895da86f0e/html5/thumbnails/16.jpg)
Blast
[Figure: each assembled contig is split into segments, and each segment is matched to a known organism or gene.]
Contig: ATGCGAACCATGACTAGATTATGTTTCGCGAACTCCCTATCGAGATTATGTTTCGCGAATGTTTCGCGAGGTGT…
• ATGCGAACCATG | papilloma virus
• ACTAGATTATGTTTCGCGA | E. coli
• ACTCCCTATCGA | human mitochondria
• GATTATGTTTCGCGA | human chr 12
• ATGTTTCGCGAGGTGT | polio virus
• …
Contig: ATGGGTATTCATGTCTTTGTATGATCTAATGGGTAATGGTGTGTATGATCTA…
• ATGGGTATTCATG | smallpox virus
• TCTTTGTATGATCTA | human chr 21
• ATGGGTAATG | growth factor gene
• GTGTGTATGATCTA | human mitochondria
• …
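The lookup of contig segments against known sequences can be sketched as a shared k-mer search. This is not the BLAST algorithm, which extends seed hits into scored local alignments; the sketch only shows the seeding idea of finding exact short matches. The function names, the value of `k`, and the three-entry `database` (built from segments labeled on the slide) are assumptions for illustration.

```python
from typing import Dict, Optional, Set, Tuple


def kmers(seq: str, k: int) -> Set[str]:
    """All substrings of length k in `seq` (empty if the sequence is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def best_match(query: str, database: Dict[str, str],
               k: int = 8) -> Tuple[Optional[str], int]:
    """Return the database entry sharing the most k-mers with `query`, and the count."""
    query_kmers = kmers(query, k)
    best_name, best_hits = None, 0
    for name, seq in database.items():
        hits = len(query_kmers & kmers(seq, k))
        if hits > best_hits:
            best_name, best_hits = name, hits
    return best_name, best_hits


# Toy database using segments and labels from the slide; not real genomes.
database = {
    "papilloma virus": "ATGCGAACCATG",
    "polio virus": "ATGTTTCGCGAGGTGT",
    "human mitochondria": "ACTCCCTATCGA",
}

print(best_match("ATGCGAACCATG", database))
print(best_match("GGGGGGGGGG", database))
```

A query with no shared k-mer returns `(None, 0)`, which corresponds to a contig with no match in the database.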
![Page 17: Computational Methods to study Sequencing data](https://reader036.vdocuments.mx/reader036/viewer/2022062517/56813e65550346895da86f0e/html5/thumbnails/17.jpg)
[Figure: pipeline overview]
• Sequencing → sequencing reads
• Assembly → assembled contigs
• Mapping (reads aligned to reference NC_989231) → coverage values
• Blast → matched genes and organisms
• Coverage Analysis → differential coverage
![Page 18: Computational Methods to study Sequencing data](https://reader036.vdocuments.mx/reader036/viewer/2022062517/56813e65550346895da86f0e/html5/thumbnails/18.jpg)
References
1) Gibas, C. and Jambeck, P., Developing Bioinformatics Computer Skills, O'Reilly & Associates, Inc., April 2001. Web. 13 February 2012.
2) Kahn, Scott D., On the Future of Genomic Data, Science 331, 728 (2011); DOI: 10.1126/science.1197891
3) Wetterstrand, K.A., DNA Sequencing Costs: Data from the NHGRI Large-Scale Genome Sequencing Program. Available at: www.genome.gov/sequencingcosts. Accessed 13 February 2012.
![Page 19: Computational Methods to study Sequencing data](https://reader036.vdocuments.mx/reader036/viewer/2022062517/56813e65550346895da86f0e/html5/thumbnails/19.jpg)
Thank you!