hw09 hadoop for bioinformatics
TRANSCRIPT
![Page 1: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/1.jpg)
Hadoop World, NYC
Hadoop for Bioinformatics
Deepak Singh
Amazon Web Services
![Page 2: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/2.jpg)
![Page 3: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/3.jpg)
Via Reavel under a CC-BY-NC-ND license
![Page 4: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/4.jpg)
By ~Prescott under a CC-BY-NC license
![Page 5: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/5.jpg)
data sets
![Page 6: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/6.jpg)
many data sets
![Page 7: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/7.jpg)
PFAM
GENBANK ENSEMBL
PDB
Many Others
![Page 8: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/8.jpg)
manageable
![Page 9: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/9.jpg)
Image: Matt Wood
![Page 10: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/10.jpg)
Human genome
Image: Matt Wood
![Page 11: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/11.jpg)
![Page 12: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/12.jpg)
![Page 13: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/13.jpg)
Image: Matt Wood
![Page 14: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/14.jpg)
~100 TB/Week
Image: Matt Wood
![Page 15: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/15.jpg)
~100 TB/Week
>2 PB/Year
Image: Matt Wood
![Page 16: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/16.jpg)
![Page 17: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/17.jpg)
years
![Page 18: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/18.jpg)
days
![Page 19: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/19.jpg)
hours
![Page 20: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/20.jpg)
gigabytes
![Page 21: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/21.jpg)
terabytes
![Page 22: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/22.jpg)
petabytes
![Page 23: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/23.jpg)
really fast
![Page 24: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/24.jpg)
![Page 25: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/25.jpg)
typical informatics workflow
![Page 26: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/26.jpg)
![Page 27: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/27.jpg)
![Page 28: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/28.jpg)
![Page 29: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/29.jpg)
![Page 30: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/30.jpg)
Via Christolakis under a CC-BY-NC-ND license
![Page 31: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/31.jpg)
Via Argonne National Labs under a CC-BY-SA license
![Page 32: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/32.jpg)
Via Argonne National Labs under a CC-BY-SA license
killer app
![Page 33: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/33.jpg)
Via asklar under a CC-BY license
![Page 34: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/34.jpg)
![Page 35: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/35.jpg)
![Page 36: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/36.jpg)
Image: Chris Dagdigian
![Page 37: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/37.jpg)
![Page 38: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/38.jpg)
rethink algorithms
![Page 39: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/39.jpg)
rethink computing
![Page 40: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/40.jpg)
rethink data management
![Page 41: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/41.jpg)
rethink data sharing
![Page 42: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/42.jpg)
operational mindset
![Page 43: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/43.jpg)
scalability
![Page 44: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/44.jpg)
we are data geeks not data center geeks
![Page 45: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/45.jpg)
two key trends
![Page 46: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/46.jpg)
![Page 47: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/47.jpg)
![Page 48: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/48.jpg)
![Page 49: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/49.jpg)
develop applications
![Page 50: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/50.jpg)
distribute applications
![Page 51: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/51.jpg)
use applications
![Page 52: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/52.jpg)
some work
![Page 53: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/53.jpg)
some workfilters
^
![Page 54: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/54.jpg)
High Throughput Sequence Analysis
Mike Schatz, University of Maryland
![Page 55: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/55.jpg)
• Read Mapping
• Mapping & SNP Discovery
• De novo Genome Assembly
![Page 56: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/56.jpg)
Short Read Mapping
![Page 57: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/57.jpg)
Asian Individual Genome: 3.3 Billion 35bp, 104 GB (Wang et al., 2008)
African Individual Genome: 4.0 Billion 35bp, 144 GB (Bentley et al., 2008)
![Page 58: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/58.jpg)
Alignment > 10000 CPU hrs
![Page 59: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/59.jpg)
Seed & Extend
Good alignments must have significant exact alignment
Minimal exact alignment length = l/(k+1)
![Page 60: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/60.jpg)
Seed & Extend
Good alignments must have significant exact alignment
Minimal exact alignment length = l/(k+1)
Expensive to scale
![Page 61: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/61.jpg)
Seed & Extend
Good alignments must have significant exact alignment
Minimal exact alignment length = l/(k+1)
Expensive to scale
![Page 62: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/62.jpg)
Seed & Extend
Good alignments must have significant exact alignment
Minimal exact alignment length = l/(k+1)
Expensive to scale
Need parallelization framework
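The seed-length bound on the Seed & Extend slides follows from the pigeonhole principle. The sketch below assumes that l is the read length and k is the maximum number of differences allowed (the slides do not define the symbols), and it rounds down to whole bases. The function names are illustrative only.

```python
def min_exact_seed_length(read_length: int, max_differences: int) -> int:
    """Longest exact stretch guaranteed to exist: floor(l / (k + 1))."""
    if read_length <= 0 or max_differences < 0:
        raise ValueError("read_length must be positive and max_differences non-negative")
    return read_length // (max_differences + 1)


def split_into_seeds(read: str, max_differences: int) -> list[str]:
    """Cut a read into k + 1 disjoint seeds of the guaranteed length.

    At most k of the k + 1 seeds can contain a difference, so at least
    one seed must match the reference exactly.
    """
    seed_len = min_exact_seed_length(len(read), max_differences)
    return [read[i * seed_len:(i + 1) * seed_len] for i in range(max_differences + 1)]


# A 35 bp read (the read length quoted on the genome slides) with a
# hypothetical limit of k = 2 differences yields seeds of 11 bp.
print(min_exact_seed_length(35, 2))
print(split_into_seeds("ACGTACGTACGT", 2))
```

Larger k means shorter seeds, which match the reference in many more places. This is one way to see why the slides call the approach expensive to scale.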
![Page 63: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/63.jpg)
CloudBurst
Catalog k-mers
Collect seeds
End-to-end alignment
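The three stages named on this slide can be sketched as a map, shuffle and reduce over k-mers. This is a toy, in-memory illustration and not CloudBurst's code: it assumes mismatches only (no insertions or deletions), holds everything in Python dictionaries instead of Hadoop, and all function names are hypothetical.

```python
from collections import defaultdict


def map_phase(reference, reads, seed_len):
    """Catalog k-mers: emit (k-mer, record) pairs for the reference and the reads."""
    for pos in range(len(reference) - seed_len + 1):
        yield reference[pos:pos + seed_len], ("ref", pos)
    for read_id, read in reads.items():
        # Disjoint seeds are enough: with at most k mismatches, at least
        # one of k + 1 disjoint seeds matches the reference exactly.
        for offset in range(0, len(read) - seed_len + 1, seed_len):
            yield read[offset:offset + seed_len], ("read", read_id, offset)


def shuffle_phase(pairs):
    """Collect seeds: group all records that share the same k-mer."""
    groups = defaultdict(list)
    for kmer, record in pairs:
        groups[kmer].append(record)
    return groups


def reduce_phase(groups, reference, reads, max_mismatches):
    """End-to-end alignment: extend each shared seed across the whole read."""
    hits = set()
    for records in groups.values():
        ref_positions = [r[1] for r in records if r[0] == "ref"]
        read_seeds = [(r[1], r[2]) for r in records if r[0] == "read"]
        for ref_pos in ref_positions:
            for read_id, offset in read_seeds:
                read = reads[read_id]
                start = ref_pos - offset
                if start < 0 or start + len(read) > len(reference):
                    continue
                window = reference[start:start + len(read)]
                mismatches = sum(a != b for a, b in zip(read, window))
                if mismatches <= max_mismatches:
                    hits.add((read_id, start, mismatches))
    return sorted(hits)


def align(reference, reads, max_mismatches):
    """Run the three phases with seed length floor(l / (k + 1))."""
    seed_len = min(len(r) for r in reads.values()) // (max_mismatches + 1)
    if seed_len == 0:
        raise ValueError("reads are too short for this many mismatches")
    groups = shuffle_phase(map_phase(reference, reads, seed_len))
    return reduce_phase(groups, reference, reads, max_mismatches)


# Made-up data: r1 matches exactly, r2 carries one mismatch.
print(align("ACGTTGCATGCA", {"r1": "TTGCATGC", "r2": "TTGAATGC"}, 1))
```

Grouping by k-mer is what makes the problem fit MapReduce: every reducer sees only the reference positions and read seeds that share one key.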
![Page 64: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/64.jpg)
http://cloudburst-bio.sourceforge.net; Bioinformatics 2009 25: 1363-1369
![Page 65: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/65.jpg)
![Page 66: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/66.jpg)
CloudBurst efficiently reports every k-difference alignment of every read
![Page 67: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/67.jpg)
many applications only need the best alignment
![Page 68: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/68.jpg)
Bowtie: Ultrafast short read aligner
Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10 (3): R25.
![Page 69: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/69.jpg)
SOAPsnp: Consensus alignment and SNP calling
![Page 70: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/70.jpg)
Crossbow: Rapid whole genome SNP analysis
Ben Langmead
![Page 71: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/71.jpg)
![Page 72: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/72.jpg)
Preprocessed reads
![Page 73: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/73.jpg)
Preprocessed reads
Map: Bowtie
![Page 74: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/74.jpg)
Preprocessed reads
Map: Bowtie
Sort: Bin and partition
![Page 75: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/75.jpg)
Preprocessed reads
Map: Bowtie
Sort: Bin and partition
Reduce: SOAPsnp
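The map, sort and reduce steps listed above can be illustrated with a small in-memory sketch. It is not Crossbow: `toy_align` is a brute-force stand-in for Bowtie, the majority vote in `reduce_step` is a stand-in for SOAPsnp's statistical caller, and the bin size, names and data are all hypothetical.

```python
from collections import Counter, defaultdict


def toy_align(read, reference):
    """Stand-in for Bowtie: best ungapped placement by Hamming distance."""
    starts = range(len(reference) - len(read) + 1)
    return min(starts, key=lambda s: sum(
        a != b for a, b in zip(read, reference[s:s + len(read)])))


def map_step(reads, reference, bin_size):
    """Map: align each read and emit (bin, (position, base)) per aligned base."""
    for read in reads:
        start = toy_align(read, reference)
        for i, base in enumerate(read):
            pos = start + i
            yield pos // bin_size, (pos, base)


def sort_step(pairs):
    """Sort: bin records by genomic partition and order them by position."""
    bins = defaultdict(list)
    for bin_id, record in pairs:
        bins[bin_id].append(record)
    return {bin_id: sorted(records) for bin_id, records in sorted(bins.items())}


def reduce_step(records, reference):
    """Reduce: stand-in SNP caller that takes a majority vote per position."""
    pileup = defaultdict(Counter)
    for pos, base in records:
        pileup[pos][base] += 1
    snps = []
    for pos in sorted(pileup):
        consensus = pileup[pos].most_common(1)[0][0]
        if consensus != reference[pos]:
            snps.append((pos, reference[pos], consensus))
    return snps


def call_snps(reads, reference, bin_size=4):
    """Chain the three steps and concatenate the per-bin SNP calls."""
    snps = []
    for records in sort_step(map_step(reads, reference, bin_size)).values():
        snps.extend(reduce_step(records, reference))
    return snps


# Made-up reference and reads; the sample differs from the reference at position 6.
reference = "ACGTACGGTCAT"
reads = ["ACGTACTG", "GTACTGTC", "ACTGTCAT"]
print(call_snps(reads, reference))
```

Because each bin covers a disjoint stretch of the genome, the reduce calls are independent of one another, which is what lets the SNP-calling step run in parallel.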
![Page 76: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/76.jpg)
Crossbow condenses over 1,000 hours of resequencing computation into a few hours without requiring the user to own or operate a computer cluster
![Page 77: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/77.jpg)
Comparing Genomes
![Page 78: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/78.jpg)
Estimating relative evolutionary rates from sequence comparisons:
Identification of probable orthologs
A B C D E
S. cerevisiae C. elegans
species tree
gene tree
Admissible comparisons: A or B vs. D; C vs. E
Inadmissible comparisons: A or B vs. E; C vs. D
![Page 79: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/79.jpg)
Estimating relative evolutionary rates from sequence comparisons:
A B C D E
S. cerevisiae C. elegans
species tree
gene tree
1. Orthologs found using the reciprocal smallest distance algorithm
2. Build alignment between two orthologs
>Sequence C
MSGRTILASTIAKPFQEEVTKAVKQLNFT-----PKLVGLLSNEDPAAKMYANWTGKTCESLGFKYEL-…
>Sequence E
MSGRTILASKVAETFNTEIINNVEEYKKTHNGQGPLLVGFLANNDPAAKMYATWTQKTSESMGFRYDL…
3. Estimate distance given a substitution matrix
[Figure: substitution matrix over the amino acids Phe, Ala, Pro, Leu, Thr, with entries µπ]
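Step 3 can be illustrated with a much simpler estimator than a full substitution-matrix model. The sketch below swaps in the Poisson-corrected distance, d = -ln(1 - p), where p is the fraction of aligned, ungapped positions that differ. This is a deliberate simplification and not the method on the slide; the short sequences are invented for the example.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of differing residues over aligned positions without gaps."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not compared:
        raise ValueError("no ungapped positions to compare")
    return sum(a != b for a, b in compared) / len(compared)


def poisson_distance(seq_a: str, seq_b: str) -> float:
    """Poisson-corrected distance: expected substitutions per site."""
    p = p_distance(seq_a, seq_b)
    if p >= 1.0:
        raise ValueError("sequences are too divergent for this correction")
    return -math.log(1.0 - p)


# Invented five-residue alignment with one substitution and one gap column.
print(p_distance("MSGRT-", "MSGKTL"))
print(round(poisson_distance("MSGRT-", "MSGKTL"), 4))
```

The correction accounts for repeated substitutions at the same site, so the estimated distance grows faster than the raw fraction of differences.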
![Page 80: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/80.jpg)
RSD algorithm summary
[Figure: genes a, b, c of Genome I are compared with genes a, b, c of Genome J.]
Align sequences & Calculate distances: D=0.2, D=0.3, D=0.1
Align sequences & Calculate distances: D=1.2, D=0.1, D=0.9
Orthologs: Ib - Jc, D = 0.1
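The reciprocal check in the RSD summary can be sketched as follows. The distance values are the ones printed on the slide, but which gene pair each value belongs to is an assumption made for this example, and the function name is hypothetical. The sketch takes distances as given and leaves out the alignment step.

```python
def reciprocal_smallest_distance(forward, reverse):
    """Return (gene_i, gene_j, distance) for every reciprocal closest pair.

    forward maps a Genome I gene to its distances against Genome J genes;
    reverse maps a Genome J gene to its distances against Genome I genes.
    """
    orthologs = []
    for gene_i, distances in forward.items():
        gene_j = min(distances, key=distances.get)  # closest gene in Genome J
        back = reverse.get(gene_j)
        if not back:
            continue
        if min(back, key=back.get) == gene_i:  # closest gene in Genome I
            orthologs.append((gene_i, gene_j, distances[gene_j]))
    return orthologs


# Assumed assignment of the slide's values: Ib against Genome J, then Jc against Genome I.
forward = {"Ib": {"Ja": 0.2, "Jb": 0.3, "Jc": 0.1}}
reverse = {"Jc": {"Ia": 1.2, "Ib": 0.1, "Ic": 0.9}}
print(reciprocal_smallest_distance(forward, reverse))
```

Under that assignment the only reciprocal pair is Ib and Jc at D = 0.1, which agrees with the ortholog call on the slide.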
![Page 81: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/81.jpg)
Prof. Dennis Wall
Harvard Medical School
![Page 82: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/82.jpg)
Roundup is a database of orthologs and their evolutionary distances. To get started, click browse. Alternatively, you can read our documentation here.
Good luck, researchers!
![Page 83: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/83.jpg)
massive computational demand
![Page 84: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/84.jpg)
1000 genomes = 5,994,000 processes = 23,976,000 hours
![Page 85: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/85.jpg)
2737 years
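The figures on the last two slides are mutually consistent, as a quick check shows. The per-pair and per-process values below are inferred from the slide's totals and are not stated in the talk; an all-against-all comparison of genome pairs and a 365-day year are also assumptions.

```python
from math import comb

genomes = 1000
processes = 5_994_000  # from the slide
hours = 23_976_000     # from the slide

pairs = comb(genomes, 2)                 # unordered genome pairs
processes_per_pair = processes / pairs   # inferred, not stated in the talk
hours_per_process = hours / processes    # inferred, not stated in the talk
years = hours / (24 * 365)               # serial run time on one CPU

print(pairs, processes_per_pair, hours_per_process, round(years))
```

Read this way, the totals correspond to 499,500 genome pairs, 12 processes per pair and 4 hours per process, which comes to roughly 2,737 years of serial compute.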
![Page 86: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/86.jpg)
periodic task
![Page 87: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/87.jpg)
must scale up
![Page 88: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/88.jpg)
not scalability gurus
![Page 89: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/89.jpg)
hadoop streaming
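Hadoop Streaming lets any executable act as a mapper by reading input records as lines on standard input and writing tab-separated key/value lines to standard output. A minimal sketch of such a mapper for pairwise genome comparison follows. The input layout (one tab-separated genome pair per line), the key format and the `compute_orthologs` placeholder are assumptions for illustration, not the pipeline described in the talk.

```python
import sys


def compute_orthologs(genome_a: str, genome_b: str) -> str:
    """Hypothetical placeholder for the real per-pair ortholog computation."""
    return f"orthologs({genome_a},{genome_b})"


def run_mapper(lines, out) -> None:
    """Read 'genomeA<TAB>genomeB' lines and write 'key<TAB>value' lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank input lines
        genome_a, genome_b = line.split("\t")
        key = f"{genome_a}|{genome_b}"
        out.write(f"{key}\t{compute_orthologs(genome_a, genome_b)}\n")


# In a streaming job the script would finish by calling
# run_mapper(sys.stdin, sys.stdout); here it is fed a made-up line instead.
run_mapper(["E_coli\tS_cerevisiae\n"], sys.stdout)
```

Because each input line is an independent genome pair, the framework can spread the lines across as many mapper tasks as the cluster provides, with no parallel code in the script itself.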
![Page 90: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/90.jpg)
![Page 91: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/91.jpg)
compared 50+ genomes
![Page 92: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/92.jpg)
what’s next?
![Page 93: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/93.jpg)
de novo assembly
![Page 94: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/94.jpg)
machine learning and statistics
![Page 95: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/95.jpg)
protein structure prediction
![Page 96: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/96.jpg)
docking
![Page 97: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/97.jpg)
trajectory analysis
![Page 98: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/98.jpg)
key driving factors?
![Page 99: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/99.jpg)
the ecosystem
![Page 100: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/100.jpg)
Pig
![Page 101: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/101.jpg)
Cascading
![Page 102: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/102.jpg)
Hive
![Page 103: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/103.jpg)
RHIPE
![Page 104: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/104.jpg)
domain specific libraries and tools
![Page 105: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/105.jpg)
http://aws.amazon.com/publicdatasets/
![Page 106: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/106.jpg)
![Page 107: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/107.jpg)
http://aws.amazon.com/education/
![Page 108: Hw09 Hadoop For Bioinfomatics](https://reader034.vdocuments.mx/reader034/viewer/2022051414/55a4995e1a28ab66758b4605/html5/thumbnails/108.jpg)