molecular evolution, genomic analysis and freebsd · the central dogma of biology i dna is made up...
TRANSCRIPT
![Page 1: Molecular Evolution, Genomic Analysis and FreeBSD · The Central Dogma of Biology I DNA is made up of four nucleotides with four different bases: Adenine (A), Cytosine (C), Guanine](https://reader034.vdocuments.mx/reader034/viewer/2022050608/5faf549f807ef10dd7232c9b/html5/thumbnails/1.jpg)
Molecular Evolution, Genomic Analysisand
FreeBSD
Joseph Mingrone
Department of Mathematics and StatisticsDalhousie University
![Page 2: Molecular Evolution, Genomic Analysis and FreeBSD · The Central Dogma of Biology I DNA is made up of four nucleotides with four different bases: Adenine (A), Cytosine (C), Guanine](https://reader034.vdocuments.mx/reader034/viewer/2022050608/5faf549f807ef10dd7232c9b/html5/thumbnails/2.jpg)
About the Bielawski Group
I Small group (5 - 10 members)
I Molecular Evolution and Genomic AnalysisI We analyze DNA sequences to make
inferences
I The Research is purely computational
I Multidisciplinary
![Page 3: Molecular Evolution, Genomic Analysis and FreeBSD · The Central Dogma of Biology I DNA is made up of four nucleotides with four different bases: Adenine (A), Cytosine (C), Guanine](https://reader034.vdocuments.mx/reader034/viewer/2022050608/5faf549f807ef10dd7232c9b/html5/thumbnails/3.jpg)
Outline
I Hardware
I Research tracksI Modelling evolution at the molecular level
I MircroBiome / Metagenomics
I Software, Design decisions, Observations
![Page 4: Molecular Evolution, Genomic Analysis and FreeBSD · The Central Dogma of Biology I DNA is made up of four nucleotides with four different bases: Adenine (A), Cytosine (C), Guanine](https://reader034.vdocuments.mx/reader034/viewer/2022050608/5faf549f807ef10dd7232c9b/html5/thumbnails/4.jpg)
HardwareComputing Cluster: Awarnach
I Purchased from Sun in 2006
I Sun Fire V40z master node (two dualcore Opteron 870, 16 GB ECC RAM)
I Twenty compute nodes (X4100), twodual core Opteron 270, 4 GB ECCRAM, two 73 GB disks and four gigabitethernet ports
I 48-port gigabit SMC switch
![Page 5: Molecular Evolution, Genomic Analysis and FreeBSD · The Central Dogma of Biology I DNA is made up of four nucleotides with four different bases: Adenine (A), Cytosine (C), Guanine](https://reader034.vdocuments.mx/reader034/viewer/2022050608/5faf549f807ef10dd7232c9b/html5/thumbnails/5.jpg)
HardwareStorage
I Asus RS300-E7-PS4 1U Server
I E3-1230V2 Xeon CPU
I Four Intel 60GB SSD (520 Series)
I LSI 9205-8e SAS controller
I Supermicro SC847E16-RJB0D1 JBODwith 10 WD30EFRX 2TB Hard Drives(ZFS raidz)
![Page 6: Molecular Evolution, Genomic Analysis and FreeBSD · The Central Dogma of Biology I DNA is made up of four nucleotides with four different bases: Adenine (A), Cytosine (C), Guanine](https://reader034.vdocuments.mx/reader034/viewer/2022050608/5faf549f807ef10dd7232c9b/html5/thumbnails/6.jpg)
HardwareNew Compute Node
I New Compute Node
I Four 12-core 6348 CPUs
I 256 GB RAM
![Page 7: Molecular Evolution, Genomic Analysis and FreeBSD · The Central Dogma of Biology I DNA is made up of four nucleotides with four different bases: Adenine (A), Cytosine (C), Guanine](https://reader034.vdocuments.mx/reader034/viewer/2022050608/5faf549f807ef10dd7232c9b/html5/thumbnails/7.jpg)
Track 1: Background
I Studying evolution is studying the pastI Clues in present day to infer past processes
I Morphology from fossil record
I Studies with some organisms
I Genetic material contains many markers ofevents in evolutionary history
![Page 8: Molecular Evolution, Genomic Analysis and FreeBSD · The Central Dogma of Biology I DNA is made up of four nucleotides with four different bases: Adenine (A), Cytosine (C), Guanine](https://reader034.vdocuments.mx/reader034/viewer/2022050608/5faf549f807ef10dd7232c9b/html5/thumbnails/8.jpg)
Track 1: Molecular Evolution Modelling
I Classify selection pressure at the protein and amino acidlevel
I Purifying Selection
I Neutral Evolution
I Positive Selection
I Usually most interested in detecting sites under positiveselection
![Page 9: Molecular Evolution, Genomic Analysis and FreeBSD · The Central Dogma of Biology I DNA is made up of four nucleotides with four different bases: Adenine (A), Cytosine (C), Guanine](https://reader034.vdocuments.mx/reader034/viewer/2022050608/5faf549f807ef10dd7232c9b/html5/thumbnails/9.jpg)
Track 1: Arms Race
![Page 10: Molecular Evolution, Genomic Analysis and FreeBSD · The Central Dogma of Biology I DNA is made up of four nucleotides with four different bases: Adenine (A), Cytosine (C), Guanine](https://reader034.vdocuments.mx/reader034/viewer/2022050608/5faf549f807ef10dd7232c9b/html5/thumbnails/10.jpg)
Track 1: Arms Race
![Page 11: Molecular Evolution, Genomic Analysis and FreeBSD · The Central Dogma of Biology I DNA is made up of four nucleotides with four different bases: Adenine (A), Cytosine (C), Guanine](https://reader034.vdocuments.mx/reader034/viewer/2022050608/5faf549f807ef10dd7232c9b/html5/thumbnails/11.jpg)
Track 1: Analysis of HIV Genes
I Viral Envelop (Env) 3/91
I DNA polymerase (Pol) 11/947
I Viral infectivity factor (Vif) 10/192
![Page 12: Molecular Evolution, Genomic Analysis and FreeBSD · The Central Dogma of Biology I DNA is made up of four nucleotides with four different bases: Adenine (A), Cytosine (C), Guanine](https://reader034.vdocuments.mx/reader034/viewer/2022050608/5faf549f807ef10dd7232c9b/html5/thumbnails/12.jpg)
Track 1: BackgroundThe Central Dogma of Biology
I DNA is made up of four nucleotides with four differentbases: Adenine (A), Cytosine (C), Guanine (G), orThymine (T)
I Codons specify amino acids, the building blocks of protein
![Page 13: Molecular Evolution, Genomic Analysis and FreeBSD · The Central Dogma of Biology I DNA is made up of four nucleotides with four different bases: Adenine (A), Cytosine (C), Guanine](https://reader034.vdocuments.mx/reader034/viewer/2022050608/5faf549f807ef10dd7232c9b/html5/thumbnails/13.jpg)
Track 1: Redundant Code
I 43 = 64 possible codons (61 sense), 20 amino acids
I Code is redundant
I Nucleotide substitution may or may not mean change inamino acid
![Page 14: Molecular Evolution, Genomic Analysis and FreeBSD · The Central Dogma of Biology I DNA is made up of four nucleotides with four different bases: Adenine (A), Cytosine (C), Guanine](https://reader034.vdocuments.mx/reader034/viewer/2022050608/5faf549f807ef10dd7232c9b/html5/thumbnails/14.jpg)
Track 1: A Measure of Selection Pressure
dS = rate of synonymous substitutionsdN = rate of nonsynonymous substitutions
dN/dS = 1 Neutral EvolutiondN/dS < 1 Purifying SelectiondN/dS > 1 Positive Selection
dN/dS = ω
A measure of the strength and direction of selection pressure
![Page 15: Molecular Evolution, Genomic Analysis and FreeBSD · The Central Dogma of Biology I DNA is made up of four nucleotides with four different bases: Adenine (A), Cytosine (C), Guanine](https://reader034.vdocuments.mx/reader034/viewer/2022050608/5faf549f807ef10dd7232c9b/html5/thumbnails/15.jpg)
Track 1: Markov Process
qij =
0πjκπjωπjωκπj
![Page 16: Molecular Evolution, Genomic Analysis and FreeBSD · The Central Dogma of Biology I DNA is made up of four nucleotides with four different bases: Adenine (A), Cytosine (C), Guanine](https://reader034.vdocuments.mx/reader034/viewer/2022050608/5faf549f807ef10dd7232c9b/html5/thumbnails/16.jpg)
Track 1: Estimating Model Parameters is ComputationallyIntense
TGCCACCCCGGC...
TGCCAGCCCGGC...
AGCCACCCCGGC...
TGCCACCCCGGC...
TGCCACCTCGGC...
qij =
0πjκπjωπjωκπj
![Page 17: Molecular Evolution, Genomic Analysis and FreeBSD · The Central Dogma of Biology I DNA is made up of four nucleotides with four different bases: Adenine (A), Cytosine (C), Guanine](https://reader034.vdocuments.mx/reader034/viewer/2022050608/5faf549f807ef10dd7232c9b/html5/thumbnails/17.jpg)
Track 1: Estimating Model Parameters is ComputationallyIntense
TGCCACCCCGGC...
TGCCAGCCCGGC...
AGCCACCCCGGC...
TGCCACCCCGGC...
TGCCACCTCGGC...
• •
qij =
0πjκπjωπjωκπj
![Page 18: Molecular Evolution, Genomic Analysis and FreeBSD · The Central Dogma of Biology I DNA is made up of four nucleotides with four different bases: Adenine (A), Cytosine (C), Guanine](https://reader034.vdocuments.mx/reader034/viewer/2022050608/5faf549f807ef10dd7232c9b/html5/thumbnails/18.jpg)
Track 1: Estimating Model Parameters is ComputationallyIntense
TGCCACCCCGGC...
TGCCAGCCCGGC...
AGCCACCCCGGC...
TGCCACCCCGGC...
TGCCACCTCGGC...
• •qij =
0πjκπjωπjωκπj
![Page 19: Molecular Evolution, Genomic Analysis and FreeBSD · The Central Dogma of Biology I DNA is made up of four nucleotides with four different bases: Adenine (A), Cytosine (C), Guanine](https://reader034.vdocuments.mx/reader034/viewer/2022050608/5faf549f807ef10dd7232c9b/html5/thumbnails/19.jpg)
Track 1: Estimating Model Parameters is ComputationallyIntense
TGCCACCCCGGC...
TGCCAGCCCGGC...
AGCCACCCCGGC...
TGCCACCCCGGC...
TGCCACCTCGGC...
•
• qij =
0πjκπjωπjωκπj
![Page 20: Molecular Evolution, Genomic Analysis and FreeBSD · The Central Dogma of Biology I DNA is made up of four nucleotides with four different bases: Adenine (A), Cytosine (C), Guanine](https://reader034.vdocuments.mx/reader034/viewer/2022050608/5faf549f807ef10dd7232c9b/html5/thumbnails/20.jpg)
Track 1: Estimating Model Parameters is ComputationallyIntense
TGCCACCCCGGC...
TGCCAGCCCGGC...
AGCCACCCCGGC...
TGCCACCCCGGC...
TGCCACCTCGGC...
• •
qij =
0πjκπjωπjωκπj
![Page 21: Molecular Evolution, Genomic Analysis and FreeBSD · The Central Dogma of Biology I DNA is made up of four nucleotides with four different bases: Adenine (A), Cytosine (C), Guanine](https://reader034.vdocuments.mx/reader034/viewer/2022050608/5faf549f807ef10dd7232c9b/html5/thumbnails/21.jpg)
Track 2: MircroBiome / Metagenomics
I Culturing diverse microbial communities in the lab can bedifficult
I Sequence DNA as it exists in the natural environment(metagenomics)
I Understand associations between changes in microbiomesand changes in complex systems
I BiomeNET BioMiCo: group microbial communitiesaccording to properties
El-Swais, Heba, et al. “Seasonal assemblages and short-lived blooms incoastal north-west Atlantic Ocean bacterioplankton.” Environmentalmicrobiology (2014).Shafiei, Mahdi, et al. “BioMiCo: a supervised Bayesian model for inference ofmicrobial community structure.” Microbiome 3.1 (2015): 1-15.Shafiei, Mahdi, et al. “BiomeNet: A Bayesian model for inference of metabolicdivergence among microbial communities.” (2014): e1003918.
![Page 22: Molecular Evolution, Genomic Analysis and FreeBSD · The Central Dogma of Biology I DNA is made up of four nucleotides with four different bases: Adenine (A), Cytosine (C), Guanine](https://reader034.vdocuments.mx/reader034/viewer/2022050608/5faf549f807ef10dd7232c9b/html5/thumbnails/22.jpg)
Research and Computing Technology are Coupled
●
●
● ●●● ●●● ●
● ●
●
●
●
●
●
●
●●
●
●
●
● ●
●
●
●
●
●
● ●●● ● ●
●●● ●
● ● ●
●
●
●
●
●
●
● ●●●
●
●
●
●●●
●
●
●●
●
● ●●●●● ●
●
●●
●●●●●●●
●●
●●
●●●●●●
●
●●
1970 1980 1990 2000 2010
02
46
810
year
log 1
0
●
●
●
CitationsMoore's LawSequenced DNA
●
●
Intel 4004Zilog Z80ARM 1PentiumAtomIBM z13
I • Nucleotides stored in the EMBL Nucleotide SequenceDatabase
I � Number of transistors in current Intel PC processors
I
![Page 23: Molecular Evolution, Genomic Analysis and FreeBSD · The Central Dogma of Biology I DNA is made up of four nucleotides with four different bases: Adenine (A), Cytosine (C), Guanine](https://reader034.vdocuments.mx/reader034/viewer/2022050608/5faf549f807ef10dd7232c9b/html5/thumbnails/23.jpg)
Software
I Originally Solaris, since 2009 FreeBSD release (7.x -FreeBSD 10.1), STABLE on storage / new compute node
I Poudriere, VirtualBox (MatLab)
I ZFS (master node and storage server), NFS
I math/R, lang/perl5.20
I print/texlive-full
I biology/paml / Proteus
I biology/ncbi-blast+, Diamond, Humann, Qiime, Metaphlan,Biomenet, BioMico, BMTagger, PEAR
![Page 24: Molecular Evolution, Genomic Analysis and FreeBSD · The Central Dogma of Biology I DNA is made up of four nucleotides with four different bases: Adenine (A), Cytosine (C), Guanine](https://reader034.vdocuments.mx/reader034/viewer/2022050608/5faf549f807ef10dd7232c9b/html5/thumbnails/24.jpg)
Software
I Originally Solaris, since 2009 FreeBSD release (7.x -FreeBSD 10.1), STABLE on storage / new compute node
I Poudriere, VirtualBox (MatLab)
I ZFS (master node and storage server), NFS
I math/R, lang/perl5.20
I print/texlive-full
I biology/paml / Proteus
I biology/ncbi-blast+, Diamond, Humann, Qiime, Metaphlan,Biomenet, BioMico, BMTagger, PEAR
![Page 25: Molecular Evolution, Genomic Analysis and FreeBSD · The Central Dogma of Biology I DNA is made up of four nucleotides with four different bases: Adenine (A), Cytosine (C), Guanine](https://reader034.vdocuments.mx/reader034/viewer/2022050608/5faf549f807ef10dd7232c9b/html5/thumbnails/25.jpg)
Software
I Originally Solaris, since 2009 FreeBSD release (7.x -FreeBSD 10.1), STABLE on storage / new compute node
I Poudriere, VirtualBox (MatLab)
I ZFS (master node and storage server), NFS
I math/R, lang/perl5.20
I print/texlive-full
I biology/paml / Proteus
I biology/ncbi-blast+, Diamond, Humann, Qiime, Metaphlan,Biomenet, BioMico, BMTagger, PEAR
![Page 26: Molecular Evolution, Genomic Analysis and FreeBSD · The Central Dogma of Biology I DNA is made up of four nucleotides with four different bases: Adenine (A), Cytosine (C), Guanine](https://reader034.vdocuments.mx/reader034/viewer/2022050608/5faf549f807ef10dd7232c9b/html5/thumbnails/26.jpg)
Software
I Originally Solaris, since 2009 FreeBSD release (7.x -FreeBSD 10.1), STABLE on storage / new compute node
I Poudriere, VirtualBox (MatLab)
I ZFS (master node and storage server), NFS
I math/R, lang/perl5.20
I print/texlive-full
I biology/paml / Proteus
I biology/ncbi-blast+, Diamond, Humann, Qiime, Metaphlan,Biomenet, BioMico, BMTagger, PEAR
![Page 27: Molecular Evolution, Genomic Analysis and FreeBSD · The Central Dogma of Biology I DNA is made up of four nucleotides with four different bases: Adenine (A), Cytosine (C), Guanine](https://reader034.vdocuments.mx/reader034/viewer/2022050608/5faf549f807ef10dd7232c9b/html5/thumbnails/27.jpg)
Software
I Originally Solaris, since 2009 FreeBSD release (7.x -FreeBSD 10.1), STABLE on storage / new compute node
I Poudriere, VirtualBox (MatLab)
I ZFS (master node and storage server), NFS
I math/R, lang/perl5.20
I print/texlive-full
I biology/paml / Proteus
I biology/ncbi-blast+, Diamond, Humann, Qiime, Metaphlan,Biomenet, BioMico, BMTagger, PEAR
![Page 28: Molecular Evolution, Genomic Analysis and FreeBSD · The Central Dogma of Biology I DNA is made up of four nucleotides with four different bases: Adenine (A), Cytosine (C), Guanine](https://reader034.vdocuments.mx/reader034/viewer/2022050608/5faf549f807ef10dd7232c9b/html5/thumbnails/28.jpg)
Software
I Originally Solaris, since 2009 FreeBSD release (7.x -FreeBSD 10.1), STABLE on storage / new compute node
I Poudriere, VirtualBox (MatLab)
I ZFS (master node and storage server), NFS
I math/R, lang/perl5.20
I print/texlive-full
I biology/paml / Proteus
I biology/ncbi-blast+, Diamond, Humann, Qiime, Metaphlan,Biomenet, BioMico, BMTagger, PEAR
![Page 29: Molecular Evolution, Genomic Analysis and FreeBSD · The Central Dogma of Biology I DNA is made up of four nucleotides with four different bases: Adenine (A), Cytosine (C), Guanine](https://reader034.vdocuments.mx/reader034/viewer/2022050608/5faf549f807ef10dd7232c9b/html5/thumbnails/29.jpg)
Software
I Originally Solaris, since 2009 FreeBSD release (7.x -FreeBSD 10.1), STABLE on storage / new compute node
I Poudriere, VirtualBox (MatLab)
I ZFS (master node and storage server), NFS
I math/R, lang/perl5.20
I print/texlive-full
I biology/paml / Proteus
I biology/ncbi-blast+, Diamond, Humann, Qiime, Metaphlan,Biomenet, BioMico, BMTagger, PEAR
![Page 30: Molecular Evolution, Genomic Analysis and FreeBSD · The Central Dogma of Biology I DNA is made up of four nucleotides with four different bases: Adenine (A), Cytosine (C), Guanine](https://reader034.vdocuments.mx/reader034/viewer/2022050608/5faf549f807ef10dd7232c9b/html5/thumbnails/30.jpg)
Softwaresecurity/tmux-cssh
![Page 31: Molecular Evolution, Genomic Analysis and FreeBSD · The Central Dogma of Biology I DNA is made up of four nucleotides with four different bases: Adenine (A), Cytosine (C), Guanine](https://reader034.vdocuments.mx/reader034/viewer/2022050608/5faf549f807ef10dd7232c9b/html5/thumbnails/31.jpg)
Softwaresysutils/ganglia-monitor-core sysutils/ganglia-webfrontend
![Page 32: Molecular Evolution, Genomic Analysis and FreeBSD · The Central Dogma of Biology I DNA is made up of four nucleotides with four different bases: Adenine (A), Cytosine (C), Guanine](https://reader034.vdocuments.mx/reader034/viewer/2022050608/5faf549f807ef10dd7232c9b/html5/thumbnails/32.jpg)
Softwaresysutils/ganglia-monitor-core sysutils/ganglia-webfrontend
![Page 33: Molecular Evolution, Genomic Analysis and FreeBSD · The Central Dogma of Biology I DNA is made up of four nucleotides with four different bases: Adenine (A), Cytosine (C), Guanine](https://reader034.vdocuments.mx/reader034/viewer/2022050608/5faf549f807ef10dd7232c9b/html5/thumbnails/33.jpg)
SoftwareBasic Unix Tools
![Page 34: Molecular Evolution, Genomic Analysis and FreeBSD · The Central Dogma of Biology I DNA is made up of four nucleotides with four different bases: Adenine (A), Cytosine (C), Guanine](https://reader034.vdocuments.mx/reader034/viewer/2022050608/5faf549f807ef10dd7232c9b/html5/thumbnails/34.jpg)
SoftwareBatch Submission, Resource Management with Sun Grid Engine
![Page 35: Molecular Evolution, Genomic Analysis and FreeBSD · The Central Dogma of Biology I DNA is made up of four nucleotides with four different bases: Adenine (A), Cytosine (C), Guanine](https://reader034.vdocuments.mx/reader034/viewer/2022050608/5faf549f807ef10dd7232c9b/html5/thumbnails/35.jpg)
SoftwareBatch Submission, Resource Management with Sun Grid Engine
Distribution of Port Makefile Size
Number of Lines
Fre
quen
cy
0 200 400 600 800 1000
020
0040
0060
0080
0010
000
1200
0
sysutils/sge62 (257)
![Page 36: Molecular Evolution, Genomic Analysis and FreeBSD · The Central Dogma of Biology I DNA is made up of four nucleotides with four different bases: Adenine (A), Cytosine (C), Guanine](https://reader034.vdocuments.mx/reader034/viewer/2022050608/5faf549f807ef10dd7232c9b/html5/thumbnails/36.jpg)
Softwaresysutils/slurm-hpc (slurm-wlm)
I Despite the name, Simple Linux Utility for ResourceManagement (Slurm): “Portability: Written in C with a GNUautoconf configuration engine. While initially written forLinux, Slurm has been ported to a diverse assortment ofsystems.”
I “Sequoia, an IBM BlueGene/Q system at LawrenceLivermore National Laboratory with 1.6 petabytes ofmemory, 96 racks, 98,304 compute nodes, and 1.6 millioncores, with a peak performance of over 17.17 Petaflops.”
![Page 37: Molecular Evolution, Genomic Analysis and FreeBSD · The Central Dogma of Biology I DNA is made up of four nucleotides with four different bases: Adenine (A), Cytosine (C), Guanine](https://reader034.vdocuments.mx/reader034/viewer/2022050608/5faf549f807ef10dd7232c9b/html5/thumbnails/37.jpg)
SoftwarePorting Software Written by Biologists
![Page 38: Molecular Evolution, Genomic Analysis and FreeBSD · The Central Dogma of Biology I DNA is made up of four nucleotides with four different bases: Adenine (A), Cytosine (C), Guanine](https://reader034.vdocuments.mx/reader034/viewer/2022050608/5faf549f807ef10dd7232c9b/html5/thumbnails/38.jpg)
Other Cluster Options
![Page 39: Molecular Evolution, Genomic Analysis and FreeBSD · The Central Dogma of Biology I DNA is made up of four nucleotides with four different bases: Adenine (A), Cytosine (C), Guanine](https://reader034.vdocuments.mx/reader034/viewer/2022050608/5faf549f807ef10dd7232c9b/html5/thumbnails/39.jpg)
Thank You
Questions?
Image credits:Arms race: www.inkcintc.com.au
HIV images: http://evolution.berkeley.edu/evolibrary/article/medicine_04