adams shmoocon hack the genome
Post on 05-Apr-2018
8/2/2019 Adams ShmooCon Hack the Genome
Washington, DC
February 2009
ShmooCon Presentation
Hack the Genome! The Age of Bimolecular Cryptology
-
Presentation Agenda
Background and Inspiration
Biology Lesson
DNA as Data
The Toolbox
An Example
Interlude on DNA Computing
How to get started
-
Background and Inspiration
-
The Moon Shot of our Generation
Every generation has its signature big
science projects
Ours are the collection and analysis of
the genomes of a whole range of
organisms
Being able to do so can provide a
platform for solving some of the key
problems of our age
photo: NASA
-
Hackers and the genome
The analysis of the genome
data is a deeply collaborative
activity
Never before has a big science project been so
open to widespread
participation
Photo by bre pettis: http://www.flickr.com/photos/bre/3230258762/
-
Reverse engineering a genome: hack the
protocols
Cell-Cell Signaling
Intracellular Pathways
Genome DNA Sequences
Network protocols
Device wiring
Design Drawings
-
Collaboration outside of the life sciences
has contributed a lot to theoretical biology
-
Why? Emerging Diseases.
Better antibiotics for new diseases require
new understanding
Compare the genomes of people with
the microbial genomes
Differences can be exploited with new or
existing pharmaceuticals
wikimedia
-
Why? Energy.
Genomes of microorganisms
More efficient production of ethanol,
methane or even butanol
Genomes of algae
Increase in oil production
Improve yield
Produce ethanol or butanol directly?
even engineer for light production?
photo by tochis: http://tochismochis.blogspot.com/
-
Why? Cancer.
Finding Better Anti-Cancer Drugs
Comparing the mutations in cancer to
healthy cells
Investigating the outcomes of
treatments
Correlating normally-occurring
differences in genes with risk
wikimedia
-
Why? Food.
Improve plant resistance to
drought, pathogens, heat
Improve yield
Adapt for different growing conditions
photo by Scott Kinmartin http://www.scottkinmartin.com/
-
Finally Sequenced DNA
http://www.cultivate-int.org/issue8/oriel/index.h
-
Why?
We are obtaining biological data at a steadily-increasing rate
an exponential curve steeply tilting to vertical
Converting that data to usable information is a process that is
proceeding, albeit not completely keeping up with its
acquisition
Leveraging all that information to create knowledge is an open
challenge, lagging far behind our rate of data collection
Unique opportunities abound in creating an environment that
enables true understanding of this rich sea of data
-
Biology Lesson
-
Biology in a nutshell
What follows is the fastest and most incomplete biology lesson
ever.
If you know all of this already, bear with me
If you don't, hold on
-
Biological Information:
From: http://www.cerezyme.com/patient/about/cz_pt_about-understanding-1.gif
Cells removed from a patient by a physician
Patient Visit
-
Your DNA and you...
Your DNA: Contains the entire plan (more
or less) for you
Compressed, about the same
size as a DVD (3.2e9 bases, in
1 ASCII byte per base)
Differs about 1 every thousand
bases from the next person
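The figures on this slide can be checked with a little arithmetic. The sketch below is only a back-of-the-envelope calculation using the slide's own numbers (3.2e9 bases, about 1 difference per 1000 bases); the 2-bit packing is an added assumption that four bases need only two bits each. For scale, a single-layer DVD holds about 4.7 GB.

```python
# Back-of-the-envelope genome storage arithmetic (figures taken from the slide).
GENOME_BASES = 3.2e9          # bases in the genome, per the slide
DIFFERENCE_RATE = 1 / 1000    # roughly 1 differing base per 1000, per the slide


def storage_bytes(bases: float, bits_per_base: int) -> float:
    """Bytes needed to store `bases` symbols at a fixed width per symbol."""
    return bases * bits_per_base / 8


ascii_gb = storage_bytes(GENOME_BASES, 8) / 1e9    # 1 ASCII byte per base
packed_gb = storage_bytes(GENOME_BASES, 2) / 1e9   # assumption: 2 bits for a/c/g/t
differing_bases = GENOME_BASES * DIFFERENCE_RATE   # expected differences, person to person

print(f"ASCII: {ascii_gb:.1f} GB, 2-bit packed: {packed_gb:.1f} GB")
print(f"Expected differing bases between two people: {differing_bases:.1e}")
```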
-
DNA contains coded information at many
layers...
http://www.swbic.org/education/comp-bio/images/1c.gif
http://en.wikipedia.org/wiki/Protein_structure
-
... leading to fully formed and operational
structures
http://en.wikipedia.org/wiki/Protein_structure
-
DNA as Data
-
How can we turn this biology into a
computable form?
http://bioinformatics.ubc.ca/about/what_is_bioinformatics/
-
Like music, DNA is a structured and layered
string of data...
1 aacatacctg ctttatgcac tcaagcagag aagaaatcca caagtactca ccagcctcct
61 ggtctgcaga gaagacagaa tcaatatgag cacagcagga aaagtaagca aaaaatatat
121 tactgttgat aatatattct cttcaatata acaaataaag gaatacagta ttaataatga
181 ataatttaaa atttaaaatt catatctaaa ttagatgatg aataatttaa aattcatatc
241 taaattagaa atgatacact ggaatatgat atgcaatata tgctataata tgtaatgtat
301 actgaactac agtggaaata agctattcct aaataccttc aaaaagaata tatagaatct
361 gtatctattg tctttatttc ctacattaaa taaatttgcc gtaaagtgat agtttattcc
421 aagctaatca tgactgattt gtaaacctaa agttagaaaa gtctttaatc agaagctatc
481 tttatattaa gaaagcataa tttaaatgtg ttattatttg tattcatatt ttttgtgaac
541 acaagaagtc tgacaaaact tttatgagag ggatttgaga agattttgaa atatactttt
601 aatcttacta taaaatatct tacaaaatac tcattgatct cacagcatta gaatcatcaa
661 ggttaagcaa gacatcacat tcaaattccg tttaaagggg gcccattatg acacaattca
721 ggcaatttcc acagaaatct tatggaacag tatctcccct atataaaagt caatatgatc
781 ttacagaaaa ataataatgc aatttgaatc acttattagc actcagaaca caaatatttg
841 ttttttcttc tataaattta tacttatttt tcaatgtgtt tacaggtgca cagaaatgca
901 tgtggtcatt caatataatc aattgatatt attaattgcc taatttaaaa aaatctgtgc
961 aactatttcc agccatttgt tgtgctagga gtgtatcaca caataaaaca cctcactatg
1021 ataattcagt ttaaaggttc tgaggcttac ctttatgctg tgcgacaaaa caggctcatg
1081 tcaataagac tggttggaaa tcacatgagt ggcccattgg tactgttctt acaccacttc
1141 actttacttt actttcattc attattgatt aatatttaca ttcctcatag aaaataatta
1201 gaaaaaagaa aatttaaatt taccatttac taaactcgac ttaaaagaaa taatgagttc
1261 atagagcaaa agtataaacc aatcattaat gaaaataata actgatgaaa tagataatcc
1321 tcccctcttg agtgcaacat caataactta gctttttgac agcatttcat ttatgtttac
1381 ccgtcctgca ttttattttc ctcaatccta aattgtgaca atactaatgt ctatttcata
1441 aggtagttgt gagaattcag taaattaata gtgaaaagca cttagaatag tacctggtaa
1501 ataaaaataa gtcaataaat attagccact gttattattg ttgctttata actttttgat
1561 atttactacc acggagtaca gaaaacgtga ggctacatta attttttcat tcgttttttt
1621 gtttggagat ggagtatctt tctattgccc aggttggagt gcaatggtga tctcggctca
1681 ctgcaacctc tgcctcccgg gttcaagtga ttctcctgcc tcagcctcac aagtagctga
1741 gtttacaggt acacgccacc atgcttggct aattttagca tttttagtag agacagggtt
1801 tcaccatgtt ggtcaggctg gtctccaact cttgacttca ggtgatccgc ccacctcggc
1861 ctcccaaagt gctgggatta aagggatggg caactgcacc cggcctgatc tttattctct
1921 ggacagccag ctttgagact tcaggaaaat tattcaatca ctgagtcagt tgatcctcaa
1981 ttatttcaga tgtagtaaga ccaataattc aatagtactg tcctggtagc atccgtttta
2041 ggtttaaaag taattcatat tgtttacagc agcacaattt gcaattgcaa aaatatggaa
2101 acttcctaaa tgcccatcaa ccaacgagtg gataaagaga atgtggtata tgtacaccat
2161 gagatactac tcagccataa aatggaacaa aataatggcc tttgcagcca cttggatgaa
2221 gctgaaggcc attattttaa gtgaagtaac tcaggaatgg aaaaccaaat accatatggt
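A listing like this mixes data (the bases) with layout (position numbers and spacing). As a minimal sketch, assuming the only letters present are a, c, g, t and n, the layout can be stripped to recover one continuous string. The function name `origin_to_sequence` is made up for this example.

```python
import re


def origin_to_sequence(listing: str) -> str:
    """Strip position numbers and whitespace from a numbered sequence listing."""
    # Keep only runs of nucleotide letters; digits and spaces are layout, not data.
    return "".join(re.findall(r"[acgtn]+", listing.lower()))


sample = """
1 aacatacctg ctttatgcac tcaagcagag aagaaatcca caagtactca ccagcctcct
61 ggtctgcaga gaagacagaa tcaatatgag cacagcagga aaagtaagca aaaaatatat
"""
seq = origin_to_sequence(sample)
print(len(seq), seq[:20])
```

Because it ignores digits wherever they occur, the same function also copes with listings whose line numbers have run into the preceding block of bases.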
-
Not too different
2800 opt 3
2800 A2 41 ldx #65
2802 8A .l txa
2803 20 EE FF jsr &FFEE
2806 E8 inx
2807 E0 5B cpx #91
2809 D0 F7 bne l
280B A9 0D lda #13
280D 20 EE FF jsr &FFEE
2810 A9 0A lda #10
2812 4C EE FF jmp &FFEE
ABCDEFGHIJKLMNOPQRSTUVWXYZ
24839539 atg #start (Met)
24839542 gcg Ala
24839545 aac Asn
24839548 gag Glu
24839548 aggt #splice_st
. . .
24843113 cagg #splice_end
24843117 gct Ala
. . .
24843113 ata Ile
24843113 taa #end
Alcohol Dehydrogenase
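The "disassembly" of a gene into amino acids is a table lookup, three bases at a time. Here is a minimal sketch using the standard genetic code. It ignores splicing entirely, so it only applies to an already-spliced coding sequence, and the function name `translate` is just a label chosen for this example.

```python
from itertools import product

BASES = "tcag"
# Standard genetic code; amino acids are listed in t, c, a, g order for each
# of the three codon positions ("*" marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding sequence codon by codon, stopping at the first stop codon."""
    dna = dna.lower()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# The first four codons listed on the slide: atg, gcg, aac, gag.
print(translate("atggcgaacgag"))  # prints MANE
```

The result, MANE, matches the start of the protein translation shown for ADH5 in the database record on the next slide.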
-
But the data looks more like this:
CDS 81..1205
/gene="ADH5"
/gene_synonym="FDH"
/gene_synonym="ADHX"
/gene_synonym="ADH-3"
/gene_synonym="GSNOR"
/EC_number="1.1.1.1"
/codon_start=1
/product="alcohol dehydrogenase 5, chi polypeptide"
/protein_id="NP_000662.3"
/db_xref="GI:71565154"
/db_xref="GeneID:128"
/db_xref="HGNC:253"
/db_xref="HPRD:00064"
/db_xref="MIM:103710"
/translation="MANEVIKCKAAVAWEAGKPLSIEEIEVAPPKAHEVRIKIIATAVCHTDAYTLSGADPEGCFPVILGHEGAGIVESVGEGVTKLKAGDTVIPLYIPQCGECKFCLNPKTNLCQKIRVTQGKGLMPDGTSRFTCKGKTILHYMGTSTFSEYTVVADISVAKI
DPLAPLDKVCLLGCGISTGYGAAVNTAKLEPGSVCAVFGLGGVGLAVIMGCKVAGASRIIGVDINKDKFARAKEFGATECINPQDFSKPIQEVLIEMTDGGVDYSFECIGNVKVMRAALEACHKGWGVSVVVGVAASGEEIATRPFQLVTGRTWKGTAFGGWKSVESVPKLVSEYMSKKIKVDEFVTHNLSFDEINKAFELMHSGKSIRTVVKI"
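A record like this is a location (81..1205) followed by /key="value" qualifiers, some of which repeat. The following is a rough sketch of pulling those apart with regular expressions. It is not a full parser for the format, it assumes a simple start..end location, and `parse_feature` is a hypothetical helper name.

```python
import re


def parse_feature(text: str) -> dict:
    """Extract a simple start..end location and /key=value qualifiers."""
    location = re.search(r"(\d+)\.\.(\d+)", text)
    start, end = int(location.group(1)), int(location.group(2))
    qualifiers = {}
    # Values are either quoted strings or bare integers (e.g. /codon_start=1).
    for key, quoted, bare in re.findall(r'/(\w+)=(?:"([^"]*)"|(\d+))', text):
        qualifiers.setdefault(key, []).append(quoted or bare)
    return {"start": start, "end": end, "length": end - start + 1,
            "qualifiers": qualifiers}


feature = parse_feature('''CDS 81..1205
/gene="ADH5"
/gene_synonym="FDH"
/gene_synonym="ADHX"
/EC_number="1.1.1.1"
/codon_start=1
/product="alcohol dehydrogenase 5, chi polypeptide"
/protein_id="NP_000662.3"''')

print(feature["length"], feature["length"] // 3)   # bases, codons
print(feature["qualifiers"]["gene_synonym"])
```

The location spans 1125 bases, which is 375 whole codons; that count includes the stop codon if the location covers it.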
-
And in XML:
71565153
NM_000671.3
9606
Homo sapiens
Homo sapiens alcohol dehydrogenase 5 (class III), chi polypeptide (ADH5), mRNA
2644
GCGCTCGCCACGCCCATGCCTCCGTCGCTGCGCGGCCCACCCCGGATGTCAGCCCCCC
GCGCCGACCAGACTGCAGTTGCTTGGGAGGCTGGAAAGCCTCTCTCCATAGAGGAGATAGAG
-
The Toolbox
-
What about access to tools and data?
You need a couple of things to get started:
Data like DNA sequences, Protein
Structures, and Biological outcomes are
needed for the analysis
Tools for managing, analyzing and
visualizing that data:
Everything from databases to statistics software.
Photo by mandolux: http://www.flickr.com/photos/mandolux/438145176/
http://www.ncbi.nlm.nih.gov/sites/entrez?db=nuccor
-
Some sources: Entrez
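Entrez can also be queried from a script rather than a browser. The sketch below builds a request URL for NCBI's E-utilities `efetch` service. The endpoint and parameters are not from the slides, so treat them as assumptions to check against NCBI's current documentation. The accession NM_000671.3 is the ADH5 mRNA record shown earlier.

```python
from urllib.parse import urlencode

# Assumed endpoint: NCBI E-utilities efetch (verify against current NCBI docs).
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession: str, db: str = "nuccore", rettype: str = "fasta") -> str:
    """Build an efetch URL for a single sequence record."""
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": "text"}
    return f"{EFETCH}?{urlencode(params)}"


url = efetch_url("NM_000671.3")
print(url)

# To actually download the record (needs network access):
# from urllib.request import urlopen
# fasta_text = urlopen(url).read().decode()
```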
http://genome.ucsc.edu/cgi-bin/hgTracks?org=huma
-
Some sources: UCSC Golden Path
http://www.rcsb.org/pdb/home/home.d
-
Some Sources: PDB
-
Some tools are extensions of conventional
programming languages:
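To give a flavor of what such language extensions offer, here are two everyday sequence operations written in plain Python. This is only an illustration with made-up function names; libraries such as Biopython provide tested versions of these and many more.

```python
# Translation table mapping each base to its complement, preserving case.
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(dna: str) -> str:
    """Return the sequence of the opposite strand, read in its own 5'-to-3' direction."""
    return dna.translate(COMPLEMENT)[::-1]


def gc_content(dna: str) -> float:
    """Fraction of bases that are g or c."""
    dna = dna.lower()
    return (dna.count("g") + dna.count("c")) / len(dna)


seq = "aacatacctg"  # first ten bases of the sequence listing shown earlier
print(reverse_complement(seq), gc_content(seq))
```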
-
Some tools are web-based:
-
And some are local applications:
-
Even tools for visualizing 3D protein
structures are available
http://www.umass.edu/molvis/workshop/imgs/polyview.htm
-
An Example
-
An example of what you can do: a man-in-the-middle
attack on DNA control?
http://en.wikipedia.org/wiki/File:Lac_operon.p
photo by: Ben Stanfield http://flickr.com/photos/acaben/2816139/
-
DashPat: look for common, recurring
words and their placement
-
possibly identifying words that are
statistically under- and over-represented
Can be associated with genetic
switches upstream of known
genes in a genome
Correlate the identified genes in
a known pathway associated
with metabolism
Next step would be to test the
hypothesis in a laboratory
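One simple way to look for over- and under-represented words is to compare how often each k-letter word occurs with how often it would be expected if bases were drawn independently at their observed frequencies. The sketch below does that. It is a generic illustration, not the algorithm of any particular tool, and the test sequence is invented.

```python
from collections import Counter


def word_scores(seq: str, k: int = 3) -> dict:
    """Observed/expected count ratio per k-letter word, assuming independent bases."""
    seq = seq.lower()
    n_words = len(seq) - k + 1
    base_freq = {base: count / len(seq) for base, count in Counter(seq).items()}
    observed = Counter(seq[i:i + k] for i in range(n_words))
    scores = {}
    for word, count in observed.items():
        expected = n_words
        for base in word:
            expected *= base_freq[base]   # probability of the word under independence
        scores[word] = count / expected
    return scores


# Invented sequence containing the word "tataat" three times.
scores = word_scores("tataatgcgctataatccgtataat", k=6)
print(round(scores["tataat"], 1))
```

A ratio well above 1 suggests over-representation and well below 1 under-representation. On short sequences, words built from rare bases also score high by chance, so a real analysis needs a significance test and much more data.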
-
Interlude on DNA Computing
-
Working the other way, too: Interlude on
DNA computing
-
Translating the problem space to biological
molecules and processes
-
shows that computing can be done with
DNA
-
Not limited to science alone: there are
interesting artistic opportunities, too!
Custom Python programs
Rhythm MIDI File
Melody MIDI File
Harmony MIDI File
Retrieve Sequence (ncicb ENTREZ)
Rosegarden Sequencer
Audacity Recorder
Python-MIDI
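The heart of a pipeline like this is a mapping from bases to notes. The sketch below shows one possible mapping, producing plain note events that a MIDI library could then write to a file. The pitch assignments and tick length are invented for illustration and are not the mapping used in the talk.

```python
# Hypothetical mapping from bases to MIDI note numbers (60 is middle C).
BASE_TO_PITCH = {"a": 60, "c": 62, "g": 64, "t": 67}


def sequence_to_notes(dna: str, ticks_per_note: int = 240) -> list:
    """Turn a DNA string into (start_tick, pitch, duration) note events."""
    notes = []
    for i, base in enumerate(dna.lower()):
        if base in BASE_TO_PITCH:   # any other symbol is skipped and leaves a rest
            notes.append((i * ticks_per_note, BASE_TO_PITCH[base], ticks_per_note))
    return notes


melody = sequence_to_notes("atggcg")
print(melody)
```

Separate rhythm, melody and harmony tracks could each come from a different mapping over the same sequence, for example reading one base, one codon, or a longer window at a time.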
-
How to get started
-
Sound interesting? How to get started
Find a buddy: a biology expert to help you understand the
underlying science and highlight interesting/important
problems
Take some classes: everything from basic biology to
advanced bioinformatics is available online, often from major
players in the field and often free
Communicate with your peers: there are all kinds of
opportunities to work with others in bioinformatics, from the
conventional (publish and present) to virtual activities like IRC,
forums, social networking, etc.
Stay current with the research: many journals are available
for reading and review, and some of the best are open and freely
available
-
A sampling of online classes:
MIT OpenCourseWare Examples
(See: http://ocw.mit.edu/OcwWeb/web/courses/courses/index.htm)
6.092 Bioinformatics and Proteomics
6.895 / 6.095J Computational Biology:
Genomes, Networks, Evolution
6.096 Algorithms for Computational
Biology
Stanford Bioinformatics Online
(See: http://motif.stanford.edu/courses.html/)
Biochem 218 Computational Molecular
Biology
Biochem 238 Computational Proteomics
Biochem 228 Computational Genomic
Biology
-
Some (virtual) places to meet and
communicate
-
Stay Current with journal articles
PubMed (and freely-available PubMed
Central articles)
Freely-available journals at Public
Library of Science and elsewhere
Your local University library
photo by sifter: http://www.flickr.com/photos/sifter/370775225/
-
Now
Get out there and hack the genome!