2014 10-15-nextbug edinburgh
TRANSCRIPT
![Page 1: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/1.jpg)
@yannick__ http://yannick.poulet.org
Social insect evolution: genomics opportunities & approaches
2014-10-15-NextBUG
![Page 2: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/2.jpg)
![Page 3: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/3.jpg)
© Alex Wild & others
![Page 4: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/4.jpg)
![Page 5: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/5.jpg)
© National Geographic
Atta leaf-cutter ants
![Page 6: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/6.jpg)
© National Geographic
Atta leaf-cutter ants
![Page 7: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/7.jpg)
© National Geographic
Atta leaf-cutter ants
![Page 8: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/8.jpg)
![Page 9: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/9.jpg)
Oecophylla Weaver ants
© ameisenforum.de
![Page 10: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/10.jpg)
© ameisenforum.de
Weaver ants
![Page 11: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/11.jpg)
© ameisenforum.de
Oecophylla Weaver ants
![Page 12: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/12.jpg)
© forestryimages.org, © wynnie@flickr
![Page 13: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/13.jpg)
Tofilski et al 2008
Forelius pusillus
![Page 14: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/14.jpg)
Tofilski et al 2008
Forelius pusillus hides the nest entrance at night
![Page 15: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/15.jpg)
Tofilski et al 2008
Forelius pusillus hides the nest entrance at night
![Page 16: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/16.jpg)
Tofilski et al 2008
Forelius pusillus hides the nest entrance at night
![Page 17: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/17.jpg)
Tofilski et al 2008
Forelius pusillus hides the nest entrance at night
![Page 18: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/18.jpg)
Before
Workers staying outside die: “preventive self-sacrifice”
Tofilski et al 2008
Forelius pusillus hides the nest entrance at night
![Page 19: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/19.jpg)
Dorylus driver ants: ants with no home
© BBC
![Page 20: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/20.jpg)
© Dirk Mezger
Ritualized fighting
© Carsten Brühl
Camponotus gigas (Pfeiffer & Linsenmair 2001)
![Page 21: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/21.jpg)
Army ant milling - “spiral of death”
![Page 22: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/22.jpg)
Animal biomass (Brazilian rainforest), from Fittkau & Klinge 1973
• Other insects: 49.6
• Amphibians: 2.8
• Reptiles: 3.7
• Birds: 5.3
• Mammals: 14.5
• Earthworms: 17.3
• Spiders: 4.7
• Soil fauna excluding earthworms, ants & termites: 148
• Ants & termites: 114
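To put the biomass figures in proportion, here is a minimal R sketch that computes each group's share of the total. The numbers are copied from the slide; the slide text gives no unit, so they are treated as relative amounts, and the variable names are only illustrative.

```r
# Biomass values as transcribed from the slide (Fittkau & Klinge 1973).
# The slide text gives no unit, so the values are treated as relative amounts.
biomass <- c(
  "Other insects" = 49.6,
  "Amphibians" = 2.8,
  "Reptiles" = 3.7,
  "Birds" = 5.3,
  "Mammals" = 14.5,
  "Earthworms" = 17.3,
  "Spiders" = 4.7,
  "Soil fauna excluding earthworms, ants & termites" = 148,
  "Ants & termites" = 114
)

# Share of each group in the total, as a percentage.
biomass_share <- round(100 * biomass / sum(biomass), 1)

# Compare ants & termites with all vertebrate groups on the slide combined.
vertebrates <- c("Amphibians", "Reptiles", "Birds", "Mammals")
ants_vs_vertebrates <- biomass[["Ants & termites"]] / sum(biomass[vertebrates])

print(sort(biomass_share, decreasing = TRUE))
print(round(ants_vs_vertebrates, 1))
```

With these values, ants and termites account for roughly a third of the listed animal biomass, and about four times the listed vertebrate groups combined.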
![Page 23: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/23.jpg)
![Page 24: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/24.jpg)
Well-studied:
• behavior
• morphology
• evolutionary context
• ecology
![Page 25: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/25.jpg)
This changes everything.
454, Illumina, SOLiD...
Any lab can sequence anything!
![Page 26: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/26.jpg)
Major research areas
Genes/mechanisms for evolution of social behavior?
![Page 27: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/27.jpg)
www.sciencemag.org SCIENCE VOL 331 25 FEBRUARY 2011 1067
REPORTS
Solenopsis invicta fire ants are a big problem! Very well studied!
Ascunce et al 2011
![Page 28: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/28.jpg)
Solenopsis invicta fire ant: two social forms
Single-queen form:
• 1 large queen
• Independent founding
• Highly territorial
• Many sizes of workers
Multiple-queen form:
• 2-100 smaller queens
• Dependent founding
• No inter-colony aggression
• All workers similar size
![Page 29: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/29.jpg)
Fire ants + population genetics: allozyme screen
Ken Ross, L. Keller
“Starch gel”
1 2 3
=> “Gp-9” locus associated with social form
![Page 30: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/30.jpg)
![Page 31: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/31.jpg)
Single queen form Multiple queen form
Ken Ross and colleagues Laurent Keller and colleagues
Social form completely associated with Gp-9 locus
![Page 32: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/32.jpg)
bbbbBB BB Bb bb
Ken Ross and colleagues Laurent Keller and colleagues
Single queen form Multiple queen form
Social form completely associated with Gp-9 locus
(>15%) (<5%)
![Page 33: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/33.jpg)
bbBB BB Bb
x
Gp-9 bb females rare
Ken Ross and colleagues
Laurent Keller and colleagues
Single queen form Multiple queen form
Social form completely associated with Gp-9 locus
(>15%) (<5%)
![Page 34: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/34.jpg)
BB BB Bb
Ken Ross and colleagues Laurent Keller and colleagues
Single queen form Multiple queen form
Social form completely associated with Gp-9 locus
(>15%) (<5%)
![Page 35: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/35.jpg)
BB BB Bb
x
Ken Ross and colleagues
Laurent Keller and colleagues
Single queen form Multiple queen form
Social form completely associated with Gp-9 locus
(>15%) (<5%)
![Page 36: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/36.jpg)
BB BB Bb
x x
Ken Ross and colleagues
Laurent Keller and colleagues
Social form completely associated with Gp-9 locus
Single queen form Multiple queen form
(>15%) (<5%)
![Page 37: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/37.jpg)
BB BB Bb
x x x
Ken Ross and colleagues
Laurent Keller and colleagues
Single queen form Multiple queen form
(>15%) (<5%)
Social form completely associated with Gp-9 locus
![Page 38: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/38.jpg)
Sex chromosomes
X Y
Gp-9 B
Gp-9 b
SB Sb
“Social chromosomes”
?
Wang et al Nature 2013
![Page 39: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/39.jpg)
Major research areas
Genes/mechanisms for differences (e.g., lifespan?)?
Genes/mechanisms for evolution of social behavior?
genome evolution social evolution
![Page 40: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/40.jpg)
![Page 41: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/41.jpg)
![Page 42: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/42.jpg)
This changes everything.
454, Illumina, SOLiD...
Any lab can sequence anything!
![Page 43: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/43.jpg)
Genomics is hard.
![Page 44: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/44.jpg)
Genomics is hard.
• Biology/life is complex.
• Field is young.
• Biologists lack computational training.
• Generally, analysis tools suck.
  • badly written
  • badly tested
  • hard to install
  • output quality… often questionable.
• Understanding/visualizing/massaging data is hard.
• Datasets continue to grow!
![Page 45: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/45.jpg)
Inspiration?
![Page 46: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/46.jpg)
![Page 47: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/47.jpg)
arXiv:1210.0530v3 [cs.MS] 29 Nov 2012
Best Practices for Scientific Computing
Greg Wilson ∗, D.A. Aruliah †, C. Titus Brown ‡, Neil P. Chue Hong §, Matt Davis ¶, Richard T. Guy ∥, Steven H.D. Haddock ∗∗, Katy Huff ††, Ian M. Mitchell ‡‡, Mark D. Plumbley §§, Ben Waugh ¶¶, Ethan P. White ∗∗∗, Paul Wilson †††
∗Software Carpentry, †University of Ontario Institute of Technology, ‡Michigan State University, §Software Sustainability Institute, ¶Space Telescope Science Institute, ∥University of Toronto, ∗∗Monterey Bay Aquarium Research Institute, ††University of Wisconsin, ‡‡University of British Columbia, §§Queen Mary University of London, ¶¶University College London, ∗∗∗Utah State University, and †††University of Wisconsin
Scientists spend an increasing amount of time building and using software. However, most scientists are never taught how to do this efficiently. As a result, many are unaware of tools and practices that would allow them to write more reliable and maintainable code with less effort. We describe a set of best practices for scientific software development that have solid foundations in research and experience, and that improve scientists’ productivity and the reliability of their software.
Software is as important to modern scientific research as telescopes and test tubes. From groups that work exclusively on computational problems, to traditional laboratory and field scientists, more and more of the daily operation of science revolves around computers. This includes the development of new algorithms, managing and analyzing the large amounts of data that are generated in single research projects, and combining disparate datasets to assess synthetic problems.
Scientists typically develop their own software for these purposes because doing so requires substantial domain-specific knowledge. As a result, recent studies have found that scientists typically spend 30% or more of their time developing software [19, 52]. However, 90% or more of them are primarily self-taught [19, 52], and therefore lack exposure to basic software development practices such as writing maintainable code, using version control and issue trackers, code reviews, unit testing, and task automation.
We believe that software is just another kind of experimental apparatus [63] and should be built, checked, and used as carefully as any physical apparatus. However, while most scientists are careful to validate their laboratory and field equipment, most do not know how reliable their software is [21, 20]. This can lead to serious errors impacting the central conclusions of published research [43]: recent high-profile retractions, technical comments, and corrections because of errors in computational methods include papers in Science [6], PNAS [39], the Journal of Molecular Biology [5], Ecology Letters [37, 8], the Journal of Mammalogy [33], and Hypertension [26].
In addition, because software is often used for more than a single project, and is often reused by other scientists, computing errors can have disproportional impacts on the scientific process. This type of cascading impact caused several prominent retractions when an error from another group’s code was not discovered until after publication [43]. As with bench experiments, not everything must be done to the most exacting standards; however, scientists need to be aware of best practices both to improve their own approaches and for reviewing computational work by others.
This paper describes a set of practices that are easy to adopt and have proven effective in many research settings. Our recommendations are based on several decades of collective experience both building scientific software and teaching computing to scientists [1, 65], reports from many other groups [22, 29, 30, 35, 41, 50, 51], guidelines for commercial and open source software development [61, 14], and on empirical studies of scientific computing [4, 31, 59, 57] and software development in general (summarized in [48]). None of these practices will guarantee efficient, error-free software development, but used in concert they will reduce the number of errors in scientific software, make it easier to reuse, and save the authors of the software time and effort that can be used for focusing on the underlying scientific questions.
1. Write programs for people, not computers.
Scientists writing software need to write code that both executes correctly and can be easily read and understood by other programmers (especially the author’s future self). If software cannot be easily read and understood it is much more difficult to know that it is actually doing what it is intended to do. To be productive, software developers must therefore take several aspects of human cognition into account: in particular, that human working memory is limited, human pattern matching abilities are finely tuned, and human attention span is short [2, 23, 38, 3, 55].
First, a program should not require its readers to hold more than a handful of facts in memory at once (1.1). Human working memory can hold only a handful of items at a time, where each item is either a single fact or a “chunk” aggregating several facts [2, 23], so programs should limit the total number of items to be remembered to accomplish a task. The primary way to accomplish this is to break programs up into easily understood functions, each of which conducts a single, easily understood, task. This serves to make each piece of the program easier to understand in the same way that breaking up a scientific paper using sections and paragraphs makes it easier to read. For example, a function to calculate the area of a rectangle can be written to take four separate coordinates:
    def rect_area(x1, y1, x2, y2):
        ...calculation...
or to take two points:
    def rect_area(point1, point2):
        ...calculation...
The latter function is significantly easier for people to read and remember, while the former is likely to lead to errors, not
1. Write programs for people, not computers.
2. Automate repetitive tasks.
3. Use the computer to record history.
4. Make incremental changes.
5. Use version control.
6. Don’t repeat yourself (or others).
7. Plan for mistakes.
8. Optimize software only after it works correctly.
9. Document the design and purpose of code rather than its mechanics.
10. Conduct code reviews.
![Page 48: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/48.jpg)
![Page 49: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/49.jpg)
![Page 50: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/50.jpg)
Inspiration?
• Technologies
• Planning for mistakes
• Automated testing
• Continuous
• Writing for people: use style guide
![Page 51: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/51.jpg)
Code for people: Use a style guide
• For R: http://r-pkgs.had.co.nz/style.html
![Page 52: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/52.jpg)
R style guide extract
![Page 53: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/53.jpg)
Coding for people: Indent your code!
Programming better
• variable naming
• coding width: 100 characters
• indenting
• Follow conventions, e.g. “Google R Style”
• Versioning: DropBox & http://github.com/
• Automated testing
• “being able to use, understand and improve your code in 6 months & in 60 years” (approximate quote, Damian Conway)
preprocess_snps <- function(snp_table, testing = FALSE) {
  if (testing) {
    # run a bunch of tests of extreme situations.
    # quit if a test gives a weird result.
  }
  # real part of function.
}
Friday, 22 June 12
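The `preprocess_snps` skeleton above can be filled in along the following lines. This is only a sketch: the column names (`snp_id`, `genotype`), the filtering step, and the two extreme inputs are assumptions made for illustration and are not from the talk.

```r
# Hypothetical example: the column names and the filtering step are assumptions
# made for illustration; they are not from the talk.
preprocess_snps <- function(snp_table, testing = FALSE) {
  if (testing) {
    # Run the function on small extreme inputs; stop if a result looks weird.
    empty <- data.frame(snp_id = character(0), genotype = character(0))
    stopifnot(nrow(preprocess_snps(empty)) == 0)

    all_missing <- data.frame(snp_id = c("s1", "s2"), genotype = c(NA, NA))
    stopifnot(nrow(preprocess_snps(all_missing)) == 0)
  }
  # Real part of the function: drop rows without a genotype call.
  snp_table[!is.na(snp_table$genotype), , drop = FALSE]
}

snps <- data.frame(
  snp_id = c("s1", "s2", "s3"),
  genotype = c("BB", NA, "Bb")
)
cleaned <- preprocess_snps(snps, testing = TRUE)
print(cleaned)
```

Calling the function with `testing = TRUE` runs the built-in checks first and stops with an error if one fails; the default `testing = FALSE` skips them, so the internal test calls do not recurse.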
![Page 54: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/54.jpg)
Line length Strive to limit your code to 80 characters per line. This fits comfortably on a printed page with a reasonably sized font. If you find yourself running out of room, this is a good indication that you should encapsulate some of the work in a separate function.
R style guide extract
ant_measurements <- read.table(file = '~/Downloads/Web/ant_measurements.txt', header=TRUE, sep='\t', col.names = c('colony', 'individual', 'headwidth', 'mass'))
ant_measurements <- read.table(
  file = '~/Downloads/Web/ant_measurements.txt',
  header = TRUE,
  sep = '\t',
  col.names = c('colony', 'individual', 'headwidth', 'mass')
)
![Page 55: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/55.jpg)
Code for people: Use a style guide
• For R: http://r-pkgs.had.co.nz/style.html
• For Ruby: https://github.com/bbatsov/ruby-style-guide
Automatically check your code:
install.packages("lint")  # once
library(lint)  # every time
lint("file_to_check.R")
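To make concrete what an automated style check does, here is a minimal sketch of a single rule: the 80-character line limit from the style guide extract. It is written in JavaScript, the language of the Bionode examples later in the deck. The function name `findLongLines` is hypothetical, and this is not how the R `lint` package works internally.

```javascript
// Hypothetical helper: report every line that is wider than maxWidth.
// Illustration only; real linters check many more rules than line length.
function findLongLines(sourceText, maxWidth = 80) {
  return sourceText
    .split('\n')
    .map((text, index) => ({ line: index + 1, width: text.length }))
    .filter((entry) => entry.width > maxWidth);
}

// Three lines of made-up source; only the second one is too wide.
const example = [
  'short <- 1',
  'x'.repeat(95),
  'also_short <- 2',
].join('\n');

console.log(findLongLines(example)); // prints [ { line: 2, width: 95 } ]
```

Running a check like this on every commit turns the style guide from advice into something that is enforced.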
![Page 56: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/56.jpg)
![Page 57: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/57.jpg)
Four tools
![Page 58: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/58.jpg)
Four tools that suck less.
![Page 59: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/59.jpg)
Four tools that suck less. (hopefully)
![Page 60: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/60.jpg)
1. SequenceServer
![Page 61: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/61.jpg)
“Can you BLAST this for me?”
![Page 62: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/62.jpg)
• Once I wanted to set up a BLAST server.
Aim: An open source idiot-proof web interface for custom BLAST
Anurag Priyam, Mechanical engineering student, IIT Kharagpur
Sure, I can help you…
![Page 63: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/63.jpg)
“Can you BLAST this for me?”
Antgenomes.org SequenceServer BLAST made easy
(well, we’re trying...)
![Page 64: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/64.jpg)
http://www.sequenceserver.com/
(requires a BLAST+ install)
1. Installing
gem install sequenceserver
2. Configure.
# ~/.sequenceserver.conf
bin: ~/ncbi-blast-2.2.25+/bin/
database: /Users/me/blast_databases/
3. Launch.
sequenceserver
### Launched SequenceServer at: http://0.0.0.0:4567
Do you have BLAST-formatted databases? If not:
sequenceserver format-databases /path/to/fastas
![Page 66: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/66.jpg)
“Can you BLAST this for me?”
Antgenomes.org SequenceServer BLAST made easy
(well, we’re trying...)
Web server: Anurag Priyam & Git community - http://sequenceserver.com
BLAST on a 48-core, 512 GB fat machine via ssh
![Page 67: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/67.jpg)
2. Bionode
![Page 68: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/68.jpg)
Module counts
Node = “NPM”
![Page 69: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/69.jpg)
![Page 70: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/70.jpg)
Reusable, small and tested modules
![Page 71: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/71.jpg)
Examples
bionode.io (online shell)
BASH
bionode-ncbi urls assembly Solenopsis invicta | grep genomic.fna
http://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000188075.1_Si_gnG/GCA_000188075.1_Si_gnG_genomic.fna.gz
bionode-ncbi download sra arthropoda | bionode-sra
bionode-ncbi download gff bacteria
# Get descriptions for papers related to SRA search
bionode ncbi search sra Solenopsis invicta |
  tool-stream extractProperty uid |
  bionode ncbi link sra pubmed |
  tool-stream extractProperty destUID |
  bionode ncbi search pubmed
JavaScript
var ncbi = require('bionode-ncbi')
ncbi.urls('assembly', 'Solenopsis invicta', gotData)
function gotData(urls) {
  var genome = urls[0].genomic.fna
  // download() is not defined on the slide
  download(genome)
}
![Page 72: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/72.jpg)
Difficulty writing scalable, reproducible and complex bioinformatic pipelines.
Solution: Node.js everywhere
Streams
var ncbi = require('bionode-ncbi')
var tool = require('tool-stream')
var through = require('through2')
var fork1 = through.obj()
var fork2 = through.obj()

// dat is not defined on the slide
ncbi
  .search('sra', 'Solenopsis invicta')
  .pipe(fork1)
  .pipe(dat.reads)

fork1
  .pipe(tool.extractProperty('expxml.Biosample.id'))
  .pipe(ncbi.search('biosample'))
  .pipe(dat.samples)

fork1
  .pipe(tool.extractProperty('uid'))
  .pipe(ncbi.link('sra', 'pubmed'))
![Page 73: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/73.jpg)
![Page 74: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/74.jpg)
![Page 75: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/75.jpg)
Working with Gene predictions
![Page 76: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/76.jpg)
Gene prediction
Dozens of software algorithms: dozens of predictions
20% failure rate:
• missing pieces
• extra pieces
• incorrect merging
• incorrect splitting
Visual inspection... and manual fixing required.
1 gene = 5 minutes to 3 days
Yandell & Ence 2013 NRG
GTCTACAATGCGATTGTAAAATAGCACGAgAGGTGCATATGATGAACGACTATGTTCCACAACCACAGCTCATATATAACATGATTTtGTTTGCCGAATTCATACACGCATTACAACACACATTGAATTCAATAATAATATCAAATTCACATTCAAAGCTTTCAAGTTAGACAAAAGTTTTAATGCCGTTTTtACCTGTTTTtGAAAAGGTAATTTTCTTTAGATATATTATGTTGAATaTTAGGGTTTTTATAAAGAATGTGTATATTGUTTACAATATAAAAGACACAATTGCAAACTAGCATGATTGTAAACAATTGCTAAACGGATCAATATAAATTAAAATTGTAATATTAAGTATCAAACCGATAATTTTTATTTATTGTTCATTGTTTGTTCTTTATTTTGTTATTTGTAAATAATGAAA
Evidence
Evidence
Consensus:
![Page 77: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/77.jpg)
![Page 78: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/78.jpg)
3. GeneValidator
![Page 79: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/79.jpg)
Monica Dragan https://github.com/monicadragan/GeneValidator
Ismail Moghul https://github.com/IsmailM/GeneValidatorApp
![Page 80: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/80.jpg)
Monica Dragan https://github.com/monicadragan/GeneValidator
Ismail Moghul https://github.com/IsmailM/GeneValidatorApp
![Page 81: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/81.jpg)
GeneValidator
Run on:
★ whole geneset: identify most problematic predictions
★ alternative models for a gene (choose best)
★ individual genes (while manually curating)
![Page 82: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/82.jpg)
Warning: Work in Progress
gem install GeneValidator
gem install GeneValidatorApp
http://afra.sbcs.qmul.ac.uk/genevalidator
![Page 83: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/83.jpg)
4. Afra: Crowdsourcing gene model curation
![Page 84: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/84.jpg)
Gene prediction
Dozens of software algorithms: dozens of predictions
20% failure rate:
• missing pieces
• extra pieces
• incorrect merging
• incorrect splitting
Visual inspection... and manual fixing required.
1 gene = 20 minutes to 3 days
15,000 genes * 20 species = impossible.
Yandell & Ence 2013 NRG
GTCTACAATGCGATTGTAAAATAGCACGAgAGGTGCATATGATGAACGACTATGTTCCACAACCACAGCTCATATATAACATGATTTtGTTTGCCGAATTCATACACGCATTACAACACACATTGAATTCAATAATAATATCAAATTCACATTCAAAGCTTTCAAGTTAGACAAAAGTTTTAATGCCGTTTTtACCTGTTTTtGAAAAGGTAATTTTCTTTAGATATATACAGTTTGTAATaTTAGGTATTTTATAAACAGTGTGTATATTTCTTACAATATAAAAGACACAATTGCAAACTAGCATGATTGTAAACAATTGCTAAACGGATCAATATAAATTAAAATTGTAATATTAAGTATCAAACCGATAATTTTTATTTATTGTTCATTGTTTGTTCTTTATTTTGTTATTTGTAAATAATGAAA
Evidence
Evidence
Consensus:
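The "impossible" on this slide can be checked with a back-of-the-envelope calculation. The sketch below uses only the slide's figures (15,000 genes, 20 species, 20 minutes to 3 days per gene). The idea of a single curator working around the clock is an assumption added here to turn minutes into years.

```javascript
// Figures from the slide.
const genesPerSpecies = 15000;
const species = 20;
const totalGenes = genesPerSpecies * species; // 300,000 gene models

// Assumption (not from the talk): one curator working non-stop, 24 hours a day.
const minutesPerYear = 60 * 24 * 365;

// Years of non-stop work needed at a given curation time per gene.
function curatorYears(minutesPerGene) {
  return (totalGenes * minutesPerGene) / minutesPerYear;
}

const bestCase = curatorYears(20);           // 20 minutes per gene
const worstCase = curatorYears(3 * 24 * 60); // 3 days per gene

console.log(totalGenes);           // prints 300000
console.log(bestCase.toFixed(1));  // prints 11.4
console.log(worstCase.toFixed(0)); // prints 2466
```

Even in the best case this is more than a decade of uninterrupted work for one person, which is the motivation for spreading the curation across many contributors.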
![Page 85: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/85.jpg)
![Page 86: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/86.jpg)
Algorithm discovery by protein folding game players
Firas Khatib (a), Seth Cooper (b), Michael D. Tyka (a), Kefan Xu (b), Ilya Makedon (b), Zoran Popović (b), David Baker (a,c,1), and Foldit Players
(a) Department of Biochemistry; (b) Department of Computer Science and Engineering; and (c) Howard Hughes Medical Institute, University of Washington, Box 357370, Seattle, WA 98195
Contributed by David Baker, October 5, 2011 (sent for review June 29, 2011)
Foldit is a multiplayer online game in which players collaborate and compete to create accurate protein structure models. For specific hard problems, Foldit player solutions can in some cases outperform state-of-the-art computational methods. However, very little is known about how collaborative gameplay produces these results and whether Foldit player strategies can be formalized and structured so that they can be used by computers. To determine whether high performing player strategies could be collectively codified, we augmented the Foldit gameplay mechanics with tools for players to encode their folding strategies as “recipes” and to share their recipes with other players, who are able to further modify and redistribute them. Here we describe the rapid social evolution of player-developed folding algorithms that took place in the year following the introduction of these tools. Players developed over 5,400 different recipes, both by creating new algorithms and by modifying and recombining successful recipes developed by other players. The most successful recipes rapidly spread through the Foldit player population, and two of the recipes became particularly dominant. Examination of the algorithms encoded in these two recipes revealed a striking similarity to an unpublished algorithm developed by scientists over the same period. Benchmark calculations show that the new algorithm independently discovered by scientists and by Foldit players outperforms previously published methods. Thus, online scientific game frameworks have the potential not only to solve hard scientific problems, but also to discover and formalize effective new strategies and algorithms.
citizen science ∣ crowd-sourcing ∣ optimization ∣ structure prediction ∣ strategy
Citizen science is an approach to leveraging natural human abilities for scientific purposes. Most such efforts involve visual tasks such as tagging images or locating image features (1–3). In contrast, Foldit is a multiplayer online scientific discovery game, in which players become highly skilled at creating accurate protein structure models through extended game play (4, 5). Foldit recruits online gamers to optimize the computed Rosetta energy using human spatial problem-solving skills. Players manipulate protein structures with a palette of interactive tools and manipulations. Through their interactive exploration Foldit players also utilize user-friendly versions of algorithms from the Rosetta structure prediction methodology (6) such as wiggle (gradient-based energy minimization) and shake (combinatorial side chain rotamer packing). The potential of gamers to solve more complex scientific problems was recently highlighted by the solution of a long-standing protein structure determination problem by Foldit players (7).
One of the key strengths of game-based human problem exploration is the human ability to search over the space of possible strategies and adapt those strategies to the type of problem and stage of problem solving (5). The variability of tactics and strategies stems from the individuality of each player as well as multiple methods of sharing and evolution within the game (group play, game chat), and outside of the game [wiki pages (8)]. One way to arrive at algorithmic methods underlying successful human Foldit play would be to apply machine learning techniques to the detailed logs of expert Foldit players (9). We chose instead to rely on a superior learning machine: Foldit players themselves. As the players themselves understand their strategies better than anyone, we decided to allow them to codify their algorithms directly, rather than attempting to automatically learn approximations. We augmented standard Foldit play with the ability to create, edit, share, and rate gameplay macros, referred to as “recipes” within the Foldit game (10). In the game each player has their own “cookbook” of such recipes, from which they can invoke a variety of interactive automated strategies. Players can share recipes they write with the rest of the Foldit community or they can choose to keep their creations to themselves.
In this paper we describe the quite unexpected evolution of recipes in the year after they were released, and the striking convergence of this very short evolution on an algorithm very similar to an unpublished algorithm recently developed independently by scientific experts that improves over previous methods.
Results
In the social development environment provided by Foldit, players evolved a wide variety of recipes to codify their diverse strategies to problem solving. During the three and a half month study period (see Materials and Methods), 721 Foldit players ran 5,488 unique recipes 158,682 times and 568 players wrote 5,202 recipes. We studied these algorithms and found that they fell into four main categories: (i) perturb and minimize, (ii) aggressive rebuilding, (iii) local optimize, and (iv) set constraints. The first category goes beyond the deterministic minimize function provided to Foldit players, which has the disadvantage of readily being trapped in local minima, by adding in perturbations to lead the minimizer in different directions (11). The second category uses the rebuild tool, which performs fragment insertion with loop closure, to search different areas of conformation space; these recipes are often run for long periods of time as they are designed to rebuild entire regions of a protein rather than just refining them (Fig. S1). The third category of recipes performs local minimizations along the protein backbone in order to improve the Rosetta energy for every segment of a protein. The final category of recipes assigns constraints between beta strands or pairs of residues (rubber bands), or changes the secondary structure assignment to guide subsequent optimization.
Different algorithms were used with very different frequencies during the experiment. Some are designated by the authors as public and are available for use by all Foldit players, whereas others are private and available only to their creator or their Foldit team. The distribution of recipe usage among different players is shown in Fig. 1 for the 26 recipes that were run over 1,000 times. Some recipes, such as the one represented by the leftmost bar, were used many times by many different players, while others, such as the one represented by the pink bar in the
Author contributions: F.K., S.C., Z.P., and D.B. designed research; F.K., S.C., M.D.T., and F.P. performed research; F.K., S.C., M.D.T., K.X., and I.M. analyzed data; and F.K., S.C., Z.P., and D.B. wrote the paper.
The authors declare no conflict of interest.
Freely available online through the PNAS open access option.
1 To whom correspondence should be addressed. E-mail: [email protected].
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1115898108/-/DCSupplemental.
www.pnas.org/cgi/doi/10.1073/pnas.1115898108 PNAS ∣ November 22, 2011 ∣ vol. 108 ∣ no. 47 ∣ 18949–18953
BIOPHYSICS AND COMPUTATIONAL BIOLOGY
PSYCHOLOGICAL AND COGNITIVE SCIENCES
http://Fold.it
![Page 87: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/87.jpg)
Crowd-sourcing the visual inspection + correction of gene models.
Challenges
• Recruiting & retaining contributors
![Page 88: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/88.jpg)
Recruiting & retaining contributors
Plan A: get students.
• Increase accessibility:
  • Make tasks small & simple
  • Need excellent tutorials & training
  • Need an intelligent “mothering” user interface.
• Provide rewards:
  • Better grades
  • Learning experience
  • Good karma (helping science)
  • Prestige & pride (on facebook; points & badges “leaderboard”, with certificates, in publications)
  • Opportunities to develop expertise & responsibilities
![Page 89: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/89.jpg)
Crowd-sourcing the visual inspection + correction of gene models.
Challenges
• Recruiting & retaining contributors
• Ensuring quality
![Page 90: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/90.jpg)
Ensuring quality
• Excellent tutorials/training
• Make tasks small & simple
• Redundancy
• Review of conflicts by senior users.
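The redundancy and conflict-review points can be sketched in code: several contributors curate the same gene, an automatic check compares their submissions, and only disagreements go to a senior user. The sketch below is an illustration under stated assumptions, not Afra's actual implementation. The function names, the `exons` representation and the "all submissions identical" rule are all hypothetical.

```javascript
// Hypothetical: reduce a submitted gene model to a canonical string,
// here its exon coordinates, ignoring the order in which exons were listed.
function normalise(model) {
  return model.exons
    .map(([start, end]) => `${start}-${end}`)
    .sort()
    .join(',');
}

// Hypothetical auto-check: redundant submissions for one task either
// all agree (consistent) or need review by a senior user (inconsistent).
function autoCheck(submissions) {
  const distinctModels = new Set(submissions.map(normalise));
  if (distinctModels.size === 1) {
    return { status: 'consistent', action: 'create next required task' };
  }
  return { status: 'inconsistent', action: 'create review task' };
}

// Made-up submissions from three contributors for the same gene.
const agreeing = [
  { curator: 'a', exons: [[100, 250], [400, 520]] },
  { curator: 'b', exons: [[400, 520], [100, 250]] },
  { curator: 'c', exons: [[100, 250], [400, 520]] },
];
const conflicting = [
  { curator: 'a', exons: [[100, 250], [400, 520]] },
  { curator: 'b', exons: [[100, 250], [410, 520]] },
  { curator: 'c', exons: [[100, 250], [400, 520]] },
];

console.log(autoCheck(agreeing).status);    // prints consistent
console.log(autoCheck(conflicting).action); // prints create review task
```

A stricter or more lenient rule (for example a majority vote) would change how much work reaches senior reviewers; exact agreement is only the simplest choice to show.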
Workflow diagram: Begin; Needs curation; Create initial tasks; Being curated → Curate → Submit (three times, in parallel); Auto-check; Inconsistent: create “review” task; Consistent: create next required task; Done
![Page 91: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/91.jpg)
Crowd-sourcing the visual inspection + correction.
Challenges
http://afra.sbcs.qmul.ac.uk
Anurag Priyam http://github.com/yeban/afra
• Recruiting & retaining contributors
• Ensuring quality
![Page 92: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/92.jpg)
Warning: Work in Progress
![Page 93: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/93.jpg)
![Page 94: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/94.jpg)
![Page 95: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/95.jpg)
![Page 96: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/96.jpg)
Timelines
• Rolled out to:
  • 8 MSc students
  • 20 3rd year students
• Need to improve tutorials/guidance/documentation
• Roll out to 200 first years (few months)
• Expand
![Page 97: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/97.jpg)
Summary
• Ants are cool
• Exciting times & big challenges
• Inspiration from people working with computers more/longer
• SequenceServer - set up custom BLAST servers
• Bionode - modular streams for bioinformatics
• GeneValidator - identifying problems with gene predictions
• Afra - infrastructure to crowdsource gene curation to the masses
![Page 98: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/98.jpg)
Recruiting Genomehacker/Bioinformatics support
![Page 99: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/99.jpg)
GitHub
![Page 100: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/100.jpg)
Thanks!
[email protected]
@yannick__
http://yannick.poulet.org
Colleagues & Collaborators @ QMUL & UNIL
Anurag Priyam @yeban
Monica Dragan
Ismail Moghul
Vivek Rai
Bruno Vieira @bmpvieira
![Page 101: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/101.jpg)
![Page 102: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/102.jpg)
Maybe
![Page 103: 2014 10-15-Nextbug edinburgh](https://reader037.vdocuments.mx/reader037/viewer/2022110312/55b9a1bcbb61eb27408b4724/html5/thumbnails/103.jpg)
Generally
genome evolution
social evolution
Single- vs. multiple-queenness
in fire ants
in similar independent species
• one or many loci?
• one or many genes?
• convergence?
Social parasitism
Strengths of selection in social evolution
concepts & mechanisms
Medically relevant questions
Candidate gene studies
Vitellogenin
Sex determination genes
functional testing....
Tools for genomics work on emerging model organisms
Molecular response to social upheaval