2016 10-27 timbers
Post on 07-Jan-2017
Combining phenome and genome to uncover the genetic basis for naturally occurring differences in development
and behavior
CSHL Biological Data Science Meeting, 2016
Tiffany Timbers, Ph.D.
Dept. of Statistics & Master’s of Data Science Program, University of British Columbia
@TiffanyTimbers #BIODATA16
Caenorhabditis elegans as a model in the lab
• Easy to work with and store in lab
• Easy genetics/genome
• Complete neuronal wiring diagram
• Exhibits many well characterized behaviours
Caenorhabditis elegans also is a wild animal!
individuals can shift during the L1 stage to an alternative developmental route and enter a pre-dauer stage (L2d), followed by the non-feeding diapause stage called dauer (an alternative L3 stage) (Figure 3). Dauer larvae are resistant to various stresses and can survive for several months without food. Upon their return to more favorable conditions, dauer larvae feed again and resume development.
Population demographic surveys at the local scale in orchards and woods indicate that C. elegans has a boom-and-bust lifestyle (Felix and Duveau, 2012). C. elegans metapopulations evolve in a fluctuating environment where optimal habitats are randomly distributed in space and time (Figure 4A). A cycle of colonization of a food source likely begins when one to several dauer larvae discover a fruit or stem, exit the dauer stage and seed a growing population of up to 10^4 feeding nematodes at different life-cycle stages (Figure 3). Some moderate-sized populations found in rotting fruits and stems do not contain any dauer larvae, but larger ones always include only adults, L1, L2d and dauers (Felix and Duveau, 2012). As a food source runs low, dauers may leave it to explore the neighboring environment for new islands of resources. Most of them will fail.
Developmental regulation and the behavior of dauer larvae are central to the C. elegans lifestyle. Dauer larvae display active locomotion and a specific behavior called nictation, where they stand on their tail and wave their body in the air. Remarkably, dauers may also congregate to form a column and nictate as a group (Felix and Duveau, 2012) (see the video; http://www.wormatlas.org/dauer/behavior/Images/DBehaviorVID4.mov). These behaviors are thought to help dauers to find passing invertebrate hosts that they can use for their dispersal, such as isopods, snails and slugs. Together, dauer physiology and behavior suggest that this developmental stage plays a key role in C. elegans’ stress resistance, long-distance dispersal, and possibly its overwintering capacity.
Over the year, in surveys performed in France and Germany, C. elegans populations in rotting fruits typically peak in the fall, with proliferation possible in spring through to early winter (Felix and Duveau, 2012; Petersen et al., 2014). This
Figure 2. The habitat of C. elegans at different scales. (A–D) Landscapes that correspond to the macroscale C. elegans habitat; all are relatively humid
areas where C. elegans has been found: (A) wet shrubland; (B) urban garden; (C) riverbank; and (D) fruit trees. (E–G) Bacteria-rich decomposing vegetal
substrates, corresponding to the microscale C. elegans habitat: (E) Arum stem; (F) oranges and (G) plums. (H) Detail of a rotting apple at the stage where
C. elegans proliferates. Springtails (white) and a mite are examples of animals that share the bacteria-rich habitat of C. elegans and that are potential
carriers and/or predators (see also Table 1). (I) C. elegans nematodes on an E. coli lawn, just coming out of a rotten fruit. (J) Scanning electron micrograph
of C. elegans infected with the fungus Drechmeria coniospora. Image credits: Marie-Anne Felix.
DOI: 10.7554/eLife.05849.003
Frezal and Felix. eLife 2015;4:e05849. DOI: 10.7554/eLife.05849 3 of 14
Feature article The natural history of model organisms | C. elegans outside the Petri dish
Frezal & Felix, eLife, 2015
[Figure: map with axes long (longitude) and lat (latitude)]
~ 40 genetically diverse wild-isolate C. elegans strains
C. Loucks
• Genomes sequenced to > 25X
• 630 541 unique single nucleotide variants (SNVs)
• 65 360 missense, 1015 nonsense and 545 splice mutations
• 14 602 genes have at least one non-synonymous or splice mutation
• CO2 may serve as an important predictor of food, mates, and/or predators.
C. elegans can sense, and generally avoids, CO2
[Figure: CO2 stimulus timeline. Time marks: 0 sec, 100 sec, 300 sec; CO2 delivery: 4 psi, 750 msec; response labels: rapid burst of turns, normal movement]
stimulus delivery
Swierczek & Giles et al. Nature Methods 2011
image extraction
post-experiment analysis
High-content behavioural and morphological screening using the Multi-Worm Tracker
CO2
The Multi-worm tracker records many features as a time series (~ 25 frames/sec):
• area
• speed
• angular speed
• length
• width
• kink
• direction bias
• path length
• curve
• direction consistency
• x-y coordinates
• orientation
• sideways rolling
Are we looking at all the important aspects of the phenotype?
How would we know if we are?
The behavioural phenome is very large. How do we focus without losing potentially very important information?
A machine learning approach: Iterative denoising trees (IDT)
to reduce dimensionality of time series behaviour data
- Discovery of potentially meaningful relationships and structures in large heterogeneous datasets
- Originally demonstrated for use in text mining to identify meaningful, implicit, and previously unknown information in an unstructured corpus (Giles et al., 2008)
- Used successfully to reduce dimensionality of behavioral time series from Drosophila larvae (Vogelstein et al., Science, 2013)
|worm1|strain1|val1 @ time1|...|val1 @ timei|...|valj @ time1|...|valj @ timei|
|...|
|wormn|strain1|val1 @ time1|...|val1 @ timei|...|valj @ time1|...|valj @ timei|
|...|
|worm1|strainm|val1 @ time1|...|val1 @ timei|...|valj @ time1|...|valj @ timei|
|...|
|wormn|strainm|val1 @ time1|...|val1 @ timei|...|valj @ time1|...|valj @ timei|
7800 X 9600
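The row layout above can be sketched in Python: each worm contributes one row, formed by concatenating the time series of every tracked feature. This is only a toy illustration. The helper name `flatten_worm`, the two features chosen, and the made-up values are assumptions for illustration, not the pipeline used in the talk.

```python
from typing import Dict, List


def flatten_worm(series: Dict[str, List[float]], features: List[str]) -> List[float]:
    """Concatenate the per-feature time series of one worm into a single row."""
    row: List[float] = []
    for name in features:
        row.extend(series[name])
    return row


# Hypothetical toy data: 2 features x 3 time points per worm.
features = ["speed", "kink"]
worms = [
    {"worm": "worm1", "strain": "strain1",
     "series": {"speed": [0.1, 0.4, 0.2], "kink": [10.0, 35.0, 12.0]}},
    {"worm": "worm2", "strain": "strain1",
     "series": {"speed": [0.1, 0.1, 0.1], "kink": [11.0, 12.0, 11.0]}},
]

# One row per worm; (number of features x number of time points) columns.
matrix = [flatten_worm(w["series"], features) for w in worms]
labels = [(w["worm"], w["strain"]) for w in worms]

print(len(matrix), "x", len(matrix[0]))
```

Keeping the worm and strain labels in a separate list leaves the matrix purely numeric, which is the form most dimensionality-reduction methods expect.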
[Figure: distinguishable CO2 behavior-types, numbered 1 to 17]
• CO2 responses from ~40 wild isolates may result in ~20 distinct sub-behaviours
• Groupings still need to be optimized & verified (work in progress)
Use the behavior-tree to derive phenotypic profiles for
each strain
strain    behav1  behav2  behav3  behav4  behav5  …  behavk  total
strain1   0.85    0.05    0.02    0.04    0.03    …  0.01    1.0
strain2   0.05    0.02    0.04    0.03    0.01    …  0.85    1.0
strain3   0.90    0.01    0.02    0.00    0.01    …  0.01    1.0
…         …       …       …       …       …       …  …       …
strainm   0.02    0.00    0.01    0.2     0.00    …  0.90    1.0
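One way such a profile table could be computed is sketched below. It assumes that each worm has already been assigned to one behaviour type and that a strain's profile is the fraction of its worms falling in each type, so every row sums to 1.0. The talk does not spell out the calculation, so both that interpretation and the function name `phenotypic_profiles` are assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def phenotypic_profiles(
    assignments: List[Tuple[str, int]], n_types: int
) -> Dict[str, List[float]]:
    """Return, per strain, the fraction of worms in behaviour types 1..n_types."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for strain, behaviour_type in assignments:
        counts[strain][behaviour_type] += 1

    profiles: Dict[str, List[float]] = {}
    for strain, type_counts in counts.items():
        total = sum(type_counts.values())
        profiles[strain] = [type_counts[k] / total for k in range(1, n_types + 1)]
    return profiles


# Hypothetical (strain, behaviour type) assignments for eight worms.
assignments = [
    ("strain1", 1), ("strain1", 1), ("strain1", 1), ("strain1", 2),
    ("strain2", 3), ("strain2", 3), ("strain2", 1), ("strain2", 3),
]
profiles = phenotypic_profiles(assignments, n_types=3)
for strain, profile in profiles.items():
    print(strain, profile)
```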
[Figure: SKAT model schematic. Labels: covariates (e.g. age, sex); variant sets (e.g. genes); linear model; null model; disease/phenotype]
Wu et al., Am. J. Hum. Genet., 2011
Sequence kernel association test (SKAT) (think GWAS for NGS data)
• Whole-genome sequencing can detect rare variants
• Group variants into genes/windows and test for association
• Variants can be assigned weights: f(Gi)
• Can easily obtain a p-value for each gene, which will need to be adjusted for multiple comparisons
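As an illustration of the multiple-comparisons step in the last bullet, the sketch below applies the Benjamini-Hochberg false discovery rate adjustment to a set of per-gene p-values. The talk does not say which correction was used, so the choice of Benjamini-Hochberg is an assumption, and the gene names and p-values are made up.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values stay monotone in the rank of the raw p-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-gene p-values from an association test.
gene_pvalues = {"geneA": 0.001, "geneB": 0.02, "geneC": 0.03, "geneD": 0.5}
adjusted = dict(zip(gene_pvalues, benjamini_hochberg(list(gene_pvalues.values()))))
for gene, value in adjusted.items():
    print(gene, round(value, 4))
```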
SKAT is a successful technique to identify causal genes for single phenotypes in C. elegans (Timbers et al., PLoS Genetics, 2016)
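To make the idea of a per-gene test concrete, here is a heavily simplified sketch of a SKAT-style score statistic with a weighted linear kernel, Q = sum_j w_j * S_j^2, where S_j = sum_i G_ij * (y_i - mean(y)). Several simplifications are assumptions of this sketch and not the published method: covariates are omitted (the null model is intercept-only), the weights are set to 1, and the p-value comes from permuting phenotypes, whereas SKAT derives its p-value analytically. The function names and data are hypothetical.

```python
import random
from typing import List, Sequence


def skat_q(y: Sequence[float], genotypes: Sequence[Sequence[int]],
           weights: Sequence[float]) -> float:
    """Weighted linear-kernel score statistic under an intercept-only null."""
    mean_y = sum(y) / len(y)
    residuals = [value - mean_y for value in y]
    q = 0.0
    for j, weight in enumerate(weights):
        # Score for variant j: genotype-weighted sum of null-model residuals.
        s_j = sum(row[j] * r for row, r in zip(genotypes, residuals))
        q += weight * s_j * s_j
    return q


def permutation_pvalue(y: Sequence[float], genotypes: Sequence[Sequence[int]],
                       weights: Sequence[float], n_perm: int = 999,
                       seed: int = 0) -> float:
    """Estimate a p-value by shuffling phenotypes across strains."""
    rng = random.Random(seed)
    observed = skat_q(y, genotypes, weights)
    shuffled: List[float] = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if skat_q(shuffled, genotypes, weights) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical data: 6 strains, one gene carrying 2 variants (0 = ref, 1 = alt).
y = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]
genotypes = [[0, 0], [0, 0], [0, 1], [1, 0], [1, 0], [1, 0]]
weights = [1.0, 1.0]

q = skat_q(y, genotypes, weights)
p = permutation_pvalue(y, genotypes, weights)
print(round(q, 4), p)
```

Repeating this for every gene gives one p-value per gene, which is the point at which a multiple-comparisons adjustment becomes necessary.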
Multivariate Rare-Variant Association Test to derive a list of candidate genes driving CO2 behavior-types
H0: β = 0
MURAT - a multi-variate SKAT method
Sun et al., Eur. J. Hum. Genet. 2016
Summary of a work in progress:
1. IDT to reduce dimensionality of time series behaviour data
2. Perform MURAT rare-variant analysis to identify candidate genes/regions
3. Confirm roles of candidates using standard genetic and cell biology methods
4. CRISPR swap of variants between wild-isolate strains
https://github.com/LerouxLab/Celegans_wild_isolate_behaviour