brendel group presentation: 17 oct 2013
Differential expression in the paper wasp Polistes dominula
Daniel S. Standage, Brendel Group Meeting, 17 Oct 2013
Experimental design
6 queen samples
6 worker samples
Queen x and worker x from same colony (x ∈ [1 .. 6])
Hypothesis: we can identify a handful of critical caste-related genes/transcripts
Initial (naïve) analysis with RSEM/EBSeq
209,675 transcripts (assembled by Trinity)
RSEM and EBSeq completed without warnings
80-85% reads mapped
Many DE transcripts reported
5,769 (FDR=.05)
4,763 (FDR=.01)
3,878 (FDR=.001)
Permutation testing
Randomly shuffle caste labels (queen or worker)
Re-run differential expression analysis
Repeat test
Compare number of transcripts reported as DE for each permutation
https://github.com/standage/dept
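The permutation procedure above can be sketched in a few lines of Python. This is a minimal illustration, not code from the dept repository: `permutation_test` and `fake_count_de` are hypothetical names, and `fake_count_de` is only a stand-in for a full RSEM/EBSeq run that returns the number of transcripts called DE.

```python
import random

def shuffled_labels(labels, rng):
    """Return a shuffled copy of the caste labels; sample order is unchanged."""
    permuted = list(labels)
    rng.shuffle(permuted)
    return permuted

def permutation_test(samples, labels, count_de, n_permutations=10, seed=0):
    """Re-run a DE analysis on shuffled labels and collect the DE counts.

    count_de is a hypothetical callable (samples, labels) -> int that wraps
    whatever DE pipeline is in use.
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n_permutations):
        permuted = shuffled_labels(labels, rng)
        counts.append(count_de(samples, permuted))
    return counts

samples = [f"Q{i}" for i in range(1, 7)] + [f"W{i}" for i in range(1, 7)]
labels = ["queen"] * 6 + ["worker"] * 6

def fake_count_de(samples, permuted_labels):
    # Placeholder for the real analysis: counts samples whose label changed.
    return sum(
        1
        for sample, label in zip(samples, permuted_labels)
        if (sample[0] == "Q") != (label == "queen")
    )

counts = permutation_test(samples, labels, fake_count_de)
print(counts)
```

Shuffling a copy of the label list keeps the design balanced (6 queen and 6 worker labels in every permutation), so only the assignment of labels to samples changes.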
Permutation testing
Real data 4,763 (FDR=.01)
Permutation 1 5,112
Permutation 2 4,174
Permutation 3 4,474
Permutation 4 4,307
Permutation 5 4,718
Permutation 6 4,312
Permutation 7 4,171
Permutation 8 4,714
Permutation 9 3,828
Permutation 10 5,192
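One way to summarize this comparison is to place the real count within the distribution of permuted counts. The sketch below uses the numbers from this slide; the add-one empirical p-value is a common convention introduced here for illustration, not something the slides compute.

```python
real_count = 4763
permuted_counts = [5112, 4174, 4474, 4307, 4718, 4312, 4171, 4714, 3828, 5192]

# Average number of DE calls when caste labels carry no information.
mean_permuted = sum(permuted_counts) / len(permuted_counts)

# Permutations that report at least as many DE transcripts as the real data.
n_at_least_real = sum(1 for c in permuted_counts if c >= real_count)

# Add-one empirical p-value (an assumption of this sketch).
empirical_p = (n_at_least_real + 1) / (len(permuted_counts) + 1)

print(mean_permuted)          # 4500.2
print(n_at_least_real)        # 2
print(round(empirical_p, 3))  # 0.273
```

The real count falls inside the range spanned by the permutations (3,828 to 5,192), so the shuffled labels yield about as many DE calls as the true labels.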
Some observations
Some expression levels very low
Some transcripts had very few reads mapped
Some transcripts had many reads mapped
Difficulty normalizing over large dynamic range?
Filter transcripts
Reads mapped
queen/worker reads mapped > 2,500
overall reads < 1,000,000
Samples
4+ queen/worker samples with > 0 reads mapped
Distribution of reads mapped
mean(queen/worker reads mapped) * 0.9 > stdev(queen/worker reads mapped)
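The filter criteria above could be expressed roughly as follows. The slide leaves some details open, so this sketch makes several assumptions: "queen/worker" is read as "each caste separately", the 2,500 threshold applies to the summed reads of a caste, "overall reads" is the sum across all 12 samples, and the sample standard deviation is used. The function names are hypothetical.

```python
from statistics import mean, stdev

def caste_ok(reads, min_total=2500, min_nonzero=4, mean_scale=0.9):
    """Check one caste's per-sample mapped read counts for a transcript."""
    return (
        sum(reads) > min_total                             # enough reads mapped
        and sum(1 for r in reads if r > 0) >= min_nonzero  # 4+ samples with reads
        and mean(reads) * mean_scale > stdev(reads)        # not dominated by outliers
    )

def keep_transcript(queen_reads, worker_reads, max_overall=1_000_000):
    """Keep a transcript only if both castes pass and total reads are bounded."""
    overall = sum(queen_reads) + sum(worker_reads)
    return (
        overall < max_overall
        and caste_ok(queen_reads)
        and caste_ok(worker_reads)
    )

# Evenly covered transcript: kept.
print(keep_transcript([900, 1100, 1000, 950, 1050, 1000], [1000] * 6))
# One queen sample holds all the reads: rejected.
print(keep_transcript([0, 0, 0, 0, 0, 30000], [1000] * 6))
```

The mean-versus-standard-deviation test rejects transcripts whose reads are concentrated in one or two samples, which is one way to guard against the large dynamic range noted in the observations.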
DE analysis on filtered transcripts
40,498 transcripts
RSEM/EBSeq completed without warnings
20-35% reads mapped
Still many DE transcripts reported
1,680 (FDR=.05)
1,328 (FDR=.01)
1,037 (FDR=.001)
PdomTSAr1.1-034114 (FC=126)
Sample  Expression  Reads mapped  Reads (adjusted)
Q1            0.00          5232           5669.09
Q2            0.00         10046           5148.89
Q3           51.18          9188           6644.97
Q4          136.68          7920           6901.36
Q5          698.51         27862           6712.76
Q6            0.00          2582           5739.05
W1            0.00          5866           6920.72
W2            0.00          2046           5029.50
W3            0.00          2628           5879.19
W4            0.00          4308           5022.74
W5            0.00          7396           5983.82
W6            0.00          9132           6467.88
PdomTSAr1.1-007723 (FC=2)
Sample  Expression  Reads mapped  Reads (adjusted)
Q1          198.82           928           1005.53
Q2          445.48          1864            955.36
Q3          335.03          1330            961.89
Q4          267.42          1048            913.21
Q5          908.57          3988            960.82
Q6          114.54           458           1018.00
W1          125.65           714            842.38
W2            0.00           318            781.71
W3           78.41           426            953.02
W4          116.07           650            757.84
W5          161.56          1028            831.72
W6          147.01          1262            893.83
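The "Reads (adjusted)" column in the two tables appears to be the raw mapped read count divided by a per-sample scaling factor. That reading is inferred from the numbers, not stated on the slides. The sketch below recovers the implied factor from each transcript and shows that the two transcripts agree for every sample.

```python
samples = ["Q1", "Q2", "Q3", "Q4", "Q5", "Q6",
           "W1", "W2", "W3", "W4", "W5", "W6"]

# Values copied from the PdomTSAr1.1-034114 table.
raw_034114 = [5232, 10046, 9188, 7920, 27862, 2582,
              5866, 2046, 2628, 4308, 7396, 9132]
adj_034114 = [5669.09, 5148.89, 6644.97, 6901.36, 6712.76, 5739.05,
              6920.72, 5029.50, 5879.19, 5022.74, 5983.82, 6467.88]

# Values copied from the PdomTSAr1.1-007723 table.
raw_007723 = [928, 1864, 1330, 1048, 3988, 458,
              714, 318, 426, 650, 1028, 1262]
adj_007723 = [1005.53, 955.36, 961.89, 913.21, 960.82, 1018.00,
              842.38, 781.71, 953.02, 757.84, 831.72, 893.83]

# Implied per-sample factor, assuming adjusted = raw / factor.
factors_a = [r / a for r, a in zip(raw_034114, adj_034114)]
factors_b = [r / a for r, a in zip(raw_007723, adj_007723)]

for sample, fa, fb in zip(samples, factors_a, factors_b):
    print(f"{sample}: {fa:.4f} {fb:.4f}")
```

Under this reading, Q5 has by far the largest factor (about 4.15), which is why its raw count of 27,862 shrinks to 6,712.76 after adjustment.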
RSEM expected count
'expected_count' is the sum, over all reads, of the posterior probability that each read comes from this transcript. Because (1) each read aligning to this transcript has some probability of having been generated by background noise, and (2) RSEM may filter out some alignable low-quality reads, the sum of expected counts over all transcripts is generally less than the total number of reads aligned.
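A toy calculation may make the definition concrete. The posterior probabilities below are invented for illustration and the function is not part of RSEM; each read carries a probability for every transcript it aligns to, plus an optional "noise" entry for the background model.

```python
# Hypothetical posteriors for four aligned reads over transcripts tA and tB.
read_posteriors = [
    {"tA": 0.98, "noise": 0.02},
    {"tA": 0.60, "tB": 0.38, "noise": 0.02},
    {"tB": 0.95, "noise": 0.05},
    {"tA": 0.50, "tB": 0.50},
]

def expected_counts(read_posteriors, noise_key="noise"):
    """Sum each transcript's posterior probability over all reads."""
    counts = {}
    for posterior in read_posteriors:
        for origin, prob in posterior.items():
            if origin != noise_key:  # probability assigned to noise is not counted
                counts[origin] = counts.get(origin, 0.0) + prob
    return counts

counts = expected_counts(read_posteriors)
total = sum(counts.values())

print({name: round(value, 2) for name, value in counts.items()})
print(round(total, 2), "expected reads from", len(read_posteriors), "aligned reads")
```

Because part of each read's probability mass can go to the noise model, the total expected count (about 3.91 here) falls short of the 4 aligned reads, and a multi-mapped read contributes fractional counts to several transcripts.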