brendel group presentation: 17 oct 2013

Differential expression in the paper wasp Polistes dominula
Daniel S. Standage, Brendel Group Meeting, 17 Oct 2013

Experimental design

6 queen samples

6 worker samples

Queen x and worker x from same colony (x ∈ [1 .. 6])

Hypothesis: we can identify a handful of critical caste-related genes/transcripts

Initial (naïve) analysis with RSEM/EBSeq

209,675 transcripts (assembled by Trinity)

RSEM and EBSeq completed without warnings

80-85% of reads mapped

Many DE transcripts reported

5,769 (FDR=.05)

4,763 (FDR=.01)

3,878 (FDR=.001)

Permutation testing

Randomly shuffle caste labels (queen or worker)

Re-run differential expression analysis

Repeat the test (10 permutations)

Compare the number of transcripts reported as DE for each permutation with the number from the real labels

https://github.com/standage/dept

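The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the code in the dept repository: `run_de` is a hypothetical callback standing in for a full RSEM/EBSeq run, and the sample order (Q1..Q6, W1..W6) is assumed.

```python
import random
from typing import Callable, List, Optional

# Caste labels for the 12 samples, assumed to be in the order Q1..Q6, W1..W6.
LABELS = ["Q"] * 6 + ["W"] * 6


def permuted_labels(labels: List[str], rng: random.Random) -> List[str]:
    """Return a shuffled copy of the labels; the 6 Q / 6 W split is preserved."""
    shuffled = list(labels)
    rng.shuffle(shuffled)
    return shuffled


def permutation_counts(
    labels: List[str],
    run_de: Callable[[List[str]], int],
    n_perm: int = 10,
    seed: Optional[int] = None,
) -> List[int]:
    """Re-run the DE analysis once per shuffled labeling and collect the DE counts.

    run_de is a hypothetical stand-in for the RSEM/EBSeq pipeline: it takes a
    label assignment and returns the number of transcripts called DE.
    """
    rng = random.Random(seed)
    return [run_de(permuted_labels(labels, rng)) for _ in range(n_perm)]
```

For a dry run any function of the labels will do, e.g. `permutation_counts(LABELS, lambda labs: labs[:6].count("Q"), seed=1)`. A plain shuffle matches the slide's description; because queen x and worker x come from the same colony, an alternative scheme (not the one described here) would swap labels only within colony pairs, keeping the pairing intact.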

Permutation testing

Data set        DE transcripts (FDR=.01)
Real data       4,763
Permutation 1   5,112
Permutation 2   4,174
Permutation 3   4,474
Permutation 4   4,307
Permutation 5   4,718
Permutation 6   4,312
Permutation 7   4,171
Permutation 8   4,714
Permutation 9   3,828
Permutation 10  5,192
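One way to summarize this table is an empirical p-value: the fraction of label permutations that report at least as many DE transcripts as the real labels. The sketch below is an illustrative calculation on the numbers in the table, not part of the original analysis.

```python
# Number of transcripts reported as DE at FDR=.01, copied from the table.
REAL = 4763
PERMUTED = [5112, 4174, 4474, 4307, 4718, 4312, 4171, 4714, 3828, 5192]


def empirical_p(observed: int, null_counts: list) -> float:
    """One-sided permutation p-value, with the usual +1 correction so it is never 0."""
    at_least = sum(1 for c in null_counts if c >= observed)
    return (at_least + 1) / (len(null_counts) + 1)


p = empirical_p(REAL, PERMUTED)
print(f"{sum(c >= REAL for c in PERMUTED)} of {len(PERMUTED)} permutations >= real; p = {p:.2f}")
# prints: 2 of 10 permutations >= real; p = 0.27
```

With only 10 permutations the smallest attainable p-value is 1/11, so this is coarse, but the picture is clear: the real count (4,763) falls inside the permuted range (3,828 to 5,192) and is exceeded by 2 of the 10 shuffles.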

Some observations

Some expression levels were very low

Some transcripts had very few reads mapped

Some transcripts had many reads mapped

Difficulty normalizing over large dynamic range?
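The normalization question can be made concrete. EBSeq provides a `MedianNorm` function for library-size normalization which, as far as I know, follows the DESeq median-of-ratios method; the slides do not say which normalization was used here. Below is a minimal pure-Python sketch of median-of-ratios on made-up counts, not EBSeq's code.

```python
import math
from statistics import median
from typing import List


def median_ratio_size_factors(counts: List[List[float]]) -> List[float]:
    """DESeq-style size factors; counts[i][j] is the read count of transcript i in sample j.

    Transcripts with a zero in any sample are skipped, since their geometric mean is 0.
    """
    n_samples = len(counts[0])
    ratios: List[List[float]] = [[] for _ in range(n_samples)]
    for row in counts:
        if any(c <= 0 for c in row):
            continue
        geo_mean = math.exp(sum(math.log(c) for c in row) / n_samples)
        for j, c in enumerate(row):
            ratios[j].append(c / geo_mean)
    if not ratios[0]:
        raise ValueError("no transcript has a nonzero count in every sample")
    # Each sample's size factor is the median of its count-to-geometric-mean ratios.
    return [median(r) for r in ratios]


# Toy matrix (made-up numbers): sample 2 is sequenced exactly twice as deeply.
toy = [[10, 20], [100, 200], [5, 10], [1, 0]]
print(median_ratio_size_factors(toy))
# One transcript with an extreme ratio does not move the medians.
print(median_ratio_size_factors(toy + [[1, 100000]]))
```

Using the median rather than a total-count scaling is what makes the factors robust to a few transcripts at the extremes of the dynamic range.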

Filter transcripts

Reads mapped

queen/worker reads mapped > 2,500

overall reads < 1,000,000

Samples

4+ queen/worker samples with > 0 reads mapped

Distribution of reads mapped

mean(queen/worker reads mapped) * 0.9 > stdev(queen/worker reads mapped)
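The filter criteria can be written as a predicate over one transcript's per-sample read counts. This is a sketch under assumptions the slide leaves open: "queen/worker" is read as "applied separately to the queen samples and to the worker samples, and both must pass"; "reads mapped" means the per-sample mapped-read counts; and `stdev` is the sample standard deviation. The function and constant names are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence

MIN_CASTE_READS = 2500       # "queen/worker reads mapped > 2,500"
MAX_TOTAL_READS = 1_000_000  # "overall reads < 1,000,000"
MIN_NONZERO_SAMPLES = 4      # "4+ queen/worker samples with > 0 reads mapped"
CV_FACTOR = 0.9              # "mean(...) * 0.9 > stdev(...)"


def caste_ok(reads: Sequence[int]) -> bool:
    """Per-caste criteria, applied to one caste's per-sample read counts."""
    return (
        sum(reads) > MIN_CASTE_READS
        and sum(1 for r in reads if r > 0) >= MIN_NONZERO_SAMPLES
        and mean(reads) * CV_FACTOR > stdev(reads)  # sample standard deviation (assumed)
    )


def passes_filter(queen: Sequence[int], worker: Sequence[int]) -> bool:
    """Keep a transcript only if the overall total is under the cap and both castes pass."""
    if sum(queen) + sum(worker) >= MAX_TOTAL_READS:
        return False
    return caste_ok(queen) and caste_ok(worker)
```

Under this reading, the two transcripts tabulated on the following slides both pass, which is at least consistent with their appearing in the filtered analysis. If "queen/worker" was meant as "queen or worker", replace `and` with `or` in the last line.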

DE analysis on filtered transcripts

40,498 transcripts

RSEM/EBSeq completed without warnings

20-35% of reads mapped

Still many DE transcripts reported

1,680 (FDR=.05)

1,328 (FDR=.01)

1,037 (FDR=.001)

PdomTSAr1.1-034114 (FC=126)

Sample  Expression  Reads mapped  Reads (adjusted)
Q1            0.00          5232           5669.09
Q2            0.00         10046           5148.89
Q3           51.18          9188           6644.97
Q4          136.68          7920           6901.36
Q5          698.51         27862           6712.76
Q6            0.00          2582           5739.05
W1            0.00          5866           6920.72
W2            0.00          2046           5029.50
W3            0.00          2628           5879.19
W4            0.00          4308           5022.74
W5            0.00          7396           5983.82
W6            0.00          9132           6467.88

PdomTSAr1.1-007723 (FC=2)

Sample  Expression  Reads mapped  Reads (adjusted)
Q1          198.82           928           1005.53
Q2          445.48          1864            955.36
Q3          335.03          1330            961.89
Q4          267.42          1048            913.21
Q5          908.57          3988            960.82
Q6          114.54           458           1018.00
W1          125.65           714            842.38
W2            0.00           318            781.71
W3           78.41           426            953.02
W4          116.07           650            757.84
W5          161.56          1028            831.72
W6          147.01          1262            893.83
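The two tables allow a quick consistency check. Dividing "Reads (adjusted)" by "Reads mapped" gives the same ratio for a given sample in both transcripts (for Q5, 6712.76/27862 and 960.82/3988 are both about 0.241), which is what one would expect if the adjusted column is the raw count multiplied by a per-sample normalization factor. That interpretation is an inference from these numbers, not something stated on the slides; the snippet only computes the ratios.

```python
# (reads mapped, reads adjusted) per sample, copied from the two tables.
PDOM_034114 = {
    "Q1": (5232, 5669.09), "Q2": (10046, 5148.89), "Q3": (9188, 6644.97),
    "Q4": (7920, 6901.36), "Q5": (27862, 6712.76), "Q6": (2582, 5739.05),
    "W1": (5866, 6920.72), "W2": (2046, 5029.50), "W3": (2628, 5879.19),
    "W4": (4308, 5022.74), "W5": (7396, 5983.82), "W6": (9132, 6467.88),
}
PDOM_007723 = {
    "Q1": (928, 1005.53), "Q2": (1864, 955.36), "Q3": (1330, 961.89),
    "Q4": (1048, 913.21), "Q5": (3988, 960.82), "Q6": (458, 1018.00),
    "W1": (714, 842.38), "W2": (318, 781.71), "W3": (426, 953.02),
    "W4": (650, 757.84), "W5": (1028, 831.72), "W6": (1262, 893.83),
}


def implied_factors(table):
    """Ratio of adjusted to raw reads for each sample."""
    return {sample: adj / raw for sample, (raw, adj) in table.items()}


f1 = implied_factors(PDOM_034114)
f2 = implied_factors(PDOM_007723)
for sample in f1:
    print(f"{sample}  {f1[sample]:.4f}  {f2[sample]:.4f}")
```

The two columns agree to within rounding of the tabulated values. The implied factors span roughly 0.24 (Q5) to 2.46 (W2), about a tenfold range, which is why Q5's 27,862 mapped reads for PdomTSAr1.1-034114 shrink to about 6,700 after adjustment, in line with the other samples.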

RSEM expected count

From the RSEM documentation: "'expected_count' is the sum of the posterior probability of each read comes from this transcript over all reads. Because 1) each read aligning to this transcript has a probability of being generated from background noise; 2) RSEM may filter some alignable low quality reads, the sum of expected counts for all transcript are generally less than the total number of reads aligned."
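A toy version of the quoted definition: each read carries posterior probabilities over the transcripts it aligns to plus a background-noise component, and a transcript's expected count is the sum of its posteriors over all reads. The posteriors here are made up (in RSEM they come from its EM procedure), and `NOISE` is a hypothetical label; this is not RSEM's implementation.

```python
from typing import Dict, List

NOISE = "noise"  # hypothetical label for the background-noise component


def expected_counts(posteriors: List[Dict[str, float]]) -> Dict[str, float]:
    """Sum, over reads, of the posterior probability that each read came from each transcript.

    Each dict holds one read's posteriors over the transcripts it aligns to plus
    NOISE; the values for a read sum to 1. Noise mass is not credited to any transcript.
    """
    counts: Dict[str, float] = {}
    for read in posteriors:
        for source, p in read.items():
            if source != NOISE:
                counts[source] = counts.get(source, 0.0) + p
    return counts


reads = [
    {"tA": 0.7, "tB": 0.2, NOISE: 0.1},
    {"tA": 0.5, "tB": 0.5},
    {"tB": 0.9, NOISE: 0.1},
]
counts = expected_counts(reads)
print(counts)
```

Here tA gets about 1.2 and tB about 1.6, so the expected counts sum to about 2.8 for 3 aligned reads: the posterior mass assigned to noise is exactly the shortfall the documentation describes.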

Next (final) steps

Look more into “expected count”

Additional filtering?

Publish!
