
Eran Yanowski, Eran Hornstein’s:

Monitor drug impact on the transcriptome of mouse beta cells (primary and cell-line) using Transeq/RNA-Seq

17.08.15 Report I

Overview of basic parameters of your NGS run (per sample):

Samples origin: Mouse beta cells, line

SR 60

FASTQC report for sample 1000_0A (randomly chosen)

FASTQC report for sample 1000_0A: most reads start with 5’ GGG, typical of the Transeq procedure

FASTQC report for sample: overrepresentation of Ins2 reads

Pre-Pipeline data processing

⇒ Need to remove 5’ GGG sequences
⇒ Need to remove reads containing either poly-A or poly-T
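The two clean-up steps above can be sketched as follows. This is only an illustration under stated assumptions: the report does not say how a poly-A or poly-T read is defined, so the run length `HOMOPOLYMER_RUN` is a hypothetical threshold, and the function name `preprocess_read` is made up for this example.

```python
from typing import Optional, Tuple

GGG_PREFIX = "GGG"
HOMOPOLYMER_RUN = 15  # hypothetical threshold; the report does not state one


def preprocess_read(seq: str, qual: str,
                    min_run: int = HOMOPOLYMER_RUN) -> Optional[Tuple[str, str]]:
    """Return the cleaned (sequence, quality) pair, or None if the read is dropped."""
    # Drop reads that contain a long run of A or of T (assumed definition).
    if "A" * min_run in seq or "T" * min_run in seq:
        return None
    # Trim a leading 5' GGG and keep the quality string the same length.
    if seq.startswith(GGG_PREFIX):
        seq, qual = seq[len(GGG_PREFIX):], qual[len(GGG_PREFIX):]
    return seq, qual
```

Reads without a leading GGG pass through unchanged, so the function can be applied to every FASTQ record.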

Reads summary following removal of poly-A & poly-T reads (Mouse beta cells, line)

Sample    Total reads  PolyA-filtered  PolyT-filtered  PolyA/T-filtered  % lost (polyA/T)  Reads post-polyA/T exclusion (input for mapping)
1000_0A   4522647      213478          42572           256050            5.66%             4266597
1000_0B   4157116      180098          38352           218450            5.25%             3938666
1000_0C   3566132      151943          30774           182717            5.12%             3383415
1000_12A  5799749      275383          51874           327257            5.64%             5472492
1000_12B  5056278      240559          46506           287065            5.68%             4769213
1000_12C  5194233      233382          44016           277398            5.34%             4916835
1000_1A   4533427      199358          35904           235262            5.19%             4298165
1000_1B   5526596      238100          52237           290337            5.25%             5236259
1000_1C   5515394      250700          46548           297248            5.39%             5218146
1000_6A   14039334     661855          96809           758664            5.40%             13280670
1000_6B   5168398      232440          38577           271017            5.24%             4897381
1000_6C   5600189      265320          47521           312841            5.59%             5287348
100__0A   4922399      239109          54170           293279            5.96%             4629120
100_0B    5200307      262254          49176           311430            5.99%             4888877
100_0C    5252913      266526          58464           324990            6.19%             4927923
100_12A   3748359      163938          33309           197247            5.26%             3551112
100_12B   5900067      263716          48099           311815            5.28%             5588252
100_12C   5346386      239766          47480           287246            5.37%             5059140
100_1A    5809281      312301          61939           374240            6.44%             5435041
100_1B    4582181      202274          46677           248951            5.43%             4333230
100_1C    5494900      274244          51519           325763            5.93%             5169137
100_6A    5075956      251945          49379           301324            5.94%             4774632
100_6B    5073253      261229          46717           307946            6.07%             4765307
100_6C_   4875816      227749          46709           274458            5.63%             4601358

FASTQC report following removal of poly-A & poly-T reads & 5’ GGG trimming (sample 1000_0A)

Annotations: ‘normal’ frequency of reads’ 5’ GGG; reduced frequency of %A

Data processing (pipeline) workflow (done using Mouse_mm9_v1 base repository)

1. If each sample has more than one fastq file (per sequencing read) then a fastq-file merging step is performed
2. Transeq reads pre-processing (5’ GGG trimming & polyA & polyT removal)
3. Processed-reads mapping (using TopHat)
4. TES reads coverage profile (Transeq protocol QC step)
5. Reads count (per 3’UTR) (using HTSeq-count)
6. Data normalization and differential gene expression (using DESeq2)
7. QC: Principal Component Analysis (PCA) & hierarchical clustering

Reads mapping summary (Exp4_1, Mouse beta cells, line)

Input reads = # of reads post-polyA/T exclusion (input for mapping).

Sample    Input reads  Mapped reads  % mapping  Uniquely mapped  % unique  Total reads count  % counted/mapped  % counted/total
1000_0A   4266597      2878974       67.50%     2659357          62.33%    2134781            74.15%            47.20%
1000_0B   3938666      2611453       66.30%     2416329          61.35%    1932450            74.00%            46.49%
1000_0C   3383415      2289845       67.70%     2084541          61.61%    1661576            72.56%            46.59%
1000_12A  5472492      3787628       69.20%     3497840          63.92%    2816961            74.37%            48.57%
1000_12B  4769213      3190164       66.90%     2899057          60.79%    2314249            72.54%            45.77%
1000_12C  4916835      3445350       70.10%     3184374          64.76%    2590155            75.18%            49.87%
1000_1A   4298165      3001892       69.80%     2777304          64.62%    2245920            74.82%            49.54%
1000_1B   5236259      3554120       67.90%     3279772          62.64%    2623326            73.81%            47.47%
1000_1C   5218146      3594977       68.90%     3316174          63.55%    2670961            74.30%            48.43%
1000_6A   13280670     6544448       49.30%     6061735          45.64%    4841756            73.98%            34.49%
1000_6B   4897381      3386634       69.20%     3080472          62.90%    2481236            73.27%            48.01%
1000_6C   5287348      3617658       68.40%     3286712          62.16%    2642159            73.04%            47.18%
100__0A   4629120      3052511       65.90%     2771379          59.87%    2202901            72.17%            44.75%
100_0B    4888877      3234356       66.20%     2987823          61.11%    2394370            74.03%            46.04%
100_0C    4927923      3266384       66.30%     2959649          60.06%    2355123            72.10%            44.83%
100_12A   3551112      2315894       65.20%     2139258          60.24%    1707222            73.72%            45.55%
100_12B   5588252      3511934       62.80%     3191989          57.12%    2551211            72.64%            43.24%
100_12C   5059140      3403456       67.30%     3091671          61.11%    2474196            72.70%            46.28%
100_1A    5435041      3607340       66.40%     3284228          60.43%    2616466            72.53%            45.04%
100_1B    4333230      2860763       66.00%     2601528          60.04%    2066053            72.22%            45.09%
100_1C    5169137      3455625       66.90%     3187479          61.66%    2545546            73.66%            46.33%
100_6A    4774632      3237959       67.80%     2999976          62.83%    2410481            74.44%            47.49%
100_6B    4765307      3222634       67.60%     2982099          62.58%    2417139            75.01%            47.64%
100_6C_   4601358      3096982       67.30%     2817727          61.24%    2258438            72.92%            46.32%
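The percentage columns do not state their denominators. The sketch below recomputes them for sample 1000_0A; the denominators are inferred from the reported numbers, not documented in the report, and the names `pct` and `mapping_metrics` are hypothetical.

```python
def pct(part: int, whole: int, digits: int = 2) -> float:
    """Percentage of `whole` represented by `part`, rounded to `digits` places."""
    return round(100.0 * part / whole, digits)


def mapping_metrics(total_reads: int, mapping_input: int, mapped: int,
                    uniquely_mapped: int, counted: int) -> dict:
    """Recompute the percentage columns from the raw read numbers."""
    return {
        # reads removed by the polyA/T filter, relative to all raw reads
        "pct_lost_polyAT": pct(total_reads - mapping_input, total_reads),
        # mapping rates are relative to the post-filter mapping input
        "pct_mapping": pct(mapped, mapping_input, 1),
        "pct_unique_mapping": pct(uniquely_mapped, mapping_input),
        # counted reads relative to mapped reads and to all raw reads
        "pct_counted_of_mapped": pct(counted, mapped),
        "pct_counted_of_total": pct(counted, total_reads),
    }


# Sample 1000_0A, numbers taken from the tables in this report
row_1000_0A = mapping_metrics(
    total_reads=4522647, mapping_input=4266597,
    mapped=2878974, uniquely_mapped=2659357, counted=2134781,
)
```

Under these assumed denominators the function reproduces the reported 5.66%, 67.5%, 62.33%, 74.15% and 47.20% for 1000_0A. Note that "% counted/total" is relative to the raw read count, not to the mapping input.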

Reads mapping summary (Exp4_1, Mouse beta cells, line): 22-24% of total reads count were mapped to Ins2/Ins1 genes

% Ins2+Ins1 = (Ins2+Ins1) read count / total read count. Reads w/o Ins2+Ins1 = Total_Read_counts_without_(Ins2+Ins1). UMI-filtered reads = # of UMI-filtered reads counted. Experiment code (all samples): Exp4_1_beta cell-line.

Sample    Ins2+Ins1 reads  % Ins2+Ins1  Reads w/o Ins2+Ins1  % w/o Ins2+Ins1/mapped  UMI-filtered reads  % UMI-filtered
1000_0A   521168           24.4%        1613613              56.05%                  1303102             49.00%
1000_0B   465186           24.1%        1467264              56.19%                  1193399             49.39%
1000_0C   369931           22.3%        1291645              56.41%                  1049199             50.33%
1000_12A  677079           24.0%        2139882              56.50%                  1670334             47.75%
1000_12B  517458           22.4%        1796791              56.32%                  1451853             50.08%
1000_12C  613419           23.7%        1976736              57.37%                  1550960             48.71%
1000_1A   543266           24.2%        1702654              56.72%                  1354037             48.75%
1000_1B   634653           24.2%        1988673              55.95%                  1558003             47.50%
1000_1C   648519           24.3%        2022442              56.26%                  1610963             48.58%
1000_6A   1155934          23.9%        3685822              56.32%                  2897404             47.80%
1000_6B   551344           22.2%        1929892              56.99%                  1524828             49.50%
1000_6C   596020           22.6%        2046139              56.56%                  1611762             49.04%
100__0A   497816           22.6%        1705085              55.86%                  1380827             49.82%
100_0B    578813           24.2%        1815557              56.13%                  1461077             48.90%
100_0C    544482           23.1%        1810641              55.43%                  1451192             49.03%
100_12A   404082           23.7%        1303140              56.27%                  1067542             49.90%
100_12B   574329           22.5%        1976882              56.29%                  1568245             49.13%
100_12C   568210           23.0%        1905986              56.00%                  1507206             48.75%
100_1A    618801           23.7%        1997665              55.38%                  1576598             48.01%
100_1B    467712           22.6%        1598341              55.87%                  1277871             49.12%
100_1C    618901           24.3%        1926645              55.75%                  1519110             47.66%
100_6A    577676           24.0%        1832805              56.60%                  1465024             48.83%
100_6B    586016           24.2%        1831123              56.82%                  1440583             48.31%
100_6C_   518860           23.0%        1739578              56.17%                  1386532             49.21%

Total read counts without (Ins2+Ins1): median 1831964, average 1879375
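The Ins2/Ins1 columns and the median/average line can be cross-checked with a short sketch. The denominators are again inferred from the numbers (in particular, "% UMI-filtered reads counted" appears to be relative to uniquely mapped reads), and all variable names are hypothetical.

```python
from statistics import mean, median

# Sample 1000_0A, numbers taken from the cell-line tables in this report
counted, mapped, uniquely_mapped = 2134781, 2878974, 2659357
ins_reads, umi_filtered = 521168, 1303102

without_ins = counted - ins_reads
pct_ins = round(100.0 * ins_reads / counted, 1)
pct_without_ins_of_mapped = round(100.0 * without_ins / mapped, 2)
# assumed denominator: uniquely mapped reads
pct_umi_filtered = round(100.0 * umi_filtered / uniquely_mapped, 2)

# Read counts without (Ins2+Ins1) for all 24 cell-line samples, in table order
without_ins_all = [
    1613613, 1467264, 1291645, 2139882, 1796791, 1976736,
    1702654, 1988673, 2022442, 3685822, 1929892, 2046139,
    1705085, 1815557, 1810641, 1303140, 1976882, 1905986,
    1997665, 1598341, 1926645, 1832805, 1831123, 1739578,
]
median_without_ins = median(without_ins_all)
average_without_ins = round(mean(without_ins_all))
```

The median and average computed this way match the reported 1831964 and 1879375, which indicates that the summary line refers to the read counts without (Ins2+Ins1).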

Reads summary following removal of polyA and polyT reads (Exp4_2, Mouse primary beta cells):

Sample    Total reads  PolyA-filtered  PolyT-filtered  PolyA/T-filtered  % lost (polyA/T)  Reads post-polyA/T exclusion (input for mapping)
0A        5588409      224404          20705           245109            4.39%             5343300
0B        5780324      203388          17500           220888            3.82%             5559436
0C        8911138      306635          25285           331920            3.72%             8579218
1000_12A  5116948      178673          14436           193109            3.77%             4923839
1000_12B  6016116      250518          22761           273279            4.54%             5742837
1000_12C  10153149     379818          27848           407666            4.02%             9745483
1000_1A   5646586      224835          19976           244811            4.34%             5401775
1000_1B   6618582      243771          19876           263647            3.98%             6354935
1000_1C   11190335     345687          32945           378632            3.38%             10811703
1000_6A   5424645      219720          17175           236895            4.37%             5187750
1000_6B   6477781      289338          21210           310548            4.79%             6167233
1000_6C   12366201     375283          28681           403964            3.27%             11962237
100_12A   5467491      253599          19289           272888            4.99%             5194603
100_12B   5310245      217038          18173           235211            4.43%             5075034
100_12C   9919613      369879          27390           397269            4.00%             9522344
100_1A    4642696      190171          15133           205304            4.42%             4437392
100_1B    7499436      292209          27384           319593            4.26%             7179843
100_1C    8211277      281292          21373           302665            3.69%             7908612
100_6A    4517430      257981          18671           276652            6.12%             4240778
100_6B    6134348      257537          20423           277960            4.53%             5856388
100_6C    8622327      330255          32769           363024            4.21%             8259303

Reads mapping summary (Exp4_2, Mouse primary beta cells):

Input reads = # of reads post-polyA/T exclusion (input for mapping).

Sample    Input reads  Mapped reads  % mapping  Uniquely mapped  % unique  Total reads count  % counted/mapped  % counted/total
0A        5343300      4780133       89.50%     4594190          85.98%    4080129            85.36%            73.01%
0B        5559436      5030565       90.50%     4756610          85.56%    4293252            85.34%            74.27%
0C        8579218      7836698       91.30%     7542463          87.92%    6749550            86.13%            75.74%
1000_12A  4923839      4436601       90.10%     4224644          85.80%    3795044            85.54%            74.17%
1000_12B  5742837      5141665       89.50%     4873186          84.86%    4314688            83.92%            71.72%
1000_12C  9745483      8899011       91.30%     8470029          86.91%    7536704            84.69%            74.23%
1000_1A   5401775      4848188       89.80%     4580392          84.79%    4053938            83.62%            71.79%
1000_1B   6354935      5755635       90.60%     5452595          85.80%    4877657            84.75%            73.70%
1000_1C   10811703     9906774       91.60%     8920246          82.51%    7843983            79.18%            70.10%
1000_6A   5187750      4656630       89.80%     4427373          85.34%    3970277            85.26%            73.19%
1000_6B   6167233      5540137       89.80%     5258449          85.26%    4735818            85.48%            73.11%
1000_6C   11962237     11026944      92.20%     10616369         88.75%    9468819            85.87%            76.57%
100_12A   5194603      4659567       89.70%     4464784          85.95%    3988557            85.60%            72.95%
100_12B   5075034      4546346       89.60%     4373427          86.18%    3894705            85.67%            73.34%
100_12C   9522344      8709660       91.50%     8273383          86.88%    7335857            84.23%            73.95%
100_1A    4437392      3975240       89.60%     3762015          84.78%    3324750            83.64%            71.61%
100_1B    7179843      6492766       90.40%     6239520          86.90%    5596841            86.20%            74.63%
100_1C    7908612      7196482       91.00%     6828275          86.34%    6135099            85.25%            74.72%
100_6A    4240778      3727434       87.90%     3587523          84.60%    3195721            85.74%            70.74%
100_6B    5856388      5274621       90.10%     5079855          86.74%    4550876            86.28%            74.19%
100_6C    8259303      7470850       90.50%     7080063          85.72%    6245548            83.60%            72.43%

Reads mapping summary (Exp4_2, Mouse primary beta cells): % of total reads count mapped to Ins2/Ins1 genes

% Ins2+Ins1 = %(Ins2+Ins1) read count. Reads w/o Ins2+Ins1 = Total_Read_counts_without_(Ins2+Ins1). UMI-filtered reads = # of UMI-filtered reads counted. Experiment code (all samples): Exp4_2_primary beta cells.

Sample    Ins2+Ins1 reads  % Ins2+Ins1  Reads w/o Ins2+Ins1  % w/o Ins2+Ins1/mapped  UMI-filtered reads  % UMI-filtered
0A        2079103          51.0%        2001026              41.86%                  1250310             27.22%
0B        2272032          52.9%        2021220              40.18%                  1256347             26.41%
0C        3659420          54.2%        3090130              39.43%                  1829003             24.25%
1000_12A  1806555          47.6%        1988489              44.82%                  1239960             29.35%
1000_12B  2137245          49.5%        2177443              42.35%                  1362445             27.96%
1000_12C  3755984          49.8%        3780720              42.48%                  2156596             25.46%
1000_1A   2003041          49.4%        2050897              42.30%                  1275397             27.84%
1000_1B   2543057          52.1%        2334600              40.56%                  1414740             25.95%
1000_1C   3897421          49.7%        3946562              39.84%                  2244427             25.16%
1000_6A   1875296          47.2%        2094981              44.99%                  1289816             29.13%
1000_6B   2438176          51.5%        2297642              41.47%                  1403013             26.68%
1000_6C   4863790          51.4%        4605029              41.76%                  2545816             23.98%
100_12A   1954496          49.0%        2034061              43.65%                  1271547             28.48%
100_12B   1935262          49.7%        1959443              43.10%                  1222657             27.96%
100_12C   3682705          50.2%        3653152              41.94%                  2108318             25.48%
100_1A    1655797          49.8%        1668953              41.98%                  1067918             28.39%
100_1B    2929336          52.3%        2667505              41.08%                  1602695             25.69%
100_1C    3302584          53.8%        2832515              39.36%                  1692040             24.78%
100_6A    1569769          49.1%        1625952              43.62%                  1069131             29.80%
100_6B    2364828          52.0%        2186048              41.44%                  1347814             26.53%
100_6C    3103130          49.7%        3142418              42.06%                  1851135             26.15%

Total read counts without (Ins2+Ins1): median 2186048, average 2578990

Hierarchical clustering: Mouse beta cells, line (panels: drug con=1000; drug con=100)

Hierarchical clustering: Mouse primary beta cells (panels: drug con=1000; drug con=100)

Separated by processing day (A/B/C)?

• Differential gene expression data is assessed by DESeq2
• DESeq2 output is summarized in a single sheet per experiment
• Genes differentially expressed during each time series were called by two independent means:
  I. Using pairwise comparison vs. time zero
  II. Using a tool named maSigPro

RNA-Seq drug dose response (1000 and 100)/time series (0, 1, 6 and 12 hrs) gene filtering criteria

• The data filters used during the last analysis performed (12.08.15):
  I. maSigPro:
    • NormCounts of genes meeting MaxRawCount>50 served as input;
    • maSigPro output was further filtered against potential outlier genes (flagged by maSigPro);
    • Genes showing FC greater than 1.5 (in at least one of the paired comparisons);
    • By default maSigPro requires (BH) adjusted p-value <0.05
  II. Pairwise comparison criteria: MaxRawCount>50, adjusted p-value<0.05 and FC greater than 1.5 (in at least one of the paired comparisons)
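The pairwise-comparison criteria can be expressed as a small filter. This is a sketch under two assumptions the report does not state: that FC is a linear ratio and a 1.5-fold change in either direction qualifies, and that the adjusted p-value and the FC threshold must be met in the same comparison. The function name and the input layout are hypothetical.

```python
from typing import Iterable, Tuple

MIN_MAX_RAW_COUNT = 50   # MaxRawCount > 50
MAX_ADJ_P = 0.05         # adjusted p-value < 0.05
MIN_FOLD_CHANGE = 1.5    # FC greater than 1.5


def passes_pairwise_filter(max_raw_count: float,
                           comparisons: Iterable[Tuple[float, float]]) -> bool:
    """comparisons: (fold_change, adjusted_p) per time point vs. time zero.

    fold_change is assumed to be a positive linear ratio (not log2).
    """
    if max_raw_count <= MIN_MAX_RAW_COUNT:
        return False
    for fold_change, adjusted_p in comparisons:
        # Treat up- and down-regulation symmetrically (assumption).
        magnitude = max(fold_change, 1.0 / fold_change)
        if adjusted_p < MAX_ADJ_P and magnitude > MIN_FOLD_CHANGE:
            return True
    return False
```

For example, a gene with a maximum raw count of 120 and a 2-fold change at adjusted p = 0.01 in one of its three comparisons passes, while the same gene with a maximum raw count of 40 does not.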

Exp 4.2: Mouse primary beta cells: Genes meeting criteria: paired comparison yields higher 100/1000 gene overlap (than the one obtained with maSigPro)

Panels: maSigPro filtered output; pairwise-comparison filtered output

Exp 4.2: Mouse primary beta cells: Most of maSigPro shared-genes are included in the group of paired shared genes

To determine which output is preferred, data validation using an orthogonal method is essential
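The overlap statements above amount to set operations on gene lists. The sketch below shows one way to quantify the 100/1000 overlap and how much of one shared-gene set is contained in another. The function names are hypothetical, and the gene identifiers in the usage note are placeholders, not genes from this report.

```python
from typing import Dict, Set


def overlap_summary(genes_100: Set[str], genes_1000: Set[str]) -> Dict[str, float]:
    """Venn-style counts for the two drug concentrations, plus a Jaccard index."""
    shared = genes_100 & genes_1000
    union = genes_100 | genes_1000
    return {
        "shared": len(shared),
        "only_100": len(genes_100 - genes_1000),
        "only_1000": len(genes_1000 - genes_100),
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


def fraction_contained(candidate: Set[str], reference: Set[str]) -> float:
    """Fraction of `candidate` genes that also appear in `reference`."""
    return len(candidate & reference) / len(candidate) if candidate else 0.0
```

With placeholder inputs `{"g1", "g2", "g3", "g4"}` and `{"g3", "g4", "g5"}`, `overlap_summary` reports 2 shared genes. Passing the maSigPro shared genes as `candidate` and the paired-comparison shared genes as `reference` to `fraction_contained` gives the share of maSigPro genes also found by the paired comparison.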

Exp 4.2: Mouse primary beta cells: Partitioning clustering of genes responsive in both drug concentrations

Panels: paired comparison, con=1000, shared_genes partitioning clustering; paired comparison, con=100, shared_genes partitioning clustering

Exp4.1: Mouse Beta Cells, cell line

• Generally this experiment yielded less significant results
• When applying the same filters used for the primary beta-cells datasets, very few genes pass;
• The possibility of using p-value (instead of adjusted p-value) should be tested by the investigator
• The level of 100/1000 intersection (shared genes) is lower here compared to the one observed in the primary cells experiment

Venn diagram of unfiltered maSigPro outputs of both the primary (100, 1000) and the cell-line TranSeq datasets

• Low overall intersection between the primary and the cell-line ‘significant’ genes;
• Relatively low intersection between the two drug concentrations tested on the beta cell line
