ebseq: an r package for di erential expression analysis ...kendzior/ebseq/ebseq...ebseq: an r...
TRANSCRIPT
EBSeq: An R package for differential expression
analysis using RNA-seq data
Ning Leng, John A. Dawson, Christina Kendziorski
October 9, 2012
Contents
1 Introduction 2
2 The Model 32.1 Two conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32.2 More than two conditions . . . . . . . . . . . . . . . . . . . . . . 4
3 Quick Start 63.1 Gene Level DE Analysis (Two Conditions) . . . . . . . . . . . . . 6
3.1.1 Required input . . . . . . . . . . . . . . . . . . . . . . . . 63.1.2 Library size factor . . . . . . . . . . . . . . . . . . . . . . 63.1.3 Running EBSeq on gene expression estimations . . . . . . 7
3.2 Isoform Level DE Analysis (Two Conditions) . . . . . . . . . . . 83.2.1 Required inputs . . . . . . . . . . . . . . . . . . . . . . . 83.2.2 Library size factor . . . . . . . . . . . . . . . . . . . . . . 93.2.3 The Ig vector . . . . . . . . . . . . . . . . . . . . . . . . . 93.2.4 Running EBSeq on isoform expression estimations . . . . 9
3.3 Working on gene expression estimations with more than two con-ditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.4 Working on isoform expression estimations with more than twoconditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
4 More detailed examples 164.1 Gene Level DE Analysis (Two Conditions) . . . . . . . . . . . . . 16
4.1.1 Running EBSeq on simulated gene expression estimations 164.1.2 Calculating FC . . . . . . . . . . . . . . . . . . . . . . . . 174.1.3 Checking convergence . . . . . . . . . . . . . . . . . . . . 184.1.4 Checking the model fit and other diagnostics . . . . . . . 18
4.2 Isoform Level DE Analysis (Two Conditions) . . . . . . . . . . . 204.2.1 The Ig vector . . . . . . . . . . . . . . . . . . . . . . . . . 204.2.2 Using mappability ambiguity clusters instead of the Ig
vector when the gene-isoform relationship is unknown . . 21
1
4.2.3 Running EBSeq on simulated isoform expression estimations 224.2.4 Checking convergence . . . . . . . . . . . . . . . . . . . . 234.2.5 Checking the model fit and other diagnostics . . . . . . . 23
4.3 Working on gene expression estimations with more than two con-ditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.4 Working on isoform expression estimations with more than twoconditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.5 Working without replicates . . . . . . . . . . . . . . . . . . . . . 374.5.1 Gene counts with two conditions . . . . . . . . . . . . . . 374.5.2 Isoform counts with two conditions . . . . . . . . . . . . . 374.5.3 Gene counts with more than two conditions . . . . . . . . 384.5.4 Isoform counts with more than two conditions . . . . . . 39
1 Introduction
EBSeq may be used to identify differentially expressed (DE) genes and isoformsin an RNA-Seq experiment. As detailed in Leng et al., 2012 [3], EBSeq isan empirical Bayesian approach that models a number of features observed inRNA-seq data. Importantly, for isoform level inference, EBSeq directly accom-modates isoform expression estimation uncertainty by modeling the differentialvariability observed in distinct groups of isoforms. Consider Figure 1, where wehave plotted variance against mean for all isoforms using RNA-Seq expressiondata from Leng et al., 2012 [3]. Also shown is the fit within three sub-groups ofisoforms defined by the number of constituent isoforms of the parent gene. Anisoform of gene g is assigned to the Ig = k group, where k = 1, 2, 3, if the totalnumber of isoforms from gene g is k (the Ig = 3 group contains all isoformsfrom genes having 3 or more isoforms). As shown in Figure 1, there is decreasedvariability in the Ig = 1 group, but increased variability in the others, due to therelative increase in uncertainty inherent in estimating isoform expression whenmultiple isoforms of a given gene are present. If this structure is not accommo-dated, there is reduced power for identifying isoforms in the Ig = 1 group (sincethe true variances in that group are lower, on average, than that derived fromthe full collection of isoforms) as well as increased false discoveries in the Ig = 2and Ig = 3 groups (since the true variances are higher, on average, than thosederived from the full collection). EBSeq directly models differential variabilityas a function of Ig providing a powerful approach for isoform level inference.As shown in Leng et al., 2012 [3], the model is also useful for identifying DEgenes. We will briefly detail the model in Section 2 and then describe the flowof analysis in Section 3 for both isoform and gene-level inference.
2
0.01 1 10 100 10000
0.01
110
01e
+06
Estimated Mean
Est
imat
ed V
aria
nce
10% 40% 90%
AllIg = 1Ig = 2Ig = 3
Figure 1: Empirical variance vs. mean for each isoform profiled in the ESCs vsiPSCs experiment detailed in the Case Study section of Leng et al., 2012 [3]. Aspline fit to all isoforms is shown in red with splines fit within the Ig = 1, Ig = 2,and Ig = 3 isoform groups shown in yellow, pink, and green, respectively.
2 The Model
2.1 Two conditions
We let XC1gi = Xgi,1, Xgi,2, ..., Xgi,S1 denote data from condition 1 and XC2
gi =Xgi,(S1+1), Xgi,(S1+2), ..., Xgi,S data from condition 2. We assume that countswithin condition C are distributed as Negative Binomial: XC
gi,s|rgi,s, qCgi βΌ
NB(rgi,s, qCgi) where
P (Xgi,s|rgi,s, qCgi) =
(Xgi,s + rgi,s β 1
Xgi,s
)(1β qCgi)
Xgi,s(qCgi)rgi,s (1)
and Β΅Cgi,s = rgi,s(1β qCgi)/q
Cgi ; (ΟC
gi,s)2 = rgi,s(1β qCgi)/(q
Cgi)
2.
We assume a prior distribution on qCgi : qCgi |Ξ±, Ξ²
Ig βΌ Beta(Ξ±, Ξ²Ig ). The hy-
perparameter Ξ± is shared by all the isoforms and Ξ²Ig is Ig specific (note this isan index, not a power). We further assume that rgi,s = rgi,0ls, where rgi,0 isan isoform specific parameter common across conditions and rgi,s depends onit through the sample-specific normalization factor ls. Of interest in this twogroup comparison is distinguishing between two cases, or what we will refer tosubsequently as two patterns of expression, namely equivalent expression (EE)and differential expression (DE):
H0 (EE) : qC1gi = qC2
gi vs H1 (DE) : qC1gi 6= qC2
gi .
3
Under the null hypothesis (EE), the data XC1,C2gi = XC1
gi , XC2gi arises from the
prior predictive distribution fIg0 (XC1,C2
gi ):
fIg0 (XC1,C2
gi ) =
[Sβ
s=1
(Xgi,s + rgi,s β 1
Xgi,s
)]Beta(Ξ±+
βSs=1 rgi,s, Ξ²
Ig +βS
s=1Xgi,s)
Beta(Ξ±, Ξ²Ig )
(2)Alternatively (in a DE scenario), XC1,C2
gi follows the prior predictive distri-
bution fIg1 (XC1,C2
gi ):
fIg1 (XC1,C2
gi ) = fIg0 (XC1
gi )fIg0 (XC2
gi ) (3)
Let the latent variable Zgi be defined so that Zgi = 1 indicates that isoformgi is DE and Zgi = 0 indicates isoform gi is EE, and Zgi βΌ Bernoulli(p). Then,the marginal distribution of XC1,C2
gi and Zgi is:
(1β p)f Ig0 (XC1,C2gi ) + pf
Ig1 (XC1,C2
gi ) (4)
The posterior probability of being DE at isoform gi is obtained by Bayesβ rule:
pfIg1 (XC1,C2
gi )
(1β p)f Ig0 (XC1,C2gi ) + pf
Ig1 (XC1,C2
gi )(5)
2.2 More than two conditions
EBSeq naturally accommodates multiple condition comparisons. For exam-ple, in a study with 3 conditions, there are K=5 possible expression patterns(P1,...,P5), or ways in which latent levels of expression may vary across condi-tions:
P1: qC1gi = qC2
gi = qC3gi
P2: qC1gi = qC2
gi 6= qC3gi
P3: qC1gi = qC3
gi 6= qC2gi
P4: qC1gi 6= qC2
gi = qC3gi
P5: qC1gi 6= qC2
gi 6= qC3gi and qC1
gi 6= qC3gi
The prior predictive distributions for these are given, respectively, by:
gIg1 (XC1,C2,C3
gi ) = fIg0 (XC1,C2,C3
gi )
gIg2 (XC1,C2,C3
gi ) = fIg0 (XC1,C2
gi )fIg0 (XC3
gi )
gIg3 (XC1,C2,C3
gi ) = fIg0 (XC1,C3
gi )fIg0 (XC2
gi )
gIg4 (XC1,C2,C3
gi ) = fIg0 (XC1
gi )fIg0 (XC2,C3
gi )
gIg5 (XC1,C2,C3
gi ) = fIg0 (XC1
gi )fIg0 (XC2
gi )fIg0 (XC3
gi )
4
where fIg0 is the same as in equation 2. Then the marginal distribution in
equation 4 becomes:
5βk=1
pkgIgk (XC1,C2,C3
gi ) (6)
whereβ5
k=1 pk = 1. Thus, the posterior probability of isoform gi coming frompattern K is readily obtained by:
pKgIgK (XC1,C2,C3
gi )β5k=1 pkg
Igk (XC1,C2,C3
gi )(7)
5
3 Quick Start
Before analysis can proceed, the EBSeq package must be loaded into the workingspace:
> library(EBSeq)
3.1 Gene Level DE Analysis (Two Conditions)
3.1.1 Required input
Data: The object Data should be a Gβ by β S matrix containing theexpression values for each gene and each lane (sample), where G is the numberof genes and S is the number of lanes. These values should exhibit raw counts,without normalization across samples. Counts of this nature may be obtainedfrom RSEM [4], Cufflinks [6] or other such pre-processing approaches.
Conditions: The object Conditions should be a Factor vector of length Sthat indicates to which condition each sample belongs. For example, if thereare two conditions and three samples in each, S = 6 and Conditions may begiven byas.factor(c("C1","C1","C1","C2","C2","C2"))
The object GeneMat is a simulated data matrix containing 10,000 rows of genesand 10 columns of samples. The genes are named Gene_1, Gene_2 ...
> data(GeneMat)
> str(GeneMat)
num [1:10000, 1:10] 1879 24 3291 97 485 ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:10000] "Gene_1" "Gene_2" "Gene_3" "Gene_4" ...
..$ : NULL
3.1.2 Library size factor
As detailed in Section 2, EBSeq requires the library size factor ls for each samples. Here, ls may be obtained via the function MedianNorm, which reproduces themedian normalization approach in DESeq [1].
> Sizes=MedianNorm(GeneMat)
If quantile normalization is preferred, ls may be obtained via the functionQuantileNorm. (e.g. QuantileNorm(GeneMat,.75) for Upper-Quantile Nor-malization in [2])
6
3.1.3 Running EBSeq on gene expression estimations
The function EBTest is used to detect DE genes. For gene-level data, we donβtneed to specify the parameter NgVector since there are no differences in Igstructure among the different genes. Here, we simulated the first five lanes tobe in condition 1 and the other five in condition 2, so define:
Conditions=as.factor(rep(c("C1","C2"),each=5))
sizeFactors is used to define the library size factor of each lane. It could beobtained by summing up the total number of reads within each lane, MedianNormalization [1], scaling normalization [5], Up-Quantile Normalization [2], orsome other such approach. These in hand, we run the EM algorithm, settingthe number of iterations to five via maxround=5 for demonstration purposes.However, we note that in practice, additional iterations may be required. Con-vergence should always be checked (see Section 4.1.3 for details). Please notethis may take several minutes:
> EBOut=EBTest(Data=GeneMat,
+ Conditions=as.factor(rep(c("C1","C2"),each=5)),sizeFactors=Sizes, maxround=5)
iteration 1 done
time 25.035
iteration 2 done
time 14.094
iteration 3 done
time 13.651
iteration 4 done
time 13.198
iteration 5 done
time 11.716
The posterior probabilities of being DE are obtained as follows, where PP is amatrix containing the posterior probabilities of being EE or DE for each of the10,000 simulated genes:
> PP=GetPPMat(EBOut)
> str(PP)
num [1:10000, 1:2] 0 0 0 0 0 ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:10000] "Gene_1" "Gene_2" "Gene_3" "Gene_4" ...
..$ : chr [1:2] "PPEE" "PPDE"
> head(PP)
PPEE PPDE
Gene_1 0.000000e+00 1
Gene_2 0.000000e+00 1
Gene_3 0.000000e+00 1
7
Gene_4 0.000000e+00 1
Gene_5 0.000000e+00 1
Gene_6 3.289292e-10 1
The matrix PP contains two columns PPEE and PPDE, corresponding to the pos-terior probabilities of being EE or DE for each gene. PP may be used to forman FDR-controlled list of DE genes with a target FDR of 0.05 as follows:
> DEfound=rownames(PP)[which(PP[,"PPDE"]>=.95)]
> str(DEfound)
chr [1:991] "Gene_1" "Gene_2" "Gene_3" "Gene_4" "Gene_5" ...
EBSeq found 991 DE genes in total with target FDR 0.05.
3.2 Isoform Level DE Analysis (Two Conditions)
3.2.1 Required inputs
Data: The object Data should be a I β by β S matrix containing theexpression values for each isoform and each lane, where I is the number ofisoforms and S is the number of lanes. Again, these values should exhibit rawdata, without normalization across samples.
Conditions: The object Conditions should be a vector with length S toindicate the condition of each sample.
IsoformNames: The object IsoformNames should be a vector with length Ito indicate the isoform names.
IsosGeneNames: The object IsosGeneNames should be a vector with lengthI to indicate the gene name of each isoform. (in the same order asIsoformNames.)
IsoList contains 6,000 simulated isoforms. In which IsoList$IsoMat is adata matrix containing 6,000 rows of isoforms and 10 columns of samples;IsoList$IsoNames contains the isoform names; IsoList$IsosGeneNames con-tains the names of the genes the isoforms belong to.
> data(IsoList)
> str(IsoList)
List of 3
$ IsoMat : num [1:6000, 1:10] 176 789 1300 474 1061 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:6000] "Iso_1_1" "Iso_1_2" "Iso_1_3" "Iso_1_4" ...
.. ..$ : NULL
$ IsoNames : chr [1:6000] "Iso_1_1" "Iso_1_2" "Iso_1_3" "Iso_1_4" ...
$ IsosGeneNames: chr [1:6000] "Gene_1" "Gene_2" "Gene_3" "Gene_4" ...
8
> IsoMat=IsoList$IsoMat
> str(IsoMat)
num [1:6000, 1:10] 176 789 1300 474 1061 ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:6000] "Iso_1_1" "Iso_1_2" "Iso_1_3" "Iso_1_4" ...
..$ : NULL
> IsoNames=IsoList$IsoNames
> IsosGeneNames=IsoList$IsosGeneNames
3.2.2 Library size factor
Similar to the gene-level analysis presented above, we may obtain the isoform-level library size factors via MedianNorm:
> IsoSizes=MedianNorm(IsoMat)
3.2.3 The Ig vector
Since EBSeq fits rely on Ig, we need to obtain the number of isoforms the hostgene contains (Ng) for each isoform therefore obtain the Ig groups. This can bedone using the function GetNg. The required inputs of GetNg are the isoformnames (IsoformNames) and their corresponding gene names (IsosGeneNames).
> NgList=GetNg(IsoNames, IsosGeneNames)
> IsoNgTrun=NgList$IsoformNgTrun
> IsoNgTrun[c(1:3,1001:1003,3001:3003)]
Iso_1_1 Iso_1_2 Iso_1_3 Iso_2_1 Iso_2_2 Iso_2_3 Iso_3_1 Iso_3_2 Iso_3_3
1 1 1 2 2 2 3 3 3
More details could be found in section 4.2.
3.2.4 Running EBSeq on isoform expression estimations
The EBTest function is also used to run EBSeq for two condition comparisonson isoform-level data. Below we use 5 iterations to demonstrate. However, asin the gene level analysis, we advise that additional iterations may be requiredin practice (see Section 4.2.4 for details).
> IsoEBOut=EBTest(Data=IsoMat, NgVector=IsoNgTrun,
+ Conditions=as.factor(rep(c("C1","C2"),each=5)),sizeFactors=IsoSizes, maxround=5)
iteration 1 done
time 69.102
iteration 2 done
time 23.495
iteration 3 done
9
time 14.768
iteration 4 done
time 12.473
iteration 5 done
time 12.403
> IsoPP=GetPPMat(IsoEBOut)
> str(IsoPP)
num [1:6000, 1:2] 0 0 0 0 0 ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:6000] "Iso_1_1" "Iso_1_2" "Iso_1_3" "Iso_1_4" ...
..$ : chr [1:2] "PPEE" "PPDE"
> head(IsoPP)
PPEE PPDE
Iso_1_1 0 1
Iso_1_2 0 1
Iso_1_3 0 1
Iso_1_4 0 1
Iso_1_5 0 1
Iso_1_6 0 1
> IsoDE=rownames(IsoPP)[which(IsoPP[,"PPDE"]>=.95)]
> str(IsoDE)
chr [1:534] "Iso_1_1" "Iso_1_2" "Iso_1_3" "Iso_1_4" "Iso_1_5" ...
We see that EBSeq found 534 DE isoforms at the target FDR of 0.05.
3.3 Working on gene expression estimations with morethan two conditions
The object MultiGeneMat is a matrix containing 1,000 simulated genes with 6samples: the first two samples are from condition 1; the second and the thirdsample are from condition 2; the last two samples are from condition 3.
> data(MultiGeneMat)
> str(MultiGeneMat)
num [1:1000, 1:6] 411 1652 268 1873 768 ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:1000] "Gene_1" "Gene_2" "Gene_3" "Gene_4" ...
..$ : NULL
In analyses where the data are spread over more than two conditions, the set ofpossible patterns for each gene is more complicated than simply EE and DE. As
10
noted in Section 2, when we have 3 conditions, there are 5 expression patternsto consider. In the simulated data, we have 6 samples, 2 in each of 3 conditions.The function GetPatterns allows the user to generate all possible patterns giventhe conditions. For example:
> Conditions=c("C1","C1","C2","C2","C3","C3")
> PosParti=GetPatterns(Conditions)
> PosParti
C1 C2 C3
Pattern1 1 1 1
Pattern2 1 1 2
Pattern3 1 2 1
Pattern4 1 2 2
Pattern5 1 2 3
where the first row means all three conditions have the same latent mean ex-pression level; the second row means C1 and C2 have the same latent meanexpression level but that of C3 is different; and the last row corresponds to thecase where the three conditions all have different latent mean expression lev-els. The user may use all or only some of these possible patterns as an inputto EBMultiTest (more on this function presently). For example, if we wereinterested in Patterns 1, 2, 4 and 5 only, weβd define:
> Parti=PosParti[-3,]
> Parti
C1 C2 C3
Pattern1 1 1 1
Pattern2 1 1 2
Pattern4 1 2 2
Pattern5 1 2 3
Moving on to the analysis, MedianNorm or one of its competitors should beused to determine the normalization factors. Once this is done, the formal testis performed by EBMultiTest.
> MultiSize=MedianNorm(MultiGeneMat)
> MultiOut=EBMultiTest(MultiGeneMat,NgVector=NULL,Conditions=Conditions,
+ AllParti=Parti, sizeFactors=MultiSize, maxround=5)
iteration 1 done
time 14.325
iteration 2 done
time 5.876
iteration 3 done
time 6.345
iteration 4 done
11
time 3.03800000000001
iteration 5 done
time 2.798
The posterior probobility of being in each pattern for every gene is obtained byusing the function GetMultiPP:
> MultiPP=GetMultiPP(MultiOut)
> names(MultiPP)
[1] "PP" "MAP" "Patterns"
> MultiPP$PP[1:10,]
Pattern1 Pattern2 Pattern4 Pattern5
Gene_1 1.667196e-95 0.4474787 5.035574e-73 0.55252128
Gene_2 3.110967e-20 0.9697363 2.313379e-19 0.03026369
Gene_3 2.082947e-159 0.9679974 1.603355e-105 0.03200255
Gene_4 1.207559e-252 0.5357814 3.144834e-185 0.46421856
Gene_5 6.347984e-27 0.9371062 1.550532e-20 0.06289380
Gene_6 2.586284e-18 0.8383464 1.363798e-19 0.16165359
Gene_7 0.000000e+00 0.4467407 0.000000e+00 0.55325926
Gene_8 2.192617e-16 0.9850391 5.905463e-17 0.01496091
Gene_9 1.180902e-15 0.9453487 3.636282e-15 0.05465126
Gene_10 8.208183e-72 0.9219273 1.217895e-67 0.07807272
> MultiPP$MAP[1:10]
Gene_1 Gene_2 Gene_3 Gene_4 Gene_5 Gene_6 Gene_7
"Pattern5" "Pattern2" "Pattern2" "Pattern2" "Pattern2" "Pattern2" "Pattern5"
Gene_8 Gene_9 Gene_10
"Pattern2" "Pattern2" "Pattern2"
> MultiPP$Patterns
C1 C2 C3
Pattern1 1 1 1
Pattern2 1 1 2
Pattern4 1 2 2
Pattern5 1 2 3
where MultiPP$PP provides the posterior probobility of being in each pattern forevery gene. MultiPP$MAP provides the most likely pattern of each gene basedon the posterior probabilities. MultiPP$Patterns provides the details of thepatterns.
12
3.4 Working on isoform expression estimations with morethan two conditions
Similar to IsoList The object IsoMultiList is a object containing the isoformexpression estimations matrix, isoform names and gene names of isoformsβ hostgenes. IsoMultiList$IsoMultiMat contains 1,000 simulated isoforms with 8samples. the first two samples are from condition 1; the second and the thirdsample are from condition 2; the fifth and sixth sample are from condition 3;the last two samples are from condition 4. Similar as in section 3.2, functionMedianNorm and GetNg could used for normalization and calculating Ngβs.
> data(IsoMultiList)
> IsoMultiMat=IsoMultiList[[1]]
> IsoNames.Multi=IsoMultiList$IsoNames
> IsosGeneNames.Multi=IsoMultiList$IsosGeneNames
> IsoMultiSize=MedianNorm(IsoMultiMat)
> NgList.Multi=GetNg(IsoNames.Multi, IsosGeneNames.Multi)
> IsoNgTrun.Multi=NgList.Multi$IsoformNgTrun
> Conditions=c("C1","C1","C2","C2","C3","C3","C4","C4")
Here we have 4 conditions, there are 15 expression patterns to consider. Thefunction GetPatterns allows the user to generate all possible patterns given theconditions. For example:
> PosParti.4Cond=GetPatterns(Conditions)
> PosParti.4Cond
C1 C2 C3 C4
Pattern1 1 1 1 1
Pattern2 1 1 1 2
Pattern3 1 1 2 1
Pattern4 1 1 2 2
Pattern5 1 2 1 1
Pattern6 1 2 1 2
Pattern7 1 2 2 1
Pattern8 1 2 2 2
Pattern9 1 1 2 3
Pattern10 1 2 1 3
Pattern11 1 2 2 3
Pattern12 1 2 3 1
Pattern13 1 2 3 2
Pattern14 1 2 3 3
Pattern15 1 2 3 4
If we were interested in Patterns 1, 2, 3, 8 and 15 only, weβd define:
> Parti.4Cond=PosParti.4Cond[c(1,2,3,8,15),]
> Parti.4Cond
13
C1 C2 C3 C4
Pattern1 1 1 1 1
Pattern2 1 1 1 2
Pattern3 1 1 2 1
Pattern8 1 2 2 2
Pattern15 1 2 3 4
Moving on to the analysis, EBMultiTest could be used to porferm the test:
> IsoMultiOut=EBMultiTest(IsoMultiMat,
+ NgVector=IsoNgTrun.Multi,Conditions=Conditions,
+ AllParti=Parti.4Cond, sizeFactors=IsoMultiSize,
+ maxround=5)
iteration 1 done
time 24.537
iteration 2 done
time 24.518
iteration 3 done
time 9.322
iteration 4 done
time 10.424
iteration 5 done
time 8.94400000000002
The posterior probobility of being in each pattern for every gene is obtained byusing the function GetMultiPP:
> IsoMultiPP=GetMultiPP(IsoMultiOut)
> names(MultiPP)
[1] "PP" "MAP" "Patterns"
> IsoMultiPP$PP[1:10,]
Pattern1 Pattern2 Pattern3 Pattern8 Pattern15
Iso_1_1 4.491089e-34 0.999067581 1.357994e-33 2.728660e-34 9.324189e-04
Iso_1_2 2.645937e-14 0.998132248 3.172235e-15 2.816801e-16 1.867752e-03
Iso_1_3 1.270054e-43 0.978252880 4.196709e-38 6.630693e-45 2.174712e-02
Iso_1_4 1.585237e-36 0.994599329 6.156303e-30 1.145819e-32 5.400671e-03
Iso_1_5 0.000000e+00 1.000000000 0.000000e+00 0.000000e+00 2.830424e-41
Iso_1_6 4.942113e-228 0.003241202 5.563985e-214 1.657836e-183 9.967588e-01
Iso_1_7 7.900232e-115 0.997756178 3.895311e-110 2.050441e-105 2.243822e-03
Iso_1_8 5.430600e-130 0.851933966 1.834973e-96 1.987346e-111 1.480660e-01
Iso_1_9 4.182136e-45 0.804797880 5.085895e-47 3.408432e-42 1.952021e-01
Iso_1_10 6.948051e-08 0.997979087 3.026767e-08 5.123101e-08 2.020762e-03
> IsoMultiPP$MAP[1:10]
14
Iso_1_1 Iso_1_2 Iso_1_3 Iso_1_4 Iso_1_5 Iso_1_6
"Pattern2" "Pattern2" "Pattern2" "Pattern2" "Pattern2" "Pattern15"
Iso_1_7 Iso_1_8 Iso_1_9 Iso_1_10
"Pattern2" "Pattern2" "Pattern2" "Pattern2"
> IsoMultiPP$Patterns
C1 C2 C3 C4
Pattern1 1 1 1 1
Pattern2 1 1 1 2
Pattern3 1 1 2 1
Pattern8 1 2 2 2
Pattern15 1 2 3 4
where MultiPP$PP provides the posterior probobility of being in each pattern forevery gene. MultiPP$MAP provides the most likely pattern of each gene basedon the posterior probabilities. MultiPP$Patterns provides the details of thepatterns.
15
4 More detailed examples
4.1 Gene Level DE Analysis (Two Conditions)
4.1.1 Running EBSeq on simulated gene expression estimations
EBSeq is applied as described in Section 3.1.3.
> data(GeneMat)
> Sizes=MedianNorm(GeneMat)
> EBOut=EBTest(Data=GeneMat,
+ Conditions=as.factor(rep(c("C1","C2"),each=5)),sizeFactors=Sizes, maxround=5)
iteration 1 done
time 24.734
iteration 2 done
time 13.912
iteration 3 done
time 13.477
iteration 4 done
time 12.976
iteration 5 done
time 11.546
> PP=GetPPMat(EBOut)
> str(PP)
num [1:10000, 1:2] 0 0 0 0 0 ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:10000] "Gene_1" "Gene_2" "Gene_3" "Gene_4" ...
..$ : chr [1:2] "PPEE" "PPDE"
> head(PP)
PPEE PPDE
Gene_1 0.000000e+00 1
Gene_2 0.000000e+00 1
Gene_3 0.000000e+00 1
Gene_4 0.000000e+00 1
Gene_5 0.000000e+00 1
Gene_6 3.289292e-10 1
> DEfound=rownames(PP)[which(PP[,"PPDE"]>=.95)]
> str(DEfound)
chr [1:991] "Gene_1" "Gene_2" "Gene_3" "Gene_4" "Gene_5" ...
EBSeq found 991 DE genes for a target FDR of 0.05.
16
4.1.2 Calculating FC
The function PostFC could be used to calculate the Fold Change (FC) as wellas the posterior FC on the normalization factor adjusted data. Figure 2 shows
> GeneFC=PostFC(EBOut)
> str(GeneFC)
List of 3
$ PostFC : Named num [1:10000] 0.239 0.244 4.152 4.238 3.927 ...
..- attr(*, "names")= chr [1:10000] "Gene_1" "Gene_2" "Gene_3" "Gene_4" ...
$ RealFC : Named num [1:10000] 0.239 0.24 4.154 4.305 3.942 ...
..- attr(*, "names")= chr [1:10000] "Gene_1" "Gene_2" "Gene_3" "Gene_4" ...
$ Direction: chr "C1 Over C2"
> PlotPostVsRawFC(EBOut,GeneFC)
ββ
ββββ
ββ
β
ββ
β
ββ
β
β
β
β
β
βββ
βββ
β
β
ββ
β
β
β
β
β
β
β
β
βββ
βββ
β
β
β
β
ββ
ββββ
β
β
β
β
β
β
β
β
ββ
β
ββ
β
βββ
β
β
β
β
β
β
ββ
ββ
β
β
ββ
β
β
ββ
β
β
βββ
β
βββ
βββ
βββ
β
β
βββββ
β
β
ββ
ββ
ββ
β
β
β
β
β
ββ
ββ
ββ
β
ββ
β
β
β
β
ββ
β
ββ
β
ββββ
β
β
ββ
ββ
β
ββββ
β
β
β
β
β
β
β
β
β
β
β
β
ββββ
β
β
β
β
ββ
βββ
β
β
β
β
βββ
ββ
β
β
ββ
ββ
β
ββ
β
β
ββ
β
β
ββ
β
βββ
β
β
β
ββββ
β
β
ββ
β
β
β
β
β
ββ
βββ
ββ
β
β
β
β
βββ
βββ
ββ
β
βββββββ
β
β
β
β
β
β
ββ
β
β
ββ
βββ
ββ
β
β
β
ββ
ββ
β
ββ
β
ββ
β
β
ββ
β
βββ
β
ββ
ββ
β
β
β
ββ
βββββββ
ββ
β
β
β
β
β
ββ
β
β
β
βββ
ββ
β
ββ
β
ββ
β
β
βββ
ββ
ββ
ββ
ββ
βββ
ββ
ββ
βββ
β
ββ
β
β
ββ
ββββ
β
β
β
β
ββ
ββ
ββ
β
ββ
ββ
β
βββ
β
β
β
β
ββ
ββ
βββββ
β
β
β
ββ
ββ
β
ββ
β
β
β
β
βββ
ββ
ββ
ββ
ββ
ββ
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
β
βββ
β
β
ββ
β
β
βββ
β
ββ
β
β
ββ
β
β
ββ
β
β
β
β
ββ
β
β
β
β
β
βββ
ββ
β
β
β
β
βββ
β
β
β
ββ
β
β
ββ
β
β
ββ
β
ββ
βββ
ββββ
β
β
ββ
β
β
βββ
β
β
β
β
ββ
β
β
β
ββ
βββ
β
ββ
β
βββ
β
βββ
β
β
β
ββ
β
β
β
ββ
β
ββ
β
ββ
β
ββββ
β
β
β
ββ
βββ
β
β
β
β
ββ
β
β
β
β
ββ
ββ
β
β
β
β
βββ
β
β
β
β
β
βββ
β
β
β
β
ββ
βββ
β
βββ
β
β
βββ
β
βββ
ββ
β
β
β
ββ
β
β
ββ
βββ
β
ββ
β
ββ
β
βββββ
β
ββββ
β
ββ
β
β
ββ
β
βββ
ββ
ββ
β
βββ
β
βββ
β
β
βββ
βββ
β
β
β
β
βββββ
ββββ
ββββββ
β
β
ββ
ββ
β
β
β
β
β
ββ
β
β
βββ
β
ββ
β
β
ββ
ββ
β
βββββ
βββ
β
ββ
ββ
β
β
ββ
β
ββββ
β
ββββ
β
βββ
β
β
β
β
β
β
β
β
ββ
β
ββ
β
β
β
β
β
β
β
ββββ
β
ββββ
β
β
ββ
β
β
ββ
β
β
ββ
β
βββ
ββββ
ββ
βββ
ββ
ββ
β
ββ
ββ
β
β
β
β
ββ
ββ
ββ
β
β
ββ
β
β
β
β
β
β
β
β
β
ββ
βββ
ββ
β
β
β
β
βββ
β
β
β
β
β
β
ββ
ββ
β
ββ
β
β
ββ
β
β
β
βββ
ββ
β
ββ
βββ
β
ββ
β
β
β
βββ
β
β
βββ
β
ββ
β
β
β
β
β
ββ
βββ
β
β
β
β
β
ββ
β
βββ β
β
ββ
ββ
β
β
ββββ
β
β
ββββ
β
βββ
βββ
ββ
β
β
β
β
βββ
β
β
ββ
β
β
ββ
β
β
β
β
ββ
β
ββ
ββ
β
β
β
βββ
ββ
βββ
ββ
βββββ
ββ
β
β
β
β
β
ββ
ββββββ
βββββββββββββ
ββ
ββββ
ββββββ
βββββ
β
βββ
ββ
β
βββββ
βββββββ
βββββββ
βββ
β
βββ
βββββ
ββββββ
ββ
βββββ
ββ
β
ββ
β
ββββ
ββ
β
β
ββββ
ββββ
β
β
ββ
ββββ
ββββ
β
β
β
β
β
ββ
ββ
ββββββββββ
β
β
βββββββ
β
β
β
β
ββββββββ
ββββββ
ββ
ββββββ
β
ββββββββ
β
ββ
ββββ
β
β
β
ββ
βββββββββββ
ββ
βββ
βββ
βββ
βββ
β
βββ
βββββββ
β
βββ
βββββ
ββββββββββ
β
ββββββ
β
ββ
ββββ
βββ
β
ββ
β
ββ
ββ
βββββ
ββ
βββββ
βββββ
βββ
βββββββββββ
β
ββββ
βββββββ
β
ββββ
βββββ
ββββ
βββ
β
ββ
β
β
ββββββ
ββ
β
ββ
ββββββββ
ββββ
βββ
β
ββββββ
βββ
ββββ
βββββββ
β
ββββββ
β
ββ
ββββ
ββ
ββββ
ββ
ββ
β
ββ
ββ
β
βββ
βββ
βββ
βββββββββββββββ
βββββ
ββ
ββ
ββββββββββ
ββββ
ββββββ
β
ββββββ
β
β
βββ
β
ββββββββ
β
βββ
β
ββ
ββββββ
β
ββ
βββββ
β
ββ
β
ββββ
βββ
β
βββ
ββ
βββ
ββββ
ββββββββββββββ
ββ
ββββ
ββββ
ββ
ββββββββββββ
βββββ
ββ
βββ
β
ββ
ββ
ββββ
ββββ
ββ
βββ
ββββββ
ββββββββββ
β
βββ
ββ
βββββββ
β
ββββββββββ
ββ
ββββββ
βββββ
βββββ
βββββββββββ
βββ
ββ
ββββ
ββββ
β
β
β
ββββββββββ
βββββ
βββ
ββββββββ
ββββββββββ
β
β
β
ββ
ββββ
ββββ
ββββ
βββ
ββ
ββββ
βββββ
β
βββββββββββ
β
β
β
βββββ
β
ββ
ββββ
ββ
βββββ
β
ββββββ
β
βββ
βββββββββ
ββ
β
ββββ
β
ββββββββββ
β
ββββ
βββ
βββ
β
βββββββ
ββββββ
ββββββ
β
β
ββ
β
ββ
ββ
ββ
β
β
β
βββ
βββ
ββββββ
ββββ
β
βββββββββββ
βββ
β
ββ
βββββββ
βββββββββββ
βββ
βββββ
β
ββββ
ββββββββββββ
β
ββ
ββ
βββββββββββββ
β
β
βββββββ
βββ
ββββ
β
βββ
ββ
β
ββββββ
β
βββ
β
βββ
β
βββββββββ
βββ
ββ
ββββ
βββ
β
β
ββ
βββ
βββ
β
β
βββ
β
ββββ
β
ββββ
β
βββββ
βββ
β
ββββββ
βββββ
β
β
ββββ
β
βββββββ
ββββ
ββββ
βββββ
ββ
βββββββ
ββ
ββββ
β
β
ββ
β
ββββββ
ββ
βββββ
β
βββ
β
ββ
β
ββ
β
ββββ
ββββββ
β
ββββββββ
βββ
β
β
β
ββ
ββββββ
β
βββββββ
βββββ
ββββββ
βββββ
ββ
β
ββ
β
βββββ
βββββββββββ
β
ββββββββ
βββ
βββ
βββ
ββββββββ
β
βββββββ
βββββ
ββ
βββ
ββ
β
ββββ
βββββββ
βββββ
βββββ
βββ
βββββββ
βββββββββ
ββ
β
ββββ
ββββββ
ββ
β
βββββββ
βββββββ
ββββββ
βββ
ββ
β
ββββββββββββββββ
ββ
βββ
βββ
βββ
ββββββββ
β
ββββ
β
β
βββββ
β
βββ
β
ββββββ
βββββ
ββ
βββ
β
β
β
βββββ
ββ
β
ββββ
ββ
ββββββββββββββ
ββββββ
ββββββββββ
ββ
β
β
βββββ
β
β
β
ββ
ββββ
β
ββββ
βββ
β
ββββββ
β
βββββ
βββββββ
β
βββββββ
β
βββββ
β
ββββ
β
ββ
β
β
ββββ
ββββββββββββββ
βββββ
βββ
ββ
β
ββ
βββ
βββ
ββββββ
β
ββ
ββ
ββ
β
βββββββββββββββ
ββ
ββ
β
ββ
ββ
ββ
β
βββ
β
βββββββ
ββ
ββ
ββββ
ββ
ββ
ββ
ββ
β
β
ββββββββ
ββββ
βββββββ
β
ββββββββ
ββ
β
β
β
ββββββ
β
ββ
ββββββ
ββ
ββ
βββββ
β
β
βββ
βββ
ββββ
βββ
βββ
ββββ
βββββ
βββ
β
βββββββ
βββββββ
β
ββββ
ββββββ
ββ
ββ
βββββββ
ββββ
βββββ
ββ
ββββββββ
βββββββ
ββββββββββ
ββ
β
βββ
βββ
ββββββββ
βββββββββ
βββ
β
ββ
β
β
β
ββ
β
β
β
βββββ
ββ
ββ
βββββ
βββββββββββ
βββββββ
β
ββββ
β
βββ
ββββ
β
βββββ
βββββ
ββββ
ββββ
β
β
β
ββββ
ββββ
ββββββββββββ
βββββ
β
ββ
β
βββββ
ββ
β
βββ
ββ
β
ββββββ
ββββ
β
ββ
β
β
ββββ
β
ββββββ
β
β
ββββββββ
β
βββ
β
ββ
βββββββ
β
β
β
βββββ
ββ
β
ββββββ
βββ
ββ
βββ
ββ
β
ββββ
βββ
ββ
βββ
ββββ
ββββββββββββββββββ
β
ββ
ββ
ββββββββ
β
ββββ
β
βββββ
ββββββ
ββ
ββββ
β
βββ
ββββββ
ββββββββββββ
β
ββ
βββββ
βββββ
ββ
ββββββββ
ββββ
ββ
β
ββ
βββββ
βββ
β
βββββββββββββββββββ
ββ
βββββββββ
β
ββ
ββββββ
βββ
βββββ
β
βββββ
β
ββ
βββββββββββ
β
ββ
ββββββ
β
ββββββ
ββ
βββββββ
βββββ
ββ
ββββββ
β
βββ
β
β
β
βββ
ββ
βββββββ
β
βββ
β
β
β
β
βββ
ββββββ
βββββββ
ββββββ
ββββββ
ββββββ
ββ
β
βββ
ββββββββ
ββββ
β
βββββββ
ββ
β
ββ
β
ββ
β
ββ
β
ββββββ
ββββββ
βββββββ
ββββββββββ
βββββ
βββ
βββ
β
β
β
ββ
β
βββ
βββββββ
ββ
βββ
β
βββ
β
ββ
β
ββββββββ
β
ββββ
ββββ
βββ
βββ
βββ
βββββββββββββ
ββββ
β
ββ
β
βββ
ββββ
βββββββββ
βββ
ββββ
β
ββ
ββ
ββββββ
ββ
ββββ
β
β
ββββ
βββ
βββ
ββββββ
β
βββ
ββββ
βββββ
β
ββββββββββββββ
ββββββββββββ
ββββ
βββ
β
βββββ
βββββ
β
β
β
ββ
ββββ
β
βββββ
βββ
β
ββ
β
ββββ
ββ
βββββ
ββββββ
β
β
βββ
βββ
βββ
ββ
βββ
β
βββ
ββ
ββββββ
ββ
βββββββ
ββ
β
ββ
βββββ
βββ
ββ
β
β
β
ββ
ββ
β
ββ
β
ββ
β
βββ
β
βββββ
β
ββββ
βββββ
ββ
β
βββββββββ
βββββββ
ββββββββ
ββ
β
ββ
β
βββββ
β
β
βββββββ
β
ββββββββ
βββββ
β
ββββββ
ββββ
ββ
β
βββ
β
ββ
ββ
ββββ
βββ
β
βββββ
ββ
ββ
β
β
βββ
ββββββββ
ββ
ββββ
βββ
βββββ
βββββ
ββββββββββ
ββ
βββββ
ββ
βββ
βββββ
β
ββββ
ββ
ββββ
βββββββ
β
ββ
ββ
βββ
β
βββ
ββββ
ββ
βββββ
β
ββββ
β
βββ
ββ
βββββ
ββ
β
β
βββββ
ββ
βββ
ββ
ββ
βββββ
β
β
ββββββββββ
β
βββββββββββ
β
βββ
ββββ
β
β
β
ββββββ
βββββ
ββ
β
ββββββββββββββ
ββ
ββββ
βββ
ββββββββββ
βββ
β
βββββββ
β
βββ
β
ββ
β
β
βββββββ
ββββ
ββ
βββ
ββββ
β
ββ
ββ
β
βββ
ββββ
β
ββ
β
ββββββ
β
ββ
β
βββ
β
β
βββ
βββ
β
ββ
β
βββββββββββ
β
ββββ
ββ
β
βββ
βββ
β
βββ
ββββββββββ
βββ
ββ
βββββββββββ
β
βββ
βββ
β
ββββ
ββββ
βββββ
ββββ
βββ
β
ββββ
βββ
β
β
β
β
β
βββββββ
ββββ
βββ
ββ
ββββ
βββ
βββββββ
βββββ
ββ
ββββββββββ
βββββββββ
ββ
βββ
β
ββ
ββ
ββββββββββ
β
β
βββββ
βββββ
βββ
ββββββ
βββ
β
βββββ
ββ
βββ
βββββ
β
ββββ
ββ
β
ββββββ
β
ββββ
ββ
ββββββββββ
βββββββββ
ββββ
ββ
ββ
βββ
β
β
ββββ
β
βββ
ββββ
ββ
βββββ
β
βββ
βββ
β
βββββ
β
ββ
ββββββ
βββ
β
βββ
ββ
ββ
β
β
βββ
ββ
β
βββ
βββ
β
β
ββ
β
β
β
βββββ
β
ββ
β
ββ
βββββ
β
ββ
ββ
ββββ
βββ
βββ
β
βββββββββ
ββ
ββββ
ββ
β
β
βββ
β
ββββ
ββ
βββ
βββ
ββββββ
β
βββββββ
ββββ
β
ββββββββ
β
β
β
β
ββββββββ
ββ
βββββ
ββββ
β
ββββ
β
β
βββββββ
ββ
ββββ
β
β
ββββ
β
βββββ
β
ββ
ββ
βββββ
β
ββββ
β
βββ
β
ββ
ββ
β
β
β
βββ
βββ
ββ
ββ
βββββββββ
βββ
β
β
βββ
βββ
β
ββ
ββ
βββ
ββ
βββββ
βββ
ββ
β
ββ
ββ
βββ
βββββββββββββ
ββ
β
ββ
β
β
β
βββββββ
ββ
ββ
βββββββ
β
βββββ
ββ
ββββ
ββ
β
β
βββ
βββ
β
βββββββ
βββ
β
β
ββ
β
ββ
β
βββββββ
ββββ
ββββ
βββ
ββββ
ββ
βββββββ
ββββββββ
ββ
β
βββββ
β
βββ
ββββββββββ
ββ
β
β
β
βββββ
βββββ
βββ
βββ
βββββββ
ββ
β
βββ
β
β
βββββ
βββββ
β
βββ
β
ββ
βββ
ββββββ
βββ
ββ
β
βββββ
ββββ
ββ
ββββ
ββββ
βββ
βββββ
βββ
β
βββββ
βββββ
βββββββββ
βββββ
ββ
ββββββββββββ
ββ
ββ
βββββββ
β
βββββββ
β
βββββββββ
β
β
βββ
βββββ
βββ
ββ
ββ
βββββ
ββ
ββββββ
β
ββ
ββββββββ
ββ
β
ββββββ
ββββββββββ
ββββ
β
βββββ
ββ
ββ
β
βββββ
ββ
βββββββββββββ
ββ
βββ
β
ββββ
ββ
β
ββββββ
ββ
ββββ
βββββ
βββ
β
βββ
β
βββ
ββββββ
βββ
βββ
β
ββ
β
ββ
ββ
ββββ
ββ
β
βββ
ββ
βββββββββββββ
βββ
ββββ
ββββββ
βββββ
β
ββββββ
β
βββ
ββ
β
ββ
β
βββββ
βββ
β
βββ
β
βββββββββββββββ
ββββ
ββ
β
β
ββ
β
β
ββββ
β
ββββ
ββββ
ββββββ
ββ
βββ
β
βββββ
βββ
βββ
ββ
ββββββββββββββ
ββββββββββ
ββββββ
β
βββββ
ββ
ββββ
βββ
βββββ
ββββ
βββ
β
ββββ
βββ
β
β
ββββ
β
β
ββββ
β
β
βββ
βββ
β
β
ββββ
βββ
βββββ
β
ββ
β
β
ββββββββ
βββ
βββββββββ
ββ
β
ββββ
βββββββ
βββ
β
ββββ
β
βββ
βββ
ββββ
ββββ
ββ
β
ββ
β
ββββββββββ
ββββββββ
βββββ
β
β
βββ
βββ
ββ
βββ
ββ
ββ
ββββββ
ββ
βββββ
ββ
βββββββββββ
βββββ
ββββββ
β
β
βββ
ββ
βββ
β
ββββββββ
β
ββ
ββ
ββ
β
βββββ
ββ
βββββββ
βββ
ββββββββ
ββ
ββ
βββ
βββ
β
ββββββ
ββββ
βββ
ββ
βββ
β
ββββββ
ββ
β
βββ
ββββββ
ββ
ββββββ
βββ
ββββ
ββ
βββ
βββ
ββ
β
ββ
βββββ
ββ
βββββ
ββββββββββ
ββ
βββββββββββ
ββββ
βββββ
βββββ
βββ
β
β
βββββ
ββββββ
βββββ
ββββ
ββ
ββ
ββββ
ββββββ
ββ
ββ
β
β
ββ
βββββββ
βββ
ββββ
β
ββββ
β
ββββ
β
ββββ
ββ
βββ
ββ
ββ
βββββ
β
βββ
β
βββ
β
β
ββ
β
βββββ
βββββ
βββββ
ββ
βββ
ββ
ββββ
βββββββ
ββ
ββ
β
ββ
ββββββ
ββββββββ
ββββββ
ββββββ
ββ
β
ββ
βββ
ββββ
β
β
ββ
β
βββββββ
βββββββ
βββββ
ββ
β
βββββββββ
β
β
βββββββ
βββββββ
βββ
β
β
ββββββββββ
βββββ
ββββ
ββββββββ
ββ
ββ
βββ
ββ
ββ
ββββ
ββ
βββ
ββ
β
ββββ
ββ
ββ
βββββββ
βββ
β
β
ββββ
βββ
βββββββ
β
ββ
βββββ
ββ
β
β
ββ
β
β
ββββββββββββββ
ββ
βββββ
ββββ
βββββββ
β
ββββ
β
βββββ
ββββ
β
ββ
β
βββ
ββ
βββββ
ββββ
ββββββ
ββ
βββ
ββ
β
β
ββ
ββ
ββ
βββ
βββ
β
β
β
β
βββ
β
βββ
βββββ
ββ
βββ
ββ
βββββ
β
βββ
βββββ
βββ
ββββββ
ββββββββ
β
βββ
β
βββββ
β
ββ
βββ
βββββ
β
βββ
β
ββββββββββ
β
ββ
ββββ
ββ
ββ
βββ
βββ
ββββ
β
βββββ
ββ
ββββ
ββ
β
βββββ
β
ββ
βββ
β
ββββββββββββ
β
β
βββ
β
βββββββ
β
βββββββ
βββ
ββ
β
ββ
ββββββββ
ββββββ
ββ
β
βββ
βββ
β
βββββββ
β
ββββ
ββ
β
βββ
β
βββββ
β
ββββ
β
ββ
ββ
ββ
ββββββββββ
β
ββββββ
ββ
ββ
β
ββ
ββ
βββββββββββ
βββββββ
β
βββββ
β
ββββββ
β
ββ
ββ
ββ
ββββ
ββ
ββ
β
ββββββββ
ββββ
βββ
ββββ
β
βββ
βββ
ββββββ
β
ββ
β
ββββββ
ββ
βββ
β
β
ββββββ
β
β
ββ
β
ββ
ββββββ
ββββββββββ
ββ
β
βββ
β
ββββββ
βββββββ
βββ
βββ
ββ
β
β
βββββ
ββ
βββ
βββ
βββββ
ββββββ
β
ββ
βββ
βββββ
βββββ
βββ
βββ
ββ
βββ
β
ββββββββ
βββββ
β
β
ββββββ
β
β
βββ
βββ
ββ
ββ
βββ
β
βββ
ββ
ββββββ
ββ
ββ
βββββββββββ
β
β
β
ββββββ
β
ββββ
βββββ
βββββ
ββ
βββββββ
β
ββββββ
β
ββββ
β
βββ
βββ
ββββ
ββ
β
ββββ
β
βββββββ
β
β
ββ
ββββ
β
ββββββ
β
β
β
ββββ
βββ
ββββ
ββ
βββββ
β
ββββββ
ββ
ββββ
β
ββββ
βββ
βββ
ββββββββββββ
β
β
ββββββ
ββββ
ββββ
β
β
βββ
β
β
β
ββ
ββ
ββ
β
ββ
β
ββββββββ
β
β
ββ
βββ
β
β
βββββββ
βββ
ββ
β
ββββββββ
β
ββ
β
ββ
β
ββββ
β
ββ
βββββββββ
ββββ
ββββ
ββ
βββ
β
βββββ
ββββββββ
βββ
ββ
ββ
β
ββββ
β
ββ
β
ββ
βββ
ββ
β
ββ
ββ
βββββ
ββ
βββββββ
βββββ
β
βββββββ
β
β
ββββ
β
β
ββ
βββ
ββββ
ββ
ββ
ββ
βββ
β
β
βββ
βββ
ββββββ
β
β
ββ
ββββββ
βββ
β
ββ
β
βββ
ββββββ
βββββ
ββββββ
βββββ
β
βββ
βββββ
β
βββββββ
βββββββββ
βββ
β
β
ββ
βββ
ββββββ
β
βββββ
ββββββ
ββββ
βββ
βββ
ββ
βββββββββ
β
βββ
βββ
ββ
ββββββ
βββ
β
ββββ
ββββ
β
ββ
β
βββββββ
β
β
βββ
β
βββββ
βββββ
ββ
ββ
ββ
ββ
ββββ
βββββ
βββββββ
β
βββ
βββββ
β
βββββ
βββββ
β
ββ
ββ
β
β
β
βββββββ
ββ
ββ
β
βββ
β
βββ
ββββ
ββ
ββββββββββ
β
ββ
β
ββββ
ββ
β
ββββ
β
ββββ
β
ββββββ
β
ββββ
βββββββ
β
ββββ
ββββββββββββββββ
βββββββ
βββββββ
βββ
ββ
β
ββββββββββββ
ββ
β
βββββββββ
ββββ
β
ββ
β
βββββ
ββββββ
ββ
β
β
ββββββ
ββββββ
βββ
β
ββ
ββ
ββ
ββββββββ
βββ
ββββββ
β
ββββ
βββ
ββ
βββββ
βββββ
βββ
βββ
βββ
ββββ
ββ
ββ
βββββ
β
β
β
β
ββββββ
ββ
βββ
β
ββββ
β
ββ
βββ
ββββββ
βββββ
β
ββββββ
ββββ
ββ
β
βββ
ββββ
βββ
βββ
βββ
ββ
ββ
β
β
βββ
βββββ
ββ
ββ
β
ββββββββ
ββββββ
β
βββ
ββ
ββ
β
ββββ
ββ
ββββ
β
β
β
ββββ
β
βββ
ββββ
β
ββ
βββββ
βββ
ββ
β
β
β
ββ
ββ
β
β
ββββ
ββββ
β
βββ
βββ
ββ
βββββ
β
ββββ
β
ββ
ββ
βββ
ββββββ
ββ
β
β
βββββββ
ββ
ββββ
ββββββ
β
ββββββ
β
βββββββββββββββββ
ββββ
β
ββ
βββββββ
ββββ
ββ
ββββ
β
ββββββ
β
ββ
βββββββ
ββ
ββββββββββ
β
β
β
ββ
β
βββββββββ
ββββ
βββ
βββββββ
βββ
ββ
β
β
ββ
β
β
βββ
ββ
ββ
ββ
ββ
ββ
β
βββ
ββ
β
ββββββ
ββ
βββ
ββ
βββ
ββ
β
βββββββ
βββ
βββββββββ
ββββ
β
β
β
ββββ
β
ββββ
ββ
ββββ
ββ
βββ
ββ
βββ
ββββ
β
βββ
βββββ
βββββ
ββ
β
βββ
ββ
ββ
ββ
βββ
β
β
ββββββββ
ββ
β
ββ
βββββ
β
βββ
β
βββββ
βββ
βββ
βββ
ββββββββ
β
β
ββ
β
β
ββββ
β
ββββ
β
βββ
β
ββ
β
ββ
βββ
ββ
βββ
ββββββ
ββ
β
βββββ
βββ
β
β
βββ
β
βββ
β
ββ
β
β
β
βββββββββ
βββ
ββ
βββ
β
βββ
βββββ
βββββββ
ββββββββββ
ββ
ββββ
ββββββ
β
βββββ
β
ββββββ
βββ
ββ
βββββββ
β
βββββ
β
ββ
βββ
βββββββββ
βββ
βββββββββββββ
βββ
βββββ
β
ββ
βββββ
ββ
ββββ
βββ
ββββ
βββ
βββ
ββββ
ββ
ββββ
ββ
ββ
β
β
β
β
βββ
β
ββββββββ
βββββ
β
β
β
βββββ
β
ββββ
ββββ
βββββββ
β
βββ
βββ
ββ
β
β
ββ
βββββ
βββββββββββββ
ββββββββ
ββ
ββ
ββββββββ
ββββ
ββββ
ββββββ
βββββββββββββ
β
β
ββ
β
ββ
ββββ
βββββ
β
βββββ
βββ
βββββββββ
βββββββββ
ββ
β
ββββββββββββββββ
ββ
ββββ
β
β
ββ
β
β
βββββ
ββ
β
ββββ
ββββ
ββ
βββ
ββββ
ββ
β
βββββββββ
ββββ
βββββ
β
β
βββ
β
βββ
βββ
ββββ
βββββββββ
βββββββββββ
βββ
βββββββ
ββββββββββββββ
ββ
βββ
β
β
ββββββ
βββββββ
ββββββ
βββ
β
βββ
ββββ
βββ
ββββ
ββ
βββββββββββ
βββββ
ββ
βββ
ββββββ
ββββββ
β
ββ
βββ
ββββ
ββ
βββ
βββββββ
ββ
ββββ
βββ
ββ
βββ
β
ββ
β
ββββββ
ββββββββββ
β
ββββββββββ
ββ
βββββββ
ββ
ββ
β
β
β
βββββββββββ
β
ββ
ββββ
β
ββ
β
βββββββ
ββ
ββββββ
β
ββββββ
β
ββββ
ββ
ββββ
ββββββ
βββ
β
β
β
ββββ
β
ββ
ββ
β
ββ
β
βββ
ββ
βββ
ββββ
βββ
β
β
β
β
βββ
ββββ
β
ββββ
βββββ
β
ββββ
βββ
β
ββββββ
β
βββββ
ββ
ββ
β
β
ββ
βββββ
β
β
ββ
βββββββ
βββββββββββ
β
β
ββ
β
ββββ
β
ββββ
βββββββ
βββ
ββββ
β
ββ
ββββ
βββ
βββ
βββ
ββ
ββ
βββ
ββββββββββββββ
ββββ
β
βββ
ββ
β
βββββ
βββββββββ
ββββ
ββ
ββ
ββ
β
β
ββ
βββββββ
ββ
βββ
β
βββ
ββββββββββ
ββ
ββ
β
ββ
ββ
βββ
βββββββ
ββββββ
β
βββββββββ
ββ
ββββββββ
ββββ
βββ
β
βββββββ
ββ
βββββββββ
ββββ
β
β
βββ
β
β
βββ
ββββββ
β
β
β
βββββ
ββ
βββββ
ββββββββ
ββββ
β
ββ
β
ββββββββ
β
β
ββ
ββ
ββ
ββββ
βββββββββ
ββ
β
ββββ
βββββ
β
ββ
βββββββ
β
βββββ
β
βββ
βββ
βββ
β
β
ββ
βββββ
ββββ
ββ
βββββββββ
ββββ
βββ
ββ
ββββββββ
β
βββ
ββ
βββ
βββββ
ββββββ
βββ
ββ
β
ββββ
β
βββ
β
ββββββββββββββββ
ββ
βββββββ
ββββ
βββ
ββββββββββββββββ
ββ
ββββββ
βββββ
β
β
ββ
βββββββββ
ββ
ββ
βββ
ββ
β
ββββββββ
ββ
β
βββββ
β
β
ββββββ
β
βββββ
β
ββ
ββββββ
βββ
βββ
β
ββ
β
ββββ
β
ββββββββ
β
ββ
β
βββββββββββ
ββββββ
β
ββ
β
βββββ
ββ
βββ
βββββββββ
ββ
βββββ
βββ
βββ
ββββββ
βββ
ββββββ
βββ
ββββ
ββ
β
ββ
β
β
ββ
ββ
βββ
ββ
ββββ
ββ
ββ
βββββββ
β
β
β
βββ
βββββββ
β
ββββ
ββ
β
β
β
β
ββββββββββ
ββ
β
β
βββ
ββββ
ββββ
ββ
β
β
βββββ
βββ
βββ
βββββββ
ββββββββββ
ββ
β
βββ
β
β
βββ
β
ββββββββββ
βββββββ
β
ββ
β
ββββ
β
ββ
βββ
ββββββ
β
ββββ
ββ
ββ
β
βββ
ββ
β
β
β
βββ
ββββββ
βββ
βββββ
ββββββ
ββββββ
ββ
ββββββ
βββββ
β
ββββ
β
β
βββ
βββββββββ
βββ
βββ
ββββ
β
βββββ
β
βββββββ
βββ
β
ββ
βββ
β
β
βββββββββββ
βββββββ
βββ
ββ
β
β
β
β
βββ
β
ββββββ
β
β
ββ
βββββ
β
βββ
ββββ
β
ββ
ββ
ββββ
β
βββββ
β
βββββ
ββ
ββ
ββββββ
βββ
ββββββ
βββ
β
ββ
βββ
βββββββ
βββ
β
βββββ
β
ββββββ
ββββββ
βββ
β
ββ
ββββββββ
β
βββββ
β
βββββββββ
βββ
β
ββββ
β
β
β
βββββ
ββ
ββββββββ
β
βββ
ββββββββββ
β
0.2 0.5 2.0 5.0
0.1
0.5
2.0
5.0
Posterior FC
FC
020
0060
0010
000
Rank
Figure 2: The Posterior FC vs FC on 10,000 gene expression estimations
Posterior FC vs FC on 10,000 gene expression estimations. The genes are rankedby their cross-condition mean (adjusted by the normalization factors). Theposterior FC tends to be shrinkaged (be closer to 1) on the genes with lowexpressions (small rank).
17
4.1.3 Checking convergence
As detailed in Section 2, we assume the prior distribution of qCg is Beta(Ξ±, Ξ²).The EM algorithm is used for estimating the hyper-parameters Ξ±, Ξ² and the mix-ture parameter p. The optimized parameters at each iteration may be obtainedas follows (recall we are using 5 iterations for demonstration purposes):
> EBOut$Alpha
[,1]
iter1 0.8442208
iter2 0.8451571
iter3 0.8440494
iter4 0.8441588
iter5 0.8441456
> EBOut$Beta
Ng1
iter1 1.735640
iter2 1.762926
iter3 1.760836
iter4 1.760182
iter5 1.759134
> EBOut$P
[,1]
iter1 0.1804475
iter2 0.1409131
iter3 0.1348617
iter4 0.1340951
iter5 0.1338376
In our case the differences between the 4th and 5th iteration are always lessthan 0.001.
4.1.4 Checking the model fit and other diagnostics
As noted in Leng et al., 2012 [3], EBSeq relies on parametric assumptions thatshould be checked following each analysis. The QQP function may be used toassess prior assumptions. In practice, QQP generates the Q-Q plot of the em-pirical qβs vs the simulated qβs from the Beta prior distribution with estimatedhyper-parameters. Figure 3 shows that the data points lie on the y = x line forboth condition, which indicates the good fitting of our Beta prior distribution.
18
> par(mfrow=c(2,2))
> QQP(EBOut)
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.4
0.8
Ig 1 C 1
estimated q's
sim
ulat
ed q
's fr
om fi
tted
beta
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.4
0.8
Ig 1 C 2
estimated q's
sim
ulat
ed q
's fr
om fi
tted
beta
Figure 3: The QQ-plot for checking the assumption of a Beta prior as well asthe model fit using data from condition 1 and condition 2
Likewise, the DenNHist function may be used to check the density plot of em-pirical qβs vs the simulated qβs from the fitted Beta prior distribution. Figure 4also shows our estimated distribution fits the data very well.
19
> par(mfrow=c(2,2))
> DenNHist(EBOut)
Ig 1 C 1
Q alpha=0.84 beta=1.76
Den
sity
0.0 0.2 0.4 0.6 0.8 1.0
0.0
1.0
2.0
3.0 Data
Fitted density
Ig 1 C 2
Q alpha=0.84 beta=1.76
Den
sity
0.0 0.2 0.4 0.6 0.8 1.0
0.0
1.0
2.0
3.0 Data
Fitted density
Figure 4: The density plot for checking the model fit using data from condition1 and condition 2
4.2 Isoform Level DE Analysis (Two Conditions)
4.2.1 The Ig vector
Since EBSeq fits rely on Ig, we need to obtain the Ig for each isoform. This canbe done using the function GetNg. The required inputs of GetNg are the isoformnames (IsoformNames) and their corresponding gene names (IsosGeneNames),described above. In the simulated data, we assume that the isoforms in theIg = 1 group belong to genes Gene_1, ... , Gene_1000; The isoforms in theIg = 2 group belong to genes Gene_1001, ..., Gene_2000; and isoforms inthe Ig = 3 group belong to Gene_2001, ..., Gene_3000.
> data(IsoList)
> str(IsoList)
List of 3
$ IsoMat : num [1:6000, 1:10] 176 789 1300 474 1061 ...
20
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:6000] "Iso_1_1" "Iso_1_2" "Iso_1_3" "Iso_1_4" ...
.. ..$ : NULL
$ IsoNames : chr [1:6000] "Iso_1_1" "Iso_1_2" "Iso_1_3" "Iso_1_4" ...
$ IsosGeneNames: chr [1:6000] "Gene_1" "Gene_2" "Gene_3" "Gene_4" ...
> IsoMat=IsoList$IsoMat
> IsoNames=IsoList$IsoNames
> IsosGeneNames=IsoList$IsosGeneNames
> NgList=GetNg(IsoNames, IsosGeneNames, TrunThre=3)
> names(NgList)
[1] "GeneNg" "GeneNgTrun" "IsoformNg" "IsoformNgTrun"
> IsoNgTrun=NgList$IsoformNgTrun
> IsoNgTrun[c(1:3,1001:1003,3001:3003)]
Iso_1_1 Iso_1_2 Iso_1_3 Iso_2_1 Iso_2_2 Iso_2_3 Iso_3_1 Iso_3_2 Iso_3_3
1 1 1 2 2 2 3 3 3
GetNg contains 4 vectors. GeneNg (IsoformNg) provides the number of iso-forms Ng within each gene (within each isoformβs host gene). GeneNgTrun
(IsoformNgTrun) provides the truncated Ng values (Ig values). The defalt trun-cation threshold is 3, which means the values in GeneNg (IsoformNg) who aregreater than 3 will be changed to 3 in GeneNgTrun (IsoformNgTrun). We use3 in the case studies since the number of isoforms with Ng larger than 3 isrelatively small and the small sample size may induce poor parameter fittingif we treat them as separate groups. In practice, if there is evidence that theNg = 4, 5, 6... groups should be treated as separate groups, a user can changeTrunThre to define a different truncation threshold.
4.2.2 Using mappability ambiguity clusters instead of the Ig vectorwhen the gene-isoform relationship is unknown
While working with a de-novo assembled transcriptome, in which case the gene-isoform relationship is unknown, a user can use read mapping ambiguity clus-ter information instead of Ng, as provided by RSEM [4] in the output fileoutput_name.ngvec. The file contains a vector with the same length as thetotal number of transcripts. Each transcript has been assigned to one of 3 levels(1, 2, or 3) to indicate the mapping uncertainty level of that transcript. Themapping ambiguity clusters are partitioned via k-means algorithm on the un-mapability scores by RSEM. A user can read in the mapping ambiguity clusterinformation using:
> IsoNgTrun = scan(file="output_name.ngvec", what=0, sep="\n")
More details on using the RSEM-EBSeq pipeline on de novo assembled transcrip-tomes can be found at http://deweylab.biostat.wisc.edu/rsem/README.
html#de.
21
Other unmappability scores and other cluster methods (e.g. Gaussian MixedModel) could also be used to form the uncertainty clusters.
4.2.3 Running EBSeq on simulated isoform expression estimations
EBSeq can be applied as described in Section 3.2.4.
> IsoSizes=MedianNorm(IsoMat)
> IsoEBOut=EBTest(Data=IsoMat, NgVector=IsoNgTrun,
+ Conditions=as.factor(rep(c("C1","C2"),each=5)),sizeFactors=IsoSizes, maxround=5)
iteration 1 done
time 67.788
iteration 2 done
time 22.982
iteration 3 done
time 14.4580000000001
iteration 4 done
time 12.183
iteration 5 done
time 12.1990000000001
> IsoPP=GetPPMat(IsoEBOut)
> str(IsoPP)
num [1:6000, 1:2] 0 0 0 0 0 ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:6000] "Iso_1_1" "Iso_1_2" "Iso_1_3" "Iso_1_4" ...
..$ : chr [1:2] "PPEE" "PPDE"
> IsoDE=rownames(IsoPP)[which(IsoPP[,"PPDE"]>=.95)]
> str(IsoDE)
chr [1:534] "Iso_1_1" "Iso_1_2" "Iso_1_3" "Iso_1_4" "Iso_1_5" ...
We see that EBSeq found 534 DE isoforms at a target FDR of 0.05.The function PostFC could also be used here to calculate the Fold Change
(FC) as well as the posterior FC on the normalization factor adjusted data.
> IsoFC=PostFC(IsoEBOut)
> str(IsoFC)
List of 3
$ PostFC : Named num [1:6000] 0.285 0.28 3.54 0.304 3.743 ...
..- attr(*, "names")= chr [1:6000] "Iso_1_1" "Iso_1_2" "Iso_1_3" "Iso_1_4" ...
$ RealFC : Named num [1:6000] 0.284 0.28 3.544 0.304 3.749 ...
..- attr(*, "names")= chr [1:6000] "Iso_1_1" "Iso_1_2" "Iso_1_3" "Iso_1_4" ...
$ Direction: chr "C1 Over C2"
22
4.2.4 Checking convergence
For isoform level data, we assume the prior distribution of qCgi is Beta(Ξ±, Ξ²Ig ).As in Section 4.1.3, the optimized parameters at each iteration may be obtainedas follows (recall we are using 5 iterations for demonstration purposes):
> IsoEBOut$Alpha
[,1]
iter1 0.6964958
iter2 0.7056876
iter3 0.7036317
iter4 0.7040022
iter5 0.7032276
> IsoEBOut$Beta
Ng1 Ng2 Ng3
iter1 1.642912 2.261051 2.654820
iter2 1.695865 2.376032 2.834243
iter3 1.693549 2.367979 2.827887
iter4 1.692093 2.370663 2.835483
iter5 1.692036 2.363139 2.826236
> IsoEBOut$P
[,1]
iter1 0.2072478
iter2 0.1590444
iter3 0.1473788
iter4 0.1442685
iter5 0.1430039
Here we have 3 Ξ²βs in each iteration corresponding to Ξ²Ig=1, Ξ²Ig=2, Ξ²Ig=3. Wesee that parameters are fixed within 10β2 or 10β3. In practice, we requirechanges less than 10β3 to declare convergence.
4.2.5 Checking the model fit and other diagnostics
In Leng et al., 2012[3], we showed the mean-variance differences across differentisoform groups on multiple data sets. In practice, if it is of interest to checkdifferences among isoform groups defined by truncated Ig (such as those shownhere in Figure 1), the function PolyFitValue may be used. The following codegenerates the three panels shown in Figure 5 (if condition 2 is of interest, a usercould change each C1 to C2.):
23
> par(mfrow=c(2,2))
> PolyFitValue=vector("list",3)
> for(i in 1:3)
+ PolyFitValue[[i]]=PolyFitPlot(IsoEBOut$C1Mean[[i]],
+ IsoEBOut$C1EstVar[[i]],5)
Estimated Mean
Est
imat
ed V
ar
0.1 1 10 1000 1e+05
0.1
101e
+05
Estimated MeanE
stim
ated
Var
0.1 1 10 1000 1e+05
0.1
101e
+05
Estimated Mean
Est
imat
ed V
ar
0.1 1 10 1000 1e+05
0.1
101e
+05
Figure 5: The mean-variance fitting plot for each Ng group
Superimposing all Ig groups using the code below will generate the figure(shown here in Figure 6), which is similar in structure to Figure 1:
24
> PolyAll=PolyFitPlot(unlist(IsoEBOut$C1Mean), unlist(IsoEBOut$C1EstVar),5)
> lines(log10(IsoEBOut$C1Mean[[1]][PolyFitValue[[1]]$sort]),
+ PolyFitValue[[1]]$fit[PolyFitValue[[1]]$sort],col="yellow",lwd=2)
> lines(log10(IsoEBOut$C1Mean[[2]][PolyFitValue[[2]]$sort]),
+ PolyFitValue[[2]]$fit[PolyFitValue[[2]]$sort],col="pink",lwd=2)
> lines(log10(IsoEBOut$C1Mean[[3]][PolyFitValue[[3]]$sort]),
+ PolyFitValue[[3]]$fit[PolyFitValue[[3]]$sort],col="green",lwd=2)
> legend("topleft",c("All Isoforms","Ng = 1","Ng = 2","Ng = 3"),
+ col=c("red","yellow","pink","green"),lty=1,lwd=3,box.lwd=2)
Estimated Mean
Est
imat
ed V
ar
0.1 1 10 100 1000 10000 1e+05
0.1
1010
001e
+05
1e+
07
All IsoformsNg = 1Ng = 2Ng = 3
Figure 6: The mean-variance plot for each Ng group
To generate a QQ-plot of the fitted Beta prior distribution and the qΜC βs withincondition, a user may use the following code to generate 7 panels (as in thegene-level analysis):
25
> par(mfrow=c(2,3))
> QQP(IsoEBOut)
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
βββββββββββββββββββββββββββββββ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
0.0 0.4 0.8
0.0
0.2
0.4
0.6
0.8
1.0
Ig 1 C 1
estimated q's
sim
ulat
ed q
's fr
om fi
tted
beta
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
0.0 0.4 0.80.
00.
20.
40.
60.
81.
0
Ig 2 C 1
estimated q's
sim
ulat
ed q
's fr
om fi
tted
beta
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
ββββββββββββββββββββββββββββββββββββββββββββββββββ
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
0.0 0.4 0.8
0.0
0.2
0.4
0.6
0.8
1.0
Ig 3 C 1
estimated q's
sim
ulat
ed q
's fr
om fi
tted
beta
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
βββββββββββββββββββββββββββββββββββ
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
βββββββββββββββββββββββββββββββββ
ββββββββββββββββββββββββββββββββββ
0.0 0.4 0.8
0.0
0.2
0.4
0.6
0.8
1.0
Ig 1 C 2
estimated q's
sim
ulat
ed q
's fr
om fi
tted
beta
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
0.0 0.4 0.8
0.0
0.2
0.4
0.6
0.8
1.0
Ig 2 C 2
estimated q's
sim
ulat
ed q
's fr
om fi
tted
beta
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
βββββββββββββββββββββββββββββββββββββ
ββββββββββββββββββββββββββββββββββ
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
0.0 0.4 0.8
0.0
0.2
0.4
0.6
0.8
1.0
Ig 3 C 2
estimated q's
sim
ulat
ed q
's fr
om fi
tted
beta
Figure 7: The QQ-plot of the fitted prior distribution within each Ig group
And in order to produce the plot of the fitted Beta prior densities and thehistograms of qΜC βs within each condition, the following may be used (it generatesFigure 8):
26
> par(mfrow=c(2,3))
> DenNHist(IsoEBOut)
Ig 1 C 1
Q alpha=0.7 beta=1.69
Den
sity
0.0 0.4 0.8
01
23
DataFitted density
Ig 2 C 1
Q alpha=0.7 beta=2.36
Den
sity
0.0 0.4 0.80
12
34
5
DataFitted density
Ig 3 C 1
Q alpha=0.7 beta=2.83
Den
sity
0.0 0.4 0.8
01
23
45
67
DataFitted density
Ig 1 C 2
Q alpha=0.7 beta=1.69
Den
sity
0.0 0.4 0.8
0.0
1.0
2.0
3.0 Data
Fitted density
Ig 2 C 2
Q alpha=0.7 beta=2.36
Den
sity
0.0 0.4 0.8
01
23
45
6 DataFitted density
Ig 3 C 2
Q alpha=0.7 beta=2.83
Den
sity
0.0 0.4 0.8
02
46
8
DataFitted density
Figure 8: The prior distribution fitting within each Ig group
4.3 Working on gene expression estimations with morethan two conditions
As described in Section 3.3, the function GetPatterns allows the user to gen-erate all possible patterns given the conditions. To visualize the patterns, thefunction PlotPattern may be used.As described, if we were interested in Patterns 1, 2, 4 and 5 only, weβd define:
> Parti=PosParti[-3,]
> Parti
C1 C2 C3
Pattern1 1 1 1
Pattern2 1 1 2
Pattern4 1 2 2
Pattern5 1 2 3
Moving on to the analysis, MedianNorm or one of its competitors should be
27
> Conditions=c("C1","C1","C2","C2","C3","C3")
> PosParti=GetPatterns(Conditions)
> PosParti
C1 C2 C3
Pattern1 1 1 1
Pattern2 1 1 2
Pattern3 1 2 1
Pattern4 1 2 2
Pattern5 1 2 3
> PlotPattern(PosParti)
C1
C2
C3
Pattern1
Pattern2
Pattern3
Pattern4
Pattern5
Figure 9: All possible patterns
used to determine the normalization factors. Once this is done, the formal testis performed by EBMultiTest.
> data(MultiGeneMat)
> MultiSize=MedianNorm(MultiGeneMat)
> MultiOut=EBMultiTest(MultiGeneMat,
+ NgVector=NULL,Conditions=Conditions,
28
+ AllParti=Parti, sizeFactors=MultiSize,
+ maxround=5)
iteration 1 done
time 14.434
iteration 2 done
time 5.91600000000005
iteration 3 done
time 6.28700000000003
iteration 4 done
time 3.03699999999992
iteration 5 done
time 2.80200000000002
The posterior probability of being in each pattern for every gene is obtainedusing the function GetMultiPP:
> MultiPP=GetMultiPP(MultiOut)
> names(MultiPP)
[1] "PP" "MAP" "Patterns"
> MultiPP$PP[1:10,]
Pattern1 Pattern2 Pattern4 Pattern5
Gene_1 1.667196e-95 0.4474787 5.035574e-73 0.55252128
Gene_2 3.110967e-20 0.9697363 2.313379e-19 0.03026369
Gene_3 2.082947e-159 0.9679974 1.603355e-105 0.03200255
Gene_4 1.207559e-252 0.5357814 3.144834e-185 0.46421856
Gene_5 6.347984e-27 0.9371062 1.550532e-20 0.06289380
Gene_6 2.586284e-18 0.8383464 1.363798e-19 0.16165359
Gene_7 0.000000e+00 0.4467407 0.000000e+00 0.55325926
Gene_8 2.192617e-16 0.9850391 5.905463e-17 0.01496091
Gene_9 1.180902e-15 0.9453487 3.636282e-15 0.05465126
Gene_10 8.208183e-72 0.9219273 1.217895e-67 0.07807272
> MultiPP$MAP[1:10]
Gene_1 Gene_2 Gene_3 Gene_4 Gene_5 Gene_6 Gene_7
"Pattern5" "Pattern2" "Pattern2" "Pattern2" "Pattern2" "Pattern2" "Pattern5"
Gene_8 Gene_9 Gene_10
"Pattern2" "Pattern2" "Pattern2"
> MultiPP$Patterns
C1 C2 C3
Pattern1 1 1 1
Pattern2 1 1 2
Pattern4 1 2 2
Pattern5 1 2 3
29
where MultiPP$PP provides the posterior probobility of being in each pattern forevery gene. MultiPP$MAP provides the most likely pattern of each gene basedon the posterior probabilities. MultiPP$Patterns provides the details of thepatterns.
The FC and posterior FC for multiple condition data could be obtained byfunction GetMultiFC:
> MultiFC=GetMultiFC(MultiOut)
> str(MultiFC)
List of 3
$ FCMat : num [1:1000, 1:3] 1.211 1.013 0.947 1.09 1.065 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:1000] "Gene_1" "Gene_2" "Gene_3" "Gene_4" ...
.. ..$ : chr [1:3] "C1OverC2" "C1OverC3" "C2OverC3"
$ Log2FCMat: num [1:1000, 1:3] 0.2761 0.019 -0.079 0.1242 0.0902 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:1000] "Gene_1" "Gene_2" "Gene_3" "Gene_4" ...
.. ..$ : chr [1:3] "C1OverC2" "C1OverC3" "C2OverC3"
$ CondMeans: num [1:1000, 1:3] 496 1841 252 1787 808 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:1000] "Gene_1" "Gene_2" "Gene_3" "Gene_4" ...
.. ..$ : Named chr [1:3] "C1" "C2" "C3"
.. .. ..- attr(*, "names")= chr [1:3] "Condition1" "Condition2" "Condition3"
To generate a QQ-plot of the fitted Beta prior distribution and the qΜC βs withincondition, a user could also use function DenNHist and QQP.
30
> par(mfrow=c(2,2))
> QQP(MultiOut)
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
ββββββββββββββββββββββββββββββββββββββββββββββββββ
βββββββββββββββββββββββββββββββββ
βββββββββββββββββββββββ
βββββββββββββββββββββββββββ
βββββββββββββββββββββββββ
βββββββββββββββ
ββββββββββββββββββββ
ββββββββββββββββββ
ββββββββββββ
ββββββ
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.4
0.8
Ig 1 C 1
estimated q's
sim
ulat
ed q
's fr
om fi
tted
beta
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
ββββββββββββββββββββββββββββββββββββββ
ββββββββββββββββββββββββββββββββββββ
ββββββββββββββββββββββββββββββββ
βββββββββββββββββ
ββββββββββββββββββββββ
βββββββββββββββββββββββ
ββββββββββββββββββββββββββββββ
ββββββββββββββββ
βββββββββββ
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.4
0.8
Ig 1 C 2
estimated q's
sim
ulat
ed q
's fr
om fi
tted
beta
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
ββββββββββββββββββββββββββββββββββββ
βββββββββββββββββββββββββββββββββββββββββββββββββββββ
ββββββββββββββββββββββββββββββββββ
βββββββββββββββββββββββββββ
βββββββββββββββββββββββββββ
ββββββββββββββββββββ
βββββββββββββββββββ
ββββββββββββββββ
ββββββββββββ
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.4
0.8
Ig 1 C 3
estimated q's
sim
ulat
ed q
's fr
om fi
tted
beta
Figure 10: The QQ-plot of the fitted prior distribution within each condition
4.4 Working on isoform expression estimations with morethan two conditions
Similar as in Section 3.3, the function GetPatterns allows the user to generateall possible patterns given the conditions. To visualize the patterns, the functionPlotPattern may be used.
> data(IsoMultiList)
> IsoMultiMat=IsoMultiList[[1]]
> IsoNames.Multi=IsoMultiList$IsoNames
> IsosGeneNames.Multi=IsoMultiList$IsosGeneNames
> IsoMultiSize=MedianNorm(IsoMultiMat)
> NgList.Multi=GetNg(IsoNames.Multi, IsosGeneNames.Multi)
> IsoNgTrun.Multi=NgList.Multi$IsoformNgTrun
> IsoMultiOut=EBMultiTest(IsoMultiMat,NgVector=IsoNgTrun.Multi,Conditions=Conditions,
+ AllParti=Parti.4Cond,
+ sizeFactors=IsoMultiSize, maxround=5)
31
> par(mfrow=c(2,2))
> DenNHist(MultiOut)
Ig 1 C 1
Q alpha=0.72 beta=1.57
Den
sity
0.0 0.2 0.4 0.6 0.8 1.0
01
23
4 DataFitted density
Ig 1 C 2
Q alpha=0.72 beta=1.57
Den
sity
0.0 0.2 0.4 0.6 0.8 1.0
01
23
4
DataFitted density
Ig 1 C 3
Q alpha=0.72 beta=1.57
Den
sity
0.0 0.2 0.4 0.6 0.8 1.0
01
23
4 DataFitted density
Figure 11: The prior distribution fitting within each condition
iteration 1 done
time 24.491
iteration 2 done
time 24.467
iteration 3 done
time 9.23300000000006
iteration 4 done
time 10.424
iteration 5 done
time 8.88199999999995
> IsoMultiPP=GetMultiPP(IsoMultiOut)
> names(MultiPP)
[1] "PP" "MAP" "Patterns"
> IsoMultiPP$PP[1:10,]
32
Pattern1 Pattern2 Pattern3 Pattern8 Pattern15
Iso_1_1 4.491089e-34 0.999067581 1.357994e-33 2.728660e-34 9.324189e-04
Iso_1_2 2.645937e-14 0.998132248 3.172235e-15 2.816801e-16 1.867752e-03
Iso_1_3 1.270054e-43 0.978252880 4.196709e-38 6.630693e-45 2.174712e-02
Iso_1_4 1.585237e-36 0.994599329 6.156303e-30 1.145819e-32 5.400671e-03
Iso_1_5 0.000000e+00 1.000000000 0.000000e+00 0.000000e+00 2.830424e-41
Iso_1_6 4.942113e-228 0.003241202 5.563985e-214 1.657836e-183 9.967588e-01
Iso_1_7 7.900232e-115 0.997756178 3.895311e-110 2.050441e-105 2.243822e-03
Iso_1_8 5.430600e-130 0.851933966 1.834973e-96 1.987346e-111 1.480660e-01
Iso_1_9 4.182136e-45 0.804797880 5.085895e-47 3.408432e-42 1.952021e-01
Iso_1_10 6.948051e-08 0.997979087 3.026767e-08 5.123101e-08 2.020762e-03
> IsoMultiPP$MAP[1:10]
Iso_1_1 Iso_1_2 Iso_1_3 Iso_1_4 Iso_1_5 Iso_1_6
"Pattern2" "Pattern2" "Pattern2" "Pattern2" "Pattern2" "Pattern15"
Iso_1_7 Iso_1_8 Iso_1_9 Iso_1_10
"Pattern2" "Pattern2" "Pattern2" "Pattern2"
> IsoMultiPP$Patterns
C1 C2 C3 C4
Pattern1 1 1 1 1
Pattern2 1 1 1 2
Pattern3 1 1 2 1
Pattern8 1 2 2 2
Pattern15 1 2 3 4
> IsoMultiFC=GetMultiFC(IsoMultiOut)
The FC and posterior FC for multiple condition data could be obtained byfunction GetMultiFC:To generate a QQ-plot of the fitted Beta prior distribution and the qΜC βs withincondition, a user could also use function DenNHist and QQP.
33
> Conditions=c("C1","C1","C2","C2","C3","C3","C4","C4")
> PosParti.4Cond=GetPatterns(Conditions)
> PosParti.4Cond
C1 C2 C3 C4
Pattern1 1 1 1 1
Pattern2 1 1 1 2
Pattern3 1 1 2 1
Pattern4 1 1 2 2
Pattern5 1 2 1 1
Pattern6 1 2 1 2
Pattern7 1 2 2 1
Pattern8 1 2 2 2
Pattern9 1 1 2 3
Pattern10 1 2 1 3
Pattern11 1 2 2 3
Pattern12 1 2 3 1
Pattern13 1 2 3 2
Pattern14 1 2 3 3
Pattern15 1 2 3 4
> PlotPattern(PosParti.4Cond)
> Parti.4Cond=PosParti.4Cond[c(1,2,3,8,15),]
> Parti.4Cond
C1 C2 C3 C4
Pattern1 1 1 1 1
Pattern2 1 1 1 2
Pattern3 1 1 2 1
Pattern8 1 2 2 2
Pattern15 1 2 3 4
C1
C2
C3
C4
Pattern1
Pattern2
Pattern3
Pattern4
Pattern5
Pattern6
Pattern7
Pattern8
Pattern9
Pattern10
Pattern11
Pattern12
Pattern13
Pattern14
Pattern15
Figure 12: All possible patterns for 4 conditions
34
> par(mfrow=c(3,4))
> QQP(IsoMultiOut)
>
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
βββββββββββββββ
ββββββββββββββββββββββ
ββββββββββ
βββββββββ
ββββ
0.0 0.4 0.8
0.0
0.4
0.8
Ig 1 C 1
estimated q's
sim
ulat
ed q
's fr
om fi
tted
beta
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
βββββββββββββββββββββββββββββββββ
βββββββββββββββββββββββββ
βββββββββββββ
ββββββββββββββββββββββββββββ
βββββββββββ
0.0 0.4 0.8
0.0
0.4
0.8
Ig 1 C 2
estimated q's
sim
ulat
ed q
's fr
om fi
tted
beta
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
βββββββββββββββββββ
βββββββββββββββ
ββββββββββββββββββββββββββ
0.0 0.4 0.8
0.0
0.4
0.8
Ig 1 C 3
estimated q's
sim
ulat
ed q
's fr
om fi
tted
beta
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
βββββββββββββββββββββββββββββββββββββββ
ββββββββββββββββββββββββββββββββββ
ββββββββββ
βββ
0.0 0.4 0.8
0.0
0.4
0.8
Ig 1 C 4
estimated q's
sim
ulat
ed q
's fr
om fi
tted
beta
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
ββββββββββββββββββββ
ββββββββββ
βββββββββ
βββββββββββββ
βββ
0.0 0.4 0.8
0.0
0.4
0.8
Ig 2 C 1
estimated q's
sim
ulat
ed q
's fr
om fi
tted
beta
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
βββββββββββββββ
βββββββββββββ
0.0 0.4 0.8
0.0
0.4
0.8
Ig 2 C 2
estimated q's
sim
ulat
ed q
's fr
om fi
tted
beta
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
βββββββββββββββββββββββ
ββββββββββββββββββββββββββ
ββββββββ
βββββ
0.0 0.4 0.8
0.0
0.4
0.8
Ig 2 C 3
estimated q's
sim
ulat
ed q
's fr
om fi
tted
beta
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
ββββββββββββββββββββββββββββββ
βββββββββββββ
βββββββββββββββ
βββββββββββββββ
0.0 0.4 0.8
0.0
0.4
0.8
Ig 2 C 4
estimated q's
sim
ulat
ed q
's fr
om fi
tted
beta
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
βββββββββββββββββ
βββββββββββββββββββββββββββββ
ββββββββββββ
ββ
0.0 0.4 0.8
0.0
0.4
0.8
Ig 3 C 1
estimated q's
sim
ulat
ed q
's fr
om fi
tted
beta
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
βββββββββββββββββββββββ
ββββββββββββββββββββββ
ββββββββββ
ββββ
0.0 0.4 0.8
0.0
0.4
0.8
Ig 3 C 2
estimated q's
sim
ulat
ed q
's fr
om fi
tted
beta
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
βββββββββββββββββββββββββββββββββββββββββββββββββββ
βββββββββββββ
ββββββββββββββββββββββ
ββββββ
0.0 0.4 0.8
0.0
0.4
0.8
Ig 3 C 3
estimated q's
sim
ulat
ed q
's fr
om fi
tted
beta
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
ββββββββββββββββββββ
βββββββββββββββββ
βββββββββββββββββββββ
0.0 0.4 0.8
0.0
0.4
0.8
Ig 3 C 4
estimated q's
sim
ulat
ed q
's fr
om fi
tted
beta
Figure 13: The QQ-plot of the fitted prior distribution within each conditionand Ig group
35
> par(mfrow=c(3,4))
> DenNHist(IsoMultiOut)
Ig 1 C 1
Q alpha=0.64 beta=1.94
Den
sity
0.0 0.4 0.8
02
46
DataFitted density
Ig 1 C 2
Q alpha=0.64 beta=1.94
Den
sity
0.0 0.4 0.8
02
46 Data
Fitted density
Ig 1 C 3
Q alpha=0.64 beta=1.94
Den
sity
0.0 0.4 0.8
02
46
8
DataFitted density
Ig 1 C 4
Q alpha=0.64 beta=1.94
Den
sity
0.0 0.4 0.8
02
46
DataFitted density
Ig 2 C 1
Q alpha=0.64 beta=1.24
Den
sity
0.0 0.4 0.8
02
46
DataFitted density
Ig 2 C 2
Q alpha=0.64 beta=1.24
Den
sity
0.0 0.4 0.8
01
23
4
DataFitted density
Ig 2 C 3
Q alpha=0.64 beta=1.24
Den
sity
0.0 0.4 0.8
02
4
DataFitted density
Ig 2 C 4
Q alpha=0.64 beta=1.24
Den
sity
0.0 0.4 0.8
01
23
4 DataFitted density
Ig 3 C 1
Q alpha=0.64 beta=1.33
Den
sity
0.0 0.4 0.8
01
23
4 DataFitted density
Ig 3 C 2
Q alpha=0.64 beta=1.33
Den
sity
0.0 0.4 0.8
01
23
4 DataFitted density
Ig 3 C 3
Q alpha=0.64 beta=1.33
Den
sity
0.0 0.4 0.8
02
4 DataFitted density
Ig 3 C 4
Q alpha=0.64 beta=1.33
Den
sity
0.0 0.4 0.8
02
4 DataFitted density
Figure 14: The prior distribution fitting within each condition and Ig group
36
4.5 Working without replicates
4.5.1 Gene counts with two conditions
Sometimes people are working on the data with no replicates within any con-dition. In this case, the transcript specific variance is hard to estimate whilewe only have one value for each gene within condition. Estimating the vari-ance across condition is dangerous without the prior knowledge of being DE orEE. While working without replicates, EBSeq estimate the variance by poolingsimilar genes together. By defining NumBin = 1000 (default in EBTest), EB-Seq will group the genes with similar means together into 1,000 bins. Withthe assumption that no more than 50% genes are DE in the data set, We takegenes whose FC are in the 25% - 75% quantile of all FCβs as the candidategenes. For each bin, the bin-wise variance estimation would be the median ofthe across-condition variance estimations of the candidate genes within that bin.We use the across-condition variance estimations for the candidate genes andthe bin-wise variance estimations of the host bin for the non-candidate genes.
Using the same data as in Section 4.1, we take only sample 1 from condition1 and sample 6 from condition 2. Functions MedianNorm, GetPPMat and PostFC
could also work on data without replicates.
> data(GeneMat)
> GeneMat.norep=GeneMat[,c(1,6)]
> Sizes.norep=MedianNorm(GeneMat.norep)
> EBOut.norep=EBTest(Data=GeneMat.norep,
+ Conditions=as.factor(rep(c("C1","C2"))),
+ sizeFactors=Sizes.norep, maxround=5)
Remove transcripts with all zero
No Replicate - estimate phi 0.00368711916180493
iteration 1 done
time 11.481
iteration 2 done
time 5.40200000000004
iteration 3 done
time 5.01599999999996
iteration 4 done
time 4.24599999999998
iteration 5 done
time 4.57299999999998
> PP.norep=GetPPMat(EBOut.norep)
> DEfound.norep=rownames(PP.norep)[which(PP.norep[,"PPDE"]>=.95)]
> GeneFC.norep=PostFC(EBOut.norep)
4.5.2 Isoform counts with two conditions
We also take sample 1 and sample 6 in the data we used in Section 4.2. Examplecodes are shown below.
37
> data(IsoList)
> IsoMat=IsoList$IsoMat
> IsoNames=IsoList$IsoNames
> IsosGeneNames=IsoList$IsosGeneNames
> NgList=GetNg(IsoNames, IsosGeneNames)
> IsoNgTrun=NgList$IsoformNgTrun
> IsoMat.norep=IsoMat[,c(1,6)]
> IsoSizes.norep=MedianNorm(IsoMat.norep)
> IsoEBOut.norep=EBTest(Data=IsoMat.norep, NgVector=IsoNgTrun,
+ Conditions=as.factor(c("C1","C2")),
+ sizeFactors=IsoSizes.norep, maxround=5)
Remove transcripts with all zero
No Replicate - estimate phi 0.0154195839877669
iteration 1 done
time 24.758
iteration 2 done
time 4.66200000000003
iteration 3 done
time 5.21399999999994
iteration 4 done
time 5.15700000000004
iteration 5 done
time 4.73300000000006
> IsoPP.norep=GetPPMat(IsoEBOut.norep)
> IsoDE.norep=rownames(IsoPP.norep)[which(IsoPP.norep[,"PPDE"]>=.95)]
> IsoFC.norep=PostFC(IsoEBOut.norep)
4.5.3 Gene counts with more than two conditions
We take one sample from each condition (sample 1, 3 and 5) in the data weused in Section 4.3. Example codes are shown below.
> data(MultiGeneMat)
> MultiGeneMat.norep=MultiGeneMat[,c(1,3,5)]
> Conditions=c("C1","C2","C3")
> PosParti=GetPatterns(Conditions)
> Parti=PosParti[-3,]
> MultiSize.norep=MedianNorm(MultiGeneMat.norep)
> MultiOut.norep=EBMultiTest(MultiGeneMat.norep,
+ NgVector=NULL,Conditions=Conditions,
+ AllParti=Parti, sizeFactors=MultiSize.norep,
+ maxround=5)
No Replicate - estimate phi 0.00277960151525143
iteration 1 done
38
time 6.31100000000004
iteration 2 done
time 3.56999999999994
iteration 3 done
time 6.48500000000001
iteration 4 done
time 3.92900000000009
iteration 5 done
time 2.75399999999991
> MultiPP.norep=GetMultiPP(MultiOut.norep)
> MultiFC.norep=GetMultiFC(MultiOut.norep)
4.5.4 Isoform counts with more than two conditions
We take one sample from each condition (sample 1, 3, 5 and 7) in the data weused in Section 4.4. Example codes are shown below.
> data(IsoMultiList)
> IsoMultiMat=IsoMultiList[[1]]
> IsoNames.Multi=IsoMultiList$IsoNames
> IsosGeneNames.Multi=IsoMultiList$IsosGeneNames
> IsoMultiMat.norep=IsoMultiMat[,c(1,3,5,7)]
> IsoMultiSize.norep=MedianNorm(IsoMultiMat.norep)
> NgList.Multi=GetNg(IsoNames.Multi, IsosGeneNames.Multi)
> IsoNgTrun.Multi=NgList.Multi$IsoformNgTrun
> Conditions=c("C1","C2","C3","C4")
> PosParti.4Cond=GetPatterns(Conditions)
> PosParti.4Cond
C1 C2 C3 C4
Pattern1 1 1 1 1
Pattern2 1 1 1 2
Pattern3 1 1 2 1
Pattern4 1 1 2 2
Pattern5 1 2 1 1
Pattern6 1 2 1 2
Pattern7 1 2 2 1
Pattern8 1 2 2 2
Pattern9 1 1 2 3
Pattern10 1 2 1 3
Pattern11 1 2 2 3
Pattern12 1 2 3 1
Pattern13 1 2 3 2
Pattern14 1 2 3 3
Pattern15 1 2 3 4
39
> Parti.4Cond=PosParti.4Cond[c(1,2,3,8,15),]
> Parti.4Cond
C1 C2 C3 C4
Pattern1 1 1 1 1
Pattern2 1 1 1 2
Pattern3 1 1 2 1
Pattern8 1 2 2 2
Pattern15 1 2 3 4
> IsoMultiOut.norep=EBMultiTest(IsoMultiMat.norep,
+ NgVector=IsoNgTrun.Multi,Conditions=Conditions,
+ AllParti=Parti.4Cond, sizeFactors=IsoMultiSize.norep,
+ maxround=5)
No Replicate - estimate phi 0.00244688865176316
iteration 1 done
time 20.26
iteration 2 done
time 20.14
iteration 3 done
time 10.567
iteration 4 done
time 9.94200000000001
iteration 5 done
time 7.11299999999994
> IsoMultiPP.norep=GetMultiPP(IsoMultiOut.norep)
> IsoMultiFC.norep=GetMultiFC(IsoMultiOut.norep)
40
References
[1] S Anders and W Huber. Differential expression analysis for sequence countdata. Genome Biology, 11:R106, 2010.
[2] J H Bullard, E A Purdom, K D Hansen, and S Dudoit. Evaluation ofstatistical methods for normalization and differential expression in mrna-seqexperiments. BMC Bioinformatics, 11:94, 2010.
[3] N. Leng, J.A. Dawson, J.A Thomson, V Ruotti, R. A. Rissman, B.M.GSmits, J.D. Hagg, M.N. Gould, R.M. Stwart, and C. Kendziorski. Ebseq:An empirical bayes hierarchical model for inference in rna-seq experiments.BMI technical report, University of Wisconsin Madison, 226, 2012.
[4] B Li and C N Dewey. Rsem: accurate transcript quantification from rna-seq data with or without a reference genome. BMC Bioinformatics, 12:323,2011.
[5] M D Robinson and Oshlack A. A scaling normalization method for differ-ential expression analysis of rna-seq data. Genome Biology, 11:R25, 2010.
[6] C Trapnell, A Roberts, L Goff, G Pertea, D Kim, D R Kelley, H Pimentel,S L Salzberg, J L Rinn, and L Pachter. Differential gene and transcriptexpression analysis of rna-seq experiments with tophat and cufflinks. NatureProtocols, 7(3):562β578, 2012.
41