analysis of differential expression t-test anova non-parametric methods correlation regression
Post on 21-Dec-2015
![Page 1: Analysis of Differential Expression T-test ANOVA Non-parametric methods Correlation Regression](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649d575503460f94a3601b/html5/thumbnails/1.jpg)
Analysis of Differential Expression
T-test
ANOVA
Non-parametric methods
Correlation
Regression
![Page 2: Analysis of Differential Expression T-test ANOVA Non-parametric methods Correlation Regression](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649d575503460f94a3601b/html5/thumbnails/2.jpg)
Research Question
Do nicotine-exposed rats have different X gene expression than control rats in the ventral tegmental area (VTA)?
Design an experiment in which treatment rats (N>2) are exposed to nicotine and control rats (N>2) are exposed to saline.
Collect RNA from the VTA and convert it to cDNA.
Determine the amount of X transcript in each individual.
Perform a test of means considering the variability within each group.
![Page 3: Analysis of Differential Expression T-test ANOVA Non-parametric methods Correlation Regression](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649d575503460f94a3601b/html5/thumbnails/3.jpg)
Observed difference between groups
May be due to:
Treatment
Chance
![Page 4: Analysis of Differential Expression T-test ANOVA Non-parametric methods Correlation Regression](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649d575503460f94a3601b/html5/thumbnails/4.jpg)
Hypothesis Testing
Null hypothesis: There is no difference between the means of the groups.
Alternative hypothesis: Means of the groups are different.
![Page 5: Analysis of Differential Expression T-test ANOVA Non-parametric methods Correlation Regression](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649d575503460f94a3601b/html5/thumbnails/5.jpg)
Hypothesis testing
You cannot accept the null hypothesis.
You can reject it.
You can support it (that is, fail to reject it).
![Page 6: Analysis of Differential Expression T-test ANOVA Non-parametric methods Correlation Regression](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649d575503460f94a3601b/html5/thumbnails/6.jpg)
P-value
The ‘P’ stands for probability. It measures how likely it is to observe a difference between groups at least as large as the one seen if chance alone were at work (that is, if the null hypothesis were true).
![Page 7: Analysis of Differential Expression T-test ANOVA Non-parametric methods Correlation Regression](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649d575503460f94a3601b/html5/thumbnails/7.jpg)
P-value
There is a significant difference between groups if the P value is small enough (e.g., <0.05).
The significance threshold (e.g., 0.05) equals the probability of a type I error when the null hypothesis is true.
Type I error: wrongly concluding that there is a difference between groups (false positive).
Type II error: wrongly concluding that there is no difference between groups (false negative).
![Page 8: Analysis of Differential Expression T-test ANOVA Non-parametric methods Correlation Regression](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649d575503460f94a3601b/html5/thumbnails/8.jpg)
Multiple tests on the same data
Expression data on multiple genes from the same individuals
Subsets of genes are coregulated thus they are not independent.
Such data requires multiple tests.
![Page 9: Analysis of Differential Expression T-test ANOVA Non-parametric methods Correlation Regression](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649d575503460f94a3601b/html5/thumbnails/9.jpg)
Why not do multiple t-tests? Or if you do, adjust the p-values.
Because multiple testing increases the type I error rate.
In a study involving four treatments, there are six possible pairwise comparisons.
If the chance of a type I error in one such comparison is 0.05, then the chance of not committing a type I error is 1 – 0.05 = 0.95.
If the six comparisons are independent, the chance of not committing a type I error in any one of them is 0.95^6 = 0.74.
Cumulative type I error = 1 – 0.74 = 0.26
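The arithmetic on this slide can be checked with a short sketch. The function name `familywise_error` is an illustrative choice, and the calculation assumes the comparisons are independent, which is a simplification.

```python
from math import comb


def familywise_error(alpha: float, n_tests: int) -> float:
    """Chance of at least one type I error across n_tests independent tests."""
    return 1 - (1 - alpha) ** n_tests


n_treatments = 4
n_pairs = comb(n_treatments, 2)          # number of pairwise comparisons
fwer = familywise_error(0.05, n_pairs)   # cumulative type I error

print(n_pairs, round(fwer, 2))
```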
![Page 10: Analysis of Differential Expression T-test ANOVA Non-parametric methods Correlation Regression](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649d575503460f94a3601b/html5/thumbnails/10.jpg)
Normal Distribution
It is entirely defined by two quantities: its mean and its standard deviation (SD).
The mean determines where the peak occurs and the SD determines the spread (width) of the curve.
![Page 11: Analysis of Differential Expression T-test ANOVA Non-parametric methods Correlation Regression](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649d575503460f94a3601b/html5/thumbnails/11.jpg)
Curves: same mean, different stds
![Page 12: Analysis of Differential Expression T-test ANOVA Non-parametric methods Correlation Regression](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649d575503460f94a3601b/html5/thumbnails/12.jpg)
Rules of normal distribution
68.3% of the distribution falls within 1 SD of the mean (i.e. between mean – SD and mean + SD);
95.4% of the distribution falls between mean – 2 SD and mean + 2 SD;
99.7% of the distribution falls between mean – 3 SD and mean + 3 SD.
![Page 13: Analysis of Differential Expression T-test ANOVA Non-parametric methods Correlation Regression](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649d575503460f94a3601b/html5/thumbnails/13.jpg)
Most commonly used rule
95% of the distribution falls between mean – 1.96 SD and mean + 1.96 SD
If the data are normally distributed, one can use this rule to give a range (a reference range) within which 95% of the data fall.
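The coverage figures quoted in these rules can be reproduced from the standard Normal distribution. This is a minimal sketch using Python's standard library; the helper name `coverage` is made up for illustration.

```python
from statistics import NormalDist


def coverage(k: float) -> float:
    """Fraction of a Normal distribution lying within k SDs of the mean."""
    z = NormalDist()  # standard Normal: mean 0, SD 1
    return z.cdf(k) - z.cdf(-k)


for k in (1, 1.96, 2, 3):
    print(f"within {k} SD: {100 * coverage(k):.1f}%")
```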
![Page 14: Analysis of Differential Expression T-test ANOVA Non-parametric methods Correlation Regression](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649d575503460f94a3601b/html5/thumbnails/14.jpg)
A sample
Samples vary.
Samples are collected in limited numbers.
They are representatives of a population.
A sample: e.g., nicotine-treated rat RNA
![Page 15: Analysis of Differential Expression T-test ANOVA Non-parametric methods Correlation Regression](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649d575503460f94a3601b/html5/thumbnails/15.jpg)
Sample means
Consider all possible samples of fixed size (n) drawn from a population.
Each of these samples has its own mean and these means will vary between samples.
Each sample has its own distribution, and thus its own SD.
![Page 16: Analysis of Differential Expression T-test ANOVA Non-parametric methods Correlation Regression](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649d575503460f94a3601b/html5/thumbnails/16.jpg)
Population mean
The mean of all the sample means is equal to the population mean (μ).
The SD of the sample means measures the deviation of individual sample means from the population mean (μ).
![Page 17: Analysis of Differential Expression T-test ANOVA Non-parametric methods Correlation Regression](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649d575503460f94a3601b/html5/thumbnails/17.jpg)
Standard error
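The standard error (SE) is the SD of the sample means, estimated from a single sample as SE = SD / √n.
It reflects the effect of sample size: a larger SE means that either the variation is high or the sample size is small.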
![Page 18: Analysis of Differential Expression T-test ANOVA Non-parametric methods Correlation Regression](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649d575503460f94a3601b/html5/thumbnails/18.jpg)
Confidence Intervals
A confidence interval gives a range of values within which it is likely that the true population value lies.
It is defined as follows:
95% confidence interval: (sample mean – 1.96 SE) to (sample mean + 1.96 SE).
99% confidence interval: (sample mean – 2.58 SE) to (sample mean + 2.58 SE).
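As a rough sketch of the calculation, the interval can be computed from a sample as below. The expression values are invented for illustration, and the Normal multiplier is used as on the slide; for small samples a multiplier from the t-distribution would usually be more appropriate.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def confidence_interval(values, level=0.95):
    """Normal-based confidence interval for the mean: mean ± z * SE."""
    n = len(values)
    se = stdev(values) / sqrt(n)                # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for 95%, 2.58 for 99%
    m = mean(values)
    return m - z * se, m + z * se


# Hypothetical expression measurements (not from the slides)
expression = [4.1, 5.3, 4.8, 5.9, 5.0, 4.6, 5.4, 5.1]

low95, high95 = confidence_interval(expression, 0.95)
low99, high99 = confidence_interval(expression, 0.99)
print(f"95% CI: {low95:.2f} to {high95:.2f}")
print(f"99% CI: {low99:.2f} to {high99:.2f}")
```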
![Page 19: Analysis of Differential Expression T-test ANOVA Non-parametric methods Correlation Regression](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649d575503460f94a3601b/html5/thumbnails/19.jpg)
T-distribution
The t-distribution is similar in shape to the Normal distribution, being symmetrical and unimodal, but is generally more spread out with longer tails.
The exact shape depends on a quantity known as the ‘degrees of freedom’, which in this context is equal to the sample size minus 1.
![Page 20: Analysis of Differential Expression T-test ANOVA Non-parametric methods Correlation Regression](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649d575503460f94a3601b/html5/thumbnails/20.jpg)
T-distribution
![Page 21: Analysis of Differential Expression T-test ANOVA Non-parametric methods Correlation Regression](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649d575503460f94a3601b/html5/thumbnails/21.jpg)
One-sample t-test
Null hypothesis: The sample mean does not differ from the hypothesized mean, e.g., 0 (H0: μ = 0).
A t-statistic (t) is calculated.
t is the number of SEs that separate the sample mean from the hypothesized value.
The associated P value is obtained by comparison with the t distribution.
The larger the absolute value of the t-statistic, the lower the probability of obtaining such a large value by chance, thus the P value is smaller and the result more significant.
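A minimal sketch of the one-sample t-statistic follows. The log-ratio values are hypothetical, and the P value lookup against the t distribution is left to a table or a statistics package.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(values, hypothesized_mean=0.0):
    """Return (t, degrees of freedom) for a one-sample t-test."""
    n = len(values)
    se = stdev(values) / sqrt(n)                  # standard error of the mean
    t = (mean(values) - hypothesized_mean) / se   # number of SEs from the hypothesized value
    return t, n - 1


# Hypothetical log expression ratios (not from the slides)
log_ratios = [0.8, 1.1, 0.4, 0.9, 1.3, 0.6]
t, df = one_sample_t(log_ratios, hypothesized_mean=0.0)
print(f"t = {t:.2f} on {df} degrees of freedom")
```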
![Page 22: Analysis of Differential Expression T-test ANOVA Non-parametric methods Correlation Regression](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649d575503460f94a3601b/html5/thumbnails/22.jpg)
Paired t-test
Used with paired data.
Paired data arise in a number of different situations:
A matched case–control study in which individual cases and controls are matched to each other, or
A repeated measures study in which some measurement is made on the same set of individuals on more than one occasion.
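The paired t-test amounts to a one-sample t-test on the within-pair differences. The sketch below assumes made-up before/after measurements on the same five individuals.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second):
    """Return (t, degrees of freedom) for paired measurements."""
    diffs = [b - a for a, b in zip(first, second)]  # within-pair differences
    n = len(diffs)
    se = stdev(diffs) / sqrt(n)
    return mean(diffs) / se, n - 1


# Hypothetical repeated measurements on the same individuals
before = [10.2, 9.8, 11.1, 10.5, 9.9]
after = [11.0, 10.1, 11.9, 11.2, 10.3]

t, df = paired_t(before, after)
print(f"t = {t:.2f} on {df} degrees of freedom")
```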
![Page 23: Analysis of Differential Expression T-test ANOVA Non-parametric methods Correlation Regression](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649d575503460f94a3601b/html5/thumbnails/23.jpg)
Paired t-test
![Page 24: Analysis of Differential Expression T-test ANOVA Non-parametric methods Correlation Regression](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649d575503460f94a3601b/html5/thumbnails/24.jpg)
Two-sample t-test
Comparison of two groups with unpaired data.
E.g., comparison of individuals in the treatment group with those in the control group for a particular variable.
Now there are two independent populations, and thus two SDs.
![Page 25: Analysis of Differential Expression T-test ANOVA Non-parametric methods Correlation Regression](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649d575503460f94a3601b/html5/thumbnails/25.jpg)
Calculation of pooled STD
The pooled SD for the difference in means is calculated as follows:
![Page 26: Analysis of Differential Expression T-test ANOVA Non-parametric methods Correlation Regression](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649d575503460f94a3601b/html5/thumbnails/26.jpg)
Calculation of pooled SE
The combined SE gives more weight to the larger sample (if sample sizes are unequal) because this is likely to be more reliable. The pooled SE for the difference in means is calculated as follows:
![Page 27: Analysis of Differential Expression T-test ANOVA Non-parametric methods Correlation Regression](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649d575503460f94a3601b/html5/thumbnails/27.jpg)
Two sample T-test
Comparison of the means of two groups based on a t-statistic and Student's t-distribution.
The t-statistic is obtained by dividing the difference between the sample means by the standard error of the difference.
![Page 28: Analysis of Differential Expression T-test ANOVA Non-parametric methods Correlation Regression](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649d575503460f94a3601b/html5/thumbnails/28.jpg)
T-statistic
A P value may be obtained by comparison with the t distribution on n1 + n2 – 2 degrees of freedom.
Again, the larger the t statistic, the smaller the P value will be.
![Page 29: Analysis of Differential Expression T-test ANOVA Non-parametric methods Correlation Regression](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649d575503460f94a3601b/html5/thumbnails/29.jpg)
Example
| X-gene expression | Tumor | Control |
| --- | --- | --- |
| Number of samples | 119 | 117 |
| Mean | 81 | 95 |
| SD | 18 | 19 |
![Page 30: Analysis of Differential Expression T-test ANOVA Non-parametric methods Correlation Regression](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649d575503460f94a3601b/html5/thumbnails/30.jpg)
Calculation of SD
![Page 31: Analysis of Differential Expression T-test ANOVA Non-parametric methods Correlation Regression](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649d575503460f94a3601b/html5/thumbnails/31.jpg)
Calculation of SE
![Page 32: Analysis of Differential Expression T-test ANOVA Non-parametric methods Correlation Regression](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649d575503460f94a3601b/html5/thumbnails/32.jpg)
T-statistic
t = (95-81)/2.41 = 14/2.41 = 5.81, with a corresponding P value less than 0.0001.
Reject the null hypothesis, which states that the group means do not differ.
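The worked example can be reproduced from the summary statistics alone. The sketch below assumes the standard pooled-variance formulas (the slide formulas themselves are not in the transcript); the function name `pooled_t` is illustrative.

```python
from math import sqrt


def pooled_t(n1, mean1, sd1, n2, mean2, sd2):
    """Two-sample t-test from summary statistics, assuming equal variances."""
    df = n1 + n2 - 2
    # Pooled SD weights each group's variance by its degrees of freedom
    pooled_sd = sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    # SE of the difference in means
    se_diff = pooled_sd * sqrt(1 / n1 + 1 / n2)
    t = (mean1 - mean2) / se_diff
    return pooled_sd, se_diff, t, df


# Control: n = 117, mean 95, SD 19; Tumor: n = 119, mean 81, SD 18
pooled_sd, se_diff, t, df = pooled_t(117, 95, 19, 119, 81, 18)
print(f"pooled SD = {pooled_sd:.2f}, SE = {se_diff:.2f}, t = {t:.2f}, df = {df}")
```

With these inputs the SE comes out at about 2.41 and t at about 5.81, matching the values on the slide.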
![Page 33: Analysis of Differential Expression T-test ANOVA Non-parametric methods Correlation Regression](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649d575503460f94a3601b/html5/thumbnails/33.jpg)
Analysis of Variance
ANOVA
A technique for analyzing the way in which the mean of a variable is affected by different types and combinations of factors.
E.g., the effect of three different diets on total serum cholesterol
![Page 34: Analysis of Differential Expression T-test ANOVA Non-parametric methods Correlation Regression](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649d575503460f94a3601b/html5/thumbnails/34.jpg)
Sample Experiment
Variance:
![Page 35: Analysis of Differential Expression T-test ANOVA Non-parametric methods Correlation Regression](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649d575503460f94a3601b/html5/thumbnails/35.jpg)
Sum of squares calculations
Total
Within
Between
![Page 36: Analysis of Differential Expression T-test ANOVA Non-parametric methods Correlation Regression](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649d575503460f94a3601b/html5/thumbnails/36.jpg)
Degrees of freedom
![Page 37: Analysis of Differential Expression T-test ANOVA Non-parametric methods Correlation Regression](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649d575503460f94a3601b/html5/thumbnails/37.jpg)
Sources of variation
A P value of 0.0039 indicates that the means of at least two of the treatment groups differ.
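A sketch of the one-way ANOVA bookkeeping is given below: the total sum of squares splits into between-group and within-group parts, and F is the ratio of the two mean squares. The diet data are invented and do not reproduce the P value on the slide; the P value itself would come from the F distribution.

```python
from statistics import mean


def one_way_anova(groups):
    """Return sums of squares, degrees of freedom and F for one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return ss_total, ss_between, ss_within, df_between, df_within, f


# Hypothetical serum cholesterol values for three diets
diets = [
    [5.1, 5.4, 4.9, 5.6],
    [6.0, 6.3, 5.8, 6.1],
    [5.0, 4.7, 5.2, 4.9],
]
ss_total, ss_between, ss_within, df_between, df_within, f = one_way_anova(diets)
print(f"SS total = {ss_total:.3f} = {ss_between:.3f} (between) + {ss_within:.3f} (within)")
print(f"F = {f:.2f} on ({df_between}, {df_within}) degrees of freedom")
```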
![Page 38: Analysis of Differential Expression T-test ANOVA Non-parametric methods Correlation Regression](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649d575503460f94a3601b/html5/thumbnails/38.jpg)
Multiple Tests
Post hoc comparisons between pairs of treatments.
Overall type I error rate increases by increasing number of pairwise comparisons.
One has to maintain the 0.05 type I error rate after all of the comparisons.
![Page 39: Analysis of Differential Expression T-test ANOVA Non-parametric methods Correlation Regression](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649d575503460f94a3601b/html5/thumbnails/39.jpg)
Bonferroni Adjustment
Significance threshold for each test = 0.05 / number of tests
Often too conservative
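A minimal sketch of the Bonferroni adjustment follows, using invented P values for six pairwise comparisons. Multiplying each P value by the number of tests (capped at 1) is an equivalent way to express the same rule.

```python
def bonferroni(p_values, alpha=0.05):
    """Return the per-test threshold, adjusted P values and significance flags."""
    m = len(p_values)
    threshold = alpha / m                           # 0.05 / number of tests
    adjusted = [min(1.0, p * m) for p in p_values]  # equivalent adjusted P values
    significant = [p < threshold for p in p_values]
    return threshold, adjusted, significant


# Hypothetical P values from six pairwise comparisons
p_values = [0.001, 0.012, 0.03, 0.2, 0.04, 0.6]
threshold, adjusted, significant = bonferroni(p_values)
print(f"per-test threshold = {threshold:.4f}")
print(significant)
```

Only the first comparison stays significant here, which illustrates why the adjustment is considered conservative.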
![Page 40: Analysis of Differential Expression T-test ANOVA Non-parametric methods Correlation Regression](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649d575503460f94a3601b/html5/thumbnails/40.jpg)
NonParametric methods
Many statistical methods require assumptions.
The t-test requires that samples be normally distributed.
Data that violate these assumptions require transformations.
Nonparametric methods require few or no assumptions about the distribution.
![Page 41: Analysis of Differential Expression T-test ANOVA Non-parametric methods Correlation Regression](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649d575503460f94a3601b/html5/thumbnails/41.jpg)
Wilcoxon signed rank test for paired data
![Page 42: Analysis of Differential Expression T-test ANOVA Non-parametric methods Correlation Regression](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649d575503460f94a3601b/html5/thumbnails/42.jpg)
Wilcoxon signed rank test
![Page 43: Analysis of Differential Expression T-test ANOVA Non-parametric methods Correlation Regression](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649d575503460f94a3601b/html5/thumbnails/43.jpg)
Central venous oxygen saturation on admission and after 6 h into ICU.
Patients have SvO2 values on admission and after 6 hours.
Take the difference between the paired data points.
![Page 44: Analysis of Differential Expression T-test ANOVA Non-parametric methods Correlation Regression](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649d575503460f94a3601b/html5/thumbnails/44.jpg)
Central venous oxygen saturation on admission and after 6 h into ICU.
Rank the differences by absolute value, regardless of their sign.
Give each rank the sign of its difference.
![Page 45: Analysis of Differential Expression T-test ANOVA Non-parametric methods Correlation Regression](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649d575503460f94a3601b/html5/thumbnails/45.jpg)
Calculate
Sum of positive ranks (R+)
Sum of negative ranks (R-)
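The steps of the signed rank test up to this point can be sketched as follows. The SvO2 values are invented, not the data from the slides, and dropping zero differences is a common convention assumed here rather than something the slides state.

```python
def average_ranks(values):
    """Rank values from 1 upward; tied values share their average rank."""
    positions = {}
    for pos, v in enumerate(sorted(values), start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]


def signed_rank_sums(first, second):
    """Return (R+, R-) for paired data."""
    diffs = [b - a for a, b in zip(first, second) if b != a]  # drop zero differences
    ranks = average_ranks([abs(d) for d in diffs])            # rank ignoring sign
    r_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    r_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return r_plus, r_minus


# Hypothetical SvO2 (%) on admission and after 6 hours for 10 patients
admission = [40, 59, 56, 66, 58, 47, 62, 71, 51, 60]
after_6h = [53, 55, 67, 68, 57, 64, 59, 76, 60, 66]

r_plus, r_minus = signed_rank_sums(admission, after_6h)
print(r_plus, r_minus)
```

The smaller of the two sums is then compared with the critical value for the number of pairs.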
![Page 46: Analysis of Differential Expression T-test ANOVA Non-parametric methods Correlation Regression](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649d575503460f94a3601b/html5/thumbnails/46.jpg)
Sum of positive and negative ranks
![Page 47: Analysis of Differential Expression T-test ANOVA Non-parametric methods Correlation Regression](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649d575503460f94a3601b/html5/thumbnails/47.jpg)
Critical values for WSR test when n = 10
5
![Page 48: Analysis of Differential Expression T-test ANOVA Non-parametric methods Correlation Regression](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649d575503460f94a3601b/html5/thumbnails/48.jpg)
Wilcoxon rank-sum or Mann-Whitney test
The Wilcoxon signed rank test is suited to paired data.
For unpaired data, the Wilcoxon rank-sum test is used.
![Page 49: Analysis of Differential Expression T-test ANOVA Non-parametric methods Correlation Regression](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649d575503460f94a3601b/html5/thumbnails/49.jpg)
Steps of Wilcoxon rank-sum test
![Page 50: Analysis of Differential Expression T-test ANOVA Non-parametric methods Correlation Regression](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649d575503460f94a3601b/html5/thumbnails/50.jpg)
Total drug doses in patients with a 3 to 5 day stay in intensive care unit.
Rank all observations in increasing order regardless of grouping.
Use the average rank if values tie.
Add up the ranks within each group.
Select the smaller sum and calculate a P value for it.
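The ranking steps above can be sketched as below. The dose values and group names are made up for illustration, and the P value (or critical value lookup) for the smaller rank sum is not computed here.

```python
def average_ranks(values):
    """Rank values from 1 upward; tied values share their average rank."""
    positions = {}
    for pos, v in enumerate(sorted(values), start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]


def rank_sums(group_a, group_b):
    """Rank the pooled observations and return each group's rank sum."""
    combined = list(group_a) + list(group_b)  # ignore grouping while ranking
    ranks = average_ranks(combined)
    sum_a = sum(ranks[:len(group_a)])
    sum_b = sum(ranks[len(group_a):])
    return sum_a, sum_b


# Hypothetical total drug doses for two unpaired groups
doses_a = [12, 15, 15, 20, 31]
doses_b = [9, 11, 15, 18]

sum_a, sum_b = rank_sums(doses_a, doses_b)
smaller = min(sum_a, sum_b)  # the value carried forward to the test
print(sum_a, sum_b, smaller)
```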
![Page 51: Analysis of Differential Expression T-test ANOVA Non-parametric methods Correlation Regression](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649d575503460f94a3601b/html5/thumbnails/51.jpg)
Critical values
![Page 52: Analysis of Differential Expression T-test ANOVA Non-parametric methods Correlation Regression](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649d575503460f94a3601b/html5/thumbnails/52.jpg)
Correlation and Regression
Correlation quantifies the strength of the relationship between two paired samples.
Regression expresses the relationship in the form of an equation.
Example: whether two genes, X and Y are coregulated, or the expression level of gene X can be predicted based on the expression level of gene Y.
![Page 53: Analysis of Differential Expression T-test ANOVA Non-parametric methods Correlation Regression](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649d575503460f94a3601b/html5/thumbnails/53.jpg)
Product moment correlation
r lies between -1 and +1
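A minimal sketch of the product moment (Pearson) correlation coefficient is shown below. The expression values for genes X and Y are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Product moment correlation coefficient between paired samples."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical expression levels of genes X and Y in five individuals
gene_x = [1, 2, 3, 4, 5]
gene_y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(gene_x, gene_y)
print(f"r = {r:.3f}")
```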
![Page 54: Analysis of Differential Expression T-test ANOVA Non-parametric methods Correlation Regression](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649d575503460f94a3601b/html5/thumbnails/54.jpg)
Age and urea for 20 patients in emergency unit
![Page 55: Analysis of Differential Expression T-test ANOVA Non-parametric methods Correlation Regression](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649d575503460f94a3601b/html5/thumbnails/55.jpg)
Scattergram
r = 0.62
![Page 56: Analysis of Differential Expression T-test ANOVA Non-parametric methods Correlation Regression](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649d575503460f94a3601b/html5/thumbnails/56.jpg)
Confidence intervals around r
![Page 57: Analysis of Differential Expression T-test ANOVA Non-parametric methods Correlation Regression](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649d575503460f94a3601b/html5/thumbnails/57.jpg)
Confidence of r
![Page 58: Analysis of Differential Expression T-test ANOVA Non-parametric methods Correlation Regression](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649d575503460f94a3601b/html5/thumbnails/58.jpg)
![Page 59: Analysis of Differential Expression T-test ANOVA Non-parametric methods Correlation Regression](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649d575503460f94a3601b/html5/thumbnails/59.jpg)
Misuse of correlation
There may be a third variable to which both of the variables are related.
It does not imply causation.
A nonlinear relationship may exist.
![Page 60: Analysis of Differential Expression T-test ANOVA Non-parametric methods Correlation Regression](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649d575503460f94a3601b/html5/thumbnails/60.jpg)
Regression
![Page 61: Analysis of Differential Expression T-test ANOVA Non-parametric methods Correlation Regression](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649d575503460f94a3601b/html5/thumbnails/61.jpg)
Method of least squares
The regression line is obtained using the method of least squares. Any line y = a + bx that we draw through the points gives a predicted or fitted value of y for each value of x in the dataset.
For a particular value of x the vertical difference between the observed and the fitted value of y is known as the deviation or residual.
The method of least squares finds the values of a and b that minimize the sum of squares of all deviations.
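For a straight line y = a + bx, the least squares solution has a closed form, sketched below. The age and urea values are invented for illustration and are not the 20 patients from the slides.

```python
from statistics import mean


def least_squares(x, y):
    """Return (a, b) for the line y = a + b*x minimizing the sum of squared residuals."""
    mx, my = mean(x), mean(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    return a, b


# Hypothetical age (years) and urea measurements
ages = [20, 30, 40, 50, 60]
urea = [3.1, 4.0, 4.6, 5.9, 6.4]

a, b = least_squares(ages, urea)
fitted = [a + b * xi for xi in ages]
residuals = [yi - fi for yi, fi in zip(urea, fitted)]  # observed minus fitted
print(f"urea = {a:.2f} + {b:.3f} * age")
```

The residuals from a least squares line with an intercept always sum to zero, which gives a quick sanity check on the fit.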
![Page 62: Analysis of Differential Expression T-test ANOVA Non-parametric methods Correlation Regression](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649d575503460f94a3601b/html5/thumbnails/62.jpg)
Age and urea level
![Page 63: Analysis of Differential Expression T-test ANOVA Non-parametric methods Correlation Regression](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649d575503460f94a3601b/html5/thumbnails/63.jpg)
Residuals
![Page 64: Analysis of Differential Expression T-test ANOVA Non-parametric methods Correlation Regression](https://reader036.vdocuments.mx/reader036/viewer/2022062421/56649d575503460f94a3601b/html5/thumbnails/64.jpg)
Method of least squares