1-11-2005: Is a Specific Gene Differentially Expressed?

For a specific gene, $x_{ij}$ = $i$th measurement under condition $j$, $i = 1, \dots, 6$; $j = 1, 2$.

Differential expression: $\mu_1 \neq \mu_2$

Statistical model of observed data:

$$x_{i1} \sim N(\mu_1, \sigma^2), \qquad x_{i2} \sim N(\mu_2, \sigma^2)$$

Estimate the model parameters based on the data:

$$\hat{\mu}_j = \bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}, \qquad s_j^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2, \qquad \hat{\sigma}^2 = s^2 = \frac{(n-1)\,s_1^2 + (n-1)\,s_2^2}{2n-2}$$

Calculating the t-statistic:

$$t^* = \frac{\bar{x}_1 - \bar{x}_2}{s\,\sqrt{2/n}}$$

[Figure: two density plots of the t-statistic (x-axis "t-statistics" from -4 to 4, y-axis from 0.0 to 0.4), with $-t^*$ and $t^*$ marked.]

Calculating the p-value based on the "null distribution" of the t-statistic, assuming $\mu_1 = \mu_2$.

Upload: garry-skinner

Post on 18-Jan-2016


TRANSCRIPT

  • Is a Specific Gene Differentially Expressed? For a specific gene, $x_{ij}$ = $i$th measurement under condition $j$, $i = 1, \dots, 6$; $j = 1, 2$. Differential expression: $\mu_1 \neq \mu_2$
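
The pooled-variance t-test for a single gene can be sketched in R as follows. The six measurements per condition are made-up values chosen only for illustration (they are not data from the slides), and the names `x1`, `x2`, `t_star` and `p_val` are hypothetical. The result is cross-checked against R's built-in `t.test` with `var.equal = TRUE`, which assumes the same equal-variance normal model.

```r
# Hypothetical log-scale measurements for one gene, n = 6 per condition
x1 <- c(6.4, 6.5, 6.1, 6.2, 6.5, 7.0)   # condition 1 (illustrative values)
x2 <- c(5.8, 6.1, 6.8, 6.4, 6.8, 7.4)   # condition 2 (illustrative values)
n  <- length(x1)

# Pooled variance estimate from the two per-condition sample variances
s2 <- ((n - 1) * var(x1) + (n - 1) * var(x2)) / (2 * n - 2)

# t-statistic and two-sided p-value from the t distribution with 2n - 2 d.f.
t_star <- (mean(x1) - mean(x2)) / sqrt(s2 * 2 / n)
p_val  <- 2 * pt(-abs(t_star), df = 2 * n - 2)

# Cross-check against the built-in equal-variance t-test
ref <- t.test(x1, x2, var.equal = TRUE)
```

The two-sided p-value does not depend on which condition is subtracted from which, so the sign convention of the statistic only matters when interpreting the direction of the change.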

  • Genome-wide analysis
      • How do we perform t-tests for 30,000 genes at once?
      • How do we handle results and present data and results?
      • What is significant?
      • How do we compare different approaches to normalization of the data and to the statistical analysis of results?
      • Ideally, we would like to maximize our ability to identify truly differentially expressed genes and minimize the number of falsely implicated genes.
      • Doing it by hand (in R) first
      • Using Bioconductor
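
To get a feel for the "what is significant" question: if none of 30,000 genes were differentially expressed, a per-gene cutoff of p < 0.05 would still flag about 1,500 of them by chance. The sketch below simulates that situation with random p-values (no real microarray data is used). It also applies `p.adjust` from base R with the Benjamini-Hochberg method as one common remedy. That choice is an assumption made for illustration, not something the slides prescribe.

```r
set.seed(1)                      # reproducible simulation
n_genes <- 30000
alpha   <- 0.05

# Under the null hypothesis, p-values are uniformly distributed on (0, 1)
p_null <- runif(n_genes)

expected_false <- n_genes * alpha        # number of genes expected to pass by chance
observed_false <- sum(p_null < alpha)    # number that pass in this one simulation

# Benjamini-Hochberg adjustment (controls the false discovery rate)
p_bh <- p.adjust(p_null, method = "BH")
flagged_after_bh <- sum(p_bh < alpha)
```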

  • Calculating t-test for 30,000 genes at a time. Data import: source("http://eh3.uc.edu/ImportSimpleData.R")
    > SimpleData …
    > SimpleData[1,]
         Name         ID W1 C1 W2 C2 W3  C3 C4 W4 C5  W5  C6  W6
    1 no name Rn30000100 85 57 91 71 67 111 72 86 88 108 124 171
    > W …
    > C …
  • Calculating t-test for 30,000 genes at a time. Transforming data: source("http://eh3.uc.edu/TransformSimpleData.R")
    > NoZerosData …
    > NoZerosData[33525,]
          W1 C1 W2 C2 W3 C3 C4 W4 C5 W5 C6 W6
    33525 94 51 75 56 53  0 79 84 87 73 86  0
    > NoZerosData[NoZerosData==0] <- NA
    > NoZerosData[33525,]
          W1 C1 W2 C2 W3 C3 C4 W4 C5 W5 C6 W6
    33525 94 51 75 56 53 NA 79 84 87 73 86 NA
    log(0) = -Inf
    log(-1) = NaN
    function(-Inf) = -Inf or Inf or NaN
    Use na.rm=TRUE to skip the NA values.
    > LSimpleData …
    > LSimpleData[,3:14] …
  • Calculating t-test for 30,000 genes at a time. Calculating t-tests: source("http://eh3.uc.edu/MultipleTTest.R")
    MW …
  • Displaying results - Scatter Plots: source("http://eh3.uc.edu/TTestScatterPlots.R")
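
The data preparation and genome-wide t-test steps listed above can be sketched as one vectorized computation. This is a minimal sketch under stated assumptions, not the content of the sourced scripts. The first two rows of `X` are the rows 1 and 33525 shown in the slides and the third row is made up. Log base 2 and the names `L`, `MW`, `MC`, `tst` and `pval` are assumptions. Missing values mean the two groups can have different sizes, so the general pooled-variance formula with per-gene counts is used.

```r
# Small stand-in for the expression table; columns named as in the slides
cols <- c("W1","C1","W2","C2","W3","C3","C4","W4","C5","W5","C6","W6")
X <- rbind(c(85, 57, 91, 71, 67, 111, 72, 86, 88, 108, 124, 171),  # row 1 in the slides
           c(94, 51, 75, 56, 53,   0, 79, 84, 87,  73,  86,   0),  # row 33525 in the slides
           c(40, 90, 35, 80, 45,  85, 95, 38, 88,  42,  91,  37))  # hypothetical gene
colnames(X) <- cols

X[X == 0] <- NA          # zeros become missing, so log() never yields -Inf
L <- log2(X)             # log base 2 is an assumption; the slides do not state the base

W <- L[, grep("^W", cols)]   # the six W columns
C <- L[, grep("^C", cols)]   # the six C columns

# Per-gene counts, means and variances, ignoring missing values
nW <- rowSums(!is.na(W));             nC <- rowSums(!is.na(C))
MW <- rowMeans(W, na.rm = TRUE);      MC <- rowMeans(C, na.rm = TRUE)
vW <- apply(W, 1, var, na.rm = TRUE); vC <- apply(C, 1, var, na.rm = TRUE)

# Pooled variance, t-statistic and two-sided p-value for every gene at once
df   <- nW + nC - 2
s2   <- ((nW - 1) * vW + (nC - 1) * vC) / df
tst  <- (MW - MC) / sqrt(s2 * (1 / nW + 1 / nC))
pval <- 2 * pt(-abs(tst), df)
```

Working on whole matrices with `rowMeans` and `rowSums` avoids calling `t.test` 30,000 times. For any single row, the result should agree with `t.test(W[i, ], C[i, ], var.equal = TRUE)`.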

  • Displaying results - Histograms: source("http://eh3.uc.edu/TTestHistograms.R")

  • Expression Data on Individual Microarrays: source("http://eh3.uc.edu/MicroarrayScatterPlots.R")

  • Microarray-Specific Normalization of Expression Data
      • Normalization is the process of removing systematic biases prior to statistical analysis.
      • Systematic intensity-dependent trends are considered a systematic bias, since it is extremely unlikely that they are a consequence of some underlying biological mechanism of interest.
      • This particular bias is effectively removed by estimating the intensity-dependent "trend" using local regression and subtracting it from the observed ratios.
      • We will generally assume that normalization procedures do not affect the independence of experimental replicates, since they are performed separately for each microarray.
      • Some biases cannot be factored out without introducing a certain level of correlation between replicates.
      • Such biases will be factored out within the statistical model, which will then account for the introduced correlation (through a multi-way Analysis of Variance model).

  • Local Regression Normalization: source("http://eh3.uc.edu/LoessNormalization.R")
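
The idea of local regression normalization, estimating the intensity-dependent trend and subtracting it from the observed log ratios, can be sketched for a single microarray with `loess` from base R. The data here are simulated with an artificial curved bias. The variable names, the `span` value and the shape of the bias are all assumptions for illustration, and the sourced script itself is not reproduced.

```r
set.seed(42)                                  # reproducible simulated data
n_genes <- 2000
A <- runif(n_genes, 6, 14)                    # average log2 intensity per gene (simulated)
trend <- 0.5 - 0.08 * (A - 10)^2              # artificial intensity-dependent bias
M <- trend + rnorm(n_genes, sd = 0.3)         # observed log2 ratio = bias + noise

# Estimate the intensity-dependent trend by local regression
fit <- loess(M ~ A, span = 0.4, degree = 2)
est_trend <- predict(fit, data.frame(A = A))

# Normalized log ratios: observed minus estimated trend
M_norm <- M - est_trend
```

In a real analysis the same fit would be repeated separately for each microarray, which is what keeps the normalized replicates independent of one another.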

  • Normalized Data: source("http://eh3.uc.edu/NormalizedDataScatterPlots.R")

  • Normalized Data: Displaying results - Scatter Plots: source("http://eh3.uc.edu/NormalizedTTests.R")

  • Comparing Normalized and Raw Data Results: source("http://eh3.uc.edu/ComparingTTests.R")