Dahlia Nielsen
North Carolina State University
Bioinformatics Research Center
Microarray Animation
http://www.bio.davidson.edu/Courses/genomics/chip/chip.html
Importing data into JMP/Genomics
Need two (paired) tables
  Data: expression intensities
  Experimental design
Data probably originally exists in separate files
  one file per sample/microarray
  first create experimental design file
Experimental Design File
Required Columns
  columnname
  file
  Array (can be “made up” values)
  intensity
    if using text file input
  dye (or channel)
    if two-color platform
    cy3 vs cy5
Experimental Design File
Required Columns
Other columns
  information about samples
    treatment
    class
    phenotype
    …
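As a rough illustration of the design table described above, here is a minimal Python sketch (not JMP/Genomics itself). The file names, array ids, and treatment labels are made up for the example, and the choice of which columns to check is an assumption based on the list above.

```python
import csv
import io

# Hypothetical two-color design: every value below is invented for illustration.
DESIGN_ROWS = [
    {"columnname": "a1_cy3", "file": "array1.txt", "array": "1", "dye": "cy3", "treatment": "control"},
    {"columnname": "a1_cy5", "file": "array1.txt", "array": "1", "dye": "cy5", "treatment": "treated"},
    {"columnname": "a2_cy3", "file": "array2.txt", "array": "2", "dye": "cy3", "treatment": "treated"},
    {"columnname": "a2_cy5", "file": "array2.txt", "array": "2", "dye": "cy5", "treatment": "control"},
]

# Columns checked here; "dye" is included because the example is two-color.
REQUIRED = ["columnname", "file", "array", "dye"]


def missing_columns(rows, required=REQUIRED):
    """Return the required column names that are absent or empty in any row."""
    return [c for c in required if any(not r.get(c) for r in rows)]


def to_csv(rows):
    """Serialize the design rows to CSV text, one line per sample/channel."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


DESIGN_CSV = to_csv(DESIGN_ROWS)
```

Each row pairs one data column with its source file and array, and the extra treatment column carries the sample information used later in the analysis.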
Data Analysis Steps
QC
  distribution analysis
  correlation plots
Normalization
more QC
  same as above
Analysis
Results visualization
JMP/Genomics creates a script for each of these
can run script to re-create results (without re-doing analyses)
QC
Distribution analysis
  visualization of how consistent your data/samples are
  useful for detecting problem arrays
Correlation plots
  also a measure of array consistency
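One way to turn the correlation idea into a number is sketched below in plain Python (not the JMP/Genomics procedure). The log2 intensities are invented, and using the median pairwise Pearson correlation with a 0.8 cutoff is an assumption made for this example only.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation between two equal-length lists of values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def median_correlations(arrays):
    """For each array, the median of its correlations with every other array."""
    names = list(arrays)
    return {
        a: statistics.median(pearson(arrays[a], arrays[b]) for b in names if b != a)
        for a in names
    }


def flag_arrays(arrays, threshold=0.8):
    """Names of arrays whose median correlation falls below the (arbitrary) threshold."""
    return [name for name, r in median_correlations(arrays).items() if r < threshold]


# Made-up log2 intensities for six genes on four arrays.
LOG2 = {
    "array1": [8.1, 9.5, 7.2, 10.3, 6.8, 11.0],
    "array2": [8.3, 9.4, 7.0, 10.1, 7.0, 10.8],
    "array3": [8.0, 9.7, 7.3, 10.4, 6.7, 11.2],
    "array4": [10.2, 7.1, 9.9, 6.9, 10.5, 7.4],  # deliberately inconsistent
}

FLAGGED = flag_arrays(LOG2)
```

The median is used rather than the mean so that a single bad array does not drag down the score of the good arrays it is compared against.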
Normalization
Lots of choices
Lots of discussion
No right / wrong
Depends in part on your goals
Different degrees
  very “light” (mixed model)
  intermediate (loess)
  more “heavy-handed” (quantile)
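To show why quantile normalization counts as the "heavy-handed" end of the range, here is a minimal pure-Python sketch of the standard procedure. It is not the JMP/Genomics implementation, it assumes no tied values within an array, and the input numbers are invented.

```python
def quantile_normalize(arrays):
    """Force every array to share the same distribution of values.

    arrays: dict mapping array name -> list of intensities, all lists in the
    same gene order and of the same length. Ties are not handled specially.
    """
    names = list(arrays)
    n = len(arrays[names[0]])

    # Mean of the k-th smallest value across arrays, for each rank k.
    sorted_cols = [sorted(arrays[name]) for name in names]
    rank_means = [sum(col[k] for col in sorted_cols) / len(names) for k in range(n)]

    normalized = {}
    for name in names:
        # Gene indices ordered from lowest to highest intensity on this array.
        order = sorted(range(n), key=lambda i: arrays[name][i])
        out = [0.0] * n
        for rank, gene_index in enumerate(order):
            out[gene_index] = rank_means[rank]
        normalized[name] = out
    return normalized


# Made-up intensities for three genes on two arrays.
RAW = {"a": [5.0, 2.0, 3.0], "b": [4.0, 1.0, 6.0]}
NORMALIZED = quantile_normalize(RAW)
```

After this step every array contains exactly the same set of values, only assigned to different genes, which is what makes the method so aggressive compared with a mixed-model or loess adjustment.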
More QC
Indication of success of normalization procedure
as before …
  consistency between arrays/samples
  detect problem arrays
Analysis
Generally performed one gene at a time
Hypothesis-testing framework
  ANOVA (test for changes in expression levels across treatment groups)
  multiple-testing adjustment necessary
Exploratory procedures
  PCA
  cluster analysis
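The gene-at-a-time testing idea can be sketched as follows in plain Python. The slides do not say which multiple-testing adjustment is used; the Benjamini-Hochberg procedure below is one common choice and is an assumption here. The function computes only the one-way ANOVA F statistic; converting it to a p-value requires the F distribution, which is outside the standard library, so the p-values in the example are simply made up.

```python
def anova_f(groups):
    """One-way ANOVA F statistic for a single gene.

    groups: list of lists, one list of expression values per treatment group.
    """
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]

    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up expression values for one gene in two treatment groups.
F_EXAMPLE = anova_f([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

# Made-up per-gene p-values, as if one test had been run per gene.
ADJUSTED = bh_adjust([0.01, 0.04, 0.03, 0.20])
```

With thousands of genes tested, the adjustment step matters far more than in this four-gene toy example, since many small raw p-values are expected by chance alone.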
Volcano plots
Visualization tool to display results
  plot of effect size (x-axis) vs. significance level (y-axis)
Some genes may display large differences between treatment groups, but also high variance (less significance)
Some genes might display smaller effect sizes, but expression values very consistent (low var.) … smaller p-values
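The two kinds of genes described above land in different regions of a volcano plot. The sketch below computes plot coordinates only (no drawing), assuming the y-axis is -log10 of the p-value, which is a common convention but not stated in the slides. Gene names and numbers are invented.

```python
import math


def volcano_point(effect, pvalue):
    """Return (x, y) for a volcano plot: effect size vs. -log10(p-value)."""
    return (effect, -math.log10(pvalue))


# Hypothetical results: (effect size, p-value) per gene.
RESULTS = {
    "geneA": (2.5, 0.04),    # large difference, high variance
    "geneB": (0.4, 0.0001),  # small difference, very consistent values
}

POINTS = {gene: volcano_point(e, p) for gene, (e, p) in RESULTS.items()}
```

Here geneA sits far to the right but low on the plot, while geneB sits near the center but high up.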
Final results
Probably should consider not only p-values, but also magnitude of effect
  small changes (in spite of small p-values) might not be replicable
    inherent accuracy of microarrays
    tendency of performing experiments with small sample sizes
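Requiring both a small p-value and a sizeable effect can be expressed as a simple filter. This is a sketch only: the cutoffs (p at most 0.05, absolute effect at least 1.0) are arbitrary assumptions, and the gene names and values are invented.

```python
def select_genes(results, max_p=0.05, min_abs_effect=1.0):
    """Genes passing both the significance and the effect-size cutoff.

    results: dict mapping gene name -> (effect size, p-value).
    """
    return sorted(
        gene
        for gene, (effect, pvalue) in results.items()
        if pvalue <= max_p and abs(effect) >= min_abs_effect
    )


# Hypothetical per-gene results.
RESULTS = {
    "gene1": (1.8, 0.001),   # significant and large
    "gene2": (0.2, 0.0005),  # significant but small change
    "gene3": (-1.5, 0.01),   # significant and large (down-regulated)
    "gene4": (2.2, 0.30),    # large but not significant
}

SELECTED = select_genes(RESULTS)
```

In this example gene2 is dropped despite having the smallest p-value, because its change is too small to be trusted to replicate.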
Final check on results
Once you identify genes with significant results
  e.g. expression levels significantly different between treatment groups
Examine data
  Is the change identified (above) readily apparent?
    Normalized data …
    And raw data