pathway and gene set analysis part 1 - biostat.washington.edu · alison motsinger-reif, phd...
TRANSCRIPT
![Page 1: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/1.jpg)
Alison Motsinger-Reif, PhDBioinformatics Research Center
Department of StatisticsNorth Carolina State University
Pathway and Gene Set AnalysisPart 1
![Page 2: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/2.jpg)
Theearlystepsofamicroarraystudy
• ScientificQuestion(biological)• Studydesign(biological/statistical)• ConductingExperiment(biological)• Preprocessing/NormalizingData(statistical)• Findingdifferentiallyexpressedgenes(statistical)
![Page 3: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/3.jpg)
Adataexample
• Leeetal(2005)comparedadiposetissue(abdominalsubcutaenousadipocytes)betweenobeseandleanPimaIndians
• SampleswerehybridisedonHGu95e-Affymetrixarrays(12639genes/probesets)
• AvailableasGDS1498ontheGEOdatabase• Weselectedthemalesamplesonly
– 10obesevs9lean
![Page 4: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/4.jpg)
![Page 5: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/5.jpg)
The“Result”Probe Set ID log.ratio pvalue adj.p73554_at 1.4971 0.0000 0.000491279_at 0.8667 0.0000 0.001774099_at 1.0787 0.0000 0.010483118_at -1.2142 0.0000 0.013981647_at 1.0362 0.0000 0.013984412_at 1.3124 0.0000 0.022290585_at 1.9859 0.0000 0.025884618_at -1.6713 0.0000 0.025891790_at 1.7293 0.0000 0.035080755_at 1.5238 0.0000 0.035185539_at 0.9303 0.0000 0.035190749_at 1.7093 0.0000 0.035174038_at -1.6451 0.0000 0.035179299_at 1.7156 0.0000 0.035172962_at 2.1059 0.0000 0.035188719_at -3.1829 0.0000 0.035172943_at -2.0520 0.0000 0.035191797_at 1.4676 0.0000 0.035178356_at 2.1140 0.0001 0.035990268_at 1.6552 0.0001 0.0421
WhathappenedtotheBiology???
![Page 6: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/6.jpg)
SlightlymoreinformativeresultsProbe Set ID Gene SymbolGene Title go biological process termgo molecular function term log.ratio pvalue adj.p73554_at CCDC80 coiled-coil domain containing 80--- --- 1.4971 0.0000 0.000491279_at C1QTNF5 /// MFRPC1q and tumor necrosis factor related protein 5 /// membrane frizzled-related proteinvisual perception /// embryonic development /// response to stimulus--- 0.8667 0.0000 0.001774099_at --- --- --- --- 1.0787 0.0000 0.010483118_at RNF125 ring finger protein 125 immune response /// modification-dependent protein catabolic processprotein binding /// zinc ion binding /// ligase activity /// metal ion binding-1.2142 0.0000 0.013981647_at --- --- --- --- 1.0362 0.0000 0.013984412_at SYNPO2 synaptopodin 2 --- actin binding /// protein binding1.3124 0.0000 0.022290585_at C15orf59 chromosome 15 open reading frame 59--- --- 1.9859 0.0000 0.025884618_at C12orf39 chromosome 12 open reading frame 39--- --- -1.6713 0.0000 0.025891790_at MYEOV myeloma overexpressed (in a subset of t(11;14) positive multiple myelomas)--- --- 1.7293 0.0000 0.035080755_at MYOF myoferlin muscle contraction /// blood circulationprotein binding 1.5238 0.0000 0.035185539_at PLEKHH1 pleckstrin homology domain containing, family H (with MyTH4 domain) member 1--- binding 0.9303 0.0000 0.035190749_at SERPINB9 serpin peptidase inhibitor, clade B (ovalbumin), member 9anti-apoptosis /// signal transductionendopeptidase inhibitor activity /// serine-type endopeptidase inhibitor activity /// serine-type endopeptidase inhibitor activity /// protein binding1.7093 0.0000 0.035174038_at --- --- --- --- -1.6451 0.0000 0.035179299_at --- --- --- --- 1.7156 0.0000 0.035172962_at BCAT1 branched chain aminotransferase 1, cytosolicG1/S transition of mitotic cell cycle /// metabolic process /// cell proliferation /// amino acid biosynthetic process /// branched chain family amino acid metabolic process /// branched chain family amino acid biosynthetic process /// branched chain family amino acid biosynthetic processcatalytic activity /// branched-chain-amino-acid transaminase activity /// branched-chain-amino-acid transaminase activity /// transaminase activity /// transferase activity /// identical protein binding2.1059 0.0000 0.035188719_at C12orf39 chromosome 12 open reading frame 39--- --- -3.1829 0.0000 0.035172943_at --- --- --- --- -2.0520 0.0000 0.035191797_at LRRC16A leucine rich repeat containing 16A--- --- 1.4676 0.0000 0.035178356_at TRDN triadin muscle contraction receptor binding 2.1140 0.0001 0.035990268_at C5orf23 chromosome 5 open reading frame 23--- --- 1.6552 0.0001 0.0421
Ifwearelucky,someofthetopgenesmeansomething tous
Butwhatiftheydon’t?
Andhowwhataretheresultsforothergeneswithsimilarbiologicalfunctions
![Page 7: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/7.jpg)
Howtoincorporatebiologicalknowledge
• Thetypeofknowledgewedealwithisrathersimple:
Weknowgroups/setsofgenesthatforexample– Belongtothesamepathway– Haveasimilarfunction– Arelocatedonthesamechromosome,etc…
• Wewillassumethesegroupingstobegiven,i.e.wewillnotyetdiscussmethodsusedtodetectpathways,networks,geneclusters• Wewilllater!
![Page 8: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/8.jpg)
Whatisapathway?
• Nocleardefinition– Wikipedia:“In biochemistry,metabolicpathways are
seriesofchemicalreactionsoccurringwithinacell.Ineachpathway,aprincipalchemicalismodifiedbychemicalreactions.”
– Thesepathwaysdescribeenzymesandmetabolites
• Butoftentheword“pathway”isalsousedtodescribegeneregulatorynetworksorproteininteractionnetworks
• Inallcasesapathwaydescribesabiologicalfunctionveryspecifically
![Page 9: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/9.jpg)
WhatisaGeneSet?• Justwhatitsays:asetofgenes!
– AllgenesinvolvedinapathwayareanexampleofaGeneSet
– AllgenescorrespondingtoaGeneOntologytermareaGeneSet
– AllgenesmentionedinapaperofSmithetalmightformaGeneSet
• AGeneSetisamuchmoregeneralandlessspecificconceptthanapathway
• Still:wewillsometimesusetwowordsinterchangeably,astheanalysismethodsaremainlythesame
![Page 10: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/10.jpg)
WhereDoGeneSets/ListsComeFrom?
• Molecularprofilinge.g.mRNA,protein– Identificationà Genelist– Quantificationà Genelist+values– Ranking,Clustering(biostatistics)
• Interactions:Proteininteractions,Transcriptionfactorbindingsites(ChIP)
• Geneticscreene.g.ofknockoutlibrary• Associationstudies(Genome-wide)
– Singlenucleotidepolymorphisms(SNPs)– Copynumbervariants(CNVs)
– ……..
![Page 11: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/11.jpg)
WhatisGeneSet/Pathwayanalysis?
• Theaimistogiveonenumber(score,p-value)toaGeneSet/Pathway– Aremanygenesinthepathwaydifferentiallyexpressed(up-regulated/downregulated)
– Canwegiveanumber(p-value)totheprobabilityofobservingthesechangesjustbychance?
![Page 12: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/12.jpg)
Goals• Pathwayandgenesetdataresources
• Geneattributes• Databaseresources
• GO,KeGG,Wikipathways,MsigDB• Geneidentifiersandissueswithmapping
• Differencesbetweenpathwayanalysistools• Selfcontainedvs.competitivetests• Cut-offmethodsvs.globalmethods• Issueswithmultipletesting
![Page 13: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/13.jpg)
Goals• Pathwayandgenesetdataresources
• Geneattributes• Databaseresources
• GO,KeGG,Wikipathways,MsigDB• Geneidentifiersandissueswithmapping
• Differencesbetweenpathwayanalysistools• Selfcontainedvs.competitivetests• Cut-offmethodsvs.globalmethods• Issueswithmultipletesting
![Page 14: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/14.jpg)
GeneAttributes• Functionalannotation
– Biologicalprocess,molecularfunction,celllocation
• Chromosomeposition• Diseaseassociation• DNAproperties
– TFbindingsites,genestructure(intron/exon),SNPs
• Transcriptproperties– Splicing,3’UTR,microRNA bindingsites
• Proteinproperties– Domains,secondaryandtertiarystructure,PTMsites
• Interactionswithothergenes
![Page 15: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/15.jpg)
GeneAttributes• Functionalannotation
– Biologicalprocess,molecularfunction,celllocation
• Chromosomeposition• Diseaseassociation• DNAproperties
– TFbindingsites,genestructure(intron/exon),SNPs
• Transcriptproperties– Splicing,3’UTR,microRNA bindingsites
• Proteinproperties– Domains,secondaryandtertiarystructure,PTMsites
• Interactionswithothergenes
![Page 16: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/16.jpg)
DatabaseResources• Usefunctionalannotationtoaggregategenesintopathways/genesets
• Anumberofdatabasesareavailable– Differentanalysistoolslinktodifferentdatabases– Toomanydatabasestogointodetailoneveryone– Commonlyusedresources:
• GO• KeGG• MsigDB• WikiPathways
![Page 17: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/17.jpg)
PathwayandGeneSetdataresources
• TheGeneOntology(GO)database– http://www.geneontology.org/– GOoffersarelational/hierarchicaldatabase– Parentnodes:moregeneralterms– Childnodes:morespecificterms– Attheendofthehierarchytherearegenes/proteins– Atthetopthereare3parentnodes:biologicalprocess,molecularfunctionandcellularcomponent
• Example:wesearchthedatabasefortheterm“inflammation”
![Page 18: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/18.jpg)
Thegenesonourarraythatcodeforoneofthe44geneproductswouldformthecorresponding “inflammation”geneset
![Page 19: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/19.jpg)
WhatistheGeneOntology(GO)?
• Setofbiologicalphrases(terms)whichareappliedtogenes:– proteinkinase– apoptosis– membrane
• Ontology:Aformalsystemfordescribingknowledge
![Page 20: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/20.jpg)
GOStructure• Termsarerelatedwithinahierarchy– is-a– part-of
• Describesmultiplelevelsofdetailofgenefunction
• Termscanhavemorethanoneparentorchild
![Page 21: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/21.jpg)
GOStructurecell
membrane chloroplast
mitochondrial chloroplastmembrane membrane
is-apart-of
Speciesindependent.Somelower-leveltermsarespecifictoagroup, buthigher leveltermsarenot
![Page 22: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/22.jpg)
WhatGOCovers?
• GOtermsdividedintothreeaspects:– cellularcomponent– molecularfunction– biologicalprocess
glucose-6-phosphate isomeraseactivity
Celldivision
![Page 23: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/23.jpg)
Terms• WheredoGOtermscomefrom?
– GOtermsareaddedbyeditorsatEBIandgeneannotationdatabasegroups
– Termsaddedbyrequest– Expertshelpwithmajordevelopment– 27734terms,98.9%withdefinitions.
• 16731biological_process• 2385cellular_component• 8618molecular_function
![Page 24: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/24.jpg)
• Genesarelinked,orassociated,withGOtermsbytrainedcuratorsatgenomedatabases– Knownas‘geneassociations’orGOannotations– Multipleannotationspergene
• SomeGOannotationscreatedautomatically
Annotations
![Page 25: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/25.jpg)
AnnotationSources• Manualannotation
– Createdbyscientificcurators• Highquality• Smallnumber(time-consumingtocreate)
• Electronicannotation– Annotationderivedwithouthumanvalidation
• Computationalpredictions(accuracyvaries)• Lower‘quality’thanmanualcodes
• Keypoint:beawareofannotationorigin
![Page 26: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/26.jpg)
EvidenceTypes• ISS: Inferred from Sequence/Structural Similarity• IDA: Inferred from Direct Assay• IPI: Inferred from Physical Interaction• IMP: Inferred from Mutant Phenotype• IGI: Inferred from Genetic Interaction• IEP: Inferred from Expression Pattern• TAS: Traceable Author Statement• NAS: Non-traceable Author Statement• IC: Inferred by Curator• ND: No Data available
• IEA: Inferred from electronic annotation
![Page 27: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/27.jpg)
SpeciesCoverage
• Allmajoreukaryoticmodelorganismspecies
• HumanviaGOAgroupatUniProt
• SeveralbacterialandparasitespeciesthroughTIGRandGeneDB atSanger
• Newspeciesannotationsindevelopment
![Page 28: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/28.jpg)
VariableCoverage
LomaxJ.GetreadytoGO!Abiologist's guidetotheGeneOntology.BriefBioinform.2005Sep;6(3):298-304.
![Page 29: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/29.jpg)
ContributingDatabases– BerkeleyDrosophila GenomeProject(BDGP)– dictyBase (Dictyostelium discoideum)– FlyBase (Drosophilamelanogaster)– GeneDB (Schizosaccharomyces pombe,Plasmodiumfalciparum, Leishmania
major andTrypanosoma brucei)– UniProtKnowledgebase (Swiss-Prot/TrEMBL/PIR-PSD)andInterPro databases– Gramene (grains, including rice,Oryza)– MouseGenomeDatabase(MGD)andGeneExpressionDatabase(GXD) (Mus
musculus)– RatGenomeDatabase(RGD) (Rattus norvegicus)– Reactome– Saccharomyces GenomeDatabase(SGD) (Saccharomyces cerevisiae)– TheArabidopsis InformationResource(TAIR) (Arabidopsis thaliana)– TheInstituteforGenomicResearch(TIGR):databasesonseveralbacterial
species– WormBase (Caenorhabditis elegans)– ZebrafishInformationNetwork(ZFIN):(Danio rerio)
![Page 30: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/30.jpg)
GOSlimSets• GOhastoomanytermsforsomeuses– Summaries(e.g.Piecharts)
• GOSlimisanofficialreducedsetofGOterms– Generic,plant,yeast
![Page 31: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/31.jpg)
GOSoftwareTools
• GOresourcesarefreelyavailabletoanyonewithoutrestriction– Includestheontologies,geneassociationsandtoolsdevelopedbyGO
• OthergroupshaveusedGOtocreatetoolsformanypurposes– http://www.geneontology.org/GO.tools
![Page 32: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/32.jpg)
AccessingGO:QuickGO
http://www.ebi.ac.uk/ego/
![Page 33: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/33.jpg)
OtherOntologies
http://www.ebi.ac.uk/ontology-lookup
![Page 34: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/34.jpg)
KEGGpathwaydatabase
• KEGG=KyotoEncyclopediaofGenesandGenomes– http://www.genome.jp/kegg/pathway.html– ThepathwaydatabasegivesfarmoredetailedinformationthanGO• Relationshipsbetweengenesandgeneproducts
– But:thisdetailedinformationisonlyavailableforselectedorganismsandprocesses
– Example:Adipocytokinesignalingpathway
![Page 35: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/35.jpg)
![Page 36: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/36.jpg)
KEGGpathwaydatabase
• Clickingonthenodesinthepathwayleadstomoreinformationongenes/proteins– Otherpathwaysthenodeisinvolvedwith– EntriesinGene/Proteindatabases– References– Sequenceinformation
• UltimatelythisallowstofindcorrespondinggenesonthemicroarrayanddefineaGeneSetforthepathway
![Page 37: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/37.jpg)
Wikipathways
• http://www.wikipathways.org
• Awikipedia forpathways– Onecanseeanddownloadpathways– Butalsoeditandcontributepathways
• TheprojectislinkedtotheGenMAPP andPathvisio analysis/visualisationtools
![Page 38: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/38.jpg)
![Page 39: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/39.jpg)
MSigDB• MSigDB =MolecularSignatureDatabasehttp://www.broadinstitute.org/gsea/msigdb
• RelatedtothetheanalysisprogramGSEA• MSigDB offersgenesetsbasedonvariousgroupings– Pathways– GOterms– Chromosomalposition,…
![Page 40: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/40.jpg)
![Page 41: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/41.jpg)
SomeWarnings• Inmanycasesthedefinitionofapathway/genesetina
databasemightdifferfromthatofascientist
• Thenodesinpathwaysareoftenproteinsormetabolites;theactivityofthecorrespondinggenesetisnotnecessarilyagoodmeasurementoftheactivityofthepathway
• Therearemanymoreresourcesoutthere(BioCarta,BioPax)
• Commercialpackagesoftenusetheirownpathway/genesetdefinitions(Ingenuity,Metacore,Genomatix,…)
• GenesinagenesetareusuallynotgivenbyaProbeSetID,butrefertosomegenedatabase(Entrez IDs,Unigene IDs)
• Conversioncanleadtoerrors!
![Page 42: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/42.jpg)
SomeWarnings• Inmanycasesthedefinitionofapathway/genesetina
databasemightdifferfromthatofascientist
• Thenodesinpathwaysareoftenproteinsormetabolites;theactivityofthecorrespondinggenesetisnotnecessarilyagoodmeasurementoftheactivityofthepathway
• Therearemanymoreresourcesoutthere(BioCarta,BioPax)
• Commercialpackagesoftenusetheirownpathway/genesetdefinitions(Ingenuity,Metacore,Genomatix,…)
• GenesinagenesetareusuallynotgivenbyaProbeSetID,butrefertosomegenedatabase(Entrez IDs,Unigene IDs)
• Conversioncanleadtoerrors!
![Page 43: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/43.jpg)
GeneAttributes• Functionalannotation
– Biologicalprocess,molecularfunction,celllocation• Chromosomeposition• Diseaseassociation• DNAproperties
– TFbindingsites,genestructure(intron/exon),SNPs• Transcriptproperties
– Splicing,3’UTR,microRNA bindingsites• Proteinproperties
– Domains,secondaryandtertiarystructure,PTMsites• Interactionswithothergenes
![Page 44: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/44.jpg)
SourcesofGeneAttributes
• Ensembl BioMart (eukaryotes)– http://www.ensembl.org
• Entrez Gene(general)– http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene
• Modelorganismdatabases– E.g.SGD:http://www.yeastgenome.org/
• Manyothers…..
![Page 45: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/45.jpg)
EnsemblBioMart• Convenientaccesstogenelistannotation
Selectgenome
Selectfilters
Selectattributestodownload
![Page 46: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/46.jpg)
GeneandProteinIdentifiers• Identifiers(IDs)areideallyunique,stablenamesornumbersthathelptrackdatabaserecords– E.g.SocialInsuranceNumber,EntrezGeneID41232
• Geneandproteininformationstoredinmanydatabases– à GeneshavemanyIDs
• Recordsfor:Gene,DNA,RNA,Protein– Importanttorecognizethecorrectrecordtype– E.g.Entrez Generecordsdon’tstoresequence.TheylinktoDNAregions,RNAtranscriptsandproteins.
![Page 47: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/47.jpg)
NCBIDatabaseLinks
http://www.ncbi.nlm.nih.gov/Database/datamodel/data_nodes.swf
NCBI:U.S.NationalCenterforBiotechnologyInformation
PartofNationalLibraryofMedicine(NLM)
![Page 48: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/48.jpg)
CommonIdentifiersSpecies-specificHUGOHGNCBRCA2MGIMGI:109337RGD2219ZFINZDB-GENE-060510-3FlyBase CG9097WormBase WBGene00002299orZK1067.1SGDS000002187orYDL029WAnnotationsInterPro IPR015252OMIM600185Pfam PF09104GeneOntologyGO:0000724SNPs rs28897757ExperimentalPlatformAffymetrix 208368_3p_s_atAgilentA_23_P99452CodeLink GE60169Illumina GI_4502450-S
GeneEnsembl ENSG00000139618EntrezGene675UnigeneHs.34012
RNAtranscriptGenBankBC026160.1RefSeq NM_000059Ensembl ENST00000380152
ProteinEnsembl ENSP00000369497RefSeq NP_000050.2UniProt BRCA2_HUMANorA1YBP1_HUMANIPIIPI00412408.1EMBLAF309413PDB1MIU
Red=Recommended
![Page 49: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/49.jpg)
IdentifierMapping
• SomanyIDs!– Mapping(conversion)isaheadache
• Fourmainuses– Searchingforafavoritegenename– Linktorelatedresources– Identifiertranslation
• E.g.Genestoproteins,Entrez GenetoAffy– Unificationduringdatasetmerging
• Equivalentrecords
![Page 50: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/50.jpg)
IDMappingServices
• Synergizer– http://llama.med.harvard.edu/syner
gizer/translate/
• EnsemblBioMart– http://www.ensembl.org
• UniProt– http://www.uniprot.org/
![Page 51: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/51.jpg)
IDMappingChallenges• Avoiderrors:mapIDscorrectly• Genenameambiguity– notagoodID
– e.g.FLJ92943,LFS1,TRP53,p53– Bettertousethestandardgenesymbol:TP53
• Excelerror-introduction– OCT4ischangedtoOctober-4
• Problemsreaching100%coverage– E.g.duetoversionissues– Usemultiplesourcestoincreasecoverage
ZeebergBRetal.Mistakenidentifiers: genenameerrorscanbeintroducedinadvertentlywhenusingExcelinbioinformatics BMCBioinformatics.2004Jun 23;5:80
![Page 52: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/52.jpg)
Goals• Pathwayandgenesetdataresources
• Geneattributes• Databaseresources
• GO,KeGG,Wikipathways,MsigDB• Geneidentifiersandissueswithmapping
• Differencesbetweenpathwayanalysistools• Selfcontainedvs.competitivetests• Cut-offmethodsvs.globalmethods• Issueswithmultipletesting
![Page 53: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/53.jpg)
Goals• Pathwayandgenesetdataresources
• Geneattributes• Databaseresources
• GO,KeGG,Wikipathways,MsigDB• Geneidentifiersandissueswithmapping
• Differencesbetweenpathwayanalysistools• Selfcontainedvs.competitivetests• Cut-offmethodsvs.globalmethods• Issueswithmultipletesting
![Page 54: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/54.jpg)
AimsofAnalysis
• Reminder:Theaimistogiveonenumber(score,p-value)toaGeneSet/Pathway– Aremanygenesinthepathwaydifferentiallyexpressed(up-regulated/downregulated)?
– Canwegiveanumber(p-value)totheprobabilityofobservingthesechangesjustbychance?
– Similartosinglegeneanalysisstatisticalhypothesistestingplaysanimportantrole
![Page 55: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/55.jpg)
Generaldifferencesbetweenanalysistools
• Selfcontainedvs competitivetest– Thedistinctionbetween“self-contained”and“competitive”methodsgoesbacktoGoeman andBuehlman (2007)
– Aself-containedmethodonlyusesthevaluesforthegenesofageneset
• Thenullhypothesishereis:H={“NogenesintheGeneSetaredifferentiallyexpressed”}
– Acompetitivemethodcomparesthegeneswithinthegenesetwiththeothergenesonthearrays
• HerewetestagainstH:{“ThegenesintheGeneSetarenotmoredifferentiallyexpressedthanothergenes”}
![Page 56: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/56.jpg)
Example:AnalysisfortheGO-Term“inflammatoryresponse”(GO:0006954)
![Page 57: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/57.jpg)
BacktotheRealDataExample
• UsingBioconductor softwarewecanfind96probesetsonthearraycorrespondingtothisterm
• 8outofthesehaveap-value<5%
• Howmanysignificantgeneswouldweexpectbychance?
• Dependsonhowwedefine“bychance”
![Page 58: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/58.jpg)
The“self-contained”version
• Bychance(i.e.ifitisNOTdifferentiallyexpressed)ageneshouldbesignificantwithaprobabilityof5%
• Wewouldexpect96x 5%=4.8significantgenes
• Usingthebinomialdistributionwecancalculatetheprobabilityofobserving8ormoresignificantgenesasp=0.108,i.e.notquitesignificant
![Page 59: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/59.jpg)
The“competitive”version• Overall1272outof12639
genesaresignificantinthisdataset(10.1%)
• Ifwerandomlypick96geneswewouldexpect96x 10.1%=9.7genestobesignificant“bychance”
• Ap-valuecanbecalculatedbasedonthe 2x2table
• Testsforassociation:Chi-Square-TestorFisher’sexacttest
In GS Not in GSsig 8 1264
non-sig 88 11 279
P-valuefromFisher’sexacttest(one-sided):0.733,i.e veryfarfrombeingsignificant
![Page 60: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/60.jpg)
CompetitiveTests• Competitiveresultsdependhighlyonhowmanygenesareon
thearrayandpreviousfiltering– Onasmalltargetedarraywhereallgenesarechanged,acompetitive
methodmightdetectnodifferentialGeneSetsatall
• Competitivetestscanalsobeusedwithsmallsamplesizes,evenforn=1– BUT:Theresultgivesnoindicationofwhetheritholdsforawider
populationofsubjects,thep-valueconcernsapopulationofgenes!
• Competitiveteststypicallygivelesssignificantresultsthanself-contained(asseenwiththeexample)
• Fisher’sexacttest(competitive)isprobablythemostwidelyusedmethod!
![Page 61: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/61.jpg)
Cut-offmethodsvs wholegenelistmethods
• Aproblemwithbothtestsdiscussedsofaris,thattheyrelyonanarbitrarycut-off
• Ifwecallagenesignificantfor10%alphathresholdtheresultswillchange– Inourexamplethebinomialtestyieldsp=0.022,i.e.forthiscut-offtheresultissignificant!
• Wealsoloseinformationbyreducingap-valuetoabinary(“significant”,“non-significant”)variable– Itshouldmakeadifference,whetherthenon-significantgenesinthesetarenearlysignificantorcompletelyunsignificant
![Page 62: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/62.jpg)
P-value histogram for inflammation genes
pvalue[incl]
Frequency
0.0 0.2 0.4 0.6 0.8 1.0
05
1015
• Wecanstudythedistributionofthep-valuesinthegeneset
• Ifnogenesaredifferentiallyexpressedthisshouldbeauniformdistribution
• Apeakontheleftindicates,thatsomegenesaredifferentiallyexpressed
• WecantestthisforexamplebyusingtheKolmogorov-Smirnov-Test
• Herep=0.082,i.e.notquitesignificant
•Thiswouldbea“self-contained”test,asonlythegenesinthegenesetarebeingused
![Page 63: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/63.jpg)
Kolmogorov-SmirnovTest
• TheKS-testcomparesanobservedwithanexpectedcumulativedistribution
• TheKS-statisticisgivenbythemaximumdeviationbetweenthetwo
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0
Observed and Expected culmulative distribution
x
Fn(x)
![Page 64: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/64.jpg)
Histogram of the ranks of p-values for inflammation genes
p.rank[incl]
Frequency
0 2000 4000 6000 8000 10000 12000 14000
05
1015
• AlternativelywecouldlookatthedistributionoftheRANKSofthep-valuesinourgeneset
• Thiswouldbeacompetitivemethod,i.e wecompareourgenesetwiththeothergenes
• AgainonecanusetheKolmogorov-Smirnovtesttotestforuniformity
• Here:p=0.851,i.e.veryfarfromsignificance
![Page 65: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/65.jpg)
Othergeneralissues• Directionofchange
– Inourexamplewedidn’tdifferentiatebetweenupordown-regulatedgenes
– Thatcanbeachievedbyrepeatingtheanalysisforp-valuesfromone-sidedtest
• Eg.wecouldfindGO-Termsthataresignificantlyup-regulated– Withmostsoftwarebothapproachesarepossible
• MultipleTesting– AswearetestingmanyGeneSets,weexpectsomesignificantfindings
“bychance”(falsepositives)– Controllingthefalsediscoveryrateistricky:Thegenesetsdooverlap,
sotheywillnotbeindependent!• EvenmoretrickyinGOanalysiswherecertainGOtermsaresubsetofothers
– TheBonferroni-Methodismostconservative,butalwaysworks!
![Page 66: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/66.jpg)
• Resampling strategies(dependencebetweengenes)– Themethodsweusedsofarinourexampleassumethatgenesareindependentofeachother…ifthisisviolatedthep-valuesareincorrect
– Resampling ofgroup/phenotypelabelscancorrectforthis
– Wegiveanexampleforourdataset
Multiple TestingforPathways
![Page 67: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/67.jpg)
ExampleResampling Approach1. Calculatetheteststatistic,e.g.thepercentageofsignificant
genesintheGeneSet
2. Randomlyre-shufflethegrouplabels(lean,obese)betweenthesamples
3. Repeattheanalysisforthere-shuffleddatasetandcalculateare-shuffledversionoftheteststatistic
4. Repeat2and3manytimes(thousands…)
5. Weobtainadistributionofre-shuffled%ofsignificantgenes:thepercentageofre-shuffledvaluesthatarelargerthantheoneobservedin1isourp-value
![Page 68: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/68.jpg)
• Thereshufflingtakesgenetogenecorrelationsintoaccount
• Manyprogramsalsooffertoresamplethegenes:ThisdoesNOTtakecorrelationsintoaccount
• Roughlyspeaking:– Resampling phenotypes:correspondstoself-containedtest
– Resampling genes:correspondstocompetitivetest
Resampling Approach
![Page 69: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/69.jpg)
• Genesbeingpresentmorethanonce– Commonapproaches
• Combineduplicates(average,median,maximum,…)• Ignore(i.e treatduplicateslikedifferentgenes)
• Usingsummarystatisticsvs usingalldata– Ourexamplesusedp-valuesasdatasummaries– Otherapproachesusefold-changes,signaltonoiseratios,etc…
– Somemethodsarebasedontheoriginaldataforthegenesinthegenesetratherthanonasummarystatistic
Resampling Approaches
![Page 70: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/70.jpg)
Resampling Approaches
• Theresamplingapproachesarehighlycomputationallyintensive
• Newmethodsarebeingdevelopedtospeedthisup– Empiricalapproximationsofpermutations– Empiricalpathwayanalysis,withoutpermutation.
• ZhouYH,BarryWT,WrightFA.Biostatistics.2013Jul;14(3):573-85.doi:10.1093/biostatistics/kxt004.Epub 2013Feb20.
![Page 71: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/71.jpg)
Summary
• Databases• Choicemakesadifference• NotallusethesameIDs– watchoutJ• Majordifferencesbetweenmethods• Issueswithmultipletesting
• Nextlecture,willgointomoredetailonafewmethods
![Page 72: Pathway and Gene Set Analysis Part 1 - biostat.washington.edu · Alison Motsinger-Reif, PhD Bioinformatics Research Center Department of Statistics North Carolina State University](https://reader031.vdocuments.mx/reader031/viewer/2022040623/5d4e6d1488c993fd6f8ba4b6/html5/thumbnails/72.jpg)
Questions?