
Upload: octavia-smith

Post on 23-Dec-2015

Pathway analysis using BioConductor

The global test revisited

R User group 6 dec 2005

Overview

• Introduction

• Annotation

• Pathway analysis

• Demonstration

Introduction

• Pathway

• Set of related genes

• Functional

• Structural

• Described as lists of gene identifiers

• Microarray

• 1000s of tests

• Description

• Location on chip/slide

• Sequence ID

• On-chip replication

Feature description

• Proprietary ID (Affymetrix/Agilent)

• GenBank, RefSeq, Ensembl ID

• Symbol, LocusLink / Entrez Gene, UniGene

• SwissProt

• Chromosomal location

• EC number, GO, KEGG

Annotation sources

• Batch Gene Finder: http://cgap.nci.nih.gov/Genes

• BioMart: http://www.ebi.ac.uk/BioMart/martview

• Resourcerer: http://www.tigr.org/tigr-scripts/magic/r1.pl

• Bioconductor metadata: http://www.bioconductor.org

• NetAffx: http://www.affymetrix.com/analysis/index.affx

Create Annotation for Array

• Select / create unique identifier for probes on array

• e.g. use positional information: b01r03c14

• Use this identifier as rownames of data and annotation

• Use annotation sources to connect sequence ids to gene ids

               UniGene    LocusLink  Symbol
A28102_at
AB000114_at    Hs.94070   4958       OMD
AB000115_at    Hs.389724  10964      IFI44L
AB000220_at    Hs.269109  10512      SEMA3C
AB000381_s_at
AB000409_at    Hs.371594  8569       MKNK1

               Smp1     Smp2      Smp3     Smp4
A28102_at      140.7    164.73    137.8    53.15
AB000114_at    115.88   617.08    97.3     393.94
AB000115_at    259.19   393.79    193.66   32.09
AB000220_at    130.83   258.38    213.3    31.83
AB000381_s_at  7505.4   18152.14  4990.21  25.35
AB000409_at    166.29   418.2     125.12   78.88
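The idea of sharing one identifier between data and annotation can be sketched in base R as follows. The values come from the example tables (first three probes only); the object names `expr` and `annot` are assumptions made for this illustration.

```r
# Expression values for the first three probes of the example table.
expr <- matrix(
  c(140.70, 164.73, 137.80,  53.15,
    115.88, 617.08,  97.30, 393.94,
    259.19, 393.79, 193.66,  32.09),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("A28102_at", "AB000114_at", "AB000115_at"),
                  c("Smp1", "Smp2", "Smp3", "Smp4")))

# Annotation for the same probes, deliberately in a different order.
# A28102_at has no gene identifiers, so its fields are NA.
annot <- data.frame(
  UniGene   = c("Hs.389724", NA, "Hs.94070"),
  LocusLink = c(10964, NA, 4958),
  Symbol    = c("IFI44L", NA, "OMD"),
  row.names = c("AB000115_at", "A28102_at", "AB000114_at"),
  stringsAsFactors = FALSE)

# Reorder the annotation so that row i describes row i of the data.
annot <- annot[rownames(expr), ]
stopifnot(identical(rownames(annot), rownames(expr)))

# Look up expression values by gene symbol through the shared identifier.
omd_values <- expr[rownames(annot)[which(annot$Symbol == "OMD")], ]
print(omd_values)
```

Once the rownames agree, any selection made on the annotation (by symbol, LocusLink id, or pathway membership) can be used directly as a row index into the data.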

Connecting sequence ids to gene ids

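# Two matrices: myAnnot (probe annotation) and GBFAnnot (from Batch Gene Finder)
# Create a temporary annotation matrix with the correct dimensions
tmpAnnot <- matrix("", nrow=nrow(myAnnot), ncol=ncol(GBFAnnot),
                   dimnames=list(rownames(myAnnot), colnames(GBFAnnot)))
# Look up each probe's GenBank accession among the rownames of GBFAnnot
ind <- match(myAnnot[,1], rownames(GBFAnnot))
# Fill in the rows that were found; unmatched probes keep empty fields
tmpAnnot[!is.na(ind),] <- GBFAnnot[ind[!is.na(ind)],]
myAnnot <- cbind(myAnnot, tmpAnnot)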

      Genbank
r1c1  NM_002658
r1c2  NM_000476
r1c3  NM_002581
r1c4  NM_000466
r1c5  NM_002343

           Symbol  Description                       Genbank                 Unigene    LocusLink
NM_000476  AK1     Adenylate kinase 1                NM_000476               Hs.175473  203
NM_002343  LTF     Lactotransferrin                  NM_002343               Hs.529517  4057
NM_000466  PEX1    Peroxisome biogenesis factor 1    NM_000466               Hs.164682  5189
NM_002658  PLAU    Plasminogen activator, urokinase  NM_002658 NM_001001791  Hs.77274   5328
NM_002581

ProbeId  Genbank    Symbol  Name                              Genbank                 Unigene    LocusLink
r1c1     NM_002658  PLAU    Plasminogen activator, urokinase  NM_002658 NM_001001791  Hs.77274   5328
r1c2     NM_000476  AK1     Adenylate kinase 1                NM_000476               Hs.175473  203
r1c3     NM_002581
r1c4     NM_000466  PEX1    Peroxisome biogenesis factor 1    NM_000466               Hs.164682  5189
r1c5     NM_002343  LTF     Lactotransferrin                  NM_002343               Hs.529517  4057
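A self-contained toy version of this lookup, built from the example values, may make the `match()` step easier to follow. Only three annotation columns are kept for brevity, and NM_002581 is treated here as absent from the gene-level table (an assumption); the result name `merged` is also introduced only for this sketch.

```r
# Probe-level annotation: one GenBank accession per probe.
myAnnot <- matrix(
  c("NM_002658", "NM_000476", "NM_002581", "NM_000466", "NM_002343"),
  ncol = 1,
  dimnames = list(paste0("r1c", 1:5), "Genbank"))

# Gene-level annotation keyed by accession (rownames).
GBFAnnot <- matrix(
  c("AK1",  "Hs.175473", "203",
    "LTF",  "Hs.529517", "4057",
    "PEX1", "Hs.164682", "5189",
    "PLAU", "Hs.77274",  "5328"),
  ncol = 3, byrow = TRUE,
  dimnames = list(c("NM_000476", "NM_002343", "NM_000466", "NM_002658"),
                  c("Symbol", "Unigene", "LocusLink")))

# Position of each probe's accession in GBFAnnot; NA when it is missing.
ind <- match(myAnnot[, "Genbank"], rownames(GBFAnnot))
found <- !is.na(ind)

# Empty matrix with one row per probe, filled only where a match exists.
tmpAnnot <- matrix("", nrow = nrow(myAnnot), ncol = ncol(GBFAnnot),
                   dimnames = list(rownames(myAnnot), colnames(GBFAnnot)))
tmpAnnot[found, ] <- GBFAnnot[ind[found], ]

merged <- cbind(myAnnot, tmpAnnot)
print(merged)
```

Because the temporary matrix is created first with the probe identifiers as rownames, probes without a match keep their row, which keeps the annotation aligned with the expression data.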

Selecting probes by pathway

• Using BioConductor metadata package

• Using BioConductor GO and Mapping

> library(hgu95av2)
> get("GO:0005868", envir=hgu95av2GO2PROBE)
       NAS       <NA>        TAS       <NA>        TAS        ISS
"37300_at" "40318_at" "40319_at" "40949_at" "40950_at"   "946_at"

> library(GO)
> ll <- get("GO:0005868", envir=GOLOCUSID)
> rownames(myAnnot)[myAnnot[,"LocusLink"] %in% ll]
[1] "Contig51966_RC" "NM_004411"      "Contig47291_RC" "NM_006141"
[5] "NM_014183"      "AB002323"       "NM_006519"
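The same `%in%` selection can be applied to several gene sets at once. The sketch below uses base R only; the `pathways` list, its set names and its membership are hypothetical, and `probes_in()` is a helper introduced for this illustration, not a Bioconductor function.

```r
# Hypothetical gene sets: made-up labels mapping to LocusLink ids.
pathways <- list(setA = c("203", "5328"),
                 setB = c("4057", "9999"))

# Annotation with probe ids as rownames and a LocusLink column.
myAnnot <- matrix(c("5328", "203", "", "5189", "4057"), ncol = 1,
                  dimnames = list(paste0("r1c", 1:5), "LocusLink"))

# Return the probe ids whose LocusLink id belongs to the given set.
probes_in <- function(ids, annot) {
  rownames(annot)[annot[, "LocusLink"] %in% ids]
}

# One character vector of probe ids per gene set.
probes_by_set <- lapply(pathways, probes_in, annot = myAnnot)
print(probes_by_set)
```

Ids in a set that are not on the array (here "9999") are simply ignored, so the pathway size on the array can be smaller than the size of the published gene list.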

Pathway analysis

• List based methods

• Order based methods

• Statistical combination of results

List based Pathway analysis

• Compare the proportion of differentially expressed genes in a pathway to the proportion on the array

• R: phyper(), GOHyperG() in GOstats package (BioConductor)

• phyper(PWde,ARde,ARall-ARde,PWall,lower.tail=FALSE)

            Diff Expressed  All
In Pathway  15              100
On Array    1000            10000

>phyper(15,1000,9000,100,lower.tail=FALSE)

[1] 0.03910265
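A point worth checking with this call: with `lower.tail=FALSE`, `phyper(q, ...)` returns P(X > q), so the call above gives the probability of strictly more than 15 differentially expressed genes in the pathway. To include the observed count, q is lowered by one. The sketch below uses the same variable names as the formula above and compares the result with a one-sided Fisher exact test on the corresponding 2x2 table; `p_gt`, `p_ge` and `tab` are names introduced for this illustration.

```r
PWde  <- 15     # differentially expressed genes in the pathway
PWall <- 100    # all genes in the pathway
ARde  <- 1000   # differentially expressed genes on the array
ARall <- 10000  # all genes on the array

# P(X > 15): the form of the call used above.
p_gt <- phyper(PWde, ARde, ARall - ARde, PWall, lower.tail = FALSE)

# P(X >= 15): lower q by one so the observed count is included.
p_ge <- phyper(PWde - 1, ARde, ARall - ARde, PWall, lower.tail = FALSE)

# The same question as a 2x2 table: pathway membership by differential expression.
tab <- matrix(c(PWde,        PWall - PWde,
                ARde - PWde, (ARall - ARde) - (PWall - PWde)),
              nrow = 2,
              dimnames = list(c("DE", "notDE"), c("inPathway", "notInPathway")))
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

print(c(p_gt = p_gt, p_ge = p_ge, p_fisher = p_fisher))
```

The one-sided Fisher p-value agrees with `p_ge`, which is the usual definition of an over-representation p-value; `p_gt` is somewhat smaller because it leaves out the probability of observing exactly 15.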

Order based analysis

• Genes are ordered by difference, from up- to non- to downregulated. Interesting pathways form clusters along this order

• In R: Gene Set Enrichment Analysis (GSEA) package http://www.broad.mit.edu/gsea/software/software_index.html
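The clustering idea can be illustrated with a simplified, unweighted running-sum score: walk down the ordered gene list, step up when a gene belongs to the pathway and down otherwise, and record the largest deviation from zero. This is a sketch of the principle only, not the GSEA software's implementation (which also weights the steps and assesses significance by permutation); the function name `enrichment_score` and the toy statistics are assumptions.

```r
# Unweighted running-sum score for one gene set.
# stat: named numeric vector of per-gene differences (names are gene ids).
# gene_set: character vector of gene ids in the pathway.
enrichment_score <- function(stat, gene_set) {
  ranked <- names(sort(stat, decreasing = TRUE))   # up- to downregulated
  in_set <- ranked %in% gene_set
  # Steps are scaled so that the running sum starts and ends at zero.
  step <- ifelse(in_set, 1 / sum(in_set), -1 / sum(!in_set))
  running <- cumsum(step)
  running[which.max(abs(running))]                 # largest deviation, signed
}

# Toy data: ten genes with evenly spaced differences (hypothetical values).
stat <- setNames(seq(2, -2.5, by = -0.5), paste0("g", 1:10))

es_top    <- enrichment_score(stat, c("g1", "g2", "g3"))    # clustered at the top
es_bottom <- enrichment_score(stat, c("g8", "g9", "g10"))   # clustered at the bottom
es_spread <- enrichment_score(stat, c("g1", "g5", "g10"))   # spread along the order
print(c(top = es_top, bottom = es_bottom, spread = es_spread))
```

A set clustered at either end of the order yields a score near +1 or -1, while a set spread evenly along the list stays closer to zero.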

Statistical combination of results

• All genes in a pathway contribute their statistical influence

• In R: globaltest package (BioConductor)

[Figure: influence per probe set. Vertical axis: influence (ticks 0 to 40). Probe sets: 204290_s_at, 221588_x_at, 221589_s_at, 221590_s_at, 200822_x_at, 210050_at, 213011_s_at. Legend: higher expression in NA samples / higher expression in 0 samples.]
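How every gene can contribute to one pathway-level statistic may be sketched in base R. The code below assumes a quadratic-form statistic in the spirit of the global test (the average over genes of the squared cross-product between each gene's expression and the centred outcome, scaled by the outcome variance) and a permutation p-value. It is a simplified illustration on simulated data, not the globaltest package's implementation; all names and the simulated shift are assumptions.

```r
set.seed(1)

# Simulated data: 20 samples in two groups, 7 genes in one pathway.
n_samples <- 20
y <- rep(c(0, 1), each = n_samples / 2)
x <- matrix(rnorm(n_samples * 7), nrow = n_samples,
            dimnames = list(NULL, paste0("gene", 1:7)))
# Hypothetical effect: the first three genes are shifted upwards in group 1.
x[y == 1, 1:3] <- x[y == 1, 1:3] + 1

# Per-gene contribution: squared cross-product with the centred outcome.
gene_influence <- function(x, y) {
  r <- y - mean(y)
  drop(crossprod(x, r))^2 / var(y)
}

# Pathway statistic: average of the per-gene contributions.
global_stat <- function(x, y) mean(gene_influence(x, y))

influence <- gene_influence(x, y)
q_obs <- global_stat(x, y)

# Permutation null distribution: shuffle the sample labels.
n_perm <- 999
q_perm <- replicate(n_perm, global_stat(x, sample(y)))
p_value <- (1 + sum(q_perm >= q_obs)) / (1 + n_perm)

print(round(influence, 2))
print(p_value)
```

The `influence` vector corresponds to the idea of a per-gene bar in an influence plot: genes with larger values contribute more to the pathway-level statistic.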

Demonstration