

Timothy H. W. Chan, Calum MacAulay, Wan Lam, Stephen Lam, Kim Lonergan, Steven Jones, Marco Marra, Raymond T. Ng

Department of Computer Science, University of British Columbia

The British Columbia Cancer Research Centre

We previously analyzed publicly available breast and brain SAGE libraries using the permutation test (Ng et al., Frontiers of Cardiovascular Science, 2003) with some success: 60% of the top-ranked genes for the breast SAGE data were verified to be related to the neoplastic process. The BC Cancer Research Centre has produced various lung cancer SAGE libraries, including 5 CIS (carcinoma in situ), 6 invasive and 17 normal libraries. It would be interesting to use the permutation test to contrast and compare the various stages of lung cancer and to search for small transcriptional changes (pathway regulators, checkpoints, switches).

To apply the permutation test to SAGE libraries from normal lung tissue and from two stages of lung cancer (CIS and invasive) in order to discover candidate cancer-related genes.

To contrast and compare these two stages of lung cancer.

To demonstrate the advantages and power of the permutation test over the T-test.

To reduce comparison errors, tag frequencies are normalized by scaling each library up to a common size of 300,000 tags.
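A minimal sketch of this normalization, assuming that "scaling each library up to 300,000" means proportionally rescaling each library's raw tag counts so they sum to 300,000. The function name `normalize_library`, the dict-of-counts input and the toy tag sequences are hypothetical.

```python
from typing import Dict

TARGET_SIZE = 300_000  # common library size stated on the poster


def normalize_library(counts: Dict[str, int], target: int = TARGET_SIZE) -> Dict[str, float]:
    """Rescale raw SAGE tag counts so the library totals `target` tags.

    Assumption: normalization is a simple proportional rescaling by library size.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library has no tags")
    scale = target / total
    return {tag: count * scale for tag, count in counts.items()}


# A toy library of 60,000 tags is scaled up by a factor of 5.
library = {"CATTTCTAAT": 30_000, "GTGAAACCCC": 24_000, "TTGGTGAAGG": 6_000}
normalized = normalize_library(library)
print(normalized["TTGGTGAAGG"])  # 30000.0
```

Proportional rescaling keeps each tag's relative frequency unchanged, so libraries of different depths can be compared tag by tag.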

Continue to use the permutation test to analyze other SAGE libraries.

The permutation test can also detect small transcriptional changes, provided the gene's tag count is consistent across all the libraries. These significant genes with low tag counts (but high permutation scores) require further analysis, as they could be vital pathway regulators, checkpoints or switches that may have led to the onset of lung cancer.

Validate genes further by experimentation.

Use validated genes for early cancer detection or derive new treatments from data.

The null hypothesis states that there is no difference between the means of the normal and cancer samples. If this were true, it would make no difference if we “mixed up the labels” of the libraries.

The alternative hypothesis states that the labels do make a difference: the means of the normal and cancer samples are different.

The top-ranked genes are then investigated for a relation to cancer using the literature currently available on PubMed.

Verification Criteria:

Some tags map to more than one gene. To deal with this, the expression level of the tag is assigned to each gene the tag maps to. For instance, if tag A maps to genes 1, 2 and 3, all three genes are assigned the tag count of tag A.
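The multi-mapping rule can be sketched as an expansion of each tag into one row per gene it maps to, each row carrying the full tag count. The function name `expand_multimapping_tags` and the tag-to-gene dictionary format are hypothetical; keeping the tag identity in each row (rather than summing per gene) is an assumption that mirrors the poster's tag-level ranking.

```python
from typing import Dict, List, Tuple


def expand_multimapping_tags(
    tag_counts: Dict[str, float], tag_to_genes: Dict[str, List[str]]
) -> List[Tuple[str, str, float]]:
    """Return (gene, tag, count) rows: every gene a tag maps to gets the full tag count."""
    rows: List[Tuple[str, str, float]] = []
    for tag, count in tag_counts.items():
        for gene in tag_to_genes.get(tag, []):  # tags with no known gene are skipped
            rows.append((gene, tag, count))
    return rows


# Tag A maps to genes 1, 2 and 3, so all three receive tag A's count.
mapping = {"A": ["gene1", "gene2", "gene3"], "B": ["gene4"]}
rows = expand_multimapping_tags({"A": 12.0, "B": 4.0}, mapping)
print(rows)
# [('gene1', 'A', 12.0), ('gene2', 'A', 12.0), ('gene3', 'A', 12.0), ('gene4', 'B', 4.0)]
```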

     

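Data Pre-Processing → Scoring and Ranking Genes → Literature Verification

Permutation Test

Null hypothesis: $H_0: \mu_c - \mu_n = 0$

Alternative hypothesis: $H_a: \mu_c - \mu_n \neq 0$

[Figure: the cancer and normal libraries are pooled together; a simulated normal pool (same size as the normal samples) and a simulated cancer pool (same size as the cancer samples) are drawn from the combined pool. The simulated scores are plotted against the observed score, and tags scoring at or above 99% confidence are output.]

Mean: $\bar{X} = \frac{1}{M}\sum_{i=1}^{M} X_i$, with $X_i = \mathrm{freq}(i)$

Observed score: $O = |\bar{r}_c - \bar{r}_n|$

Simulated score: $I = |\bar{s}_c - \bar{s}_n|$

Sum of squares: $SS = \sum_{i=1}^{M} (X_i - \bar{X})^2$

Standard deviation: $\mathrm{Stdv} = \sqrt{SS / M}$

Permutation (Z) score: $PS = \dfrac{O - \bar{I}}{\mathrm{Stdv}}$, where $\bar{I}$ and $\mathrm{Stdv}$ are computed over the simulated scores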
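The Monte Carlo version of this test can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the observed score O and each simulated score I are taken to be absolute differences of pool means, the standard deviation uses the population form $\sqrt{SS/M}$ over the simulated scores, and the number of random relabelings (`n_permutations`, default 1,000) is an arbitrary choice. The function name `permutation_score` is hypothetical.

```python
import random
import statistics
from typing import List, Sequence


def permutation_score(
    normal: Sequence[float],
    cancer: Sequence[float],
    n_permutations: int = 1000,
    seed: int = 0,
) -> float:
    """Hypothetical sketch of a permutation (Z) score for one tag.

    PS = (O - mean(I)) / Stdv(I), where O is the observed absolute
    difference of pool means and I are the simulated differences.
    """
    rng = random.Random(seed)  # seeded so results are reproducible
    # O: observed difference between the real cancer and normal means
    observed = abs(statistics.fmean(cancer) - statistics.fmean(normal))

    pooled: List[float] = list(normal) + list(cancer)  # pool all libraries
    n_cancer = len(cancer)
    simulated: List[float] = []
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # "mix up the labels"
        sim_cancer = pooled[:n_cancer]  # same size as the cancer samples
        sim_normal = pooled[n_cancer:]  # same size as the normal samples
        # I: simulated difference under the null hypothesis
        simulated.append(abs(statistics.fmean(sim_cancer) - statistics.fmean(sim_normal)))

    mean_i = statistics.fmean(simulated)
    stdv_i = statistics.pstdev(simulated)  # population form: sqrt(SS / M)
    if stdv_i == 0:
        raise ValueError("simulated scores have zero spread; score undefined")
    return (observed - mean_i) / stdv_i


# A tag that is consistently higher in cancer scores well above zero,
# while identical groups score below zero (O = 0 is smaller than mean(I)).
print(permutation_score([1.0] * 5, [10.0] * 5))
print(permutation_score([1.0, 2.0, 3.0, 4.0, 5.0], [1.0, 2.0, 3.0, 4.0, 5.0]))
```

Because the score is normalized by the spread of the simulated differences, a small but very consistent difference can still score highly, which matches the poster's point about low-count tags.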

Criteria   Related to:

A   Up/down-regulated in lung cancer

B   Up/down-regulated in a different type of cancer

C   Oncogene/tumor suppressor/mutator

D   Major component of the cell cycle (neoplastic process) or angiogenesis

E   Not previously associated with cancer

Higher permutation scores correspond either to greater differences between the two samples or to a more consistent difference between them across libraries. For each tissue comparison, the significant genes are ranked by sorting their permutation scores in descending order.
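Ranking can be sketched as a filter followed by a descending sort. The poster does not say how the 99% confidence level was turned into a score cutoff; this sketch assumes a one-sided standard-normal cutoff on the Z score (about 2.326), and `rank_significant` is a hypothetical helper.

```python
from statistics import NormalDist
from typing import Dict, List, Tuple

# One-sided 99% cutoff under a standard normal approximation (about 2.326).
# Assumption: the poster does not state how its 99% threshold was computed.
CUTOFF_99 = NormalDist().inv_cdf(0.99)


def rank_significant(scores: Dict[str, float], cutoff: float = CUTOFF_99) -> List[Tuple[str, float]]:
    """Keep tags scoring at or above the cutoff, highest score first."""
    significant = [(tag, s) for tag, s in scores.items() if s >= cutoff]
    return sorted(significant, key=lambda item: item[1], reverse=True)


scores = {"tagA": 1.2, "tagB": 5.7, "tagC": 2.9, "tagD": 8.1}
print(rank_significant(scores))  # [('tagD', 8.1), ('tagB', 5.7), ('tagC', 2.9)]
```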

At 99% confidence, 1,981 of the 32,871 tags considered failed the permutation test (i.e., the null hypothesis was rejected) for normal vs. invasive lung cancer.

At 99% confidence, 1,887 of the 40,476 tags considered failed the permutation test for normal vs. CIS lung cancer.

Of the 20,077 tags considered, 119 failed the permutation test for CIS vs. invasive lung cancer.

Power of the Permutation Test

The permutation test requires relatively few samples to give acceptable results compared with other statistical tests (e.g., the T-test or the chi-square test).
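One way to see why small sample sizes still work: the number of distinct ways to relabel the pooled libraries grows combinatorially, so even modest pools offer many simulated splits. The sketch below assumes that this count is the binomial coefficient C(n_a + n_b, n_b) for two groups of sizes n_a and n_b; the poster's plot may have used a different formula, and `n_label_assignments` is a hypothetical helper. The library counts come from the poster.

```python
from math import comb

# Library counts from the poster: 17 normal, 5 CIS, 6 invasive.
COMPARISONS = {
    "Normal vs CIS": (17, 5),
    "Normal vs Invasive": (17, 6),
    "CIS vs Invasive": (5, 6),
}


def n_label_assignments(n_a: int, n_b: int) -> int:
    """Distinct ways to split n_a + n_b libraries into groups of sizes n_a and n_b."""
    return comb(n_a + n_b, n_b)


for name, (n_a, n_b) in COMPARISONS.items():
    print(f"{name}: {n_label_assignments(n_a, n_b):,} relabelings")
# Normal vs CIS: 26,334 relabelings
# Normal vs Invasive: 100,947 relabelings
# CIS vs Invasive: 462 relabelings
```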

Top N Ranked Tags: Intersections between Normal vs Invasive and Normal vs CIS

Top N   Shared Tags
50      2
100     16
200     51
300     88
400     136
500     184
1000    450
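The intersection counts can be reproduced from two ranked tag lists by comparing their top-N prefixes. This is a minimal sketch with hypothetical tag IDs and a hypothetical helper `top_n_overlap`; the fractions at the end are computed directly from the table's numbers (for example, 51 of the top 200 is 25.5%).

```python
from typing import Sequence


def top_n_overlap(ranking_a: Sequence[str], ranking_b: Sequence[str], n: int) -> int:
    """Number of tags shared by the top-n entries of two ranked lists."""
    return len(set(ranking_a[:n]) & set(ranking_b[:n]))


# Toy rankings (hypothetical tag IDs), highest score first.
inv_vs_normal = ["t1", "t2", "t3", "t4", "t5"]
cis_vs_normal = ["t9", "t3", "t8", "t1", "t7"]
print(top_n_overlap(inv_vs_normal, cis_vs_normal, 2))  # 0
print(top_n_overlap(inv_vs_normal, cis_vs_normal, 4))  # 2

# Overlap fractions implied by the poster's intersection table.
table = {50: 2, 100: 16, 200: 51, 300: 88, 400: 136, 500: 184, 1000: 450}
for n, shared in table.items():
    print(f"top {n}: {shared / n:.1%} shared")
```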

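The permutation test is effective at picking out genes related to the neoplastic process.

It also picks out these genes far more effectively than the T-test: among the top 20 tags, it yields 15 unique significant genes (vs. 5 for the T-test) for invasive vs. normal, and 12 (vs. 8) for CIS vs. normal.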

The permutation test between the invasive and CIS libraries shows 119 differentially expressed tags, which suggests that the two stages of cancer have different genes turned on or off. In addition, the overlap between the top-ranked genes for normal vs. invasive and for normal vs. CIS is low (only about 25% of the top 200 tags intersect), which further suggests differences between the two stages.

Top 20 Tags That Map to Genes: T-test Results

Criteria                         INV vs Normal   CIS vs Normal
A                                0               0
B                                0               1
C                                0               1
D                                0               0
E                                5               6
Total Unique Significant Genes   5               8
Total Hypotheticals              11              8

Top 20 Tags That Map to Genes: Permutation Test Results

Criteria                         INV vs Normal   CIS vs Normal
A                                1               3*
B                                4               5*
C                                0               1
D                                1               2
E                                8               3
Total Unique Significant Genes   15              12
Total Hypotheticals              5               1

The quality of these genes depends mostly on criteria A and B, followed closely by criteria C and D, since genes meeting C or D are important in the neoplastic process.

Hypotheticals (genes with no known function) did not meet any of the criteria.

* Indicates a duplicate (more than one tag matches the same gene).

[Figure: # of Samples vs # of Combinations (log scale). x-axis: Number of Samples; y-axis: Number of Combinations.]

The low intersections suggest that CIS and Invasive stages of cancer are different.