supplementary information - nature...supplementary information 4 | research 8 o x o-gu a nin e l eve...

W W W. N A T U R E . C O M / N A T U R E | 1

SUPPLEMENTARY INFORMATIONdoi:10.1038/nature11935

CRC Cell lines Tumours (TCGA)

b

a c

CIN− CIN+

P < 0.0001

Num

eric

al C

ompl

exity

0

2

4

6

8

CIN− CIN+

Stru

ctur

al C

ompl

exity

0

10

20

30

40

50

60

70

80

90

P < 0.0001

0 1 2 3 4 5 6 7 8 9 11 13 15 24 36

0

20

40

60

80

100

Numerical Complexity Score

Stru

ctur

al C

ompl

exity

Sco

re

Supplementary Figure 1. Numerical and Structural CIN co-occur in CRC. a,b, Ploidy-normalised scores reflecting (a) numerical and (b) structural karyotypic complexity derived from SNP 6.0 data for CIN+ versus CIN- CRC cell lines (see Methods). c, Correlation between numerical and structural complexity in colorectal tumours (TCGA cohort19, Spearman’s rank correlation: r=0.4555, P < 0.0001).

SUPPLEMENTARY INFORMATION

2 | W W W. N A T U R E . C O M / N A T U R E

RESEARCH

a

0

20

40

60

80

100

**

% A

naph

ases

with

seg

rega

tion

erro

rs

CIN- CIN+

0

10

20

30

40

HT2

9

HT5

5

NC

IH74

7

SKC

O1

SW11

16

SW62

0

T84

Distorted NDC80Spherical NDC80

% S

egre

gatio

n er

rors

f

Supernumerarycentrioles

p=0.59

Multipolar spindlesp=0.027

% P

rom

etap

hase

cel

ls w

ith >

4 ce

ntrio

les

% P

rom

etap

hase

cel

ls w

ith >

2 sp

indl

e po

les

0

20

40

60

80

100hg

0

20

40

60

80

100

0

5

10

15

20

Fold

cha

nge

in m

itotic

inde

xaf

ter 8

hou

rs n

ocod

azol

e (1

00ng

/ml)

b

Mitotic checkpoint functionp=0.54

Segregation errorsp=0.0025

Monastrol

n=106

n=77n=127

n=105

- + - + DAPI NDC80 Tubulin

NCIH747

c

e

0

20

40

60 DicentricAcentricDSBOther

HT2

9

HT5

5

NC

IH74

7

SKC

O1

SW62

0

SW11

16 T84

% M

etap

hase

s

i

t= -6 t=0

NEBD LCC ANA

t=27 t=30 t=33 t=44

DLD

1H

CT-

116

RKO

HT2

9SK

CO1

SW62

00

50

100

150 NEBD-Anaphase Onset

Min

utes

DLD

1H

CT-

116

RK

OH

T29

SKC

O1

SW62

00

20

40

60

80 NEBD-LCC

Min

utes

0

20

40

60

DLD

1H

CT-

116

RK

OH

T29

SKC

O1

SW62

0

Min

utes

LCC-Anaphase onset

d

HC

T-11

6

HT2

9

HT5

5

NC

IH74

7

SKC

O1

SW62

0

SW11

16 T84

0

1

2

3

4

siCohesin

Cen

trom

ere

wid

th (μ

m) siControl

HCT-116

HT55

SKCO1

0

10

20

30

40

50

60

70

80

HT55 SW620694735 62

Laggings AcentricsBridges

% S

egre

gatio

n er

rors

Anaphasesw/ errors (%)

DNA Centrin-3 Tubulin

CIN- CIN+

CIN- CIN+CIN- CIN+

siCohesin

HC

T-11

6

+-

n=49

n=70

n=38

n=49

n=84n=72

n=76

DAPI All-centromere probe

T84NCIH747

DLD1HCT-116

SW620

HT29SKCO1

SW1463HT55SW1116

GP2DLS174TRKO

CIN

+C

IN-

90

Supplementary Figure 2. Chromosome segregation errors in CIN+ CRC cells are independent of gross mitotic defects. a, % Anaphases exhibit-ing segregation errors (acentrics, bridges or lagging chromosomes) in a panel of CIN- versus CIN+ CRC cell lines (n=100 cells per cell line, black lines are median values, statistic: Mann-Whitney U-test P=0.0025). b, Classification of segregation errors in HT55 and SW620 cells ± 16 h monastrol (100 µM) to arrest cells in prometaphase with monopolar spindles, followed by drug washout for 75 min, elevating the frequency of improper chromosome attachments6. Monastrol washout resulted in specific induction of lagging chromosomes, not anaphase bridges or acentrics. c, CIN+ cells were stained with antibodies for NDC80, a microtubule binding component of the kinetochore, and β-tubulin to identify merotelic attachments. % Segregation errors in CIN+ cell lines that were lagging chromosomes with (red bars) or without (green bars) kinetochore distortion. Top right: Representative image of a merotelic attachment in an NCIH747 anaphase. NDC80 is distorted (bottom right). d, Cell lines stably expressing H2B-mRFP were imaged every 3 minutes to assess duration of prometaphase (nuclear envelope breakdown (NEBD) to last chromosome congressed (LCC)), metaphase (LCC to anaphase onset), and the total duration of mitosis (NEBD to anaphase onset)(n=20-71 cells per cell line). Stills of SKCO1-H2B-mRFP (CIN+) cells illustrating these events are shown. e, To assess mitotic checkpoint proficiency, cells were incubated with nocodazole (100 ng/ml) or DMSO for 8 hours and % mitotic cells was measured by flow cytometry. Mitotic cells were detected with antibodies against MPM2 (20,000 cells per cell line, black lines are median values, Mann-Whitney U-test P=0.54). f, Cohesion between sister chromatids was assessed by measuring inter-chromatid distance at the centromere from metaphase spreads (shown in inset images), n=25 sister chromatid pairs across 5 metaphases. HCT-116 cells were transfected with siRNA targeting the cohesin subunit SCC1 as a positive control. Tukey box plot with outliers displayed. g, % Prometaphase cells with >4 centrioles, detected with anti-centrin 3 antibodies (n=100 cells per cell line, black lines are median values, statistic: Mann-Whitney U-test P=0.59). h, images of HT55 and SKCO1 cells stained with anti-β−tubulin and anti-centrin 3 antibodies, displaying a multipolar and bipolar spindle respectively. i, % Prometa-phase cells with multipolar spindles, detected with antibodies against β-tubulin (n=100 cells per cell line, black lines are median values, statistic: Mann-Whitney U-test P=0.027). j, % Metaphases in cell lines as indicated displaying a chromosome of the indicated morphology, measured from chromosome spreads hybridised to all-centromere DNA probe.

n=36

n=75n=50 n=55

n=79

n=25

n=64

j

*


SUPPLEMENTARY INFORMATION RESEARCH

% P

rom

etap

hase

s w

ith ≥

3 γH

2AX

foci

0

5

10

15

20

25

30

35

40

45

50%

G1

cells

with

53B

P1 b

odie

s

% S

egre

gatio

n er

rors

i j l

0

10

20

30

40

50

60

70

80

90

100**

**

0

10

20

30

40

50

60

70

80

90

100 **

n=76

0

20

40

60

80LaggingAcentricBridge

% A

naph

ases

with

UFB

s

0.0

0.5

1.0

1.5

2.0

2.5

3.0

3.5TriradialDicentricDSBAcentric

% A

bnor

mal

chr

omos

omes

b c

m

e f

n=12

58

n=13

36

k

No ErrorSeg.Error

Double strand break

d

h

0

2

4

6

8

10

12

2 15

% D

evia

tion

from

the

mod

e

Control (n=330)Aphidicolin (n=400)

Chromosome

0

20

40

60

80

100

% M

etap

hase

s w

ith ≥

1 ab

norm

al c

hrom

osom

e

0

20

40

60

80

100

Con

trol

% A

naph

ases

with

seg

rega

tion

erro

rs

a

DAPI NDC80 CREST

g

n=33

n=22

DAPI γH

2AXD

API RPA

Supplementary Figure 3. Pharmacological induction of replication stress induces structural and numerical instability. a-m, HCT-116 cells were treated with 0.2 µM aphidicolin for 24 h. a, % Anaphases with segregation errors (mean±s.e.m, 3 experiments, n>50 anaphases per experiment. b, Segregation error classification. c, Examples of an acentric chromosome and an anaphase bridge induced by aphidicolin treatment. d, % Metaphases with structurally abnormal chromosomes. e, Classification of structurally abnormal chromosomes. f, Example of an aphidicolin-induced chromosome break. g,h, % Deviation from the modal copy number for centromeres 2 and 15 across the HCT-116 cell population with representative image for aphidicolin treatment shown (h). i, % Prometa-phases with ≥3 γH2AX foci (mean±s.e.m of 3 independent experiments, n=100 cells). j, Representative image of prometaphase DNA damage after aphidicolin treatment (cells stained with antibodies against γH2AX). k, UFBs in an anaphase cell after aphidicolin treatment (cells stained with antibodies against RPA). l, % Anaphases with UFBs ± aphidicolin (black bars: anaphases without errors, blue bars: anaphases with errors) (mean±s.e.m of three independent experiments, n=100 cells per experiment). m, % G1 cells with ≥ 3 53BP1 bodies (mean±s.e.m of three independent experiments, n>150 cells per experiment). All statistical tests were two-tailed t-tests **p<0.01, ***p<0.001.

***

Aphi

dico

lin

Aphi

dico

lin

Con

trol

Aphi

dico

lin

Con

trol

Aphi

dico

lin

Con

trol

Aphi

dico

lin

Con

trol

Aphi

dico

lin

Con

trol

Aphi

dico

lin

DAPI CEP2 CEP15

i)

ii)

i) 3:2 ii) 2:2



RESEARCH

8 ox

o-gu

anine

leve

ls(M

edia

n C

V 53

0/30

Blue

)

CIN- CIN+0

2000

4000

6000a H2O2

0

1000

2000

3000

4000

5000

6000

7000

8000

9000

DLD

1

GP2

D

HC

T-11

6

RKO

LS14

7T

HT2

9

HT5

5

SW62

0

SW11

16 T84

Untreated

SW620

RKO

Untreated + H2O2

b

c

DLD1

GP2D

HT55

CIN- CIN+

SW1116

Untreated + H2O2

CIN- CIN+

8 ox

o-gu

anine

leve

ls(M

edia

n C

V 53

0/30

Blue

)

DAPI ΥH

2AX ACA

i)

ii)

iii)

i)

ii)

iii)

d

HT2

9

HT5

5

SW62

0

SW11

16

0

20

40

60

80

100InterstitialTerminal

e

% F

oci

Supplementary Figure 4. Prometaphase DNA damage in CIN+ cells is not solely attributable to telomeric or oxidative stress-induced DNA damage. a,b, Levels of the DNA modification 8-oxoguanine, a marker of oxidative DNA damage, were measured by flow cytometry in a panel of CIN+ and CIN- cell lines. a, Median CV values for 8-oxoguanine staining (values are the mean of two independent experiments). Treatment with hydrogen peroxide (H2O2) (350 µM for 4 h, followed by a 14 h recovery period) to induce oxidative stress, verified the specificity of the staining - representative experiment shown (b). c, 8-oxoguanine levels were assayed by immunohisto-chemistry. Representative fields of cells (original magnification 40X) are shown (± H2O2 treatment as in (b)); no differences were observed between CIN+ and CIN- cell lines. d, Chromosome spreads were prepared from four CIN+ cell lines and stained with antibodies for γH2AX and centromeres (ACA). right: Foci were classified as i) interstitial foci (clearly not at the chromosome end) or ii) terminal foci (at the end of the chromosome arm). Short chromosome arms (iii) could not be accurately scored and were therefore excluded from this classification. e, Quantification of percentage of foci that were interstitial or terminal (n >200 foci per cell line).



% R

eplic

aton

fork

s ≤0

.8 k

b/m

in

HC

T-11

6

RKO

HT2

9

HT5

5

SW62

0

SW11

16

0.00

0.20

0.40

0.60

0.80

1.00

1.20

1.40

HCT-116 RKO HT29 HT55 SW620 SW1116

Fork

rate

(kb/

min

)

CldUIdU

a b

c d

Cell line1st label origins

2nd label origins

Progressing forks

2nd labelTerminations

1st labelterminations/Stalled forks

Fork assymmetry (sister fork ratio <0.75)

RKO 22.8 1.2 60.8 14.6 0.6 29.1HCT-116 21.9 5.0 57.1 13.7 2.3 24.1SW620 18.8 1.8 64.5 10.7 4.2 44.4HT55 18.9 1.5 63.9 12.0 3.7 25.4HT29 20.9 1.6 60.5 15.3 1.7 35.1SW1116 21.9 2.8 56.2 16.4 2.8 34.1

e

CIN- CIN+

CIN-

CIN+

0

10

20

30

40

50

60

70

80

90

100

1st label origin

Progressing replication fork

1st label termination/stalled fork

2nd label origin

2nd label termination

1st label (CldU - red)

2nd label (IdU - green)

1 : 1 x < y

Supplementary Figure 5. CIN+ cells exhibit reduced DNA replication fork rates and evidence of fork stalling. a-e, CIN+ and CIN-

cells were pulsed with CldU followed by IdU (30 minutes each), before DNA fibre assays were performed: a, Mean replication fork rate for both CldU and IdU incorporation (n>60 forks per experiment, mean±s.e.m of three independent experiments). b, % Replication forks exhibiting speeds ≤0.8 kb/min (n>60 forks per experiment, mean±s.e.m of three independent experiments). c, Schematic of replication structures. d, Representative fibres showing asymmetric sister replication fork progression. Sister fork ratio = x/y (where x is the shorter fork). Forks with a ratio of <0.75 were considered asymmetric5. e, Table showing quantification of replication structures and sister fork asymmetry in CIN+ and CIN- cell lines (n=600 replication forks per cell line or n=51-142 bidirectional forks per cell line for fork asymme-try, from three independent experiments). Values are percentage of replication forks. For fork asymmetry, only bidirectional forks were quantified. No increase was observed in the frequency of origin structures (1st label origins, 2nd label origins) despite reports that slow replication fork rates are accompanied by increased origin firing, in order to enable the timely completion of DNA replication5.

Percentage replication forks



RESEARCH

a

b

Tumours (TCGA)

CRC Cell Lines

0 20 40 60 80 100

% Aneuploid nuclei

18q LOH

Adenoma

Carcinoma

cC

IN−

CIN

+

CIN

−

CIN

+

CIN

−

CIN

+

CIN

−

CIN

+

Num

ber o

f Sam

ples

0

5

10

15

20

TP53P = 0.022

adj. P = 0.84

APCP = 0.037

adj. P = 0.84

KRASP = 0.81adj. P = 1

SMAD4P = 0.05

adj. P = 0.84

CIN

−

CIN

+

CIN

−

CIN

+

CIN

−

CIN

+

CIN

−

CIN

+

Num

ber o

f Sam

ples

Mutant WT

0

10

20

30

40

50

60

70

80 TP53P = 5e−06

adj. P = 0.0051

APCP = 0.041adj. P = 1


SMAD4P = 0.74adj. P = 1

Mutant WT

Supplementary Figure 6. TP53 is the only gene significantly correlated with CIN. a, Exome sequencing data was analysed for the TCGA cohort of colorectal tumours (n=101)19. Tumours were defined as CIN+ or CIN- based on weighted genome instability index >0.2 (see Methods) and genes that showed significant enrichment (Fisher’s exact test) for mutation in CIN+ tumours were identified. Only TP53 remained significant after correction for multiple testing34, and TP53 mutations also occurred in CIN- tumours. b, Data examining the mutation status of 64 protein-coding genes in each of 29 CRC cell lines was downloaded from COSMIC (http://cancer.sanger.ac.uk/cancergenome/projects/cosmic/) and the relationship of mutation status to CIN status examined. Three genes were more frequently mutated in CIN+ versus CIN- cell lines: TP53, SMAD4 and APC, although none remained significant after correcting for multiple testing. TP53 and APC mutations both occurred in CIN- cell lines. c, % Aneuploid nuclei in preparations of nuclei from paired, adjacent colorectal adenomas and carcinomas. Nuclei were stained with Feulgen stain, and DNA content was measured by DNA image cytometry. 18q LOH was determined by microsatellite or SNP analysis (see Methods).



cb

0

20

40

60

80

100

% m

RN

A ex

pres

sion

in s

iCon

trol

siZN

F516

siPI

GN

siM

EX3C

siC

ontro

l

a

≥3 standard deviations above control (≥37%)≥2 standard deviations above control (≥31%)

siRNA pool deconvolution(% segregation errors)

siRNA 1 2 3 4MEX3C 45 42 37 69PIGN 40 43 47 30ZNF516 37 47 57 33LOC284274 37 48 33 13CCDC68 24 50 16 40CCDC102B 43 41 20 23PMAIP1 23 50 30 33DOK5L 30 26 27 43CPLX4 23 37 33 28PARD6G 33 20 17 33SMAD4 39 20 20 27PQLC1 23 37 19 23

Valid

ated

Faile

d

d

0

10

20

30

40

50

60

70

80

90

100

% m

RN

A ex

pres

sion

in s

iCon

trol siRNA-1

siRNA-2siRNA-3siRNA-4pool

*

^

siZN

F516

siPI

GN

siM

EX3C

siC

CD

C68

siC

CD

C10

2B

0

10

20

30

40

50

60

siC

ontro

lM

EX3C

-Am

b-1

MEX

3C-A

mb-

2M

EX3C

-Am

b-3

PIG

N-A

mb-

1

PIG

N_U

TR3_

3M

EX3C

_UTR

3_1

ZNF5

16_U

TR3_

2ZN

F516

_UTR

3_3

PIG

N_U

TR5_

1PI

GN

_UTR

5_3

MEX

3C_U

TR5_

1ZN

F516

_UTR

5_3

Ambion 3'UTR 5'UTR

e

% A

naph

ases

with

seg

rega

tion

erro

rs

Supplementary Figure 7. Identification and validation of CIN-suppressor genes. a, % Anaphase cells with segregation errors in HCT-116 cells following silencing of 94 genes encoded on 18q (n≥30 anaphases per siRNA pool). 8 genes encoding hypothetical proteins (for which siRNA pools were unavailable) were not incuded and 3 genes could not be assessed for chromosome segregation errors (see Supplementary Table 3). b, siRNA pool deconvolution: segregation error rates after transfection with the four individual siRNAs from the pool used in (a). *LOC284274 mRNA could not be detected and thus was excluded as it was not possible to validate silencing or determine whether this gene is expressed. ^PMAIP1 was excluded due to previously reported off-target effects upon MAD2 protein of sequences 2 and 340. c, mRNA expression measured by quantitative reverse-transcription PCR relative to expression in control-transfected HCT-116 cells for each siRNA pool validating with 4/4 sequences (mean±s.d of 3 experiments). d, mRNA expression relative to control-transfected cells after transfection of individual sequences comprising the siRNA pools for which at least two sequences induced segregation errors beyond the screen threshold (37%). CCDC68 and CCDC102B were excluded from further functional follow-up due to discrepancies between mRNA knockdown and segregation error phenotype. e, % Segregation error frequency upon silencing the indicated gene with further independent siRNA sequences targeting the coding sequence, and with sequences targeting the 3’ and 5’-untranslated regions of the transcripts (n≥30 anaphases per sequence).

siRNA

% A

naph

ases

with

seg

rega

tion

erro

rs

3 S.D above control mean

Cont

rol (

mea

n+S.

D)C

CD

C68

PIG

NZN

F516

C18O

RF62

CCDC

102B

PMAI

P1PA

RD6G

MEX

3CPQ

LC1

CPLX

4DO

K5L

SMAD

4TN

FRS1

1ASE

RPIN

B3FV

T1SA

LL3

ZNF2

36PL

EKHE

1SE

RPIN

B5SE

RPIN

B4KI

AA14

68SE

RPIN

B2C1

8ORF

51O

NEC

UT2

ST8S

IA3

HAK/

ALPK

2PO

LIC

ND

P1RN

F152

GRP RA

XLM

AN1

ATP8

B1SE

RPIN

B8ZN

F532

TXNL

4ASE

RPIN

B11

SEC1

1CTX

NL1

NETO

1M

C4R

TCF4

C18O

RF26

GAL

R1RA

B27B

RTT

NC1

8ORF

22FL

J257

15C1

8ORF

54C

DH

7NE

DD4L

MBD

2C

ND

P2FB

XO15

NFA

TC1

CCBE

1CB

LN2

STAR

D6SE

RPIN

B13

C18O

RF55

CTDP

1D

CC

VPS4

BKC

NG2

ZNF4

07M

ALT1

CDH2

0TS

HZ1

ZCC

HC

2TX

NDC1

0CD

H19

CD22

6DS

ELSO

CS6

CYB5

ASE

RPIN

B10

ZADH

2AT

P9B

NARS

SERP

INB7

WD

R7

FEC

HM

BP

0

20

40

60



RESEARCH

siC

ontro

lsi

MAD

2si

CC

DC

102B

siC

CD

C68

siPI

GN

siZN

F516

siM

EX3C

β-ACTIN

siC

ontro

lsi

MAD

2M

EX3C

poo

lM

EX3C

-1M

EX3C

-2M

EX3C

-3M

EX3C

-4 e

b

RAD51

β-ACTIN

siC

ontro

lsi

MAD

2

siPI

GN

siZN

F516

siM

EX3C

siR

AD51

c

siC

ontro

l

siM

AD2

MEX

3C p

ool

MEX

3C-1

MEX

3C-2

MEX

3C-3

MEX

3C-4

MAD2

β-ACTIN

d

siC

ontro

lG

FPsi

MEX

3C (3

’ UTR

)G

FPsi

MEX

3C (3

’ UTR

)M

EX3C

-GFP

0 20 40 60 80

100

*

0

10

20

30

40

50 MEX3C

0

10

20

30

40

50

60

siC

ontro

lG

FPsi

PIG

N_3

GFP

siPI

GN

_3PI

GN

ins-

GFP

% S

egre

gatio

n er

rors PIGN

0

10

20

30

40

50

60

siC

ontro

lG

FPsi

ZNF5

16_3

GFP

siZN

F516

_3ZN

F516

ins-

GFP

ZNF516

0 20 40 60 80

100

*% m

RN

A ex

pres

sion

0 20 40 60 80

100

*

anti-GFP

f

a

Copy number loss relative to ploidy

31 kDa

24 kDa MAD2

38 kDa

53 kDa

β-ACTIN

MAD2

siC

CD

C10

2Bsi

CC

DC

68 150 kDa

150 kDa

102 kDa

225 kDa

150 kDa225 kDa

*

r=0.403, p=0.037

p = 0.031 p=0.0019 p=0.0097

r=0.576, p=0.002 r=0.475, p=0.012

Gen

e Ex

pres

sion

PIGN MEX3C ZNF516

8

7

6

5

4

8

7

6

5

4

96.0

5.5

5.0

4.5

4.0

-2 -1 0 -2 -1 0 -2 -1 0

Supplementary Figure 8. CIN-suppressor gene validation. a, Spearman’s rank correlation between mRNA expression and DNA copy number for PIGN, MEX3C and ZNF516 in CIN- (n=8, black dots) and CIN+ (n=20, red dots) cell lines. Statistic: Mann-Whitney U-test. b, MAD2 protein levels in HCT-116 cells transfected with siRNA pools with at least 2 sequences inducing ≥ 37% anaphases with segregation errors. c, MAD2 protein levels assessed after transfection of the individual MEX3C sequences. Sequences 2 and 4 exhib-ited partial depletion of MAD2 and were therefore excluded. Lane 6 (sequence 3) was underloaded and therefore samples were rerun (d) to verify MAD2 levels were not depleted upon transfection with this sequence. e, RAD51 protein levels following validated siRNA transfection (*denotes non-specific band). f, HCT-116 cells were transfected with siRNAs as indicated and MEX3C-GFP or siRNA-insensitive versions of PIGN-GFP (PIGNins-GFP) or ZNF516-GFP (ZNF516ins-GFP) before scoring segregation errors (n≥30 anaphases per experiment). Mean+s.e.m of three independent experiments (PIGN and ZNF516) or mean+S.D. of two independent experiments (MEX3C) is shown (top panels). Western blotting using anti-GFP confirmed the expression of the GFP-tagged proteins (middle panels) and RT-qPCR confirmed gene knockdown (lower panels). Western blot and qPCR data are from the same experi-ment. *Endogenous gene knockdown could not be assessed in these samples due to the confounding presence of exogenous siRNA-insensitive mRNA transcripts. NB: PIGN-GFP and MEX3C-GFP observed MW are larger than predicted MW, in agreement with previ-ous reports (MEX3C41), and post-translational modification (PIGN, data not shown).

Methods: PIGN-GFP and ZNF516-GFP were extracted using the Qproteome cell compartment kit (Qiagen).



0

10

20

30

40

50

0

10

20

30

40

50

60

70

0

20

40

60

80

NCIH508a b

% A

naph

ases

with

seg

rega

tion

erro

rsc

siC

ontro

l

siPI

GN

siZN

F516

siM

EX3C

siC

ontro

l

siPI

GN

siZN

F516

siM

EX3C

siC

ontro

l

siPI

GN

siZN

F516

siM

EX3C

DLD1 RKO

***

*

*

***

**

***

0

10

20

30

40

50

60

siC

ontro

l

siPI

GN

siM

EX3C

siZN

F516

HCT-116HKH2

d

%An

apha

ses

with

seg

rega

tion

erro

rs

% A

naph

ases

with

seg

rega

tion

erro

rs

% A

naph

ases

with

seg

rega

tion

erro

rs

Supplementary Figure 9. CIN-suppressor gene-silencing induces segregation errors in additional cell lines. a-c, Segregation error frequencies in (a) DLD1 (CIN-) (b) RKO (CIN-) and (c) NCIH508 (CIN+, 18q normal) cells following transfection with validated siRNA pools (mean±s.e.m of 3 experiments, n>30 anaphases per condition per experiment, two-tailed t-test,*P<0.05, **p<0.01,***p<0.001). d, Segregation error frequency in HCT-116 (KRAS mutant) and isogenic cell line HKH2 (in which the mutant KRAS allele has been deleted), n>60 anaphases per condition, two experiments.


1 0 | W W W. N A T U R E . C O M / N A T U R E

RESEARCH

0102030405060708090

100

1* 2* 3 1 2 3 1 2PIGN MEX3C ZNF516

% mRNA expression relative to non-silencing

% Anaphases with segregation errors

shRNA

b

PIG

N-1

PIG

N-2

MEX

3C-1

MEX

3C-2

MEX

3C-1

0

10

20

30

Average % deviation from the modal chromosome number (CEP2 and CEP15)

** ***

shRNA

shC

ontro

l

c

* sequence identical to siRNA from primary screen

0

1

2

3

4

5

6

7

8

9

10

siC

ontro

l

Aphi

dico

lin

Mon

astro

l

siPI

GN

siM

EX3C

siZN

F516

DMSOBlebbistatin

siControl

siPIGN

DAPI C

yclin A1 53BP1

Fold

cha

nge

in %

G1

cells

with

53B

P1bo

dies

(rela

tive

to D

MSO

trea

ted

siC

ontro

l)0h

siRNA transfection

44h 48h

DMSO/blebbistatin

Fix

Immunofluorescence+quantification (binucleates only for +bleb samples)

0h

Seed cells

44h 48h

Monastrol Fix

i)

ii) 46h

DMSO/blebbistatin

f ge

Other DNA linkage Triradial Dicentric DSB Acentric

a%

Chr

omos

omes

0.0

0.5

1.0

1.5

2.0

2.5

3.0

siZN

F516

siPI

GN

siM

EX3C

siC

ontro

ln=

5343

n=47

42n=

5021

n=51

40

N.D

.

Supplementary Figure 10. CIN-suppressor gene silencing induces structural and numerical instability. a, % Aberrant chromosomes identified on metaphase chromosome spreads following CIN-suppressor silencing, categorised by type of abnormality. b, mRNA expression and anaphase segregation error frequencies (n=50 anaphases) following shRNA mediated silencing of PIGN, MEX3C and ZNF516 (N.D. not determined). c, HCT-116 cell lines stably expressing shRNAs against the three CIN-suppressor genes were seeded at low density on glass slides to allow colony formation. Slides were then hybridised to fluorescently labelled centromeric probes to chromosomes 2 (CEP2) and 15 (CEP15) and stained with DAPI. Each point is the mean % deviation from the modal centromere copy number for the two centromeric probes for an individual colony. Lines are median values, statistical test: Mann-Whitney U-test, **P<0.01, ***P<0.001. d, HCT-116 cells were transfected with siRNAs as indicated and fixed and stained with antibodies to γH2AX and β-tubulin at 24 and 48 h post siRNA transfection. Segregation errors and prometaphase DNA damage were scored (mean±s.e.m, n=50 prometaphases, 30 anaphases per experiment) of four (siControl) or three independent experiments (siPIGN, siMEX3C and siZNF516), and compared to untransfected cells (t=0) (mean±s.d. of two independent experiments). Both phenotypes occur with similar kinetics, making it unlikely that lagging chromosomes undergoing cytokinesis-induced DNA damage explain the induction of prometaphase DNA damage following CIN-suppressor silencing. e, Schematic of experiment for testing whether 53BP1 bodies observed in G1 cells were attributable to DNA damage during cytokinesis. siRNA-depleted cells were treated with blebbistatin (100 µM) or DMSO for 4 h, 44 h post-transfection, to inhibit cytokinesis. As a control, cells were treated with 0.2 µM aphidicolin for 24 h prior to blebbistatin addition (aphidicolin-induced 53BP1 bodies should be unaffected by cytokinesis inhibition15). In addition, cells were treated with monastrol (100 µM, 2 h) to induce lagging chromosomes, then released into blebbistatin or DMSO for 2 h, as a control for 53BP1 foci that are reduced by cytokinesis inhibition7. Cells were then fixed and stained, and only binucleate cells scored to ensure that cells had passed through mitosis in the presence of a cytokinesis block. Blebbistatin treatment did not affect mitotic entry (data not shown). f, Example images of binucleate control and PIGN-depleted cells after blebbistatin treatment. g, Quantification of two independ-ent experiments (n>100 binucleate G1 cells (blebbistatin), or 200 G1 cells (DMSO), per condition), normalised to DMSO-treated control cells. Only G1 53BP1 foci induced by monastrol were reduced by cytokinesis inhibition.

shC

ontro

l

0

20

40

60

80

0 24 48 0

20

40

60

80

0 24 48 0

20

40

60

80

0 24 48 0

20

40

60

80

0 24 48

d

% C

ells

Time (h) Time (h) Time (h) Time (h)

siControl siPIGN siMEX3C siZNF516Segregation errors≥ 3 γH2AX foci

W W W. N A T U R E . C O M / N A T U R E | 1 1


a

c

0.10

0.20

0.30

0.40

0.50

0.60

0.70

0.80

0.90

1.00

0

0.10

0.20

0.30

0.40

0.50

0.60

0.70

0.80

0.90

1.00

PIGN/MEX3C/ZNF516GFPHT29 HT55 SW620

PIGN/MEX3C/ZNF516GFPHT29 HT55 SW620

DAPI PIGN/MEX3C/ZNF516-GFP

Cyclin A1 53BP1

MERGEb

PIGN-GFP MEX3C-GFP ZNF516-GFP

PIGN-GFP DAPI MEX3C-GFP DAPI ZNF516-GFP DAPI

Fold

cha

nge

in %

G1

cells

w

ith ≥

3 53

BP1

bodi

es

Fold

cha

nge

in %

G1

cells

w

ith ≥

3 53

BP1

bodi

es

d

0

Supplementary Figure 11. 53BP1 bodies in G1 cells are partially reduced upon exogenous expression of CIN-suppressor genes in CIN+ cells with 18q loss. a, Example images of GFP-tagged CIN-suppressors as indicated in HT55 cells. Localisation patterns are consistent with published reports for PIGN and MEX3C41-43. b, Representative image from CIN+ HT55 cells co-transfected with GFP-tagged PIGN, MEX3C and ZNF516. c,d, Quantifications of G1 53BP1 bodies in PIGN/MEX3C/ZNF516-GFP transfected cells, relative to GFP-transfected cells. Only GFP positive G1 (cyclin-A1 negative) cells were scored (n>100 cells per experiment).



RESEARCH

0

0.2

0.4

0.6

0.8

1

siC

ontro

l

siPI

GN

siM

EX3C

siZN

F516

0.2

0.4

0.6

0.8

1Fo

rk ra

te (k

b/m

in)

siC

ontro

l

siPI

GN

siM

EX3C

siZN

F516

Fold

cha

nge

in fo

rk ra

te

0

CldUIdU

a b

0

0.25-0.49<0.25

siC

ontro

l

siPI

GN

siM

EX3C

siZN

F516

c

% S

iste

r for

ks

Sister fork ratio

10

5

15

Supplementary Figure 12. CIN-suppressor silencing impairs DNA replication fork progression in HCT-116 cells. a-c, 48 h after transfection with siRNA pools, cells were sequentially pulsed for 30 mins each with CldU and IdU, before DNA fibre assays were performed. a, Mean replication fork rate for CldU and IdU (mean±s.d. of two independent experiments, n>70 forks per experiment (total of >200 forks assessed)). b, Mean fork rate normalised to control replication fork rate (mean±s.d. of two experiments, n>70 forks per experiment, total of >200 forks assessed). c, Quantification of sister fork asymmetry following CIN-suppressor silencing (sister fork ratio= x/y where x is the shorter sister fork, n=68 -141 bidirectional forks per siRNA). Only the most asymmetric forks are shown (ratio <0.49).



0

10

20

30

DLD

1

HC

T116

LS17

4T

RKO

% A

naph

ases

with

se

greg

atio

n er

rors

0

20

40

60

80

NCIH508

% A

naph

ases

with

se

greg

atio

n er

rors

cb

0

20

40

60

80

HT2

9

HT5

5

SW62

0

SW11

16

% P

rom

etap

hase

cel

lsw

ith ≥

3 γH

2AX

foci

aUntreatedNucleosides

Supplementary Figure 13. Nucleoside supplementation of CIN+ cells reduces prometaphase DNA damage and chromosome segregation errors, but does not reduce segregation error frequency in CIN- cells. a, % Cells with ≥3 γH2AX foci ± nucleoside supplementation for 48 h (mean+s.e.m of three (HT29, SW620, SW1116) or four (HT55) independent experiments, n=100 cells per experiment). b, % Anaphases with segregation errors ± nucleoside supplementation for 48 h in CIN- cell lines as indicated (mean+s.e.m of three independent experiments, n≥30 anaphases per cell line per experiment). c, % Anaphases with segregation errors ± nucleoside supplementation for 48 h in NCIH508 CIN+, 18q normal cells, (mean+s.e.m of three independent experiments, n≥30 anaphases per experiment).

UntreatedNucleosides



RESEARCH

0

2

4

6

8

0 4 8 12 16 20 24 28 32 36 40 44 48 52 56 60 64 68 72

HT29Pr

olife

ratio

n (N

orm

alis

ed a

rbitr

ary

units

)

0

1

2

3

4

0 4 8 12 16 20 24 28 32 36 40 44 48 52 56 60 64 68 72

HT55

1

3

5

7

9

0 4 8 12 16 20 24 28 32 36 40 44 48 52 56 60 64 68 72

SW620

1

3

5

7

0 4 8 12 16 20 24 28 32 36 40 44 48 52 56 60 64 68 72

SW1116

0

0.05

0.1

0.15

0.2

0.25

0.3

HT2

9

HT5

5

SW62

0

SW11

16

a

b

Prol

ifera

tion

rate

(24-

48 h

)

0

2

4

6

8

10

12

14

Fold

cha

nge

in m

itotic

inde

x af

ter

8 ho

urs

noco

dazo

le

d

0

0.2

0.4

0.6

0.8

1

1.2

HT2

9

HT5

5

SW62

0

SW11

16

c

Fold

cha

nge

in m

itotic

inde

x af

ter

add

ition

of n

ucle

osid

es

HT2

9

HT5

5

SW62

0

SW11

16

G1 S G2 MUntreated Nuc Untreated Nuc Untreated Nuc Untreated Nuc

HT29 43.4±0.6 47.2±0.6 35.5±1.7 32.6±0.9 17.6±1.9 20.3±0.4 2.5±0.3 2.9±0.2HT55 39.5±3.9 43.2±1.6 38.3±1.3 35±1.4 20.6±2.3 20.0±0.6 2.9±0.3 2.8±0.2SW620 38.2±1.3 39.7±0.6 40.4±0.7 37.6±1.2 19.6±1.6 20.7±1.9 3.4±0.4 3.3±0.2SW1116 58.83±7.5 58.8±7.8 29.6±4.6 29.3±4.7 9.8±5.4 10.1±4.7 2.9±0.6 3.0±0.8

Cell line

e

0

0.2

0.4

0.6

0.8

1

1.2

Fold

cha

nge

in c

ellu

lar A

TP le

vels

HT2

9

HT5

5

SW62

0

SW11

16

f

Supplementary Figure 14. Nucleoside supplementation does not affect proliferation, cell cycle distribution or cellular ATP levels. a, CIN+ cell lines as indicated were supplemented with nucleosides and then imaged in 96 well plates every 2 h using an in-situ imaging system. The % confluency of the well was calculated at each timepoint, and growth curves constructed. Representative growth curves for each cell line are shown. b, Mean proliferation rates ± nucleosides (mean±s.e.m of three independent experiments). Prolif-eration rates were calculated by measuring the gradient of growth curves between 24 and 48 h. c, Table showing cell cycle distribution ± nucleoside supplementation for 48 h (mean±s.e.m of three independent experiments). d, Fold change in mitotic index ± nucleoside supplementation for 48 h, measured by flow cytometry (mean±s.e.m of three independent experiments). e, Fold change in mitotic index upon nocodazole treatment (8 h, 100 ng/ml) relative to DMSO treatment ± nucleoside supplementation for 48 h (mean±s.e.m of three independent experiments). f, ATP levels were measured using Cell Titer Glo (CTG). CTG values were then normalised to cell biomass (sulforhodamine B solution). Fold-change in ATP levels was calculated for nucleoside-treated cells by normalising the values to those of untreated cells (mean±s.e.m of three experiments).





Supplementary Table 1: Frequency of 18q loss in CIN+ CRC cell lines and aneuploid colorectal tumours (BAC array and TCGA cohort19).

Supplementary Table 2: Genes encoded in regions of copy number loss in CIN+ CRC (Excel datasheet). List of 1331 genes encoded in regions of significant copy number loss in both aneuploid tumours and CIN+ CRC cell lines (P<0.001, Fisher’s exact test). Supplementary Table 3: Genes included in RNA interference screen (Excel datasheet). List of 94 genes on 18q that were present at one or no copies in ≥30% of CIN+ cell lines, and no more than one CIN- cell line, which were included in the RNA interference screen. Supplementary Table 4: Percentage of cell lines and tumours that have lost zero, one, two or all three CIN-suppressor genes.

Supplementary Table 5 (Excel datasheet): Correlation of mRNA expression and copy number in tumours (TCGA cohort). Spearman’s rank correlation co-efficient for the 75 genes that were assessable on gene expression microarrays of the 94 genes on 18q included in the RNA interference screen. Supplementary Table 6: SiRNA sequences used in this study (sense sequence).

PIGN Dharmacon D-012463-01 GAGACCACGUUUAUACAUA PIGN Dharmacon D-012463-02 AUUGACUCAUCAGUUGACU PIGN Dharmacon D-012463-03 UUAAGCAUUUCACGUAGAC PIGN Dharmacon D-012463-04 AUUAUGAAGUUAAGGGUCC MEX3C Dharmacon D-006989-01 UUUCGUCCCUAUAUAUAAC MEX3C Dharmacon D-006989-02 UUGAUCCCGUCGUUAUUGG MEX3C Dharmacon D-006989-04 AUUGGGAGGCUCGUUCACC

CRC Cell Lines (CIN+, SNP 6.0) n = 20

Aneuploid Tumours (BAC Array) n = 26

TCGA Tumours (CIN+, SNP.6.0) n = 103

18q loss (>50% of 18q)

80%

88%

82%

18q loss (>75% of 18q)

70%

85%

80%

Number of CIN-suppressors lost relative to median ploidy

CRC Cell Lines (CIN+, SNP 6.0) n = 20

TCGA Tumours (CIN+, SNP.6.0) n = 103

Zero 15% 16% One 0% 1% Two 15% 4%

All three 70% 79%



RESEARCH

MEX3C Dharmacon D-006989-17 AUAUGCACAAACAGAACCG ZNF516 Dharmacon D-010660-01 AAACAUUUGUCCAAGGGCG ZNF516 Dharmacon D-010660-02 AUUGGUAACCCGACUUAGG ZNF516 Dharmacon D-010660-03 UUGCUGAUUUCGGCGGACG ZNF516 Dharmacon D-010660-04 AAAGGCUCCGCAAUAGAGG PIGN 3’ UTR oligo 3 GCUGAUGCUAAGCUUUCCU PIGN 5’ UTR oligo 1 GCUCCAGAAACAGUAUUAU PIGN 5’ UTR oligo 3 CCGUUUGAUCUUGGCAAUU MEX3C 3’ UTR oligo 1 GCAUUACAGACGAUUCAUA MEX3C 5’ UTR oligo 1 GCUGGUUAAGGCCAUGAAA ZNF516 3’ UTR oligo 2 CCAUUCGAACUGAGCGAAU ZNF516 3’ UTR oligo 3 CCAUGCAGUGUACGCAAUU ZNF516 5’ UTR oligo 3 GCUCUUCUGACCAUGGAGA Ambion PIGN S24088 CCUUUACCCUGGGAAUUGA Ambion MEX3C _1 S27947 GCCUGAAAAUGUUGACCGA Ambion MEX3C _2 S27948 GGAUUAUCAUGUAGUCCUA Ambion MEX3C _3 S27949 GGAGUGAUCCUUCUGGUAA

Supplementary Table 7: shRNA sequences used in this study.

PIGN – 1 (V2LHS_244784) pGIPZ TATGTATAAACGTGGTCTC PIGN – 2 (V2LHS_246658) pGIPZ AATTCGTAAAGTGCATCTG PIGN – 3 (V2LHS_50417) pGIPZ TCATCTAATTCGTAAAGTG MEX3C – 1 (V2LHS_235038) pGIPZ ATGCATTTCTATTTCTTCC MEX3C – 2 (V2LHS_135328) pGIPZ AATTGGATATCATTCTTGC MEX3C – 3 (V2LHS_234137) pGIPZ TTGGATATCATTCTTGCGC ZNF516 – 1 (V2HS_7966) pSM2c TTAAGTAAACTGTTGCTCTCGC ZNF516 – 2 (V2HS_7968) pSM2c TATTCTTAGGCATGCTAGCGG ZNF516 – 3 (V2HS_79671) pSM2c TAACAAGGAGAGGCATCTGCC

Supplementary References

40. Tsui, M. et al. An intermittent live cell imaging screen for siRNA enhancers

and suppressors of a kinesin-5 inhibitor. PLoS One 4, e7339, doi:10.1371/journal.pone.0007339 (2009).

41. Buchet-Poyau, K. et al. Identification and characterization of human Mex-3 proteins, a novel family of evolutionarily conserved RNA-binding proteins differentially localized to processing bodies. Nucleic Acids Res 35, 1289-1300, doi:gkm016 [pii]

10.1093/nar/gkm016 (2007). 42. Hong, Y. et al. Pig-n, a mammalian homologue of yeast Mcd4p, is involved in

transferring phosphoethanolamine to the first mannose of the glycosylphosphatidylinositol. J Biol Chem 274, 35099-35106 (1999).

43. Lee, M. G., Wynder, C., Cooch, N. & Shiekhattar, R. An essential role for CoREST in nucleosomal histone 3 lysine 4 demethylation. Nature 437, 432-435, doi:nature04021 [pii]

10.1038/nature04021 (2005).



Sweave Document:

Replication stress links structural and numerical

cancer chromosomal instability

Burrell & McClelland et al.

January 15, 2013

The following Sweave document describes the data analysis.All data analysis was performed in the R statistical environment.

1 Getting started

1.1 Loading required packages

> rm(list = ls())

> library(limma)

> library(qvalue)

> library(siggenes)

> library(gregmisc)

> library(beeswarm)

> library(fields)

> library(multtest)

> library(hthgu133a.db)

1.2 Required Functions

> ################################################

> # gii score weighted by chromosome length

> ################################################

>

> gii.weight.chrom <- function(x, seg, ploidy){

+ gii.weight <- 0

+ ploidy.x <- ploidy[x]

+ sub <- subset(seg, seg[, 1] == x)

+

+ for(cc in unique(seg[, 2])){

+ sub.chrom <- subset(sub, sub[, 2] == cc)

+ a <- sub.chrom[as.numeric(sub.chrom[, 6]) > ploidy.x

1

+ | as.numeric(sub.chrom[, 6]) < ploidy.x, ]

+ b <- sum(as.numeric(sub.chrom[, 4]) - as.numeric(sub.chrom[, 3]))

+ if(!is.null(dim(a)))

+ gii.chrom <- sum(as.numeric(a[, 4]) - as.numeric(a[, 3]))/b

+ else

+ gii.chrom <- sum(as.numeric(a[4]) - as.numeric(a[3]))/b

+

+ gii.weight <- gii.weight + gii.chrom

+ }

+ return(gii.weight/length(unique(seg[, 2])))

+ }

> ##################################

>

> weighted.quantile <- function(x, w, probs = seq(0, 1, 0.25)

+ , na.rm = FALSE, names = TRUE) {

+

+ if (length(x) != length(w))

+ stop("Weights and variable vectors are of unequal length!")

+ if (na.rm)

+ { valid.obs <- !is.na(x) & !is.na(w)

+ x <- x[valid.obs]

+ w <- w[valid.obs]}

+ else if (any(is.na(x)) | any(is.na(w)))

+ stop("Missing values and NaN's not allowed if `na.rm' is FALSE")

+ if (any((p.ok <- !is.na(probs)) & (probs < 0 | probs > 1)))

+ stop("probs outside [0,1]")

+ if (na.p <- any(!p.ok)) {

+ o.pr <- probs

+ probs <- probs[p.ok]

+ }

+ np <- length(probs)

+ if (missing(w))

+ w <- rep(1, length(x))

+ if (na.rm) {

+ w <- w[i <- !is.na(x)]

+ x <- x[i]

+ }

+ # check if weights are probs or unit correspondences

+ ind <- order(x)

+ cumw <- cumsum(w[ind])/sum(w)

+ x <- x[ind]

+ med <- numeric(0)

+ for(i in 1:length(probs)) {

+ if(probs[i] > 0 & probs[i] < 1) med[i] <- max(x[cumw<=probs[i]])

+ else if (probs[i] == 0) med[i] <- min(x)

+ else if (probs[i] == 1) med[i] <- max(x)

2



RESEARCH

+ | as.numeric(sub.chrom[, 6]) < ploidy.x, ]

+ b <- sum(as.numeric(sub.chrom[, 4]) - as.numeric(sub.chrom[, 3]))

+ if(!is.null(dim(a)))

+ gii.chrom <- sum(as.numeric(a[, 4]) - as.numeric(a[, 3]))/b

+ else

+ gii.chrom <- sum(as.numeric(a[4]) - as.numeric(a[3]))/b

+

+ gii.weight <- gii.weight + gii.chrom

+ }

+ return(gii.weight/length(unique(seg[, 2])))

+ }

> ##################################

>

> weighted.quantile <- function(x, w, probs = seq(0, 1, 0.25)

+ , na.rm = FALSE, names = TRUE) {

+

+ if (length(x) != length(w))

+ stop("Weights and variable vectors are of unequal length!")

+ if (na.rm)

+ { valid.obs <- !is.na(x) & !is.na(w)

+ x <- x[valid.obs]

+ w <- w[valid.obs]}

+ else if (any(is.na(x)) | any(is.na(w)))

+ stop("Missing values and NaN's not allowed if `na.rm' is FALSE")

+ if (any((p.ok <- !is.na(probs)) & (probs < 0 | probs > 1)))

+ stop("probs outside [0,1]")

+ if (na.p <- any(!p.ok)) {

+ o.pr <- probs

+ probs <- probs[p.ok]

+ }

+ np <- length(probs)

+ if (missing(w))


+ if (na.rm) {

+ w <- w[i <- !is.na(x)]

+ x <- x[i]

+ }

+ # check if weights are probs or unit correspondences

+ ind <- order(x)

+ cumw <- cumsum(w[ind])/sum(w)

+ x <- x[ind]

+ med <- numeric(0)

+ for(i in 1:length(probs)) {

+ if(probs[i] > 0 & probs[i] < 1) med[i] <- max(x[cumw<=probs[i]])

+ else if (probs[i] == 0) med[i] <- min(x)

+ else if (probs[i] == 1) med[i] <- max(x)

2

+ }

+ # the treatment of names

+ med

+ }

> ###########################################################

> weighted.median <- function(x,w, na.rm = FALSE) {

+ if (missing(w))


+ if (na.rm) {

+ w <- w[i <- !is.na(x)]

+ x <- x[i]

+ }

+ weighted.quantile(x,w,.5)

+ }

> ###########################################################

>

> function.matend = function(dataend){

+ start = 1

+ end = c()

+ for(i in 2:nrow(dataend)){

+ if(

+ length(which(dataend[i,c(1, 3:ncol(dataend))]

+ != dataend[i - 1,c(1, 3:ncol(dataend))])) > 0

+ ){

+ end = c(end, i-1)

+ start = c(start, i)

+ }

+ }

+ end = c(end, i)

+ startend = cbind(start, end)

+ matend = matrix(0, nr = 0, nc = ncol(dataend) + 1)

+ for(i in 1:nrow(startend)){

+ if(startend[i, 1] != startend[i, 2])

+ matend = rbind(matend, c(mean(dataend[startend[i, 1]:startend[i, 2], 1])

+ , dataend[startend[i, 1], 2], dataend[startend[i, 2], 2]

+ , apply(dataend[startend[i, 1]:startend[i, 2], 3:ncol(dataend)], 2, mean)))

+ else{



+ , dataend[startend[i, 1]:startend[i, 2], 3:ncol(dataend)]) )

+ }

+ }

+

+

+ return(matend)

+ }

3



+ }

+ # the treatment of names

+ med

+ }

> ###########################################################

> weighted.median <- function(x,w, na.rm = FALSE) {

+ if (missing(w))


+ if (na.rm) {

+ w <- w[i <- !is.na(x)]

+ x <- x[i]

+ }

+ weighted.quantile(x,w,.5)

+ }

> ###########################################################

>

> function.matend = function(dataend){

+ start = 1

+ end = c()

+ for(i in 2:nrow(dataend)){

+ if(

+ length(which(dataend[i,c(1, 3:ncol(dataend))]

+ != dataend[i - 1,c(1, 3:ncol(dataend))])) > 0

+ ){

+ end = c(end, i-1)

+ start = c(start, i)

+ }

+ }

+ end = c(end, i)

+ startend = cbind(start, end)

+ matend = matrix(0, nr = 0, nc = ncol(dataend) + 1)

+ for(i in 1:nrow(startend)){

+ if(startend[i, 1] != startend[i, 2])



+ , apply(dataend[startend[i, 1]:startend[i, 2], 3:ncol(dataend)], 2, mean)))

+ else{



+ , dataend[startend[i, 1]:startend[i, 2], 3:ncol(dataend)]) )

+ }

+ }

+

+

+ return(matend)

+ }

3

> ########################

>

> fun.find.prob.values <- function(name, seg, ploidy, gain = TRUE){

+ sub.seg <- subset(seg, seg[, 1] == name)

+ tmp.copy <- as.numeric(sub.seg[, 6]) - ploidy[name]

+ tmp.probes <- as.numeric(sub.seg[, 5])

+ if(gain){

+ un.copy.gain <- sort(unique(tmp.copy[tmp.copy > 0]))

+ mat.copy.gain <- matrix(NA, nr = length(un.copy.gain), nc = 2)

+ for(k in 1:length(un.copy.gain))

+ mat.copy.gain[k, ] <- c(un.copy.gain[k]

+ , sum(tmp.probes[tmp.copy == un.copy.gain[k]]))

+ mat.copy.gain <- rbind(c(0, sum(tmp.probes[tmp.copy <= 0]))

+ , mat.copy.gain)

+ mat.copy.gain[, 2] <- mat.copy.gain[, 2]/sum(tmp.probes)

+ return(mat.copy.gain)

+ }else{

+ un.copy.loss <- sort(unique(tmp.copy[tmp.copy < 0]))

+ mat.copy.loss <- matrix(NA, nr = length(un.copy.loss), nc = 2)

+ for(k in 1:length(un.copy.loss))

+ mat.copy.loss[k, ] <- c(un.copy.loss[k]

+ , sum(tmp.probes[tmp.copy == un.copy.loss[k]]))

+ mat.copy.loss <- rbind(c(0, sum(tmp.probes[tmp.copy >= 0]))

+ , mat.copy.loss)

+ mat.copy.loss[, 2] <- mat.copy.loss[, 2]/sum(tmp.probes)

+ return(mat.copy.loss)

+ }

+

+ }

> ##############################################################

>

> fun.sample <- function(vals, perms){

+ mat.tmp <- matrix(NA, nr = perms, nc = length(vals))

+ for(k in 1:perms){

+ if(k %% 1000 == 0)

+ print(k)

+ for(i in 1:length(vals))

+ mat.tmp[k, i] <- sample(vals[[i]][, 1], 1, prob = vals[[i]][, 2])

+ }

+ return(mat.tmp)

+ }

> ##############################################################

>

>

> fun.find.zeros <- function(x){

+ if(length(which(x == 0)) == length(which(!is.na(x))))

4



RESEARCH

> ########################

>

> fun.find.prob.values <- function(name, seg, ploidy, gain = TRUE){

+ sub.seg <- subset(seg, seg[, 1] == name)

+ tmp.copy <- as.numeric(sub.seg[, 6]) - ploidy[name]

+ tmp.probes <- as.numeric(sub.seg[, 5])

+ if(gain){

+ un.copy.gain <- sort(unique(tmp.copy[tmp.copy > 0]))

+ mat.copy.gain <- matrix(NA, nr = length(un.copy.gain), nc = 2)

+ for(k in 1:length(un.copy.gain))

+ mat.copy.gain[k, ] <- c(un.copy.gain[k]

+ , sum(tmp.probes[tmp.copy == un.copy.gain[k]]))

+ mat.copy.gain <- rbind(c(0, sum(tmp.probes[tmp.copy <= 0]))

+ , mat.copy.gain)

+ mat.copy.gain[, 2] <- mat.copy.gain[, 2]/sum(tmp.probes)

+ return(mat.copy.gain)

+ }else{

+ un.copy.loss <- sort(unique(tmp.copy[tmp.copy < 0]))

+ mat.copy.loss <- matrix(NA, nr = length(un.copy.loss), nc = 2)

+ for(k in 1:length(un.copy.loss))

+ mat.copy.loss[k, ] <- c(un.copy.loss[k]

+ , sum(tmp.probes[tmp.copy == un.copy.loss[k]]))

+ mat.copy.loss <- rbind(c(0, sum(tmp.probes[tmp.copy >= 0]))

+ , mat.copy.loss)

+ mat.copy.loss[, 2] <- mat.copy.loss[, 2]/sum(tmp.probes)

+ return(mat.copy.loss)

+ }

+

+ }

> ##############################################################

>

> fun.sample <- function(vals, perms){

+ mat.tmp <- matrix(NA, nr = perms, nc = length(vals))

+ for(k in 1:perms){

+ if(k %% 1000 == 0)

+ print(k)

+ for(i in 1:length(vals))

+ mat.tmp[k, i] <- sample(vals[[i]][, 1], 1, prob = vals[[i]][, 2])

+ }

+ return(mat.tmp)

+ }

> ##############################################################

>

>

> fun.find.zeros <- function(x){

+ if(length(which(x == 0)) == length(which(!is.na(x))))

4

+ return(FALSE)

+ else

+ return(TRUE)

+

+ }

> ###############################################

> # Merge contiguous signififcant regions

> ###############################################

>

> fun.update.sig.regions <- function(sig.regions){

+ row.ind <- as.numeric(rownames(sig.regions))

+ start <- sig.regions[1, 2]

+ end <- sig.regions[1, 3]

+ chrom <- sig.regions[1, 1]

+ count <- 1

+ for(i in 2:length(row.ind)){

+ if(sig.regions[i, 1] == sig.regions[i - 1, 1]

+ & row.ind[i] - row.ind[i -1] == 1){

+ end[count] <- sig.regions[i, 3]

+ }else{

+ start <- c(start, sig.regions[i, 2])

+ end <- c(end, sig.regions[i, 3])

+ chrom <- c(chrom, sig.regions[i, 1])

+ count <- count + 1

+ }

+ }

+ return(cbind(chrom, start, end))

+

+ }

> ###################################################

> # Detect Genes in Significant Regions

> ###################################################

>

> fun.find.siggenes <- function(sig.regions, annot, biotype = "biotype"){

+ annot <- annot[annot[, biotype] == "protein_coding", ]

+ pos.tmp <- annot[, c("chromosome_name", "start_position", "end_position")]

+ pos.tmp <- apply(pos.tmp, 2, as.numeric)

+ genes.matrix <- matrix(NA, nr = 0, nc = ncol(annot))

+ for(i in 1:nrow(sig.regions)){

+ ww <- which( sig.regions[i, 1] == pos.tmp[, 1]

+ & ( (pos.tmp[, 2] <= sig.regions[i, 3]

+ & pos.tmp[, 3] >= sig.regions[i, 3] )

+ |(pos.tmp[, 2] <= sig.regions[i, 2]

+ & pos.tmp[, 3] >= sig.regions[i, 2])

+ |(pos.tmp[, 2] >= sig.regions[i, 2]

+ & pos.tmp[, 3] <= sig.regions[i, 3] )

5



+ return(FALSE)

+ else

+ return(TRUE)

+

+ }

> ###############################################

> # Merge contiguous signififcant regions

> ###############################################

>

> fun.update.sig.regions <- function(sig.regions){

+ row.ind <- as.numeric(rownames(sig.regions))

+ start <- sig.regions[1, 2]

+ end <- sig.regions[1, 3]

+ chrom <- sig.regions[1, 1]

+ count <- 1

+ for(i in 2:length(row.ind)){

+ if(sig.regions[i, 1] == sig.regions[i - 1, 1]

+ & row.ind[i] - row.ind[i -1] == 1){

+ end[count] <- sig.regions[i, 3]

+ }else{

+ start <- c(start, sig.regions[i, 2])

+ end <- c(end, sig.regions[i, 3])

+ chrom <- c(chrom, sig.regions[i, 1])

+ count <- count + 1

+ }

+ }

+ return(cbind(chrom, start, end))

+

+ }

> ###################################################

> # Detect Genes in Significant Regions

> ###################################################

>

> fun.find.siggenes <- function(sig.regions, annot, biotype = "biotype"){

+ annot <- annot[annot[, biotype] == "protein_coding", ]

+ pos.tmp <- annot[, c("chromosome_name", "start_position", "end_position")]

+ pos.tmp <- apply(pos.tmp, 2, as.numeric)

+ genes.matrix <- matrix(NA, nr = 0, nc = ncol(annot))

+ for(i in 1:nrow(sig.regions)){

+ ww <- which( sig.regions[i, 1] == pos.tmp[, 1]

+ & ( (pos.tmp[, 2] <= sig.regions[i, 3]

+ & pos.tmp[, 3] >= sig.regions[i, 3] )

+ |(pos.tmp[, 2] <= sig.regions[i, 2]

+ & pos.tmp[, 3] >= sig.regions[i, 2])

+ |(pos.tmp[, 2] >= sig.regions[i, 2]

+ & pos.tmp[, 3] <= sig.regions[i, 3] )

5

+ ))

+ genes.matrix <- rbind(genes.matrix, annot[ww, ])

+ }

+ return(genes.matrix)

+ }

> ##############################################################

>

> function.plot.smooth <- function(segments, cent

+ , scores.neg, scores.pos, lwd = 1

+ , sigs.neg, sigs.pos, gene.neg, gene.pos, gain.sig.pos

+ ,loss.sig.pos, smooth = TRUE

+ , ylab, ydist = 0.3){

+ if(length(which(is.na(scores.neg))) == length(scores.neg))

+ scores.neg <- rep(NA, nrow(segments))

+ if(length(which(is.na(scores.pos))) == length(scores.pos))

+ scores.pos <- rep(NA, nrow(segments))

+ if(length(which(is.na(sigs.neg))) == length(sigs.neg))

+ sigs.neg <- rep(NA, nrow(segments))

+ if(length(which(is.na(sigs.pos))) == length(sigs.pos))

+ sigs.pos <- rep(NA, nrow(segments))

+

+ if(smooth == TRUE){

+ require(zoo)

+ scores.neg <- fun.smooth.score(scores.neg, segments)

+ #scores.pos <- fun.smooth.score(scores.pos, segments)

+

+ }

+ q.thresh.pos <- scores.pos[gain.sig.pos[1]]

+ q.thresh.neg <- scores.neg[loss.sig.pos[1]]

+

+ segments <- cbind(segments, scores.neg, scores.pos, sigs.neg, sigs.pos)

+

+ # Read centromere positions:

+ cent <- cent[, 2:4]

+ cent[, 1] <- substr(cent[, 1], 4, nchar(cent[, 1]))

+ cent <- centromere[as.numeric(cent[, 1]) %in% 1:22, ]

+ cent <- apply(cent, 2, as.numeric)

+

+ chrom.length <- fun.chrom.length(segments)

+ segments <- fun.add.chrom(segments, chrom.length)

+ cent <- fun.add.chrom(cent, chrom.length)

+

+ if(length(which(is.na(gene.neg))) != length(gene.neg))

+ gene.neg <- fun.add.chrom(gene.neg, chrom.length)

+ if(length(which(is.na(gene.pos))) != length(gene.pos))

+ gene.pos <- fun.add.chrom(gene.pos, chrom.length)

6



RESEARCH

+

+ plot.matrix <- cbind( rep(segments[, 1], rep(2, nrow(segments)))

+ , sort(c(segments[, 2], segments[, 3]))

+ , rep(segments[, 4], rep(2, nrow(segments)))



+ , rep(segments[, 7], rep(2, nrow(segments))))

+

+ colnames(plot.matrix) <- c("Chrom", "Position", "scores.neg"

+ , "scores.pos", "sigs.neg", "sigs.pos")

+ plot.matrix[is.na(plot.matrix[, "scores.neg"]), "scores.neg"] <- 0

+ plot.matrix[is.na(plot.matrix[, "scores.pos"]), "scores.pos"] <- 0

+ plot.matrix[plot.matrix[, "scores.neg"] == 0, "sigs.neg"] <- NA

+ plot.matrix[plot.matrix[, "scores.pos"] == 0, "sigs.pos"] <- NA

+

+ if(length(which(is.na(scores.pos))) != length(scores.pos))

+ plot.matrix[plot.matrix[, "scores.neg"] > 0, "scores.neg"] <- 0

+ if(length(which(is.na(scores.pos))) != length(scores.pos))

+ plot.matrix[plot.matrix[, "scores.pos"] < 0, "scores.pos"] <- 0

+

+ # Assign test statistic values to significant genes

+ if(length(which(is.na(gene.neg))) != length(gene.neg)){

+ score.gene.neg <- c()

+ for(i in 1:nrow(gene.neg))

+ score.gene.neg <- c(score.gene.neg

+ , plot.matrix[which(plot.matrix[, "Position"] > gene.neg[i, 3])[1]

+ , "scores.neg"])

+ gene.neg <- cbind(gene.neg, score.gene.neg)

+

+ }

+

+ if(length(which(is.na(gene.pos))) != length(gene.pos)){

+ score.gene.pos <- c()

+ for(i in 1:nrow(gene.pos))

+ score.gene.pos <- c(score.gene.pos

+ , plot.matrix[which(plot.matrix[, "Position"] > gene.neg[i, 3])[1]

+ , "scores.pos"])

+ gene.pos <- cbind(gene.pos, score.gene.pos)

+

+ }

+

+

+ par(mar = c(5.1, 4.1, 4.1, 4.1))

+ plot(plot.matrix[, "Position"]

+ , plot.matrix[, "scores.neg"], col = "red"

+ , xaxs = "i", yaxs = "i", xaxt = "n"

7

+ , ylim = c(min(plot.matrix[, "scores.neg"]

+ , na.rm = T)- ydist, 0 )

+ , type = "n", xlab = NA, ylab = NA

+ , yaxt = "n")

+

+ axis(1, at = fun.chrom.mean(segments), labels = NA, las = 2, cex.axis = 0.8)

+

+ axis(1, at = fun.chrom.mean(segments), labels = as.character(1:22), las = 2

+ , cex.axis = 0.8, line = -0.3, tick = F)

+

+ axis(2, at = round(seq(min(plot.matrix[, "scores.neg"]

+ , na.rm = T) , 0, 0.3)*10)/10

+ , labels = NA

+ , las = 1, cex.axis = 0.8)

+ mtext(ylab, side = 2, line = 2.3)

+


+ , na.rm = T) , 0, 0.3)*10)/10

+ , labels = round(seq(min(plot.matrix[, "scores.neg"]

+ , na.rm = T)

+ , 0, 0.3)*10)/10

+ , las = 1, cex.axis = 0.8, line = -0.3, tick = F)

+

+ for(i in seq(2 , 22, 2))

+ polygon(c(min(plot.matrix[plot.matrix[, "Chrom"] == i, "Position" ]

+ , na.rm = T)

+ , max(plot.matrix[plot.matrix[, "Chrom"] == i, "Position" ]

+ , na.rm = T)


+ , na.rm = T)

+ , min(plot.matrix[plot.matrix[, "Chrom"] == i, "Position" ]

+ , na.rm = T))

+ , c(min(plot.matrix[, "scores.neg"], na.rm = T) - ydist

+ , min(plot.matrix[, "scores.neg"], na.rm = T) - ydist

+ , 0 , 0 )

+ , col = "#F0F0F0", border = NA)

+

+

+ abline(v = cent[, 2], lty = "dashed", lwd =0.1, col = "gray")

+

+

+ points(plot.matrix[, "Position"], plot.matrix[, "scores.neg"], type = "l"

+ , col = "blue", lwd = lwd)

+

+

+

8



+ , ylim = c(min(plot.matrix[, "scores.neg"]

+ , na.rm = T)- ydist, 0 )

+ , type = "n", xlab = NA, ylab = NA

+ , yaxt = "n")

+

+ axis(1, at = fun.chrom.mean(segments), labels = NA, las = 2, cex.axis = 0.8)

+

+ axis(1, at = fun.chrom.mean(segments), labels = as.character(1:22), las = 2

+ , cex.axis = 0.8, line = -0.3, tick = F)

+


+ , na.rm = T) , 0, 0.3)*10)/10

+ , labels = NA

+ , las = 1, cex.axis = 0.8)

+ mtext(ylab, side = 2, line = 2.3)

+


+ , na.rm = T) , 0, 0.3)*10)/10

+ , labels = round(seq(min(plot.matrix[, "scores.neg"]

+ , na.rm = T)

+ , 0, 0.3)*10)/10

+ , las = 1, cex.axis = 0.8, line = -0.3, tick = F)

+

+ for(i in seq(2 , 22, 2))

+ polygon(c(min(plot.matrix[plot.matrix[, "Chrom"] == i, "Position" ]

+ , na.rm = T)


+ , na.rm = T)


+ , na.rm = T)

+ , min(plot.matrix[plot.matrix[, "Chrom"] == i, "Position" ]

+ , na.rm = T))

+ , c(min(plot.matrix[, "scores.neg"], na.rm = T) - ydist

+ , min(plot.matrix[, "scores.neg"], na.rm = T) - ydist

+ , 0 , 0 )

+ , col = "#F0F0F0", border = NA)

+

+

+ abline(v = cent[, 2], lty = "dashed", lwd =0.1, col = "gray")

+

+

+ points(plot.matrix[, "Position"], plot.matrix[, "scores.neg"], type = "l"

+ , col = "blue", lwd = lwd)

+

+

+

8



+ lines(rep(gene.neg[i, 2], 1000)

+ , seq(min(plot.matrix[, "scores.neg"]) - 0.1

+ , min(plot.matrix[, "scores.neg"]) - ydist, length = 1000)

+ , col = "green")

+ }

+ abline(h = 0, col = "black", lwd = lwd)

+ abline(h = q.thresh.neg, lty = "dashed", lwd = 0.1)

+ axis(4, c(q.thresh.neg), NA, las = 2)

+ axis(4, c(q.thresh.neg), c("Q = 0.25"), las = 2, tick = F, line = -0.3)

+ abline(h = c(min(plot.matrix[, "scores.neg"], na.rm = T) - ydist, 0))

+ abline(v = c(0, max(plot.matrix[, "Position"])))

+

+ }

> ###############################################################################

>

>

> fun.chrom.length <- function(seg){

+ # This function returns the length of chromosome 1,...,22

+ # seg: first three columns of the minimum conistent region matrix

+

+ chrom.length <- c()

+ for(i in 1:22){

+ sub.chrom <- subset(seg, seg[, 1] == i)

+ chrom.length <- c(chrom.length, sub.chrom[nrow(sub.chrom), 3])

+

+ }

+ return(chrom.length)

+

+ }

> ################################################################################

>

> fun.add.chrom <- function(seg, chrom.length){

+ # This function adds the length of chromosome 1,...,(i - 1) to chromsomes 2,...,22


+ # chrom.length: output of fun.chrom.length(...)

+ if(1 %in% seg[, 1])

+ seg.tmp <- subset(seg, seg[, 1] == 1)

+ else

+ seg.tmp <- matrix(NA, nr = 0, nc = ncol(seg))

+ for(i in 2:22){

+ if(!(i %in% seg[, 1]))

+ next()


+ sub.chrom[, 2] <- sub.chrom[, 2] + sum(chrom.length[1:(i - 1)])

9



RESEARCH



+ lines(rep(gene.neg[i, 2], 1000)

+ , seq(min(plot.matrix[, "scores.neg"]) - 0.1

+ , min(plot.matrix[, "scores.neg"]) - ydist, length = 1000)

+ , col = "green")

+ }

+ abline(h = 0, col = "black", lwd = lwd)

+ abline(h = q.thresh.neg, lty = "dashed", lwd = 0.1)

+ axis(4, c(q.thresh.neg), NA, las = 2)

+ axis(4, c(q.thresh.neg), c("Q = 0.25"), las = 2, tick = F, line = -0.3)

+ abline(h = c(min(plot.matrix[, "scores.neg"], na.rm = T) - ydist, 0))

+ abline(v = c(0, max(plot.matrix[, "Position"])))

+

+ }

> ###############################################################################

>

>

> fun.chrom.length <- function(seg){

+ # This function returns the length of chromosome 1,...,22


+

+ chrom.length <- c()

+ for(i in 1:22){


+ chrom.length <- c(chrom.length, sub.chrom[nrow(sub.chrom), 3])

+

+ }

+ return(chrom.length)

+

+ }

> ################################################################################

>

> fun.add.chrom <- function(seg, chrom.length){

+ # This function adds the length of chromosome 1,...,(i - 1) to chromsomes 2,...,22


+ # chrom.length: output of fun.chrom.length(...)

+ if(1 %in% seg[, 1])

+ seg.tmp <- subset(seg, seg[, 1] == 1)

+ else

+ seg.tmp <- matrix(NA, nr = 0, nc = ncol(seg))

+ for(i in 2:22){

+ if(!(i %in% seg[, 1]))

+ next()



9


+ seg.tmp <- rbind(seg.tmp, sub.chrom)

+ }

+ return(seg.tmp)

+

+ }

> ###########################################################################

>

> fun.smooth.score <- function(score, seg){

+ # This function applies smoothing (rollmean) to chromosome 1,...,22 separetely

+ # score: score to be smoothed (e.g. correlation)

+ # seg: Corresponding positions

+ # window: window size for rollmean

+ # The R package "fTrading" is required

+ require(fTrading)

+ score.tmp <- c()

+ for(i in 1:22)

+

+ score.tmp <- c(score.tmp, rollMean(score[seg[, 1] == i]

+ , n = round(length(score[seg[, 1] == i])/20)

+ , trim = F, na.rm = TRUE))

+

+

+ return(score.tmp)

+

+

+ }

> #############################################################################

>

> fun.chrom.mean <- function(seg){

+ # This function finds the middle of the positions for each chromosome.

+ # seg: Matrix with three columns, containing the positions.

+ chrom.mean <- c()

+ for(i in 1:22){


+ chrom.mean <- c(chrom.mean

+ , (min(sub.chrom[, 2]) + max(sub.chrom[, 3]))/2)

+

+ }

+ return(chrom.mean)

+

+ }

> #######################################

>

>

>

10




+ seg.tmp <- rbind(seg.tmp, sub.chrom)

+ }

+ return(seg.tmp)

+

+ }

> ###########################################################################

>

> fun.smooth.score <- function(score, seg){

+ # This function applies smoothing (rollmean) to chromosome 1,...,22 separetely

+ # score: score to be smoothed (e.g. correlation)

+ # seg: Corresponding positions

+ # window: window size for rollmean

+ # The R package "fTrading" is required

+ require(fTrading)

+ score.tmp <- c()

+ for(i in 1:22)

+

+ score.tmp <- c(score.tmp, rollMean(score[seg[, 1] == i]

+ , n = round(length(score[seg[, 1] == i])/20)

+ , trim = F, na.rm = TRUE))

+

+

+ return(score.tmp)

+

+

+ }

> #############################################################################

>

> fun.chrom.mean <- function(seg){

+ # This function finds the middle of the positions for each chromosome.

+ # seg: Matrix with three columns, containing the positions.

+ chrom.mean <- c()

+ for(i in 1:22){


+ chrom.mean <- c(chrom.mean

+ , (min(sub.chrom[, 2]) + max(sub.chrom[, 3]))/2)

+

+ }

+ return(chrom.mean)

+

+ }

> #######################################

>

>

>

10

> fun.genes.copy <- function(x, gene.pos){

+

+

+ genes.copy <- matrix(NA, nr = nrow(gene.pos), nc = ncol(x) - 3)

+ for(i in 1:nrow(gene.pos)){

+ ww <- which( gene.pos[i, 1] == x[, 1]

+ &( (x[, 2] <= gene.pos[i, 3]

+ & x[, 3] >= gene.pos[i, 3])

+ |(x[, 2] <= gene.pos[i, 2]

+ & x[, 3] >= gene.pos[i, 2])

+ |(x[, 2] >= gene.pos[i, 2]

+ & x[, 3] <= gene.pos[i, 3])

+ ))

+ if(length(ww) > 1)

+ genes.copy[i, ] <- apply(x[ww, 4:ncol(x)], 2, median, na.rm = TRUE)

+ if(length(ww) == 1)

+ genes.copy[i, ] <- x[ww, 4:ncol(x)]

+ }

+

+ colnames(genes.copy) <- colnames(x)[4:ncol(x)]

+ rownames(genes.copy) <- rownames(gene.pos)

+ return(genes.copy)

+

+ }

> ####################################################################

>

>

> fun.mapcopy.pos <- function(chrom.pos, seg){

+ vect.final <- rep(NA, length(unique(seg[, 1])))

+ names(vect.final) <- unique(seg[, 1])

+ sub <- subset(seg, as.numeric(seg[, 2]) == chrom.pos[1])

+ sub2 <- apply(sub[, 2:6], 2, as.numeric, na.rm = T)

+ ww <- which(((sub2[, 2] <= chrom.pos[3]

+ & sub2[, 3] >= chrom.pos[ 3] )

+ |(sub2[, 2] <= chrom.pos[2]

+ & sub2[, 3] >= chrom.pos[2])

+ |(sub2[, 2] >= chrom.pos[2]

+ & sub2[, 3] <= chrom.pos[3] )

+ ))

+

+ sub.tmp <- sub[ww, ]

+ dups <-

+ duplicated(sub.tmp[, 1]

+ , fromLast = T) == T |

+ duplicated(sub.tmp[, 1]) == T

+ n.dup <- sub.tmp[!dups, ]

11



RESEARCH

> fun.genes.copy <- function(x, gene.pos){

+

+

+ genes.copy <- matrix(NA, nr = nrow(gene.pos), nc = ncol(x) - 3)

+ for(i in 1:nrow(gene.pos)){

+ ww <- which( gene.pos[i, 1] == x[, 1]

+ &( (x[, 2] <= gene.pos[i, 3]

+ & x[, 3] >= gene.pos[i, 3])

+ |(x[, 2] <= gene.pos[i, 2]

+ & x[, 3] >= gene.pos[i, 2])

+ |(x[, 2] >= gene.pos[i, 2]

+ & x[, 3] <= gene.pos[i, 3])

+ ))

+ if(length(ww) > 1)

+ genes.copy[i, ] <- apply(x[ww, 4:ncol(x)], 2, median, na.rm = TRUE)

+ if(length(ww) == 1)

+ genes.copy[i, ] <- x[ww, 4:ncol(x)]

+ }

+

+ colnames(genes.copy) <- colnames(x)[4:ncol(x)]

+ rownames(genes.copy) <- rownames(gene.pos)

+ return(genes.copy)

+

+ }

> ####################################################################

>

>

> fun.mapcopy.pos <- function(chrom.pos, seg){

+ vect.final <- rep(NA, length(unique(seg[, 1])))

+ names(vect.final) <- unique(seg[, 1])

+ sub <- subset(seg, as.numeric(seg[, 2]) == chrom.pos[1])

+ sub2 <- apply(sub[, 2:6], 2, as.numeric, na.rm = T)

+ ww <- which(((sub2[, 2] <= chrom.pos[3]

+ & sub2[, 3] >= chrom.pos[ 3] )

+ |(sub2[, 2] <= chrom.pos[2]

+ & sub2[, 3] >= chrom.pos[2])

+ |(sub2[, 2] >= chrom.pos[2]

+ & sub2[, 3] <= chrom.pos[3] )

+ ))

+

+ sub.tmp <- sub[ww, ]

+ dups <-

+ duplicated(sub.tmp[, 1]

+ , fromLast = T) == T |

+ duplicated(sub.tmp[, 1]) == T

+ n.dup <- sub.tmp[!dups, ]

11

+ vect.final[n.dup[, 1]] <- as.numeric(n.dup[, 6])

+ if(length(which(dups)) >= 1){

+ dup <- sub.tmp[dups, ]

+ val.dup <- rep(NA, length(unique(dup[, 1])))

+ names(val.dup) <- unique(dup[, 1])

+ for(kk in unique(dup[, 1])){

+ w.tmp <- which(dup[, 1] == kk)

+ tmp <- c()

+ for(ii in w.tmp)

+ tmp <- c(tmp

+ , pmax(c(chrom.pos[ 2], chrom.pos[3]))

+ - pmin(c(as.numeric(dup[ii, 3])

+ , as.numeric(dup[ii, 4]))))

+ val.dup[kk] <- dup[w.tmp, 6][which(tmp == max(tmp))[1]]

+ }

+

+ vect.final[names(val.dup)] <- as.numeric(val.dup)

+ }

+ return(vect.final)

+ }

> ################################################################

>

> fun.struct.complex <- function(sample, seg, thresh = 1e6){

+ vect.num.arm <- rep(0, length(unique(seg[, 2])))

+ names(vect.num.arm) <- as.numeric(unique(seg[, 2]))

+

+ sub.seg <- apply(subset(seg, seg[, 1] == sample)[, 2:6], 2, as.numeric, na.rm = TRUE)

+

+ for(chr in unique(sub.seg[, 1])){

+

+ sub <- subset(sub.seg, sub.seg[, 1] == chr)

+

+ mod.val <- as.numeric(names(which.max(table(rep(sub[, 5], sub[, 4])))))

+

+ chrom.length <- max(sub[, 3], na.rm = TRUE) - min(sub[, 2], na.rm = TRUE)

+ a <- sub[(sub[, 5] > mod.val | sub[, 5] < mod.val)

+ & (sub[, 3] - sub[, 2]) > thresh, ]

+ if(length(a) == 6)

+ vect.num.arm[chr] <- 1

+ if(length(a) > 6)

+ vect.num.arm[chr] <- nrow(a)

+ }

+ return(vect.num.arm)

+

+ }

> #################################################################

12



>

>

> fun.num.complex <- function(sample, seg, thresh = 0.25){

+ require(limma)

+ vect.whole.chrom <- rep(0, length(unique(seg[, 2])))

+ names(vect.whole.chrom) <- unique(seg[, 2])

+ sub.seg <- apply(subset(seg, seg[, 1] == sample)[, 2:6], 2, as.numeric, na.rm = TRUE)

+ ploidy <- weighted.median(sub.seg[, 5], w = sub.seg[, 3] - sub.seg[, 2], na.rm = TRUE)

+ sub.seg[, 5] <- sub.seg[, 5] - ploidy

+

+ for(chr in unique(sub.seg[, 1])){

+ sub <- subset(sub.seg, sub.seg[, 1] == chr)

+ tmp <- rep(sub[, 5], sub[, 4])

+ med.tmp <- median(tmp, na.rm = TRUE)

+ quant.tmp <- quantile(tmp, probs = c(thresh, 1 - thresh) , na.rm = TRUE)

+ if(med.tmp < 0 & quant.tmp[2] < -0.8)

+ vect.whole.chrom[chr] <- abs(quant.tmp)

+ if(med.tmp > 0 & quant.tmp[1] > 0.8)

+ vect.whole.chrom[chr] <- abs(quant.tmp)

+

+ }

+ return(vect.whole.chrom)

+ }

> ####################################################################

>

> fun.fisher.mutations <- function(mut.status, cin.class){

+ n1 <- sum(mut.status == 1 & cin.class == "CIN(+)")

+ n2 <- sum(mut.status == 1 & cin.class == "CIN(-)")

+ n3 <- sum(mut.status == 0 & cin.class == "CIN(+)")

+ n4 <- sum(mut.status == 0 & cin.class == "CIN(-)")

+ mat.fisher <- matrix(c(n1, n2, n3, n4), nr = 2, nc = 2)

+ return(fisher.test(mat.fisher, alternative = "greater")$p.value)

+

+ }

> ####################################################################

>

> fun.fisher <- function(l1, l2, N){

+

+ both <- intersect(l1, l2)

+ mat.fisher <- matrix(NA, nr = 2, nc = 2)

+ mat.fisher[1, 1] <- length(both)

+ mat.fisher[2, 1] <- length(l1) - length(both)

+ mat.fisher[1, 2] <- length(l2) - length(both)

+ mat.fisher[2, 2] <-

+ N - mat.fisher[1, 1] - mat.fisher[1, 2] -

+ mat.fisher[2, 1]

13

+ return(fisher.test(mat.fisher))

+ }

> #####################################################################

>

> fun.hits.loss <- function(x)

+ return(length(which(x < 0)))

>

>

>

> #################################################################

>

1.3 Load and Prepare Data

Load Data for General Use

> load("C:/Home/data_Sweave/Celllines/gene.annotation.HG18.RData")

> gene.annotation.HG18 <- as.matrix(gene.annotation.HG18)

> genes <- c("PIGN", "MEX3C", "ZNF516")

> gene.alias <- c("PIGN", "RKHD2", "ZNF516")

> genes <- cbind(genes

+ , gene.annotation.HG18[

+ match(genes, gene.annotation.HG18[, "external_gene_id"])

+ , c("chromosome_name", "start_position", "end_position")])

> centromere <- read.table("C:/Home/data_Sweave/Celllines/centromere.txt"

+ , header = FALSE)

> centromere <- as.matrix(centromere)

> cent.18q <- 16764896

Load and Prepare Sanger Cell Line Data

> load("C:/Home/data_Sweave/Celllines/cn.cl.data.RData")

> load("C:/Home/data_Sweave/Celllines/seg.colon.CL.RData")

> load("C:/Home/data_Sweave/Celllines/class.CL.RData")

> load("C:/Home/data_Sweave/Celllines/gii.weighted.colon.CL.RData")

> load("C:/Home/data_Sweave/Celllines/mat.mut.colon.CL.RData")

> load("C:/Home/data_Sweave/Celllines/expr.CL.rma.RData")

> cn.cl.data <- cn.cl.data[cn.cl.data[, 1] %in% 1:22, ]

> ploidy.colon.all.CL <- c()

> for(i in unique(seg.colon.CL[, 1])){

+ s = subset(seg.colon.CL, seg.colon.CL[, 1] == i)

+ ploidy.colon.all.CL <- c(ploidy.colon.all.CL

+ , weighted.median(as.numeric(s[, 6])

+ , w = as.numeric(s[, 4]) - as.numeric(s[, 3]) ))

+ }

> names(ploidy.colon.all.CL) <- unique(seg.colon.CL[, 1])

> mat.final.colon.CL <- function.matend(cn.cl.data)

14



RESEARCH

+ return(fisher.test(mat.fisher))

+ }

> #####################################################################

>

> fun.hits.loss <- function(x)

+ return(length(which(x < 0)))

>

>

>

> #################################################################

>

1.3 Load and Prepare Data

Load Data for General Use

> load("C:/Home/data_Sweave/Celllines/gene.annotation.HG18.RData")

> gene.annotation.HG18 <- as.matrix(gene.annotation.HG18)

> genes <- c("PIGN", "MEX3C", "ZNF516")

> gene.alias <- c("PIGN", "RKHD2", "ZNF516")

> genes <- cbind(genes

+ , gene.annotation.HG18[

+ match(genes, gene.annotation.HG18[, "external_gene_id"])

+ , c("chromosome_name", "start_position", "end_position")])

> centromere <- read.table("C:/Home/data_Sweave/Celllines/centromere.txt"

+ , header = FALSE)

> centromere <- as.matrix(centromere)

> cent.18q <- 16764896

Load and Prepare Sanger Cell Line Data

> load("C:/Home/data_Sweave/Celllines/cn.cl.data.RData")

> load("C:/Home/data_Sweave/Celllines/seg.colon.CL.RData")

> load("C:/Home/data_Sweave/Celllines/class.CL.RData")

> load("C:/Home/data_Sweave/Celllines/gii.weighted.colon.CL.RData")

> load("C:/Home/data_Sweave/Celllines/mat.mut.colon.CL.RData")

> load("C:/Home/data_Sweave/Celllines/expr.CL.rma.RData")

> cn.cl.data <- cn.cl.data[cn.cl.data[, 1] %in% 1:22, ]

> ploidy.colon.all.CL <- c()

> for(i in unique(seg.colon.CL[, 1])){

+ s = subset(seg.colon.CL, seg.colon.CL[, 1] == i)

+ ploidy.colon.all.CL <- c(ploidy.colon.all.CL


+ , w = as.numeric(s[, 4]) - as.numeric(s[, 3]) ))

+ }

> names(ploidy.colon.all.CL) <- unique(seg.colon.CL[, 1])

> mat.final.colon.CL <- function.matend(cn.cl.data)

14

> rownames(mat.final.colon.CL) <- 1:nrow(mat.final.colon.CL)

>

>

Load and Prepare Aneuploid Tumour data (BAC Arrays)

> load("C:/Home/data_Sweave/aneuploidTumors_BAC/segment.smoothed.CNA.object.RData")

> load("C:/Home/data_Sweave/aneuploidTumors_BAC/mat.final.gistic.RData")

> gistic.res <-

+ read.table(

+ "C:/Home/data_Sweave/aneuploidTumors_BAC/scores.gp_gistic.gistic.txt"

+ , sep = "\t", header = TRUE)

> gistic.res <- as.matrix(gistic.res)

> gistic.res[gistic.res[, 1] == "Del", "G.score"] <-

+ -as.numeric(gistic.res[gistic.res[, 1] == "Del", "G.score"])

> mat.pos.BAC <- mat.final.gistic[, 1:3]

> seg.BAC <- as.matrix(segment.smoothed.CNA.object$output)

> tmp <- apply(mat.pos.BAC, 1,fun.mapcopy.pos, seg = seg.BAC )

> mat.final.BAC <- cbind(mat.pos.BAC, t(tmp))

Load and Prepare TCGA Tumour data

> load("C:/Home/data_Sweave/TCGA_Tumours/dup.colon.agilent.MA.RData")

> load("C:/Home/data_Sweave/TCGA_Tumours/mat.names.RData")

> load("C:/Home/data_Sweave/TCGA_Tumours/mat.final.colon.RData")

> load("C:/Home/data_Sweave/TCGA_Tumours/seg.mat.colon.GAP.RData")

> load("C:/Home/data_Sweave/TCGA_Tumours/mat.mut.colon.TCGA.RData")

> ploidy.colon.TCGA <- c()

> for(i in unique(seg.mat.colon.GAP[, 1])){

+ s = subset(seg.mat.colon.GAP, seg.mat.colon.GAP[, 1] == i)

+ ploidy.colon.TCGA <- c(ploidy.colon.TCGA


+ , w = as.numeric(s[, 4]) - as.numeric(s[, 3])))

+ }

> DNA.index.colon.TCGA <- c()



+ DNA.index.colon.TCGA <- c(DNA.index.colon.TCGA

+ , weighted.mean(as.numeric(s[, 6])


+ }

> names(DNA.index.colon.TCGA) <- unique(seg.mat.colon.GAP[, 1])

> DNA.index.colon.TCGA <- DNA.index.colon.TCGA/2

> seg.mat.colon.GAP[, 2] <- trunc(as.numeric(seg.mat.colon.GAP[, 2]))

> names(ploidy.colon.TCGA) <- unique(seg.mat.colon.GAP[, 1])

> gii.weighted.colon.TCGA <- sapply(unique(seg.mat.colon.GAP[, 1])

+ , gii.weight.chrom, seg = seg.mat.colon.GAP

15



> rownames(mat.final.colon.CL) <- 1:nrow(mat.final.colon.CL)

>

>

Load and Prepare Aneuploid Tumour data (BAC Arrays)

> load("C:/Home/data_Sweave/aneuploidTumors_BAC/segment.smoothed.CNA.object.RData")

> load("C:/Home/data_Sweave/aneuploidTumors_BAC/mat.final.gistic.RData")

> gistic.res <-

+ read.table(

+ "C:/Home/data_Sweave/aneuploidTumors_BAC/scores.gp_gistic.gistic.txt"

+ , sep = "\t", header = TRUE)

> gistic.res <- as.matrix(gistic.res)

> gistic.res[gistic.res[, 1] == "Del", "G.score"] <-

+ -as.numeric(gistic.res[gistic.res[, 1] == "Del", "G.score"])

> mat.pos.BAC <- mat.final.gistic[, 1:3]

> seg.BAC <- as.matrix(segment.smoothed.CNA.object$output)

> tmp <- apply(mat.pos.BAC, 1,fun.mapcopy.pos, seg = seg.BAC )

> mat.final.BAC <- cbind(mat.pos.BAC, t(tmp))

Load and Prepare TCGA Tumour data

> load("C:/Home/data_Sweave/TCGA_Tumours/dup.colon.agilent.MA.RData")

> load("C:/Home/data_Sweave/TCGA_Tumours/mat.names.RData")

> load("C:/Home/data_Sweave/TCGA_Tumours/mat.final.colon.RData")

> load("C:/Home/data_Sweave/TCGA_Tumours/seg.mat.colon.GAP.RData")

> load("C:/Home/data_Sweave/TCGA_Tumours/mat.mut.colon.TCGA.RData")

> ploidy.colon.TCGA <- c()



+ ploidy.colon.TCGA <- c(ploidy.colon.TCGA



+ }

> DNA.index.colon.TCGA <- c()



+ DNA.index.colon.TCGA <- c(DNA.index.colon.TCGA

+ , weighted.mean(as.numeric(s[, 6])


+ }

> names(DNA.index.colon.TCGA) <- unique(seg.mat.colon.GAP[, 1])

> DNA.index.colon.TCGA <- DNA.index.colon.TCGA/2

> seg.mat.colon.GAP[, 2] <- trunc(as.numeric(seg.mat.colon.GAP[, 2]))

> names(ploidy.colon.TCGA) <- unique(seg.mat.colon.GAP[, 1])

> gii.weighted.colon.TCGA <- sapply(unique(seg.mat.colon.GAP[, 1])

+ , gii.weight.chrom, seg = seg.mat.colon.GAP

15

+ , ploidy = ploidy.colon.TCGA)

> tmp.final.colon <- mat.final.colon

> tmp.final.colon[, 4:ncol(tmp.final.colon)] <-

+ sweep(tmp.final.colon[, 4:ncol(tmp.final.colon)]

+ , MARGIN = 2, FUN = "-", ploidy.colon.TCGA)

>

>

>

>

2 Data Analysis

2.1 Cell Line Analysis

2.1.1 Identify Genes Located in CIN+ Loss Regions

> perms <- 1e4

> loss.values <- sapply(names(ploidy.colon.all.CL)

+ ,fun.find.prob.values

+ , seg = seg.colon.CL

+ , ploidy = ploidy.colon.all.CL

+ , gain = FALSE)

> tmp.perm.loss <- fun.sample(loss.values, perms)

[1] 1000

[1] 2000

[1] 3000

[1] 4000

[1] 5000

[1] 6000

[1] 7000

[1] 8000

[1] 9000

[1] 10000

> d.perms.loss <- d.stat(tmp.perm.loss, cl = class.CL, s0 = 0.5)$d

> tmp <- mat.final.colon.CL[, 4:ncol(mat.final.colon.CL)]

> tmp <- sweep(tmp, MARGIN = 2, FUN = "-", ploidy.colon.all.CL)

> tmp.loss <- tmp

> rm(tmp)

> tmp.loss[tmp.loss > 0] <- 0

> index.loss <- apply(tmp.loss, 1, fun.find.zeros)

> d.loss <- d.stat(tmp.loss, cl = class.CL, s0 = 0.5)$d

> d.loss[is.na(d.loss)] <- 0

> d.loss.tmp <- d.loss[index.loss]

> un.loss <- unique(d.loss.tmp)

16



RESEARCH

+ , ploidy = ploidy.colon.TCGA)

> tmp.final.colon <- mat.final.colon

> tmp.final.colon[, 4:ncol(tmp.final.colon)] <-

+ sweep(tmp.final.colon[, 4:ncol(tmp.final.colon)]

+ , MARGIN = 2, FUN = "-", ploidy.colon.TCGA)

>

>

>

>

2 Data Analysis

2.1 Cell Line Analysis

2.1.1 Identify Genes Located in CIN+ Loss Regions

> perms <- 1e4

> loss.values <- sapply(names(ploidy.colon.all.CL)

+ ,fun.find.prob.values

+ , seg = seg.colon.CL

+ , ploidy = ploidy.colon.all.CL

+ , gain = FALSE)

> tmp.perm.loss <- fun.sample(loss.values, perms)

[1] 1000

[1] 2000

[1] 3000

[1] 4000

[1] 5000

[1] 6000

[1] 7000

[1] 8000

[1] 9000

[1] 10000

> d.perms.loss <- d.stat(tmp.perm.loss, cl = class.CL, s0 = 0.5)$d



> tmp.loss <- tmp

> rm(tmp)

> tmp.loss[tmp.loss > 0] <- 0

> index.loss <- apply(tmp.loss, 1, fun.find.zeros)

> d.loss <- d.stat(tmp.loss, cl = class.CL, s0 = 0.5)$d

> d.loss[is.na(d.loss)] <- 0

> d.loss.tmp <- d.loss[index.loss]

> un.loss <- unique(d.loss.tmp)

16

> p.loss <- sapply(un.loss

+ , function(x) length(d.perms.loss[d.perms.loss <= x]))

> p.loss <- p.loss/length(d.perms.loss)

> p.loss.final <- rep(NA, length(d.loss.tmp))

> for(i in 1:length(un.loss))

+ p.loss.final[d.loss.tmp == un.loss[i]] <- p.loss[i]

> q.loss <- qvalue(p.loss.final)$qvalue

> nam.loss <- as.numeric(rownames(tmp.loss[index.loss, ]))

> stat.loss.CL <- cbind(rep(0

+ , nrow(mat.final.colon.CL))

+ , rep(1, nrow(mat.final.colon.CL)))

> stat.loss.CL[nam.loss, 2] <- q.loss

> stat.loss.CL[nam.loss, 1] <- d.loss.tmp

> tmp.dip <- mat.final.colon.CL[, 4:ncol(mat.final.colon.CL)]

> num.dip.loss.cin <- apply(tmp.dip[, class.CL == 1], 1

+ , function(x) length(which(x < 2)))

> num.dip.loss.stab <- apply(tmp.dip[, class.CL == 0], 1


> num.loss.cin <- apply(tmp.loss[, class.CL == 1], 1


> sig.loss <- mat.final.colon.CL[

+ which(num.dip.loss.cin/ncol(tmp.dip[, class.CL == 1]) >= 0.3

+ & stat.loss.CL[, 2] < 0.25

+ & num.dip.loss.stab <= 1

+ & num.loss.cin >= 10), 1:3]

> sig.loss.update <- fun.update.sig.regions(sig.loss)

> genes.loss.CL <- fun.find.siggenes(sig.loss.update, gene.annotation.HG18)

> sig.loss.all <- mat.final.colon.CL[which(stat.loss.CL[, 2] < 0.25

+ & num.loss.cin >= 10), 1:3]

> sig.loss.update.all <- fun.update.sig.regions(sig.loss.all)

> genes.loss.CL.all <- fun.find.siggenes(sig.loss.update.all

+ , gene.annotation.HG18)

> print("Number Lost Genes (Loss vs Diploid Filter Not Included)")

[1] "Number Lost Genes (Loss vs Diploid Filter Not Included)"

> print(nrow(genes.loss.CL.all))

[1] 2383

2.1.2 Estimate Structural/Numerical Complexity

> struct.complex.CL <- sapply(unique(seg.colon.CL[, 1]), fun.struct.complex, seg.colon.CL)

> struct.score.CL <- colSums(struct.complex.CL, na.rm = TRUE)

> num.complex.CL <- sapply(unique(seg.colon.CL[, 1]), fun.num.complex, seg.colon.CL)

> num.score.CL <- colSums(abs(num.complex.CL), na.rm = TRUE)

> load("C:/Home/data_Sweave/TCGA_Tumours/struct.complex.TCGA.RData")

17



> p.loss <- sapply(un.loss

+ , function(x) length(d.perms.loss[d.perms.loss <= x]))

> p.loss <- p.loss/length(d.perms.loss)

> p.loss.final <- rep(NA, length(d.loss.tmp))

> for(i in 1:length(un.loss))

+ p.loss.final[d.loss.tmp == un.loss[i]] <- p.loss[i]

> q.loss <- qvalue(p.loss.final)$qvalue

> nam.loss <- as.numeric(rownames(tmp.loss[index.loss, ]))

> stat.loss.CL <- cbind(rep(0

+ , nrow(mat.final.colon.CL))

+ , rep(1, nrow(mat.final.colon.CL)))

> stat.loss.CL[nam.loss, 2] <- q.loss

> stat.loss.CL[nam.loss, 1] <- d.loss.tmp

> tmp.dip <- mat.final.colon.CL[, 4:ncol(mat.final.colon.CL)]

> num.dip.loss.cin <- apply(tmp.dip[, class.CL == 1], 1


> num.dip.loss.stab <- apply(tmp.dip[, class.CL == 0], 1


> num.loss.cin <- apply(tmp.loss[, class.CL == 1], 1


> sig.loss <- mat.final.colon.CL[

+ which(num.dip.loss.cin/ncol(tmp.dip[, class.CL == 1]) >= 0.3

+ & stat.loss.CL[, 2] < 0.25

+ & num.dip.loss.stab <= 1

+ & num.loss.cin >= 10), 1:3]


> genes.loss.CL <- fun.find.siggenes(sig.loss.update, gene.annotation.HG18)

> sig.loss.all <- mat.final.colon.CL[which(stat.loss.CL[, 2] < 0.25

+ & num.loss.cin >= 10), 1:3]

> sig.loss.update.all <- fun.update.sig.regions(sig.loss.all)

> genes.loss.CL.all <- fun.find.siggenes(sig.loss.update.all

+ , gene.annotation.HG18)

> print("Number Lost Genes (Loss vs Diploid Filter Not Included)")

[1] "Number Lost Genes (Loss vs Diploid Filter Not Included)"

> print(nrow(genes.loss.CL.all))

[1] 2383

2.1.2 Estimate Structural/Numerical Complexity

> struct.complex.CL <- sapply(unique(seg.colon.CL[, 1]), fun.struct.complex, seg.colon.CL)

> struct.score.CL <- colSums(struct.complex.CL, na.rm = TRUE)

> num.complex.CL <- sapply(unique(seg.colon.CL[, 1]), fun.num.complex, seg.colon.CL)

> num.score.CL <- colSums(abs(num.complex.CL), na.rm = TRUE)

> load("C:/Home/data_Sweave/TCGA_Tumours/struct.complex.TCGA.RData")

17

> load("C:/Home/data_Sweave/TCGA_Tumours/num.complex.TCGA.RData")

> #struct.complex.TCGA <- sapply(unique(seg.mat.colon.GAP[, 1])

> # , fun.struct.complex, seg.mat.colon.GAP)

> struct.score.TCGA <- colSums(struct.complex.TCGA, na.rm = TRUE)

> #num.complex.TCGA <- sapply(unique(seg.mat.colon.GAP[, 1])

> # , fun.num.complex, seg.mat.colon.GAP)

> num.score.TCGA <- colSums(abs(num.complex.TCGA), na.rm = TRUE)

>

>

2.2 BAC Array Analysis

> tmp.loss <- mat.final.BAC[, 4:ncol(mat.final.BAC)]

> tmp.loss[tmp.loss >= -0.1] <- 0

> num.loss <- abs(rowSums(sign(tmp.loss), na.rm = TRUE))

> q <- 10^(-as.numeric(gistic.res[, "X.log10.q.value."]))

> g.loss <- max(as.numeric(gistic.res[gistic.res[, 1] == "Del" & q < 0.25, "G.score"]))

> rownames(mat.final.gistic) <- 1:nrow(mat.final.gistic)

> sig.loss <-

+ mat.final.gistic[which(mat.final.gistic[, "Del"] < g.loss & num.loss >= 13), 1:3]


> genes.loss.BAC <- fun.find.siggenes(sig.loss.update, gene.annotation.HG18)

> print("Number Significant Genes BAC:")

[1] "Number Significant Genes BAC:"

> print(nrow(genes.loss.BAC))

[1] 3027

18



RESEARCH

> load("C:/Home/data_Sweave/TCGA_Tumours/num.complex.TCGA.RData")

> #struct.complex.TCGA <- sapply(unique(seg.mat.colon.GAP[, 1])

> # , fun.struct.complex, seg.mat.colon.GAP)

> struct.score.TCGA <- colSums(struct.complex.TCGA, na.rm = TRUE)

> #num.complex.TCGA <- sapply(unique(seg.mat.colon.GAP[, 1])

> # , fun.num.complex, seg.mat.colon.GAP)

> num.score.TCGA <- colSums(abs(num.complex.TCGA), na.rm = TRUE)

>

>

2.2 BAC Array Analysis

> tmp.loss <- mat.final.BAC[, 4:ncol(mat.final.BAC)]

> tmp.loss[tmp.loss >= -0.1] <- 0

> num.loss <- abs(rowSums(sign(tmp.loss), na.rm = TRUE))

> q <- 10^(-as.numeric(gistic.res[, "X.log10.q.value."]))

> g.loss <- max(as.numeric(gistic.res[gistic.res[, 1] == "Del" & q < 0.25, "G.score"]))

> rownames(mat.final.gistic) <- 1:nrow(mat.final.gistic)

> sig.loss <-

+ mat.final.gistic[which(mat.final.gistic[, "Del"] < g.loss & num.loss >= 13), 1:3]


> genes.loss.BAC <- fun.find.siggenes(sig.loss.update, gene.annotation.HG18)

> print("Number Significant Genes BAC:")

[1] "Number Significant Genes BAC:"

> print(nrow(genes.loss.BAC))

[1] 3027

18

3 Figures and Tables

3.1 Produce Figures

Figure 2a

> function.plot.smooth(mat.final.gistic[, 1:3]

+ , cent = centromere

+ , scores.neg = mat.final.gistic[, 5]

+ , scores.pos = NA

+ , lwd = 1

+ , sigs.neg = NA

+ , sigs.pos = NA

+ ,gain.sig.pos = NA

+ ,loss.sig.pos = which(mat.final.gistic[, 5]

+ == max(as.numeric(gistic.res[gistic.res[, 1] == "Del"

+ & q < 0.25, "G.score"])))[1]

+ , gene.neg = NA

+ , gene.pos = NA

+ , smooth = FALSE

+ , ylab = "GISTIC Score", ydist = 0.1)

>

>

19



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22

GIS

TIC

Sco

re

−0.2

Q = 0.25

Figure 2b

> function.plot.smooth(mat.final.colon.CL[, 1:3]

+ , cent = centromere

+ , scores.neg = stat.loss.CL[, 1]

+ , scores.pos = NA

+ , lwd = 1

+ , sigs.neg = NA

+ , sigs.pos = NA

+ ,gain.sig.pos = NA

+ ,loss.sig.pos = which(stat.loss.CL[, 1]

+ == max(stat.loss.CL[stat.loss.CL[, 2] < 0.25, 1]

+ , na.rm = TRUE))

+ , gene.neg = apply(genes.loss.CL[genes.loss.CL[, "chromosome_name"] == 18

+ , c("chromosome_name", "start_position", "end_position")], 2, as.numeric)

+ , gene.pos = NA

+

+ , smooth = TRUE

+ , ylab = "d-Score", ydist = 0.3)

>

20



RESEARCH

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22

d−Sc

ore

−1.5

−1.2

−0.9

−0.6

−0.3

0

Q = 0.25

Figure 2e

> gene.pos <- apply(genes[, 2:ncol(genes)], 2, as.numeric)

> gene.copy <- fun.genes.copy(tmp.final.colon, gene.pos)

> rownames(gene.copy) <- genes[, 1]

> int <- intersect(colnames(gene.copy), colnames(dup.colon.agilent.MA))

> gene.copy <- gene.copy[, int]

> dup.colon.agilent.MA <- dup.colon.agilent.MA[, int]

> tmp.match <- match(colnames(gene.copy), names(gii.weighted.colon.TCGA))

> class.TCGA <- gii.weighted.colon.TCGA[tmp.match] < 0.2

> par(mfrow = c(1, 3))

> for(gene in rownames(gene.copy)){

+ gene.expr <- dup.colon.agilent.MA[

+ mat.names[which(mat.names[, 2] == gene), 1]

+ , ]

+ if(length(which(mat.names[, 2] == gene)) > 1){

+ cor.tmp <- c()

+ for(i in 1:nrow(gene.expr))

+ cor.tmp <- c(cor.tmp, cor(gene.expr[i, ]

+ , gene.copy[gene, ]

+ , method = "spearman"

21

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22

d−Sc

ore

−1.5

−1.2

−0.9

−0.6

−0.3

0

Q = 0.25

Figure 2e

> gene.pos <- apply(genes[, 2:ncol(genes)], 2, as.numeric)

> gene.copy <- fun.genes.copy(tmp.final.colon, gene.pos)

> rownames(gene.copy) <- genes[, 1]

> int <- intersect(colnames(gene.copy), colnames(dup.colon.agilent.MA))

> gene.copy <- gene.copy[, int]

> dup.colon.agilent.MA <- dup.colon.agilent.MA[, int]

> tmp.match <- match(colnames(gene.copy), names(gii.weighted.colon.TCGA))

> class.TCGA <- gii.weighted.colon.TCGA[tmp.match] < 0.2


> for(gene in rownames(gene.copy)){

+ gene.expr <- dup.colon.agilent.MA[

+ mat.names[which(mat.names[, 2] == gene), 1]

+ , ]

+ if(length(which(mat.names[, 2] == gene)) > 1){

+ cor.tmp <- c()

+ for(i in 1:nrow(gene.expr))

+ cor.tmp <- c(cor.tmp, cor(gene.expr[i, ]



21

+ , use = "pairwise.complete.obs"))

+

+ index <- which.max(cor.tmp)

+ cor.copy.expr <- round(1000*cor(gene.expr[index,

+ ], gene.copy[gene, ], method = "spearman"

+ , use = "pairwise.complete.obs"))/1000

+ p.copy.expr <- round(1000*cor.test(gene.expr[index, ]


+ , method = "spearman")$p.value)/1000

+ boxplot(gene.expr[index, ] ~ gene.copy[gene, ]

+ , ylab = "Gene Expression"

+ , xlab = "Copy Number Loss vs Ploidy"

+ , main = paste(gene.alias[genes == gene], "\n", "P < 0.001"

+ , sep = "\t") )

+ legend("topleft", paste("r = ", cor.copy.expr,sep = "")

+ , lty =1 , col = "red")

+ beeswarm(gene.expr[index, ] ~ gene.copy[gene, ], cex = 0.8

+ , add = TRUE, pch = 16

+ , pwcol = ifelse(class.TCGA, "black", "red"))

+ abline(lm(gene.expr[index, ]

+ ~ as.numeric(as.factor((gene.copy[gene, ]))))$coefficients

+ , col = "red")

+ }else{

+ cor.copy.expr <- round(1000*cor(gene.expr, gene.copy[gene, ]



+ p.copy.expr <- round(1000*cor.test(gene.expr

+ , gene.copy[gene,

+ ], method = "spearman")$p.value)/1000

+

+ boxplot(gene.expr ~ gene.copy[gene, ]



+ , main = paste(gene.alias[genes == gene]

+ , "\n", "P < 0.001"

+ , sep = "\t") )


+ , lty =1 , col = "red")

+ beeswarm(gene.expr ~ gene.copy[gene, ], cex = 0.8



+ abline(lm(gene.expr ~

+ as.numeric(as.factor((gene.copy[gene, ]))))$coefficients

+ , col = "red")

+ }

+ }

22



+ , use = "pairwise.complete.obs"))

+

+ index <- which.max(cor.tmp)

+ cor.copy.expr <- round(1000*cor(gene.expr[index,

+ ], gene.copy[gene, ], method = "spearman"


+ p.copy.expr <- round(1000*cor.test(gene.expr[index, ]



+ boxplot(gene.expr[index, ] ~ gene.copy[gene, ]



+ , main = paste(gene.alias[genes == gene], "\n", "P < 0.001"

+ , sep = "\t") )


+ , lty =1 , col = "red")

+ beeswarm(gene.expr[index, ] ~ gene.copy[gene, ], cex = 0.8



+ abline(lm(gene.expr[index, ]

+ ~ as.numeric(as.factor((gene.copy[gene, ]))))$coefficients

+ , col = "red")

+ }else{

+ cor.copy.expr <- round(1000*cor(gene.expr, gene.copy[gene, ]



+ p.copy.expr <- round(1000*cor.test(gene.expr

+ , gene.copy[gene,

+ ], method = "spearman")$p.value)/1000

+

+ boxplot(gene.expr ~ gene.copy[gene, ]



+ , main = paste(gene.alias[genes == gene]

+ , "\n", "P < 0.001"

+ , sep = "\t") )


+ , lty =1 , col = "red")

+ beeswarm(gene.expr ~ gene.copy[gene, ], cex = 0.8



+ abline(lm(gene.expr ~

+ as.numeric(as.factor((gene.copy[gene, ]))))$coefficients

+ , col = "red")

+ }

+ }

22

−3 −2 −1 0

−1.5

−1.0

−0.5

0.0

0.5

1.0

PIGN P < 0.001

Copy Number Loss vs Ploidy

Gen

e Ex

pres

sion

r = 0.561

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●●●

●●

●

●

●

●

●

●

●

●

●

●●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

−3 −2 −1 0

−3−2

−10

1

RKHD2 P < 0.001


Gen

e Ex

pres

sion

r = 0.583

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●●

●

●

●●

●

●

●●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

−3 −2 −1 0

−3.0

−2.5

−2.0

−1.5

−1.0

−0.5

0.0

0.5

ZNF516 P < 0.001


Gen

e Ex

pres

sion

r = 0.536

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●●

●

●

●

●●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●●

●

3.2 Supplementary Figures

Supplementary Figure 1a and 1b

> MIN <- c(4, 5, 6, 7, 10, 13, 18, 21, 26)

> col <- rep("black", 29)

> col[MIN] <- "red"

> CINMIN <- rep("CIN", 29)

> CINMIN[MIN] <- "MIN"

> wilcox.test(struct.score.CL/(ploidy.colon.all.CL/2) ~ CINMIN)

Wilcoxon rank sum test with continuity correction

data: struct.score.CL/(ploidy.colon.all.CL/2) by CINMIN

W = 178, p-value = 3.638e-05

alternative hypothesis: true location shift is not equal to 0

> wilcox.test(num.score.CL/(ploidy.colon.all.CL/2) ~ CINMIN)


data: num.score.CL/(ploidy.colon.all.CL/2) by CINMIN

23



RESEARCH−3 −2 −1 0

−1.5

−1.0

−0.5

0.0

0.5

1.0

PIGN P < 0.001


Gen

e Ex

pres

sion

r = 0.561

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●●●

●●

●

●

●

●

●

●

●

●

●

●●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

−3 −2 −1 0

−3−2

−10

1

RKHD2 P < 0.001


Gen

e Ex

pres

sion

r = 0.583

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●●

●

●

●●

●

●

●●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

−3 −2 −1 0

−3.0

−2.5

−2.0

−1.5

−1.0

−0.5

0.0

0.5

ZNF516 P < 0.001


Gen

e Ex

pres

sion

r = 0.536

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●●

●

●

●

●●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●●

●

3.2 Supplementary Figures

Supplementary Figure 1a and 1b

> MIN <- c(4, 5, 6, 7, 10, 13, 18, 21, 26)

> col <- rep("black", 29)

> col[MIN] <- "red"

> CINMIN <- rep("CIN", 29)

> CINMIN[MIN] <- "MIN"

> wilcox.test(struct.score.CL/(ploidy.colon.all.CL/2) ~ CINMIN)


data: struct.score.CL/(ploidy.colon.all.CL/2) by CINMIN

W = 178, p-value = 3.638e-05


> wilcox.test(num.score.CL/(ploidy.colon.all.CL/2) ~ CINMIN)


data: num.score.CL/(ploidy.colon.all.CL/2) by CINMIN

23

W = 176, p-value = 4.747e-05


>


> boxplot(struct.score.CL/(ploidy.colon.all.CL/2) ~ CINMIN

+ , ylab = "Structural Complexity", yaxt = "n", outline = F

+ , names = c("CIN(+)", "CIN(-)")

+ , ylim = c(0, max(struct.score.CL/(ploidy.colon.all.CL/2)) + 2)

+ , at = c(2, 1), main = "P < 0.0001")

> axis(2, seq(0, max(struct.score.CL/(ploidy.colon.all.CL/2)), 10)

+ , seq(0,max(struct.score.CL/(ploidy.colon.all.CL/2)), 10)

+ , las = 2)

> beeswarm(struct.score.CL/(ploidy.colon.all.CL/2) ~ CINMIN

+ , add = TRUE, pch = 16, pwcol = col, at = c(2, 1))

> boxplot(num.score.CL/(ploidy.colon.all.CL/2) ~ CINMIN

+ , ylab = "Numerical Complexity", yaxt = "n", outline = F

+ , names = c("CIN(+)", "CIN(-)")

+ , ylim = c(0, max(num.score.CL/(ploidy.colon.all.CL/2)) + 2)

+ , at = c(2, 1), main = "P < 0.0001")

> axis(2, seq(0, max(num.score.CL/(ploidy.colon.all.CL/2)), 2)

+ , seq(0,max(num.score.CL/(ploidy.colon.all.CL/2)), 2

+ ), las = 2)

> beeswarm(num.score.CL/(ploidy.colon.all.CL/2) ~ CINMIN

+ , add = TRUE, pch = 16, pwcol = col, at = c(2, 1))

24



CIN(−) CIN(+)

P < 0.0001

Stru

ctur

al C

ompl

exity

0

10

20

30

40

50

60

70

80

90

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●●●●●●

●

●

●

CIN(−) CIN(+)

P < 0.0001

Num

eric

al C

ompl

exity

0

2

4

6

8

●

●

●●

●

●

●

●●

●

●

●

●

●

●

●

● ●

●

●

●●● ●

●

●

●

●

●

Supplementary Figure 1c

> struct.score.norm.TCGA <- struct.score.TCGA/(ploidy.colon.TCGA/2)

> num.score.norm.TCGA <- num.score.TCGA/(ploidy.colon.TCGA/2)

> fit.norm<- sreg(round(num.score.norm.TCGA),round(struct.score.norm.TCGA))

> print(cor(num.score.norm.TCGA, struct.score.norm.TCGA, method = "spearman"))

[1] 0.4554851

> print(cor.test(num.score.norm.TCGA, struct.score.norm.TCGA, method = "spearman"))

Spearman's rank correlation rho

data: num.score.norm.TCGA and struct.score.norm.TCGA

S = 2600268, p-value < 2.2e-16

alternative hypothesis: true rho is not equal to 0

sample estimates:

rho

0.4554851

>

25

> boxplot(round(struct.score.norm.TCGA) ~ round(num.score.norm.TCGA)

+ , outline = F

+ , ylab = "Structural Complexity Score"

+ , xlab = "Numerical Complexity Score")

> beeswarm(round(struct.score.norm.TCGA) ~ round(num.score.norm.TCGA)

+ , add = TRUE, pch = 16, cex = 0.6)

> points(fit.norm$predicted$x + 1, fit.norm$predicted$y

+ , col = "red", type = "l")

>

26



RESEARCH

0 2 4 6 8 10 12 14 20 26

020

4060

8010

0

Numerical Complexity Score

Stru

ctur

al C

ompl

exity

Sco

re

●●● ●●

●

●● ●●

●

●●

●● ●●

●

● ●●

● ●

●

●

●

●●●

●

●

●

●●

● ●●

●

●

●

●● ●●●

●

●

●

●

●●

●

●

●●

●●

●

●

●

●

●

●

●

●●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●●

●

●

●●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●●●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

Supplementary Figure 6a

> mut.match <- match(colnames(mat.mut.colon.TCGA), names(gii.weighted.colon.TCGA))

> class.mut.TCGA <- gii.weighted.colon.TCGA[mut.match] > 0.2

> p.fisher.mut.TCGA <- apply(mat.mut.colon.TCGA, 1, fun.fisher.mutations

+ , cin.class = ifelse(class.mut.TCGA, "CIN(+)", "CIN(-)"))

> p.adj.mut.TCGA <- mt.rawp2adjp(p.fisher.mut.TCGA, proc = "BH")

> p.adj.mut.TCGA <- p.adj.mut.TCGA$adjp[order(p.adj.mut.TCGA$index), ]

> rownames(p.adj.mut.TCGA) <- names(p.fisher.mut.TCGA)

> tt1 <- table(mat.mut.colon.TCGA["TP53", ]

+ , ifelse(class.mut.TCGA

+ , "CIN(+)", "CIN(-)"))

> tt2 <- table(mat.mut.colon.TCGA["APC", ]


+ , "CIN(+)", "CIN(-)"))

> tt3 <- table(mat.mut.colon.TCGA["KRAS", ]


+ , "CIN(+)", "CIN(-)"))

> tt4 <- table(mat.mut.colon.TCGA["SMAD4", ]


+ , "CIN(+)", "CIN(-)"))

27

>

> b <- barplot(cbind(tt1, tt2, tt3, tt4), col = c("black", "red")

+ , ylab = "Number Samples", ylim = c(0, 100)

+ , yaxt = "n", xlab = NA, las = 2

+ , space = c(0.1, 0.1, 1, 0.1, 1, 0.1, 1, 0.1)

+ , names = rep(c("CIN-", "CIN+"), 4))

> legend("top", c("Mutant", "WT"), col = c("red", "black")

+ , pch = 15, bty = "n", horiz = TRUE)

> axis(2, seq(0, 80, 10), seq(0, 80, 10), las = 2)

> points(c(b[1] - 0.5, b[2] + 0.5), c(max(colSums(tt1)) + 1

+ , max(colSums(tt1)) + 1)

+ , type = "l", lwd = 2)

> text((b[1] + b[2])/2, max(colSums(tt1)) + 7.5

+ , paste("TP53\n", "P = "

+ , signif(p.fisher.mut.TCGA["TP53"], 2), "\nadj. P = "

+ ,signif(p.adj.mut.TCGA["TP53", 2], 2) ,sep = ""))



+ , type = "l", lwd = 2)


+ , paste("APC\n", "P = "

+ , signif(p.fisher.mut.TCGA["APC"], 2), "\nadj. P = "

+ ,signif(p.adj.mut.TCGA["APC", 2], 2), sep = ""))



+ , type = "l", lwd = 2)


+ , paste("KRAS\n", "P = "

+ , signif(p.fisher.mut.TCGA["KRAS"], 2), "\nadj. P = "

+ ,signif(p.adj.mut.TCGA["KRAS", 2], 2), sep = ""))



+ , type = "l", lwd = 2)


+ , paste("SMAD4\n", "P = "

+ , signif(p.fisher.mut.TCGA["SMAD4"], 2), "\nadj. P = "

+ ,signif(p.adj.mut.TCGA["SMAD4", 2], 2), sep = ""))

>

28



>




+ , space = c(0.1, 0.1, 1, 0.1, 1, 0.1, 1, 0.1)




> axis(2, seq(0, 80, 10), seq(0, 80, 10), las = 2)



+ , type = "l", lwd = 2)


+ , paste("TP53\n", "P = "

+ , signif(p.fisher.mut.TCGA["TP53"], 2), "\nadj. P = "

+ ,signif(p.adj.mut.TCGA["TP53", 2], 2) ,sep = ""))



+ , type = "l", lwd = 2)



+ , signif(p.fisher.mut.TCGA["APC"], 2), "\nadj. P = "

+ ,signif(p.adj.mut.TCGA["APC", 2], 2), sep = ""))



+ , type = "l", lwd = 2)



+ , signif(p.fisher.mut.TCGA["KRAS"], 2), "\nadj. P = "

+ ,signif(p.adj.mut.TCGA["KRAS", 2], 2), sep = ""))



+ , type = "l", lwd = 2)



+ , signif(p.fisher.mut.TCGA["SMAD4"], 2), "\nadj. P = "

+ ,signif(p.adj.mut.TCGA["SMAD4", 2], 2), sep = ""))

>

28

CIN

−

CIN

+

CIN

−

CIN

+

CIN

−

CIN

+

CIN

−

CIN

+

Num

ber S

ampl

es

Mutant WT

0

10

20

30

40

50

60

70

80TP53

P = 5e−06adj. P = 0.0051

APCP = 0.041adj. P = 1


SMAD4P = 0.74adj. P = 1

Supplementary Figure 6b

> MIN <- c(4, 5, 6, 7, 10, 13, 18, 21, 26)

> cin.class <- rep("CIN(+)", 29)

> cin.class[MIN] <- "CIN(-)"

> p.fisher.mut.CL <- apply(mat.mut.colon.CL, 1, fun.fisher.mutations

+ , cin.class = cin.class)

> p.adj.mut.CL <- mt.rawp2adjp(p.fisher.mut.CL, proc = "BH")

> p.adj.mut.CL <- p.adj.mut.CL$adjp[order(p.adj.mut.CL$index), ]

> rownames(p.adj.mut.CL) <- names(p.fisher.mut.CL)

> tt1 <- table(mat.mut.colon.CL["TP53", ], cin.class)

> tt2 <- table(mat.mut.colon.CL["APC", ], cin.class)

> tt3 <- table(mat.mut.colon.CL["KRAS", ], cin.class)

> tt4 <- table(mat.mut.colon.CL["SMAD4", ], cin.class)

>

>




+ , space = c(0.1, 0.1, 1, 0.1, 1, 0.1, 1, 0.1)

29



RESEARCH

CIN

−

CIN

+

CIN

−

CIN

+

CIN

−

CIN

+

CIN

−

CIN

+

Num

ber S

ampl

es

Mutant WT

0

10

20

30

40

50

60

70

80TP53

P = 5e−06adj. P = 0.0051

APCP = 0.041adj. P = 1


SMAD4P = 0.74adj. P = 1

Supplementary Figure 6b

> MIN <- c(4, 5, 6, 7, 10, 13, 18, 21, 26)

> cin.class <- rep("CIN(+)", 29)

> cin.class[MIN] <- "CIN(-)"

> p.fisher.mut.CL <- apply(mat.mut.colon.CL, 1, fun.fisher.mutations

+ , cin.class = cin.class)

> p.adj.mut.CL <- mt.rawp2adjp(p.fisher.mut.CL, proc = "BH")

> p.adj.mut.CL <- p.adj.mut.CL$adjp[order(p.adj.mut.CL$index), ]

> rownames(p.adj.mut.CL) <- names(p.fisher.mut.CL)

> tt1 <- table(mat.mut.colon.CL["TP53", ], cin.class)

> tt2 <- table(mat.mut.colon.CL["APC", ], cin.class)

> tt3 <- table(mat.mut.colon.CL["KRAS", ], cin.class)

> tt4 <- table(mat.mut.colon.CL["SMAD4", ], cin.class)

>

>




+ , space = c(0.1, 0.1, 1, 0.1, 1, 0.1, 1, 0.1)

29




> axis(2, seq(0, 25, 5), seq(0, 25, 5), las = 2)

> points(c(b[1] - 0.5, b[2] + 0.5), c(max(colSums(tt1)) + .5

+ , max(colSums(tt1)) + .5)

+ , type = "l", lwd = 2)


+ , paste("TP53\n", "P = "

+ , signif(p.fisher.mut.CL["TP53"], 2), "\nadj. P = "

+ ,signif(p.adj.mut.CL["TP53", 2], 2) ,sep = ""))



+ , type = "l", lwd = 2)



+ , signif(p.fisher.mut.CL["APC"], 2), "\nadj. P = "

+ ,signif(p.adj.mut.CL["APC", 2], 2), sep = ""))



+ , type = "l", lwd = 2)



+ , signif(p.fisher.mut.CL["KRAS"], 2), "\nadj. P = "

+ ,signif(p.adj.mut.CL["KRAS", 2], 2), sep = ""))



+ , type = "l", lwd = 2)



+ , signif(p.fisher.mut.CL["SMAD4"], 2), "\nadj. P = "

+ ,signif(p.adj.mut.CL["SMAD4", 2], 2), sep = ""))

>

30



CIN

−

CIN

+

CIN

−

CIN

+

CIN

−

CIN

+

CIN

−

CIN

+

Num

ber S

ampl

es

Mutant WT

0

5

10

15

20

25TP53

P = 0.022adj. P = 0.84

APCP = 0.037

adj. P = 0.84


SMAD4P = 0.05

adj. P = 0.84

Supplementary Figure 8a

> xx2 <- as.list(hthgu133aALIAS2PROBE)

> affy.ids <- unlist(xx2[genes])



> mat.final.colon.CL <- cbind(mat.final.colon.CL[, 1:3], tmp)

> gene.pos <- gene.pos <- apply(genes[, 2:ncol(genes)], 2, as.numeric)

> gene.copy <- fun.genes.copy(mat.final.colon, gene.pos)

> gene.copy.CL <- fun.genes.copy(mat.final.colon.CL, gene.pos)

> rownames(gene.copy.CL) <- genes[, 1]

> gene.copy.CL[gene.copy.CL > 0] <- 0

> MIN <- c(4, 5, 6, 7, 10, 13, 18, 21, 26)

> class.cl <- rep("CIN", 29)

> class.cl[MIN] <- "MIN"


> for(gene in genes[, 1]){

+ gene.expr.CL <- expr.CL.rma[affy.ids[gene], -c(5, 21)]

+ cor.copy.expr.CL <- round(1000*cor(gene.expr.CL

+ , gene.copy.CL[gene, -c(5, 21)], method = "spearman"


31

+ p.copy.expr.CL <- round(1000*cor.test(gene.expr.CL

+ , gene.copy.CL[gene, -c(5, 21)]


+ boxplot(gene.expr.CL ~ gene.copy.CL[gene, -c(5, 21)]

+ , ylab = "Gene Expression", xlab = "Copy Number vs Median Ploidy"

+ , main = paste(gene.alias[genes[, 1] == gene], "\n", "P = "

+ , p.copy.expr.CL ,sep = "\t") , ylim = c(min(gene.expr.CL) - .5

+ , max(gene.expr.CL) + .5))

+ legend("topleft", paste("r = ", cor.copy.expr.CL,sep = "")

+ , lty =1 , col = "red")

+ beeswarm(gene.expr.CL ~ gene.copy.CL[gene, -c(5, 21)]


+ , pwcol = ifelse(class.cl[-c(5, 21)] == "CIN"

+ , "red", "black"))

+ abline(lm(gene.expr.CL ~

+ as.numeric(as.factor((gene.copy.CL[gene, -c(5, 21)]))))$coefficients

+ , col = "red")

+ }

>

>

>

32



RESEARCH

+ p.copy.expr.CL <- round(1000*cor.test(gene.expr.CL

+ , gene.copy.CL[gene, -c(5, 21)]


+ boxplot(gene.expr.CL ~ gene.copy.CL[gene, -c(5, 21)]

+ , ylab = "Gene Expression", xlab = "Copy Number vs Median Ploidy"

+ , main = paste(gene.alias[genes[, 1] == gene], "\n", "P = "

+ , p.copy.expr.CL ,sep = "\t") , ylim = c(min(gene.expr.CL) - .5

+ , max(gene.expr.CL) + .5))

+ legend("topleft", paste("r = ", cor.copy.expr.CL,sep = "")

+ , lty =1 , col = "red")

+ beeswarm(gene.expr.CL ~ gene.copy.CL[gene, -c(5, 21)]


+ , pwcol = ifelse(class.cl[-c(5, 21)] == "CIN"

+ , "red", "black"))

+ abline(lm(gene.expr.CL ~

+ as.numeric(as.factor((gene.copy.CL[gene, -c(5, 21)]))))$coefficients

+ , col = "red")

+ }

>

>

>

32

−6 −4 −2

45

67

8

PIGN P = 0.13

Copy Number vs Median Ploidy

Gen

e Ex

pres

sion

r = 0.299

●

●

●

●●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●●

●

−6 −4 −2

45

67

89

RKHD2 P = 0.005


Gen

e Ex

pres

sion

r = 0.526

●

●

●●

●

●

●

●

●

●

●

●

●

●

●

● ●

●●

●

●

●●

●

●

●

●

−6 −4 −2

4.0

4.5

5.0

5.5

6.0

ZNF516 P = 0.01


Gen

e Ex

pres

sion

r = 0.488

●

●

●

●

●●

●

●●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●

●●

●

33

supplementary information - nature...supplementary information 4 | research 8 o x o-gu a nin e l eve...

Documents