Flow Data Analysis Challenges: Deck from Amgen Attendees (Bioinformatics/Computational Biology, Biostatistics, Molecular Sciences). John Gosink, Cheng Su, Katie Newhall, Hugh Rand, Bill Rees, Mark Dalphin, Gary Means. Wednesday, September 20, 2006


Page 1

Flow Data Analysis Challenges

Deck from Amgen Attendees

Bioinformatics/Computational Biology: John Gosink, Hugh Rand, Mark Dalphin

Biostatistics: Cheng Su

Molecular Sciences: Katie Newhall, Bill Rees, Gary Means

Wednesday, September 20, 2006

Page 2

For Internal Use Only. Amgen Confidential. 2

Sample and meta-data tracking can be complicated

[Diagram: blood samples are split across stimulation/inhibition combinations (misc. cytokines, misc. drugs); each combination covers multiple cell types and yields FCS files of cell events.]

Excerpt of cell events from an FCS file:

FSC-H  SSC-H  FL1-H  FL2-H  FL3-H  FL1-A  FL4-H  Time
   44     25     65     63      0      0     53     0
  196    143     90    110      0      0     74     1
  211    129     98     97      3      0     48     1
   72     74    109     25      0      2     20     2
   87     22    153     72      0     13     58     2
  173    144     94    139      0      0     72     2

Typical scale at each level:

blood samples: 1,000 – 10,000
stimulations/sample: 10 – 100
cell types/mix: 5 – 10
cell events/cell type: 10,000 – 100,000
channels/cell: 5 – 20

5,000 samples x 50 stims/sample x 7 cell-types/cocktail x 5 Mbytes/FCS file ≈ 10 terabytes
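The storage figure is plain multiplication. A minimal sketch of the arithmetic, assuming decimal units (1 TB = 10^6 MB) and one FCS file per sample, stimulation, and cell type, as the slide's product implies:

```python
# Back-of-the-envelope storage estimate using the figures quoted on the slide.
samples = 5_000            # blood samples
stims_per_sample = 50      # stimulation conditions per sample
cell_types = 7             # cell types per staining cocktail
mb_per_fcs = 5             # approximate size of one FCS file, in megabytes

n_files = samples * stims_per_sample * cell_types
total_tb = n_files * mb_per_fcs / 1_000_000   # assumes 1 TB = 10**6 MB

print(f"{n_files:,} FCS files, about {total_tb:.2f} TB")
```

This gives 1,750,000 files and 8.75 TB, which is the same order of magnitude as the slide's rounded 10 terabytes.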

An FCS file is approx. the size of an Affymetrix microarray .CEL file

Need a relational database and associated code infrastructure

John Gosink, Bioinformatics/Computational Biology, Amgen

Page 3

Some meta-data that we need to capture, store, and index (let alone the actual FCS files/data)

• Sample meta-data
  – Sample ID
  – Sample to well mapping
  – Stimulation conditions
  – Dilutions

• Reagent meta-data
  – Reagent batches
  – Labeling scheme

• Machine meta-data (FCS format currently captures most)
  – measurement windows
  – PMT settings
  – compensation (and matrix)
  – transformation

• Gating parameters
  – coordinates
  – thresholds
  – gate hierarchy
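One way the meta-data categories listed here could map onto relational tables is sketched below with Python's built-in sqlite3 module. Every table name, column name, and sample value is a hypothetical choice made for illustration; the deck does not specify a schema.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions,
# not taken from the deck.
SCHEMA = """
CREATE TABLE sample (
    sample_id   TEXT PRIMARY KEY,
    well        TEXT NOT NULL,       -- sample-to-well mapping
    stimulation TEXT,                -- stimulation condition
    dilution    REAL
);
CREATE TABLE reagent (
    reagent_id  INTEGER PRIMARY KEY,
    batch       TEXT NOT NULL,
    label       TEXT NOT NULL        -- fluorochrome in the labeling scheme
);
CREATE TABLE fcs_file (
    fcs_id      INTEGER PRIMARY KEY,
    sample_id   TEXT NOT NULL REFERENCES sample(sample_id),
    path        TEXT NOT NULL UNIQUE -- the FCS file itself stays on disk
);
CREATE TABLE gate (
    gate_id     INTEGER PRIMARY KEY,
    fcs_id      INTEGER NOT NULL REFERENCES fcs_file(fcs_id),
    parent_id   INTEGER REFERENCES gate(gate_id),  -- gate hierarchy
    name        TEXT NOT NULL,
    coordinates TEXT                 -- e.g. serialized polygon vertices
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Made-up example rows.
conn.execute("INSERT INTO sample VALUES ('S001', 'C01', 'IL-2', 1.0)")
conn.execute(
    "INSERT INTO fcs_file (sample_id, path) "
    "VALUES ('S001', 'Specimen_001_C1_C01.fcs')"
)
conn.execute("INSERT INTO gate (fcs_id, parent_id, name) VALUES (1, NULL, 'Lymphocyte')")
conn.execute("INSERT INTO gate (fcs_id, parent_id, name) VALUES (1, 1, 'CD3+')")

# Walk from each gate back to the sample it came from.
rows = conn.execute(
    """SELECT s.sample_id, s.well, g.name, p.name
       FROM gate g
       JOIN fcs_file f  ON f.fcs_id = g.fcs_id
       JOIN sample s    ON s.sample_id = f.sample_id
       LEFT JOIN gate p ON p.gate_id = g.parent_id
       ORDER BY g.gate_id"""
).fetchall()
print(rows)
```

Storing the gate hierarchy as a self-referencing parent_id keeps each gate linked to its file and sample, so gated results can always be traced back to the experimental conditions.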

John Gosink, Bioinformatics/Computational Biology, Amgen

Page 4

More interesting questions involve natural cell populations and their variation

• Catalog of all cell types
  - What are their distributions in all of flow parameter space
  - How to standardize between samples and runs

• What are fruitful approaches to characterizing these distributions
  - Baseline categorization
    - Number of “typical” cell volumes (archetypes)
    - Location of archetypes
    - Shapes of archetypes
    - Relationships of cell counts in the archetypes
  - Characterization of the “void”
    - How empty is the void
    - How smooth is the void

• Detection of novel (sub)populations and unforeseen changes

John Gosink, Bioinformatics/Computational Biology, Amgen

Page 5

Separation of Overlapping Peaks

Question: How do we best quantitate multiple overlapping peaks?

One Approach: Fit peaks as a sum of a small number of basis set functions.

Issues: Basis set choice, sensitivity, accuracy, …
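A minimal sketch of the basis-set idea, under a strong simplifying assumption: the basis functions are Gaussians whose centers and widths are already known, so only the amplitudes are fitted and the problem is linear least squares. The function names, the Gaussian choice, and the synthetic histogram are illustrative assumptions, not the method used by the slide's author.

```python
import math

def gaussian(x, center, width):
    """Unit-height Gaussian basis function."""
    return math.exp(-0.5 * ((x - center) / width) ** 2)

def fit_amplitudes(xs, ys, basis):
    """Least-squares amplitudes for y ~ sum_k a_k * basis_k(x).

    `basis` is a list of (center, width) pairs; with the shapes fixed,
    the fit is linear, so the normal equations are solved directly.
    """
    n = len(basis)
    design = [[gaussian(x, c, w) for (c, w) in basis] for x in xs]
    # Normal equations (A^T A) a = A^T y, stored as an augmented matrix.
    m = [[sum(row[i] * row[j] for row in design) for j in range(n)]
         + [sum(row[i] * y for row, y in zip(design, ys))]
         for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    # Back substitution.
    amps = [0.0] * n
    for i in reversed(range(n)):
        rest = sum(m[i][k] * amps[k] for k in range(i + 1, n))
        amps[i] = (m[i][n] - rest) / m[i][i]
    return amps

# Synthetic, noise-free histogram with two overlapping peaks (made-up values).
basis = [(40.0, 8.0), (55.0, 8.0)]
xs = [float(x) for x in range(0, 100)]
ys = [300.0 * gaussian(x, *basis[0]) + 120.0 * gaussian(x, *basis[1]) for x in xs]

amps = fit_amplitudes(xs, ys, basis)
print([round(a, 3) for a in amps])
```

On real histograms the centers and widths are unknown as well, which turns this into a nonlinear fit and brings in the basis-choice, sensitivity, and accuracy issues the slide lists.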

Hugh Rand, Bioinformatics/Computational Biology, Amgen

Page 6

Example histograms (panels): Noisy Overlap, Small Peaks, Shape, More Overlap

Hugh Rand, Bioinformatics/Computational Biology, Amgen

Page 7

Receptor Occupancy Assay and Analysis

[Diagram: assay schematic. Legend: unlabeled drug Ab, labeled drug Ab, labeled recpt Ab, labeled isotype ctrl Ab. Cells with specific and non-specific receptors are exposed to unlabeled Ab at doses from 0 to sat'd, stained with labeled Ab, labeled anti-recpt Ab, or labeled isotype ctrl, and read on the flow cytometer. One panel notes that the Ab induces more recpt. Two cases are shown, "No Drug in Animal" and "Some Drug in Animal", with the worked examples FracBound = 1 − 3/3 and FracBound = 1 − 2/6.]

Mark Dalphin, Bioinformatics/Computational Biology, Amgen

Page 8

Some math…

Simple form, without non-specific binding:

$$\text{Occupancy} = 1 - \frac{N_{Dd}}{N_{Rd}} = 1 - \frac{F_{Dd}/F_{D0}}{F_{Rd}/F_{R0}}$$
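A hedged sketch of the simple occupancy calculation, read as one minus the ratio of receptors that still bind the labeled drug Ab to all receptors seen by the labeled anti-receptor Ab. The function names and the fluorescence calibration step are assumptions made for illustration; the deck's exact formula is not fully recoverable from this transcript.

```python
def occupancy_from_counts(n_drug_label, n_receptor_label):
    """Fraction of receptors occupied by unlabeled drug.

    n_drug_label:     receptors still able to bind the labeled drug Ab
    n_receptor_label: all receptors, as seen by the labeled anti-receptor Ab
    """
    if n_receptor_label <= 0:
        raise ValueError("receptor count must be positive")
    return 1.0 - n_drug_label / n_receptor_label


def occupancy_from_fluorescence(f_drug_d, f_rec_d, f_drug_0, f_rec_0):
    """Same quantity from fluorescence readings.

    Assumption (not confirmed by the deck): with no drug on board, both
    labeled antibodies see the same number of receptors, so the dose-0 ratio
    f_drug_0 / f_rec_0 calibrates the relative brightness of the two labels.
    """
    return 1.0 - (f_drug_d / f_rec_d) / (f_drug_0 / f_rec_0)


# Illustrative numbers only.
no_drug = occupancy_from_counts(3, 3)       # every receptor is still free
some_drug = occupancy_from_counts(2, 6)     # 2 of 6 receptors still free
from_signal = occupancy_from_fluorescence(200.0, 1200.0, 500.0, 1000.0)

print(no_drug, round(some_drug, 3), round(from_signal, 3))
```

Measuring total receptor in the same sample matters because, as the assay cartoon notes, the antibody can induce more receptor, so the denominator cannot be taken from an undosed control alone.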

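Add non-specific binding and things are not so tidy:

[Equations garbled in this transcript. Recoverable symbols: Occupancy is written in terms of normalized signals P_{Dd}, P_{Rd}, and P_{Id}, each defined as a ratio of fluorescence values F, together with a factor f defined as a ratio of counts N.]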

Mark Dalphin, Bioinformatics/Computational Biology, Amgen

Page 9

Problems with receptor occupancy assays

Even with 1:1 conjugates, MFI varies significantly from Ab to Ab against the same receptor

“Can’t see less than 1,000 receptors per cell”

Large variability from instrument to instrument and run to run

Why doesn’t this behave like a well-controlled physical experiment; why is it “semi-quantitative”?

I’d like to see:
– Easy loading of data-sets and meta-data
– Module to compute occupancy
– Some way to look at associated binding curves

Mark Dalphin, Bioinformatics/Computational Biology, Amgen

Page 10

Gating Sensitivity

If gates change slightly, will results change?

Reasons for considering gating sensitivity:
– Quantitative analysis of the responses
– Gating is done per individual sample
– Gating is somewhat subjective, even auto-gating
– Multiple gates used
– Subgroups of small size

Cheng Su, Biostatistics, Amgen

Page 11

Gating Sensitivity Analysis

Sensitivity Analysis
– Get new gates by moving the boundary of gates
– Conduct analysis
– Compare the results

Challenges
– software/system: to import the gate boundary
– methodology: methods to automate gate movement and compare results
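The three sensitivity-analysis steps (move the boundary, re-run, compare) can be sketched as follows. This assumes a rectangular gate on two channels and a uniform outward or inward shift of all four edges; the gate shape, the shift sizes, and the synthetic events are illustrative assumptions rather than anything specified in the deck.

```python
import random

def fraction_in_gate(events, gate):
    """Fraction of (x, y) events inside a rectangular gate (x0, x1, y0, y1)."""
    x0, x1, y0, y1 = gate
    inside = sum(1 for x, y in events if x0 <= x <= x1 and y0 <= y <= y1)
    return inside / len(events)

def gate_sensitivity(events, gate, shifts):
    """Re-run the gate with every boundary moved outward by each shift.

    A negative shift shrinks the gate. Returns {shift: fraction gated}.
    """
    x0, x1, y0, y1 = gate
    return {
        s: fraction_in_gate(events, (x0 - s, x1 + s, y0 - s, y1 + s))
        for s in shifts
    }

# Synthetic two-channel events; the gate and shifts are made-up values.
rng = random.Random(0)
events = [(rng.gauss(100, 20), rng.gauss(200, 30)) for _ in range(5000)]
gate = (80.0, 120.0, 170.0, 230.0)

results = gate_sensitivity(events, gate, shifts=[-5.0, 0.0, 5.0])
baseline = results[0.0]
for shift, frac in sorted(results.items()):
    print(f"shift {shift:+.0f}: {frac:.3f} gated, change {frac - baseline:+.3f}")
```

If the downstream conclusion changes across shifts of the size a human gater would plausibly apply, the result depends on the gate placement and should be reported with that caveat.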

Cheng Su, Biostatistics, Amgen

Page 12

System Outline

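[Diagram: samples are run on the LSRII, producing FCS files and XML gating definitions; the gated populations (B Cells, T Cells, NK Cells) feed an analysis step (R, SAS, Java, …) that produces a result, with a "check against" step.]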

Cheng Su, Biostatistics, Amgen

Page 13

How to move what we do in proprietary graphical tools into a more high-throughput environment?

Question: Are there applications available that can accommodate the size of FCS files that I generate, allow me to compare data across a plate, and provide data output in an acceptable format?

Problem: Currently using a 9-color, 12-parameter antibody panel in whole blood (and it’s only getting bigger!)
– FCS file size = 10,000 to 30,000 KB
– Analysis time = 8 hours for 32 samples/wells
– Export time = 20-30 minutes for 32 FCS files
– Output = at least 7 gated files for each FCS file

Katie Newhall, Molecular Sciences, Amgen

Page 14

How to move what we do in proprietary graphical tools into a more high-throughput environment?

Potential solutions
– Analysis
  • Automated gating
  • Sample flagging
  • Comparison of samples across a plate
  • Output of histogram statistics in an Excel format
– Export time
  • Gating information and experimental metadata exported with FCS/TXT files

Katie Newhall, Molecular Sciences, Amgen

Page 15

Immunophenotyping

Experiment:
– 80 clinical whole blood samples
– no ex vivo manipulation
– 4 dose cohorts
– 38 3-color, RBC lyse/no-wash stains
– 3280 6-parameter FCS files

What populations of events change in some way as a function of drug dose or disease state or changes in other populations?

Bill Rees, Molecular Sciences, Amgen

Page 16

An immunophenotyping panel

 #   FITC     PE        PerCP   APC    gate
 1   CD4      CCR4      CD45    CCR5   L + G
 2   CD4      CCR7      CD45    CCR5   L + G
 3   CD4      CCR8      CD45    CCR5   L + G
 4   CD4      CXCR3     CD45    CCR5   L + G
 5   CD4      CD25      CD45    CCR5   L
 6   CD4      CD27      CD45    CCR5   L
 7   CD4      CD28      CD45    CCR5   L
 8   CD4      CD38      CD45    CCR5   L
 9   CD4      CD54      CD45    CCR5   L
10   CD4      KIR       CD45    CCR5   L
11   CD4      CD161     CD45    CCR5   L
12   CD4      CD212     CD45    CCR5   L
13   CD4      HLA-DR    CD45    CCR5   L
14   CD4      CCR6      CD45    CCR5   L
15   CD45RA   CD4       CD45    CCR5   L
16   CD244    CD4       CD45    CCR5   L
17   CD26     CD8       CD45    CCR5   L
18   CLA      CD8       CD45    CCR5   L
19   CD94     CD8       CD45    CCR5   L
20   CD8      CD161     CD45    CCR5   L
21   CD8      NKG2D     CD45    CCR5   L
22   CD8      4-1BB-L   CD45    CCR5   L
23   CD8      CD30      CD45    CCR5   L
24   CD8      CD70      CD45    CCR5   L
25   CD20     CD54      CD45    CCR5   L
26   CD20     CD69      CD45    CCR5   L
27   CD4      CD152     CD45    CCR5   L
28   CD45RA   CD152     CD45    CD56   L
29   CD8      CD152     CD45    CD3    L
30   IgD      CD152     CD45    CD19   L
31   IgM      CD152     CD45    CD19   L
32   CD27     CD152     CD45    CD19   L
33   CLA      CD152     CD45    CD19   L
34   CD16     CD56      CD45    CD19   L
35   CD14     CD80      CD45    CCR5   M
36   CD14     CD86      CD45    CCR5   M
37   CD14     HLA-DR    CD45    CCR5   M
38   CD16     CD11b     CD45    CCR5   G

Row groups annotated on the slide: T cells, B cells, NK cells, monocytes

Bill Rees, Molecular Sciences, Amgen

Page 17

Immunophenotyping

I will not deal with this 2 dimensions at a time
– time
– too many populations in each stain, only some of which I know to look for
– don’t know what I’m looking for with minimal biological insight

Issues:
– definitions of terms
– Metrics, e.g. MFI and %CD45+ events, % responders
– Linking raw data to other study data/protocols and to analysis product
– Autogating with visual QC
– Can the identification of the major cell types (operationally defined by robust stains, e.g. CD3+ CD8+ CD56-) be automated to incrementally reduce the analysis time?
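As a rough illustration of operationally defined cell types, the sketch below labels each event positive or negative per marker against a fixed cutoff and maps the resulting pattern to a cell type. The cutoffs, the intensities, and the pattern-to-type table are all made-up assumptions; real autogating would need data-driven thresholds and the visual QC mentioned in the list.

```python
def phenotype(event, thresholds):
    """Label one event '+' or '-' per marker, e.g. 'CD3+ CD8+ CD56-'."""
    return " ".join(
        f"{marker}{'+' if event[marker] >= cut else '-'}"
        for marker, cut in thresholds.items()
    )

# Hypothetical mapping from marker pattern to cell type (assumed for illustration).
CELL_TYPES = {
    "CD3+ CD8+ CD56-": "CD8 T cell",
    "CD3+ CD8- CD56-": "CD8-negative T cell",
    "CD3- CD8- CD56+": "NK cell",
}

def classify(events, thresholds):
    """Count events per operationally defined cell type."""
    counts = {}
    for event in events:
        label = CELL_TYPES.get(phenotype(event, thresholds), "unclassified")
        counts[label] = counts.get(label, 0) + 1
    return counts

# Made-up cutoffs and intensities on an arbitrary fluorescence scale.
thresholds = {"CD3": 1000.0, "CD8": 1000.0, "CD56": 1000.0}
events = [
    {"CD3": 5200.0, "CD8": 8100.0, "CD56": 90.0},
    {"CD3": 4800.0, "CD8": 120.0, "CD56": 60.0},
    {"CD3": 150.0, "CD8": 200.0, "CD56": 3900.0},
    {"CD3": 140.0, "CD8": 180.0, "CD56": 70.0},
]

counts = classify(events, thresholds)
print(counts)
```

Events that match no known pattern are kept as "unclassified" rather than dropped, which leaves room to inspect populations nobody thought to look for.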

Bill Rees, Molecular Sciences, Amgen

Page 18

Whole blood stimulation assays where leukocytes are evaluated for phosphoprotein pathway activation inhibition

[Figure: gating plots for Specimen_001_C1_C01.fcs. Panels: CD8/CD33 vs. Parameter 2 (gates: grans, CD33+); CD3 vs. CD56 (gates: CD3+, CD3+/CD56+, B cells, CD56+; quadrant values 70.52%, 25.82%, 0.21%, 3.45%); CD8/CD33 vs. CD4 (gates: DN, CD4+, CD8+; quadrant values 66.04%, 0.31%, 5.12%, 28.54%); CD45RO vs. CD4 (gate: CD4+ mem; quadrant values 0.00%, 0.00%, 29.34%, 70.66%); CD45RO vs. CD8/CD33 (gate: CD8+ mem; quadrant values 86.13%, 13.87%, 0.00%, 0.00%).]

Gary Means, Molecular Sciences, Amgen

Page 19

[Diagram: workflow boxes: Whole blood, Stimulate, Process Cells, Labeling, Flow, Sample Data File, Software. Populations gated: Granulocyte, Monocyte, Lymphocyte, T cell, NK, B cell, DN, CD4+/CD8+, CD8+, CD4+, CD8+ memory, CD4+ memory.]

Use bioinformatics tools to evaluate coordinate regulation at multiple different intracellular targets

11 gates x 4 targets x 96 wells

Problem? Each set of gated data must be independently exported and kept linked to the experimental process metadata

Gary Means, Molecular Sciences, Amgen

Page 20

Solutions?

Automatically export events with additional columns which contain all of the gating information associated with each event.

Metadata must be inextricably associated with the experimental results.
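A minimal sketch of what a per-event export with gate columns might look like, using only the Python standard library. The function name, the "gate:" column prefix, the metadata keys, and the example gates are assumptions chosen for illustration, not an existing tool's format.

```python
import csv
import io

def export_events(events, channels, gates, metadata, out):
    """Write one CSV row per event: metadata, channel values, gate flags.

    gates maps a gate name to a predicate taking the event dict.
    """
    fieldnames = list(metadata) + list(channels) + [f"gate:{g}" for g in gates]
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    for event in events:
        row = dict(metadata)
        row.update({ch: event[ch] for ch in channels})
        row.update({f"gate:{name}": int(test(event)) for name, test in gates.items()})
        writer.writerow(row)

# Made-up events, gates, and metadata keys, for illustration only.
events = [
    {"FSC-H": 196, "SSC-H": 143, "FL1-H": 90},
    {"FSC-H": 72, "SSC-H": 74, "FL1-H": 109},
]
gates = {
    "Lymphocyte": lambda e: e["FSC-H"] > 100 and e["SSC-H"] < 200,
    "FL1+": lambda e: e["FL1-H"] > 100,
}
metadata = {"file": "Specimen_001_C1_C01.fcs", "well": "C01"}

buffer = io.StringIO()
export_events(events, ["FSC-H", "SSC-H", "FL1-H"], gates, metadata, buffer)
print(buffer.getvalue())
```

Writing the metadata into every row is redundant but makes each exported table self-describing, so a gated subset cannot be separated from the experimental conditions that produced it.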

Gary Means, Molecular Sciences, Amgen