Using geWorkbench: Working with Sets of Data
Fan Lin, Ph.D.
Molecular Analysis Tools Knowledge Center
Columbia University and The Broad Institute of MIT and Harvard



Background

geWorkbench makes extensive use of the notion of sets: it allows the full set of markers or arrays/phenotypes to be divided into different subsets.

Having multiple different subsets allows the same data to be characterized and analyzed in different ways in geWorkbench.

Different types of data grouping

geWorkbench offers two different ways to group data:

1. Individual markers or arrays can be grouped into sets:

► Sets can be defined by the user, or may be created as a result of an analysis.

► Sets of arrays can be used to distinguish between different experimental states, for example as part of a statistical analysis.

♦ The t-test requires that two states, represented by sets, be defined for comparison.

► Sets of markers are returned from various analysis routines. For example, the t-test returns a list of markers showing significant differential expression, and after hierarchical clustering, the markers in a subtree of the resulting dendrogram can be saved.

2. Sets of markers or arrays are grouped into collections. A collection named “Default” is automatically created by geWorkbench.

Overview

In this presentation you will learn:

► How to create a set of markers or arrays.

► How to mark a set of arrays as "Active".

► How to classify a set of arrays, e.g. as "case" vs. "control".

► How to deactivate a set so that it is excluded from data analysis.

► How to group markers or arrays in different ways with descriptive tags.

Sets of Markers or Arrays: Overview

Individual markers (genes) or arrays can be grouped into Sets.

A Set of markers or arrays can be used to divide potentially massive expression data into more manageable chunks.

Sets of Markers or Arrays: Sample data

To demonstrate how to create sets of markers or arrays, we will use sample data from a congestive cardiomyopathy experiment, which can be found in the geWorkbench tutorial data section:

http://wiki.c2b2.columbia.edu/workbench/index.php/Tutorial_-_Data

We will load 10 individual Affymetrix MAS5 format files (all beginning with JB-) and merge them into a single dataset to use as our sample data.

Sets of Markers or Arrays: Sample data preparation

To load the sample data, follow the steps below:

1. Create a Project. All data must belong to a project. Right-click on the Workspace entry in the Project Folders window at upper left to create a new project.

2. Next, right-click on the new Project entry and select Open Files.

Sets of Markers or Arrays: Sample data preparation

3. Select file type Affymetrix MAS5/GCOS as shown.

4. Make sure to check the Merge files checkbox.

5. Select 10 MAS5 format text files from the tutorial data directory.

6. Click Open.

The chip type HG_U95Av2 is recognized...

Sets of Markers or Arrays: Sample data preparation

The merged dataset is now listed in the Project folder. The data is displayed, in single array format, in the Microarray Viewer. Note that we have increased the intensity slider to maximum here.

Sets of Markers or Arrays: Assigning arrays to sets

In this example, we will create two sets of arrays, one for the disease state and one for the normal state, and leave them in the Default collection.

First, select and label the arrays that contain samples from the congestive cardiomyopathy disease state:

1. In the Arrays/Phenotypes component, select the six arrays beginning with JB-ccmp, which represent the samples from the congestive cardiomyopathy disease state.

2. Right-click and select Add to Set.

Sets of Markers or Arrays: Assigning arrays to sets

3. Enter "CCMP" in the input box and click OK.

4. Next, similarly label the arrays beginning with JB-n as "Normal".

The Array/Phenotype Sets component will now show the two sets that were added.
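The grouping performed in steps 1 to 4 can be pictured as a simple data structure. The following Python sketch is only a conceptual model, not geWorkbench code or its API: the array names are hypothetical (only the JB-ccmp and JB-n prefixes come from the tutorial), and the count of four normal arrays is an assumption derived from ten files minus the six disease arrays.

```python
# Hypothetical array names; only the "JB-ccmp" and "JB-n" prefixes
# are taken from the tutorial. Four normal arrays is an assumption (10 - 6).
arrays = [f"JB-ccmp_{i}" for i in range(1, 7)] + [f"JB-n_{i}" for i in range(1, 5)]


def make_set(all_arrays, prefix):
    """Return the arrays whose names start with the given prefix."""
    return [name for name in all_arrays if name.startswith(prefix)]


# Model of the "Default" collection holding the two user-defined sets.
default_collection = {
    "CCMP": make_set(arrays, "JB-ccmp"),
    "Normal": make_set(arrays, "JB-n"),
}

for set_name, members in default_collection.items():
    print(set_name, len(members))
```

Each set is just a named list of array identifiers, and the collection is a mapping from set names to those lists.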

Sets of Markers or Arrays: Activating sets

The box next to a set name can be checked to indicate that the set of arrays is "Active". Various analysis and visualization components can be set to use or display only activated arrays or markers.

Note: if no Array sets are explicitly activated, then all Arrays are implicitly active. The same applies to Markers.
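The activation rule in the note above can be expressed in a few lines. This Python sketch is a conceptual model under stated assumptions, not geWorkbench's implementation: the function name `effective_arrays` and the array names are hypothetical.

```python
def effective_arrays(all_arrays, sets, activated):
    """Return the arrays an analysis would use.

    all_arrays: every array in the dataset, in display order
    sets:       mapping of set name -> list of member arrays
    activated:  names of the sets whose checkbox is ticked
    """
    if not activated:
        # No set explicitly activated: every array is implicitly active.
        return list(all_arrays)
    active = set()
    for name in activated:
        active.update(sets[name])
    # Preserve the original array order.
    return [a for a in all_arrays if a in active]


# Hypothetical example data.
arrays = ["JB-ccmp_1", "JB-ccmp_2", "JB-n_1"]
sets = {"CCMP": ["JB-ccmp_1", "JB-ccmp_2"], "Normal": ["JB-n_1"]}

print(effective_arrays(arrays, sets, activated=[]))
print(effective_arrays(arrays, sets, activated=["Normal"]))
```

The first call models the case where nothing is checked, so all arrays are used. The second models the case where only the "Normal" set is checked. Clearing every checkbox again returns the model to the implicit "all active" state.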

Sets of Markers or Arrays: Classifying data sets for statistical tests

For statistical tests such as the t-test, Case and Control groups can be specified.

1. Left-click on the thumbtack icon in front of the phenotype name.

2. Select Case to specify the disease arrays as the "Case". The remaining "Normal" arrays are by default considered Control.

Sets of Markers or Arrays: Classifying data sets for statistical tests

3. A red thumbtack indicates that an array set has been marked as "Case".
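To illustrate why a Case/Control classification is needed, the sketch below splits the values of one marker into the two groups and computes a t-statistic. It is a minimal illustration under several assumptions: the expression values and array names are invented, the helper names are hypothetical, and Welch's form of the t-statistic is used for simplicity; the source does not say which t-test variant geWorkbench applies.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical expression values for a single marker, keyed by
# hypothetical array names (only the prefixes come from the tutorial).
marker_values = {
    "JB-ccmp_1": 10.0, "JB-ccmp_2": 12.0, "JB-ccmp_3": 11.0,
    "JB-ccmp_4": 13.0, "JB-ccmp_5": 12.0, "JB-ccmp_6": 14.0,
    "JB-n_1": 8.0, "JB-n_2": 9.0, "JB-n_3": 7.0, "JB-n_4": 8.0,
}

array_sets = {
    "CCMP": [a for a in marker_values if a.startswith("JB-ccmp")],
    "Normal": [a for a in marker_values if a.startswith("JB-n")],
}
case_sets = {"CCMP"}  # sets marked as "Case" (red thumbtack)


def split_case_control(values, sets, case_set_names):
    """Arrays in a Case set form the case group; all others are Control."""
    case_arrays = {a for s in case_set_names for a in sets[s]}
    case = [v for a, v in values.items() if a in case_arrays]
    control = [v for a, v in values.items() if a not in case_arrays]
    return case, control


def welch_t(case, control):
    """Welch's t-statistic for two independent groups (assumed variant)."""
    se = sqrt(variance(case) / len(case) + variance(control) / len(control))
    return (mean(case) - mean(control)) / se


case, control = split_case_control(marker_values, array_sets, case_sets)
t_statistic = welch_t(case, control)
print(f"t = {t_statistic:.3f}")
```

Repeating this calculation for every marker and keeping those that pass a significance threshold would yield the kind of marker set that an analysis can return.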

Sets of Markers or Arrays: Deactivating a data set

To deactivate a set, click on it so that it is highlighted, then perform one of the following actions:

► Right-click on the set and select Deactivate.

► Clear the checkbox next to the set.

► From the main menu, select Commands Panel > Deactivate Panel.

Collection of Sets: Overview

The same arrays or markers may need to be grouped in several different ways in the Arrays/Phenotypes and Marker components. geWorkbench uses Collections to hold sets of arrays or markers, which makes the data easier to manage.

Different collections of sets can be created, both for Markers and for Arrays. They may differ in membership or in how members are named (e.g. amount of detail).

Collections of sets give users a way to manage sets of data with descriptive tags.

Collection of Sets: Creating a new collection

Both the Marker and Array/Phenotypes tabs have two sections in the GUI: the upper frame lists the full data set, and the lower frame lists any user-defined groupings.

geWorkbench automatically creates a default collection, “Default”, to hold sets of data.

To create a new collection for arrays, click on the New button in the Array/Phenotype Sets component, located at the lower left of the application (arrow labeled New).

The drop-down collection list (arrow on the left) will be updated to include the new collection.

Collection of Sets: Examples of array collections

► Here we show how several different collections are defined in the example data file "Bcell-100.exp", which can be found in the geWorkbench tutorial data (Bcell-100.zip):

http://wiki.c2b2.columbia.edu/workbench/index.php/Tutorial_-_Data

► After loading this file into geWorkbench as type "Affymetrix File Matrix", four collections of sets can be seen in the Arrays/Phenotypes group pull-down menu at right.

Collection of Sets: Examples of array collections

If we choose the collection called "Class", the sets of arrays at right are displayed:

Collection of Sets: Examples of array collections

If instead we choose the collection "Source detailed", a different collection of sets of the same arrays is seen:
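The idea that different collections regroup the same arrays can be modeled as follows. This Python sketch is illustrative only: the collection names "Class" and "Source detailed" come from the example above, but every set name, array name, and the helper `sets_containing` are hypothetical and do not reflect the actual contents of Bcell-100.exp.

```python
# Hypothetical arrays; the real data file contains different names.
arrays = ["array_1", "array_2", "array_3", "array_4"]

# Two collections that partition the same arrays at different levels of
# detail. All set names below are invented for illustration.
collections = {
    "Class": {
        "Group A": ["array_1", "array_2", "array_3"],
        "Group B": ["array_4"],
    },
    "Source detailed": {
        "Source A1": ["array_1"],
        "Source A2": ["array_2", "array_3"],
        "Source B1": ["array_4"],
    },
}


def sets_containing(collection_name, array):
    """List the sets in one collection that include the given array."""
    return [
        set_name
        for set_name, members in collections[collection_name].items()
        if array in members
    ]


# The same array carries a different tag depending on the chosen collection.
print(sets_containing("Class", "array_2"))
print(sets_containing("Source detailed", "array_2"))
```

Switching the collection in the pull-down menu corresponds to choosing a different top-level key here: the arrays stay the same, and only the grouping and labels change.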

Need More Information?

NCI is developing an extensive knowledge base to support various NCI molecular analysis tools. Visit us at NCI’s Molecular Analysis Tools Knowledge Center at: https://cabig-kc.nci.nih.gov/MediaWiki/index.php/Main_Page

• For more information on how to use geWorkbench, please visit the geWorkbench section of the NCI Knowledge Center at: https://cabig-kc.nci.nih.gov/Molecular/KC/index.php/GeWorkbench

• Have a geWorkbench-related question? Find the answers in the geWorkbench FAQ section at: https://cabig-kc.nci.nih.gov/Molecular/KC/index.php/GeWorkbench_FAQ

• Need more help? Post your question in the geWorkbench Forum at: https://cabig-kc.nci.nih.gov/Molecular/forums/viewforum.php?f=3