caArray: Juli Klemm (NCICB)

Support for MAGE-TAB in caArray 2.0 Overview and feedback MAGE-TAB Workshop January 24, 2008


Page 1: caArray: Juli Klemm (NCICB)

Support for MAGE-TAB in caArray 2.0

Overview and feedback

MAGE-TAB Workshop

January 24, 2008

Page 2: caArray: Juli Klemm (NCICB)

Agenda

• Brief overview of caArray 2.0

• caArray 2.0 and MAGE-TAB

• MAGE-TAB feedback

Page 3: caArray: Juli Klemm (NCICB)

What is caArray?

• caArray is a caBIG™-compliant microarray data repository at the NCICB

• Developed to support a federated model of microarray data sharing

• Developed in line with MIAME and MAGE guidelines

caArray 1.6 caArray 2.0

Page 4: caArray: Juli Klemm (NCICB)

Goals of caArray 2.0

• Address Adopter feedback gained from our 1.x experience

• Improve the user experience for storing and retrieving data produced

• Simplify and improve the performance of data access through the API and grid service, for analytical applications

• Harmonize with caBIG™ tissue repository (caTissue) and annotation repository (caBIO)

• Support additional array platforms, including SNP arrays

• Organize the application around workflow between investigators and the labs that serve them

• Use an agile software development approach that will allow more frequent feature additions and better responsiveness to the user community

Page 5: caArray: Juli Klemm (NCICB)

Features of caArray 2.0

• Store array data associated with experiment and sample annotations

• Data entry through graphical user interface or MAGE-TAB

• Parse Affymetrix, Illumina and GenePix formats for expression and SNP arrays

• Role-based permissions for data access

• Programmatic access via a Java API and grid service

• Manage protocols and controlled vocabularies
  • MGED Ontology 1.3.1 comes pre-loaded

• Basic Browse and Search Functionality

Page 6: caArray: Juli Klemm (NCICB)

caArray 2.0 Annotations

• Capture information for
  • Experiment information
    • Contacts
    • Publications
  • Sample Annotations
    • Source
    • Sample
    • Extract
    • Labeled Extracts
    • Hybridizations

Page 7: caArray: Juli Klemm (NCICB)

caArray 2.0 supported formats

Parsable file formats

• Annotation
  • MAGE-TAB: ADF, IDF, SDRF
• Array data (parsed)
  • Affymetrix Expression and SNP: .CDF, .CEL, .CHP
  • Illumina Expression and SNP: .CSV
  • GenePix: .GAL, .GPR

Unparsed formats

• Affymetrix: .dat, .exp, .rpt, .txt
• Illumina: .txt, .idat
• Agilent: .txt, .tsv
• ImaGene: .txt, .tiv
• NimbleGen: .txt, .gff

Page 8: caArray: Juli Klemm (NCICB)

caArray 2.0 permissions

• Role-based permissions for each Installation

  • Anonymous user
  • System Administrator
  • Principal Investigator/Biostatistician/Lab Administrator/Lab Scientist

• Data is Private until made Public

• Experiment title, PI, # samples are visible but experiment content is not available to the anonymous user

• Collaboration groups can be managed by the PI for pre-public collaboration

• CSM 4.0
  • Experiment-level and sample-level security
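The visibility rule on this slide can be illustrated with a toy model. This is only a sketch under stated assumptions: it does not use CSM or any caArray class, and the `Experiment` type, its fields, and the `view` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Experiment:
    # Hypothetical model, not a caArray class.
    title: str
    pi: str
    sample_count: int
    content: str = ""
    public: bool = False
    # User names the PI has added to a collaboration group.
    collaborators: set = field(default_factory=set)


def view(experiment, user=None):
    """Return what a user may see; user=None stands for the anonymous user."""
    # Title, PI and number of samples are always visible.
    visible = {
        "title": experiment.title,
        "pi": experiment.pi,
        "samples": experiment.sample_count,
    }
    # Content is released once public, or earlier to the PI and collaborators.
    allowed = (
        experiment.public
        or user == experiment.pi
        or user in experiment.collaborators
    )
    if allowed:
        visible["content"] = experiment.content
    return visible
```

Calling `view(exp)` with no user returns only the summary fields until `exp.public` is set to `True`.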

Page 9: caArray: Juli Klemm (NCICB)

caArray 2.0 API and Grid Service

• Support for MAGE-TAB level of annotation – Simplified implementation of MAGE

• API provides a data service and analytical services

• Data service allows users to use CQL to issue queries that traverse the domain model

• Analytical services provide convenience methods for data access

Page 10: caArray: Juli Klemm (NCICB)

caArray 2.0 browse and search

• Browse by
  • Experiments
  • Organism
  • Provider
  • Array design

• Search by specifying
  • Keyword
  • Category

Page 11: caArray: Juli Klemm (NCICB)

MAGE-TAB in caArray 2.0

• Support MAGE-TAB v1.0 – ADF, IDF, SDRF

• Term Source providers and associated Terms are captured as Controlled Vocabularies (Manage Vocabularies)

• Protocols imported and viewable in Manage Protocols

• Characteristics displayed on the relevant detail pages

• Original files are stored in association with the Experiment

• Edits made to the information in the UI are not reflected in these files

• Future feature – MAGE-TAB export based on current database values

Page 12: caArray: Juli Klemm (NCICB)

MAGE-TAB for data migration

caArray 1.6 >> caArray 2.0

• Experiments in caArray 1.6 being migrated to 2.0 are being exported in MAGE-TAB format along with the associated native array data files

• Challenges included

• MAGE-OM >> MAGE-TAB mapping

• Most challenges due to validation that all data “made it” over (not really a MAGE-TAB issue)

• Manual checking still needed

Jackson Labs internal MAD database >> caArray 2.0

Page 13: caArray: Juli Klemm (NCICB)

MAGE-TAB Feedback

• Initial experience with end-user-type customers is that there is a learning curve associated with using the SDRF, especially with regard to applying controlled vocabularies

• Need tools to facilitate this

• Source vs. Sample vs. Extract vs. Labeled Extract

• Often confusion over “what goes where”

• From Jackson Labs:

• Documentation is good for a biologist-type end-user, but a software engineer would like more detail

• More real-life examples would be helpful
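As one small example of the Source vs. Sample vs. Extract vs. Labeled Extract question, the sketch below reads a tiny tab-delimited SDRF and lists the material chain for each row. The column headings follow MAGE-TAB v1.0 as I understand it; the sample values, the file names, and the `material_chains` helper are made up for illustration and are not part of caArray.

```python
import csv
import io

# Illustrative SDRF content (tab-delimited); all values are invented.
SDRF_TEXT = (
    "Source Name\tCharacteristics[Organism]\tSample Name\tExtract Name\t"
    "Labeled Extract Name\tLabel\tHybridization Name\tArray Data File\n"
    "mouse1\tMus musculus\tliver1\tliver1_rna\tliver1_rna_biotin\tbiotin\t"
    "hyb1\thyb1.CEL\n"
    "mouse2\tMus musculus\tliver2\tliver2_rna\tliver2_rna_biotin\tbiotin\t"
    "hyb2\thyb2.CEL\n"
)

# Left-to-right order of the material nodes in an SDRF row.
CHAIN = [
    "Source Name",
    "Sample Name",
    "Extract Name",
    "Labeled Extract Name",
    "Hybridization Name",
]


def material_chains(sdrf_text):
    """Return, for each SDRF row, the node names from Source to Hybridization."""
    reader = csv.DictReader(io.StringIO(sdrf_text), delimiter="\t")
    return [[row[column] for column in CHAIN] for row in reader]


for chain in material_chains(SDRF_TEXT):
    print(" -> ".join(chain))
```

Each row reads left to right: the organism or specimen (Source), the portion taken from it (Sample), the extracted material (Extract), the extract after labeling (Labeled Extract), and the hybridization it was used in.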

Page 14: caArray: Juli Klemm (NCICB)

Specific requests to consider

• Need a way to specify required fields for particular implementations

• caArray UI has certain required fields – need to be able to specify these in a MAGE-TAB template

• Associate “Supplemental” files with an experiment

• In IDF, recommend adding a field to specify the type of array experiment (Gene Expression, SNP, aCGH, etc.)
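The required-fields request could be prototyped as a simple check over an IDF file, which stores one tag per row followed by tab-separated values. This is a minimal sketch, not a caArray feature: the `REQUIRED_TAGS` list is a hypothetical per-installation setting, and "Experiment Type" stands in for the proposed field, which is not part of MAGE-TAB v1.0.

```python
# Hypothetical list of tags one installation might require.
# "Experiment Type" is the proposed field, not a MAGE-TAB v1.0 tag.
REQUIRED_TAGS = ["Investigation Title", "Person Last Name", "Experiment Type"]


def parse_idf(idf_text):
    """Map each IDF row tag to its list of non-empty values."""
    fields = {}
    for line in idf_text.splitlines():
        cells = line.split("\t")
        tag = cells[0].strip()
        if tag:
            fields[tag] = [cell.strip() for cell in cells[1:] if cell.strip()]
    return fields


def missing_required(idf_text, required=REQUIRED_TAGS):
    """Return the required tags that are absent or have no value."""
    fields = parse_idf(idf_text)
    return [tag for tag in required if not fields.get(tag)]


# Illustrative IDF content; all values are invented.
IDF_TEXT = (
    "Investigation Title\tLiver expression study\n"
    "Person Last Name\tDoe\n"
    "Experiment Type\t\n"
)

print(missing_required(IDF_TEXT))
```

A template for a particular implementation could ship its own required-tag list, so the same check reports missing fields before the file is uploaded.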