

Abstract

BarleyBase is a USDA-funded public repository for plant microarray data. BarleyBase houses raw and normalized expression data from the 22K Affymetrix Barley1 and Arabidopsis ATH1 GeneChips, presently the only two available Affymetrix high-density arrays from plants, along with experiment and sample information.

BarleyBase features a web-based, MIAME-compliant experiment submission tool, BarleyExpress. BarleyExpress allows users to efficiently submit and manage their experiment descriptions, array design and expression analysis information.

BarleyBase contains a broad set of query and display options at all data levels, from experiment and hybridization to probe set and probe. Users can query microarray elements by expression profile and by biological information about the probe sets. Probe set queries are integrated with visualization and analysis tools such as scatter plots, the R statistical toolbox, and data filters.

BarleyBase collaborates with the PlantGDB and Gramene databases to perform gene prediction and cross-species comparison at the genome level using the Barley1 GeneChip exemplar sequences.

BarleyBase is accessible at http://www.BarleyBase.org/

BARLEYBASE – AN EXPRESSION PROFILING DATABASE FOR CEREAL GENOMICS

Xiaoyun Tang, Jian Gong, Jianqiang Xin, Lishuang Shen, Stacy Turner, Rico A. Caldo, Dan Nettleton, Roger P. Wise, Julie A. Dickerson*
Virtual Reality Applications Center, Iowa State University, Ames, Iowa 50011

Acknowledgments

1. BarleyBase is funded by USDA-NRI/CGP #2002-03582; USDA-CSREES North American Barley Genome Project; USDA Initiative for Future Agriculture and Food Systems (IFAFS) #01-52100-11346.
2. PlantGDB, Gramene, KEGG, TAIR for providing tools or genomic data.
3. Many people who provided technical support and advice on BarleyBase development.

Fig. 2. BarleyBase Homepage

BarleyExpress Features

• MIAME-compliant, web-based data submission and annotation tool.
• Experiment, array design, protocol, sample, and expression submissions.
• Enforces plant ontology in collaboration with Gramene.
• Uses controlled vocabulary for descriptions wherever possible.
• First database to explicitly capture information on experiment factors and levels, so that experiments can be presented in a factorial design.
• Images and other supporting information can be uploaded.
• Minimal requirements on the user’s computer skills and effort.
• Flexible access control: submitters can grant designated individuals or groups access to their private data before publication.

BarleyBase Data Model

• BarleyBase uses a hierarchical data model, based on the Affymetrix GeneChip data formats, to store gene expression data.
• The highest-level data structure is the experiment. Each experiment contains one or more treatments, each treatment has one or more samples as replicates, and each sample has one or more hybridizations.
• Protocols are associated with the experiment at the hybridization level.
• Five types of tables: Array, Expression, Experiment, Protocol, Submitter.
• Follows the MIAME principles recommended by MGED and implemented in MIAMExpress, but removes the Extract level and captures the information for the hybridization protocol.
• Adds fields for statistical experimental design factors.
• Uses plant ontology and controlled vocabulary in experiment descriptions.
• Biological annotation for microarray probe sets and exemplars.
• Presently stores expression data from Affymetrix GeneChips only.
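The hierarchy described above (experiment, treatment, sample, hybridization) can be sketched as nested records. This is only an illustration in Python, not the BarleyBase MySQL schema: the class names, field names, accession strings and file names below are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Hybridization:
    accession: str       # hypothetical accession format, not BarleyBase's own
    cel_file: str        # raw GeneChip file for this hybridization
    protocol: str = ""   # protocols attach at the hybridization level


@dataclass
class Sample:
    name: str
    hybridizations: List[Hybridization] = field(default_factory=list)


@dataclass
class Treatment:
    factor_levels: Dict[str, str]                        # one level per factor
    samples: List[Sample] = field(default_factory=list)  # replicates


@dataclass
class Experiment:
    title: str
    treatments: List[Treatment] = field(default_factory=list)

    def hybridizations(self) -> List[Hybridization]:
        """Walk experiment -> treatment -> sample -> hybridization."""
        return [h for t in self.treatments for s in t.samples for h in s.hybridizations]


# Made-up example: two treatments, three hybridizations in total.
experiment = Experiment(
    title="example experiment",
    treatments=[
        Treatment(
            factor_levels={"genotype": "A", "time_h": "0"},
            samples=[
                Sample("rep1", [Hybridization("HYB-1", "rep1.cel")]),
                Sample("rep2", [Hybridization("HYB-2", "rep2.cel")]),
            ],
        ),
        Treatment(
            factor_levels={"genotype": "A", "time_h": "8"},
            samples=[Sample("rep1", [Hybridization("HYB-3", "rep3.cel")])],
        ),
    ],
)
print(len(experiment.hybridizations()))
```

Keeping the factor levels on the treatment, rather than on each sample, mirrors the poster's point that replicates share one treatment.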

Data Access

• Download complete data sets for experiment annotation, raw and normalized expression data in MAGE-ML, comma-separated values (CSV), or CEL file formats.
• Browse and query experiments, hybridizations and probe sets.
• Query and filter probe sets by expression profiles.
• Search by biological criteria: annotation keywords, sequence, probe set names, pathway or gene family membership.
• Data set management and creation for filtered probe sets.
• Owner-controlled group access to private submissions.
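As a rough illustration of filtering probe sets by expression profile and exporting the result as CSV, the sketch below keeps probe sets whose expression rises at least a given fold over a baseline hybridization. The fold-change criterion, the function names and the probe set names are assumptions made for this example; the poster does not specify which filters BarleyBase offers.

```python
import csv
import io
from typing import Dict, List


def filter_by_fold_change(
    profiles: Dict[str, List[float]], min_fold: float, baseline: int = 0
) -> Dict[str, List[float]]:
    """Keep probe sets whose value in any hybridization is at least
    min_fold times the value in the baseline hybridization."""
    kept = {}
    for probe_set, values in profiles.items():
        base = values[baseline]
        if base <= 0:
            continue  # ratio is undefined for a non-positive baseline
        if max(v / base for v in values) >= min_fold:
            kept[probe_set] = values
    return kept


def to_csv(profiles: Dict[str, List[float]], columns: List[str]) -> str:
    """Serialize the filtered data set as CSV text, one probe set per row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["probe_set", *columns])
    for name, values in profiles.items():
        writer.writerow([name, *values])
    return buf.getvalue()


# Made-up expression values for two hypothetical probe sets.
profiles = {
    "probe_a": [100.0, 110.0, 450.0],
    "probe_b": [200.0, 190.0, 210.0],
}
selected = filter_by_fold_change(profiles, min_fold=2.0)
csv_text = to_csv(selected, ["h1", "h2", "h3"])
print(csv_text)
```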

Visualization & Analysis

• Visualization for experiments, hybridizations, probe sets, and probes.
• Data analysis uses data sets obtained from probe set filtering.
• Analysis methods include hierarchical clustering, k-means partitioning, PCA, SOM, and multi-dimensional scaling (MDS).
• Identification of differentially expressed and co-expressed genes.
• Most data analysis & visualizations use R and Bioconductor.
• Probe alignments with exemplar sequence.
• Gene prediction through interconnections with the PlantGDB database.
• Cross-species comparative genomics through the Gramene database.
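One common way to identify co-expressed genes is to rank probe sets by the Pearson correlation of their expression profiles with a query probe set. The poster states that BarleyBase's analyses run in R and Bioconductor and does not say which similarity measure is used, so the Python sketch below is an assumption-laden illustration; the 0.9 threshold and the probe set names are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def coexpressed(
    profiles: Dict[str, List[float]], query: str, threshold: float = 0.9
) -> List[Tuple[str, float]]:
    """Return (probe set, correlation) pairs at or above the threshold,
    most strongly correlated first."""
    target = profiles[query]
    hits = [
        (name, pearson(target, values))
        for name, values in profiles.items()
        if name != query
    ]
    return sorted(
        (h for h in hits if h[1] >= threshold), key=lambda h: h[1], reverse=True
    )


# Made-up profiles across four hybridizations.
profiles = {
    "probe_a": [1.0, 2.0, 3.0, 4.0],
    "probe_b": [2.0, 4.0, 6.0, 8.5],   # rises with probe_a
    "probe_c": [4.0, 3.0, 2.0, 1.0],   # falls as probe_a rises
}
hits = coexpressed(profiles, "probe_a")
print(hits)
```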

Future Plans

1. Cross-experiment analysis.
2. Visualization and analysis tool development.
3. Barley1 exemplar annotation.

BarleyExpress Submission Steps

• Experiment design information submission.
• Submit experiment factors and factor levels as treatments.
• Batch upload raw GeneChip data.
• Associate raw data files with each studied treatment.
• Protocol submission – optional.
• Sample preparation details for each hybridization.
• Finalize experiment submission.
• Grant access to designated individuals and groups.
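In a full factorial design, each treatment is one combination of factor levels. The sketch below shows how a set of factors and levels expands into treatments; it is a minimal illustration, not BarleyExpress code, and the factor names and levels are invented.

```python
from itertools import product
from typing import Dict, List


def factorial_treatments(factors: Dict[str, List[str]]) -> List[Dict[str, str]]:
    """Expand factors and their levels into every treatment combination."""
    names = list(factors)
    level_lists = [factors[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*level_lists)]


# Hypothetical factors: 2 genotypes x 3 time points = 6 treatments.
factors = {
    "genotype": ["wild_type", "mutant"],
    "time_h": ["0", "8", "16"],
}
treatments = factorial_treatments(factors)
for treatment in treatments:
    print(treatment)
```

Raw data files uploaded in the later steps would then be associated with one of these treatment records.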

Data Acquisition & Processing

• Experiment and expression raw data submission by submitter.
• BarleyBase normalizes submitted raw data. Methods are the statistical algorithm from Affymetrix MAS 5 and RMA (Robust Multi-Array Analysis) from Bioconductor.
• Compute summary statistics and graphs for raw and normalized expression data for summary and quality diagnostics.
• Store all types of data in an open-source MySQL database.
• BarleyBase assigns unique accession numbers to experiments, hybridizations & samples.
• BarleyBase generates MAGE-ML files and CSV files for batch download.
• Experiment submission and associated data are available for online access and analysis.
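RMA as provided by Bioconductor combines background correction, quantile normalization across arrays, and median-polish summarization of probes into probe sets. The sketch below illustrates only the quantile normalization idea, in plain Python on made-up intensities. It is not the Bioconductor implementation and not BarleyBase's pipeline code; ties are broken by position here, which is a simplification.

```python
from typing import List


def quantile_normalize(arrays: List[List[float]]) -> List[List[float]]:
    """Give every array the same intensity distribution.

    arrays[i][j] is the intensity of probe j on array i. Each value is
    replaced by the mean, across arrays, of the values sharing its rank.
    """
    n_probes = len(arrays[0])
    if any(len(a) != n_probes for a in arrays):
        raise ValueError("all arrays must have the same number of probes")

    # Mean intensity at each rank across the sorted arrays.
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [sum(col) / len(arrays) for col in zip(*sorted_arrays)]

    normalized = []
    for a in arrays:
        # Probe indices ordered by intensity; ties keep positional order.
        order = sorted(range(n_probes), key=lambda j: a[j])
        out = [0.0] * n_probes
        for rank, j in enumerate(order):
            out[j] = rank_means[rank]
        normalized.append(out)
    return normalized


# Two hypothetical arrays with three probes each.
raw = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
normalized = quantile_normalize(raw)
print(normalized)
```

After normalization both arrays contain the same set of values, assigned to probes according to each probe's rank on its own array.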

Fig. 1. BarleyBase Overview

Fig. 3. Major Steps in Experiment Submission

Fig. 6. Graphs for Hybridization Expression & Cluster

Fig. 5. Probe Set Query and Result Visualization

BarleyBase: BarleyBase.org

Fig. 4. Probe Alignment with Barley1 GeneChip Exemplar

[BarleyBase Overview diagram; labels: User, Internet, BarleyExpress, BarleyBase Data Processing Pipeline, MAS5.0, RMA, Query & Analysis, Batch Download, MAGE-ML, Raw Data, CSV]