www.fludb.org
Influenza Research Database (IRD) & Virus Pathogen Resource (ViPR) Bioinformatics Resource Centers (BRCs)
and support for Systems Biology data
06 November 2011
Richard H. Scheuermann, Ph.D.
Department of Pathology
U.T. Southwestern Medical Center
www.viprbrc.org
ViPR Overview
IRD Overview
Data Summary & Sources
Search Access to Data
Data Types
Analysis and Visualization
Analysis and Visualization Tools
Workbench Access
Data Submission
Data Submission Page
SYSTEMS BIOLOGY “OMICS” DATA
Overview of Systems Biology & DBP Projects
• Four systems biology groups funded by NIAID, including:
  – Systems Virology (Michael Katze group, Univ. Washington)
    • Influenza H1N1 and H5N1 and SARS Coronavirus
    • statistical models, algorithms and software, raw and processed gene expression data, and proteomics data
  – Systems Influenza (Alan Aderem group, Institute for Systems Biology)
    • various Influenza viruses
    • microarray, mass spectrometry, and lipidomics data
• ViPR Driving Biological Projects
  – Abraham Brass, Mass. General Hospital
    • Dengue virus host factor database from RNAi screen
  – Lynn Enquist / Moriah Szpara, Princeton University
    • Deep sequencing and neuronal microarrays for functional genomic analysis of Herpes Simplex Virus
Andrew R. Joyce & Bernhard Ø. Palsson, Nature Reviews Molecular Cell Biology 7, 198-210 (March 2006)
Omics Data
Acknowledgement
• Lynn Law, U. Washington
• Richard Green, U. Washington
• Jyothi Noronha, U.T. Southwestern
• Eva Sadat, U.T. Southwestern
• Brett Pickett, U.T. Southwestern
Discussion Points
• Relationship between data archives (e.g. GEO, PRIDE) and the BRCs (e.g. IRD, ViPR, PATRIC)
  – Integration
  – Metadata standards
• Metadata
  – What kind of data
  – What kind of standards – MIBBI vs. “MIBBI lite”
• Raw versus processed data
• Primary results data
  – Need to define what is considered “primary” data for each platform
    • Microarray example: raw image files (.tiff) vs probe intensity values (.cel)
    • Opportunity for re-processing leading to re-interpretation
• Derived/processed results
  – “Interesting gene/protein lists” from microarray, RNAi, proteomics, and other experimental platforms
  – “Interesting metabolite lists”
  – Data processing metadata
• Visualize and analyze “interesting gene/protein/metabolite lists”
Proposal for “Omics” Data
1. “Omics” data management (host)
   a) Project metadata
   b) Experiment metadata
   c) Experiment sample metadata
   d) Data analysis metadata
   e) Primary results
   f) Derived results (e.g. “interesting gene/protein/metabolite lists” (Host Factor Biosets))
2. Add additional related datasets from other sources
3. Visualize Host Factor Biosets in context of biological pathways and networks
4. Statistical analysis of pathway sub-network overrepresentation
5. Re-analysis of primary data using assembled pipeline tools (?)
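One common way to approach the pathway overrepresentation analysis named in the proposal is a one-sided hypergeometric test on the overlap between a Host Factor Bioset and a pathway gene set. The slides do not say which statistic ViPR/IRD would use, so the sketch below is only an illustration under that assumption; the function name and all numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population: int, pathway_size: int,
                           bioset_size: int, overlap: int) -> float:
    """Probability of seeing at least `overlap` pathway genes in a bioset.

    Models the bioset as `bioset_size` genes drawn without replacement from
    `population` genes, of which `pathway_size` belong to the pathway.
    """
    max_overlap = min(pathway_size, bioset_size)
    total = comb(population, bioset_size)
    tail = sum(
        comb(pathway_size, k) * comb(population - pathway_size, bioset_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Toy numbers invented for illustration; they are not from any real experiment.
p_value = hypergeom_enrichment_p(population=20, pathway_size=5,
                                 bioset_size=4, overlap=2)
print(f"enrichment p-value: {p_value:.4f}")
```

In practice the population would be the set of genes measurable on the platform, and p-values would need correction for testing many pathways.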
GEO Representation
GEO data representations based on free text
Non-standard Descriptions
Sample details must be guessed
Metadata (MIBBI-compliant)
• Project Level Metadata
  – Hypothesis, rationale, study design, etc.
  – Publications and links pertaining to the project
  – Data providers - PI, other key personnel, affiliations, contact information
• Experiment/Assay Level Metadata
  – Experiment platform
  – Experiment data type
• Experiment Sample Level Metadata
  – Sample source and characteristics of source
  – Sample type
  – Source/sample treatment information
  – Assay details
• Data Processing/Analysis Level Metadata
  – Algorithm(s) used for transforming primary to derived data
  – Configuration parameters
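The four metadata levels above can be pictured as nested records. The sketch below is a minimal illustration, not the ViPR/IRD schema: every class and field name is an assumption chosen to mirror the bullets, and the sample values are illustrative.

```python
from dataclasses import dataclass, field, fields
from typing import Dict, List


@dataclass
class AnalysisMetadata:          # data processing/analysis level
    algorithm: str
    parameters: Dict[str, str] = field(default_factory=dict)


@dataclass
class ExperimentSample:          # experiment sample level
    sample_source: str
    sample_type: str
    treatment: str
    assay_details: str = ""


@dataclass
class Experiment:                # experiment/assay level
    platform: str
    data_type: str
    samples: List[ExperimentSample] = field(default_factory=list)
    analyses: List[AnalysisMetadata] = field(default_factory=list)


@dataclass
class Project:                   # project level
    hypothesis: str
    publications: List[str] = field(default_factory=list)
    data_providers: List[str] = field(default_factory=list)
    experiments: List[Experiment] = field(default_factory=list)


def empty_text_fields(record) -> List[str]:
    """Return the names of string fields left blank on one metadata record."""
    return [
        f.name for f in fields(record)
        if isinstance(getattr(record, f.name), str)
        and not getattr(record, f.name).strip()
    ]


# Illustrative values only.
sample = ExperimentSample(
    sample_source="Calu-3 cells",
    sample_type="RNA",
    treatment="A/Vietnam/1203-CIP048_RG1/2004(H5N1), 1 MOI, 12 hrs",
)
experiment = Experiment(platform="Gene Expression Microarray",
                        data_type="probe intensity values", samples=[sample])
project = Project(hypothesis="", experiments=[experiment])

print(empty_text_fields(sample))   # fields a submitter still has to fill in
print(empty_text_fields(project))
```

A check like `empty_text_fields` is one simple way a submission form could flag incomplete structured metadata instead of accepting free text.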
Metadata Submission Modules
• Study
• Experiment
• Animal/human subject
• Biosample
• Reagent
• Protocol
• Experiment Sample
• Analysis method
• Host factor bioset
Study
Experiment
Subject
Biosample
Reagent
Protocol
Experiment Sample
Analysis Method
Host Factor Bioset
Possible Data Submission Workflows
[Diagram: three alternative submission workflows, labeled A, B, and C. Each involves study metadata, experiment sample metadata, primary results, analysis metadata, and the host factor bioset, together with GEO free text metadata, and the workflows differ in how these items are routed between GEO and ViPR/IRD.]
Search
Search our robust database for:
• Genomes
• Genes & proteins
• Immune epitopes
• 3D protein structures
Analyze
Analyze your results online. We offer:
• Identify similar sequences (BLAST)
• Align sequences (MSA)
• Find short peptides in proteins
• Visualize aligned sequences
Save to Workbench
Sign up for a Workbench to:
• Store data in working sets for future analysis
• Integrate ViPR data with your laboratory data
• Store analysis results
• Share results and data with collaborators
Highlights
ViPR Highlight for All Families
You can now search for, and visualize, 3D protein structures from within the ViPR website. Simply navigate to the protein of interest and then follow the links to use this feature.
Data Summary Updated 2 Weeks Ago
Genome Statistics for Virus Families
Families 13
Genera 54
Species 661
Strains 38,886
Segments 45,154
FAMILY_NAME
TEXT SEARCH
Genomes
Genes & Proteins
Immune Epitopes
Host Factor Biosets
3D Protein Structures
Protein Domains
Protein Motifs
HISTORY
Your Analysis History
Retrieve a Download
Ortholog Groups
SEARCH OR FIND:
Quick Text Search
Sequence Feature Variant Type
ViPR Virus Pathogen Resource
...About Us Supported Projects Announcements Resources Support
SEARCH DATA ANALYZE & VISUALIZE ACCESS WORKBENCH VIRUS FAMILIES HOME
Host Factor Biosets
Please select an experiment to explore to view the associated bioset (interesting gene list).

Study Title: Host factors in DENV replication
Experiment Name: Dengue whole-genome siRNA library screen
Experiment Type: siRNA Screen
Host Species & Biomaterial (Cell Line): Human (Huh-7)
Virus Strain Name: New Guinea C
PI: Abraham L. Brass
PI Institution: Massachusetts General Hospital
Contract / Grant Title: Dengue Virus-Host Interactions Using Functional Genomics
Description: Identify host factors required for DENV replication by using siRNA libraries.
Keywords: Dengue, DENV, siRNA, host factors
Contract / Grant Number: HHSN135711131 (NIAID)
Date Submitted: Dec 2011

Study Title: Host factors in HSV-1 replication
Experiment Name: Host neuronal response to HSV-1
Experiment Type: Gene Expression Microarray
Host Species & Biomaterial (Cell Line): Human Neurons (SHSY5Y)
Virus Strain Name: 17
PI: Moriah Szpara, Lynn Enquist
PI Institution: Princeton Univ.
Contract / Grant Title: Deep Sequencing and Neuronal Microarrays for Functional Genomic Analysis of HSV-1
Description: Characterize the response to HSV-1 infection through differential gene expression in human neuronal cells.
Keywords: Herpes Simplex Virus 1, HSV-1, neurons
Contract / Grant Number: HHSN24681012 (NIAID)
Date Submitted: Apr 2012

Release Date: Jan 7, 2012
This project is funded by the National Institute of Allergy and Infectious Diseases (NIH / DHHS) under Contract No. HHSN272200900041C and is a collaboration between Northrop Grumman Health IT, University of Texas Southwestern Medical Center and Vecna Technologies. Virus images courtesy of CDC Public Health Image Library and Wellcome Images.
Host Factor Biosets (“Omics” Experiment Details)

EXPERIMENT METADATA (from Host response to Influenza virus infection)
Study Title: NIAID Systems Virology Center
Experiment Name: VN1203/2004 infection in Calu3 cell: A time course
Experiment Type: Gene Expression Microarray
Host Species & Biomaterial (Cell Line): Calu3
Virus Strain Name: A/Vietnam/1203-CIP48_RG1/2004(H5N1)
Conditional Variables: +/- virus infection, time after infection

PRIMARY RESULTS
Experiment Sample ID | Source Biological Sample | Treatment Agent 1 Name | Treatment Agent 1 Amount | Treatment 1 Duration | Download Experiment Data
251485048497_1_2_RNA | Calu-3 cells | A/Vietnam/1203-CIP048_RG1/2004(H5N1) | 1 MOI | 0 hrs | Experiment Sample 1
251485048466_1_3_RNA | Calu-3 cells | A/Vietnam/1203-CIP048_RG1/2004(H5N1) | 1 MOI | 12 hrs | Experiment Sample 2
251485048497_1_1_RNA | Calu-3 cells | A/Vietnam/1203-CIP048_RG1/2004(H5N1) | 1 MOI | 24 hrs | Experiment Sample 3
251485048467_1_1_RNA | Calu-3 cells | Mock | Mock | 24 hrs | Experiment Sample 4
Gene Name | Entrez Gene ID | Entrez Gene Name | ImmPort Page | GenBank Accession | Bioset 1 Score | Bioset 2 Score
Mus musculus contactin 1 | 12805 | Cntn1 | | NM_007727 | 0.588131 | 0.812506913
Mus musculus preproenkephalin 1 | 18619 | Penk1 | | NM_001002927 | 0.277487726 | 0.920335884
Mus musculus 5-hydroxytryptamine (serotonin) receptor 7 | 15566 | Htr7 | | NM_008315 | NS | 0.078352547
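A bioset table like the one above mixes numeric scores with the marker "NS". The sketch below shows one way to load such rows and find genes scored in both biosets. It assumes "NS" marks a gene without a reportable score in that bioset, which the slide does not define; the function names are hypothetical.

```python
from typing import List, Optional, Tuple

# (Entrez Gene ID, gene symbol, GenBank accession, Bioset 1 score, Bioset 2 score)
ROWS: List[Tuple[str, str, str, str, str]] = [
    ("12805", "Cntn1", "NM_007727", "0.588131", "0.812506913"),
    ("18619", "Penk1", "NM_001002927", "0.277487726", "0.920335884"),
    ("15566", "Htr7", "NM_008315", "NS", "0.078352547"),
]


def parse_score(text: str) -> Optional[float]:
    """Convert a score cell to a float; 'NS' is treated as missing (assumption)."""
    return None if text.strip().upper() == "NS" else float(text)


def genes_scored_in_all(rows) -> List[str]:
    """Gene symbols that carry a numeric score in every bioset column."""
    return [
        symbol
        for _gene_id, symbol, _accession, *scores in rows
        if all(parse_score(s) is not None for s in scores)
    ]


print(genes_scored_in_all(ROWS))
```

Keying on the Entrez Gene ID rather than the gene name would make it easier to join biosets from different experiments.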
HOST FACTOR BIOSETS

Bioset 1
Bioset Name: Compendium_Digital_signature_by_Fishers_summary_statistic
Description of Bioset: From a compendium of 12 studies that included responses to influenza A subtype H5N1, reconstructed 1918 influenza A virus, and SARS-CoV, we used meta-analysis to derive multiple gene expression signatures.
Bioset Type: Meta-analysis
Name of Application or Analysis Method: Fisher's summary statistic: MADAM (Meta-Analysis Data Aggregation Methods)

Bioset 2
Bioset Name: Microarray_results_flu_calu-3
Description of Bioset: Included RNA isolated from cells infected with influenza A subtype H5N1 at various timepoints
Bioset Type: List of differentially expressed genes
Name of Application or Analysis Method: Inter-array normalization; median-background subtraction
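Bioset 1 is described as derived with Fisher's summary statistic. For readers unfamiliar with it, the sketch below shows the textbook form of Fisher's method for combining independent p-values for one gene across studies. It is not the MADAM implementation, and the input p-values are invented for illustration.

```python
from math import exp, log
from typing import Sequence, Tuple


def fisher_combined_p(p_values: Sequence[float]) -> Tuple[float, float]:
    """Combine independent p-values with Fisher's method.

    Returns (statistic, combined_p), where statistic = -2 * sum(ln p) follows a
    chi-square distribution with 2k degrees of freedom under the null.
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one p-value")
    if any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    statistic = -2.0 * sum(log(p) for p in p_values)
    # For even degrees of freedom 2k the chi-square upper tail has a closed
    # form: exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = statistic / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return statistic, exp(-half) * total


# Hypothetical per-study p-values for a single gene.
statistic, combined = fisher_combined_p([0.05, 0.05])
print(f"statistic={statistic:.3f}, combined p={combined:.4f}")
```

The method assumes the studies are independent; overlapping samples across a compendium would violate that assumption.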
Visualize Protein Network
Run Analysis
ViPR Protein-Protein Interactions and Pathway Visualization
Please choose the type of data that you would like to view relating to your selection(s):
First-Degree (Direct) Interactions
Second-Degree Interactions
Functional Modules
Metabolic Pathways
Visualize / Cancel
Visualize Host Factor Interactions
PROTEIN-PROTEIN INTERACTIONS INFORMATION
Number of “hits” from Interesting Gene List: 1
Number of Nodes: $num_nodes
Number of Edges: $num_edges
Save Analysis
Release Date: Jan 20, 2011
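The first-degree and second-degree interaction options, and the node and edge counts reported for the resulting view, can be illustrated with a breadth-first expansion from the "hit" genes in an interaction graph. This is a sketch under assumptions: the graph, the node labels, and the function names are hypothetical and do not reflect how ViPR computes its networks.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_graph(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build an undirected adjacency map from interaction pairs."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def neighbors_within(graph: Dict[str, Set[str]], seeds: Iterable[str],
                     degree: int) -> Set[str]:
    """Seeds plus every node reachable in at most `degree` interaction steps."""
    seen = set(seeds)
    frontier = set(seeds)
    for _ in range(degree):
        frontier = {n for node in frontier for n in graph.get(node, ())} - seen
        seen |= frontier
    return seen


def induced_edge_count(graph: Dict[str, Set[str]], nodes: Set[str]) -> int:
    """Number of interactions whose two endpoints are both in `nodes`."""
    return sum(1 for a in nodes for b in graph.get(a, ()) if b in nodes and a < b)


# Toy network with placeholder protein labels.
graph = build_graph([("A", "B"), ("B", "C"), ("C", "D"), ("A", "E")])
first = neighbors_within(graph, {"A"}, degree=1)
second = neighbors_within(graph, {"A"}, degree=2)
print(sorted(first), sorted(second), induced_edge_count(graph, second))
```

Second-degree expansion grows quickly on real interaction networks, so a viewer would likely need a cap on the number of nodes displayed.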
Acknowledgments
• U.T. Southwestern
  – Richard Scheuermann (PI)
  – Burke Squires
  – Jyothi Noronha
  – Victoria Hunt
  – Eva Sadat
  – Brett Pickett
  – Yun Zhang
• Vecna
  – Chris Larsen
  – Al Ramsey
• LANL
  – Catherine Macken
  – Mira Dimitrijevic
• U.C. Davis
  – Nicole Baumgarth
• USDA
  – David Suarez
• Sage Analytica
  – Robert Taylor
  – Lone Simonsen
• U. Washington
  – Michael Gale
• Northrop Grumman
  – Ed Klem
  – Mike Atassi
  – Jon Dietrich
  – Patty Berger
  – Jawwad Cheema
  – Zhiping Gu
  – Sherry He
  – Wenjie Hua
  – Wei Jen
  – Sanjeev Kumar
  – Xiaomei Li
  – Jason Lucas
  – Bruce Quesenberry
  – Barbara Rotchford
  – Prabhu Shankar
  – Hongbo Su
  – Bryan Walters
  – Sam Zaremba
  – Liwei Zhou
• U. Washington
  – Lynn Law
  – Richard Green
• IRD SWG
  – Gillian Air, OMRF
  – Carol Cardona, Univ. Minnesota
  – Adolfo Garcia-Sastre, Mt Sinai
  – Elodie Ghedin, Univ. Pittsburgh
  – Martha Nelson, Fogarty
  – Daniel Perez, Univ. Maryland
  – Gavin Smith, Duke Singapore
  – Dave Stallknecht, Univ. Georgia
  – David Topham, Rochester
  – Richard Webby, St Jude
• ViPR SWG
  – Richard Kuhn, Purdue
  – Raul Andino, UCSF
  – Slobodan Paessler, UTMB Galveston
  – X.J. Meng, VBI
  – Colin Parrish, Cornell
  – Elliot Lefkowitz, UAB
  – Carla Kuiken, LANL
  – David Knipe, Harvard
  – Matthew Henn, Broad Institute
  – Richard Whitley, UAB
  – John Young, Salk Institute
N01AI40041
N01AI2008038