Strategies & Examples for Functional Modeling
COST Functional Modeling Workshop, 22-24 April, Helsinki
Types of data sets and modeling
• Commercial array data: more likely to have tools that support the use of array IDs.
• Custom/USDA array data: problems with updating IDs, linking to function and using array IDs directly in functional modeling tools.
• Proteomics data: larger data sets; need to make background references to determine enrichment.
• RNA-Seq data: larger and more complex data sets; novel transcripts currently can't be included in modeling (contact AgBase to assign GO).
• Real-time data or quantitative proteomics data: hypothesis testing.
Functional Modeling Strategies
1. GO summary (using Slim sets)
2. GO enrichment (statistical!)
3. Pathways analysis
4. Interaction or networks analysis
5. Hypothesis testing
Note:
• Functional modeling should be integrated.
• Approaches are complementary, not exclusive.
• Modeling is driven by the biology (not the other way round).
Modeling Strategy
• Think about using multiple functional approaches.
  • GO, pathways, networks
  • complementary
• What is available for your species?
  • What GO is available?
  • What species does the pathways/network analysis use?
• What resources do you have?
  • at your institute (e.g. commercial pathways analysis)
  • open source (e.g. GO enrichment analysis)
  • using online vs installed
• Iterative: further functional modeling based on initial results
  • GO hypothesis testing?
1. GO Functional Summary
• High-throughput data sets give us 1000s to 10,000s of gene products.
  • can't know everything about all gene products
  • tendency to 'cherry pick' ones you recognize
• Instead, can group gene products by function.
  • this gives us a manageable number of categories to process
  • enables us to see trends, patterns, etc.
• Use GO Slim sets to 'summarize' data.
  • Lose details (but can gain perspective).
  • Some GO Slim sets are ageing: not being updated as changes to the GO are made.
  • Different Slim sets have different terms: which is best for your data?
AgBase GOSlimViewer tool: http://www.agbase.msstate.edu/help/slimviewerhelp.htm
The Slim set you use matters: need to determine which one to use & report it in Methods.
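A GO Slim summary comes down to mapping each detailed GO annotation to a broader slim term and counting gene products per category. The following is a minimal sketch in Python, not how GOSlimViewer itself works: it assumes a ready-made lookup from detailed term to slim category (a real tool derives this from the ontology structure and the chosen Slim set), and the gene names and the lookup are hypothetical illustration data.

```python
from collections import defaultdict

# Hypothetical lookup from detailed GO term name to slim category.
# A real mapping is derived from the ontology graph and the chosen Slim set.
TERM_TO_SLIM = {
    "T cell activation": "immune response",
    "cytokine production": "immune response",
    "mitotic cell cycle": "cell cycle",
    "DNA replication": "cell cycle",
}


def slim_summary(annotations, term_to_slim):
    """Count gene products per slim category.

    annotations: dict of gene product -> list of detailed GO term names.
    A gene product is counted once per category, however many of its
    terms map there; terms missing from the lookup are skipped.
    """
    members = defaultdict(set)
    for gene, terms in annotations.items():
        for term in terms:
            slim = term_to_slim.get(term)
            if slim is not None:
                members[slim].add(gene)
    return {slim: len(genes) for slim, genes in members.items()}


# Hypothetical data set of three gene products.
example = {
    "geneA": ["T cell activation", "cytokine production"],
    "geneB": ["mitotic cell cycle"],
    "geneC": ["DNA replication", "T cell activation"],
}
summary = slim_summary(example, TERM_TO_SLIM)
print(summary)
```

Swapping in a different lookup changes the categories and the counts, which is why the Slim set used needs to be reported.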
Functional Summary
• Not all GO terms are annotated equally, e.g., metabolism!
  • can slim the complete GO for a species as a background set and then determine which terms in your data are disproportionately represented.
• Can use Slims to compare two data sets (e.g., control vs treatment).
• Use Slims for your own sanity: are you seeing what you expect to see?
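Comparing a data set against a background (or control against treatment) can start with the proportion of gene products in each slim category. The sketch below is descriptive only, not a statistical test; the category counts are hypothetical and the function names are made up for illustration.

```python
def slim_proportions(counts):
    """Convert per-category counts into fractions of the total."""
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def ratio_to_background(data_counts, background_counts):
    """Ratio of data-set proportion to background proportion per category.

    A ratio above 1 means the category takes up a larger share of the data
    set than of the background. Categories absent from the background are
    skipped because no ratio can be formed for them.
    """
    data = slim_proportions(data_counts)
    background = slim_proportions(background_counts)
    return {
        category: data.get(category, 0.0) / share
        for category, share in background.items()
        if share > 0
    }


# Hypothetical slim counts: whole-species background vs. an experimental list.
background_counts = {"metabolism": 500, "immune response": 100, "cell cycle": 400}
data_counts = {"metabolism": 20, "immune response": 20, "cell cycle": 10}

ratios = ratio_to_background(data_counts, background_counts)
for category, ratio in sorted(ratios.items()):
    print(f"{category}: {ratio:.2f}")
```

The same function compares control against treatment if the control counts are passed as the background.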
[Figure: membrane proteins grouped by GO BP, B-cells vs stroma. Categories shown: ion/proton transport, cell migration, cell adhesion, cell growth, apoptosis, immune response, cell cycle/cell proliferation, cell-cell signaling, function unknown, development, endocytosis, proteolysis and peptidolysis, protein modification, signal transduction.]
BVDV infection: cytopathic (CP) vs non-cytopathic (NCP) infection (comparing function between 2 different conditions)
2. Determining over-represented or under-represented function
• most typically used functional analysis method
• many, many tools that do this; see: http://www.geneontology.org/GO.tools.microarray.shtml
• visualization differs widely between tools
• will use some of these tools in practical session
http://david.abcc.ncifcrf.gov/home.jsp
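Over-representation of a GO term is commonly scored with a hypergeometric tail probability (the one-sided form of Fisher's exact test): given a background of N gene products of which K carry the term, how likely is it to see k or more of them in a study list of size n? The sketch below is a minimal illustration of that calculation, not the method of any particular tool listed here; the numbers in the example are made up, and real tools usually add a multiple-testing correction.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """Upper-tail hypergeometric probability P(X >= k).

    N: gene products in the background set
    K: background gene products annotated to the GO term
    n: gene products in the study list
    k: study-list gene products annotated to the GO term
    """
    total = comb(N, n)
    tail = sum(
        comb(K, i) * comb(N - K, n - i)
        for i in range(k, min(n, K) + 1)
    )
    return tail / total


# Hypothetical example: 20 background gene products, 5 annotated to the term,
# a study list of 5, of which 3 are annotated to the term.
p = enrichment_p_value(k=3, n=5, K=5, N=20)
print(f"p = {p:.4f}")
```

The choice of background matters as much as the test: the same list gives a different p-value against the whole genome than against only the gene products that could have been detected on the platform.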
Some useful expression analysis tools:
• Database for Annotation, Visualization and Integrated Discovery (DAVID)
  • http://david.abcc.ncifcrf.gov/
• AgriGO: GO Analysis Toolkit and Database for Agricultural Community
  • http://bioinfo.cau.edu.cn/agriGO/
  • used to be EasyGO
  • chicken, cow, pig, mouse, cereals, dicots
  • adding new species by request
• Onto-Express
  • http://vortex.cs.wayne.edu/projects.htm#Onto-Express
  • can provide your own gene association file
• Ontologizer
  • WebStart widget (requires Java); now on Galaxy
  • http://compbio.charite.de/contao/index.php/ontologizer2.html
  • requires OBO file & GAF (enables users to select their own annotations)
GO Enrichment tools that support agricultural species.
• structurally and functionally re-annotated a microarray
• quantified the impact of this re-annotation based on GO annotations & pathways represented on the array
• tested using a previously published experiment that used this microarray
• re-annotation allows more comprehensive GO-based modeling and improves pathway coverage
• re-annotation resulted in a different model from previously published research findings
Evaluating GO tools
Some criteria for evaluating GO tools:
1. Does it include my species of interest (or do I have to "humanize" my list)?
2. What does it require to set up (computer usage/online)?
3. What was the source for the GO (primary or secondary) and when was it last updated?
4. Does it report the GO evidence codes (and is IEA included)?
5. Does it report which of my gene products have no GO?
6. Does it report both over- and under-represented GO groups, and how does it evaluate this?
7. Does it allow me to add my own GO annotations?
8. Does it represent my results in a way that facilitates discovery?
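Criteria 4 and 5 can be checked directly against a gene association file before trusting a tool's output. The sketch below assumes tab-delimited GAF rows, where column 3 is the gene product symbol, column 5 the GO ID and column 7 the evidence code; the function name is made up, and the rows are invented and truncated to the first seven columns for illustration.

```python
def summarize_go_coverage(gaf_lines, gene_list, exclude_evidence=()):
    """Split a gene list into gene products with and without GO.

    Returns (with_go, without_go): with_go maps each annotated symbol to a
    set of (GO ID, evidence code) pairs; without_go lists the rest.
    Annotations whose evidence code is in exclude_evidence are ignored.
    """
    annotated = {}
    for line in gaf_lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and GAF header/comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 7:
            continue  # not a usable annotation row
        symbol, go_id, evidence = cols[2], cols[4], cols[6]
        if evidence in exclude_evidence:
            continue
        annotated.setdefault(symbol, set()).add((go_id, evidence))
    with_go = {g: annotated[g] for g in gene_list if g in annotated}
    without_go = [g for g in gene_list if g not in annotated]
    return with_go, without_go


# Invented rows; the database name, IDs, symbols and references are placeholders.
GAF_ROWS = [
    "!gaf-version: 2.0",
    "ExampleDB\tX0001\tGENE1\t\tGO:0006955\tREF:placeholder\tIDA",
    "ExampleDB\tX0002\tGENE2\t\tGO:0007049\tREF:placeholder\tIEA",
]
genes = ["GENE1", "GENE2", "GENE3"]

all_go, missing_all = summarize_go_coverage(GAF_ROWS, genes)
no_iea, missing_no_iea = summarize_go_coverage(GAF_ROWS, genes, exclude_evidence=("IEA",))
print(missing_all)     # gene products with no GO at all
print(missing_no_iea)  # gene products with no GO once IEA is excluded
```

Running it with and without IEA shows how much of the list depends on electronically inferred annotation.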
RNASeq GO Enrichment
• RNASeq experiments: longer transcripts and more highly expressed transcripts are more likely to be detected as differentially expressed (DE).
• Current GO enrichment tools do not account for RNASeq platform bias (most are based upon arrays).
  • they assume that all genes are independent and equally likely to be selected as DE
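A quick way to see whether this length bias is present in your own results is to compare the fraction of DE calls among shorter and longer transcripts. The sketch below is only a diagnostic under simple assumptions (a split at the median length, made-up lengths and DE calls); it does not correct the enrichment statistics.

```python
from statistics import median


def fraction_true(flags):
    """Fraction of True values in a list; 0.0 for an empty list."""
    return sum(flags) / len(flags) if flags else 0.0


def de_fraction_by_length(transcripts):
    """Compare DE rates below and above the median transcript length.

    transcripts: list of (length_in_bases, is_de) tuples.
    Returns (fraction DE among lengths <= median, fraction DE among longer).
    """
    cutoff = median(length for length, _ in transcripts)
    shorter = [is_de for length, is_de in transcripts if length <= cutoff]
    longer = [is_de for length, is_de in transcripts if length > cutoff]
    return fraction_true(shorter), fraction_true(longer)


# Hypothetical transcripts: (length, called DE?).
transcripts = [
    (500, False), (800, False), (1200, True),
    (3000, True), (4500, True), (6000, False),
]
short_rate, long_rate = de_fraction_by_length(transcripts)
print(f"shorter half: {short_rate:.2f}, longer half: {long_rate:.2f}")
```

A clearly higher rate in the longer half suggests that GO terms dominated by long genes may look enriched for technical rather than biological reasons.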
3. Pathway Analysis
• Freely available resources:
  • pathways from public databases, e.g. KEGG & Reactome
  • freely available tools, e.g. Cytoscape
• Commercial pathways analysis tools: e.g., Ingenuity Pathways Analysis (IPA), Pathway Studio, etc.
  • some tools only cover a limited set of species: need to "humanize" animal data (and do the equivalent for plants with Arabidopsis)
  • everything gives you cancer
• Many pathways analysis tools combine pathways analysis and network analysis.
Reactome Skypainter: http://www.reactome.org/cgi-bin/skypainter2
KEGG Pathways: http://www.kegg.jp/kegg/download/kegtools.html
Analysis tools (commercial)
• Ingenuity Pathway Analysis (http://www.ingenuity.com)
  • Networks
  • Pathways
  • Functions and diseases
• Pathway Studio (http://www.ariadnegenomics.com/)
  • Gene Ontology (GO) groups
  • GSEA
  • Pathways
IPA analysis included as IPA.txt
Data Curation
• Ingenuity: database manually curated by Ph.D.-level scientists (mining 32 different peer-reviewed journals).
• Pathway Studio: automated curation by MedScan Reader using natural language processing (NLP) technology, mining PubMed abstracts and peer-reviewed journals.
  • users can do their own text mining
(Comparison by Divya Peddinti)
Comparison Criteria
• Features
• Proportion of proteins involved in modeling
• Data generation
• Display
• Test dataset: 3,600 bovine spermatozoa proteins
Feature comparison: Ingenuity Pathway Analysis (IPA) vs Pathway Studio
• Input
  • IPA: GI number, microarray ID, Affymetrix ID, GenBank, Swiss-Prot accession, UniGene ID, name or alias, HUGO ID
  • Pathway Studio: Entrez Gene, GenBank, microarray ID, Swiss-Prot accession, UniGene ID, name or alias, HUGO ID
• Databases
  • IPA: contains biological interactions data for human, mouse, rat. Orthologous mapping available for dog, cow, chimp, chicken, rhesus macaque monkey, Arabidopsis thaliana, Saccharomyces cerevisiae, Drosophila melanogaster, Caenorhabditis elegans, Danio rerio.
  • Pathway Studio: contains biological data for human, mouse, rat, bacteria, chicken, zebrafish, frog, cow, bee, dog, Arabidopsis, Drosophila, yeast, and transplantation research etc.
• Statistical test
  • IPA: the significance value (p-value) is assigned to the function/pathways using Fisher's exact test.
  • Pathway Studio: the statistical significance of the overlap between the protein list and a GO group or pathway is assessed using Fisher's exact test.
• Updates
  • IPA: quarterly
  • Pathway Studio: quarterly
• Networks
  • IPA: builds networks with a maximum of 35 genes/proteins
  • Pathway Studio: -
Proteins involved in modeling
[Figure: stacked bar chart (0-120 scale) for Ingenuity and Pathway Studio, with series "Proteins involved in modeling" and "Proteins not involved in modeling". Values shown: 57.5, 99.85, 42.5, 0.15.]
Data generation
[Figure: number of pathways (0-50 scale) for Ingenuity Pathway Analysis and Pathway Studio; values shown: 44 and 33. Overlap diagram values: 37, 7, 26.]
Pathway display: EGF signaling pathway
4. Network Analysis
• IPA & Pathway Studio are equally efficient at drawing networks of relationships.
• IPA: simplifies the pathway display and creates a more manageable, user-friendly network for users to analyze.
• Pathway Studio: shows the relations in a table format.
• STRING database: known and predicted protein interactions. http://string-db.org/
• Cytoscape: http://www.cytoscape.org/
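Whatever tool draws it, a network analysis starts from a list of pairwise interactions. The sketch below builds an undirected network from such a list and ranks the most connected gene products. The record layout (protein A, protein B, confidence score on a 0-1 scale), the protein names and the function names are all assumptions for illustration; this is not the STRING file format or any tool's API.

```python
from collections import defaultdict


def build_network(interactions, min_score=0.0):
    """Build an undirected adjacency map from (a, b, score) records.

    Interactions scoring below min_score and self-interactions are dropped.
    """
    adjacency = defaultdict(set)
    for a, b, score in interactions:
        if score >= min_score and a != b:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def top_hubs(adjacency, top=3):
    """Return the most connected nodes, ties broken alphabetically."""
    ranked = sorted(adjacency, key=lambda node: (-len(adjacency[node]), node))
    return ranked[:top]


# Hypothetical interaction records.
interactions = [
    ("P1", "P2", 0.90),
    ("P1", "P3", 0.80),
    ("P1", "P4", 0.40),
    ("P1", "P5", 0.60),
    ("P2", "P3", 0.70),
    ("P4", "P5", 0.95),
]
network = build_network(interactions, min_score=0.5)
print(top_hubs(network, top=2))
```

Raising or lowering the score cut-off changes which gene products look like hubs, so the threshold used is worth reporting alongside the network.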
5. Hypothesis Testing
• high-throughput data sets: 'fishing expedition' or hypothesis generation
• but GO also serves as a repository of biological function: can be used for hypothesis testing based on these data sets
[Figure: mean total lesion score (y axis, 0-18) against days post infection (x axis, 0-100) by genotype: Susceptible (L72) and Resistant (L61).]
Non-MHC associated resistance and susceptibility
The critical time point in MD lymphomagenesis
Hypothesis: At the critical time point of 21 dpi, MD-resistant genotypes have a T-helper (Th)-1 microenvironment (consistent with CTL activity), but MD-susceptible genotypes have a T-reg or Th-2 microenvironment (antagonistic to CTL).
Cytokines and T helper cell differentiation (Shyamesh Kumar)
[Figure: differentiation of a naive CD4+ T cell into Th-1, Th-2 and T-reg cells. Cells shown: APC, macrophage, NK cell, CTL. Signals shown: IFN γ, IL 12, IL 18, IL 4, IL 10, TGFβ, Smad 7.]
[Figure: samples L6 Whole, L7 Whole and L7 Micro. Question posed: Th-1, Th-2, T-reg? Inflammatory?]
Step I. GO-based phenotype scoring.

Gene product   Th1   Th2   Treg   Inflammation
IL-2             1    ND      1    -1
IL-4            -1     1      1    ND
IL-6                   1     -1     1
IL-8            ND    ND      1     1
IL-10           -1     1      1     0
IL-12            1    -1     ND    ND
IL-13           -1     1     ND    ND
IL-18            1     1      1     1
IFN-g            1    -1      1     1
TGF-b           -1     0      1    -1
CTLA-4          -1    -1      1    -1
GPR-83          -1    -1      1    -1
SMAD-7           1     1     -1     1

ND = No data. IL-6 has no Th1 entry in the source table.

Step II. Multiply by quantitative data for each gene product.

Step III. Inclusion of quantitative data in the phenotype scoring table and calculation of net effect.

Gene product    Th1     Th2     Treg    Inflammation
IL-2            1.58            1.58   -1.58
IL-4            0.00    0.00    0.00    0.00
IL-6            0.00   -1.20    1.20   -1.20
IL-8            0.00    0.00    1.18    1.18
IL-10           0.00    0.00    0.00    0.00
IL-12           0.00    0.00    0.00    0.00
IL-13           1.51   -1.51    0.00    0.00
IL-18           0.91    0.91    0.91    0.91
IFN-g           0.00    0.00    0.00    0.00
TGF-b          -1.71    0.00    1.71   -1.71
CTLA-4         -1.89   -1.89    1.89   -1.89
GPR-83         -1.69   -1.69    1.69   -1.69
SMAD-7          0.00    0.00    0.00    0.00
Net Effect     -1.29   -5.38   10.15   -5.98

IL-2 has no Th2 value in the source table (ND in Step I).
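The three scoring steps can be sketched in a few lines of Python. The +1/-1/0/ND scores are read from the Step I table; the quantitative value per gene product is not listed on the slides, so the values below are back-calculated from the Step III table (score × value = table entry) and should be treated as an assumption. ND and blank cells are treated as contributing nothing.

```python
PHENOTYPES = ("Th1", "Th2", "Treg", "Inflammation")

# Step I: GO-based scores per phenotype; None stands for ND or a blank cell.
SCORES = {
    "IL-2":   (1, None, 1, -1),
    "IL-4":   (-1, 1, 1, None),
    "IL-6":   (None, 1, -1, 1),
    "IL-8":   (None, None, 1, 1),
    "IL-10":  (-1, 1, 1, 0),
    "IL-12":  (1, -1, None, None),
    "IL-13":  (-1, 1, None, None),
    "IL-18":  (1, 1, 1, 1),
    "IFN-g":  (1, -1, 1, 1),
    "TGF-b":  (-1, 0, 1, -1),
    "CTLA-4": (-1, -1, 1, -1),
    "GPR-83": (-1, -1, 1, -1),
    "SMAD-7": (1, 1, -1, 1),
}

# Step II input: one quantitative value per gene product.
# ASSUMPTION: back-calculated from the Step III table, not given on the slides.
VALUES = {
    "IL-2": 1.58, "IL-4": 0.0, "IL-6": -1.20, "IL-8": 1.18, "IL-10": 0.0,
    "IL-12": 0.0, "IL-13": -1.51, "IL-18": 0.91, "IFN-g": 0.0,
    "TGF-b": 1.71, "CTLA-4": 1.89, "GPR-83": 1.69, "SMAD-7": 0.0,
}


def net_effect(scores, values):
    """Steps II and III: multiply each score by the gene product's value
    and sum the products per phenotype. Missing scores are skipped."""
    totals = [0.0] * len(PHENOTYPES)
    for gene, row in scores.items():
        value = values.get(gene, 0.0)
        for i, score in enumerate(row):
            if score is not None:
                totals[i] += score * value
    return dict(zip(PHENOTYPES, totals))


result = net_effect(SCORES, VALUES)
for phenotype in PHENOTYPES:
    print(f"{phenotype}: {result[phenotype]:.2f}")
```

With these values the net effects for Th1, Th2 and Inflammation match the table (-1.29, -5.38, -5.98); Treg sums to 10.16 rather than the 10.15 shown, presumably because the displayed values are rounded.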
[Figure: net effect (y axis, -20 to 60) by phenotype: Th-1, Th-2, T-reg, Inflammation.]
[Figure: microscopic lesions (5 mm scale) in L6 (R) and L7 (S), with balance diagrams for L6 Resistant and L7 Susceptible. Labels shown: Pro T-reg, Pro Th-1, Anti Th-1, Pro Th-2, Anti Th-2, Pro CTL, Anti CTL.]
Concluding thoughts on functional modeling.
“By doing just a little every day, I can gradually let the task overwhelm me.”
Ashleigh Brilliant
Bringing it all together…
• There is no one “correct” way; there is no “right” answer.
• Using multiple functional modeling strategies (e.g., GO, pathways, networks) can help with insights.
• Need to use biological knowledge to bring these different approaches together.
• Functional modeling is often iterative.
• Need to focus not only on what is known but what is new!
Overview of Functional Modeling Strategy
[Flowchart: yellow boxes represent AgBase tools; green boxes are non-AgBase resources.]
• Data types: Microarrays, Proteomics, RNASeq
• Other boxes shown: ArrayIDer, Genome2seq, Blast2GO, Protein/Gene identifiers, GORetriever, GO annotations, Genes/Proteins with no GO annotations, GOanna, GOSlimViewer, AutoSlim
• GO Enrichment analysis: Ingenuity Pathways Analysis (IPA), Pathway Studio, Cytoscape, DAVID, AgriGO, Onto-tools
• Pathways and network analysis: Ingenuity Pathways Analysis (IPA), Pathway Studio, Cytoscape, DAVID
Functional Modeling Considerations
• Should I add my own GO?
  • use GOProfiler to see how much GO is available for your species
  • use GORetriever to find existing GO for your dataset
  • Does the analysis tool allow me to add my own GO?
• Should I do GO analysis and pathway analysis and network analysis?
  • different functional modeling methods show different aspects of your data (complementary)
  • is this type of data available for your species (or a close ortholog)?
• What tools should I use?
  • which tools have data for your species of interest?
  • what type of accessions are accepted?
  • availability (commercial and freely available)
Some Limitations
• Annotation is not complete.
  • not all the data is annotated
  • some gene products have no functional information
• Gene Ontology is only one aspect of functional modeling.
  • anatomy, tissue expression, phenotype, disease, etc.
• Gene nomenclature: need to know what we are annotating!
• Functional modeling tools need to handle larger data sets (& multiple ontologies?).