
UM/UT Microarray Short Course, May 4, 2006. Functional Gene Clustering by Latent Semantic Indexing of MEDLINE Abstracts. Ramin Homayouni, Ph.D., Department of Neurology, University of Tennessee Health Science Center, Center for Neurobiology of Brain Diseases


Page 1

UM/UT Microarray Short Course, May 4, 2006

Functional Gene Clustering by Latent Semantic Indexing of MEDLINE Abstracts

Ramin Homayouni, Ph.D., Department of Neurology

University of Tennessee Health Science Center

Center for Neurobiology of Brain Diseases

Page 2

Gene Expression Profiling

Alizadeh et al. (2000) Nature 403:503.

Now What?

Page 3

Some Web Resources

NCBI Sites
• OMIM http://www.ncbi.nlm.nih.gov/Literature/index.html
• LocusLink http://www.ncbi.nlm.nih.gov/LocusLink/
• PubMed http://www.ncbi.nlm.nih.gov/entrez/

Others
• HAPI http://array.ucsd.edu/hapi/
• GenMAPP http://www.genmapp.org/
• GO Tree Machine http://genereg.ornl.gov/gotm/
• PubGene http://www.pubgene.org
• Arrowsmith http://arrowsmith.psych.uic.edu/
• Chilibot http://www.chilibot.net/
• iHOP http://www.ihop-net.org/

Page 4

Defining Functional Relationships between Genes

Direct Relationship

Gene relationships already known (e.g., A-B or B-C)

• Term co-occurrence

• Gene symbol: PubGene (Jenssen et al., Nature Genetics 2001 28:21)

• Gene names (synonyms and aliases) – biochemical
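Direct relationships of this kind can be counted straight from the text. A minimal sketch of term co-occurrence, assuming each abstract is a plain string and each gene comes with a hand-made list of names; the abstracts, name lists and function name are hypothetical, and this is not PubGene's implementation. The second call shows why names and aliases matter: matching on the symbol alone misses an abstract that writes "Reelin" rather than "Reln".

```python
import re
from collections import Counter
from itertools import combinations


def cooccurrence_counts(abstracts, gene_names):
    """Count how many abstracts mention each pair of genes together.

    gene_names maps a gene symbol to the strings accepted as a mention of
    that gene (symbol, full name, aliases). Matching is whole-word and
    case-insensitive. (Hypothetical helper for illustration.)
    """
    patterns = {
        gene: re.compile(
            r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b",
            re.IGNORECASE,
        )
        for gene, names in gene_names.items()
    }
    counts = Counter()
    for text in abstracts:
        # Genes mentioned at least once in this abstract, in a stable order.
        present = sorted(g for g, pattern in patterns.items() if pattern.search(text))
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts


# Hypothetical toy abstracts, not real MEDLINE records.
abstracts = [
    "Reelin binding to its receptors induces Dab1 phosphorylation.",
    "Dab1 is phosphorylated by Fyn in migrating neurons.",
    "Cdk5 activity is required for neuronal migration.",
]

with_names = cooccurrence_counts(
    abstracts,
    {"Reln": ["Reln", "reelin"], "Dab1": ["Dab1"], "Fyn": ["Fyn"], "Cdk5": ["Cdk5"]},
)
symbols_only = cooccurrence_counts(abstracts, {"Reln": ["Reln"], "Dab1": ["Dab1"]})

print(with_names[("Dab1", "Reln")])    # 1
print(symbols_only[("Dab1", "Reln")])  # 0: the abstract says "Reelin", not "Reln"
```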

Indirect Relationship

Gene relationships unknown (e.g., A-C)

[Diagram: genes A, B and C; A-B and B-C are known direct relationships, A-C is an indirect one]
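Indirect relationships can be proposed by chaining direct ones: if A relates to B and B relates to C, then A-C is a candidate. This is the reasoning behind Swanson-style literature-based discovery, which Arrowsmith (listed among the web resources) implements. A minimal sketch, assuming the direct relationships are already available as gene pairs; the function name is hypothetical, and this is not the LSI approach presented in this talk.

```python
from collections import defaultdict


def indirect_links(direct_pairs):
    """Propose indirect A-C relationships from known direct ones.

    Returns {(A, C): shared_intermediates} for every pair of genes that
    are not directly linked but share at least one direct partner B.
    (Hypothetical helper for illustration.)
    """
    partners = defaultdict(set)
    for a, b in direct_pairs:
        partners[a].add(b)
        partners[b].add(a)

    genes = sorted(partners)
    candidates = {}
    for i, a in enumerate(genes):
        for c in genes[i + 1:]:
            if c in partners[a]:
                continue  # already a known (direct) relationship
            shared = partners[a] & partners[c]
            if shared:
                candidates[(a, c)] = shared
    return candidates


print(indirect_links([("A", "B"), ("B", "C")]))  # {('A', 'C'): {'B'}}
```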

Page 5

Reelin Signaling Pathway

[Figure: Reelin signaling pathway diagram with Reelin, ApoE, VLDLR, ApoER2, Dab1, fyn, p35, Cdk5, APP, pTau and amyloid plaques]

Page 6

Gene Document Test Set

• Miscellaneous: Trp53, Fos, Nras, Rasa1, Rab1, Src, Notch1, Dll1, Jag1, Robo1, Ptch, Smo

• Reeler: Reln, Dab1, VLDLR, Lrp8

• Alzheimer Disease: APP, Aplp2, Aplp1, Psen1, Psen2, Lrp1, Mapt, Apoe, A2m, Apbb1, Apba1, Cdk5, Cdk5r, Cdk5r2

Page 7

PubGene Query: Dab1 http://www.pubgene.org/

Mouse (times co-occurring with Dab1): Reln 7, Cdk5r 6, Cdk5 5, Gli2 3, Src 3, Dab2 2, Fyn 2, Sam68 1, Cdkn1a 1, Tbr1 1, Gli 1, Scr 1, Shh 1, cdf 1, Ash 1, Dlgh4 1, p80 1, Lck 1, Emx1 1, Pcdh18 1, Agrn 1, Arg2 1

Human (times co-occurring with DAB1): DAB2 3, GAD1 3, RELN 3, GSN 2, TNFSF5 2, HLA-DQA1 1, BAT2 1, GAD2 1

PubMed Query: Dab1 AND Reln = 10

PubMed Query: Dab1 AND reelin = 57!

Page 8

iHOP Query: Dab1 http://www.ihop-net.org/

Page 9

iHOP Query: Dab1; Sentence Structure http://www.ihop-net.org/

Page 10

iHOP Query: Dab1; Network building http://www.ihop-net.org/

Page 11

Vector Space Model: Latent Semantic Indexing

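[Figure: the query and gene documents G1 ... Gx drawn as vectors in a space whose axes are words w1, w2, w3; beside it, a word-by-gene-document matrix with rows W1 ... Wx, columns G1 ... Gx, a query vector, and entries a_ij]

$a_{ij} = l_{ij}\, g_i$

where $l_{ij}$ is the local weight of word $i$ in gene document $j$ and $g_i$ is the global weight of word $i$ across all gene documents.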
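The vector space model can be made concrete with a small matrix. The sketch below uses one common LSI convention, not necessarily the exact one used by the Semantic Gene Organizer: it takes a truncated SVD of the word-by-gene-document matrix, keeps k singular triplets, and compares the query with each gene document by cosine similarity in the k-dimensional space. The word counts are invented, raw counts stand in for the weighted entries a_ij, k = 2 is an arbitrary choice for the toy data, and the function names are hypothetical.

```python
import numpy as np


def cosine_rows(X, y):
    """Cosine similarity between vector y and every row of X."""
    return (X @ y) / (np.linalg.norm(X, axis=1) * np.linalg.norm(y))


def lsi_scores(A, query, k):
    """Score gene documents against a query with a rank-k LSI model.

    A     : words x gene-documents matrix of (weighted) entries a_ij
    query : word vector with the same rows as A
    k     : number of singular triplets kept
    Documents and query are compared in the basis of the first k left
    singular vectors: U_k^T a_j = S_k v_j and U_k^T q.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    doc_coords = Vt[:k, :].T * s[:k]   # row j is S_k v_j
    query_coords = U[:, :k].T @ query  # U_k^T q
    return cosine_rows(doc_coords, query_coords)


# Hypothetical word counts; rows: reelin, dab1, migration, amyloid, tau
# columns: gene documents for Reln, Dab1, App, Mapt
A = np.array([
    [2, 0, 0, 0],
    [1, 2, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 2],
], dtype=float)
query = np.array([1, 0, 0, 0, 0], dtype=float)  # the single word "reelin"

print(np.round(cosine_rows(A.T, query), 3))    # raw word space
print(np.round(lsi_scores(A, query, k=2), 3))  # rank-2 LSI space
```

In the raw word space the Dab1 document scores 0 against the query "reelin" because it never contains that word; in the rank-2 space it scores close to 1, because it shares "dab1" and "migration" with the Reln document. This kind of implicit association is what LSI adds over exact term matching.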

Page 12

Semantic Gene Organizer© User Interface

Page 13

Reelin Accession # Query

Page 14

Reelin Keyword Query

Page 15

50-Gene Document Collection

[Venn diagram: the 50 genes divided among Development, Cancer and Alzheimer categories, with region counts 15, 11, 5, 16 and 3]

Page 16

Hierarchical Tree

[Figure: hierarchical tree of the gene documents, with branches labeled Development, Cancer, Alzheimer and Development]
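The slide shows the resulting tree but not how it was built. A minimal sketch of agglomerative hierarchical clustering, assuming average linkage on cosine distance (both are assumptions; the talk does not name the linkage or distance) and using invented three-dimensional vectors as stand-ins for gene-document coordinates in LSI space. The function names are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length numeric sequences."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def average_linkage(vectors, labels):
    """Agglomerative clustering with average linkage on cosine distance.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a tuple of labels; read bottom-up, the history
    is the hierarchical tree.
    """
    vec = dict(zip(labels, vectors))

    def distance(ca, cb):
        # Mean pairwise cosine distance between members of the two clusters.
        total = sum(1 - cosine(vec[a], vec[b]) for a in ca for b in cb)
        return total / (len(ca) * len(cb))

    clusters = [(label,) for label in labels]
    merges = []
    while len(clusters) > 1:
        ca, cb = min(combinations(clusters, 2), key=lambda pair: distance(*pair))
        merges.append((ca, cb, distance(ca, cb)))
        clusters.remove(ca)
        clusters.remove(cb)
        clusters.append(ca + cb)
    return merges


# Hypothetical gene-document vectors (stand-ins for LSI coordinates).
labels = ["Reln", "Dab1", "App", "Psen1"]
vectors = [[1.0, 0.9, 0.0], [0.9, 1.0, 0.1], [0.0, 0.1, 1.0], [0.1, 0.0, 0.9]]

for a, b, d in average_linkage(vectors, labels):
    print(a, "+", b, f"at distance {d:.3f}")
```

With these vectors Reln and Dab1 merge first, then App and Psen1, and finally the two groups join.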

Page 17

Unrooted Tree (Graph)

Page 18

Variation in Abstract Representation

Reduce Noise

Page 19

Abstract References in LocusLink

Page 20

Gene symbols and names that are not used in the literature

Increase Representation

Page 21

Alternate Names and Aliases

Page 22

Log-entropy Term Weighting

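[Figure: word-by-gene-document matrix with rows W1 ... Wx, columns G1 ... Gx, a query vector, and entries a_ij]

$a_{ij} = l_{ij}\, g_i$, with the local weight $l_{ij}$ and the global weight $g_i$ computed by log-entropy weighting.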
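The slide names the weighting scheme but not its formulas. A minimal sketch, assuming the standard log-entropy scheme from the LSI literature: the local weight is log2(1 + f_ij), and the global weight is one plus the normalized entropy term of the word's distribution over gene documents. A word concentrated in a single gene document keeps a global weight of 1, while a word spread evenly over all documents gets 0. The count matrix and function name are hypothetical.

```python
import math


def log_entropy(counts):
    """Log-entropy weighting of a words x gene-documents count matrix.

    counts[i][j] is the raw frequency f_ij of word i in gene document j.
    Returns a_ij = l_ij * g_i with
        l_ij = log2(1 + f_ij)
        g_i  = 1 + sum_j p_ij * log2(p_ij) / log2(n),   p_ij = f_ij / sum_j f_ij
    where n (at least 2) is the number of gene documents.
    """
    n = len(counts[0])
    weighted = []
    for row in counts:
        total = sum(row)
        # Zero counts are skipped (0 * log 0 is taken as 0).
        entropy_term = sum((f / total) * math.log2(f / total) for f in row if f > 0)
        g = 1 + entropy_term / math.log2(n)
        weighted.append([math.log2(1 + f) * g for f in row])
    return weighted


# Hypothetical counts over three gene documents.
counts = [
    [3, 0, 0],  # concentrated in one document: global weight 1
    [1, 1, 1],  # spread evenly: global weight 0
    [2, 2, 0],  # in between
]
for row in log_entropy(counts):
    print([round(w, 3) for w in row])
```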

Page 23

Top Terms in Gene Document

reelin (4.0323), reeler (3.7762), positioning (1.9135), lissencephaly (1.8491), schizophrenia (1.7113), apoer2 (1.5637), cr (1.5544), esophageal (1.5339), dab1 (1.5118), vldlr (1.4973), carcinoma (1.4881), wild-type (1.4862), cask (1.4288), psychiatric (1.4266), apoe (1.3739), positioned (1.3726)


Page 24

Abstract retrieval by combining weighted terms in gene name, symbol or aliases

Query    Description                                        # abstracts
symbol   Cdk5r2                                             0
alias    p39                                                70
name     cyclin-dependent kinase 5, regulatory subunit 2    0
c1       p39 AND cdk5                                       18
c2       p39 AND cyclin-dependent                           17
c3       p39 AND kinase                                     24
c4       p39 AND cdk5 AND cyclin-dependent                  17
c5       p39 AND cdk5 AND cyclin-dependent AND kinase       17

[Venn diagram: overlap of the abstracts returned by the alias query, c1 and c3; region counts 53, 17, 1 and 7]
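Each combination c1 to c5 is a Boolean AND query, so adding a term can only keep or shrink the result set of the alias query on its own. A minimal sketch over a hypothetical in-memory inverted index, not PubMed (whose query parsing and automatic term mapping are more involved), showing how such queries and the overlaps between their result sets reduce to set operations. The index contents and function name are invented for illustration.

```python
def and_query(index, terms):
    """Boolean AND: IDs of abstracts that contain every term.

    index maps a lower-cased term to the set of abstract IDs containing it.
    (Hypothetical helper for illustration.)
    """
    postings = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*postings) if postings else set()


# Hypothetical toy inverted index (term -> abstract IDs), not PubMed data.
index = {
    "p39": {1, 2, 3, 4, 5, 6},
    "cdk5": {1, 2, 3, 7},
    "kinase": {1, 2, 4, 8},
}

alias = and_query(index, ["p39"])
c1 = and_query(index, ["p39", "cdk5"])
c3 = and_query(index, ["p39", "kinase"])

print(sorted(c1 & c3))                   # [1, 2]: returned by both c1 and c3
print(sorted(c1 - c3), sorted(c3 - c1))  # [3] [4]
print(sorted(alias - (c1 | c3)))         # [5, 6]: alias hits not kept by c1 or c3
```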

Page 25

Weighted PubMed Queries

[Figure: results of weighted PubMed queries for under-represented genes (Cdk5r2, Lrp8, Atoh1, Cdk5r) and over-represented genes (kit, egfr, fos, myc)]

Page 26

Weighted Query Algorithm

Gene symbol, gene name and gene aliases → combination of highest-weighted terms → extract overlapping abstracts

RESULTS: a 2- to 59-fold increase in the number of abstracts associated with genes, compared with those referenced in LocusLink (LL)

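The slide lists the steps without parameters. One plausible reading, sketched under stated assumptions: AND an anchor term (here the alias p39, as in the Cdk5r2 example) with every combination of the next highest-weighted terms, run each query, and keep abstracts returned by at least two of them. The term weights, result sets, anchor choice, top_n, min_queries and function names are all hypothetical; in practice the weights would come from log-entropy weighting and the result sets from PubMed.

```python
from collections import Counter
from itertools import combinations


def build_queries(term_weights, anchor, top_n=3):
    """AND the anchor term with every non-empty combination of the top_n
    highest-weighted remaining terms (hypothetical helper)."""
    others = sorted(
        (t for t in term_weights if t != anchor),
        key=lambda t: term_weights[t],
        reverse=True,
    )[:top_n]
    return [
        " AND ".join((anchor,) + combo)
        for r in range(1, len(others) + 1)
        for combo in combinations(others, r)
    ]


def overlapping_abstracts(results, min_queries=2):
    """Keep abstract IDs returned by at least min_queries different queries."""
    votes = Counter(pmid for ids in results.values() for pmid in set(ids))
    return {pmid for pmid, n in votes.items() if n >= min_queries}


# Hypothetical weights for terms drawn from the Cdk5r2 symbol, name and aliases.
weights = {"p39": 2.1, "cdk5": 1.8, "cyclin-dependent": 1.2, "kinase": 0.6, "subunit": 0.3}
queries = build_queries(weights, anchor="p39")
print(queries[0], "|", queries[-1])
# p39 AND cdk5 | p39 AND cdk5 AND cyclin-dependent AND kinase

# Hypothetical result sets (abstract IDs) for three of the queries.
results = {
    "p39 AND cdk5": {1, 2, 3},
    "p39 AND kinase": {1, 2, 4},
    "p39 AND cyclin-dependent": {2, 5},
}
print(sorted(overlapping_abstracts(results)))  # [1, 2]
```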

Page 27

Summary and Conclusions

Log-entropy weighting identifies descriptive or ‘useful’ aliases for genes.

Weighted PubMed Querying increases abstracts for under-represented genes and decreases abstracts for over-represented genes with high specificity.

This automated method increases the number of abstracts assigned to genes 2- to 59-fold over those assigned by LocusLink indexers.

Page 28

SGO overview

[Figure: Semantic Gene Organizer (SGO) workflow comparing (Vs.) gene documents built from PMID citations in LocusLink with gene documents built from PubMed abstracts after search term refinement; each path shows GeneDoc, word x gene doc matrix, word weights, gene descriptor, pairwise score and clustering]

Page 29

Acknowledgments

UT Memphis

Neurology

Lijing Xu, M.S.

Lai Wei, M.D.

Molecular Sciences

Yan Cui, Ph.D.

Mi Zhou, M.S.

UT Knoxville

Computer Science

Michael Berry, Ph.D.

Kevin Heinrich

Center for Neurobiology of Brain Diseases