Recommendation Engines for Scientific Literature
DESCRIPTION
I gave this talk at the Workshop on Recommender Engines@TUG (http://bit.ly/yuxrAM) on 2012/12/19. It presents a selection of algorithms and experimental data that are commonly used in recommending scientific literature. Real-world results from Mendeley's article recommendation system are also presented. The work presented here has been partially funded by the European Commission as part of the TEAM IAPP project (grant no. 251514) within the FP7 People Programme (Marie Curie).
TRANSCRIPT
Recommendation Engines for Scientific
Literature
Kris Jack, PhD, Data Mining Team Lead
➔ 2 recommendation use cases
➔ literature search with Mendeley
➔ use case 1: related research
➔ use case 2: personalised recommendations
Summary
Use Cases
Two types of recommendation use cases:
1) Related Research
● given 1 research article
● find other related articles
2) Personalised Recommendations
● given a user's profile (e.g. interests)
● find articles of interest to them
My secondment (Dec-Feb):
Literature Search Using Mendeley
● Use only Mendeley to perform literature search for:
● Related research
● Personalised recommendations
Challenge!
Eating your own dog food...
Queries: “content similarity”, “semantic similarity”, “semantic relatedness”, “PubMed related articles”, “Google Scholar related articles”
Found: 0 → 1 → 2 → 4 (running total as each search strategy is tried)
Literature Search Using Mendeley
Summary of Results
Strategy              | Num Docs Found | Comment
Catalogue Search      | 19             | 9 from “Related Research”
Group Search          | 0              | Needs work
Perso Recommendations | 45             | Led to a group with 37 docs!
Found: 64
Literature Search Using Mendeley
Summary of Results
Eating your own dog food... Tastes good!
64 => 31 docs, read 14 so far, so what do they say...?
Use Cases
1) Related Research
● given 1 research article
● find other related articles
Use Case 1: Related Research
User study (e.g. Likert scale to rate relatedness between documents). (Beel & Gipp, 2010)
TREC collections with hand classified 'related articles' (e.g. TREC 2005 genomics track). (Lin & Wilbur, 2007)
Try to reconstruct a document's reference list (Pohl, Radlinski, & Joachims, 2007; Vellino, 2009) (see the sketch after this slide)
7 highly relevant papers (related research for scientific articles)
Q1/4: How are the systems evaluated?
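Not from the slides: a minimal sketch of the reference-list reconstruction protocol above, assuming a hypothetical recommend(paper_id, k) function and papers stored as a dict of id to reference set.

```python
# Sketch of the reference-list reconstruction evaluation (Pohl et al., 2007):
# hide each paper's reference list, ask the recommender for k suggestions,
# and measure how many hidden references it recovers.

def precision_at_k(recommended, hidden_refs, k):
    """Fraction of the top-k suggestions found in the hidden reference list."""
    return sum(1 for doc in recommended[:k] if doc in hidden_refs) / k

def evaluate(papers, recommend, k=5):
    """papers: {paper_id: set of reference ids};
    recommend(paper_id, k): hypothetical recommender returning ranked ids."""
    scores = [precision_at_k(recommend(pid, k), refs, k)
              for pid, refs in papers.items() if refs]
    return sum(scores) / len(scores)
```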
Use Case 1: Related Research
Paper reference lists (Pohl et al., 2007; Vellino, 2009)
Usage data (e.g. PubMed, arXiv) (Lin & Wilbur, 2007)
Document content (e.g. metadata, co-citation, bibliographic coupling) (Gipp, Beel, & Hentschel, 2009)
Collocation in mind maps (Beel & Gipp, 2010)
7 highly relevant papers (related research for scientific articles)
Q2/4: How are the systems trained?
Use Case 1: Related Research
BM25 (Lin & Wilbur, 2007) (see the sketch after this slide)
Topic modelling (Lin & Wilbur, 2007)
Collaborative filtering (Pohl et al., 2007)
Bespoke heuristics for feature extraction (e.g. in-text citation metrics for same sentence, paragraph). (Pohl et al., 2007; Gipp et al., 2009)
7 highly relevant papers (related research for scientific articles)
Q3/4: Which techniques are applied?
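To make the BM25 entry above concrete, here is a minimal ranking sketch using the rank_bm25 package; the toy corpus and whitespace tokenisation are my illustrative assumptions, not code from the talk.

```python
# BM25 ranking over a toy corpus with the rank_bm25 package
# (pip install rank-bm25); BM25 was the content-similarity baseline
# in Lin & Wilbur (2007).
from rank_bm25 import BM25Okapi

corpus = [
    "probabilistic topic model for content similarity",
    "collaborative filtering from digital library access records",
    "citation analysis for research paper recommendation",
]
tokenized = [doc.split() for doc in corpus]

bm25 = BM25Okapi(tokenized)
query = "content similarity topic model".split()

print(bm25.get_scores(query))              # one BM25 score per document
print(bm25.get_top_n(query, corpus, n=2))  # 2 best-matching documents
```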
Use Case 1: Related Research
Topic modelling slightly improves on BM25 (MEDLINE abstracts) (Lin & Wilbur, 2007):
● BM25 = 0.383 precision @ 5
● PMRA = 0.399 precision @ 5
Seeding CF with usage data from arXiv won out over using citation lists (Pohl et al., 2007)
No significant results found yet that show whether content-based or CF methods are better for this task
7 highly relevant papers (related research for scientific articles)
Q4/4: Which techniques have most success?
Use Case 1: Related Research
Progress so far...
Q1/2 How do we evaluate our system?
Construct a non-complex data set of related research:
● include groups with 10-20 documents (i.e. topics)
● no overlaps between groups (i.e. no documents in common)
● only take documents that are recognised as being in English
● document metadata must be 'complete' (i.e. has title, year, author, published in, abstract, filehash, tags/keywords/MeSH terms)
→ 4,382 groups → mean size = 14 → 60,715 individual documents
Given a doc, aim to retrieve the other docs from its group
● tf-idf with Lucene implementation
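A minimal sketch of this evaluation, with scikit-learn's tf-idf standing in for the Lucene implementation; the data layout (one text string per document plus a group id) is an assumption.

```python
# Sketch of the group-retrieval evaluation: for each document, rank all
# others by tf-idf cosine similarity and check how many of the top 5
# come from the same group. scikit-learn stands in for Lucene here.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def precision_at_5(texts, group_ids):
    """texts: one string per document (e.g. concatenated metadata fields);
    group_ids: group membership label for each document."""
    X = TfidfVectorizer(stop_words="english").fit_transform(texts)
    sims = cosine_similarity(X)            # dense matrix: fine for a small sample
    np.fill_diagonal(sims, -1.0)           # exclude the query document itself
    group_ids = np.asarray(group_ids)
    hits = 0
    for i in range(len(texts)):
        top5 = np.argsort(sims[i])[::-1][:5]
        hits += int(np.sum(group_ids[top5] == group_ids[i]))
    return hits / (5 * len(texts))
```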
Use Case 1: Related Research
Progress so far...
Q1/2 How do we evaluate our system?
[Chart: “Metadata Presence in Documents”: % of documents in which each metadata field appears (title, year, author, publishedIn, fileHash, abstract, generalKeyword, meshTerms, keywords, tags), for the evaluation data set, groups, and the catalogue]
Use Case 1: Related Research
Progress so far...
Q2/2 What are our results?
[Chart: “tf-idf Precision per Field for Complete Data Set”: precision @ 5 (0 to 0.3) per metadata field: abstract, title, generalKeyword, mesh-term, author, keyword, tag]
Use Case 1: Related Research
Progress so far...
Q2/2 What are our results?
[Chart: “tf-idf Precision per Field when Field is Available”: precision @ 5 (0 to 0.5) per metadata field: tag, abstract, mesh-term, title, general-keyword, author, keyword]
Use Case 1: Related Research
Progress so far...
Q2/2 What are our results?
BestCombo = abstract+author+general-keyword+tag+title
[Chart: “tf-idf Precision for Field Combos for Complete Data Set”: precision @ 5 (0 to 0.4) for: bestCombo, abstract, title, generalKeyword, mesh-term, author, keyword, tag]
Use Case 1: Related Research
Progress so far...
Q2/2 What are our results?
BestCombo = abstract+author+general-keyword+tag+title
[Chart: “tf-idf Precision for Field Combos when Field is Available”: precision @ 5 (0 to 0.5) for: tag, bestCombo, abstract, mesh-term, title, general-keyword, author, keyword]
Use Case 1: Related Research
Future directions...?
Evaluate multiple techniques on same data set
Construct public data set
● similar to current one but with data from only public groups
● analyse composition of data set in detail
Train:
● content-based filtering
● collaborative filtering
● hybrid
Evaluate the different systems on same data set
...and let's brainstorm!
Use Cases
2) Personalised Recommendations
● given a user's profile (e.g. interests)
● find articles of interest to them
Use Case 2: Perso Recommendations
Cross validation on user libraries (Bogers & van Den Bosch, 2009; Wang & Blei, 2011)
User studies (McNee, Kapoor, & Konstan, 2006; Parra-Santander & Brusilovsky, 2009)
7 highly relevant papers (perso recs for scientific articles)
Q1/4: How are the systems evaluated?
Use Case 2: Perso Recommendations
CiteULike libraries (Bogers & van Den Bosch, 2009; Parra-Santander & Brusilovsky, 2009; Wang & Blei, 2011)
Documents represent users, and their citations represent documents of interest (McNee et al., 2006)
User search history (Kapoor et al., 2007)
7 highly relevant papers (perso recs for scientific articles)
Q2/4: How are the systems trained?
Use Case 2: Perso Recommendations
CF (Parra-Santander & Brusilovsky, 2009; Wang & Blei, 2011)
LDA (Wang & Blei, 2011)
Hybrid of CF + LDA (Wang & Blei, 2011)
BM25 over tags to form user neighbourhood (Parra-Santander & Brusilovsky, 2009)
Item-based and content-based CF (Bogers & van Den Bosch, 2009)
User-based CF, Naïve Bayes classifier, Probabilistic Latent Semantic Indexing, textual tf-idf-based algorithm (uses document abstracts) (McNee et al., 2006) (a content-based sketch follows this slide)
7 highly relevant papers (perso recs for scientific articles)
Q3/4: Which techniques are applied?
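To picture the content-based entries above (e.g. the tf-idf algorithm over abstracts), a minimal sketch that profiles a user as the centroid of their library's tf-idf vectors; this is an illustration, not any cited system.

```python
# Content-based sketch: represent a user as the centroid of their
# library's tf-idf vectors and rank candidate abstracts by cosine similarity.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def recommend(library_abstracts, candidate_abstracts, k=10):
    vec = TfidfVectorizer(stop_words="english")
    X = vec.fit_transform(library_abstracts + candidate_abstracts)
    n = len(library_abstracts)
    profile = np.asarray(X[:n].mean(axis=0))   # user profile = library centroid
    scores = cosine_similarity(profile, X[n:]).ravel()
    return np.argsort(scores)[::-1][:k]        # indices into candidate_abstracts
```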
Use Case 2: Perso Recommendations
CF is much better than topic modelling (Wang & Blei, 2011)
A CF-topic modelling hybrid slightly outperforms CF alone (Wang & Blei, 2011)
Content-based filtering performed slightly better than item-based filtering on a test set with 1,322 CiteULike users (Bogers & van Den Bosch, 2009)
User-based CF and tf-idf outperformed Naïve Bayes and Probabilistic Latent Semantic Indexing significantly (McNee et al., 2006)
BM25 gave better results than CF, but the study covered just 7 CiteULike users, so it was small scale (Parra-Santander & Brusilovsky, 2009)
7 highly relevant papers (perso recs for scientific articles)
Q4/4: Which techniques have most success?
Use Case 2: Perso Recommendations
7 highly relevant papers (perso recs for scientific articles)
Q4/4: Which techniques have most success?
Approach      | Advantages                                                                                                                                    | Disadvantages
Content-based | Human-readable form of the user's profile; quickly absorbs new content without need for ratings                                              | Tends to over-specialise
CF            | Works on an abstract item-user level so you don't need to 'understand' the content; tends to give more novel and creative recommendations    | Requires a lot of data
Use Case 2: Perso Recommendations
Our progress so far...
Q1/2 How do we evaluate our system?
Construct an evaluation data set from user libraries
● 50,000 user libraries
● 10-fold cross validation
● libraries vary from 20-500 documents
● preference values are binary (in library = 1; 0 otherwise)
Train:
● item-based collaborative filtering recommender
Evaluate:
● train the recommender and test how well it can reconstruct the users' hidden testing libraries
● multiple similarity metrics (e.g. co-occurrence, log-likelihood); a sketch follows below
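A minimal sketch of the setup above: item-based CF over binary libraries, with Dunning's log-likelihood ratio (the 'log-likelihood' metric, as in Mahout) as the item-item similarity; the {user: set of doc ids} layout and orderable doc ids are assumptions.

```python
# Item-based CF over binary user libraries, scoring candidate documents
# by summed log-likelihood ratio (LLR) similarity to the library's items.
import math
from collections import defaultdict

def _entropy(*xs):
    """Unnormalised entropy term: S*log(S) - sum(x*log(x))."""
    s = sum(xs)
    if s <= 0:
        return 0.0
    return s * math.log(s) - sum(x * math.log(x) for x in xs if x > 0)

def llr(k11, k12, k21, k22):
    """Dunning's LLR over a 2x2 table: k11 = libraries with both docs,
    k12/k21 = one doc only, k22 = neither (as in Mahout's implementation)."""
    return 2.0 * (_entropy(k11 + k12, k21 + k22)
                  + _entropy(k11 + k21, k12 + k22)
                  - _entropy(k11, k12, k21, k22))

def recommend(libraries, user, k=10):
    """libraries: {user_id: set(doc_ids)}, binary preference (in library = 1)."""
    n_users = len(libraries)
    count = defaultdict(int)   # doc -> number of libraries containing it
    co = defaultdict(int)      # (doc_a, doc_b), a < b -> co-occurrence count
    for docs in libraries.values():
        for a in docs:
            count[a] += 1
        for a in docs:
            for b in docs:
                if a < b:
                    co[a, b] += 1
    mine = libraries[user]
    scores = defaultdict(float)
    for (a, b), k11 in co.items():
        if (a in mine) == (b in mine):
            continue           # want pairs with exactly one doc in the library
        seen, cand = (a, b) if a in mine else (b, a)
        k12 = count[seen] - k11   # libraries with the seen doc only
        k21 = count[cand] - k11   # libraries with the candidate only
        k22 = n_users - k11 - k12 - k21
        scores[cand] += llr(k11, k12, k21, k22)
    return sorted(scores, key=scores.get, reverse=True)[:k]
```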
Use Case 2: Perso Recommendations
Our progress so far...
Q2/2 What are our results?
Cross validation:
● 0.1 precision @ 10 articles
Usage logs:
● 0.4 precision @ 10 articles
Use Case 2: Perso Recommendations
Our progress so far...
Q2/2 What are our results?
[Chart: precision @ 10 articles vs. number of articles in user library]
Use Case 2: Perso Recommendations
Future directions...?
Evaluate multiple techniques on same data set
Construct data set
● similar to current one but with more up-to-date data
● analyse composition of data set in detail
Train:
● content-based filtering
● collaborative filtering (user-based and item-based)
● hybrid
Evaluate the different systems on same data set
...and let's brainstorm!
➔ 2 recommendation use cases
➔ similar problems and techniques
➔ good results so far
➔ combining CF with content would likely improve both (a toy blend is sketched after this slide)
Conclusion
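To illustrate the last point, a toy blend of CF and content-based scores with a mixing weight alpha; an assumption for illustration, not Mendeley's production system.

```python
# Toy hybrid: blend max-normalised CF and content-based scores for the
# same candidate set with a mixing weight alpha (illustrative only).
def hybrid_scores(cf_scores, content_scores, alpha=0.5):
    """cf_scores, content_scores: {doc_id: score}; alpha weights CF."""
    def normalise(scores):
        top = max(scores.values(), default=0.0)
        return {d: s / top for d, s in scores.items()} if top > 0 else scores
    cf, ct = normalise(cf_scores), normalise(content_scores)
    return {d: alpha * cf.get(d, 0.0) + (1 - alpha) * ct.get(d, 0.0)
            for d in set(cf) | set(ct)}
```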
www.mendeley.com
Beel, J., & Gipp, B. (2010). Link Analysis in Mind Maps: A New Approach to Determining Document Relatedness. Citeseer. Retrieved from http://scholar.google.com/scholar?hl=en&btnG=Search&q=intitle:Link+Analysis+in+Mind+Maps+:+A+New+Approach+to+Determining+Document+Relatedness#0
Bogers, T., & van Den Bosch, A. (2009). Collaborative and Content-based Filtering for Item Recommendation on Social Bookmarking Websites. ACM RecSys ’09 Workshop on Recommender Systems and the Social Web. New York, USA. Retrieved from http://ceur-ws.org/Vol-532/paper2.pdf
Gipp, B., Beel, J., & Hentschel, C. (2009). Scienstein: A research paper recommender system. Proceedings of the International Conference on Emerging Trends in Computing (ICETiC’09) (pp. 309–315). Retrieved from http://www.sciplore.org/publications/2009-Scienstein_-_A_Research_Paper_Recommender_System.pdf
Kapoor, N., Chen, J., Butler, J. T., Fouty, G. C., Stemper, J. A., Riedl, J., & Konstan, J. A. (2007). TechLens: a researcher’s desktop. Proceedings of the 2007 ACM conference on Recommender systems (pp. 183-184). ACM. doi:10.1145/1297231.1297268
Lin, J., & Wilbur, W. J. (2007). PubMed related articles: a probabilistic topic-based model for content similarity. BMC Bioinformatics, 8(1), 423. BioMed Central. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/17971238
McNee, S. M., Kapoor, N., & Konstan, J. A. (2006). Don’t look stupid: avoiding pitfalls when recommending research papers. Proceedings of the 2006 20th anniversary conference on Computer supported cooperative work (p. 180). ACM. Retrieved from http://portal.acm.org/citation.cfm?id=1180875.1180903
Parra-Santander, D., & Brusilovsky, P. (2009). Evaluation of Collaborative Filtering Algorithms for Recommending Articles. Web 3.0: Merging Semantic Web and Social Web at HyperText ’09 (pp. 3-6). Torino, Italy. Retrieved from http://ceur-ws.org/Vol-467/paper5.pdf
Pohl, S., Radlinski, F., & Joachims, T. (2007). Recommending related papers based on digital library access records. Proceedings of the 7th ACM/IEEE-CS joint conference on Digital libraries (pp. 418-419). ACM. Retrieved from http://portal.acm.org/citation.cfm?id=1255175.1255260
Vellino, A. (2009). The Effect of PageRank on the Collaborative Filtering Recommendation of Journal Articles. Retrieved from http://cuvier.cisti.nrc.ca/~vellino/documents/PageRankRecommender-Vellino2008.pdf
Wang, C., & Blei, D. M. (2011). Collaborative topic modeling for recommending scientific articles. Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 448–456). ACM. Retrieved from http://dl.acm.org/citation.cfm?id=2020480
References