recommender systems and information extraction for researchers
TRANSCRIPT
Recommender Systems and Information Extraction for researchers Marco Rossetti @ross85 6/11/2015 Mendeley, London
2
Outline
• What is Data Science
• What is Mendeley
• Recommender Systems at Mendeley
• Information Extraction at Mendeley
5
Who wants a data scientist?
6
Who wants a data scientist? [2]
7
Two main types
https://www.quora.com/What-is-data-science/answer/Michael-Hochster
8
Two main types [2]
https://www.quora.com/What-is-data-science/answer/Michael-Hochster
9
Skills for Data Science
http://businessoverbroadway.com/investigating-data-scientists-their-skills-and-team-makeup
11
Mendeley builds tools to help researchers … [2]
• Read & Organize
• Search & Discover
• Collaborate & Network
• Experiment & Synthesize
12
Read & Organize
• Reference management
• Cite-as-you-write
• Full-text article search
• Digitalised annotations
13
Search & Discover
• Mendeley Suggest
• Literature Search
• Related Documents
14
Collaborate & Network
• Research network
• Professional research groups
18
What is a Recommender System?
Recommender systems are a subclass of information filtering system that seek to predict the 'rating' or 'preference' that a user would give to an item. [Wikipedia]
19
Why Recommender Systems at Mendeley?
Vision: “To build a personalised research advisor that helps you to organise your work, contextualise it within the global body of research, and connect you with relevant researchers and artifacts.”
20
Recommender Systems at Mendeley – Related Documents
21
Recommender Systems at Mendeley – Mendeley Suggest
https://www.mendeley.com/suggest/
22
Recommender System Components
• Data Sources
• Algorithms
• Business Logic & Analytics
• User Interface
• User Experience
23
Data Sources
• Mendeley
  – User libraries: what users have in their libraries (what they read, what they annotate, what they highlight, what folders they have, etc.)
  – Article metadata (title, authors, abstract, keywords, tags, etc.)
  – Groups
• Scopus
  – Citation network
• ScienceDirect
  – Logs
• …
24
Algorithms
1. Collaborative filtering
User-based
If Alice read X, Y, Z and Bob read X, Y, Z and W, we recommend W to Alice
+ Works well for us because users << items
- Only for users with enough articles in their library
Item-based
Users who read X also read Y
+ Item-item similarity matrix is useful to model the last n articles read
- Expensive in our setting (millions of items)
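The user-based example with Alice and Bob can be sketched in a few lines. This is only an illustration, not Mendeley's implementation: the Jaccard similarity, the function names, and the extra user "carol" are assumptions added for the example.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity between two sets of article ids (assumed similarity measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def recommend_user_based(libraries, target, top_n=5):
    """Score articles the target has not read by the summed similarity of their readers."""
    own = libraries[target]
    scores = Counter()
    for user, articles in libraries.items():
        if user == target:
            continue
        sim = jaccard(own, articles)
        if sim == 0.0:
            continue  # users with no overlap contribute nothing
        for article in articles - own:
            scores[article] += sim
    return [article for article, _ in scores.most_common(top_n)]


# Toy libraries mirroring the slide: Bob read everything Alice read, plus W.
libraries = {
    "alice": {"X", "Y", "Z"},
    "bob": {"X", "Y", "Z", "W"},
    "carol": {"Q"},  # hypothetical user with no overlap with Alice
}
recommendations = recommend_user_based(libraries, "alice")
```

Here Bob is the only neighbour with any overlap, so W is the only article recommended to Alice. A user with an empty or tiny library has no useful neighbours, which is the limitation the slide points out.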
25
Algorithms [2]
1. Collaborative filtering (continued)
Matrix factorization
+ Best CF model in the literature
- Generating recommendations over a catalog of millions of items is too slow
[Figure: the n × m user-item matrix X, with observed entries (1) and unknown entries (?), approximated by the product of U (n × k) and V (k × m)]
X ≈ U V, where X is n × m, U is n × k and V is k × m
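A minimal sketch of the factorization X ≈ U V, using plain stochastic gradient descent. Several things here are assumptions made for illustration only: unknown entries ("?") are treated as 0, the toy matrix is invented, and the hyperparameters (k, epochs, lr, reg) are arbitrary. It is not the model used at Mendeley.

```python
import random


def factorize(X, k=2, epochs=500, lr=0.05, reg=0.01, seed=0):
    """Approximate an n x m matrix X by U (n x k) times V (k x m) using SGD."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    U = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n)]
    V = [[rng.uniform(0.0, 0.5) for _ in range(m)] for _ in range(k)]
    for _ in range(epochs):
        for i in range(n):
            for j in range(m):
                err = X[i][j] - sum(U[i][f] * V[f][j] for f in range(k))
                for f in range(k):
                    u, v = U[i][f], V[f][j]
                    # Gradient step on squared error with L2 regularisation.
                    U[i][f] += lr * (err * v - reg * u)
                    V[f][j] += lr * (err * u - reg * v)
    return U, V


def predict(U, V, i, j):
    """Reconstructed score for user i and item j."""
    return sum(U[i][f] * V[f][j] for f in range(len(V)))


def squared_error(X, U, V):
    """Total squared reconstruction error over all entries."""
    return sum(
        (X[i][j] - predict(U, V, i, j)) ** 2
        for i in range(len(X))
        for j in range(len(X[0]))
    )


# Toy 4 x 5 matrix: 1 = article in the user's library, 0 = unknown.
X = [
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 1, 1, 1],
]
U, V = factorize(X)

# Rank the unknown items for user 1 by reconstructed score.
unknown = [j for j in range(len(X[0])) if X[1][j] == 0]
ranked = sorted(unknown, key=lambda j: predict(U, V, 1, j), reverse=True)
```

Ranking for one user needs a dot product with every column of V, which is why scoring a catalog of millions of items is slow, as the slide notes.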
26
Algorithms [3]
2. Content-based
I read articles about text mining, show me other stuff about text mining
+ Good for cold users (users without data)
- Overspecialisation: recommended items are too similar
3. Popularity/Trending
I work in Computer Science, show me popular/trending articles in Computer Science
+ Perfect for cold users
- Non-personalised, discipline too broad
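The content-based idea ("show me other stuff about text mining") can be sketched with TF-IDF vectors and cosine similarity. The weighting scheme, the whitespace tokenisation, and the toy article texts below are assumptions for illustration, not the features Mendeley actually uses.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each document id to a sparse TF-IDF vector (dict of term -> weight)."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))
    n = len(docs)
    vectors = {}
    for doc_id, tokens in tokenized.items():
        tf = Counter(tokens)
        # Smoothed inverse document frequency (one common variant).
        vectors[doc_id] = {
            term: count * math.log((1 + n) / (1 + df[term]))
            for term, count in tf.items()
        }
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend_similar(docs, read_ids, top_n=3):
    """Rank unread documents by similarity to the sum of the read documents' vectors."""
    vectors = tfidf_vectors(docs)
    profile = Counter()
    for doc_id in read_ids:
        for term, weight in vectors[doc_id].items():
            profile[term] += weight
    candidates = [doc_id for doc_id in docs if doc_id not in read_ids]
    candidates.sort(key=lambda doc_id: cosine(profile, vectors[doc_id]), reverse=True)
    return candidates[:top_n]


# Hypothetical article texts.
docs = {
    "a1": "text mining for scientific articles",
    "a2": "topic models and text mining",
    "a3": "protein folding simulation",
    "a4": "mining text with neural networks",
}
similar = recommend_similar(docs, {"a1"}, top_n=2)
```

The two text-mining articles are returned and the protein-folding article is not, which also shows the overspecialisation risk: everything recommended looks like what the user already read.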
27
Algorithms [4]
4. Citation Network
• Articles similar to articles I cited
• Articles that cite me
• Articles from my co-authors
+ Good for some kinds of users
- Young researchers do not have (enough) publications
28
Evaluation
• Offline evaluation of 100+ algorithm variations on a historical dataset
  • Split data into training and testing sets based on timestamps: train until day X, then try to predict what users will add in the next day/week/month
  • Compute different metrics to measure different dimensions:
    • Accuracy (precision, recall, F-score, nDCG, MAP)
    • Diversity
    • Recency
    • Popularity
    • Consistency
    • Coverage
• Online evaluation computing CTR on log data
• Do offline and online results correlate?
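The timestamp-based split and two of the accuracy metrics can be sketched as follows. The event format (user, item, day), the inclusive cutoff, and the toy data are assumptions for illustration.

```python
def temporal_split(events, cutoff):
    """Split (user, item, day) events: train on day <= cutoff, test on day > cutoff."""
    train = [event for event in events if event[2] <= cutoff]
    test = [event for event in events if event[2] > cutoff]
    return train, test


def precision_recall_at_k(recommended, relevant, k):
    """Precision@k and recall@k for one user's ranked recommendation list."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical library additions: (user, article, day added).
events = [
    ("alice", "X", 1), ("alice", "Y", 2), ("alice", "W", 5),
    ("bob", "X", 1), ("bob", "W", 3), ("bob", "Q", 6),
]
train, test = temporal_split(events, cutoff=4)

# What Alice actually added after the cutoff is the ground truth.
relevant = {item for user, item, _ in test if user == "alice"}

# Suppose a recommender trained on `train` returned this ranked list for Alice.
precision, recall = precision_recall_at_k(["W", "Q"], relevant, k=2)
```

Splitting by time rather than at random keeps the model from seeing the future, which matches the "train until day X" setup on the slide.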
29
Business Logic / Analytics
• The business puts some constraints that could have an impact on the recommendation experience
  – Don't show articles outside the user's discipline
  – Show only articles with a minimum readership
  – Show only recommendations that you can explain (especially for people recommendations, a different matter)
• Analytics
  – Dashboard on the recommender statistics:
    • Number of recommendations served
    • Number of users with recommendations
    • …
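The three business constraints can be expressed as a post-filter on the candidates an algorithm produces. The `Candidate` fields, the function name, and the threshold of 10 readers are all hypothetical, chosen only to make the sketch concrete.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    """A recommended article plus the fields the rules need (hypothetical schema)."""
    article_id: str
    discipline: str
    readers: int
    explanation: Optional[str] = None


def apply_business_rules(candidates, user_discipline, min_readers=10):
    """Keep candidates in the user's discipline, with enough readers and an explanation."""
    return [
        c for c in candidates
        if c.discipline == user_discipline
        and c.readers >= min_readers
        and c.explanation
    ]


candidates = [
    Candidate("a1", "Computer Science", 120, "Because you read X"),
    Candidate("a2", "Biology", 300, "Popular in Biology"),   # wrong discipline
    Candidate("a3", "Computer Science", 3, "Because you read Y"),  # too few readers
    Candidate("a4", "Computer Science", 50, None),           # cannot be explained
]
shown = apply_business_rules(candidates, "Computer Science")
```

Only one of the four candidates survives, which illustrates how such constraints can change the recommendation experience regardless of how good the underlying algorithm is.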
30
User Interface
• Original idea: One list fits all
Create a single list with the best recommendations for the user: use advanced methods to take into account every signal and provide what is best for you!
31
User Interface [2]
• However…
  – Different kinds of users can have different information needs!
  – The same user in different contexts can have different information needs!
32
User Interface [3]
• Solution: different lists!
  • Provide multiple lists that satisfy different information needs
  • A user is more likely to find something they are interested in
33
Lesson learned
• It’s not about the best algorithm, it’s about the entire user experience!
• Easier (if you can) to put together different lists that serve different information needs than to try to satisfy every user with a single list
35
Lots of content in an article
36
Metadata Extraction
• Metadata extraction from PDFs was one of the first features of Mendeley
• It makes it easy to organize your articles
• It powers the Mendeley catalog
37
Citation Extraction
• Citation extraction from any source and link to the Mendeley catalog
• It extracts citable references and a narrative path in the Mendeley environment
38
Machine learning for extraction
• Conditional Random Fields (CRF) [1]
• We assign a label y_t to each token x_t in a sequence, using feature functions f_k(y_t, y_{t-1}, x_t)
• E.g. 'y_t is AUTHOR and x_{t-1} is bold' and 'y_t is AUTHOR and y_{t-1} is TITLE'
[Figure: Fig. 2.4 in Sutton & McCallum 2011, showing observations and states]
[1] J. Lafferty, A. McCallum and F. Pereira. Conditional random fields: probabilistic models for segmenting and labeling sequence data. In ICML, 2001.
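The two example feature functions can be written down directly, together with the score a linear-chain CRF assigns to a labeling. This is a toy sketch: the weights are made up (a real CRF learns them from labeled data), the third feature and the label set are hypothetical, and the brute-force enumeration stands in for the Viterbi and forward algorithms used in practice.

```python
import itertools
import math

LABELS = ["AUTHOR", "TITLE", "OTHER"]  # hypothetical label set


def f_author_after_bold(y_t, y_prev, x, t):
    """'y_t is AUTHOR and x_{t-1} is bold' (from the slide)."""
    return 1.0 if y_t == "AUTHOR" and t > 0 and x[t - 1]["bold"] else 0.0


def f_author_after_title(y_t, y_prev, x, t):
    """'y_t is AUTHOR and y_{t-1} is TITLE' (from the slide)."""
    return 1.0 if y_t == "AUTHOR" and y_prev == "TITLE" else 0.0


def f_title_is_bold(y_t, y_prev, x, t):
    """Hypothetical extra feature: 'y_t is TITLE and x_t is bold'."""
    return 1.0 if y_t == "TITLE" and x[t]["bold"] else 0.0


# (feature function, weight) pairs; the weights are invented for illustration.
FEATURES = [
    (f_author_after_bold, 1.0),
    (f_author_after_title, 1.5),
    (f_title_is_bold, 2.0),
]


def score(labels, x):
    """Unnormalised log-score: sum over positions of the weighted feature values."""
    total = 0.0
    for t, y_t in enumerate(labels):
        y_prev = labels[t - 1] if t > 0 else None
        total += sum(w * f(y_t, y_prev, x, t) for f, w in FEATURES)
    return total


def probability(labels, x):
    """p(y | x) = exp(score) / Z, with Z computed by enumerating every labeling."""
    z = sum(math.exp(score(c, x)) for c in itertools.product(LABELS, repeat=len(x)))
    return math.exp(score(labels, x)) / z


def best_labeling(x):
    """Highest-scoring labeling by exhaustive search (fine only for tiny inputs)."""
    return max(itertools.product(LABELS, repeat=len(x)), key=lambda c: score(c, x))


# Three tokens from an article header: two bold title words, then an author name.
tokens = [
    {"text": "Conditional", "bold": True},
    {"text": "Fields", "bold": True},
    {"text": "Lafferty", "bold": False},
]
best = best_labeling(tokens)
```

With these weights the best labeling is TITLE, TITLE, AUTHOR: the bold tokens are pulled towards TITLE, and the token that follows a bold TITLE token is pulled towards AUTHOR by both slide features.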
40
What cites this work [2]
41
Mendeley Research Maps
https://marcorossettiblog.wordpress.com/2015/07/05/mendeley-research-maps/