Recommender Systems and Information Extraction for researchers
Marco Rossetti @ross85
6/11/2015
Mendeley, London


Outline


• What is Data Science

• What is Mendeley

• Recommender Systems at Mendeley

• Information Extraction at Mendeley


What is Data Science?


Why a data scientist?


Who wants a data scientist?


Who wants a data scientist? [2]


Two main types


https://www.quora.com/What-is-data-science/answer/Michael-Hochster


Two main types [2]


https://www.quora.com/What-is-data-science/answer/Michael-Hochster


Skills for Data Science


http://businessoverbroadway.com/investigating-data-scientists-their-skills-and-team-makeup


What is Mendeley


Mendeley builds tools to help researchers … [2]


• Read & Organize
• Search & Discover
• Collaborate & Network
• Experiment & Synthesize


Read & Organize


• Reference management
• Cite-as-you-write
• Full-text article search
• Digitalised annotations


Search & Discover


• Mendeley Suggest
• Literature Search
• Related Documents


Collaborate & Network


• Research network
• Professional research groups


Mendeley & Elsevier


Elsevier Products


Recommender Systems


What is a Recommender System?


Recommender systems are a subclass of information filtering system that seek to predict the 'rating' or 'preference' that a user would give to an item. [Wikipedia]


Why Recommender Systems at Mendeley?


Vision: “To build a personalised research advisor that helps you to organise your work, contextualise it within the global body of research, and connect you with relevant researchers and artifacts.”


Recommender Systems at Mendeley – Related Documents


Recommender Systems at Mendeley – Mendeley Suggest

https://www.mendeley.com/suggest/


Recommender System Components


• Data Sources
• Algorithms
• Business Logic & Analytics
• User Interface (User Experience)


Data Sources


• Mendeley
  – User libraries: what users have in their libraries (what they read, annotate and highlight, which folders they have, etc.)
  – Article metadata (title, authors, abstract, keywords, tags, etc.)
  – Groups
• Scopus
  – Citation network
• ScienceDirect
  – Logs
• …


Algorithms


1. Collaborative filtering
User-based
If Alice read X, Y, Z and Bob read X, Y, Z and W, we recommend W to Alice.
+ Works well for us because users << items
- Only works for users with enough articles in their library
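The user-based idea can be sketched in a few lines of Python on the Alice/Bob example. This is a minimal illustration, not Mendeley's implementation: measuring library overlap with Jaccard similarity and summing neighbour similarities per unseen article are assumptions of the sketch, and the function names are hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Library overlap |A ∩ B| / |A ∪ B| (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def user_based_recommend(target, libraries, top_n=5):
    """Score articles the target lacks by the summed similarity of users who have them."""
    mine = libraries[target]
    scores = {}
    for other, theirs in libraries.items():
        if other == target:
            continue
        sim = jaccard(mine, theirs)
        if sim == 0.0:
            continue  # no overlap: this user tells us nothing about the target
        for article in theirs - mine:
            scores[article] = scores.get(article, 0.0) + sim
    # Highest score first; ties broken by article id so the output is stable
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [article for article, _ in ranked[:top_n]]


libraries = {
    "alice": {"X", "Y", "Z"},
    "bob": {"X", "Y", "Z", "W"},
    "carol": {"Q"},
}
print(user_based_recommend("alice", libraries))  # ['W']
```

The outer loop runs over users rather than articles, which is why this approach stays affordable when users << items.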

Item-based
Users who read X also read Y.
+ The item-item similarity matrix is useful for modelling the last n articles read
- Expensive in our setting (millions of items)
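A minimal sketch of item-based filtering under stated assumptions: articles are compared by cosine similarity of their binary reader sets, and the user's last n articles are passed in as `recent`. The function names are hypothetical and the similarity measure is one common choice, not necessarily the one used at Mendeley.

```python
from collections import defaultdict
from math import sqrt


def item_similarities(libraries):
    """Cosine similarity between articles over binary user-article vectors."""
    readers = defaultdict(set)  # article -> users who have it
    for user, articles in libraries.items():
        for article in articles:
            readers[article].add(user)
    sims = defaultdict(dict)
    items = sorted(readers)
    # All pairs of articles: this is the part that grows quadratically with the catalog
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            common = len(readers[a] & readers[b])
            if common:
                s = common / sqrt(len(readers[a]) * len(readers[b]))
                sims[a][b] = s
                sims[b][a] = s
    return sims


def item_based_recommend(recent, sims, exclude, top_n=5):
    """Recommend articles similar to the user's last n articles (`recent`)."""
    scores = defaultdict(float)
    for article in recent:
        for other, s in sims.get(article, {}).items():
            if other not in exclude:
                scores[other] += s
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [a for a, _ in ranked[:top_n]]


libraries = {"u1": {"X", "Y"}, "u2": {"X", "Y", "Z"}, "u3": {"X", "W"}}
sims = item_similarities(libraries)
print(item_based_recommend(["X"], sims, exclude={"X"}))  # ['Y', 'W', 'Z']
```

`item_similarities` compares every pair of articles, which is exactly the cost that becomes prohibitive with millions of items.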


Algorithms [2]


1. Collaborative filtering (continued)
Matrix factorization
+ Best CF model in the literature
- Generating recommendations over a catalog of millions of items is too slow

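[Figure: a sparse binary user-article matrix X (1 = article in the library, ? = unknown) approximated by the product of two low-rank factors]

$$X_{n \times m} \approx U_{n \times k}\, V_{k \times m}$$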
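The factorization X ≈ UV (U is n × k, V is k × m) can be illustrated with plain stochastic gradient descent on squared error. This is a sketch, not the model used at Mendeley: treating the unknown (?) entries as 0, the hyperparameters, and the helper names are all assumptions.

```python
import random


def factorize(X, k=2, lr=0.05, reg=0.01, epochs=300, seed=0):
    """Approximate X (n x m) by U (n x k) times V (k x m) with SGD on squared error.

    Assumption of this sketch: unknown entries ("?") are stored as 0, a common
    simplification for implicit feedback such as "article is in the library".
    """
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    U = [[rng.uniform(0, 0.1) for _ in range(k)] for _ in range(n)]
    V = [[rng.uniform(0, 0.1) for _ in range(m)] for _ in range(k)]
    for _ in range(epochs):
        for i in range(n):
            for j in range(m):
                err = X[i][j] - predict(U, V, i, j)
                for f in range(k):
                    u, v = U[i][f], V[f][j]
                    # Gradient step with L2 regularization on both factors
                    U[i][f] += lr * (err * v - reg * u)
                    V[f][j] += lr * (err * u - reg * v)
    return U, V


def predict(U, V, i, j):
    """Entry (i, j) of U V: dot product of user row i and article column j."""
    return sum(U[i][f] * V[f][j] for f in range(len(V)))


def squared_error(X, U, V):
    return sum((X[i][j] - predict(U, V, i, j)) ** 2
               for i in range(len(X)) for j in range(len(X[0])))


def recommend(X, U, V, user, top_n=2):
    """Rank every article the user does not have yet; this full scan is the slow part."""
    unseen = [j for j in range(len(X[0])) if X[user][j] == 0]
    return sorted(unseen, key=lambda j: -predict(U, V, user, j))[:top_n]


X = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
]
U, V = factorize(X)
print(recommend(X, U, V, user=0))
```

`recommend` has to score every article the user lacks, which is what makes serving recommendations from a catalog of millions of items slow.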


Algorithms [3]


2. Content-based
I read articles about text mining, show me other stuff about text mining.
+ Good for cold users (users without data)
- Overspecialisation: recommended items are too similar
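One common way to implement content-based recommendation, assumed here rather than taken from the slides, is to build TF-IDF vectors from article text and rank candidates by cosine similarity to the sum of the vectors in the user's library. The tokenizer and function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Sparse TF-IDF vector (term -> weight) for each document id."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))
    n = len(docs)
    return {
        doc_id: {t: c * math.log(n / df[t]) for t, c in Counter(tokens).items()}
        for doc_id, tokens in tokenized.items()
    }


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def content_based_recommend(library, docs, top_n=3):
    """Rank articles outside the library by cosine similarity to the user's profile."""
    vectors = tfidf_vectors(docs)
    profile = Counter()
    for doc_id in library:
        profile.update(vectors[doc_id])  # profile = sum of the library's vectors
    candidates = [d for d in docs if d not in library]
    return sorted(candidates, key=lambda d: (-cosine(profile, vectors[d]), d))[:top_n]


docs = {
    "a1": "Text mining of scientific articles",
    "a2": "Text mining with topic models",
    "a3": "Protein folding simulation",
    "a4": "Deep learning for protein structure",
}
print(content_based_recommend({"a1"}, docs))  # ['a2', 'a3', 'a4']
```

Only articles that share vocabulary with the library get a non-zero score, which is the overspecialisation problem in miniature.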

3. Popularity/Trending
I work in Computer Science, show me popular/trending articles in Computer Science.
+ Perfect for cold users
- Not personalised; the discipline is too broad
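A popularity/trending list can be as simple as counting recent additions per discipline. This is a minimal sketch; the event tuple layout and the 7-day window are hypothetical choices, not Mendeley's definitions.

```python
from collections import Counter
from datetime import date, timedelta


def trending(events, discipline, today, window_days=7, top_n=3):
    """Articles added most often within one discipline over the last `window_days` days.

    `events` holds hypothetical (article_id, discipline, date_added) tuples.
    """
    start = today - timedelta(days=window_days)
    counts = Counter(
        article
        for article, disc, added in events
        if disc == discipline and start < added <= today
    )
    # Highest count first; ties broken by article id so the output is stable
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [article for article, _ in ranked[:top_n]]


today = date(2015, 11, 6)
events = [
    ("p1", "Computer Science", date(2015, 11, 5)),
    ("p1", "Computer Science", date(2015, 11, 4)),
    ("p2", "Computer Science", date(2015, 11, 5)),
    ("p3", "Computer Science", date(2015, 9, 1)),  # outside the 7-day window
    ("p4", "Biology", date(2015, 11, 5)),          # other discipline
]
print(trending(events, "Computer Science", today))  # ['p1', 'p2']
```

Every user in Computer Science gets the same list, which is why it suits cold users but is not personalised.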


Algorithms [4]


4. Citation network
• Articles similar to articles I cited
• Articles that cite me
• Articles from my co-authors
+ Good for some kinds of users
- Young researchers do not have (enough) publications
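The three citation-network signals can be expressed as set operations over an authorship map and a citation map. Both input mappings are hypothetical, and approximating "similar to articles I cited" by co-citation is an assumption of this sketch; other similarity measures would work too. The lists are sorted by id, not ranked.

```python
def citation_recommend(me, authors_of, cites, top_n=5):
    """One candidate list per citation-network signal for researcher `me`.

    authors_of: article id -> set of author ids   (hypothetical input)
    cites:      article id -> set of cited ids     (hypothetical input)
    """
    my_papers = {a for a, authors in authors_of.items() if me in authors}
    coauthors = set().union(*(authors_of[a] for a in my_papers)) - {me}
    my_refs = set().union(*(cites.get(a, set()) for a in my_papers))

    # Articles that cite me: their reference list contains one of my papers
    cite_me = {a for a, refs in cites.items() if refs & my_papers} - my_papers
    # Articles from my co-authors that I did not write myself
    from_coauthors = {a for a, authors in authors_of.items() if authors & coauthors} - my_papers
    # "Similar to what I cited", approximated by co-citation: papers that other
    # articles cite together with my references (an assumption of this sketch)
    co_cited = {
        ref
        for a, refs in cites.items()
        if a not in my_papers and refs & my_refs
        for ref in refs
    } - my_refs - my_papers
    return {
        "cite_me": sorted(cite_me)[:top_n],
        "coauthors": sorted(from_coauthors)[:top_n],
        "similar_to_cited": sorted(co_cited)[:top_n],
    }


authors_of = {
    "m1": {"marco", "ann"},
    "a2": {"ann"},
    "a3": {"bob"},
    "a4": {"carl"},
    "r1": {"dan"},
    "r2": {"eve"},
}
cites = {"m1": {"r1"}, "a3": {"m1"}, "a4": {"r1", "r2"}}
print(citation_recommend("marco", authors_of, cites))
```

A researcher with no publications gets three empty lists, which is the weakness noted for young researchers.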


Evaluation


• Offline evaluation of 100+ algorithm variations on a historical dataset
• Split the data into training and test sets by timestamp: train on data up to day X, then try to predict what users will add in the next day/week/month
• Computed different metrics to measure different dimensions:
  – Accuracy (precision, recall, F-score, nDCG, MAP)
  – Diversity
  – Recency
  – Popularity
  – Consistency
  – Coverage
• Online evaluation: computing CTR (click-through rate) on log data
• Do offline and online results correlate?
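The timestamp-based split and a few of the accuracy metrics can be sketched as follows, assuming binary relevance (an article the user added in the test period counts as relevant). The event layout is hypothetical, and MAP and the non-accuracy dimensions (diversity, recency, popularity, consistency, coverage) are left out for brevity.

```python
import math
from datetime import date


def time_split(events, cutoff):
    """Train on additions up to `cutoff`; test on what each user added afterwards."""
    train, test = {}, {}
    for user, article, added in events:
        bucket = train if added <= cutoff else test
        bucket.setdefault(user, set()).add(article)
    return train, test


def precision_recall_at_k(recommended, relevant, k):
    hits = len(set(recommended[:k]) & relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def f_score(precision, recall):
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def ndcg_at_k(recommended, relevant, k):
    """Binary-relevance nDCG: a hit at 0-based rank i is worth 1 / log2(i + 2)."""
    dcg = sum(1 / math.log2(i + 2) for i, a in enumerate(recommended[:k]) if a in relevant)
    ideal = sum(1 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal if ideal else 0.0


events = [
    ("u1", "a", date(2015, 1, 1)),
    ("u1", "b", date(2015, 2, 1)),
    ("u2", "c", date(2015, 1, 15)),
]
train, test = time_split(events, cutoff=date(2015, 1, 31))
print(train, test)  # u1's "b" lands in the test set

p, r = precision_recall_at_k(["a", "b", "c", "d"], {"b", "d", "e"}, k=4)
print(p, r, f_score(p, r))
```

In an offline run these per-user scores would be averaged over all users in the test set, once per algorithm variation.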


Business Logic / Analytics


• The business imposes some constraints that can affect the recommendation experience:
  – Don't show articles outside the user's discipline
  – Show only articles with a minimum readership
  – Show only recommendations that you can explain (especially for people recommendations, which are a different matter)
• Analytics: a dashboard of recommender statistics:
  – Number of recommendations served
  – Number of users with recommendations
  – …
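The business constraints can sit as a filter between the algorithms and the UI, with the analytics computed on what is actually served. A minimal sketch: the `Candidate` fields, the readership threshold and the use of a non-empty `reason` as the "explainable" test are assumptions, not Mendeley's rules.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    # Hypothetical fields a recommendation might carry into the business-logic layer
    article_id: str
    discipline: str
    readers: int
    reason: Optional[str] = None  # human-readable explanation, if any


def apply_business_rules(candidates, user_discipline, min_readers=10):
    """Drop candidates that break the constraints; `min_readers` is an assumed threshold."""
    return [
        c for c in candidates
        if c.discipline == user_discipline  # stay inside the user's discipline
        and c.readers >= min_readers        # minimum readership
        and c.reason                        # only recommendations we can explain
    ]


def dashboard(served):
    """Recommender statistics from `served`: user id -> list of articles shown."""
    return {
        "recommendations_served": sum(len(items) for items in served.values()),
        "users_with_recommendations": sum(1 for items in served.values() if items),
    }


candidates = [
    Candidate("a", "Computer Science", 50, "Similar to an article in your library"),
    Candidate("b", "Biology", 80, "Popular in Biology"),
    Candidate("c", "Computer Science", 3, "Cites your work"),
    Candidate("d", "Computer Science", 20),
]
kept = apply_business_rules(candidates, "Computer Science")
print([c.article_id for c in kept])  # ['a']
print(dashboard({"u1": ["a", "e"], "u2": []}))
```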


User Interface


• Original idea: One list fits all

Create a single list with the best recommendations for the user: use advanced methods to take into account every signal and provide what is best for you!


User Interface [2]


• However…
  – Different kinds of users can have different information needs!
  – The same user in different contexts can have different information needs!


User Interface [3]


• Solution: different lists!
• Provide multiple lists that satisfy different information needs
• A user is more likely to find something they are interested in
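Serving multiple lists can be sketched as running several recommenders and keeping their outputs separate under their own titles. The recommender interface and the de-duplication across lists are assumptions of this sketch; the titles are made up.

```python
def build_lists(user, recommenders, per_list=3):
    """One titled list per information need instead of a single merged list.

    `recommenders` maps a list title to a function user -> ranked article ids
    (hypothetical interface). Skipping articles already shown in an earlier
    list is an assumption of this sketch.
    """
    shown = set()
    lists = {}
    for title, recommend in recommenders.items():
        items = [a for a in recommend(user) if a not in shown][:per_list]
        shown.update(items)
        lists[title] = items
    return lists


recommenders = {
    "Based on your library": lambda user: ["a", "b", "c", "d"],
    "Trending in your discipline": lambda user: ["b", "e", "f"],
}
print(build_lists("u1", recommenders))
```

Each list keeps its own ranking logic, so no single scoring function has to trade one information need off against another.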


Lessons learned


• It’s not about the best algorithm, it’s about the entire user experience!

• It is easier (if you can) to put together different lists that serve different information needs than to try to satisfy every user with a single list


Information Extraction


Lots of content in an article


Metadata Extraction


• Metadata extraction from PDFs was one of the first features of Mendeley

• It makes it easy to organize your articles
• It powers the Mendeley catalog


Citation Extraction


• Citation extraction from any source, with links to the Mendeley catalog
• It extracts citable references and a narrative path in the Mendeley environment


Machine learning for extraction


• Conditional Random Fields (CRF) [1]
• We label a sequence of tokens $x_t$ with labels $y_t$ using weighted feature functions $f_k(y_t, y_{t-1}, x_t)$
• E.g. '$y_t$ is AUTHOR and $x_{t-1}$ is bold' and '$y_t$ is AUTHOR and $y_{t-1}$ is TITLE'

[1] J. Lafferty, A. McCallum and F. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In ICML, 2001.

[Figure: linear-chain CRF with observations and states (Fig. 2.4 in Sutton & McCallum 2011)]
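To make the feature functions concrete, here is a toy linear-chain CRF that scores the two example features from the slide, with $p(\mathbf{y} \mid \mathbf{x}) = \exp\big(\sum_t \sum_k \lambda_k f_k(y_t, y_{t-1}, \mathbf{x}, t)\big) / Z(\mathbf{x})$. Each feature here receives the whole token sequence and the position $t$, so it can look at $x_{t-1}$. The weights are made up rather than learned, the third feature is hypothetical, and $Z$ and the best labelling are found by brute-force enumeration; real implementations use the forward and Viterbi algorithms.

```python
import itertools
import math

LABELS = ["TITLE", "AUTHOR"]


def f_author_after_title(y_t, y_prev, x, t):
    # 'y_t is AUTHOR and y_{t-1} is TITLE'
    return 1.0 if y_t == "AUTHOR" and y_prev == "TITLE" else 0.0


def f_author_prev_bold(y_t, y_prev, x, t):
    # 'y_t is AUTHOR and x_{t-1} is bold'
    return 1.0 if y_t == "AUTHOR" and t > 0 and x[t - 1]["bold"] else 0.0


def f_title_bold(y_t, y_prev, x, t):
    # 'y_t is TITLE and x_t is bold' (hypothetical extra feature)
    return 1.0 if y_t == "TITLE" and x[t]["bold"] else 0.0


FEATURES = [f_author_after_title, f_author_prev_bold, f_title_bold]
WEIGHTS = [1.5, 1.0, 2.0]  # lambda_k: made-up values; normally learned from labelled PDFs


def score(y, x):
    """Unnormalised log-score: sum over positions t and features k of lambda_k * f_k."""
    total = 0.0
    for t in range(len(x)):
        y_prev = y[t - 1] if t > 0 else None
        total += sum(w * f(y[t], y_prev, x, t) for w, f in zip(WEIGHTS, FEATURES))
    return total


def probability(y, x):
    """p(y | x) = exp(score(y, x)) / Z(x), with Z(x) by brute-force enumeration."""
    z = sum(math.exp(score(list(ys), x)) for ys in itertools.product(LABELS, repeat=len(x)))
    return math.exp(score(y, x)) / z


def best_labels(x):
    """Most likely labelling (brute force here; Viterbi in a real implementation)."""
    candidates = (list(ys) for ys in itertools.product(LABELS, repeat=len(x)))
    return max(candidates, key=lambda ys: score(ys, x))


x = [{"token": "Conditional", "bold": True}, {"token": "Lafferty", "bold": False}]
print(best_labels(x))  # ['TITLE', 'AUTHOR']
```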


What cites this work


What cites this work [2]


Mendeley Research Maps


https://marcorossettiblog.wordpress.com/2015/07/05/mendeley-research-maps/


Thank you
