multiple ways of building a recommender system with elasticsearch - elastic meetup switzerland -...

Post on 21-Apr-2017

Category:

Data & Analytics

The copyright of images belongs to their authors. Drop me a message at andrii@vozniuk.com to remove

Talk description: https://www.meetup.com/elasticsearch-switzerland/events/237184939/

MULTIPLE WAYS OF BUILDING A RECOMMENDER SYSTEM WITH ELASTICSEARCH

ANDRII VOZNIUK REACT-EPFL

Elastic Meetup Lausanne, March 2017

ANDRII VOZNIUK https://about.me/vozniuk

RESEARCHER Interaction Systems @ REACT-EPFL

SOFTWARE ENGINEER Web, Data, Cloud

ENTREPRENEUR Knowledge Sharing Systems

WHY RECOMMENDATIONS

• Increase engagement

• Address information overload

• Improve information findability

• Not aware of its existence

• Do not know particular keywords

• New content appearing

• Facilitate discovery of relevant content

• Not only search or tags

TYPES OF RECOMMENDERS

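• Content-based: the user interacts with an item, and items similar to it are recommended

• Collaborative filtering: users who interact with the same items are similar, and items one of them interacts with are recommended to the other

• Hybrid approaches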

A COLLABORATIVE KNOWLEDGE SHARING ENVIRONMENT

graasp.net

GRAASP

A SOCIAL MEDIA PLATFORM

AN ADVANCED CONTENT MANAGEMENT SYSTEM

GRAASP IS A MEAN WEB APP

M: MongoDB, E: Express.js, A: AngularJS, N: Node.js

[Architecture diagram labels: Front-end; Server (express, mongoose); Database]

GRAASP DEMO TWO TYPES OF RECOMMENDATIONS

GRAASP RECOMMENDATIONS

Contextual recommendations

Personalized recommendations

In theory, can be both at a time

HOWTO CONTEXTUAL RECOMMENDATIONS IN GRAASP

GOALS

• Provide contextually relevant recommendations

• Should work for individual items and for spaces (collections of items)

• Will allow the user to discover contextually relevant content items or users

BRINGING DATA TO ELASTICSEARCH

[Architecture diagram labels: Front-end; Server (express, mongoose, mongoosastic); Database]

mongoosastic is a mongoose plugin that updates Elasticsearch (ES) on mongoose events

ELASTICSEARCH COMPUTING RELEVANCE

STEP 0. Compute TF-IDF for each term in the vectors

STEP 1. Represent each content item using the document vector model

STEP 2. Use vector cosine similarity for scoring and ranking
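
The TF-IDF step can be sketched in a few lines of Python. This is a minimal illustration, not what Elasticsearch runs internally: it assumes the textbook weighting tf * log(N / df), whereas Lucene's practical scoring uses dampened variants of both factors. The helper name `tf_idf` and the simplified lowercase tokens are assumptions made for the example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumes the textbook formula tf * log(N / df); this is a
    simplification of what Lucene actually computes.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts within this document
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


# Hypothetical, pre-tokenized corpus (lowercased, punctuation dropped).
docs = [
    "i am happy in summer".split(),
    "after christmas i am a hippopotamus".split(),
    "the happy hippopotamus helped harry".split(),
]
weights = tf_idf(docs)
```

Terms that occur in fewer documents get a higher weight: here `harry` (one document) outweighs `happy` (two documents).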

ELASTICSEARCH RELEVANCE, VISUALLY

Source: https://www.elastic.co/guide/en/elasticsearch/guide/current/scoring-theory.html

Query: happy hippopotamus

Three documents:

1. I am happy in summer.
2. After Christmas I’m a hippopotamus.
3. The happy hippopotamus helped Harry

TF-IDF vectors (happy, hippopotamus):

1. Document 1 (happy, ____): [2,0]
2. Document 2 (____, hippopotamus): [0,5]
3. Document 3 (happy, hippopotamus): [2,5]
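
The ranking these vectors produce can be checked with a short cosine similarity sketch. It assumes the query vector is [2, 5], i.e. the same weights for happy and hippopotamus that the slide assigns to the documents; the function and variable names are illustrative only.

```python
import math


def cosine(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


query = [2, 5]  # assumed weights for (happy, hippopotamus)
documents = {"doc1": [2, 0], "doc2": [0, 5], "doc3": [2, 5]}

# Rank documents by decreasing similarity to the query vector.
ranking = sorted(documents, key=lambda d: cosine(query, documents[d]), reverse=True)
```

Document 3 points in the same direction as the query and ranks first; document 2 follows because hippopotamus carries the larger weight, and document 1 comes last.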

ELASTICSEARCH MORE LIKE THIS (MLT) QUERY

Source: More Like This Query https://www.elastic.co/guide/en/elasticsearch/reference/2.0/query-dsl-mlt-query.html

Text-based

Document Id-based

Can be a combination of both

“The MLT query simply extracts the text from the input document, analyzes it, usually using the same analyzer at the field, then selects the top K terms with highest tf-idf to form a disjunctive query of these terms.”
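
As a rough sketch, an MLT request body combining both input styles can be built as a plain Python dict. The `more_like_this` parameters used here (`fields`, `like`, `min_term_freq`, `max_query_terms`) come from the 2.0 reference linked above; the helper `mlt_query`, the index name `graasp`, the type `item`, and the field names are hypothetical.

```python
def mlt_query(fields, like_text=None, like_doc=None, max_query_terms=25):
    """Build a More Like This request body (Elasticsearch 2.x style).

    like_text: free text to find similar documents for.
    like_doc:  (index, type, id) of an already indexed document.
    """
    like = []
    if like_text is not None:
        like.append(like_text)
    if like_doc is not None:
        index, doc_type, doc_id = like_doc
        like.append({"_index": index, "_type": doc_type, "_id": doc_id})
    if not like:
        raise ValueError("provide like_text, like_doc, or both")
    return {
        "query": {
            "more_like_this": {
                "fields": list(fields),
                "like": like,
                "min_term_freq": 1,
                "max_query_terms": max_query_terms,
            }
        }
    }


# Hypothetical index, type, id and field names.
body = mlt_query(
    ["title", "content"],
    like_text="recommender systems",
    like_doc=("graasp", "item", "42"),
)
```

The resulting dict would be sent as the JSON body of a search request.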

ELASTICSEARCH MORE LIKE THIS (MLT) LIMITATIONS

Source: Lucene MoreLikeThis.java

• Earlier, in 2016, when a doc id was supplied, the text content was concatenated and the search was done over all specified fields

• No way to boost individual fields. Matching on title can be more important than on content

• Now the query is done field by field. Still no way to boost, or to match the description field against the content field

• We wanted to do cross-field matching with boosting

USING SEARCH FOR RECOMMENDATIONS

We decided to concatenate the fields manually and use the match query

+ can boost fields

+ can do cross-field matching

+ can do cross-type matching

- slower
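
One way this approach could look is sketched below: the source item's fields are concatenated into one query string, which is then matched against each target field with its own boost. This is an illustration under assumptions, not the Graasp implementation; the helper `contextual_query`, the field names, and the boost values are hypothetical, while `bool`, `match` (with `query` and `boost`), and `ids` are standard query DSL.

```python
def contextual_query(item, boosts, size=10):
    """Build a search body recommending items related to `item`.

    item:   dict with an "id" and text fields (hypothetical schema).
    boosts: {field_name: boost} for the fields to match against.
    """
    # Concatenate the non-empty source fields into one query string,
    # so text from any field can match any target field.
    text = " ".join(item[f] for f in boosts if item.get(f))
    should = [
        {"match": {field: {"query": text, "boost": boost}}}
        for field, boost in boosts.items()
    ]
    return {
        "size": size,
        "query": {
            "bool": {
                "should": should,
                # Do not recommend the item the user is looking at.
                "must_not": [{"ids": {"values": [item["id"]]}}],
            }
        },
    }


item = {"id": "42", "title": "Elasticsearch basics",
        "description": "", "content": "scoring and ranking"}
body = contextual_query(item, {"title": 3.0, "description": 1.5, "content": 1.0})
```

Because the whole concatenated text is used as the query, the request carries many terms, which is consistent with the approach being slower.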

HOWTO INTERACTIVE RECOMMENDER WITH CONTENT AND ACTIONS IN GRAASP

GOALS

• Recommendations matching the user interests rather than the context

• The user should understand the recommender model (interpretability)

• The user should be able to adjust the recommender (interactive)

• In general, we wanted the user to understand and control the recommendations when needed

PROPOSAL RECOMMENDATION MODEL

Record User-Content Interactions

Extract Concepts from the Content

Build User Interests Profile

Provide Recommendations

Interpretable, Interactive

CONCEPT IDENTIFICATION PIPELINE

[Pipeline diagram, recoverable labels]

Content items on platform: Plain Text File; Binary Text File (.pdf, .docx); Image with text (.png, .jpg, .tiff); Image; Audio; Video

Content Extraction: Optical Character Recognition (Leptonica, Tesseract); Speech-To-Text; Visual Image Recognition; Visual Video Recognition

Stages: Extracted Text; Content Analysis; Identified Concepts; Content and Concepts Indexing; Indexed Identified Concepts and Text; Recommender System

[User profile diagram, recoverable labels]

Content: Pdf Report; Powerpoint Presentation; Image with Text; Youtube Video

Tracked Activities (UA): accessed; rated; commented; downloaded

Identified Concepts (DC):

• Education, Educational psychology
• Knowledge, Learning
• Knowledge Management, Systems thinking, Scientific method
• Educational technology, Virtual learning environment
• Learning, Knowledge Management
• Human-Computer Interaction, Interdisciplinarity
• Education, Educational psychology
• Academia

Formula shown: Σ w*UA*DC

Identified User Concepts (UC): Education, Educational psychology, Knowledge, Learning, Knowledge Management, Human-Computer Interaction, Interdisciplinarity, Academia, Systems thinking, Scientific method, Educational technology, Virtual learning environment
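
One plausible reading of the formula Σw*UA*DC is that a user's weight for a concept is the sum, over tracked activities, of an activity-type weight w times the activity count UA times the concept's weight DC in the document. The sketch below follows that reading; the weight values, document names, and concept scores are invented for illustration and do not come from the talk.

```python
from collections import defaultdict

# Hypothetical weights per activity type (w in the formula).
ACTION_WEIGHTS = {"accessed": 1.0, "downloaded": 2.0, "commented": 3.0, "rated": 4.0}


def interest_profile(activities, doc_concepts, weights=ACTION_WEIGHTS):
    """Aggregate user concepts as the sum of w * UA * DC.

    activities:   iterable of (action, doc_id, count) tuples (UA).
    doc_concepts: {doc_id: {concept: weight}} (DC).
    """
    profile = defaultdict(float)
    for action, doc_id, count in activities:
        for concept, dc in doc_concepts.get(doc_id, {}).items():
            profile[concept] += weights[action] * count * dc
    return dict(profile)


# Invented example data.
doc_concepts = {
    "report.pdf": {"Education": 0.9, "Learning": 0.5},
    "slides.pptx": {"Learning": 0.8, "Knowledge Management": 0.6},
}
activities = [("accessed", "report.pdf", 3), ("rated", "slides.pptx", 1)]
profile = interest_profile(activities, doc_concepts)
```

Every entry in the resulting profile traces back to specific activities and documents, which is what makes such a profile interpretable to the user.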

PROPOSAL INTERESTS PROFILE

IDENTIFIED USER INTERESTS

USING SEARCH FOR RECOMMENDATIONS
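
A user interests profile (a mapping from concept to weight) could be turned into a search request roughly as follows. This is a sketch under assumptions rather than the approach shown on the slide: the helper `profile_query`, the indexed field name `concepts`, and the cut-off `top_n` are hypothetical.

```python
def profile_query(profile, field="concepts", top_n=5, size=10):
    """Build a search body from a {concept: weight} interests profile.

    Each of the top_n concepts becomes a boosted match clause, so
    documents covering the user's strongest interests score highest.
    """
    top = sorted(profile.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    should = [
        {"match": {field: {"query": concept, "boost": weight}}}
        for concept, weight in top
    ]
    return {"size": size, "query": {"bool": {"should": should}}}


# Invented profile; a user could adjust these weights interactively.
profile = {"Learning": 4.7, "Education": 2.7, "Knowledge Management": 2.4}
body = profile_query(profile, top_n=2)
```

Changing a weight in the profile changes the corresponding boost, which gives the user direct control over the recommendations.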

SUMMARY

DEMONSTRATED HOW TO USE ELASTICSEARCH FOR

• Contextual recommendations (relevant to the context)

• Personalized recommendations (relevant to the user)

• More Like This vs. common queries (e.g., match)

POSSIBLE EXTENSIONS

• Displaying highlights to explain the recommendations

• Using the Percolator to notify the user about new relevant content as it gets uploaded

• Alternative ways of constructing the user profile

• Trying collaborative filtering; user-user similarity can be implemented with Elasticsearch
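
To make the last extension concrete, here is a small in-memory sketch of user-user collaborative filtering. It deliberately uses plain Python sets and Jaccard similarity instead of Elasticsearch, so it only illustrates the idea; the function names and the interaction data are invented.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets of item ids."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_from_neighbours(user, interactions, k=2):
    """Recommend items the k most similar users interacted with.

    interactions: {user: set of item ids the user interacted with}.
    """
    seen = interactions[user]
    neighbours = sorted(
        (u for u in interactions if u != user),
        key=lambda u: jaccard(seen, interactions[u]),
        reverse=True,
    )[:k]
    scores = {}
    for u in neighbours:
        sim = jaccard(seen, interactions[u])
        # Only score items the target user has not seen yet.
        for item in interactions[u] - seen:
            scores[item] = scores.get(item, 0.0) + sim
    return sorted(scores, key=scores.get, reverse=True)


# Invented interaction data.
interactions = {"ann": {"a", "b", "c"}, "bob": {"a", "b", "d"}, "eve": {"x"}}
recommended = recommend_from_neighbours("ann", interactions, k=1)
```

In an Elasticsearch setting, the set of items a user interacted with could be indexed as a field and the neighbour lookup expressed as a query over that field.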

QUESTIONS? FEEDBACK?

andrii@vozniuk.com

about.me/vozniuk

LOOKING FOR WHAT TO DO NEXT

TALK TO ME :)

pomolab.com
