The copyright of images belongs to their authors. Drop me a message at [email protected] to remove
Talk description: https://www.meetup.com/elasticsearch-switzerland/events/237184939/
MULTIPLE WAYS OF BUILDING A RECOMMENDER SYSTEM WITH ELASTICSEARCH
ANDRII VOZNIUK REACT-EPFL
Elastic Meetup Lausanne, March 2017
ANDRII VOZNIUK https://about.me/vozniuk
RESEARCHER Interaction Systems @ REACT-EPFL
SOFTWARE ENGINEER Web, Data, Cloud
ENTREPRENEUR Knowledge Sharing Systems
WHY RECOMMENDATIONS
• Increase engagement
• Address information overload
• Improve information findability
• Not aware of its existence
• Do not know particular keywords
• New content appearing
• Facilitate discovery of relevant content
• Not only search or tags
TYPES OF RECOMMENDERS
• Content-based
• Collaborative filtering
• Hybrid approaches
[Diagram: two panels with edges labeled "interacts", "similar", and "recommend"]
A COLLABORATIVE KNOWLEDGE SHARING ENVIRONMENT
graasp.net
GRAASP
A SOCIAL MEDIA PLATFORM
AN ADVANCED CONTENT MANAGEMENT SYSTEM
GRAASP IS A MEAN WEB APP
M: MongoDB, E: Express.js, A: AngularJS, N: Node.js
[Architecture diagram: Front-end, Server (express, mongoose), Database]
GRAASP DEMO: TWO TYPES OF RECOMMENDATIONS
GRAASP RECOMMENDATIONS
Contextual recommendations
Personalized recommendations
In theory, a recommendation can be both at the same time
HOWTO: CONTEXTUAL RECOMMENDATIONS IN GRAASP
GOALS
• Provide contextually relevant recommendations
• Should work for individual items and for spaces (collections of items)
• Will allow the user to discover contextually relevant content items or users
BRINGING DATA TO ELASTICSEARCH
[Architecture diagram: Front-end, Server (express, mongoose, mongoosastic), Database]
mongoosastic is a mongoose plugin that updates Elasticsearch (ES) on mongoose events
ELASTICSEARCH COMPUTING RELEVANCE
STEP 0. Compute TF-IDF for each term in the vectors
STEP 1. Represent each content item using the document vector model
STEP 2. Use vector cosine similarity for scoring and ranking
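The steps above can be sketched in a few lines of Python. This is only an illustration of the vector model, not what Elasticsearch runs internally: the weighting is a smoothed TF-IDF variant chosen for the example, and the exact formula Lucene applies depends on the version and similarity settings. The function names and the toy documents are made up for this sketch.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one sparse {term: weight} vector per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            # Assumed weighting: sqrt(tf) times a smoothed idf.
            term: math.sqrt(count) * (1.0 + math.log(n_docs / (df[term] + 1.0)))
            for term, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(index, vectors):
    """Rank all other documents by cosine similarity to vectors[index]."""
    scores = [
        (other, cosine(vectors[index], vec))
        for other, vec in enumerate(vectors)
        if other != index
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical, already tokenized content items.
docs = [
    ["elasticsearch", "recommender", "search"],
    ["recommender", "system", "elasticsearch"],
    ["cooking", "pasta", "recipe"],
]
vectors = tf_idf_vectors(docs)
ranked = most_similar(0, vectors)
print(ranked)
```

The second item shares two terms with the first and ranks on top; the third shares none and scores zero.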
ELASTICSEARCH RELEVANCE, VISUALLY
Source: https://www.elastic.co/guide/en/elasticsearch/guide/current/scoring-theory.html
Query: happy hippopotamus
Three documents:
1. I am happy in summer.
2. After Christmas I’m a hippopotamus.
3. The happy hippopotamus helped Harry
TF-IDF vectors:
1. Document 1: (happy, ____________): [2,0]
2. Document 2: (___, hippopotamus): [0,5]
3. Document 3: (happy, hippopotamus): [2,5]
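As a worked example, the ranking for this slide can be reproduced with plain cosine similarity. The sketch assumes the query "happy hippopotamus" carries the same term weights as the documents (happy = 2, hippopotamus = 5), which gives the query vector [2, 5]; the variable names are illustrative only.

```python
import math


def cosine(a, b):
    """Cosine of the angle between two 2-dimensional vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


# Assumed query vector: (happy, hippopotamus) weighted as in the documents.
query = [2, 5]
documents = {1: [2, 0], 2: [0, 5], 3: [2, 5]}

scores = {doc_id: cosine(query, vec) for doc_id, vec in documents.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # [3, 2, 1]
```

Document 3 points in the same direction as the query (cosine of about 1.0), document 2 follows with about 0.93, and document 1 comes last with about 0.37.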
ELASTICSEARCH MORE LIKE THIS (MLT) QUERY
Source: More Like This Query https://www.elastic.co/guide/en/elasticsearch/reference/2.0/query-dsl-mlt-query.html
• Text-based
• Document ID-based
• Can be a combination of both
“The MLT query simply extracts the text from the input document, analyzes it, usually using the same analyzer at the field, then selects the top K terms with highest tf-idf to form a disjunctive query of these terms.”
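A request body combining both input styles might look like the sketch below. It only builds the JSON and sends nothing; the index, type, field names and parameter values are hypothetical, and the `_type` key in the document reference applies to Elasticsearch 2.x (later versions dropped mapping types).

```python
import json


def mlt_query(fields, texts=(), docs=(), min_term_freq=1, max_query_terms=12):
    """Build a more_like_this request body from free text and/or document references."""
    like = list(texts) + [dict(doc) for doc in docs]
    if not like:
        raise ValueError("provide at least one text or document reference")
    return {
        "query": {
            "more_like_this": {
                "fields": list(fields),
                "like": like,
                "min_term_freq": min_term_freq,
                "max_query_terms": max_query_terms,
            }
        }
    }


body = mlt_query(
    fields=["title", "content"],  # hypothetical field names
    texts=["recommender systems with elasticsearch"],
    docs=[{"_index": "items", "_type": "item", "_id": "42"}],  # hypothetical document
)
print(json.dumps(body, indent=2))
```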
ELASTICSEARCH MORE LIKE THIS (MLT) LIMITATIONS
Source: Lucene MoreLikeThis.java
• Earlier (in 2016), when a document ID was supplied, the text content was concatenated and the search ran over all specified fields
• No way to boost individual fields. Matching on title can be more important than on content
• Now the query runs field by field. It cannot boost fields or match the desc field against the content field
• We wanted to do cross-field matching with boosting
USING SEARCH FOR RECOMMENDATIONS
We decided to concatenate the fields manually and use the match query
+ can boost fields
+ can do cross-field matching
+ can do cross-type matching
- slower
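One way to express this approach is a bool query with one boosted match clause per field, all fed with the concatenated text of the source item. The sketch below is an assumption about how such a query could be assembled, not the query Graasp actually sends; the field names, boost values and function name are hypothetical.

```python
import json


def build_recommendation_query(item_id, item, boosts, size=10):
    """Match the concatenated text of one item against several boosted fields."""
    # Concatenate the source item's fields into a single query text.
    text = " ".join(str(item[field]) for field in boosts if item.get(field))
    should = [
        {"match": {field: {"query": text, "boost": boost}}}
        for field, boost in boosts.items()
    ]
    return {
        "size": size,
        "query": {
            "bool": {
                "should": should,
                # Do not recommend the item the user is currently viewing.
                "must_not": [{"ids": {"values": [item_id]}}],
            }
        },
    }


# Hypothetical item and boosts: a title match counts three times as much as content.
item = {"title": "Recommender systems", "desc": "Meetup talk", "content": "TF-IDF and cosine"}
boosts = {"title": 3.0, "desc": 1.5, "content": 1.0}
query_body = build_recommendation_query("42", item, boosts)
print(json.dumps(query_body, indent=2))
```

Because every clause receives the full concatenated text, words from the source item's description can match another item's title or content, which the field-by-field MLT query does not allow.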
HOWTO: INTERACTIVE RECOMMENDER WITH CONTENT AND ACTIONS IN GRAASP
GOALS
• Recommendations matching the user interests rather than the context
• The user should understand the recommender model (interpretability)
• The user should be able to adjust the recommender (interactive)
• In general, we wanted the user to understand and control the recommendations when needed
PROPOSAL RECOMMENDATION MODEL
• Record user-content interactions
• Extract concepts from the content
• Build user interests profile
• Provide recommendations
Properties: interpretable, interactive
CONCEPT IDENTIFICATION PIPELINE
Content items on platform:
• Plain text file
• Binary text file (.pdf, .docx)
• Image with text (.png, .jpg, .tiff)
• Image
• Audio
• Video
Pipeline stages:
• Content extraction: optical character recognition (Leptonica, Tesseract), speech-to-text, visual image recognition, visual video recognition; produces the extracted text
• Content analysis: produces the identified concepts
• Content and concepts indexing: produces the indexed identified concepts and text
• Recommender system
Example items: PDF report, PowerPoint presentation, image with text, YouTube video
[Diagram: user interests profile computed as Σ w*UA*DC]
• User
• Tracked Activities (UA): accessed, rated, commented, downloaded
• Identified Concepts (DC)
• Identified User Concepts (UC)
• Concepts shown in the diagram: Education, Educational psychology, Knowledge, Learning, Knowledge Management, Human-Computer Interaction, Interdisciplinarity, Academia, Systems thinking, Scientific method, Educational technology, Virtual learning environment
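One possible reading of the Σ w*UA*DC formula is that each user concept score sums, over all tracked activities, the activity weight times the score of that concept in the touched item. The sketch below follows that reading; the weights per activity type, the concept scores and the item names are invented for illustration and are not taken from the talk.

```python
from collections import defaultdict

# Hypothetical weights per tracked activity type.
ACTIVITY_WEIGHTS = {"accessed": 1.0, "rated": 2.0, "commented": 3.0, "downloaded": 4.0}


def build_interest_profile(activities, doc_concepts, weights=ACTIVITY_WEIGHTS):
    """Aggregate item concepts (DC) into user concepts (UC).

    activities: iterable of (activity_type, item_id) pairs (UA)
    doc_concepts: {item_id: {concept: score}} (DC)
    """
    profile = defaultdict(float)
    for activity_type, item_id in activities:
        weight = weights.get(activity_type, 0.0)
        for concept, score in doc_concepts.get(item_id, {}).items():
            profile[concept] += weight * score
    return dict(profile)


# Hypothetical identified concepts for two items.
doc_concepts = {
    "report": {"Education": 1.0, "Learning": 0.5},
    "video": {"Learning": 1.0, "Knowledge Management": 1.0},
}
activities = [("accessed", "report"), ("downloaded", "video"), ("commented", "report")]

profile = build_interest_profile(activities, doc_concepts)
print(profile)
```

Since the profile is a plain mapping from concept to score, it can be shown to the user as is (interpretable) and edited by the user (interactive).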
PROPOSAL INTERESTS PROFILE
IDENTIFIED USER INTERESTS
USING SEARCH FOR RECOMMENDATIONS
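A user interests profile can be turned into a search request by boosting each concept with its weight in the profile. The transcript does not show the actual query, so the following is only a sketch under assumptions: the indexed field name `concepts`, the `top_k` cut-off and the function name are hypothetical.

```python
import json


def build_profile_query(profile, top_k=5, field="concepts", size=10):
    """Turn the strongest user concepts into boosted match_phrase clauses."""
    top = sorted(profile.items(), key=lambda pair: pair[1], reverse=True)[:top_k]
    should = [
        {"match_phrase": {field: {"query": concept, "boost": weight}}}
        for concept, weight in top
        if weight > 0
    ]
    return {"size": size, "query": {"bool": {"should": should}}}


# Hypothetical profile; the user could lower or raise any weight by hand.
user_profile = {"Learning": 6.0, "Education": 4.0, "Academia": 0.5}
profile_query = build_profile_query(user_profile, top_k=2)
print(json.dumps(profile_query, indent=2))
```

Adjusting a weight in the profile changes the boost of the matching clause directly, which keeps the link between the profile and the resulting recommendations easy to follow.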
SUMMARY
DEMONSTRATED HOW TO USE ELASTICSEARCH FOR
• Contextual recommendations (relevant to the context)
• Personalized recommendations (relevant to the user)
• More Like This vs. common queries (e.g., match)
POSSIBLE EXTENSIONS
• Displaying highlights to explain the recommendations
• Using the Percolator to notify the user about new relevant content as it gets uploaded
• Alternative ways of constructing the user profile
• Trying collaborative filtering; user-user similarity can be implemented with Elasticsearch