Recommendation Engines with Ruby and Redis

Evan Light @elight evan.light@tripledogdare.net

Upload: evanlight

Post on 20-Jun-2015


Category:

Engineering



DESCRIPTION

A couple of years ago, a client asked Evan to build a recommendation engine for them. Coming into this with minimal knowledge of statistical math, Evan ultimately built a relatively simple recommendation engine in Ruby. The design made heavy use of Redis Sets, Lists, and Hashes to greatly reduce the number of SQL queries needed to provide recommendations. This talk is a case study covering the object-oriented considerations in designing a scalable service, why Redis was a good fit for the project, and some of the painful lessons that Evan learned along the way so that you don’t have to repeat them.

TRANSCRIPT

2. Recommendation Engines with Ruby and Redis
3. Agenda: Case study; Problem; Solution; Redis-related tangents; Lessons learned
4. The case: Futbol Soccer social network
5. The problem: Display popular and relevant content to users in near real-time
6. The solution: A recommendation engine!
7. Approximation
8. Statistics and I
9. Why not statistical methods?
10. Why Ruby?
11. The nouns: User; Post; Comment; Team; Player
12. The verbs
13. Submitting a Post: Often tagged with: Teams; Players
14. Commenting on a Post: Sometimes tagged with other users; Think Twitter mentions
15. Other verbs: Favoriting a Team or Player; Liking a Post
16. Given this
17. and maybe this
18. we want this!
19. That maybe
20. We want 2 kinds of Posts: Popularity; Relevance
21. Popularity: Comments; Likes
22. Relevance: Favoriting DC United; Submitting a Post tagged with DC United; Liking a Post tagged with DC United; Commenting on a Post tagged with DC United; Being mentioned in a Comment on a Post tagged with DC United
23. Yes, once upon a time, I cared about sports
24. Relevance (contd): Given an arbitrary event on Post; For each associated Tag; For each User interested in Tag on Post; (Re)score User's interest in Post
25. O(n²)
26. Redefine success
27. We only care about recent Posts!
28. We start with ActiveRecord
29. Ok, really, ActiveRecord::Observers
30. Callbacks: Post / Comment / Favorite / Like; create/destroy
31. We need a consumer for these lifecycle events
32. Resque to the rescue!
33. Redis in 120 seconds or less (Don't time me): Key value store; In memory; Few but adequate persistence options; TTL configurable by key
34. Redis (contd): "value" can also be a data structure: List; Hash; Set; Sorted set
35. AR::Observers push events to Resque
36. Example event
37. Calculator
38. Calculator: Resque worker; Strategy; Handles all persistence
39. Persistence: Hide Redis behind abstraction; Pluggable backend
40. Trendingness Calculator: (Re)compute individual Post's popularity; Input: Event; Output: New score for Post
41. Post trendingness in Redis: Represented as key value pairs; Keyed to namespaced Post IDs, e.g., "trend42"
42. Trendingness in Redis: 3 day TTL per score; Sorted in Ruby when requesting trending posts
43. User Interest Calculator: (Re)compute Users' interest in Tags; Often more than 1 User affected; Arbitrarily assigned weights per event type; Input: Event; Output: Updated User interest per Tag
44. User interest
45. User interest in Redis: Hash per User; Key: User ID; Field names: Tags; Values: Scalar User interest for Tag; Intentionally no TTL
46. Post Score Calculator: (Re)score User interests for Users already interested in Tag; Input: Event; User's interests per Tag from User Interest Calculator
47. O(n²)
48. But inverted indices helped
49. Inverted index: Index content to location of content; Tag -> Redis Set of Post IDs; Tag -> Redis Set of interested User IDs
50. User Post scores in Redis: Hash; Key: User ID; Field: Post ID; Value: User interest score
51. What is this?
52. Eagerly optimized to reduce RDBMS queries
53. Cache interesting primary keys in Redis Sets: Reduces need for RDBMS queries
54. Break up Calculator: Make each X Calculator in a Resque worker; Allows X Calculator to scale independently
55. Good enough? It ran successfully in production; Customer was happy
56. Emergent behavior: Polymorphic tags; Everything is "taggable" in app; Most AR object callbacks initially enqueued events in Resque; Calculator scored every Tag relationship... ... including how interested I am in other people!
57. Pivot into sports-related dating website? ;-)
58. Lessons learned
59. Statistical methods: Because O(n²) burnses us precioussssss
60. TTL all the things
61. Choose Key Value over Hash with care
62. Pruning
63. Reduce chattiness w/ Redis: Pipelining; Redis Lua scripting
64. Let Redis do the work: Trendingness; Sorted Set
65. Replace Resque: Sidekiq; Kafka
66. Curious to evaluate Redis alternatives: Fault tolerance; Maybe a graph database
67. Questions? Evan Light @elight evan.light@tripledogdare.net
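
The slides do not include the code itself, so what follows is only a minimal Ruby sketch of the data layout the talk describes: a Hash of Tag interests per User, inverted-index Sets from Tag to User IDs and Post IDs, and a Hash of Post scores per User. Everything named here is an assumption made for illustration: the `MemoryStore` and `InterestScorer` classes, the key names such as `interest:1`, the event shape, and the weights (the talk only says weights were assigned arbitrarily per event type). The store is an in-memory stand-in for the "pluggable backend" idea, so the sketch runs without a Redis server.

```ruby
# Hypothetical in-memory stand-in for the persistence abstraction.
# Each method notes the Redis command it is meant to mirror.
class MemoryStore
  def initialize
    @hashes = Hash.new { |h, k| h[k] = {} }
    @sets   = Hash.new { |h, k| h[k] = [] }
  end

  # Mirrors HINCRBYFLOAT key field amount
  def hash_incr(key, field, amount)
    @hashes[key][field] = @hashes[key].fetch(field, 0.0) + amount
  end

  # Mirrors HSET key field value
  def hash_set(key, field, value)
    @hashes[key][field] = value
  end

  # Mirrors HGETALL key
  def hash_all(key)
    @hashes.key?(key) ? @hashes[key].dup : {}
  end

  # Mirrors SADD key member (duplicates are ignored, as in a Set)
  def set_add(key, member)
    @sets[key] << member unless @sets[key].include?(member)
  end

  # Mirrors SMEMBERS key
  def set_members(key)
    @sets.key?(key) ? @sets[key].dup : []
  end
end

# Combines the roles of the User Interest and Post Score calculators
# in one class to keep the sketch short.
class InterestScorer
  # Assumed weights per event type; not taken from the talk.
  WEIGHTS = { favorite: 5.0, submit: 3.0, comment: 2.0, like: 1.0 }.freeze

  def initialize(store)
    @store = store
  end

  # event is a hypothetical Hash: { type:, user_id:, tags:, post_id: (optional) }
  def handle(event)
    weight  = WEIGHTS.fetch(event[:type])
    user_id = event[:user_id]
    event[:tags].each do |tag|
      # Hash per User: field is the Tag, value is the interest score.
      @store.hash_incr("interest:#{user_id}", tag, weight)
      # Inverted indices: Tag -> interested User IDs, Tag -> Post IDs.
      @store.set_add("tag:#{tag}:users", user_id)
      @store.set_add("tag:#{tag}:posts", event[:post_id]) if event[:post_id]
    end
    rescore_post(event[:post_id], event[:tags]) if event[:post_id]
  end

  private

  # For each Tag on the Post, for each User interested in that Tag,
  # (re)score the User's interest in the Post.
  def rescore_post(post_id, tags)
    user_ids = tags.flat_map { |tag| @store.set_members("tag:#{tag}:users") }.uniq
    user_ids.each do |user_id|
      interests = @store.hash_all("interest:#{user_id}")
      score = tags.sum { |tag| interests.fetch(tag, 0.0) }
      @store.hash_set("postscores:#{user_id}", post_id, score)
    end
  end
end

store  = MemoryStore.new
scorer = InterestScorer.new(store)
scorer.handle({ type: :favorite, user_id: 1, tags: ["dc-united"] })
scorer.handle({ type: :submit, user_id: 2, post_id: 42, tags: ["dc-united"] })
p store.hash_all("postscores:1")
```

The inverted index is what keeps the rescoring loop from scanning every User: only the Users already listed under one of the Post's Tags are touched. A Redis-backed store could presumably expose the same five methods through a client library's hash and set commands, with the caveat that Redis returns hash values as strings, so scores would need converting back to floats.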