delivering personalized music discovery
TRANSCRIPT
![Page 1: Delivering Personalized Music Discovery](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554f57cbb4c905524c8b52c5/html5/thumbnails/1.jpg)
Delivering Music Recommendations to Millions!
!
Sriram Malladi Rohan Singh (@rohansingh)
![Page 2: Delivering Personalized Music Discovery](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554f57cbb4c905524c8b52c5/html5/thumbnails/2.jpg)
A little bit about us
Over 24 million active users !
55 countries, 4 data centers !
20+ million tracks
![Page 3: Delivering Personalized Music Discovery](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554f57cbb4c905524c8b52c5/html5/thumbnails/3.jpg)
Discovering music — then
3
![Page 4: Delivering Personalized Music Discovery](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554f57cbb4c905524c8b52c5/html5/thumbnails/4.jpg)
4
![Page 5: Delivering Personalized Music Discovery](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554f57cbb4c905524c8b52c5/html5/thumbnails/5.jpg)
5
![Page 6: Delivering Personalized Music Discovery](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554f57cbb4c905524c8b52c5/html5/thumbnails/6.jpg)
What do you want to listen to?
6
![Page 7: Delivering Personalized Music Discovery](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554f57cbb4c905524c8b52c5/html5/thumbnails/7.jpg)
The right music for every moment
7
![Page 8: Delivering Personalized Music Discovery](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554f57cbb4c905524c8b52c5/html5/thumbnails/8.jpg)
The Plan
8
![Page 9: Delivering Personalized Music Discovery](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554f57cbb4c905524c8b52c5/html5/thumbnails/9.jpg)
1. Collect lots of data
2. Generate personalized recommendations
3. Serve those to millions of listeners each day
9
![Page 10: Delivering Personalized Music Discovery](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554f57cbb4c905524c8b52c5/html5/thumbnails/10.jpg)
Data, data, data
Track plays !
Radio feedback !
Playlists !
Follows
10
![Page 11: Delivering Personalized Music Discovery](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554f57cbb4c905524c8b52c5/html5/thumbnails/11.jpg)
Collecting it
All data into and out of Spotify goes through access points !
Access points and services log data !
Logs are aggregated and shipped to Hadoop
11
![Page 12: Delivering Personalized Music Discovery](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554f57cbb4c905524c8b52c5/html5/thumbnails/12.jpg)
12
access points
HadoopCassandraDiscover backend
![Page 13: Delivering Personalized Music Discovery](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554f57cbb4c905524c8b52c5/html5/thumbnails/13.jpg)
Hadoop
Framework to store and process big, distributed data !
Hadoop Distributed File System (HDFS) !
Hadoop MapReduce !
hadoop.apache.org
13
![Page 14: Delivering Personalized Music Discovery](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554f57cbb4c905524c8b52c5/html5/thumbnails/14.jpg)
14
![Page 15: Delivering Personalized Music Discovery](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554f57cbb4c905524c8b52c5/html5/thumbnails/15.jpg)
15
![Page 16: Delivering Personalized Music Discovery](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554f57cbb4c905524c8b52c5/html5/thumbnails/16.jpg)
Crunch the data
16
![Page 17: Delivering Personalized Music Discovery](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554f57cbb4c905524c8b52c5/html5/thumbnails/17.jpg)
What you listen to !
Artists you follow !
What similar users listen to
17
![Page 18: Delivering Personalized Music Discovery](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554f57cbb4c905524c8b52c5/html5/thumbnails/18.jpg)
Your buddy listened to Daft Punk all day !
An artist you follow released a new album !
A friend just created a new playlist
18
![Page 19: Delivering Personalized Music Discovery](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554f57cbb4c905524c8b52c5/html5/thumbnails/19.jpg)
Region !
Age group !
Gender
19
![Page 20: Delivering Personalized Music Discovery](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554f57cbb4c905524c8b52c5/html5/thumbnails/20.jpg)
What do you want to listen to?
20
![Page 21: Delivering Personalized Music Discovery](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554f57cbb4c905524c8b52c5/html5/thumbnails/21.jpg)
21
![Page 22: Delivering Personalized Music Discovery](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554f57cbb4c905524c8b52c5/html5/thumbnails/22.jpg)
Roll it out
22
![Page 23: Delivering Personalized Music Discovery](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554f57cbb4c905524c8b52c5/html5/thumbnails/23.jpg)
Storing recommendations
About 80 kilobytes of data per user !
Regenerated daily for all active users !
> 1 TB of recommendations overall
23
![Page 24: Delivering Personalized Music Discovery](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554f57cbb4c905524c8b52c5/html5/thumbnails/24.jpg)
Storing recommendations
Terabyte a day is a fair amount of data to write and index (+ replicas)
!
Tradeoff between storing data or looking things up on the fly
24
![Page 25: Delivering Personalized Music Discovery](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554f57cbb4c905524c8b52c5/html5/thumbnails/25.jpg)
Apache Cassandra
Highly available, distributed, scalable !
Fast writes, decent reads !
We have one cluster per data center
25
![Page 26: Delivering Personalized Music Discovery](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554f57cbb4c905524c8b52c5/html5/thumbnails/26.jpg)
Transferring worldwide
26
![Page 27: Delivering Personalized Music Discovery](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554f57cbb4c905524c8b52c5/html5/thumbnails/27.jpg)
Recommendations all generated in Hadoop, in London !
Need to be shipped to our data centers worldwide !
So much Internet weather
27
![Page 28: Delivering Personalized Music Discovery](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554f57cbb4c905524c8b52c5/html5/thumbnails/28.jpg)
hdfs2cass
Internal tool to copy data from Hadoop to Cassandra !
Creates table from data in HDFS !
Loads tables into Cassandra using a bulk loader !
github.com/spotify/hdfs2cass
28
![Page 29: Delivering Personalized Music Discovery](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554f57cbb4c905524c8b52c5/html5/thumbnails/29.jpg)
Serve it up
29
![Page 30: Delivering Personalized Music Discovery](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554f57cbb4c905524c8b52c5/html5/thumbnails/30.jpg)
1. Aggregate data from all our data sources
2. Decorate it with metadata
3. Shuffle!
30
![Page 31: Delivering Personalized Music Discovery](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554f57cbb4c905524c8b52c5/html5/thumbnails/31.jpg)
Serving at scale…
Discover is Spotify’s home page 500+ requests per second
!
Thousands of requests to other services !
Database calls for each unique user
31
![Page 32: Delivering Personalized Music Discovery](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554f57cbb4c905524c8b52c5/html5/thumbnails/32.jpg)
…or not?
Naive, first-stage prototype: !
10 to 20 seconds per request !
(doesn’t really scale)
32
![Page 33: Delivering Personalized Music Discovery](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554f57cbb4c905524c8b52c5/html5/thumbnails/33.jpg)
Fail a lot
33
![Page 34: Delivering Personalized Music Discovery](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554f57cbb4c905524c8b52c5/html5/thumbnails/34.jpg)
Make it webscale!
!
Find & rewrite slow sections !
Switch to C++ from Python for critical code
34
![Page 35: Delivering Personalized Music Discovery](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554f57cbb4c905524c8b52c5/html5/thumbnails/35.jpg)
More improvements
!
Pregenerate more data !
Cache aggressively
35
![Page 36: Delivering Personalized Music Discovery](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554f57cbb4c905524c8b52c5/html5/thumbnails/36.jpg)
Throw hardware at it
!
Switch to SSD’s !
Scale horizontally
36
![Page 37: Delivering Personalized Music Discovery](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554f57cbb4c905524c8b52c5/html5/thumbnails/37.jpg)
Takeaways
You will need hardware — big data can still be expensive
!
Less data can be better !
Prototype, iterate, optimize— fail early but improve
37
![Page 38: Delivering Personalized Music Discovery](https://reader033.vdocuments.mx/reader033/viewer/2022052822/554f57cbb4c905524c8b52c5/html5/thumbnails/38.jpg)
Questions?
38