Machine Learning and Big Data for Music Discovery at Spotify

Posted on 06-Apr-2017


Machine Learning & Big Data for Music Discovery

Galvanize NYC, Mar 9th, 2017

Vidhya Murali @vid052

Ching-Wei Chen @cweichen

100M users in 60 markets

50M subscribers

Over 30M songs, and 2B playlists

$5B paid to rightsholders

Spotify: Music for everyone.

30 Million Songs...

What to recommend?

Discover

Discover Weekly

How to recommend?

Many flavors of recommendations

Radio

Daily Mix

This Is:

Recommended Songs

Recommendation approaches

‣ Editorial

‣ Algorithmic

○ Content-based

■ Metadata

■ Audio Signals

○ Collaborative

■ Usage-based

‣ Algotorial

Collaborative Filtering

‣ Find patterns in users’ past behavior to generate recommendations.

‣ Domain-independent

‣ Scalable
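The collaborative-filtering idea can be sketched with item-to-item similarity over play counts: two songs are similar if the same users play them, and a user's unplayed songs are ranked by similarity to what they already play. This is a minimal illustration under our own assumptions; the song names, play counts, and the `recommend` helper are hypothetical, and the slides do not say Spotify uses this particular scheme.

```python
import math
from collections import defaultdict


def item_vectors(plays):
    """Invert user -> {song: plays} into song -> {user: plays} sparse vectors."""
    items = defaultdict(dict)
    for user, songs in plays.items():
        for song, count in songs.items():
            items[song][user] = count
    return items


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(plays, user, k=2):
    """Rank songs the user has not played by play-weighted similarity to songs they have."""
    items = item_vectors(plays)
    heard = plays[user]
    scores = {
        song: sum(n * cosine(vec, items[h]) for h, n in heard.items())
        for song, vec in items.items()
        if song not in heard
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical play counts (not real Spotify data).
plays = {
    "ana": {"song_a": 3, "song_b": 1},
    "ben": {"song_a": 2, "song_b": 2, "song_c": 4},
    "cai": {"song_b": 1, "song_d": 5},
    "dev": {"song_a": 4, "song_c": 3},
}
print(recommend(plays, "ana"))  # ['song_c', 'song_d']
```

Nothing here depends on what the songs sound like, only on who played them, which is why the slide calls the approach domain-independent.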

Latent Factor Models

Compact representation for each user and item (song): f-dimensional vectors
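A latent factor model gives each user and each song an f-dimensional vector and predicts their affinity as the dot product of the two. The sketch below fits such vectors with plain stochastic gradient descent on a toy matrix. The data, hyperparameters, and helper names are hypothetical, and SGD is only one common way to fit these models, not necessarily the one used at Spotify.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def factorize(ratings, n_users, n_items, f=2, lr=0.05, reg=0.01, epochs=300, seed=0):
    """Learn f-dimensional user and item vectors so that dot(user, item) ~ observed value.

    ratings: list of (user_index, item_index, value) triples for observed entries only.
    """
    rng = random.Random(seed)
    users = [[rng.gauss(0.0, 0.1) for _ in range(f)] for _ in range(n_users)]
    items = [[rng.gauss(0.0, 0.1) for _ in range(f)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - dot(users[u], items[i])
            for k in range(f):
                pu, qi = users[u][k], items[i][k]
                # Gradient step on squared error plus L2 regularization.
                users[u][k] += lr * (err * qi - reg * pu)
                items[i][k] += lr * (err * pu - reg * qi)
    return users, items


def squared_error(ratings, users, items):
    return sum((r - dot(users[u], items[i])) ** 2 for u, i, r in ratings)


# Hypothetical implicit feedback: 1.0 = listened, 0.0 = skipped.
ratings = [(0, 0, 1.0), (0, 1, 1.0), (1, 1, 1.0), (1, 2, 1.0), (2, 0, 1.0), (2, 2, 0.0)]
init_users, init_items = factorize(ratings, 3, 3, epochs=0)  # same seed, untrained
users, items = factorize(ratings, 3, 3)
print(squared_error(ratings, init_users, init_items), squared_error(ratings, users, items))
```

Once trained, recommending for user u means scoring every song by `dot(users[u], items[i])` and taking the top results.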

NLP Models

Context & Co-occurrence are key!

Document : Playlist

Word : Song

NLP Models work great on playlists!
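Treating playlists as documents and songs as words means the training signal comes from which songs appear near each other. A minimal sketch, assuming a fixed-size context window as in skip-gram word2vec (the slides only say "NLP models"), extracts the (center, context) pairs such a model would train on and counts co-occurrences. The playlists and helper names are hypothetical.

```python
from collections import Counter


def context_pairs(playlist, window=2):
    """Yield (center, context) song pairs, reading a playlist like a sentence of words."""
    for i, center in enumerate(playlist):
        lo, hi = max(0, i - window), min(len(playlist), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield center, playlist[j]


def cooccurrence(playlists, window=2):
    """Count how often each song appears within `window` positions of another."""
    counts = Counter()
    for playlist in playlists:
        counts.update(context_pairs(playlist, window))
    return counts


# Hypothetical playlists ("documents") made of song IDs ("words").
playlists = [["s1", "s2", "s3", "s4"], ["s2", "s3", "s5"]]
print(list(context_pairs(["a", "b", "c"], window=1)))
# [('a', 'b'), ('b', 'a'), ('b', 'c'), ('c', 'b')]
print(cooccurrence(playlists)[("s2", "s3")])  # 2
```

A skip-gram model would then learn one vector per song so that songs sharing many contexts end up close together.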

Generating Song Vectors

[Figure: a playlist as a sequence of songs w1 w2 w3 … wn, with a "?"]

Music in Latent Space

Semantic Regularities

Music + Math = Epic

Songs as vectors
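Once songs are vectors, similarity and "semantic regularities" become arithmetic. The 3-dimensional vectors below are invented for illustration (learned vectors would be higher-dimensional and less tidy); nearest neighbors use cosine similarity, and the analogy follows the word2vec-style b - a + c pattern.

```python
import math

# Hypothetical song vectors; real ones would come from a trained model.
song_vectors = {
    "calm_piano": [0.9, 0.1, 0.0],
    "calm_guitar": [0.8, 0.2, 0.1],
    "upbeat_piano": [0.2, 0.9, 0.0],
    "upbeat_guitar": [0.1, 0.8, 0.2],
    "calm_voice": [0.7, 0.0, 0.9],
}


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def nearest(query, vectors, exclude=(), k=1):
    """Names of the k vectors most cosine-similar to `query`."""
    names = [n for n in vectors if n not in exclude]
    return sorted(names, key=lambda n: cosine(query, vectors[n]), reverse=True)[:k]


def analogy(a, b, c, vectors):
    """'a is to b as c is to ?', answered by the song nearest to b - a + c."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    return nearest(target, vectors, exclude={a, b, c})[0]


print(nearest(song_vectors["calm_piano"], song_vectors, exclude={"calm_piano"}))  # ['calm_guitar']
print(analogy("calm_piano", "calm_guitar", "upbeat_piano", song_vectors))  # upbeat_guitar
```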

Recommendations

User Profile:

● Aggregation over user interactions on Spotify

● Clustering to capture distinct user tastes/contexts

● Time-sensitive profiling
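The aggregation and time-sensitivity bullets can be sketched as a time-decayed average of the vectors of songs a user played, followed by ranking candidates by dot product with that profile. The exponential decay, the 30-day half-life, and the toy vectors are our assumptions, and the clustering step is left out for brevity.

```python
def user_profile(listens, song_vectors, now, half_life_days=30.0):
    """Time-decayed average of the vectors of songs a user listened to.

    listens: list of (song_id, day_of_listen); a listen's weight halves every half_life_days.
    """
    dim = len(next(iter(song_vectors.values())))
    profile, total = [0.0] * dim, 0.0
    for song, day in listens:
        weight = 0.5 ** ((now - day) / half_life_days)
        for k, x in enumerate(song_vectors[song]):
            profile[k] += weight * x
        total += weight
    return [x / total for x in profile]


def rank(profile, candidates, song_vectors):
    """Order candidate songs by dot product with the user profile."""
    def score(song):
        return sum(p * x for p, x in zip(profile, song_vectors[song]))
    return sorted(candidates, key=score, reverse=True)


# Hypothetical 2-d vectors: first axis "rock-ness", second axis "jazz-ness".
vectors = {
    "old_rock": [1.0, 0.0],
    "new_jazz": [0.0, 1.0],
    "more_rock": [0.9, 0.1],
    "more_jazz": [0.1, 0.9],
}
profile = user_profile([("old_rock", 0), ("new_jazz", 90)], vectors, now=90)
print(profile)  # about [0.111, 0.889]: the recent jazz listen dominates
print(rank(profile, ["more_rock", "more_jazz"], vectors))  # ['more_jazz', 'more_rock']
```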

Challenges unique to Spotify

‣ Scale of catalog

● 30M tracks; 2B playlists

● Training

○ 25B data points

○ 100M users

○ 60 countries represented

Data Pipelines

Bigtable

GCS

Dataflow

Pub/Sub

Scio
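The pipeline pieces above are Google Cloud services plus Scio, Spotify's Scala API for Apache Beam and Google Cloud Dataflow. Rather than guess at their APIs, here is a plain-Python stand-in for the map/reduce shape of a typical batch step: turning raw play events into per-(user, track) play counts that a collaborative filtering model could consume. The event fields, the 30-second threshold, and the helper names are hypothetical.

```python
from collections import Counter

# Hypothetical raw play events, such as a job might read from a message queue.
events = [
    {"user": "ana", "track": "t1", "ms_played": 200_000},
    {"user": "ana", "track": "t1", "ms_played": 15_000},
    {"user": "ana", "track": "t1", "ms_played": 180_000},
    {"user": "ana", "track": "t2", "ms_played": 45_000},
    {"user": "ben", "track": "t1", "ms_played": 31_000},
]


def map_plays(events, min_ms=30_000):
    """Map: drop very short plays (hypothetical threshold), emit ((user, track), 1)."""
    for e in events:
        if e["ms_played"] >= min_ms:
            yield (e["user"], e["track"]), 1


def sum_by_key(pairs):
    """Reduce: sum the emitted values per key."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return totals


play_counts = sum_by_key(map_plays(events))
print(dict(play_counts))  # {('ana', 't1'): 2, ('ana', 't2'): 1, ('ben', 't1'): 1}
```

In a real Beam/Scio job the same map and per-key sum would run distributed across workers, with the results written to a store such as Bigtable.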

Challenges unique to Spotify (continued)

‣ Cold-Start

○ New Users

○ New Music

Learning from sound

What’s in a sound?

Amplitude, Time, Frequencies, Loudness

Melody, Beats, Chords, Voices, Instruments, Lyrics

Popularity, Era, Region, Genre, Mood, Purpose

* Some information isn’t encoded in the signal itself, but within the cultural context around the music
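The lowest layer of the slide (amplitude over time, frequencies, loudness) can be computed directly from audio samples. A minimal sketch on a synthetic tone, assuming RMS level in dBFS as a loudness proxy and a naive DFT for frequency content; real systems use FFT-based spectrograms and perceptual loudness measures. Higher layers such as mood or era cannot be read off the samples this way, which is the point of the footnote.

```python
import cmath
import math


def sine(freq_hz, sample_rate, n_samples, amplitude=0.5):
    """A synthetic test tone: amplitude as a function of time."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(n_samples)]


def rms_dbfs(samples):
    """Loudness proxy: RMS amplitude in dB relative to full scale (amplitude 1.0)."""
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    return 20 * math.log10(rms)


def dominant_frequency(samples, sample_rate):
    """Naive O(n^2) DFT; returns the frequency (Hz) of the strongest positive bin."""
    n = len(samples)
    best_bin, best_mag = 0, -1.0
    for k in range(1, n // 2):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * sample_rate / n


tone = sine(440, 8000, 400)  # 50 ms of a 440 Hz tone sampled at 8 kHz
print(round(rms_dbfs(tone), 2))  # -9.03
print(dominant_frequency(tone, 8000))  # 440.0
```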

Supervised Machine Learning

http://www.nltk.org/

Deep Learning

1. No hand-crafted feature extraction necessary

2. LOTS of simple learning nodes in many layers

3. Propagate errors backwards to learn the weights

4. Needs LOTS of data
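Point 3 is backpropagation. A minimal sketch: a network with 2 inputs, one hidden layer of sigmoid units, and one sigmoid output, trained on XOR with per-example gradient descent. The output error is pushed back through the output weights to give each hidden unit its own error term. Network size, learning rate, and epoch count are arbitrary illustrative choices.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_xor(hidden=3, lr=0.5, epochs=2000, seed=1):
    """Train a one-hidden-layer sigmoid network on XOR; return the loss of each epoch."""
    rng = random.Random(seed)
    data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
    w1 = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(hidden)]  # input -> hidden
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]  # hidden -> output
    b2 = 0.0
    losses = []
    for _ in range(epochs):
        loss = 0.0
        for x, y in data:
            # Forward pass.
            h = [sigmoid(sum(w * xi for w, xi in zip(w1[j], x)) + b1[j]) for j in range(hidden)]
            out = sigmoid(sum(w * hj for w, hj in zip(w2, h)) + b2)
            loss += 0.5 * (out - y) ** 2
            # Backward pass: output error, then hidden errors via the (pre-update) w2.
            d_out = (out - y) * out * (1 - out)
            d_h = [d_out * w2[j] * h[j] * (1 - h[j]) for j in range(hidden)]
            # Gradient descent updates for every weight and bias.
            for j in range(hidden):
                w2[j] -= lr * d_out * h[j]
                for i in range(2):
                    w1[j][i] -= lr * d_h[j] * x[i]
                b1[j] -= lr * d_h[j]
            b2 -= lr * d_out
        losses.append(loss)
    return losses


losses = train_xor()
print(losses[0], losses[-1])
```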

Convolutional Neural Networks

Typical Convolutional Neural Network
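The core operation of a CNN layer on audio can be sketched on a tiny spectrogram: a filter spanning all frequency bands slides along time, a ReLU keeps positive responses, and global pooling turns a variable-length sequence into fixed-size statistics. Convolving along time only is an assumption here (one common choice for spectrograms), and, as in most CNN libraries, the "convolution" is technically cross-correlation. The spectrogram and filter values are made up.

```python
def conv1d_time(spec, kernel, bias=0.0):
    """Valid 1-D convolution along time, followed by a ReLU.

    spec:   list of T frames, each a list of F frequency-band values.
    kernel: list of W frames, each a list of F weights (the filter spans all bands).
    Returns T - W + 1 activations.
    """
    width = len(kernel)
    out = []
    for t in range(len(spec) - width + 1):
        z = bias + sum(
            w * x
            for k_frame, s_frame in zip(kernel, spec[t:t + width])
            for w, x in zip(k_frame, s_frame)
        )
        out.append(max(0.0, z))  # ReLU
    return out


def global_pool(activations):
    """Summarize a variable-length activation sequence with fixed-size statistics."""
    return {"mean": sum(activations) / len(activations), "max": max(activations)}


# Hypothetical spectrogram: 5 time frames x 2 frequency bands.
spec = [[0, 1], [1, 0], [2, 1], [0, 0], [1, 3]]
# Responds when energy in band 0 rises from one frame to the next.
kernel = [[-1, 0], [1, 0]]
acts = conv1d_time(spec, kernel)
print(acts)  # [1.0, 1.0, 0.0, 1.0]
print(global_pool(acts))  # {'mean': 0.75, 'max': 1.0}
```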

Deep Learning on Audio at Spotify

Sander Dieleman: http://benanne.github.io/2014/08/05/spotify-cnns.html

Input: Audio spectrogram

Output: Latent Space Vector
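This input/output contract is what makes audio useful for cold start: learn a mapping from a song's audio to the latent vector collaborative filtering would have given it, then apply that mapping to songs with no usage data. As we read the linked post, its network was trained to predict collaborative-filtering latent factors; in the sketch below, a linear map over already-pooled audio features stands in for the CNN, and all feature values and target vectors are hypothetical.

```python
def predict(W, x):
    """Linear map from pooled audio features x to a latent vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def fit_audio_to_latent(features, targets, lr=0.1, epochs=500):
    """Fit W by SGD on squared error between predict(W, x) and each song's CF vector."""
    d, f = len(features[0]), len(targets[0])
    W = [[0.0] * d for _ in range(f)]
    for _ in range(epochs):
        for x, y in zip(features, targets):
            pred = predict(W, x)
            for r in range(f):
                err = pred[r] - y[r]
                for c in range(d):
                    W[r][c] -= lr * err * x[c]
    return W


# Hypothetical training set: pooled audio features of known songs and their CF vectors.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
cf_vectors = [[0.9, 0.1], [0.1, 0.8], [1.0, 0.9]]
W = fit_audio_to_latent(features, cf_vectors)

# A brand-new track with no listening history still gets a vector from its audio alone.
new_track = [0.5, 0.5]
print([round(v, 3) for v in predict(W, new_track)])  # [0.5, 0.45]
```

The predicted vector lives in the same space as the collaborative-filtering vectors, so the new track can be compared against existing songs and users immediately.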

Audio vector space

Cold Start? Problem solved! *

* Not completely, of course!

Recommending new music

Release Radar

Fresh Finds

Recommendations at Spotify

Radio

Daily Mix

This Is:

Recommended Songs

Discover Weekly

Release Radar

What’s next?

?

Join the band! www.spotify.com/jobs

Ching-Wei (@cweichen): cw@spotify.com

Vidhya (@vid052): vidhya@spotify.com
