Issuu Talk on Topic Models and Recommendation Systems

Topic Models Recommendations. Morten Arngren, Senior Data Scientist

Upload: arngren

Posted on 31-Mar-2016


DESCRIPTION

Issuu gave a talk at the Data Science and Machine Learning Meetup in Copenhagen, November 2013.

TRANSCRIPT

Page 1: Issuu Talk on Topic Models and Recommendation Systems

Topic Models Recommendations

Morten Arngren, Senior Data Scientist

Page 2: Issuu Talk on Topic Models and Recommendation Systems

About

Topic Modelling

Recommendations

Page 3: Issuu Talk on Topic Models and Recommendation Systems

“…YouTube for Publications…”

Page 4: Issuu Talk on Topic Models and Recommendation Systems

Started in 2006 by 5 dudes.

2013:

15M publications (free)

7.5B page views / month

340M pages (25 km²)

83M unique visitors / month

Page 5: Issuu Talk on Topic Models and Recommendation Systems

Data Science Team (Copenhagen)

Hardware: 12x 2.6GHz, 96GB RAM, 2TB SSD, 2TB hard drive

Amazon Web Services

ML Gadgets

Morten Arngren
Ph.D. in Machine Learning and AI (2011), M.Sc.A.M. (2007), B.Sc.E.E. (1997)
ISSUU, Data Scientist (2011 - present)
DTU & FOSS Analytical, Machine Learning in Food Quality (2008-2011)
Nokia Mobile Phones, Digital Signal Processing (2000-2007)
Alcatel Space Denmark, Building Rockets (1997-2000)

Andrius Butkus
Ph.D. in Digital Media Personalisation (2009), M.Sc.E.E. (2004), B.Sc.E.E. (2002)
ISSUU, Data Scientist (2011 - present)
DTU External Lecturer, Human Computer Interaction (2010 - present)
DTU Assistant Professor, Digital Media Engineering (2008-2010)

Page 6: Issuu Talk on Topic Models and Recommendation Systems

Data

Page 7: Issuu Talk on Topic Models and Recommendation Systems

Data

Page 8: Issuu Talk on Topic Models and Recommendation Systems

Processing pipeline, 40k pubs / day:

Layout: quantify text and image boxes, article extraction, OCR

Image: cover analysis, explicit detection, doc. type classification

Text: detect language (56), translate to English (from 24 languages), LDA topics

Page content, DB

Page 9: Issuu Talk on Topic Models and Recommendation Systems

Reader Activity

[Figure: timeline of the publications read by reader “Birdie Nam Nam”, grouped into sessions and stored in the DB]

200GB / Day

Page 10: Issuu Talk on Topic Models and Recommendation Systems

Topic Modelling

Page 11: Issuu Talk on Topic Models and Recommendation Systems

LATENT DIRICHLET ALLOCATION

150 topics (preset parameter)

Topic model based on Bag-of-Words Data

http://radimrehurek.com/gensim/

Wikipedia Training Data: ~4.5M single articles (pure topics)

Example topics: arabic, Australia, history, business, islands, environment, hotels, poetic, food, design, arts, plants, animals

[Figure: topic distribution over topics 1 to 150]

D. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, January 2003.
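The bag-of-words input mentioned on this slide can be illustrated with a minimal sketch. The tokenizer, the function names and the toy documents are assumptions made for illustration, not the pipeline used at Issuu; the output is a list of (token id, count) pairs, the same shape gensim's `Dictionary.doc2bow` produces.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(documents):
    """Map every distinct token to an integer id, in sorted order."""
    tokens = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: idx for idx, tok in enumerate(tokens)}


def bag_of_words(text, vocabulary):
    """Return sorted (token_id, count) pairs; word order is discarded."""
    counts = Counter(tok for tok in tokenize(text) if tok in vocabulary)
    return sorted((vocabulary[tok], n) for tok, n in counts.items())


# Toy documents (hypothetical), standing in for Wikipedia articles.
docs = ["Hotels on the islands", "Plants and animals of the islands"]
vocab = build_vocabulary(docs)
bow = bag_of_words("islands, islands and hotels", vocab)
```

A topic model such as LDA only sees these counts, which is why the word order of a publication plays no role in its topic distribution.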

Page 12: Issuu Talk on Topic Models and Recommendation Systems

LATENT DIRICHLET ALLOCATION

Properties: each topic weight lies in [0, 1] and the weights sum to 1.

[Figure: Issuu publications plotted in the LDA space]
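Because every publication is a point in LDA space whose weights lie in [0, 1] and sum to 1, publications can be compared by a distance between distributions. The sketch below checks the two properties and uses the Hellinger distance; the choice of distance, the function names and the 3-topic toy vectors are assumptions (the talk uses 150 topics and does not name a metric).

```python
import math


def is_topic_distribution(weights, tol=1e-6):
    """True if every weight lies in [0, 1] and the weights sum to 1."""
    in_range = all(-tol <= w <= 1.0 + tol for w in weights)
    return in_range and abs(sum(weights) - 1.0) <= tol


def hellinger(p, q):
    """Hellinger distance: 0 for identical, 1 for non-overlapping distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of topics")
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared / 2.0)


def nearest(query, catalogue):
    """Return the name of the catalogue entry closest to the query."""
    return min(catalogue, key=lambda name: hellinger(query, catalogue[name]))


# Toy 3-topic vectors (hypothetical publications).
query = [0.5, 0.4, 0.1]
catalogue = {"pub_a": [0.6, 0.3, 0.1], "pub_b": [0.0, 0.1, 0.9]}
closest = nearest(query, catalogue)
```

Here `closest` is `"pub_a"`, since its topic mix is much nearer to the query than that of `"pub_b"`.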

Page 13: Issuu Talk on Topic Models and Recommendation Systems

TOPIC CATEGORIES

(Learning from Wikipedia Dataset)

~4.5 Mio. vs. ~9 Mio.

Density distribution not the same.

Empty locations in LDA space.

Example categories: Travel, Cocktails, Chemistry, Botanics, Drinks, Dancing

Example mixture: 0.5 Travel, 0.4 Sports, 0.1

Page 14: Issuu Talk on Topic Models and Recommendation Systems

Recommendation System

Page 15: Issuu Talk on Topic Models and Recommendation Systems

READER ACTIVITY

No Explicit Rating…

Extract Implicit Rating…?

[Figure: publications read by “Birdie Nam Nam” over time]

Page 16: Issuu Talk on Topic Models and Recommendation Systems

Session {
  UserName: ‘Birdie-Nam-Nam’
  DocID: xxx-xxxxx
  Pages:
    1: [250, 725, 569, 134, ...]
    2: [1056, 1259, ...]
    3: [1056, 1259, ...]
    4: [102, 356, 208, 438]
    5: [102, 356, 208, 438]
    6: [5250, 3567, 809]
    7: [5250, 3567, 809]
    ...
  TimeStamp: 1378935850
  DocID: yyy-yyyyy
}

Pages: [1,2,3,6,7] ReadTime: 25789 ms. TimeStamp: 1378935850

Browsing or Reading?

[Figure: Readers × Publications matrix]
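One way to turn a summarised session (pages, read time) into an implicit rating is sketched below. The thresholds, the `total_pages` argument and the scoring formula are all hypothetical; the slide only poses the "browsing or reading?" question and does not give the rule that was used.

```python
def implicit_rating(pages, read_time_ms, total_pages,
                    min_ms_per_page=2000, full_ms_per_page=10000):
    """Map one session to a rating in [0, 1].

    Sessions that average less than min_ms_per_page per page are
    treated as browsing and get a rating of 0 (assumed threshold).
    """
    distinct = set(pages)
    if not distinct or total_pages <= 0:
        return 0.0
    ms_per_page = read_time_ms / len(distinct)
    if ms_per_page < min_ms_per_page:
        return 0.0  # browsing, not reading
    coverage = min(len(distinct) / total_pages, 1.0)
    attention = min(ms_per_page / full_ms_per_page, 1.0)
    return coverage * attention


# Values from the slide; total_pages=10 is an assumed document length.
rating = implicit_rating([1, 2, 3, 6, 7], 25789, total_pages=10)

# A quick flip through eight pages counts as browsing.
browse = implicit_rating([1, 2, 3, 4, 5, 6, 7, 8], 4000, total_pages=10)
```

Each (reader, publication) cell of the Readers × Publications matrix could then hold such a rating instead of an explicit one.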

Page 17: Issuu Talk on Topic Models and Recommendation Systems

Item2Item Matrix

[Figure: item-to-item matrix over publications]

Reader indexed learning

Pages: [1,6,7,10,11] ReadTime: 11250 ms. TimeStamp: 1385437850

Decay function

[Figure: decay over time in weeks; decay per week = 850]
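A reader-indexed pass over the read events, combined with a time decay, could build an item-to-item matrix roughly as follows. The exponential form of the decay, the 4-week half-life and all names are assumptions for illustration; the slide does not specify the decay function beyond its weekly scale.

```python
from collections import defaultdict
from itertools import combinations

SECONDS_PER_WEEK = 7 * 24 * 3600


def decay_weight(event_ts, now_ts, half_life_weeks=4.0):
    """Exponential decay: the weight halves every half_life_weeks."""
    age_weeks = max(now_ts - event_ts, 0) / SECONDS_PER_WEEK
    return 0.5 ** (age_weeks / half_life_weeks)


def build_item2item(reads, now_ts, half_life_weeks=4.0):
    """Accumulate decayed co-read weights for every pair of documents.

    reads: iterable of (reader, doc_id, unix_timestamp) tuples.
    Returns a symmetric nested dict: matrix[a][b] == matrix[b][a].
    """
    by_reader = defaultdict(dict)
    for reader, doc, ts in reads:
        weight = decay_weight(ts, now_ts, half_life_weeks)
        # Keep the most recent (largest) weight per reader and document.
        by_reader[reader][doc] = max(weight, by_reader[reader].get(doc, 0.0))

    matrix = defaultdict(lambda: defaultdict(float))
    for docs in by_reader.values():
        for a, b in combinations(sorted(docs), 2):
            pair_weight = docs[a] * docs[b]
            matrix[a][b] += pair_weight
            matrix[b][a] += pair_weight
    return matrix


now = 1385437850  # timestamp taken from the slide
reads = [
    ("r1", "A", now), ("r1", "B", now - 4 * SECONDS_PER_WEEK),
    ("r2", "A", now), ("r2", "B", now),
]
item2item = build_item2item(reads, now)
```

In this toy run reader r2 contributes a full weight of 1.0 to the pair (A, B), while r1 contributes only 0.5 because the read of B is one half-life old.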

Page 18: Issuu Talk on Topic Models and Recommendation Systems

RECOMMENDING

Inputs: Read History, Likes, Stacks

Item2Item Matrix

Item Matrix Weight Mapping Function

[Figure: the reader's items over time are looked up in the item-to-item matrix and mapped to item weights]
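The mapping from a reader's read history, likes and stacks to candidate item weights might look like the sketch below. The per-signal strengths and the additive combination are hypothetical choices; the slide names the inputs and the mapping function but not its form.

```python
from collections import defaultdict

# Hypothetical strengths for the three input signals.
SIGNAL_WEIGHTS = {"read": 1.0, "like": 2.0, "stack": 3.0}


def item_weights(user_items, item2item, exclude_seen=True):
    """Score candidate documents for one reader.

    user_items: list of (doc_id, signal) pairs, signal in SIGNAL_WEIGHTS.
    item2item: dict mapping doc_id -> {neighbour_doc_id: similarity}.
    Returns a dict of candidate doc_id -> accumulated weight.
    """
    seen = {doc for doc, _ in user_items}
    weights = defaultdict(float)
    for doc, signal in user_items:
        strength = SIGNAL_WEIGHTS[signal]
        for neighbour, similarity in item2item.get(doc, {}).items():
            if exclude_seen and neighbour in seen:
                continue  # do not recommend what the reader already has
            weights[neighbour] += strength * similarity
    return dict(weights)


# Toy matrix and history (hypothetical).
toy_matrix = {
    "A": {"B": 0.5, "C": 1.0},
    "B": {"A": 0.5, "D": 2.0},
}
history = [("A", "read"), ("B", "like")]
candidates = item_weights(history, toy_matrix)
```

Here `candidates` is `{"C": 1.0, "D": 4.0}`: D scores higher because it neighbours a liked publication.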

Page 19: Issuu Talk on Topic Models and Recommendation Systems

RECOMMENDING

Item Matrix Weight Mapping Function

Item Weights

Weighted Sampling

[Figure: candidate publications with item weights, from which recommendations are drawn by weighted sampling]
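Weighted sampling from the item weights can be sketched as repeated proportional draws without replacement. This is one common way to do it, assumed here for illustration; the function name and toy weights are not from the talk.

```python
import random


def weighted_sample(weights, k, rng=None):
    """Draw up to k distinct items, each draw proportional to its weight.

    weights: dict of item -> non-negative weight; zero weights are skipped.
    rng: optional random.Random instance for reproducible draws.
    """
    rng = rng or random.Random()
    pool = {item: w for item, w in weights.items() if w > 0}
    picked = []
    while pool and len(picked) < k:
        items = list(pool)
        choice = rng.choices(items, weights=[pool[i] for i in items], k=1)[0]
        picked.append(choice)
        del pool[choice]  # sample without replacement
    return picked


# Toy item weights (hypothetical).
toy_weights = {"C": 1.0, "D": 4.0, "E": 0.0}
recommendations = weighted_sample(toy_weights, k=5, rng=random.Random(0))
```

Compared with always returning the top-ranked items, sampling lets lower-weighted publications surface occasionally, so repeated visits do not show an identical list.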

Page 20: Issuu Talk on Topic Models and Recommendation Systems

Max. Rank

Page 21: Issuu Talk on Topic Models and Recommendation Systems

Tuned Parameters

Page 22: Issuu Talk on Topic Models and Recommendation Systems

Master Student Project

Lars Maaløe: Deep Belief Network Model (Bag-of-Words model, Training Data)

[Figure: network diagram labelled 2000, 500, 20, 2]

Kasper Johansen: Collaborative Filtering Using Social Media Knowledge

Page 23: Issuu Talk on Topic Models and Recommendation Systems

Morten Arngren

Senior Data Scientist