Issuu talk on topic models and recommendation systems

Post on 31-Mar-2016


DESCRIPTION

Issuu gave a talk at the Data Science and Machine Learning Meetup in Copenhagen, Nov. 2013.

TRANSCRIPT

Topic Models Recommendations

Morten Arngren, Senior Data Scientist

About

Topic Modelling

Recommendations

“…YouTube for Publications…”

Started in 2006 by 5 dudes.

2013

📖 15M publications (free)

👀 7.5B page views / month

340M pages (25 km²)

👥 83M unique visitors / month

Data Science Team (Copenhagen)

12x 2.6GHz
96GB RAM
2TB SSD
2TB hard drive

Morten Arngren
Ph.D. in Machine Learning and AI (2011)
M.Sc.A.M. (2007)
B.Sc.E.E. (1997)
ISSUU, Data Scientist (2011 - present)
DTU & FOSS Analytical, Machine Learning in Food Quality (2008-2011)
Nokia Mobile Phones, Digital Signal Processing (2000-2007)
Alcatel Space Denmark, Building Rockets (1997-2000)

Andrius Butkus
Ph.D. in Digital Media Personalisation (2009)
M.Sc.E.E. (2004)
B.Sc.E.E. (2002)
ISSUU, Data Scientist (2011 - present)
DTU External Lecturer, Human Computer Interaction (2010 - present)
DTU Assistant Professor, Digital Media Engineering (2008-2010)

☁ Amazon Web Services

ML Gadgets

📈 Data

📖 Layout (quantify text and image boxes)

Article Extraction

OCR

Image
Cover Analysis
Explicit Detection
Doc. Type Classification

Text
Detect Language (56)
Translate to English (from 24 languages)
LDA Topics

Page Content

DB

40k Pubs / Day

Reader Activity

[Figure: reader sessions over time for “Birdie Nam Nam”; DB]

200GB / Day

Topic Modelling

LATENT DIRICHLET ALLOCATION

150 topics (preset parameter)

Topic model based on Bag-of-Words Data

http://radimrehurek.com/gensim/

Wikipedia training data: ~4.5M single articles (pure topics)
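The bag-of-words input mentioned above can be sketched with the standard library alone. This is an illustration, not Issuu's pipeline: the tokenizer (lowercase, letters only) and the function name `bag_of_words` are assumptions made for the example.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Count word occurrences; word order is deliberately discarded."""
    # Assumption: a crude tokenizer that keeps runs of ASCII letters only.
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens)


doc = "Hotels on the islands: the best island hotels"
bow = bag_of_words(doc)
print(bow.most_common(2))
```

A topic model such as LDA only sees these counts, so two documents with the same words in a different order look identical to it. The gensim library linked on the slide provides its own helper for this step (`Dictionary.doc2bow`).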

Example topics: arabic, Australia, history, business, islands, environment, hotels, poetic, food, design, arts, plants, animals

Topic Distribution (topics 1 to 150)

D. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, January 2003.

LATENT DIRICHLET ALLOCATION

Properties: each topic weight lies in [0, 1] and the weights sum to 1 (Σ = 1)

[Figure: LDA space, axis label PC 4]

Issuu Publications

TOPIC CATEGORIES

(Learning from Wikipedia Dataset)

~4.5 Mio.

~9 Mio.

Density distribution not the same

Empty locations in LDA space.

Categories: Travel, Cocktails, Chemistry, Botanics, Drinks, Dancing

0.5 Travel 0.4 Sports 0.1
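A publication's topic mix, such as the 0.5 Travel / 0.4 Sports example above, can be checked against the LDA properties (weights in [0, 1], summing to 1) with a short sketch. The label of the remaining 0.1 is not legible in the slide, so `"other"` below is a placeholder, and both function names are hypothetical.

```python
import math


def is_topic_distribution(weights, tol=1e-9):
    """True if every weight is in [0, 1] and the weights sum to 1."""
    values = list(weights.values())
    in_range = all(0.0 <= v <= 1.0 for v in values)
    return in_range and math.isclose(sum(values), 1.0, abs_tol=tol)


def dominant_topics(weights, n=2):
    """Return the n topic labels with the largest weight."""
    return sorted(weights, key=weights.get, reverse=True)[:n]


# "other" is a placeholder: the third label is missing from the slide.
mix = {"Travel": 0.5, "Sports": 0.4, "other": 0.1}
print(is_topic_distribution(mix), dominant_topics(mix))
```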

Recommendation System

READER ACTIVITY

No Explicit Rating…

Extract Implicit Rating…?

[Figure: activity of reader “Birdie Nam Nam” over time]

Session {
  UserName: ‘Birdie-Nam-Nam’
  DocID: xxx-xxxxx
  Pages:
    1: [250, 725, 569, 134, ...]
    2: [1056, 1259, ...]
    3: [1056, 1259, ...]
    4: [102, 356, 208, 438]
    5: [102, 356, 208, 438]
    6: [5250, 3567, 809]
    7: [5250, 3567, 809]
    ...
  TimeStamp: 1378935850
  DocID: yyy-yyyyy
}

Pages: [1,2,3,6,7] ReadTime: 25789 ms. TimeStamp: 1378935850

Browsing or Reading?
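One way to answer "browsing or reading?" is a dwell-time threshold per page. The sketch below is an assumption, not the rule used in the talk: it treats the per-page numbers in the session record as dwell times in milliseconds, uses only the values that are legible (the elided "..." entries are left out), and the 1500 ms cut-off is a made-up value.

```python
from typing import Dict, List

# Hypothetical cut-off: total dwell time (ms) from which a page counts as read.
READ_THRESHOLD_MS = 1500


def pages_read(page_times: Dict[int, List[int]],
               threshold_ms: int = READ_THRESHOLD_MS) -> List[int]:
    """Return the pages whose summed dwell time reaches the threshold."""
    return sorted(
        page for page, times in page_times.items()
        if sum(times) >= threshold_ms
    )


# Legible values from the session record; elided entries are omitted.
session_pages = {
    1: [250, 725, 569, 134],
    2: [1056, 1259],
    3: [1056, 1259],
    4: [102, 356, 208, 438],
    5: [102, 356, 208, 438],
    6: [5250, 3567, 809],
    7: [5250, 3567, 809],
}
print(pages_read(session_pages))  # [1, 2, 3, 6, 7]
```

With this cut-off, pages 4 and 5 are treated as browsed, which matches the page list in the aggregated record. The read time in that record cannot be reproduced here because some per-page values are elided.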

[Figure: Readers × Publications matrix and the resulting Item2Item Matrix]

Reader indexed learning
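The slides do not say how the Item2Item matrix is computed. A common approach, shown here purely as an assumption, is to count how often two publications are read by the same reader, processing one reader at a time so the matrix can be updated incrementally. The reader and publication names are invented.

```python
from collections import defaultdict
from itertools import combinations


def item2item(read_histories):
    """Build symmetric co-read counts from {reader: [publication, ...]}."""
    counts = defaultdict(lambda: defaultdict(int))
    for docs in read_histories.values():
        # Each unordered pair read by the same reader counts once.
        for a, b in combinations(sorted(set(docs)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


# Hypothetical read histories.
histories = {
    "reader_a": ["burgers", "movies", "headphones"],
    "reader_b": ["burgers", "movies"],
    "reader_c": ["movies", "cocktails"],
}
matrix = item2item(histories)
print(matrix["burgers"]["movies"])
```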

Pages: [1,6,7,10,11] ReadTime: 11250 ms. TimeStamp: 1385437850

[Figure: values along a time axis in weeks: 5685, 2508, 1065, 850, 1150, 9860, 3690]

decay per week = 850

Decay function
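The functional form of the decay is not given in the slides. A minimal sketch, assuming exponential decay per week applied to the read time, could look like this; the 0.9 factor and the function name are hypothetical.

```python
SECONDS_PER_WEEK = 7 * 24 * 3600


def decayed_weight(read_time_ms, event_ts, now_ts, decay_per_week=0.9):
    """Scale a read time by decay_per_week for every week of age."""
    # Assumption: timestamps are Unix seconds; future events do not grow.
    weeks = max(0.0, (now_ts - event_ts) / SECONDS_PER_WEEK)
    return read_time_ms * decay_per_week ** weeks


# The two session records shown in the slides, evaluated at the later one.
now = 1385437850
older = decayed_weight(25789, 1378935850, now)
newer = decayed_weight(11250, 1385437850, now)
print(older, newer)
```

The older session is roughly 10.75 weeks old at that point, so its longer read time is discounted while the newer one keeps its full weight.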

RECOMMENDING

Read History (read times over time: 2508, 1065, 850, 1150)

Likes

Stacks

Item2Item Matrix

Item Matrix Weight Mapping Function

Item Weights

🔀 Weighted Sampling

Max. Rank

Tuned Parameters
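The weighted sampling step can be sketched as drawing recommendations without replacement, with probability proportional to each item's weight. How "Max. Rank" is used is not spelled out in the slides; here it is assumed to limit sampling to the top-ranked candidates. The function name, parameters, and example weights are all hypothetical.

```python
import random


def recommend(item_weights, k, max_rank=None, seed=None):
    """Sample up to k distinct items, proportionally to their weights."""
    rng = random.Random(seed)
    ranked = sorted(item_weights.items(), key=lambda kv: kv[1], reverse=True)
    if max_rank is not None:
        # Assumption: only the max_rank heaviest items are eligible.
        ranked = ranked[:max_rank]
    pool = [(item, w) for item, w in ranked if w > 0]
    picks = []
    while pool and len(picks) < k:
        weights = [w for _, w in pool]
        index = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        picks.append(pool.pop(index)[0])
    return picks


# Hypothetical item weights.
weights = {"fries": 5.0, "headphones": 3.0, "billiards": 1.0, "tennis": 0.5}
picks = recommend(weights, k=2, max_rank=3, seed=0)
print(picks)
```

Sampling instead of always returning the top items means repeat visits can surface different publications while still favouring the heaviest ones.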

Deep Belief Network Model

Master Student Project: Lars Maaløe

Bag-of-Words model

Training Data

[Network diagram labels: 2000, 500, 20, 2]

Collaborative Filtering Using Social Media Knowledge

Master Student Project: Kasper Johansen

Morten Arngren, Senior Data Scientist
