Topic Models Recommendations
Morten Arngren, Senior Data Scientist
About
Topic Modelling
Recommendations
“…YouTube for Publications…”
Started in 2006 by 5 dudes.
15M publications (free)
7.5B page views / month
340M pages (25 km²)
2013
83M unique visitors / month
Data Science Team (Copenhagen)
12x 2.6GHz
96GB RAM
2TB SSD
2TB hard drive
Morten Arngren
Ph.D. in Machine Learning and AI (2011)
M.Sc.A.M. (2007)
B.Sc.E.E. (1997)
ISSUU, Data Scientist (2011 - present)
DTU & FOSS Analytical, Machine Learning in Food Quality (2008-2011)
Nokia Mobile Phones, Digital Signal Processing (2000-2007)
Alcatel Space Denmark, Building Rockets (1997-2000)

Andrius Butkus
Ph.D. in Digital Media Personalisation (2009)
M.Sc.E.E. (2004)
B.Sc.E.E. (2002)
ISSUU, Data Scientist (2011 - present)
DTU External Lecturer, Human Computer Interaction (2010 - present)
DTU Assistant Professor, Digital Media Engineering (2008-2010)

Amazon Web Services
ML Gadgets
Data
Layout (quantify text and image boxes)
Article Extraction
OCR
Image
Cover Analysis
Explicit Detection
Doc. Type Classification
Text
Detect Language (56)
Translate to English (from 24 languages)
LDA Topics
Page Content
DB
40k Pubs / Day
Reader Activity
Time
Session
DB
“Birdie Nam Nam”
200GB / Day
Topic Modelling
LATENT DIRICHLET ALLOCATION
150 topics (preset parameter)
Topic model based on Bag-of-Words Data
http://radimrehurek.com/gensim/
Wikipedia Training Data ~4.5M Single Articles
(Pure Topics)
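A minimal sketch of the bag-of-words step that feeds the topic model, assuming a simple lowercase, letters-only tokenizer. The function name `bag_of_words` and the tokenizer rule are illustrative assumptions, not the pipeline used in the talk.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Return word counts for a text, ignoring word order.

    Assumption: tokens are runs of ASCII letters, lowercased.
    A production tokenizer would also handle stop words and stemming.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens)


counts = bag_of_words("Travel hotels, travel islands")
print(counts["travel"])  # count of the word "travel"
```

A topic model such as LDA only sees these counts, never the word order. In gensim, `Dictionary.doc2bow` produces the equivalent representation as a list of `(token_id, count)` pairs.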
Example topics: arabic, Australia, history, business, islands, environment, hotels, poetic, food, design, arts, plants, animals
Topic Distribution (topics 1 to 150)
LDA
D. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, January 2003.
LATENT DIRICHLET ALLOCATION
Properties: every topic weight ∈ [0, 1] ∧ Σ weights = 1
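The two properties can be checked directly on any topic vector. This is a small sketch; the function name `is_topic_distribution` and the tolerance are assumptions made for illustration.

```python
import math


def is_topic_distribution(weights, tol=1e-9):
    """Check that every weight lies in [0, 1] and that the weights sum to 1."""
    in_range = all(0.0 <= w <= 1.0 for w in weights)
    sums_to_one = math.isclose(sum(weights), 1.0, abs_tol=tol)
    return in_range and sums_to_one


# A three-topic mixture with weights 0.5, 0.4 and 0.1.
print(is_topic_distribution([0.5, 0.4, 0.1]))
```

Because the weights are constrained this way, each document is a point on a simplex, which is why regions of the LDA space can be densely populated or empty.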
LDA Space
Issuu Publications
TOPIC CATEGORIES
~4.5 Mio.
Density distribution not the same
~9 Mio.
Empty locations in LDA space.
Travel
Cocktails
Chemistry
0.5 Travel 0.4 Sports 0.1
Botanics
Drinks
(Learning from Wikipedia Dataset)
Dancing
Recommendation System!
READER ACTIVITY
Extract Implicit Rating…?
No Explicit Rating…
Time
“Birdie Nam Nam”
Session {
  UserName: ‘Birdie-Nam-Nam’
  DocID: xxx-xxxxx
  Pages:
    1: [250, 725, 569, 134, ...]
    2: [1056, 1259, ...]
    3: [1056, 1259, ...]
    4: [102, 356, 208, 438]
    5: [102, 356, 208, 438]
    6: [5250, 3567, 809]
    7: [5250, 3567, 809]
    ...
  TimeStamp: 1378935850
  DocID: yyy-yyyyy
}

Pages: [1,2,3,6,7]
ReadTime: 25789 ms.
TimeStamp: 1378935850
Browsing or Reading?
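One way to separate browsing from reading is to threshold the time spent on each page. The sketch below assumes the per-page numbers in the session are dwell times in milliseconds and uses a hypothetical 1500 ms cut-off; the slides do not state the actual rule. Values elided with "..." in the session are simply left out.

```python
READ_THRESHOLD_MS = 1500  # hypothetical cut-off, not taken from the slides


def read_pages(page_dwell_ms, threshold_ms=READ_THRESHOLD_MS):
    """Return the pages whose total dwell time reaches the threshold."""
    return sorted(
        page
        for page, dwells in page_dwell_ms.items()
        if sum(dwells) >= threshold_ms
    )


# Visible dwell times from the example session (assumed to be milliseconds).
session_pages = {
    1: [250, 725, 569, 134],
    2: [1056, 1259],
    3: [1056, 1259],
    4: [102, 356, 208, 438],
    5: [102, 356, 208, 438],
    6: [5250, 3567, 809],
    7: [5250, 3567, 809],
}

print(read_pages(session_pages))  # pages counted as read
```

With this cut-off, pages 4 and 5 (1104 ms in total) count as browsed and the remaining pages as read. The list of read pages and the total read time can then serve as an implicit rating.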
Readers
Publications
Item2Item Matrix
Reader indexed learning
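A minimal sketch of reader-indexed learning, assuming the Item2Item matrix holds co-read counts: two publications are linked each time the same reader has read both. The counting rule and the name `build_item2item` are illustrative assumptions; the slides do not spell out how the matrix is filled.

```python
from collections import defaultdict
from itertools import combinations


def build_item2item(read_histories):
    """Build a symmetric co-read count matrix from per-reader histories.

    read_histories maps a reader id to the list of publications read.
    """
    matrix = defaultdict(lambda: defaultdict(int))
    for docs in read_histories.values():
        # Each unordered pair of distinct publications counts once per reader.
        for a, b in combinations(sorted(set(docs)), 2):
            matrix[a][b] += 1
            matrix[b][a] += 1
    return matrix


histories = {
    "reader-1": ["burger", "film"],
    "reader-2": ["burger", "film", "music"],
}
item2item = build_item2item(histories)
print(item2item["burger"]["film"])  # number of readers who read both
```

Iterating reader by reader keeps memory proportional to the number of observed pairs rather than to the full publications × publications grid.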
Pages: [1,6,7,10,11]
ReadTime: 11250 ms.
TimeStamp: 1385437850
Time in weeks
decay per week = 850
Decay function
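A sketch of a time decay applied to older reads, assuming an exponential form with a hypothetical half-life of four weeks. The slides name a decay per week but not the functional form, so both the formula and the default parameter here are assumptions.

```python
SECONDS_PER_WEEK = 7 * 24 * 3600


def age_in_weeks(event_ts, now_ts):
    """Age of an event in weeks, given Unix timestamps in seconds."""
    return max(0.0, (now_ts - event_ts) / SECONDS_PER_WEEK)


def decay_weight(age_weeks, half_life_weeks=4.0):
    """Exponential decay: the weight halves every half_life_weeks (assumed value)."""
    return 0.5 ** (age_weeks / half_life_weeks)


# The two example sessions are stamped 1378935850 and 1385437850.
age = age_in_weeks(1378935850, 1385437850)
print(round(age, 2), round(decay_weight(age), 3))
```

Multiplying each read by its decay weight before adding it to the Item2Item matrix lets recent reading behaviour dominate older sessions.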
RECOMMENDING
Item2Item Matrix
Item Matrix Weight Mapping Function
Time
Read History
Likes
Stacks
RECOMMENDING
Item Matrix Weight Mapping Function
Item Weights
Weighted Sampling
Max. Rank
Tuned Parameters
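The final step can be sketched as weighted sampling without replacement over the item weights. Treating "Max. Rank" as a cut-off that keeps only the top-weighted candidates is an assumption, as are the function name `sample_recommendations` and its parameters.

```python
import random


def sample_recommendations(item_weights, k, max_rank=None, seed=None):
    """Draw up to k distinct items with probability proportional to weight.

    max_rank (assumed meaning): keep only the top max_rank items by weight.
    """
    rng = random.Random(seed)
    ranked = sorted(item_weights.items(), key=lambda kv: kv[1], reverse=True)
    if max_rank is not None:
        ranked = ranked[:max_rank]
    candidates = [(item, w) for item, w in ranked if w > 0]

    picks = []
    while candidates and len(picks) < k:
        weights = [w for _, w in candidates]
        idx = rng.choices(range(len(candidates)), weights=weights, k=1)[0]
        # Remove the picked item so it cannot be drawn twice.
        picks.append(candidates.pop(idx)[0])
    return picks


weights = {"a": 5.0, "b": 3.0, "c": 1.0, "d": 0.5}
print(sample_recommendations(weights, k=2, max_rank=3, seed=0))
```

Sampling instead of always returning the top-k items adds variety between visits while still favouring strongly related publications.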
Deep Belief Network Model
Bag-of-Words model
Training Data
Lars Maaløe
2000
500
20
2
Kasper Johansen
Collaborative Filtering Using Social Media Knowledge
Master Student Project
Master Student Project
Morten Arngren
Senior Data Scientist