Data Science in the UK and Examples of Academia-Industry Collaboration. Emine Yilmaz, Visiting Faculty Member, Sabanci University; Associate Professor and Turing Fellow, University College London (UCL)


Post on 24-Sep-2020



 


Turing Institute

• National data science centre, founded in the UK in 2015
• Aims to build a world-leading research centre in data science
• Aims to build a team of experts in fields related to data science, such as computer science, mathematics, statistics, and the social sciences
• Founding universities: University of Cambridge, Edinburgh, Oxford, UCL, and Warwick
• Institute staff
    • Academics at the founding universities (fellows)
    • PhD students and post-docs working full time at the institute
    • Visiting faculty members and interns


Turing Institute-Industry Partnership

• Partners from industry: Lloyd's Register Foundation, Intel, HSBC
• Industry partners
    • Have a say in setting the institute's research priorities
    • Run joint projects with institute staff
    • Share data, so that work is done on problems directly relevant to industry
    • Get priority access to training at Turing (masterclasses, seminars, workshops)


Joint Project with Elsevier

• Elsevier is an academic publishing platform
    • Many products: Mendeley, ScienceDirect, Scopus, etc.
• Goal: understand each person's interests and present personalised results
• The collaboration was carried out through a post-doc
    • The work was published in several journals and conferences
    • The research was also applied to Elsevier data


Elsevier Project: Research Topics

• Determining users' interests
    • Topic modelling (LDA): currently the most popular method
    • The topics a user is interested in are determined from the documents the user has read before
• Two important problems that topic modelling cannot solve
    • Users' interests may change over time (dynamically)
    • It is hard to determine users' interests when they first start using the system
• The aim of the Elsevier project is to provide a solution to these two problems


Topic Modelling: Latent Dirichlet Allocation (LDA)

• Each topic is a probability distribution over words
• Each document is a mixture over latent topics
• Each user profile is a probability distribution over topics, determined by the documents the user has visited
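The slide does not say how a user profile is computed from the visited documents. One simple possibility, assumed here purely for illustration, is to average the topic mixtures of those documents. The function name `user_profile` and the example numbers are hypothetical.

```python
def user_profile(doc_topic_mixtures):
    """Average the topic mixtures of the documents a user has visited.

    Each mixture is a list of topic probabilities that sums to 1, so the
    average is again a probability distribution over topics.
    """
    if not doc_topic_mixtures:
        # A brand-new user has no reading history to average over.
        raise ValueError("need at least one visited document")
    num_topics = len(doc_topic_mixtures[0])
    totals = [0.0] * num_topics
    for mixture in doc_topic_mixtures:
        for z, probability in enumerate(mixture):
            totals[z] += probability
    return [total / len(doc_topic_mixtures) for total in totals]


# Two visited documents described over three topics (made-up values).
visited = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
profile = user_profile(visited)
```

The error raised for an empty history is the point where a profile built this way has nothing to work with for a new user.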

 


Topic Modelling: Latent Dirichlet Allocation (LDA)

• To generate a document (generative model):
    • A probability distribution over topics is chosen: $\theta_d \sim \mathrm{Dirichlet}(\alpha)$
    • For each word in the document:
        • A topic is drawn at random from the distribution over topics: $z_{d,n} \sim \mathrm{Multinomial}(\theta_d)$
        • A word is drawn at random from the chosen topic: $w_{d,n} \sim \mathrm{Multinomial}(\beta_{z_{d,n}})$
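The generative process on this slide can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the vocabulary, the topic-word matrix `beta`, and the value of `alpha` are made-up examples, and the Dirichlet draw is built from normalised Gamma samples.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalising independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(alpha, beta, vocab, num_words, rng):
    """Generate one document following the LDA generative process."""
    theta_d = sample_dirichlet(alpha, rng)  # theta_d ~ Dirichlet(alpha)
    topics = list(range(len(beta)))
    words = []
    for _ in range(num_words):
        z = rng.choices(topics, weights=theta_d)[0]  # z_{d,n} ~ Multinomial(theta_d)
        w = rng.choices(vocab, weights=beta[z])[0]   # w_{d,n} ~ Multinomial(beta_z)
        words.append(w)
    return theta_d, words


# Hypothetical two-topic model over a four-word vocabulary.
vocab = ["gene", "protein", "query", "ranking"]
beta = [
    [0.5, 0.5, 0.0, 0.0],  # topic 0: biology words
    [0.0, 0.0, 0.5, 0.5],  # topic 1: search words
]
rng = random.Random(0)
theta_d, doc = generate_document([0.5, 0.5], beta, vocab, 8, rng)
```

Fitting LDA runs this process in reverse: given only the observed words, it infers the topic mixtures and the topic-word distributions.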


Dynamic and Social-Network-Based Topic Modelling

• Topic modelling does not give good results when modelling users' changing interests, or when users first start using the system
• A new modelling algorithm
    • The probability distributions in consecutive time intervals are dependent on each other
    • Friend information is used in the modelling
        • Interests of user u: $\theta_{t,u}$
        • Interests of user u's friends: $\psi_{t,u}$
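One way to make the distribution at time t depend on the previous interval and on friends' interests is to use both as the mean of a Dirichlet prior. The sketch below assumes that form; the weights `alpha` and `beta`, the function names, and the numbers are illustrative assumptions and are not taken from the slides.

```python
import random


def next_interest_prior(theta_prev, psi_prev, alpha, beta):
    """Dirichlet concentration for time t, built from time t-1.

    alpha weights the user's own previous interests, beta weights the
    friends' previous interests. Both weights are assumed values.
    """
    return [alpha * th + beta * ps for th, ps in zip(theta_prev, psi_prev)]


def sample_next_interests(theta_prev, psi_prev, alpha, beta, rng):
    """Draw theta_t from a Dirichlet centred on the weighted mix above."""
    concentration = next_interest_prior(theta_prev, psi_prev, alpha, beta)
    draws = [rng.gammavariate(c, 1.0) for c in concentration]
    total = sum(draws)
    return [d / total for d in draws]


theta_prev = [0.7, 0.2, 0.1]  # user's interests at t-1 (made up)
psi_prev = [0.2, 0.2, 0.6]    # friends' interests at t-1 (made up)
concentration = next_interest_prior(theta_prev, psi_prev, alpha=50.0, beta=20.0)
theta_t = sample_next_interests(theta_prev, psi_prev, 50.0, 20.0, random.Random(1))
```

Under this assumption a new user with no history can still receive a meaningful prior, because the friends' term does not depend on the user's own past documents.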


[...] each document exhibits those topics (Blei, Ng, and Jordan 2003). Since the well-known topic models, PLSI (Probabilistic Latent Semantic Indexing) (Hofmann 1999) and LDA (Latent Dirichlet Allocation) (Blei, Ng, and Jordan 2003), were proposed, topic models with dynamics have been widely studied. These include the Dynamic Topic Model (DTM) (Blei and Lafferty 2006), Dynamic Mixture Model (DMM) (Wei, Sun, and Wang 2007), Topic over Time (ToT) (Wang and McCallum 2006), Topic Tracking Model (TTM) (Iwata et al. 2009), and more recently, Generalized Linear Dynamic topic model (Caballero and Akella 2015), the dynamic User Clustering Topic model (UCT) (Zhao et al. 2016), News and Twitter Interaction Topic Model (NTIT) (Hua et al. 2016), Dynamic Clustering Topic model (DCT) (Liang, Yilmaz, and Kanoulas 2016) and scaling-up dynamic model (Bhadury et al. 2016). All of these models except DCT aim at inferring documents' dynamic topic distributions rather than user clustering. Except for UCT and DCT, which work in the context of short text streams, most of the previous dynamic topic models work in the context of long text streams. To the best of our knowledge, none of the existing dynamic topic models has considered the problem of clustering users with collaborative information, e.g., followees' interests, in the context of short text streams.

Problem Formulation

The problem we address is to track users' dynamic interests and cluster them over time in the context of short text streams such that users in the same cluster at a specific point in time share similar interests. The dynamic user clustering algorithm is essentially a function $g$ that satisfies:

$$\mathbf{u}_t = \{u_1, u_2, \ldots, u_{|\mathbf{u}_t|}\} \xrightarrow{\;g\;} C_t = \{c_1, c_2, \ldots, c_Z\},$$

where $\mathbf{u}_t$ represents a set of users appearing in the stream up to time $t$, with $u_i$ being the $i$-th user in $\mathbf{u}_t$ and $|\mathbf{u}_t|$ the total number of users in the user set, while $C_t$ is the resulting set of clusters of users with $c_z$ being the $z$-th cluster in $C_t$ and $Z$ being the total number of clusters. We let $D_t = \{\ldots, \mathbf{d}_{t-2}, \mathbf{d}_{t-1}, \mathbf{d}_t\}$ denote the stream of documents generated by users in $\mathbf{u}_t$ up to time $t$ with $\mathbf{d}_t$ being the most recent set of short documents arriving at time period $t$. We assume that the length of a document $d$ in $D_t$ is no more than a predefined small length (for instance, 140 characters in the case of Twitter).

Method

In this section, we describe our proposed User Collaborative Interest Tracking topic model, UCIT, aiming at tracking users' and their followees' interests, and dynamically clustering them in the context of short text streams.

Overview

We use Twitter as our default setting of short text streams and provide an overview of our proposed UCIT model in Algorithm 1. Following (Liang, Ren, and de Rijke 2014; Zhao et al. 2016), we represent each user's interests by topics. Thus, the interests of each user $u \in \mathbf{u}_t$ at time period $t$ are represented as a multinomial distribution $\theta_{t,u} = \{\theta_{t,u,z}\}_{z=1}^{Z}$ over topics. Here $Z$ is the total number of topics. The distribution $\theta_{t,u}$ is inferred by the UCIT model. To alleviate the sparsity problem of short texts, and by following recent work on the topic (Yan et al. 2013; 2015), we construct and represent documents by their biterms, i.e. word pairs in them (step 1 in Algorithm 1). Next, we propose a dynamic Dirichlet multinomial mixture user collaborative interest tracking topic model to capture each user's dynamic interests $\theta_{t,u} = \{\theta_{t,u,z}\}_{z=1}^{Z}$ and their collaborative interests $\psi_{t,u} = \{\psi_{t,u,z}\}_{z=1}^{Z}$ inferred from their followees $\mathbf{f}_{t,u}$, at time $t$, in the context of short text streams (step 2 in Algorithm 1). Here $\mathbf{f}_{t,u}$ is the set of all of user $u$'s followees at $t$. Based on each user's multinomial distributions $\theta_{t,u}$ and $\psi_{t,u}$, we cluster users using K-means clustering (Jain 2010) (step 3 in Algorithm 1). With the time period $t$ moving forward, the clustering result changes dynamically.

Algorithm 1: Overview of the proposed UCIT model.
Input: A set of users $\mathbf{u}_t$ along with their tweets $D_t$
Output: Clusters of users $C_t$
1. Construct a collection of word-pairs $\mathbf{b}_{t,u}$ for each user $u$
2. Use UCIT model to track each user's interests as $\theta_{t,u}$ and their collaborative interest as $\psi_{t,u}$
3. Cluster users based on each user's interest $\theta_{t,u}$ and their collaborative interest $\psi_{t,u}$

Figure 1: Graphical representation of our user interest tracking clustering topic model, UCIT. Shaded nodes represent observed variables. [Plate diagram not reproduced.]
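Steps 1 and 3 of Algorithm 1 can be sketched as follows. This is a minimal illustration and not the authors' implementation: the whitespace tokenisation, the plain Lloyd's K-means written here, and the choice to concatenate $\theta_{t,u}$ and $\psi_{t,u}$ into one feature vector are all assumptions. Step 2 (the UCIT inference itself) is not implemented; the `theta` and `psi` values are made-up stand-ins for its output.

```python
import math
import random
from itertools import combinations


def biterms(tokens):
    """Return every unordered word pair (biterm) in one short document."""
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def user_biterms(tweets):
    """Step 1: collect the biterms of all tweets written by one user."""
    collection = []
    for tweet in tweets:
        # Lowercased whitespace tokenisation is an assumption of this sketch.
        collection.extend(biterms(tweet.lower().split()))
    return collection


def kmeans(points, k, iterations=20, seed=0):
    """Step 3: plain Lloyd's K-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


b = user_biterms(["topic models rock", "short text"])

# Made-up stand-ins for the step 2 output of four users over two topics.
theta = [[0.9, 0.1], [0.85, 0.15], [0.1, 0.9], [0.2, 0.8]]
psi = [[0.8, 0.2], [0.9, 0.1], [0.2, 0.8], [0.1, 0.9]]
features = [t + p for t, p in zip(theta, psi)]  # concatenate theta and psi
clusters = kmeans(features, k=2)
```

Re-running step 3 at every time period on the updated distributions is what makes the clustering change over time.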

User Collaborative Interest Tracking Model

Modeling interests over time. The goal of UCIT topic model is to infer the dynamical topic distribution of each user, $\theta_{t,u} = \{\theta_{t,u,z}\}_{z=1}^{Z}$, and the user's collaborative topic distribution, $\psi_{t,u} = \{\psi_{t,u,z}\}_{z=1}^{Z}$, in short text streams at a given time $t$, and dynamically cluster all users based on information of each user's $\theta_{t,u}$ and $\psi_{t,u}$ over time. Fig. 1 shows a graphical representation of our UCIT model.

Given a user $u$, to track the dynamics of their interests, we [...]


Experimental Results: Data

• Dataset: 1,375 Twitter users, their friends, and their friends' tweets
    • Users who registered up to May 2015
    • Most users have between 2 and 50 followers
• Quality is evaluated by how well users' future activity is predicted


Sample Results


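Experimental Results

(a) Topical representation quality (H-score)
(b) Perplexity

[Figure: two plots, not reproduced. Panel (a) shows H-score (axis from 0.25 to 0.55) for LDA, AuthorT, DTM, TTM, ToT, UCITavg, UCITavg+ψ, and UCITψ. Panel (b) shows perplexity (axis from 1000 to 2200) for LDA, AuthorT, DTM, TTM, ToT, GSDMM, and UCITψ.]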
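Perplexity, the measure in panel (b), is commonly defined as the exponential of the negative average log-likelihood per word, so lower values are better. The sketch below assumes the standard mixture form $p(w \mid d) = \sum_z \theta_{d,z}\,\beta_{z,w}$; how the held-out words and distributions were obtained in the experiments is not stated on the slide, and all names and numbers here are illustrative.

```python
import math


def word_probability(theta_d, beta, word_id):
    """p(w | d) = sum over topics z of theta_d[z] * beta[z][w]."""
    return sum(t * topic[word_id] for t, topic in zip(theta_d, beta))


def perplexity(docs, thetas, beta):
    """exp(-(total log-likelihood) / (number of words)) over all documents."""
    log_likelihood = 0.0
    num_words = 0
    for word_ids, theta_d in zip(docs, thetas):
        for w in word_ids:
            log_likelihood += math.log(word_probability(theta_d, beta, w))
            num_words += 1
    return math.exp(-log_likelihood / num_words)


# A model that spreads probability evenly over 4 words has perplexity 4.
uniform_beta = [[0.25] * 4, [0.25] * 4]
score = perplexity(docs=[[0, 1, 2, 3]], thetas=[[0.5, 0.5]], beta=uniform_beta)
```

The uniform case is a useful sanity check: a model that knows nothing about a vocabulary of V words scores V, and any useful model should score lower.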


Joint Project with Elsevier: How It Was Run

• On the Elsevier side, a manager was assigned to lead the project, along with a programmer responsible for implementing it on Elsevier's systems
• The post-doc working on the project spent one day a week at Elsevier, briefing Elsevier staff and coding together with them
• Before being tried on Elsevier data, the project was tested on publicly available data (Twitter, etc.)
• UCL and Elsevier held periodic meetings


About Industry-Academia Collaborations

• Collaborations that bring major benefits to both sides
• Advantages for industry
    • Working with academics on problems that matter to the company
    • The opportunity to learn and apply the latest techniques
    • Organising infrastructure and building infrastructure for data collection
• Advantages for academics
    • Access to datasets
    • A focus on real problems
    • The opportunity to try methods on real systems with real users


Examples of Industry-Academia Collaboration

• Post-doc and PhD positions funded by industry
• Part-time PhD positions
• Faculty awards
• Joint supervision of master's theses with industry
• Consulting for industry
• Sabbaticals spent in industry


Summary

• The Turing Institute, its aims, and how it works
• A sample industry-academia collaboration project with Elsevier
    • How does it work?
    • The problems studied
    • What results were obtained?
• Examples and advantages of industry-academia collaboration