Graph Data Management Lab (GDM@FUDAN), School of Computer Science, Fudan University, www.gdm.fudan.edu.cn. Email: yangdeqing@fudan.edu.cn. ECML-PKDD 2012, Bristol, UK

  • Slide 1
  • Which Topic will You Follow? Deqing Yang, Yanghua Xiao, Bo Xu, Hanghang Tong, Wei Wang, Sheng Huang. School of Computer Science, Fudan University, China. ECML-PKDD 2012.
  • Slide 2
  • Outline: Introduction, Preliminaries, Empirical Study, Modeling
  • Slide 3
  • Who are the most appropriate candidates to receive a call for papers or a call for participation? How can call-for-papers emails reach the authors who are interested in the proposed topic, instead of being sent to everyone indiscriminately?
  • Slide 4
  • Which session topics should we propose for next year's conference? Furthermore, how many sessions does a given topic need?
  • Slide 5
  • Can we predict the topic of an author's next paper?
  • Slide 6
  • Basic Idea: Use features of authors in the Scientific Collaboration Network (SCN) to model authors' topic-following behavior. Two candidate features: social influence, the tendency of an individual to adopt the behaviors of his neighbors or friends; and homophily, the tendency of individuals to choose friends with similar characteristics.
  • Slide 7
  • Contributions: We verify that social influence and homophily are the two factors determining topic diffusion in the SCN. We propose a Multiple Logistic Regression (MLR) model to predict authors' topic-following behavior. We also conduct extensive experiments showing that the model has good prediction performance.
  • Slide 8
  • Outline: Introduction, Preliminaries, Empirical Study, Modeling. Next: Preliminaries
  • Slide 9
  • Scientific Collaboration Network (SCN): a temporal, undirected, edge-weighted graph. Vertex: author. Edge: coauthorship relationship. Edge weight: number of papers coauthored by the two endpoints of the edge. Settings: DBLP dataset, 25 representative topics.
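The following Python is a minimal sketch of the SCN defined on this slide. It builds yearly snapshots of the weighted coauthorship graph from paper records. The input format (a list of `(year, authors)` pairs) and the function names `build_scn` and `cumulative_graph` are hypothetical. The slides do not say how DBLP records are parsed.

```python
from collections import defaultdict
from itertools import combinations


def build_scn(papers):
    """Build yearly snapshots of a weighted, undirected coauthorship graph.

    papers: iterable of (year, authors) pairs (hypothetical input format).
    Returns {year: {(a, b): weight}}, where weight is the number of papers
    a and b coauthored that year and a < b, so each edge is stored once.
    """
    snapshots = defaultdict(lambda: defaultdict(int))
    for year, authors in papers:
        # Sorting the deduplicated author set makes (a, b) and (b, a) one key.
        for a, b in combinations(sorted(set(authors)), 2):
            snapshots[year][(a, b)] += 1
    return {year: dict(edges) for year, edges in snapshots.items()}


def cumulative_graph(snapshots, up_to_year):
    """Sum edge weights over all snapshots up to and including up_to_year."""
    total = defaultdict(int)
    for year, edges in snapshots.items():
        if year <= up_to_year:
            for edge, weight in edges.items():
                total[edge] += weight
    return dict(total)
```

Keeping per-year snapshots preserves the temporal aspect of the SCN. The cumulative view gives the edge weights as of a given year.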
  • Slide 10
  • Homophily: We use topic similarity to characterize homophily. A 25-dim vector $\mathbf{u}$ represents author u's topic history. Topic similarity is defined both between two authors u and v and between an author u and a group of authors U. In the group case, U is represented by another 25-dim vector whose i-th dimension is the number of papers on the i-th topic published by all users in U.
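The topic vectors on this slide can be sketched as follows. The group vector follows the slide: its i-th entry counts papers on topic i by anyone in U. Here each paper is counted once even when several group members coauthored it. The similarity measure is an assumption, since the exact formula is not given in this transcript. Cosine similarity is used purely for illustration. The input format and function names are hypothetical.

```python
import math

NUM_TOPICS = 25  # the slides use 25 representative DBLP topics


def group_vector(papers, group):
    """25-dim vector whose i-th entry counts papers on topic i by any author in group.

    papers: {paper_id: (topic_index, authors)} (hypothetical input format).
    group: a set of authors. Each paper is counted once.
    """
    vec = [0] * NUM_TOPICS
    for topic, authors in papers.values():
        if group & set(authors):
            vec[topic] += 1
    return vec


def topic_vector(papers, author):
    """An author's topic history, i.e. the group vector of a one-author group."""
    return group_vector(papers, {author})


def cosine_similarity(u, v):
    """ASSUMED similarity measure, used here only for illustration."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0  # 0.0 when either history is empty
```

Under these assumptions, the author-to-group similarity is `cosine_similarity(topic_vector(papers, u), group_vector(papers, U))`.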
  • Slide 11
  • Outline: Introduction, Preliminaries, Empirical Study, Modeling. Next: Empirical Study
  • Slide 12
  • Driving Forces of Topic-Following: $U = U_0 \cup V_0$ with $U_0 \cap V_0 = \emptyset$. $U_0$: the users who have published papers on a given topic before a certain year. $V_0$: all other users, split as $V_0 = U_1 \cup U_2 \cup U_3 \cup U_4$. $N(u)$ is the neighbor set of $u$. $U_1$: affected by both social influence and homophily; $U_2$: affected only by social influence; $U_3$: affected only by homophily; $U_4$: affected by neither force.
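A sketch of the four-way split of $V_0$ follows. The membership criteria are assumptions, since this transcript does not include the slide's exact definitions. A user counts as exposed to social influence if at least one neighbor in $N(u)$ belongs to $U_0$. A user counts as homophilous if a caller-supplied similarity to $U_0$ exceeds a threshold. The function name `partition_non_adopters` and the `threshold` parameter are hypothetical.

```python
def partition_non_adopters(users, adopters, neighbors, similarity, threshold=0.0):
    """Split V_0 = users - adopters into U_1..U_4 under ASSUMED criteria.

    users, adopters: sets of user ids (adopters plays the role of U_0).
    neighbors: {user: set of coauthors}, i.e. N(u).
    similarity: callable user -> float, e.g. topic similarity between u and U_0.
    """
    groups = {1: set(), 2: set(), 3: set(), 4: set()}
    for u in users - adopters:  # iterate over V_0
        social = bool(neighbors.get(u, set()) & adopters)  # some neighbor in U_0
        homophilous = similarity(u) > threshold
        if social and homophilous:
            groups[1].add(u)  # U_1: both forces
        elif social:
            groups[2].add(u)  # U_2: social influence only
        elif homophilous:
            groups[3].add(u)  # U_3: homophily only
        else:
            groups[4].add(u)  # U_4: neither force
    return groups
```

The four branches are mutually exclusive, so the groups are disjoint and their union is $V_0$.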
  • Slide 13
  • Driving Forces of Topic-Following (cont.): The two forces act together on topic-following behavior. Their impacts are time-sensitive and decay exponentially over time.
  • Slide 14
  • Social Influence: An author is more likely to adopt a topic when more of his neighbors have followed it before. An author is also more likely to follow topics adopted by those neighbors (direct propagation) who have coauthored more papers with him.
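One way to turn these observations into a number is sketched below. Each neighbor who adopted the topic earlier contributes its coauthorship edge weight. That contribution is discounted exponentially by the time since adoption, in line with the previous slide's remark that impacts decay exponentially. This scoring rule, the `decay` rate, and all names are hypothetical illustrations. They are not the paper's definition of social influence.

```python
import math


def social_influence_score(u, topic, year, neighbors, weight, adoption_year, decay=0.5):
    """HYPOTHETICAL exposure of author u to `topic` before `year`.

    neighbors: {user: set of coauthors}.
    weight: callable (u, v) -> number of papers u and v coauthored.
    adoption_year: {(user, topic): first year the user published on topic}.
    decay: assumed exponential decay rate, not a value from the paper.
    """
    score = 0.0
    for v in neighbors.get(u, ()):
        t_v = adoption_year.get((v, topic))
        if t_v is not None and t_v < year:  # only neighbors who adopted earlier
            # More adopting neighbors, heavier edges and more recent adoption
            # all increase the score.
            score += weight(u, v) * math.exp(-decay * (year - t_v))
    return score
```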
  • Slide 15
  • Outline: Introduction, Preliminaries, Empirical Study, Modeling. Next: Modeling
  • Slide 16
  • Model Variables. Model selection: two-category (binary) classification, using a Multiple Logistic Regression (MLR) model. Explanatory variables: social influence $F_{SI}$, author u's tendency to follow topic s in year t.
  • Slide 17
  • Model Variables (cont.). Explanatory variables: homophily $F_{TS}$. With respect to the users who have followed topic s before t, i.e. $U_0$, we measure u's homophily as the topic similarity between u and $U_0$. The two variables are then combined into the full MLR model. Baseline: Anagnostopoulos et al., 2008.
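For reference, a standard multiple logistic regression with these two explanatory variables takes the form below. Here $y_{u,s,t} = 1$ means that author u publishes on topic s in year t. The intercept $\beta_0$ and the coefficient names $\beta_1, \beta_2$ are assumptions, since this transcript does not show the slide's formula.

```latex
\Pr(y_{u,s,t} = 1) = \frac{1}{1 + \exp\!\left(-\left(\beta_0 + \beta_1 F_{SI} + \beta_2 F_{TS}\right)\right)}
\quad\Longleftrightarrow\quad
\ln\frac{\Pr(y_{u,s,t} = 1)}{1 - \Pr(y_{u,s,t} = 1)} = \beta_0 + \beta_1 F_{SI} + \beta_2 F_{TS}
```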
  • Slide 18
  • Parameter Estimation: by maximum likelihood, on a training set covering [2004, 2008]. The coefficient of $F_{TS}$ (homophily) has a larger Wald value than that of $F_{SI}$ (social influence). This indicates that homophily has a stronger impact on topic-following behavior than social influence.
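Maximum-likelihood fitting and the Wald statistic can be sketched in plain Python as below. The fit uses Newton-Raphson on the logistic log-likelihood. The Wald statistic is $W_j = (\hat\beta_j / \mathrm{SE}_j)^2$, with standard errors taken from the inverse information matrix. This is a generic textbook procedure, not the authors' code. The function names are hypothetical.

```python
import math


def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(a[r][c]))
        a[c], a[piv] = a[piv], a[c]
        pv = a[c][c]  # a singular matrix makes this 0 and raises ZeroDivisionError
        a[c] = [x / pv for x in a[c]]
        for r in range(n):
            if r != c:
                f = a[r][c]
                a[r] = [x - f * y for x, y in zip(a[r], a[c])]
    return [row[n:] for row in a]


def score_and_information(beta, rows, y):
    """Gradient of the log-likelihood and the information matrix X^T W X."""
    k = len(beta)
    grad = [0.0] * k
    info = [[0.0] * k for _ in range(k)]
    for r, yi in zip(rows, y):
        p = 1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, r))))
        for i in range(k):
            grad[i] += (yi - p) * r[i]
            for j in range(k):
                info[i][j] += p * (1.0 - p) * r[i] * r[j]
    return grad, info


def fit_logistic(X, y, iters=25):
    """Fit P(y=1) = sigmoid(b0 + b1*x1 + ...) by Newton-Raphson maximum likelihood.

    Returns (coefficients, Wald statistics) with Wald_j = (b_j / SE_j)^2.
    Assumes the classes are not perfectly separable, otherwise the MLE diverges.
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]  # prepend intercept column
    beta = [0.0] * len(rows[0])
    for _ in range(iters):
        grad, info = score_and_information(beta, rows, y)
        inv = invert(info)
        beta = [b + sum(inv[i][j] * grad[j] for j in range(len(beta)))
                for i, b in enumerate(beta)]
    _, info = score_and_information(beta, rows, y)  # information at the final estimate
    cov = invert(info)
    wald = [b * b / cov[i][i] for i, b in enumerate(beta)]
    return beta, wald
```

With feature rows `[F_SI, F_TS]`, `wald[1]` belongs to social influence and `wald[2]` to homophily. The slide's finding therefore corresponds to `wald[2] > wald[1]`.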
  • Slide 19
  • Evaluation Results. Model evaluation metrics (testing set: 2009): recall/sensitivity, specificity, precision, accuracy, and AUC (area under the ROC curve). Results for topic XML: AUC of 0.743 for MLR vs. 0.638 for the baseline.
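The evaluation metrics listed on this slide can be computed as in the sketch below. AUC uses its rank interpretation: the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counted as one half. The F-measure mentioned on the next slide is taken to be F1, which is an assumption. The function names are hypothetical.

```python
def confusion_metrics(y_true, y_pred):
    """Recall (sensitivity), specificity, precision, accuracy and F1 for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    accuracy = (tp + tn) / len(y_true)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"recall": recall, "specificity": specificity, "precision": precision,
            "accuracy": accuracy, "f1": f1}


def auc(y_true, scores):
    """Pairwise AUC; requires at least one positive and one negative example."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise AUC costs O(P·N) comparisons. For large test sets, a sort-based rank computation gives the same value faster.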
  • Slide 20
  • Evaluation Results (cont.): For 4 other representative topics, MLR also outperforms the baseline in both accuracy and F-measure.
  • Slide 21
  • Thank you! Questions are welcome.