USER-CENTERED DATA ANALYTICS AND MODELING
Hongzhi Yin, School of ITEE
University of Queensland, Australia
May 26, 2018
About The University of Queensland
The University of Queensland
School of ITEE
My Basic Information
• Education Background
• 2009.9-2014.7, Peking University, Ph.D. in Computer Science
• Supervisor: Prof. Bin Cui (Changjiang Distinguished Professor)
• 2014.9-2015.12, The University of Queensland, Postdoctoral Research Fellow
• Supervisor: Prof. Xiaofang Zhou (IEEE Fellow, Thousand Talents Plan)
• Working Experiences
• 2016.1-2018.12: ARC DECRA Fellow (Australian outstanding young researcher grant)
• 2017.1-Present: Lecturer in Data Science (continuing position) and Deputy Director of the Master of Computer Science program, The University of Queensland
• 2018.1-Present: Chief AI Counselor at One-stop Warehouse Company (the top wholesale distributor of solar products)
Selected Research Awards and Impact
• Google Scholar Citations: 1166, H-index: 16
• My paper “LCARS: A Location-Content-Aware Recommender System” is the most-cited paper among all KDD’13 oral papers.
• My TOIS’16 paper “Joint Modelling of User Check-in Behaviours for Real-time Point-of-Interest Recommendation” won the 21st Annual ACM Best of Computing Award.
• I was also invited to present this work at SIGIR’18.
• 2017.11, EAIT Faculty Early Career Researcher Award, The University of Queensland (the only winner in the School of ITEE)
• 2016.1, Australian Discovery Early Career Researcher Award (Australian Outstanding Young Scholar; one of only 6 winners in the information area across Australia)
• Best Paper Award, 2016 Australian Database Conference
• 2014.7, Distinguished Doctoral Thesis Award, Peking University (the only winner in the CS department)
• 2014.5, Top-10 Distinguished Academic Fellow Award, Peking University
Publications in 2018
• 9 CCF A Conference Papers
• 4 KDD, 3 ICDE, 1 SIGIR, 1 IJCAI
• 6 CCF B Conference Papers
• 1 WSDM, 5 Dasfaa
• 1 CCF A Journal Paper
• 1 TKDE
• 4 JCR Zone 1-2 Papers
• 1 ACM TIST, 1 Information Science, 1 Knowledge-Based Systems, 1 Future Generation Computer Systems
• 4 CCF B Journal Papers
Graduation Ceremony of My First PhD Student
My General Research
• Data Mining
• KDD, ICDM, WSDM, TKDD and TKDE
• Database
• SIGMOD, VLDB, ICDE, VLDB J, TKDE
• Information Retrieval and Web Mining
• WWW, SIGIR, WSDM, CIKM, ACM TOIS
• Artificial Intelligence
• AAAI, IJCAI, ACM TIST
• Harnessing User-Generated and Consumed Data for User-Centered Research
• Top-tier conference and journal papers: 60+ (including 21 CCF A/B SCI journal papers)
• CCF A: 35+, CCF B: 20+, JCR Zone 1/2: 6
• 1 Scholarly Book, 2 Book Chapters
• 45+ publications as lead author
My Research Interests
• Recommender Systems
• Spatial Item Recommendation (KDD’13, KDD’15, KDD’17, TKDE’16, TKDE’17, ICDE’16, TOIS’14, TOIS’16, TIST’17, TIST’18, CIKM’15, CIKM’16, ACM Multimedia’15, DASFAA’17, DASFAA’18, WWWJ’18)
• Streaming Recommendation (VLDB’13, TOIS’16, IJCAI’17, SIGIR’18, KDD’18)
• Temporal Event-aware Recommendation (SIGMOD’14, TOIS’15)
• Long Tail Recommendation (VLDB’12)
• Semantic-Aware User Behaviour Prediction (ICDM’17)
• Joint Event-Partner Recommendation in Event-based Social Networks (ICDE’18)
• Integrating Category-Aware User Privacy Preference for Mobile App Recommendation (ICDE’17, KBS’18)
• Online Recommendation Efficiency (KDD’13, TOIS’14, TKDE’16, TOIS’16, ICDE’16, WSDM’18, KDD’18)
Online Recommendation Efficiency
• To support real-time recommendation responses, we combine smart retrieval algorithms with effective indexing structures
• Threshold-based Algorithm (TA)
• LCARS: A Location-Content-Aware Recommender System (KDD’13)
• TA-Approximation Algorithm
• LCARS: A Spatial Item Recommender System (TOIS’14)
• Attribute pruning-based algorithm (AP)
• Adapting to User Interest Drift for POI Recommendation (TKDE’16)
• Clustering-based branch and bound algorithm (CBB)
• Joint Modeling of User Check-in Behaviors for Real-time Point-of-Interest Recommendation (TOIS’16)
• Asymmetric Locality-Sensitive Hashing (ALSH)
• SPORE: A Sequential Personalized Spatial Item Recommender System (ICDE’16)
• Learning to Hash (L2H)
• Discrete Deep Learning for Fast Content-Aware Recommendation (WSDM’18, KDD’18)
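The Threshold Algorithm idea above can be illustrated with a minimal sketch of Fagin's classic TA for top-k retrieval. It assumes a score of the form score(v) = Σ_m w_m · s_m(v), with non-negative weights w_m and non-negative attribute scores s_m(v) kept in one descending sorted list per attribute (for example, per-topic item scores weighted by one user's topic preferences). The function name `threshold_topk` and that setting are assumptions for illustration; the KDD’13 and TOIS’14 algorithms may differ in detail.

```python
import heapq
from typing import Dict, List, Tuple


def threshold_topk(lists: List[Dict[str, float]], weights: List[float], k: int) -> List[Tuple[float, str]]:
    """Top-k items under score(item) = sum_m weights[m] * lists[m][item] (Fagin's TA).

    Assumes non-negative weights and scores, so the aggregate is monotone;
    an item missing from a list contributes 0.0 for that list.
    """
    # Sorted access: each attribute list ordered by descending score.
    sorted_lists = [sorted(l.items(), key=lambda kv: kv[1], reverse=True) for l in lists]
    depth = max(len(s) for s in sorted_lists)
    heap: List[Tuple[float, str]] = []  # min-heap holding the current top-k (score, item)
    seen = set()
    for pos in range(depth):
        last_scores = []
        for s in sorted_lists:
            if pos >= len(s):
                last_scores.append(0.0)  # list exhausted: unseen items score 0 here
                continue
            item, score = s[pos]
            last_scores.append(score)
            if item in seen:
                continue
            seen.add(item)
            # Random access: fetch the item's score in every list to get its exact total.
            full = sum(w * l.get(item, 0.0) for w, l in zip(weights, lists))
            if len(heap) < k:
                heapq.heappush(heap, (full, item))
            elif full > heap[0][0]:
                heapq.heapreplace(heap, (full, item))
        # Threshold: the best total any not-yet-seen item could still reach.
        threshold = sum(w * x for w, x in zip(weights, last_scores))
        if len(heap) == k and heap[0][0] >= threshold:
            break  # no unseen item can beat the current k-th best
    return sorted(heap, reverse=True)
```

The search stops once the k-th best exact score reaches the threshold, so items that sit deep in every list are never scored.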
Recommended Reading
• http://web.cs.ucla.edu/~yzsun/Tutorials.htm
• Yizhou Sun, Hongzhi Yin, Xiang Ren. Context-Rich Recommendation: Integrating Links, Text, and Spatio-Temporal Dimensions. (KDD’17 Tutorial)
• Yizhou Sun, Hongzhi Yin*, Xiang Ren. Recommendation in Context-Rich Environment: An Information Network Analysis Approach. (WWW’17 Tutorial)
• Chapter: “Spatio-Temporal Recommendation in Geo-Social Networks”
My Research Interests
• User Linkage across Social Networks and Platforms
• User Name, Content, Language Features (WWW’17, Information Science’18, Future Generation Computer Systems’18, WWWJ’18)
• Spatial and Temporal Features (CIKM’17, ICDE’18)
• Community Discovery Beyond Network Structure
• User-Generated Textual Content, Spatial-Temporal Co-occurrence, Network Structure (ICDE’16)
• Topic Discovery and Event Detection
• Unifying the discovery of user interest-related topics and event-related topics (ICDE’13, SIGMOD’14)
• Network Embedding and Multi-Relation Learning
• Adaptive and Adversarial Model Optimization Algorithms (ICDM’17, ICDE’18, KDD’18)
• Information Diffusion and Influence Maximization in Social Networks
• Distinguishing re-sharing behaviours from re-creating behaviours in information diffusion (ICDE’15, World Wide Web)
SPTF: A Scalable Probabilistic Tensor Factorization Model for Semantic-Aware Behaviour Prediction
Hongzhi Yin¹, Hongxu Chen¹, Hao Wang², Yang Wang³, Quoc Viet Hung Nguyen⁴
¹The University of Queensland; ²360 Search Lab, Qihoo 360 Inc.; ³University of New South Wales; ⁴Griffith University
Outline
• Background
– Rich Interaction Behaviors with Items
• Problem Definition
– Semantic-Aware User Behavior Prediction
• Our Solution
– A Scalable Probabilistic Tensor Factorization Model
• Experiments
• Summary
Rich Interaction Behaviours with Items
Examples of user interaction behaviours on YouTube, Pinterest, JD.COM and Alibaba.com: Like, Click, Browse, Share, Favorite, Purchase, Add to Cart, Subscribe, Download, Send, Pin it, Visit, …
Characteristics of User Interaction Data
• Implicit Feedback
– Only positive feedback is available and observed
– The unobserved user interaction behaviors are a mixture of
• real negative feedback
• potential positive feedback
• Heterogeneous Interaction Behaviors
– Different types of user behaviours imply different semantics and user intentions
– The way people interact with items is important for understanding user intents and interests
• Skewed Interaction Behavior Data
– The distribution of user interaction data w.r.t. behavior types is heavily skewed
– Click: 87%; Add-to-Favorite: 5%; Add-to-Cart: 6%; Purchase: 2%
Problem Definition
• Semantic-Aware Behavior Prediction
– Given a target user $u_i$ and an action type $t_k$, we aim to predict the top-n items on which $u_i$ will perform action $t_k$.
Semantic-Aware User Behavior Prediction
• An alternative definition
– Given a target user $u_i$, we aim to predict the top-n action-item pairs $(t_k, v_j)$, where each pair means that $u_i$ will perform action $t_k$ on item $v_j$.
• What matters is not just what users interact with, but how they interact with it.
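Representation of Heterogeneous Interaction Data
A set of users $\{u_i\}$; a set of items $\{v_j\}$; a set of behavior types $\{t_k\}$.
A triple $x_{ikj} = (u_i, t_k, v_j)$ represents an interaction record: user $u_i$ performed behavior $t_k$ on item $v_j$.
All possible triples can be grouped in a third-order tensor (users × behavior types × items).
$y_{ikj} = 1$ if the triple $x_{ikj}$ is observed; otherwise $y_{ikj} = 0$.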
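A minimal sketch of this representation, assuming integer indices i, k, j for users, behaviour types and items (the function name `build_interaction_tensor` and the toy triples are hypothetical). It also computes how many cells a dense tensor would need at the T-Mall scale reported in the experiments (10,000 users, 4 behaviour types, 480,723 items), which is why only the observed triples are usually stored.

```python
import numpy as np


def build_interaction_tensor(triples, n_users, n_types, n_items):
    """Dense 0/1 tensor with y[i, k, j] = 1 iff triple (u_i, t_k, v_j) was observed."""
    y = np.zeros((n_users, n_types, n_items), dtype=np.int8)
    for i, k, j in triples:
        y[i, k, j] = 1
    return y


# Toy data: 3 users, 2 behaviour types, 4 items (hypothetical indices).
observed = [(0, 1, 2), (2, 0, 3), (1, 1, 0)]
y = build_interaction_tensor(observed, n_users=3, n_types=2, n_items=4)

# Dense size at the T-Mall statistics: about 1.9e10 cells, almost all of them 0.
DENSE_CELLS_TMALL = 10_000 * 4 * 480_723
```

A dense tensor is only workable for toy sizes; at the T-Mall scale a set of observed (i, k, j) triples is the practical representation.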
Tucker Decomposition (TD)
The complexity of the model equation is cubic in $k$ (the dimension of the latent factors).
Canonical Decomposition (CD)
The complexity of the model equation is linear in $k$; CD corresponds to TD with a fixed, diagonal core tensor.
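To make the complexity contrast concrete, the sketch below scores a single tensor entry both ways with NumPy, assuming latent vectors u, t, v of length k and, for Tucker, a k × k × k core tensor. With a core that has ones on its diagonal and zeros elsewhere, the Tucker entry reduces to the CD entry. The function names are hypothetical.

```python
import numpy as np


def tucker_entry(core, u, t, v):
    """sum_{a,b,c} core[a,b,c] * u[a] * t[b] * v[c]: O(k^3) multiply-adds per entry."""
    return float(np.einsum("abc,a,b,c->", core, u, t, v))


def cp_entry(u, t, v):
    """sum_f u[f] * t[f] * v[f]: O(k) multiply-adds per entry."""
    return float(np.sum(u * t * v))


def diagonal_core(k):
    """k x k x k core with ones at [f, f, f] and zeros elsewhere; turns Tucker into CD."""
    core = np.zeros((k, k, k))
    idx = np.arange(k)
    core[idx, idx, idx] = 1.0
    return core
```

With a dense core, every entry touches all k³ core cells; with the diagonal core only k products survive, which is the linear cost of CD.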
Limitations of Classic TF Methods
• Do not scale to large datasets
– They treat all non-observed examples as negative examples
– The numbers of users and items reach millions or even billions, leading to a huge dense tensor and prohibitive computation cost
• Limited prediction accuracy
– Some non-observed examples are actually potential positive examples
• Cannot handle the skewness of user interaction behaviors
– They treat each type of observed interaction behavior equally
• Fail to capture user and item biases
– A user may tend to perform “add-to-cart” behaviours rather than “add-to-favourite” behaviours (user bias)
– A video may receive more “likes” than other videos (item bias)
A Scalable Probabilistic Tensor Factorization Model
• For each triple $x_{ikj}$, the probabilistic generative process is as follows:
• The posterior distribution of the latent vectors of users, items and behavior types is computed as follows:
• Objective Function
• Pairwise interaction factorization is used to implement the utility function
– The complexity of the model equation is linear in the dimension $D$, much lower than that of Tucker Decomposition (TD)
– The utility function also includes a user-bias term and an item-bias term
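The exact SPTF utility function is not reproduced here, so the sketch below is only a hypothetical pairwise-interaction scorer in the spirit of PITF: three pairwise dot products (user-item, user-type, type-item) plus two bias terms. Modelling the user bias per (user, behaviour type) and the item bias per (item, behaviour type) is our reading of the two bias examples above and should be treated as an assumption, as should the class and method names. Scoring one triple costs three length-D dot products, i.e. O(D).

```python
import numpy as np


class PairwiseInteractionScorer:
    """Hypothetical SPTF-style utility: pairwise dot products plus user/item bias terms."""

    def __init__(self, n_users, n_types, n_items, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.U = rng.normal(scale=0.1, size=(n_users, dim))  # user latent vectors
        self.T = rng.normal(scale=0.1, size=(n_types, dim))  # behaviour-type latent vectors
        self.V = rng.normal(scale=0.1, size=(n_items, dim))  # item latent vectors
        # Assumed bias parameterisation: one offset per (user, type) and per (item, type).
        self.b_user = np.zeros((n_users, n_types))
        self.b_item = np.zeros((n_items, n_types))

    def score(self, i, k, j):
        """Utility of triple (u_i, t_k, v_j): three length-D dot products, O(D)."""
        u, t, v = self.U[i], self.T[k], self.V[j]
        return float(u @ v + u @ t + t @ v + self.b_user[i, k] + self.b_item[j, k])

    def top_n_items(self, i, k, n):
        """Indices of the n items with the highest utility for user i and type k."""
        u, t = self.U[i], self.T[k]
        # u.v + t.v == v.(u + t), computed for all items at once.
        scores = self.V @ (u + t) + u @ t + self.b_user[i, k] + self.b_item[:, k]
        return [int(j) for j in np.argsort(-scores)[:n]]
```

`top_n_items(i, k, n)` corresponds to the problem-definition query: the top-n items on which user i would perform behaviour type k.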
Model Optimization – Negative Sampling with SGD
• Directly optimizing the objective function is computationally expensive, as the number of unobserved examples grows with the product of the numbers of users, behavior types and items.
• Besides, not all unobserved examples are real negative examples.
• Inspired by the negative sampling technique proposed in the word2vec model, instead of treating all unobserved examples as negative, we select a few of the most likely negative examples for model optimization.
• We propose a popularity-biased bidirectional negative sampling method to generate negative examples.
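The sampler itself is not spelled out here, so the following is a hypothetical sketch of one way a popularity-biased, bidirectional negative sampler could work: an observed triple is corrupted on either the item side or the user side (with equal probability), the replacement is drawn with probability proportional to popularity^0.75 (the smoothing exponent used for negative sampling in word2vec), and draws that hit an observed triple are rejected. The function names, the 0.5 split and the exponent are all assumptions.

```python
import numpy as np


def popularity_distribution(counts, power=0.75):
    """p(x) proportional to count(x) ** power (word2vec-style smoothed popularity)."""
    p = np.asarray(counts, dtype=float) ** power
    return p / p.sum()


def sample_negative(triple, observed, user_p, item_p, rng, max_tries=100):
    """Corrupt one side of an observed triple (i, k, j) to obtain an unobserved triple.

    Hypothetical reading of "bidirectional": with probability 0.5 replace the item
    (keeping the user), otherwise replace the user (keeping the item). The behaviour
    type k is kept. Candidates that are themselves observed are rejected.
    """
    i, k, j = triple
    for _ in range(max_tries):
        if rng.random() < 0.5:
            cand = (i, k, int(rng.choice(len(item_p), p=item_p)))  # item side
        else:
            cand = (int(rng.choice(len(user_p), p=user_p)), k, j)  # user side
        if cand not in observed:
            return cand
    return None  # every draw hit an observed triple


# Toy usage with hypothetical interaction counts.
observed = {(0, 0, 1), (1, 0, 2), (2, 1, 0)}
user_p = popularity_distribution([5, 3, 1])
item_p = popularity_distribution([10, 2, 7, 1])
rng = np.random.default_rng(0)
neg = sample_negative((0, 0, 1), observed, user_p, item_p, rng)
```

Biasing toward popular entities reflects the intuition that a popular item a user never touched is more likely a real negative than an obscure one.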
Algorithm of Training SPTF
How to Sample a Positive Example in SGD
• User behaviours are heterogeneous in our problem, and the distribution of positive examples w.r.t. behaviour types is heavily skewed. In our collected T-Mall dataset:
– Click behaviours: 86.58%
– Add-to-favourite: 4.93%
– Add-to-cart: 5.91%
– Purchase: 2.57%
• With the widely used uniform sampling method
– most sampled positive examples would be click behaviours, and the trained model would be heavily biased towards click behaviours.
Adaptive Ranking-Based Sampling Approach
• A desirable sampler should choose adversarial positive examples with high probability
– i.e., examples that are informative at the current state of learning and more helpful for correcting the model
• Intuitively, positive examples ranked lower by the current model should have a higher probability of being sampled, as they are more informative and more helpful for correcting the current model parameters.
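A hypothetical sketch of the ranking-based idea: compute each positive's rank under the current model for its (user, behaviour type) pair, then sample positives with probability proportional to exp(rank / λ), so worse-ranked positives are drawn more often. The exponential weighting, λ and the function names are assumptions, not the SPTF sampler itself.

```python
import numpy as np


def rank_of_positive(scores, j):
    """1-based rank of item j among all items for one (user, behaviour type); 1 = top."""
    return 1 + int(np.sum(scores > scores[j]))


def adaptive_positive_probs(ranks, lam=10.0):
    """Probabilities proportional to exp(rank / lam): worse-ranked positives get more mass.

    Subtracting the maximum rank before exp() avoids overflow; lam (hypothetical)
    controls how sharply badly ranked positives are preferred.
    """
    r = np.asarray(ranks, dtype=float)
    w = np.exp((r - r.max()) / lam)
    return w / w.sum()


def sample_positive(positives, ranks, rng, lam=10.0):
    """Draw one positive triple using the rank-based probabilities."""
    probs = adaptive_positive_probs(ranks, lam)
    return positives[int(rng.choice(len(positives), p=probs))]
```

Under this sketch, click positives that the model already ranks near the top are drawn less often, which is one way such a sampler can counteract the skew toward clicks. This naive version needs full score vectors to compute ranks, which is expensive at scale.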
Dataset and Measurement
• The T-Mall dataset contains 480,723 products, 10,000 users, and the twenty million behaviour records these users generated between 18/11/2014 and 18/12/2014.
• We adopt the Hit Ratio and MRR (Mean Reciprocal Rank) to measure the prediction accuracy.
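A minimal sketch of the two metrics, assuming each test case has one ground-truth item and a ranked list of predicted items. HR@n is the fraction of cases whose ground truth appears in the top n; MRR averages 1/rank, counting 0 when the item is absent. The exact evaluation protocol used in the experiments (candidate set, cutoff n) is not specified here, and the function names are hypothetical.

```python
from typing import Hashable, List, Sequence


def hit_ratio_at_n(ranked_lists: Sequence[List[Hashable]], targets: Sequence[Hashable], n: int) -> float:
    """Fraction of test cases whose ground-truth item appears in the top-n predictions."""
    hits = sum(1 for ranked, t in zip(ranked_lists, targets) if t in ranked[:n])
    return hits / len(targets)


def mean_reciprocal_rank(ranked_lists: Sequence[List[Hashable]], targets: Sequence[Hashable]) -> float:
    """Mean of 1/rank of the ground-truth item (1-based rank; contributes 0 if absent)."""
    total = 0.0
    for ranked, t in zip(ranked_lists, targets):
        if t in ranked:
            total += 1.0 / (ranked.index(t) + 1)
    return total / len(targets)
```

HR@n only checks membership in the top n, while MRR also rewards placing the correct item higher.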
Comparison Methods
• BPTF: Bayesian Probabilistic Tensor Factorization (Xiong et al. SDM’10)
– Designed for rating prediction on explicit feedback datasets
– Considers only the observed examples
• RESCAL: a state-of-the-art tensor factorization model proposed for factorizing knowledge graphs (Nickel et al. WWW’12)
– The behaviour type is represented by a matrix rather than a vector
– Considers only the observed examples
• BPR-PITF: Pairwise Interaction Tensor Factorization optimized with the BPR optimization framework (Rendle et al. WSDM’10)
– All unobserved examples are treated equally, and each positive example is drawn uniformly
• BPR-SMF: applies BPR-based matrix factorization to each type of user behaviour separately
Experimental Results
SPTF, BPR-PITF and BPR-SMF achieve much higher prediction accuracy than
RESCAL and BPTF, showing the importance of exploiting negative examples.
Both SPTF and BPR-PITF significantly outperform BPR-SMF, demonstrating the advantage of collective factorization over separate factorization.
BPR-SMF achieves its best prediction performance on click behaviours, as the click matrix is much denser than the other three matrices.
The other three models achieve their highest prediction accuracy on the other three behaviour types, as click behaviours provide strong signals for predicting them, owing to the sequential patterns of user actions on e-commerce sites.
Study of Different Sampling Strategies
Summary
• We developed a scalable probabilistic tensor factorization model (SPTF) to predict semantic-aware user behaviours.
• To train SPTF, we proposed a novel popularity-biased bidirectional negative sampling technique that leverages both observed and unobserved examples.
• We proposed a novel adaptive ranking-based sampling approach to overcome the heavily skewed distribution of heterogeneous behaviour data over behaviour types.
Thank you!