
USER-CENTERED DATA ANALYTICS AND MODELING

Hongzhi Yin

School of ITEE

University of Queensland, Australia

[email protected]

May 26, 2018


About The University of Queensland


The University of Queensland


School of ITEE


My Basic Information

• Education Background

• 2009.9-2014.7, Peking University, Ph.D. in Computer Science

• Supervisor: Prof. Bin Cui (Changjiang Distinguished Professor)

• 2014.9-2015.12, The University of Queensland, Postdoc Research Fellow

• Supervisor: Prof. Xiaofang Zhou (IEEE Fellow, Thousand Talents Program)

• Working Experiences

• 2016.1-2018.12: ARC DECRA Fellow (Australian Excellent Young Scientists Fund)

• 2017.1-Present: Lecturer in Data Science (Continuing Position), Deputy Director of Master of Computer Science at The University of Queensland

• 2018.1-Present: Chief AI Counselor at One-stop Warehouse Company (top-ranked wholesale distributor of solar products)

Selected Research Awards and Impact

• Google Scholar Citations: 1166, H-index: 16

• My paper “LCARS: A Location-Content-Aware Recommender System” is the most-cited paper among all KDD’13 oral papers.

• My TOIS’16 paper “Joint modelling of user check-in behaviours for real-time point-of-interest recommendation” won the 21st ACM Annual Best of Computing Award.

• Also invited to present this work at SIGIR’18

• 2017.11, EAIT Faculty Early Career Researcher Award, The University of Queensland (only one winner in the School of ITEE)

• 2016.1, Australia Discovery Early Career Researcher Award (Australian Excellent Young Scientists Fund; only 6 winners in the information area across the whole of Australia)

• Best Paper Award, 2016 Australian Database Conference

• 2014.7, Distinguished Doctor Degree Thesis Award, Peking University (only 1 winner in the CS department)

• 2014.5, Top-10 Distinguished Academic Fellow Award, Peking University


Publications in 2018

• 9 CCF A Conference Papers

• 4 KDD, 3 ICDE, 1 SIGIR, 1 IJCAI

• 6 CCF B Conference Papers

• 1 WSDM, 5 DASFAA

• 1 CCF A Journal Paper

• 1 TKDE

• 4 JCR Zone 1-2 Papers

• 1 ACM TIST, 1 Information Sciences, 1 Knowledge-Based Systems, 1 Future Generation Computer Systems

• 4 CCF B Journal Papers

Graduation Ceremony of My first PhD student


My General Research

• Data Mining

• KDD, ICDM, WSDM, TKDD and TKDE

• Database

• SIGMOD, VLDB, ICDE, VLDB J, TKDE

• Information Retrieval and Web Mining

• WWW, SIGIR, WSDM, CIKM, ACM TOIS

• Artificial Intelligence

• AAAI, IJCAI, ACM TIST

• Harnessing User Generated and Consumed Data for User-Centered Research

• Top-Tier Conferences and Journals: 60+ (21 CCF A/B SCI Journals)

• CCF A : 35+, CCF B: 20+, JCR Zone 1/2: 6

• 1 Scholar Book, 2 Book Chapters

• 45+ publications as the leading author

My Research Interests

• Recommender Systems

• Spatial Item Recommendation (KDD’13, KDD’15, KDD’17, TKDE’16, TKDE’17, ICDE’16, TOIS’14, TOIS’16, TIST’17, TIST’18, CIKM’15, CIKM’16, ACM Multimedia’15, DASFAA’17, DASFAA’18, WWWJ’18)

• Streaming Recommendation (VLDB’13, TOIS’16, IJCAI’17, SIGIR’18, KDD’18)

• Temporal Event-aware Recommendation (SIGMOD’14, TOIS’15)

• Long Tail Recommendation (VLDB’12)

• Semantic-Aware User Behaviour Prediction (ICDM’17)

• Joint Event-Partner Recommendation in Event-based Social Networks (ICDE’18)

• Integrating Category-Aware User Privacy Preference for Mobile App Recommendation (ICDE’17, KBS’18)

• Online Recommendation Efficiency (KDD’13, TOIS’14, TKDE’16, TOIS’16, ICDE’16, WSDM’18, KDD’18)


Online Recommendation Efficiency

• Supporting real-time recommendation responses requires smart retrieval algorithms combined with effective indexing structures

• Threshold based Algorithm (TA)

• LCARS: A Location-Content-Aware Recommender System (KDD’13)

• TA-Approximation Algorithm.

• LCARS: A Spatial Item Recommender System (TOIS’14)

• Attribute pruning-based algorithm (AP)

• Adapting to User Interest Drift for POI Recommendation (TKDE’16)

• Clustering-based branch and bound algorithm (CBB)

• Joint Modeling of User Check-in Behaviors for Real-time Point-of-Interest Recommendation (TOIS’16)

• Asymmetric Locality-sensitive hashing (ALSH)

• SPORE: A Sequential Personalized Spatial Item Recommender System (ICDE’16)

• Learning to Hash (L2H)

• Discrete Deep Learning for Fast Content-Aware Recommendation (WSDM’18, KDD’18)
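
To make the threshold-based retrieval idea in this list more concrete, here is a minimal sketch of the classic threshold algorithm (TA) for top-k retrieval over sorted lists. It is not the LCARS implementation: the function name `threshold_topk`, the weighted-sum scoring and the toy data are assumptions made purely for illustration.

```python
import heapq


def threshold_topk(sorted_lists, weights, k):
    """Return the k items with the highest weighted-sum score.

    sorted_lists: one list per attribute; each is a list of (item, value)
        pairs sorted by value in descending order, and every item appears
        in every list (so all lists have the same length).
    weights: one non-negative weight per attribute (illustrative input).
    """
    lookup = [dict(lst) for lst in sorted_lists]  # random access by item
    seen = set()
    top = []  # min-heap of (score, item); holds the best k seen so far
    for pos in range(len(sorted_lists[0])):
        # Sorted access: read the next entry of every list in parallel.
        for lst in sorted_lists:
            item = lst[pos][0]
            if item in seen:
                continue
            seen.add(item)
            # Random access: fetch the item's value in every list.
            score = sum(w * table[item] for w, table in zip(weights, lookup))
            if len(top) < k:
                heapq.heappush(top, (score, item))
            elif score > top[0][0]:
                heapq.heapreplace(top, (score, item))
        # No unseen item can score above this bound.
        threshold = sum(w * lst[pos][1] for w, lst in zip(weights, sorted_lists))
        if len(top) == k and top[0][0] >= threshold:
            break  # early termination: the current top-k is final
    return [(item, score) for score, item in sorted(top, reverse=True)]


# Hypothetical toy data: two attribute lists over four items.
topic_a = [("a", 0.9), ("b", 0.8), ("c", 0.1), ("d", 0.05)]
topic_b = [("b", 0.9), ("a", 0.7), ("d", 0.2), ("c", 0.1)]
print(threshold_topk([topic_a, topic_b], weights=[0.5, 0.5], k=2))
```

In this toy run the loop stops after reading two positions of each list, because the second-best score found so far already reaches the threshold. Avoiding a full scan of the item set in this way is what makes the approach attractive for real-time responses.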

Recommended Reading

• http://web.cs.ucla.edu/~yzsun/Tutorials.htm

• Yizhou Sun, Hongzhi Yin, Xiang Ren. Context-Rich Recommendation: Integrating Links, Text, and Spatio-Temporal Dimensions. (KDD’17 Tutorial)

• Yizhou Sun, Hongzhi Yin*, Xiang Ren. Recommendation in Context-Rich Environment: An Information Network Analysis Approach. (WWW’17 Tutorial)

• Chapter: "Spatio-Temporal Recommendation in Geo-Social Networks"

My Research Interests

• User Linkage across Social Networks and Platforms

• User Name, Content, Language Features (WWW’17, Information Sciences’18, Future Generation Computer Systems’18, WWWJ’18)

• Spatial and Temporal Features (CIKM’17, ICDE’18)

• Community Discovery Beyond Network Structure

• User Generated Textual Content, Spatial-temporal Co-occurrence, Network Structure (ICDE’16)

• Topic Discovery and Event Detection

• Unifying discovery of user interest-related topics and event-related topics (ICDE’13, SIGMOD’14)

• Network Embedding and Multi-Relation Learning

• Adaptive and Adversarial Model Optimization Algorithms (ICDM’17, ICDE’18, KDD’18)

• Information Diffusion and Influence Maximization in Social Networks

• Distinguishing re-sharing behaviours from re-creating behaviours in information diffusion (ICDE’15, World Wide Web)

SPTF: A Scalable Probabilistic Tensor Factorization Model for Semantic-Aware Behaviour Prediction

Hongzhi Yin (1), Hongxu Chen (1), Hao Wang (2), Yang Wang (3), Quoc Viet Hung Nguyen (4)

(1) The University of Queensland, (2) 360 Search Lab, Qihoo 360 Inc., (3) University of New South Wales, (4) Griffith University

Outline

• Background

– Rich Interaction Behaviors with Items

• Problem Definition

– Semantic-Aware User Behavior Prediction

• Our Solution

– A Scalable Probabilistic Tensor Factorization Model

• Experiments

• Summary


Rich Interaction Behaviours with Items

• Example behaviour types: Like, Click, Browse, Share, Favorite, Purchase, Add to Cart, Subscribe, Download, Add to, Send, Pin it, Visit, ...

• Example platforms: user behaviours in YouTube, Pinterest, JD.COM and Alibaba.com

Characteristics of User Interaction Data

• Implicit Feedback

– Only Positive Feedback is available and observed

– The unobserved user interaction behaviors are a mixture of:

• Real negative feedback

• Potential positive feedback

• Heterogeneous Interaction Behaviors

– Different types of user behaviours imply different semantics and user intentions

– The way people interact with items is important for understanding user intents and interests.

• Skewed Interaction Behavior Data

– The distribution of user interaction data w.r.t. behavior types is heavily skewed

– Click: 87%; Add2Favorite: 5%; Add2Cart: 6%; Purchase: 2%


Problem Definition

• Semantic-Aware Behavior Prediction

– Given a target user u_i and an action type t_k, we aim to predict the top-n items on which u_i will perform action t_k.

Semantic-Aware User Behavior Prediction

• An alternative definition

– Given a target user u_i, we aim to predict the top-n action-item pairs (t_k, v_j) such that u_i will perform action t_k on item v_j.

• What is important is not just what users interact with, but how they interact with them


Representation of Heterogeneous Interaction Data

A set of users {u_i}; a set of items {v_j}

A set of behavior types {t_k}

A triple x_ikj = (u_i, t_k, v_j) is used to represent an interaction record

All possible triples can be grouped in a three-way tensor (users × behavior types × items)

y_ikj is 1 if the triple x_ikj is observed; otherwise it is 0.
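
The triple representation above can be sketched in a few lines. This is only an illustration under stated assumptions: the class name `InteractionTensor`, its method names and the toy log are hypothetical. Only observed triples are stored, so the binary tensor is never materialized densely.

```python
from collections import namedtuple

# One interaction record: (user, behavior type, item).
Triple = namedtuple("Triple", ["user", "behavior", "item"])


class InteractionTensor:
    """Sparse binary tensor: only observed triples are stored."""

    def __init__(self, records):
        self.observed = {Triple(*record) for record in records}
        self.users = sorted({t.user for t in self.observed})
        self.behaviors = sorted({t.behavior for t in self.observed})
        self.items = sorted({t.item for t in self.observed})

    def y(self, user, behavior, item):
        """Return 1 if the triple was observed, otherwise 0."""
        return 1 if Triple(user, behavior, item) in self.observed else 0

    def shape(self):
        """Tensor dimensions: (users, behavior types, items)."""
        return (len(self.users), len(self.behaviors), len(self.items))

    def density(self):
        """Fraction of tensor cells that are observed."""
        n_users, n_behaviors, n_items = self.shape()
        return len(self.observed) / (n_users * n_behaviors * n_items)


# Hypothetical toy interaction log.
log = [("u1", "click", "v1"), ("u1", "purchase", "v1"), ("u2", "click", "v2")]
tensor = InteractionTensor(log)
print(tensor.shape(), tensor.y("u1", "click", "v1"), tensor.density())
```

With millions of users and items the density becomes tiny, so a set of observed triples is a far cheaper representation than a dense array.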

Tucker Decomposition

The complexity of the model equation is cubic in k (the dimension of the latent factors).

Canonical Decomposition

The complexity of the model equation is linear in k (the dimension of the latent factors).

CD corresponds to TD with a static, diagonal core tensor.
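
The relationship between the two decompositions can be checked numerically. The sketch below assumes, for simplicity, that all three modes share the same latent dimension k; the function names and the example vectors are illustrative. The Tucker score needs a triple sum over the core tensor, the canonical score a single sum, and the two agree when the core tensor is diagonal.

```python
def tucker_score(core, u, t, v):
    """Tucker model equation: a triple sum over the core tensor, O(k^3)."""
    k = len(u)
    return sum(
        core[a][b][c] * u[a] * t[b] * v[c]
        for a in range(k) for b in range(k) for c in range(k)
    )


def cd_score(u, t, v):
    """Canonical decomposition model equation: a single sum, O(k)."""
    return sum(ua * tb * vc for ua, tb, vc in zip(u, t, v))


def diagonal_core(k):
    """Core tensor with ones on the super-diagonal and zeros elsewhere."""
    return [[[1.0 if a == b == c else 0.0 for c in range(k)]
             for b in range(k)] for a in range(k)]


# Illustrative latent vectors for one user, one behavior type and one item.
u, t, v = [1.0, 2.0], [0.5, -1.0], [3.0, 4.0]
print(cd_score(u, t, v), tucker_score(diagonal_core(2), u, t, v))
```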

Limitations of Classic TF Methods

• Cannot be applied to large-scale datasets

– They treat all non-observed examples as negative examples

– The numbers of users and items are on the scale of millions or even billions, leading to a huge dense tensor and a huge computation cost

• Limited prediction accuracy

– Some non-observed examples are potentially positive examples

• Cannot overcome the skewness of user interaction behaviors

– They treat each type of observed interaction behavior equally

• Fail to capture user and item biases

– A user tends to perform “add-to-cart” behaviours rather than “add-to-favourite” behaviours (user bias)

– A video has received more “likes” than other videos (item bias)

A Scalable Probabilistic Tensor Factorization Model

• For each triple x_ikj, the probabilistic generative process is as follows:

[Equation shown on the slide]

• The posterior distribution of the latent vectors of users, items and behavior types is computed as follows:

[Equation shown on the slide]

• Objective Function

[Equation shown on the slide]

• Pairwise Interaction Factorization is used to implement the utility function

– The complexity of its model equation is linear in the dimension D, much lower than that of Tucker Decomposition (TD)

– The utility function includes a user bias term and an item bias term
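
The exact utility function is not recoverable from the slide text, so the following is only a hedged sketch of what a pairwise-interaction utility with bias terms could look like. The parameterization is an assumption: three inner products of D-dimensional vectors (user-item, user-type, item-type) plus scalar user and item biases. All names are hypothetical. Its purpose is to show why the cost is linear in D.

```python
def dot(a, b):
    """Inner product of two equal-length vectors, O(D)."""
    return sum(x * y for x, y in zip(a, b))


def pairwise_utility(user_vec, item_vec, type_vec_for_user, type_vec_for_item,
                     user_bias=0.0, item_bias=0.0):
    """Hypothetical pairwise-interaction utility for one (user, type, item) triple.

    Assumed form (not taken from the slides): the sum of three pairwise
    inner products plus two scalar bias terms. Each inner product costs
    O(D), so the whole model equation is linear in D.
    """
    return (dot(user_vec, item_vec)
            + dot(user_vec, type_vec_for_user)
            + dot(item_vec, type_vec_for_item)
            + user_bias + item_bias)


# Illustrative latent vectors with D = 2.
score = pairwise_utility(
    user_vec=[1.0, 0.0], item_vec=[0.5, 2.0],
    type_vec_for_user=[2.0, 1.0], type_vec_for_item=[1.0, 1.0],
    user_bias=0.1, item_bias=-0.2,
)
print(score)
```

Unlike a Tucker model, no core tensor is involved, so doubling D roughly doubles the scoring cost rather than multiplying it by eight.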

Model Optimization – Negative Sampling with SGD

• Directly optimizing the objective function is computationally expensive, as the number of unobserved examples is cubic in the number of users or items.

• Besides, not all unobserved examples are real negative examples.

• Inspired by the negative sampling technique proposed in the word2vec model, instead of treating all unobserved examples as negative, we select a few of the most likely negative examples for model optimization.

• We propose a popularity-biased Bidirectional Negative Sampling method to generate negative examples.
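
The slides do not spell out the sampler, so the sketch below is one possible reading rather than the authors' method. Two assumptions are made and should be treated as such: "popularity-biased" is implemented as sampling proportional to frequency raised to the power 0.75 (the heuristic used in word2vec), and "bidirectional" is read as corrupting either the item side or the user side of a positive triple. The function names and toy data are hypothetical.

```python
import random
from collections import Counter


def make_negative_sampler(observed, power=0.75, seed=0):
    """Build a sampler of negative triples for a set of observed triples.

    observed: a set of (user, behavior, item) tuples.
    power: popularity exponent (0.75 is an assumption borrowed from word2vec).
    """
    rng = random.Random(seed)
    item_counts = Counter(item for _, _, item in observed)
    user_counts = Counter(user for user, _, _ in observed)
    items = sorted(item_counts)
    users = sorted(user_counts)
    item_weights = [item_counts[i] ** power for i in items]
    user_weights = [user_counts[u] ** power for u in users]

    def sample_negatives(positive, n, max_tries=1000):
        """Draw up to n distinct unobserved triples near the given positive."""
        user, behavior, item = positive
        negatives = []
        for _ in range(max_tries):
            if len(negatives) == n:
                break
            if rng.random() < 0.5:
                # Item direction: keep user and behavior, swap in a popular item.
                swapped = rng.choices(items, weights=item_weights)[0]
                candidate = (user, behavior, swapped)
            else:
                # User direction: keep behavior and item, swap in an active user.
                swapped = rng.choices(users, weights=user_weights)[0]
                candidate = (swapped, behavior, item)
            if candidate not in observed and candidate not in negatives:
                negatives.append(candidate)
        return negatives

    return sample_negatives


# Hypothetical toy data.
observed = {
    ("u1", "click", "v1"), ("u1", "click", "v2"), ("u2", "click", "v1"),
    ("u3", "click", "v3"), ("u2", "buy", "v3"),
}
sampler = make_negative_sampler(observed)
print(sampler(("u1", "click", "v1"), n=2))
```

The rationale assumed here is that a popular item a user has not interacted with is more likely to be a real negative than an obscure one the user may simply never have seen.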

Algorithm of Training SPTF

How to sample a positive example in SGD

• User behaviours are heterogeneous in our problem, and the distribution of positive examples w.r.t. behaviour types is heavily skewed. In our collected T-Mall dataset:

– Click behaviours: 86.58%

– Add-to-favourite: 4.93%

– Add-to-cart: 5.91%

– Purchase: 2.57%

• With the widely used uniform sampling method

– most of the sampled positive examples would be associated with click behaviours, and the trained model would be heavily biased towards click behaviours.

Adaptive ranking-based sampling approach

• A desirable sampler is expected to choose adversarial positive examples with high probability

– Such examples are informative at the current state of learning and more helpful for correcting the model

• An intuitive idea is that positive examples at a lower rank should have a higher probability of being sampled, as this kind of positive example is more informative and more helpful for correcting the current model parameters.
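
A minimal sketch of the rank-based idea follows. The exact sampling distribution is not given in the slide text, so the exponential form and the temperature parameter `lam` are assumptions, as are the function names. Positive examples are ranked by the current model's score, and those the model ranks lowest receive the largest sampling probability.

```python
import math
import random


def rank_based_probabilities(scores, lam=2.0):
    """Map current model scores of positive examples to sampling probabilities.

    The best-scored example gets rank position 1. The weight grows
    exponentially towards the bottom of the ranking (assumed form), so
    poorly ranked positives are sampled more often.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    weights = [0.0] * n
    for position, index in enumerate(order, start=1):
        weights[index] = math.exp(-(n - position) / lam)
    total = sum(weights)
    return [w / total for w in weights]


def sample_positive(positives, scores, rng, lam=2.0):
    """Draw one positive example according to the rank-based probabilities."""
    probs = rank_based_probabilities(scores, lam)
    return rng.choices(positives, weights=probs, k=1)[0]


# Hypothetical positives of one user with their current model scores.
positives = [("u1", "click", "v1"), ("u1", "purchase", "v2"), ("u1", "click", "v3")]
scores = [3.0, 1.0, 2.0]
probs = rank_based_probabilities(scores)
print(probs)
print(sample_positive(positives, scores, random.Random(0)))
```

Here the purchase record has the lowest score, so it is the most likely draw. That counteracts the dominance of click records when the model already fits them well.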


Dataset and Measurement

• This dataset (T-Mall) contains 480,723 products, 10,000 users and the twenty million behaviour records they generated during 18/11/2014 – 18/12/2014.

• We adopt the Hit Ratio and MRR (Mean Reciprocal Rank) to measure the prediction accuracy.
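
For reference, the two metrics can be computed as in the sketch below. The evaluation protocol is an assumption (one held-out ground-truth item per test case and one ranked list per case), and the function names are illustrative.

```python
def hit_ratio_at_n(ranked_lists, ground_truth, n):
    """Fraction of test cases whose held-out item appears in the top n."""
    hits = sum(
        1 for ranked, truth in zip(ranked_lists, ground_truth)
        if truth in ranked[:n]
    )
    return hits / len(ground_truth)


def mean_reciprocal_rank(ranked_lists, ground_truth):
    """Average of 1/rank of the held-out item (0 if it is not in the list)."""
    total = 0.0
    for ranked, truth in zip(ranked_lists, ground_truth):
        if truth in ranked:
            total += 1.0 / (ranked.index(truth) + 1)
    return total / len(ground_truth)


# Hypothetical predictions for three test cases.
ranked_lists = [["a", "b", "c"], ["b", "c", "a"], ["c", "a", "b"]]
ground_truth = ["a", "a", "x"]
print(hit_ratio_at_n(ranked_lists, ground_truth, n=2))
print(mean_reciprocal_rank(ranked_lists, ground_truth))
```

In this toy case only the first list contains its target within the top 2, and the reciprocal ranks are 1, 1/3 and 0.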

Comparison Methods

• BPTF: Bayesian Probabilistic Tensor Factorization (Xiong et al. SDM’10)

– Designed for rating prediction on explicit feedback datasets

– Only considers the observed examples

• RESCAL: A state-of-the-art tensor factorization model that was proposed for factorizing knowledge graphs (Nickel et al. WWW’12)

– The behaviour type is represented by a matrix rather than a vector

– Only considers the observed examples

• BPR-PITF: Pairwise Interaction Tensor Factorization model optimized by the BPR optimization framework (Rendle et al. WSDM’10)

– All unobserved examples are treated equally, and each positive example is uniformly drawn

• BPR-SMF: BPR-based matrix factorization applied to each type of user behaviour separately.

Experimental Results

SPTF, BPR-PITF and BPR-SMF achieve much higher prediction accuracy than RESCAL and BPTF, showing the importance of exploiting negative examples.

Both SPTF and BPR-PITF outperform BPR-SMF significantly. This demonstrates the advantage of collective factorization over the separate factorization-based method.

BPR-SMF achieves its best prediction performance on the click behaviours, as the click matrix is much denser than the other three matrices.

The other three models achieve their highest prediction accuracy on the other three types of behaviours, as clicking behaviours provide strong signals for predicting the other three types of behaviours due to the sequential patterns of user actions on e-commerce sites.

Study of Different Sampling Strategies

Summary

• We developed a scalable probabilistic tensor factorization model (SPTF) to predict semantic-aware user behaviours.

• To train SPTF, we proposed a novel bidirectional popularity-biased negative sampling technique to leverage both observed and unobserved examples.

• We proposed a novel adaptive ranking-based sampling approach to overcome the heavy skewness of the heterogeneous behaviour data distribution w.r.t. behaviour types.

Thank you!
