Mining Trajectory Profiles for Discovering User Communities
Speaker: Chih-Wen Chang, National Chiao Tung University, Taiwan
2009.11.03
Chih-Chieh Hung, Chih-Wen Chang, Wen-Chih Peng
Outline
• Motivation
• Goal
• Framework
  – Preprocess
  – Construct User’s Profiles
  – Formulate Distance function
  – Identify Community
• Experiments
• Conclusion
Motivation (1/2)
• With the rapid development of positioning techniques, users can easily collect their trajectories
  – GPS loggers, smartphones, and navigation devices
Motivation (2/2)
• Many GPS community sites have been established
  – Users can share their own trajectories
  – Users can search trajectories
[Figure: example sites My tracks and Every Trail; a trajectory query]
Goal
• Mine user communities from raw trajectories
  – User communities: sets of users who have similar moving behaviors
• Applications
  – Finding new friends
  – Recommendation
  – Ranking of trajectories
[Figure: a profile is built for each user, distances between users' profiles are measured, and users are grouped into Community 1 and Community 2]
1. Construct user's profiles
2. Formulate a distance function
3. Identify user communities
Framework
Preprocess
Construct User’s Profile
Measure Distance Between Users
Identify Community
Preprocessing
• Step 1: Find frequent regions
  – Input: all trajectories of users
  – Output: frequent regions
  – Density-based approach
• Step 2: Transform trajectories into sequences of frequent region IDs
  – e.g., T1: <A, B, D>
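A minimal sketch of the two preprocessing steps, assuming plain DBSCAN as the density-based approach (the talk does not name the algorithm) and 2-D points. `eps`, `min_pts`, and the letter region IDs are illustrative choices, not the authors' parameters. Points outside every frequent region are dropped, and consecutive points in the same region collapse into one ID.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Plain DBSCAN: label each point with a cluster index, or -1 for noise."""
    labels: List[Optional[int]] = [None] * len(points)  # None = not visited yet

    def neighbors(i: int) -> List[int]:
        # Brute-force eps-neighborhood (includes the point itself)
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: reachable but not expanded
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_seeds = neighbors(j)
            if len(j_seeds) >= min_pts:  # core point: keep expanding
                queue.extend(j_seeds)
    return [int(lab) for lab in labels]  # every point has been visited by now


def to_region_sequences(
    trajectories: Sequence[Sequence[Point]], eps: float, min_pts: int
) -> List[str]:
    """Map each trajectory to a string of frequent-region IDs (A, B, C, ...)."""
    flat = [p for traj in trajectories for p in traj]
    labels = iter(dbscan(flat, eps, min_pts))
    sequences = []
    for traj in trajectories:
        seq = ""
        for _ in traj:
            lab = next(labels)
            if lab == -1:
                continue  # point lies outside every frequent region
            region = chr(ord("A") + lab)  # hypothetical letter naming of regions
            if not seq or seq[-1] != region:
                seq += region  # collapse repeated points in the same region
        sequences.append(seq)
    return sequences
```

For example, two users who pass the same three dense areas in opposite directions would come out as "ABC" and "CBA". Brute-force neighbor search keeps the sketch short; a spatial index would be needed for thousands of points.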
Construct User’s Profiles (1/2)
• User’s profile: Probabilistic Suffix Tree (abbreviated as PST)
  – Finds and organizes trajectory patterns
  – Records the probabilities of next movements
[Figure: each PST node holds a frequent moving sequence and a conditional table of next possible movements]
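To illustrate how the conditional tables record next movements, here is a minimal sketch that stores a profile as a hypothetical flat mapping from each node's moving sequence to its conditional table, and predicts by looking up the longest suffix of a user's recent region history (the standard way a PST is queried). The table for node B uses the slide example's values (A: 0.5, E: 0.5); the root and AB tables are invented for illustration.

```python
from typing import Dict

# Hypothetical flat encoding of a PST: node label (a moving sequence) -> conditional table
Profile = Dict[str, Dict[str, float]]


def predict_next(profile: Profile, history: str) -> Dict[str, float]:
    """Return the next-movement table of the longest suffix of `history` stored in the profile."""
    for start in range(len(history) + 1):
        context = history[start:]  # longest suffix first; "" is the root
        if context in profile:
            return profile[context]
    return {}


# Node B's table follows the slide example; the root and AB tables are invented.
profile: Profile = {
    "": {"A": 0.5, "B": 0.5},
    "B": {"A": 0.5, "E": 0.5},
    "AB": {"E": 1.0},
}
```

With this profile, the history "CB" falls back to node B (no node "CB" exists), while "CAB" matches the deeper node AB.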
Construct User’s Profiles (2/2)
• Construct the PST level by level, using two operations:
  – Create a child node: when the support of the Before symbol exceeds MinSup
  – Add a symbol into the related conditional table: when the support of the After symbol exceeds MinSup
• Example (MinSup = 0.2), sequence ABEABAACBADFHJHIEDH
  – Children of the root: A (0.5), B (0.375)
  – Node B, Before symbol A: count 2, so P(AB) = 2/3 × 0.375 = 0.25 > MinSup; create child node AB (0.25)
  – Node B, After symbols: A: count 1, 1/2 = 0.5; E: count 1, 1/2 = 0.5
  – Conditional table of node B:
      Symbol (SID)   Count   Conditional prob.
      A              1       0.5
      E              1       0.5
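A level-by-level construction sketch under stated assumptions: root children get probability count / total symbols, and below the root P(b·w) = (count of b before w / count of w) × P(w), matching the 2/3 × 0.375 arithmetic in the example. A symbol enters a conditional table when its conditional probability exceeds MinSup. `max_depth` is a hypothetical cap, and splitting the example string into trajectories is also an assumption. Because the root normalization is assumed, the sketch does not reproduce the slide's root values (A: 0.5, B: 0.375).

```python
from collections import Counter, deque
from dataclasses import dataclass, field
from typing import Dict, Iterator, Sequence, Tuple


@dataclass
class PSTNode:
    label: str  # moving sequence represented by this node, e.g. "AB"
    prob: float  # estimated probability of that sequence
    table: Dict[str, float] = field(default_factory=dict)  # conditional table: next symbol -> prob
    children: Dict[str, "PSTNode"] = field(default_factory=dict)  # keyed by the Before symbol


def occurrences(seqs: Sequence[str], w: str) -> Iterator[Tuple[str, int]]:
    """Yield (sequence, start index) for every occurrence of w."""
    for s in seqs:
        for i in range(len(s) - len(w) + 1):
            if s.startswith(w, i):
                yield s, i


def build_pst(seqs: Sequence[str], min_sup: float, max_depth: int = 3) -> PSTNode:
    total = sum(len(s) for s in seqs)
    root = PSTNode("", 1.0)
    for sym, c in sorted(Counter("".join(seqs)).items()):
        if c / total > min_sup:  # assumed root normalization
            root.children[sym] = PSTNode(sym, c / total)

    queue = deque(root.children.values())  # breadth-first = level by level
    while queue:
        node = queue.popleft()
        w = node.label
        occ = list(occurrences(seqs, w))
        before = Counter(s[i - 1] for s, i in occ if i > 0)
        after = Counter(s[i + len(w)] for s, i in occ if i + len(w) < len(s))

        # Operation 2: add frequent After symbols to the conditional table
        n_after = sum(after.values())
        for sym, c in sorted(after.items()):
            if c / n_after > min_sup:
                node.table[sym] = c / n_after

        # Operation 1: create a child for each frequent Before symbol
        if len(w) < max_depth:
            for sym, c in sorted(before.items()):
                p = c / len(occ) * node.prob
                if p > min_sup:
                    child = PSTNode(sym + w, p)
                    node.children[sym] = child
                    queue.append(child)
    return root
```

Splitting the example string as ["ABEABAACB", "ADFHJHIEDH"] gives node B the slide's counts (Before A: 2 of 3; After A: 1, E: 1), so its conditional table comes out as {A: 0.5, E: 0.5}. A low MinSup is needed there because the assumed root probability of B is only 3/19.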
Formulate Distance function (1/3)
• Determine the distance between users
  1. Transform each PST into a moving sequence list
     – Each element of the list is a branch of the PST, together with the probabilities of its nodes
     – Example: L1[1..2] = <[(A, 0.5)], [(B, 0.375), (AB, 0.33)]>
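A sketch of step 1, assuming a hypothetical nested-dict encoding of the PST (Before symbol → (probability, child subtree)) and reading "branch" as a root-to-leaf path. With the slide's probabilities it reproduces the example list L1.

```python
from typing import Dict, List, Tuple

# Hypothetical PST encoding: Before symbol -> (node probability, child subtree)
Tree = Dict[str, Tuple[float, "Tree"]]
Branch = List[Tuple[str, float]]


def moving_sequence_list(tree: Tree, suffix: str = "") -> List[Branch]:
    """Flatten a PST into one (label, probability) list per root-to-leaf branch."""
    branches: List[Branch] = []
    for sym in sorted(tree):
        prob, children = tree[sym]
        label = sym + suffix  # a child extends its parent's sequence to the left
        below = moving_sequence_list(children, label)
        if below:
            branches.extend([(label, prob)] + branch for branch in below)
        else:
            branches.append([(label, prob)])  # leaf: the branch ends here
    return branches


# Probabilities from the slide example
pst: Tree = {"A": (0.5, {}), "B": (0.375, {"A": (0.33, {})})}
```

Here `moving_sequence_list(pst)` returns `[[("A", 0.5)], [("B", 0.375), ("AB", 0.33)]]`, i.e. L1[1..2].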
Formulate Distance function (2/3)
2. Define the distance between PSTs
   – Find the minimal dist(Li[1..m], Lj[1..n]), i.e., the lowest total cost of editing one moving sequence list into the other
   – Use three editing operations:
     • Insertion
     • Deletion
     • Replacement
   – Insertion example
     L1 = {m1:0.3, m2:0.2, m3:0.3}, L2 = {m1:0.3, m2:0.2}
     Insert m3:0.3 into L2, giving L2 = {m1:0.3, m2:0.2, m3:0.3}; Cost = 0.3

Formulate Distance function (3/3)
   – Deletion example
     L1 = {m1:0.2, m2:0.3}, L2 = {m1:0.2, m2:0.3, m3:0.3}
     Delete m3 from L2, giving L2 = {m1:0.2, m2:0.3, ____}; Cost = 0.3
   – Replacement example
     L1 = {m1:0.2, m2:0.2, m3:0.2}, L2 = {m1:0.2, m2:0.2, m4:0.3}
     Replace m4:0.3 in L2 with m3:0.2, giving L2 = {m1:0.2, m2:0.2, m3:0.2}; Cost = 0.3 + 0.2 = 0.5
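A dynamic-programming sketch of dist(Li[1..m], Lj[1..n]) as a weighted edit distance. The insertion and deletion costs (the element's probability) and the cost of replacing one element with a different one (the sum of both probabilities) follow the three examples. Charging |pa - pb| when the same moving sequence appears with different probabilities is an assumption, not something the slides state.

```python
from typing import List, Tuple

Element = Tuple[str, float]  # (moving sequence, probability)


def mls_distance(li: List[Element], lj: List[Element]) -> float:
    """Minimal total edit cost of turning list li into list lj."""
    m, n = len(li), len(lj)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for a in range(1, m + 1):
        d[a][0] = d[a - 1][0] + li[a - 1][1]  # delete li[a-1]
    for b in range(1, n + 1):
        d[0][b] = d[0][b - 1] + lj[b - 1][1]  # insert lj[b-1]
    for a in range(1, m + 1):
        for b in range(1, n + 1):
            (sa, pa), (sb, pb) = li[a - 1], lj[b - 1]
            # Replacement: free-ish for the same sequence (assumed), else both probabilities
            sub = abs(pa - pb) if sa == sb else pa + pb
            d[a][b] = min(
                d[a - 1][b] + pa,  # deletion
                d[a][b - 1] + pb,  # insertion
                d[a - 1][b - 1] + sub,  # replacement / match
            )
    return d[m][n]
```

On the three slide examples (insertion, deletion, replacement) it returns 0.3, 0.3, and 0.5. Under these costs a replacement never beats a deletion plus an insertion; it only ties.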
Identify Community (1/4)
• User community
  – Any two users Ti, Tj in the same community satisfy δMLS(Ti, Tj) < threshold δ
  – The number of communities is minimal
• Transform the relation between PSTs into a graph
  – A vertex represents a user
  – An edge exists between two vertices when δMLS(Ti, Tj) < threshold δ
[Figure: example graph with vertices O1 to O5]
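A sketch of the graph construction, assuming the pairwise distances δMLS(Ti, Tj) are already available through a hypothetical `dist` callback. Vertices are user IDs, and an edge joins two users whose distance is below threshold δ. The distances in the example are made up.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List, Set


def build_user_graph(
    users: List[str], dist: Callable[[str, str], float], threshold: float
) -> Dict[str, Set[str]]:
    """Adjacency sets: u and v are adjacent iff dist(u, v) < threshold."""
    graph: Dict[str, Set[str]] = {u: set() for u in users}
    for u, v in combinations(users, 2):
        if dist(u, v) < threshold:
            graph[u].add(v)
            graph[v].add(u)
    return graph


# Illustrative distances between three users (made-up values)
pairwise: Dict[FrozenSet[str], float] = {
    frozenset({"O1", "O2"}): 0.1,
    frozenset({"O1", "O3"}): 0.4,
    frozenset({"O2", "O3"}): 0.2,
}


def example_dist(u: str, v: str) -> float:
    return pairwise[frozenset({u, v})]


graph = build_user_graph(["O1", "O2", "O3"], example_dist, threshold=0.3)
```

With threshold 0.3, O1 and O3 are not adjacent (0.4), so they can never share a community even though both are close to O2.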
Identify Community (2/4)
• Model as a minimum clique partition problem: cover all vertices with the fewest cliques
  – A clique is a set of pairwise adjacent vertices
[Figure: example on the graph with vertices O1 to O5]
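Finding a minimum clique partition is NP-hard in general, and the talk does not say how it is solved. The sketch below substitutes a simple greedy heuristic: it grows one clique at a time from the remaining vertex with the most remaining neighbors. It always yields a valid partition into cliques, but not necessarily a minimal one. The example graph is hypothetical.

```python
from typing import Dict, List, Set


def greedy_clique_partition(graph: Dict[str, Set[str]]) -> List[Set[str]]:
    """Split the vertices into cliques greedily (a heuristic, not guaranteed minimal)."""
    remaining = set(graph)
    cliques: List[Set[str]] = []
    while remaining:
        # Seed: remaining vertex with the most remaining neighbors (ties: smallest name)
        seed = max(sorted(remaining), key=lambda v: len(graph[v] & remaining))
        clique = {seed}
        candidates = graph[seed] & remaining  # vertices adjacent to every clique member
        while candidates:
            v = max(sorted(candidates), key=lambda u: len(graph[u] & candidates))
            clique.add(v)
            candidates &= graph[v]  # keep only vertices also adjacent to v
        remaining -= clique
        cliques.append(clique)
    return cliques


# Hypothetical user graph on O1..O5
example_graph = {
    "O1": {"O2", "O3"},
    "O2": {"O1", "O3"},
    "O3": {"O1", "O2", "O4"},
    "O4": {"O3", "O5"},
    "O5": {"O4"},
}
```

On this graph the heuristic returns two communities, {O1, O2, O3} and {O4, O5}.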
Identify Community (3/4)
• Select a representative PST for each community
  – It represents all PSTs in the same community
  – Advantages
    ▪ Reduces storage overhead
    ▪ Speeds up query processing
    ▪ Identifies the communities of new users
[Figure: representative PSTs; a new user (?) is added into a community]
Identify Community (4/4)
• Two factors
  1. Size of the representative PST
     ▪ The number of tree nodes, denoted as N(Ti)
  2. Distance between the selected PST and the others in the same community
     ▪ The error sum, denoted as ES: the sum of the distances between the selected PST and the others, $ES(T_i) = \sum_{j \neq i} \delta_{MLS}(T_i, T_j)$
• Representative PST
  – Chosen to minimize these two factors
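The exact objective that combines N(Ti) and ES is not recoverable from the transcript. This sketch assumes a weighted sum w_size · N(Ti) + w_err · ES(Ti) with hypothetical weights, where ES(Ti) sums the distances from Ti to the other members of its community. Node counts and distances in the example are made up.

```python
from typing import Callable, Dict, FrozenSet, List


def error_sum(rep: str, members: List[str], dist: Callable[[str, str], float]) -> float:
    """ES: sum of distances between the candidate representative and the other members."""
    return sum(dist(rep, other) for other in members if other != rep)


def select_representative(
    members: List[str],
    num_nodes: Dict[str, int],
    dist: Callable[[str, str], float],
    w_size: float = 1.0,  # hypothetical weight on N(Ti)
    w_err: float = 1.0,  # hypothetical weight on ES
) -> str:
    """Pick the member PST minimizing w_size * N(Ti) + w_err * ES(Ti) (assumed objective)."""
    return min(
        members,
        key=lambda t: w_size * num_nodes[t] + w_err * error_sum(t, members, dist),
    )


# Made-up community of three PSTs
pairwise: Dict[FrozenSet[str], float] = {
    frozenset({"T1", "T2"}): 0.2,
    frozenset({"T1", "T3"}): 0.3,
    frozenset({"T2", "T3"}): 0.4,
}


def example_dist(a: str, b: str) -> float:
    return pairwise[frozenset({a, b})]


nodes = {"T1": 5, "T2": 3, "T3": 4}
```

With equal weights the smallest tree T2 wins (3 + 0.6); setting `w_size=0.0` ranks purely by ES and picks T1 (0.5). How to balance the two factors is exactly the part the slide's missing formula would settle.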
Experiments (1/4)
• Simulation model
  – Uses real trajectories from CarWeb to simulate the group mobility of users
  – Total: 2400 trajectories
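Experiments (2/4)
• Compare to the Generalized Sequential Pattern mining algorithm (GSP)
  – Set of sequential patterns, e.g., sp1, sp2, ..., spn
  – The trajectory profile of a user is represented as a vector Vi over these patterns
  – Distance function between profiles: cosine similarity measurement
    $\mathrm{similarity}(V_i, V_j) = \frac{V_i \cdot V_j}{|V_i|\,|V_j|}$
  – Example:
    $\mathrm{similarity} = \frac{\langle 1,1,0,0\rangle \cdot \langle 0,1,1,1\rangle}{|\langle 1,1,0,0\rangle|\,|\langle 0,1,1,1\rangle|} = \frac{1}{\sqrt{2}\,\sqrt{3}} \approx 0.41$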
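A sketch of the baseline's profile and similarity. It assumes the sequential patterns sp1..spn have already been mined (GSP itself is not shown) and that Vi holds 1 when any of the user's region-ID sequences contains the pattern as a subsequence, 0 otherwise. Binary entries are an assumption suggested by the <1,1,0,0> example.

```python
import math
from typing import List, Sequence


def is_subsequence(pattern: str, seq: str) -> bool:
    """True if the symbols of `pattern` appear in `seq` in order (gaps allowed)."""
    it = iter(seq)
    return all(symbol in it for symbol in pattern)  # `in` consumes the iterator


def profile_vector(patterns: Sequence[str], trajectories: Sequence[str]) -> List[int]:
    """Assumed binary profile: 1 if any trajectory contains the pattern, else 0."""
    return [int(any(is_subsequence(p, t) for t in trajectories)) for p in patterns]


def cosine_similarity(vi: Sequence[float], vj: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(vi, vj))
    norm = math.sqrt(sum(a * a for a in vi)) * math.sqrt(sum(b * b for b in vj))
    return dot / norm if norm else 0.0  # 0.0 for an all-zero profile (a convention)
```

For the slide's vectors, `cosine_similarity([1, 1, 0, 0], [0, 1, 1, 1])` gives 1/√6 ≈ 0.41. Unlike a PST, such a vector keeps no next-movement probabilities, only pattern presence.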
Experiments (3/4)
• Impact of trajectory profiles
  [Figure panels: Storage; Prediction]
  – GSP profiles are always larger than PSTs, especially when MinSup is smaller than 0.15
Experiments (4/4)
• Impact of threshold δ and MinSup
  – A smaller threshold δ finds more communities
  [Figure panels: Storage; Prediction]
Conclusion
• Explore the problem of mining user communities from trajectories
  – Preprocess: find frequent regions; replace trajectories by region IDs
  – Construct user’s profile: build a probabilistic suffix tree (abbreviated as PST)
  – Measure distance between users: formulate a distance function
  – Identify community: cluster users by the distance function; select representative PSTs
THANK YOU!