![Page 1: User Behavior Modeling on Financial Message Boards](https://reader031.vdocuments.mx/reader031/viewer/2022030309/58f20a641a28ab44198b45f5/html5/thumbnails/1.jpg)
User Behavior Modeling on Financial Message Boards
Pritha D.N
Sahaj Biyani
December 9, 2015
Introduction
Investors Hub
Objective
• To identify the roles users assume in these message board forums.
• To validate the “90-9-1 Rule for Participation Inequality” in the message board community.
Dataset
• Free message boards for US-listed stocks
• Time period: January 2001 to June 2015
• Total Message Boards: 6,278
• Total Users: 52,558
• Total Posts: 5,624,024
Dataset Analysis
• Percentage of posts that initiate a thread: 30%
• 19% of users did not initiate any post.
• 80% of users initiated fewer than 20 posts.
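The three statistics above can be derived from a per-post log. A minimal sketch, assuming each post is available as a hypothetical `(user_id, is_initiation)` pair, where `is_initiation` is true for a post that starts a thread (the slides do not describe the data schema):

```python
from collections import Counter

def initiation_stats(posts, threshold=20):
    """posts: iterable of (user_id, is_initiation) pairs (hypothetical schema).

    Returns (share of posts that initiate a thread,
             share of users with no initiations,
             share of users with fewer than `threshold` initiations).
    """
    posts = list(posts)
    users = {user for user, _ in posts}
    # Count thread-initiating posts per user; users who only reply get 0.
    initiations = Counter(user for user, is_init in posts if is_init)
    counts = [initiations.get(user, 0) for user in users]

    share_initiated = sum(counts) / len(posts)
    share_no_initiation = sum(1 for c in counts if c == 0) / len(users)
    share_below = sum(1 for c in counts if c < threshold) / len(users)
    return share_initiated, share_no_initiation, share_below


# Toy data (not the study's): "a" starts 2 threads, "b" only replies, "c" starts 1.
toy = [("a", True), ("a", True), ("a", False), ("b", False), ("c", True)]
print(initiation_stats(toy))
```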
Dataset Analysis
• Number of boards each user participated in:
• 56% of users are active on only one board.
• 90% of users are active on fewer than 20 boards.
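The number of boards per user is the count of distinct boards each user posted on. A short sketch, assuming a hypothetical `(user_id, board_id)` pair per post; the board names below are toy values:

```python
from collections import defaultdict

def boards_per_user(posts):
    """posts: iterable of (user_id, board_id) pairs (hypothetical schema)."""
    boards = defaultdict(set)
    for user, board in posts:
        boards[user].add(board)  # a set ignores repeat posts on the same board
    return {user: len(b) for user, b in boards.items()}

def share_of_users(counts, predicate):
    """Fraction of users whose board count satisfies `predicate`."""
    return sum(1 for c in counts.values() if predicate(c)) / len(counts)


toy = [("a", "AAPL"), ("a", "AAPL"), ("b", "AAPL"), ("b", "MSFT"), ("c", "GE")]
counts = boards_per_user(toy)
print(share_of_users(counts, lambda c: c == 1))  # users active on one board
print(share_of_users(counts, lambda c: c < 20))  # users on fewer than 20 boards
```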
Dataset Analysis
• Average response time of replies a user makes:
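The slides do not define response time. One plausible reading, used here purely as an assumption, is the delay between a reply and the post it answers, averaged over all replies a user makes. A sketch with a hypothetical `(user_id, parent_time, reply_time)` tuple per reply:

```python
from collections import defaultdict
from datetime import datetime

def average_response_time(replies):
    """replies: iterable of (user_id, parent_time, reply_time) tuples.

    Assumes response time = reply_time - parent_time (hypothetical definition).
    Returns each user's average response time in hours.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for user, parent_time, reply_time in replies:
        totals[user] += (reply_time - parent_time).total_seconds() / 3600
        counts[user] += 1
    return {user: totals[user] / counts[user] for user in totals}


t0 = datetime(2015, 1, 5, 9, 0)
toy = [
    ("a", t0, datetime(2015, 1, 5, 10, 0)),  # 1 hour after the parent post
    ("a", t0, datetime(2015, 1, 5, 12, 0)),  # 3 hours after
    ("b", t0, datetime(2015, 1, 6, 9, 0)),   # 24 hours after
]
print(average_response_time(toy))
```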
Dataset Analysis
• Number of posts across boards:
• 80% of posts were made on fewer than 200 boards.
• 1,000 of the 6,278 boards account for 90% of all posts.
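Concentration figures like these come from sorting boards by post count and accumulating until the target share of posts is covered. A minimal sketch on toy counts (not the study's data):

```python
def boards_needed(post_counts, target_share):
    """Smallest number of boards whose posts reach `target_share` of all posts.

    post_counts: list of per-board post totals.
    """
    total = sum(post_counts)
    covered = 0
    # Take the busiest boards first so the number of boards is minimal.
    for n, count in enumerate(sorted(post_counts, reverse=True), start=1):
        covered += count
        if covered >= target_share * total:
            return n
    return len(post_counts)


toy_counts = [500, 300, 100, 50, 30, 20]  # 1,000 toy posts on 6 boards
print(boards_needed(toy_counts, 0.8))  # 500 + 300 = 800 posts
print(boards_needed(toy_counts, 0.9))  # 500 + 300 + 100 = 900 posts
```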
Features
1. Number of threads a user initiated over time
2. Number of replies a user made over time
3. Number of users a user replies to
4. Number of users who reply to a user
5. Number of boards a user is active on
6. Number of followers
7. Replier share: AVG[proportion of replies a user receives on a board]
8. Reply share: AVG[proportion of replies a user makes on a board]
9. Average response time
10. Volume of content the user posted
11. Number of links the user posted
Feature groups: Activity of User, User Network Structure, Content Related
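Features 3, 4, 7 and 8 can all be read off a list of reply edges. A sketch, assuming a hypothetical `(replier, target, board)` triple per reply and averaging the shares only over boards where the user has the relevant activity; the slides do not specify either choice:

```python
from collections import defaultdict

def network_features(replies):
    """replies: iterable of (replier, target, board) triples (hypothetical schema)."""
    replies_to = defaultdict(set)   # feature 3: users this user replied to
    replied_by = defaultdict(set)   # feature 4: users who replied to this user
    board_total = defaultdict(int)  # total replies per board
    made = defaultdict(lambda: defaultdict(int))      # user -> board -> replies made
    received = defaultdict(lambda: defaultdict(int))  # user -> board -> replies received
    for replier, target, board in replies:
        replies_to[replier].add(target)
        replied_by[target].add(replier)
        board_total[board] += 1
        made[replier][board] += 1
        received[target][board] += 1

    def avg_share(per_board):
        # Mean over boards of (user's replies on board / all replies on board).
        shares = [n / board_total[b] for b, n in per_board.items()]
        return sum(shares) / len(shares) if shares else 0.0

    users = set(replies_to) | set(replied_by)
    return {
        u: {
            "f3_users_replied_to": len(replies_to[u]),
            "f4_users_replying": len(replied_by[u]),
            "f7_replier_share": avg_share(received[u]),
            "f8_reply_share": avg_share(made[u]),
        }
        for u in users
    }


toy = [("a", "b", "X"), ("a", "b", "X"), ("b", "a", "X"), ("c", "a", "Y")]
for user, feats in sorted(network_features(toy).items()):
    print(user, feats)
```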
Methodology
• Data Preprocessing
• Feature Selection/Extraction
• Clustering
• Role Inference
Data Preprocessing
• We use min-max normalization to scale the data to the range [0, 1].
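Min-max normalization maps each value x to (x − min) / (max − min). A sketch, assuming it is applied per feature (column) and that a constant column maps to 0.0; the slides state neither detail:

```python
def min_max_normalize(rows):
    """Scale each column of `rows` (list of equal-length lists) to [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [
            # A constant column would divide by zero, so it maps to 0.0.
            (x - lo) / (hi - lo) if hi > lo else 0.0
            for x, lo, hi in zip(row, lows, highs)
        ]
        for row in rows
    ]


# Toy feature matrix: [threads initiated, replies made, boards active on]
toy = [[0, 10, 1], [5, 30, 1], [10, 20, 1]]
print(min_max_normalize(toy))
```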
Feature Selection
• Step 1: Feature extraction. Run principal component analysis (PCA), then run K-means on the projected data to obtain cluster labels.
• Step 2: Estimate feature importance with a Random Forest classifier.
Feature Extraction using PCA

| Principal Component | % Variance | Cumulative % Variance |
|---|---|---|
| 1 | 62.16 | 62.16 |
| 2 | 15.07 | 77.23 |
| 3 | 7.95 | 85.18 |
| 4 | 5.74 | 90.92 |
| 5 | 3.57 | 94.49 |
| 6 | 1.67 | 96.16 |
| 7 | 1.48 | 97.64 |
| 8 | 0.68 | 98.32 |
| 9 | 0.59 | 98.91 |
| 10 | 0.55 | 99.46 |
| 11 | 0.54 | 100.00 |

Figure: Scree plot
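The cumulative column is a running sum of the per-component variance. The sketch below recomputes it from the table and finds how many components reach a variance threshold; the slides do not say how many components were kept, so the 90% threshold is only an illustration:

```python
from itertools import accumulate

# % variance explained by each principal component, copied from the table.
variance_pct = [62.16, 15.07, 7.95, 5.74, 3.57, 1.67, 1.48, 0.68, 0.59, 0.55, 0.54]

# Running sum, rounded to two decimals to match the table.
cumulative = [round(c, 2) for c in accumulate(variance_pct)]

def components_for(threshold, cumulative):
    """Smallest number of components whose cumulative % variance reaches threshold."""
    return next(i for i, c in enumerate(cumulative, start=1) if c >= threshold)


print(cumulative)
print(components_for(90, cumulative))
```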
Choosing the number of clusters
Elbow Plot
• Plot the within-group sum of squares (WSS) against K and look for the “elbow point” in the plot.
• The first few clusters add a lot of information (explain much of the variance), but at some point the marginal gain drops, producing an angle in the graph.
• Choose the number after the last big drop.
• This "elbow" cannot always be unambiguously identified.
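Each point on the elbow plot is the WSS of one clustering: the squared Euclidean distance from every point to its cluster mean, summed over all clusters. A minimal sketch on toy 2-D points, showing the large drop when a second cluster is allowed:

```python
def wss(points, labels):
    """Within-group sum of squares for points grouped by `labels`."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    total = 0.0
    for members in clusters.values():
        centroid = [sum(dim) / len(members) for dim in zip(*members)]
        total += sum(
            sum((x - c) ** 2 for x, c in zip(p, centroid)) for p in members
        )
    return total


points = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
print(wss(points, [0, 0, 0, 0]))  # K = 1: one cluster
print(wss(points, [0, 0, 1, 1]))  # K = 2: the two natural groups
```

Plotting these values against K for each candidate K gives the elbow curve.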
Silhouette Coefficient
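a(i) is the average dissimilarity of point i to all other points within the same cluster.
b(i) is the lowest average dissimilarity of i to any other cluster of which i is not a member.
The silhouette of i is s(i) = (b(i) − a(i)) / max{a(i), b(i)}, which ranges from −1 to 1; values near 1 mean i is well matched to its own cluster.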
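The definitions of a(i) and b(i) translate directly into code. A sketch, assuming Euclidean distance as the dissimilarity and the usual convention that a point in a singleton cluster gets s(i) = 0; it needs at least two clusters:

```python
import math

def silhouette_scores(points, labels):
    """Per-point silhouette s(i) = (b(i) - a(i)) / max(a(i), b(i))."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)

    def mean_dist(i, members):
        return sum(math.dist(points[i], points[j]) for j in members) / len(members)

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)  # convention for singleton clusters
            continue
        a = mean_dist(i, own)  # cohesion with the point's own cluster
        b = min(mean_dist(i, members)  # nearest other cluster
                for other, members in clusters.items() if other != lab)
        scores.append((b - a) / max(a, b))
    return scores


points = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
scores = silhouette_scores(points, [0, 0, 1, 1])
print(sum(scores) / len(scores))  # mean silhouette for this clustering
```

Averaging s(i) over all points gives a single score that can be compared across candidate values of K.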
Feature Selection
• Train a Random Forest classifier on all the features, using the labels assigned by K-means as the targets.
• Feature importance is defined as the total decrease in node impurity (weighted by the probability of reaching that node, which is approximated by the proportion of samples reaching that node), averaged over all trees in the ensemble.
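The building block of this importance measure is the weighted impurity decrease of a single split. A sketch assuming Gini impurity (scikit-learn's default criterion for `RandomForestClassifier`, whose `feature_importances_` attribute reports this impurity-based measure); a feature's importance sums these decreases over every node that splits on it and averages over the trees:

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def weighted_impurity_decrease(parent, left, right, n_total):
    """Impurity decrease of one split, weighted by the share of training
    samples that reach the node (len(parent) / n_total)."""
    n = len(parent)
    child = (len(left) * gini(left) + len(right) * gini(right)) / n
    return (n / n_total) * (gini(parent) - child)


# Toy node reached by 4 of 8 samples; the split separates two K-means labels.
parent = [1, 1, 2, 2]
print(weighted_impurity_decrease(parent, [1, 1], [2, 2], n_total=8))
```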
Clustering Users
• Applied K-means clustering with K = 4.
• Ran 10 times with different seeds.
• 300 iterations per run.

| Cluster | User Count | % of Users |
|---|---|---|
| Cluster 1 | 47,295 | 91.7 |
| Cluster 2 | 360 | 0.73 |
| Cluster 3 | 3,322 | 6.44 |
| Cluster 4 | 581 | 1.13 |
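The settings above correspond to the `n_clusters`, `n_init` and `max_iter` parameters of scikit-learn's `KMeans`, though the slides do not name a library. The sketch below is a plain Lloyd's algorithm with restarts that keeps the run with the lowest WSS; initializing from k random data points is an assumption (scikit-learn uses k-means++ by default):

```python
import math
import random

def kmeans(points, k, n_init=10, max_iter=300, seed=0):
    """Lloyd's K-means with `n_init` restarts; returns (labels, WSS) of the best run."""
    def sq_dist(p, q):
        return sum((x - y) ** 2 for x, y in zip(p, q))

    best_labels, best_wss = None, math.inf
    for run in range(n_init):
        rng = random.Random(seed + run)  # a different seed per restart
        centroids = [list(p) for p in rng.sample(points, k)]
        labels = None
        for _ in range(max_iter):
            # Assignment step: each point joins its nearest centroid.
            new_labels = [
                min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                for p in points
            ]
            if new_labels == labels:
                break  # assignments stopped changing: converged
            labels = new_labels
            # Update step: move each centroid to the mean of its members.
            for c in range(k):
                members = [p for p, lab in zip(points, labels) if lab == c]
                if members:  # an empty cluster keeps its old centroid
                    centroids[c] = [sum(d) / len(members) for d in zip(*members)]
        wss = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
        if wss < best_wss:
            best_labels, best_wss = labels, wss
    return best_labels, best_wss


# Toy 1-D data with two obvious groups; the study used K = 4 on its user features.
toy = [[0.0], [1.0], [10.0], [11.0]]
labels, total_wss = kmeans(toy, k=2)
print(labels, total_wss)
```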
Cluster Analysis: Initiation of posts by users of each cluster

| Cluster | Post Initiation Share | Initiations per User |
|---|---|---|
| Cluster 1 | 30% | 10.9 |
| Cluster 2 | 22% | 1066.6 |
| Cluster 3 | 44% | 228.2 |
| Cluster 4 | 3% | 98.3 |
Cluster Analysis: Replies by users of each cluster

| Cluster | Reply Share | Replies per User |
|---|---|---|
| Cluster 1 | 22% | 17.5 |
| Cluster 2 | 28% | 2946.3 |
| Cluster 3 | 47% | 534.9 |
| Cluster 4 | 4% | 255.9 |
Cluster Analysis
Figure: Inter-cluster reply % for Cluster 1 to Cluster 4
Cluster Analysis
Feature 3: Number of users a user replies to
Cluster Analysis
Feature 4: Number of users who reply to a user
Role Inference
• Cluster 1 (Lurkers): Both the number of posts initiated per user and the number of replies made per user are very low.
• Cluster 2 (Super Users): Very active. They contribute the most to the boards and engage with many users.
• Cluster 3 (Contributors): Account for 45% of total post initiations and 46% of total replies made. They have a short response time, meaning they respond very quickly. The backbone of the forum.
• Cluster 4 (Taciturns): Keep to themselves. They initiate very few posts but reply often, mostly to users in their own cluster.
Participation Inequality

| Role | % of Users | % of Content Contributed |
|---|---|---|
| Lurkers | 91.73 | 24 |
| Super-Users | 0.73 | 26 |
| Contributors | 6.44 | 46 |
| Taciturns | 1.13 | 4 |
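The 90-9-1 rule (as popularized by Jakob Nielsen) holds that roughly 90% of users lurk, 9% contribute occasionally, and 1% produce most of the content. A quick way to compare the chart against that pattern is each role's content share divided by its user share; the numbers below are copied from the chart, and the ratio itself is an illustrative choice rather than a metric from the slides:

```python
# (% of users, % of content contributed) per role, copied from the chart.
roles = {
    "Lurkers": (91.73, 24),
    "Super-Users": (0.73, 26),
    "Contributors": (6.44, 46),
    "Taciturns": (1.13, 4),
}

# Content share per unit of user share: > 1 means the role contributes more
# than its head count alone would predict.
per_capita = {role: content / users for role, (users, content) in roles.items()}
for role, ratio in per_capita.items():
    print(f"{role:12s} {ratio:6.2f}x")

# Combined share of the two most active roles.
top_users = roles["Super-Users"][0] + roles["Contributors"][0]
top_content = roles["Super-Users"][1] + roles["Contributors"][1]
print(f"{top_users:.2f}% of users produce {top_content}% of content")
```

On these figures, Super-Users and Contributors together make up 7.17% of users and produce 72% of the content.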
Conclusion
• Users take on different roles in online communities, and clusters of users can be identified by their behavioral patterns.
• Participation inequality exists on financial message boards as well.
Thank You!