[ieee 2012 international conference on advances in social networks analysis and mining (asonam 2012)...
User interests modeling in online forums
Na Ni and Yaodong Li
Institute of Automation, Chinese Academy of Sciences
95 Zhongguancun East Road, Beijing, China
Email: {na.ni, yaodong.li}@ia.ac.cn
Abstract—This paper studies the problem of user modeling in online forums from a personality viewpoint. A novel hierarchical user profiling mechanism is proposed, which utilizes the user-generated content, the reply relations among users and the topics of the discussions. The hierarchical model represents the users’ interests across different topics. The obtained user profiles are applied to three forum-related tasks: new discussion recommendation, external news article recommendation and user retrieval. The experimental results show that, compared with traditional methods, the hierarchical user profiling approach achieves better performance in all three tasks.
Keywords-online forum; user modeling; hierarchical model
I. INTRODUCTION
Online discussion forums are a type of social media
promoted by Web 2.0, where people gather to discuss in depth
specific topics they are interested in. In
forums like Digg¹, people participate in discussions to discover
valuable information or to share their thoughts and knowledge
with other people. In forums of this type, it is meaningful to
find out what the users are interested in.

Existing work on user modeling can be classified into
content-based methods and collaborative filtering methods.
These methods have been used to facilitate the personalized
search or recommendation in some social media such as
newsgroups, blogs and microblogs. However, online forums have
their own characteristics. Content in online forums usually
contains more noise, and the relations among forum users are
not as tight as those in blogs or microblogs. Meanwhile, existing
work on user modeling for discussions is restricted to
specific tasks such as thread recommendation.

In online forums, a user’s interests are reflected in the
content he generates, the users he has exchanged opinions
with and the topics of the discussions he has participated in.
Using this information, a hierarchical user profiling approach
is proposed to model how a user’s interests differ
across topics. The model contains two layers: a cross-domain
layer and an inner-domain layer. The cross-domain layer
describes a user’s interests across different domains, and the
inner-domain layer describes a user’s interests within a specific
domain. To evaluate the effectiveness of this approach, the
hierarchical user profiles are applied to three forum-related
tasks: new discussion recommendation, external news
article recommendation and user retrieval. The framework
of this study is shown in Figure 1.
II. METHODOLOGY
¹ http://digg.com/

Fig. 1: User modeling framework

As shown in Figure 1, the hierarchical user model
proposed in this paper has two layers. In the cross-domain
layer, a user’s model is viewed as a distribution across the
domains that he is interested in, which can be represented
as $P_C(u) = \{(c_i, p(c_i|u)) \mid c_i \in C\}$. Applying Bayes’ rule:

$$p(c_j|u) = \frac{p(u|c_j)\,p(c_j)}{p(u)} \tag{1}$$

where $p(c_j)$ is the probability of domain (cluster) $c_j$ in the training set.
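As an illustration of Eq. (1), the following minimal Python sketch turns per-domain likelihoods into a cross-domain profile. The function name `cross_domain_profile` and the input dictionaries are hypothetical. The sketch also makes one simplifying assumption that differs from the paper: the evidence p(u) is obtained by summing the joint probabilities over domains, whereas the paper estimates p(u) separately with PageRank on the whole collection.

```python
def cross_domain_profile(p_u_given_c, p_c):
    """Return {domain: p(c|u)} for one user via Bayes' rule (Eq. 1).

    p_u_given_c: {domain: p(u|c)}, the user's probability within each domain
    p_c:         {domain: p(c)}, the prior probability of each domain
    Assumption: p(u) is computed by marginalising over the domains, so the
    resulting profile sums to 1. The paper estimates p(u) differently.
    """
    joint = {c: p_u_given_c.get(c, 0.0) * p_c[c] for c in p_c}
    p_u = sum(joint.values())
    if p_u == 0:
        # The user never appears in any domain: return an all-zero profile.
        return {c: 0.0 for c in p_c}
    return {c: value / p_u for c, value in joint.items()}


# Hypothetical numbers, for illustration only.
profile = cross_domain_profile(
    {"technology": 0.02, "sports": 0.01},
    {"technology": 0.5, "sports": 0.5},
)
print(profile)
```

With equal priors, the domain in which the user is more prominent receives the larger share of the profile.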
We estimate the probability $p(u|c_j)$ of a user appearing in
a given domain from the directed relations among users in
that domain. In a discussion, a user can reply to (RT) or be replied to by (RB)
other users. Using these directed relations, we
build two directed graphs, $G_{rt}$ and $G_{rb}$, over the users of a
given domain. In these graphs, each user corresponds to a vertex,
and the weight of the directed edge $G_{rt}(i, j)$ or $G_{rb}(i, j)$ denotes the number of times
that user $i$ has replied to user $j$, or has been
replied to by user $j$, in the target domain. In graph $G_{rt}$, a user
has a higher probability in the domain if he has replied to more
users, while in graph $G_{rb}$ he has a higher probability
if he has been replied to by more users. The PageRank algorithm
[1] is applied to the directed graphs $G_{rt}$ and $G_{rb}$ to estimate
each user’s probability $p(u|c_j)$ in a given domain:
$$p(u|c_j) = \frac{\delta}{N} + (1 - \delta) \sum_{u' \in U_{c_j}} \mathrm{rank}(u', c_j)\, G(u, u') \tag{2}$$
where $\delta$ is a damping factor, $U_{c_j}$ is the set of users in
domain $c_j$, $N$ is the number of users in $U_{c_j}$, and $G$ stands for either $G_{rt}$ or $G_{rb}$. Analogously,
the probability $p(u)$ of a candidate user appearing in the entire
collection can also be computed by running the PageRank algorithm on
the directed graph built from the reply relations among all users
in the collection.
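The iteration behind Eq. (2) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function name `reply_rank`, the dictionary-of-dictionaries graph format, the default value of `delta` and the fixed iteration count are all hypothetical. The paper does not say how the edge weights are normalised, so the sketch assumes each column of G is scaled to sum to 1 and the scores are renormalised after every pass.

```python
def reply_rank(G, delta=0.15, iters=50):
    """Iterate Eq. (2) on a weighted reply graph for one domain.

    G[u][v] is the number of times user u has replied to user v (the G_rt
    graph). Assumptions not taken from the paper: column normalisation of
    the edge weights and renormalisation of the scores in every iteration.
    """
    users = sorted(set(G) | {v for nbrs in G.values() for v in nbrs})
    n = len(users)
    # Total incoming weight of each user, used to normalise the columns.
    col = {v: 0.0 for v in users}
    for nbrs in G.values():
        for v, w in nbrs.items():
            col[v] += w
    rank = {u: 1.0 / n for u in users}
    for _ in range(iters):
        new = {}
        for u in users:
            # Sum of rank(u') * G(u, u') over the users u' that u replied to.
            s = sum(rank[v] * w / col[v]
                    for v, w in G.get(u, {}).items() if col[v] > 0)
            new[u] = delta / n + (1 - delta) * s
        total = sum(new.values())
        rank = {u: r / total for u, r in new.items()}
    return rank


# Hypothetical toy graph: "a" replies to "b" and "c", "b" replies to "c".
scores = reply_rank({"a": {"b": 2, "c": 1}, "b": {"c": 1}, "c": {}})
print(scores)
```

In this toy graph the most active replier receives the highest score, which matches the behaviour the text describes for $G_{rt}$. Passing the reply-by counts instead would give the $G_{rb}$ variant.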
In the inner-domain layer, the profile $P_{CW}(u|c_j) = \{(w_i, p(w_i|\theta_u, c_j)) \mid w_i \in V\}$ of a user $u$ in domain $c_j$ is
computed from the content of the posts that he has published within
that domain, where $V$ is the vocabulary. In this paper, the value of $p(w|\theta_u, c_j)$ is estimated using a
language model.
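The paper does not specify which language model or smoothing scheme is used. As one possible instance, the sketch below estimates a unigram model for a user within a domain and smooths it with the domain's collection model (Jelinek-Mercer interpolation). The function name `user_language_model`, the whitespace tokenisation and the weight `lam` are assumptions made for illustration, and `domain_posts` is assumed to include the user's own posts so that the vocabulary covers them.

```python
from collections import Counter


def user_language_model(user_posts, domain_posts, lam=0.8):
    """Estimate p(w | theta_u, c_j) for every word w in the domain vocabulary.

    user_posts:   posts written by the user in domain c_j
    domain_posts: all posts in domain c_j (assumed to include user_posts)
    lam:          interpolation weight of the user's own word frequencies
    Jelinek-Mercer smoothing is an assumption; the paper only says that a
    language model is used.
    """
    user_tf = Counter(w for post in user_posts for w in post.lower().split())
    dom_tf = Counter(w for post in domain_posts for w in post.lower().split())
    user_len = sum(user_tf.values())
    dom_len = sum(dom_tf.values())
    model = {}
    for w, count in dom_tf.items():
        p_user = user_tf[w] / user_len if user_len else 0.0
        p_domain = count / dom_len
        model[w] = lam * p_user + (1 - lam) * p_domain
    return model


# Hypothetical posts, for illustration only.
domain = ["python is fun", "rust is fast", "python is popular"]
lm = user_language_model(["python is fun", "python is popular"], domain)
print(sorted(lm.items(), key=lambda kv: kv[1], reverse=True))
```

Smoothing keeps words the user never wrote at a small non-zero probability, which avoids zero scores when the profile is later matched against new text.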
2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining
978-0-7695-4799-2/12 $26.00 © 2012 IEEE
DOI 10.1109/ASONAM.2012.122
III. APPLICATIONS AND EXPERIMENTS
A. Data set
We collected 13,872 discussions involving 64,374 users from
the online discussion forum Digg between November and December 2011.
The discussions were manually classified into domains,
and these domains are used directly to build the hierarchical
user model. To verify the effectiveness of our method, we
also implemented several existing user modeling methods: a
content-based method (“CON”); relation-based methods built
on the “reply to”, “reply by” and “co-occurrence” relations
between two users, denoted “CF-RT”, “CF-RB”
and “CF-CO” respectively; and the method proposed in [2] (“CON+C+G”).
B. New discussions recommendation
When the first post $d_r$ of a new discussion is given, we
classify the new discussion into a domain $c_d$ of the training set
according to $d_r$ and compute a user’s interest in it as
$p(d_r, c_d|u) = p(d_r|c_d, u)\,p(c_d|u)$. When the author of a new
discussion is given, the discussion is recommended to
other users via the relation-based user models. Mean Reciprocal
Rank (MRR) and Precision@N are used as the evaluation
metrics. The performance of our methods and of the baseline
approaches is shown in Table I. Both hierarchical
approaches (“H-RT” and “H-RB”) outperform all the other methods used in this
task: “CON”, “CON+C+G” and the relation-based user models.
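The score $p(d_r, c_d|u)$ can be sketched as follows. The paper does not state how $p(d_r|c_d, u)$ is computed, so this sketch assumes it is the likelihood of the words of the first post under the user's inner-domain language model, evaluated in log space. The function name `discussion_score`, the whitespace tokenisation and the `floor` probability for unseen words are hypothetical.

```python
import math


def discussion_score(first_post, user_lm, p_domain_given_user, floor=1e-9):
    """Return log p(d_r, c_d | u) = log p(d_r | c_d, u) + log p(c_d | u).

    first_post:          text of the first post d_r of the new discussion
    user_lm:             {word: p(w | theta_u, c_d)} for the predicted domain
    p_domain_given_user: cross-domain profile value p(c_d | u)
    Assumption: p(d_r | c_d, u) is a unigram likelihood; words missing from
    the model receive the small probability `floor`.
    """
    if p_domain_given_user <= 0:
        return float("-inf")
    log_likelihood = sum(
        math.log(user_lm.get(w, floor)) for w in first_post.lower().split()
    )
    return log_likelihood + math.log(p_domain_given_user)


# Hypothetical model and profile values, for illustration only.
lm = {"python": 0.5, "is": 0.3, "fun": 0.2}
print(discussion_score("python is fun", lm, 0.6))
print(discussion_score("python is fun", lm, 0.1))
```

Users can then be ranked for a new discussion by this score. A user with the same word preferences but less interest in the domain receives a lower score.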
TABLE I: New discussions recommendation results
Method     P@1     P@5     P@10    P@20    P@30    MRR
CON        0.1048  0.1036  0.0935  0.0801  0.0709  0.2364
CON+C+G    0.1036  0.1099  0.0842  0.0811  0.0691  0.2367
CF-RT      0.0248  0.0288  0.0318  0.0313  0.0300  0.0803
CF-RB      0.0225  0.0284  0.0306  0.0322  0.0300  0.0795
CF-CO      0.0248  0.0297  0.0311  0.0293  0.0276  0.0779
H-RB       0.1149  0.1153  0.1011  0.0827  0.0723  0.2500
H-RT       0.1214  0.1162  0.1158  0.1066  0.0908  0.2655
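For reference, the two metrics reported in Table I can be computed as in the sketch below. The function names and the toy identifiers are hypothetical, and the exact evaluation protocol of the paper (for example, how ties are handled) is not specified.

```python
def precision_at_n(ranked, relevant, n):
    """Fraction of the top-n ranked items that are relevant."""
    return sum(1 for item in ranked[:n] if item in relevant) / n


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant item, or 0.0 if none is retrieved."""
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(runs):
    """Average reciprocal rank over (ranked_list, relevant_set) pairs."""
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


# Hypothetical rankings, for illustration only.
print(precision_at_n(["d3", "d1", "d7", "d2"], {"d1", "d2"}, 2))
print(mean_reciprocal_rank([(["d3", "d1"], {"d1"}), (["d5", "d6"], {"d5"})]))
```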
C. News articles recommendation
The external news articles are first classified into the existing
discussion domains using an SVM. Given a user $u$,
the candidate articles are ranked by the value of
$\cos(p(w|d_{new}, c_d), p(w|\theta_u)) \cdot p(c_d|u)$. In our experiment, for
the users in the training set, we collected the source articles of the
discussions they had dug but not participated in. These news
articles are used as the ground truth. About 4 articles
were downloaded for each of the 1,152 users, yielding a
recommendation list of 4,580 news articles. Table II shows the
results obtained by our method “H-RT” and by the baseline
approaches “CON” and “CON+C+G”.
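The ranking rule for news articles can be sketched as follows, assuming the article and the user are both represented as sparse word distributions stored in dictionaries. The function names `cosine` and `rank_articles`, the tuple layout of `articles` and the toy values are hypothetical. The SVM domain classification step is assumed to have been done already.

```python
import math


def cosine(p, q):
    """Cosine similarity between two sparse word-weight dictionaries."""
    dot = sum(weight * q[w] for w, weight in p.items() if w in q)
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


def rank_articles(articles, user_lm, domain_profile):
    """Rank articles by cos(p(w|d_new, c_d), p(w|theta_u)) * p(c_d|u).

    articles:       list of (article_id, domain, word_distribution) tuples,
                    where the domain is assumed to come from a classifier
    user_lm:        {word: p(w | theta_u)}
    domain_profile: {domain: p(c | u)}, the user's cross-domain profile
    """
    scored = [
        (cosine(dist, user_lm) * domain_profile.get(domain, 0.0), article_id)
        for article_id, domain, dist in articles
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [article_id for _, article_id in scored]


# Hypothetical user and articles, for illustration only.
user = {"python": 0.6, "code": 0.4}
interests = {"tech": 0.8, "sports": 0.2}
candidates = [
    ("a1", "sports", {"football": 1.0}),
    ("a2", "tech", {"python": 0.5, "code": 0.5}),
    ("a3", "sports", {"python": 1.0}),
]
print(rank_articles(candidates, user, interests))
```

Multiplying by $p(c_d|u)$ demotes articles that match the user's vocabulary but fall in a domain the user rarely visits.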
TABLE II: News articles recommendation results
Method     P@1     P@5     P@10    P@20    P@30    MRR
CON        0.2483  0.1100  0.0733  0.0500  0.0396  0.3301
CON+C+G    0.2440  0.1083  0.0731  0.0498  0.0403  0.3245
H-RT       0.2683  0.1161  0.0781  0.0514  0.0414  0.3475
D. User retrieval
We use the titles of the discussions in the training set to generate
30 queries for the user retrieval task. The average query length
is 5.67 words, and the average number of relevant users per
query is 14.83. For the setting in which a query $q$ is given together with the
target domain $c_q$ of the results, the domain of the discussion title is taken
as the target domain. The evaluation metrics for this task are
Mean Average Precision (MAP) and Precision@N. The results
of user retrieval with the different user models are shown in Table
III. When the target domain is assigned, the hierarchical model
“H-RT” achieves the highest MAP and the highest precision at P@1,
P@5 and P@10, while at P@20 and P@30 it falls slightly below the
best baseline. This indicates that the hierarchical
user model improves retrieval, particularly at the top of the
ranking, when the target domain is assigned.
TABLE III: User retrieval results
Method     P@1     P@5     P@10    P@20    P@30    MAP
CON        0.3000  0.2733  0.2500  0.1850  0.1544  0.4186
CON+C+G    0.3333  0.2800  0.2567  0.1883  0.1622  0.4334
H-RT       0.3667  0.3067  0.2677  0.1833  0.1611  0.4462
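The MAP column of Table III follows the usual definition of Mean Average Precision, sketched below. The function names and the toy user identifiers are hypothetical, and the sketch assumes binary relevance judgements.

```python
def average_precision(ranked, relevant):
    """Average of precision values at the ranks where relevant items occur."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / position
    # Relevant items that are never retrieved contribute zero precision.
    return total / len(relevant)


def mean_average_precision(runs):
    """Mean of average precision over (ranked_list, relevant_set) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical retrieval results for two queries, for illustration only.
runs = [
    (["u1", "u2", "u3", "u4"], {"u1", "u3"}),
    (["u5", "u6"], {"u6"}),
]
print(mean_average_precision(runs))
```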
The experimental results show that (1) in the task of new
discussion recommendation, the relation-based models do not
perform well. This suggests that the relations among
users in online forums are formed temporarily through discussions
and are not as tight as those in other social media; (2) the hierarchical
user profile proposed in this paper achieves the best overall performance in all
three tasks. This suggests that a user’s interests usually
span more than one domain, so modeling a user
hierarchically is reasonable and proves effective.
IV. CONCLUSION
A novel hierarchical user modeling approach is proposed to
model the interests of users in online forums. The method is
applied to three forum-related tasks: new discussion
recommendation, external news article recommendation
and user retrieval. The experimental results demonstrate
that users’ interests in online forums can be better modeled
by utilizing information such as the user-generated content,
the reply relationships among users and the domains that the
discussions belong to. In particular, the proposed hierarchical
user modeling approach outperforms traditional methods in all
three tasks. The influence of temporal information on users’
profiles will be considered in future work.
ACKNOWLEDGMENT
This work is sponsored by NSFC (under grant 61072084).
REFERENCES
[1] L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank Citation Ranking: Bringing Order to the Web. Technical report, Stanford InfoLab, 1999.
[2] G.-R. Xue, J. Han, Y. Yu, and Q. Yang. User language model for collaborative personalized search. ACM Trans. Inf. Syst., 27:11:1–11:28, March 2009.