web profiling shannon 2006
8/9/2019 Web Profiling Shannon 2006
User profiling on the Web based on deep knowledge and sequential questioning
Silvano Mussi
CILEA (Interuniversity Consortium for Information and Communication Technologies), via R. Sanzio 4, 20090 Segrate-Mi, Italy
E-mail: [email protected]
Abstract: User profiling on the Web is a topic that has attracted a great number of technological approaches and applications. In most user profiling approaches the website learns profiles from data implicitly acquired from
user behaviours, i.e. observing the behaviours of users with a statistically significant number of accesses. This
paper presents an alternative approach. In this approach the website explicitly acquires data from users, user
interests are represented in a Bayesian network, and user profiles are enriched and refined over time. The profile
enrichment is achieved through a sequential asking algorithm based on the value-of-information theory using the
Shannon entropy concept. However, what mostly characterizes the approach is the fact that the user is involved in
a collaborative process of profile building. The approach has been tried out for over a year in a real application.
On the basis of the experimental results the approach turns out to be particularly suitable for applications where
the website is strongly based on deep domain knowledge (as for example is the case for scientific websites) and has
a community of users that share the same domain knowledge of the website and produce a low number of accesses (low compared to the high number of accesses of a typical commercial website). After presenting the
technical aspects of the approach, we discuss the underlying ideas in the light of the experimental results and the
literature on human-computer interaction and user profiling.
Keywords: user profiling, deep knowledge, value of information, Bayesian networks, sequential asking, Shannon entropy
1. Introduction
In order for a website to be able to be collaborative
with its users (e.g. to be adaptive (Billsus
et al., 2002; Brusilovsky & Maybury, 2002), to exhibit a personalized behaviour (Fink et al.,
2002), to produce personalized alerting (Horvitz
et al., 1999) etc.), it is necessary that the website
knows the user interests at a sufficiently fine level
of refinement. In other words, the website has to
acquire the user profile. This paper presents an
approach to the acquisition and construction of
user profiles. A plain approach to this problem
consists in making the website directly acquire
from the user all the information it needs, e.g. by asking the user to fill in a sort of questionnaire
such as: "Are you interested in the topic X? If yes,
to what extent?" And so forth, for each possible
topic X. However, such an approach could result
in bothering the user both because there are too many questions and because some questions
might turn out to be not very easy for the user.
In order to overcome such a difficulty a website
has to be able to infer hypotheses about multiple
user interests on the basis of little and simple
information easily provided by the user, i.e.
information concerning simple facts1 rather than
information entailing a certain mental effort
1 If we consider the medical domain, simple facts provided by the patient are, for example, "I have fever", "I have a pain in my stomach" etc.
from the user (e.g. information resulting from
mental activities like judgment, introspection,
abstraction, reasoning etc.). However, it may
happen that after these initial questions the
information collected is not sufficient to infer
user interests with a low margin of uncertainty.
A website has therefore to consider the possibi-
lity of asking additional questions, questions
whose answers may be, in general, not very easy
to provide. In other words, because these ques-
tions are not as easy as the initial ones, there is
the possibility that a user may feel bothered
when he/she is asked them. The questions should therefore be asked only if it is necessary,
and they cannot be asked all and indistinctly at
the same time, given that each question entails a
trade-off between expected benefit and expected
bother. To solve such a problem, the paradigm
of sequential asking is proposed and applied.
The paper is organized as follows. Section 2
introduces, as a background topic, the role of
deep knowledge in user profiling, while Section 3
introduces, as a foreground topic, the role of
sequential asking in user profiling. Section 4
presents the technical aspects of the proposal
focusing on the paradigm of sequential asking.
Section 5 illustrates a significant experimental application, while Section 6 presents the experimental
results and gives some remarks on them.
In Section 7 some strengths of the proposed
approach are pointed out. Section 8 discusses
the proposal in the context of human-computer
interaction research and related work. Finally,
Section 9 draws some conclusions.
2. Background: deep-knowledge-based user
profiling
Inferring user interests starting from acquired
facts may be carried out by using a deep-
knowledge-based approach. In the deep-knowl-
edge-based approach to user profiling, deep
knowledge is represented by causal knowledge
linking user facts to user interests and organizing
the user interests themselves in causal paths. In
practice, user interests coincide with the topics of the domain knowledge. Let us notice that, in
order to be realistic, interests inference cannot
avoid taking uncertainty into account. As a
consequence, Bayesian networks (Pearl, 1988;
Jensen, 2001) turn out to be an ideal tool to
represent deep knowledge and to reason with it
under uncertainty. A site provided with deep
knowledge does not need a statistically signifi-
cant number of accesses before exhibiting a
personalized behaviour, as is required from
similarity-based approaches. Similarity-based
approaches like content-based or collaborative
filtering approaches (Hirsh et al., 2000) are, in a
sense, based on shallow knowledge: an item is
supposed to be relevant for the user if its content
is similar to the content of the items he/she preferred in the past (content-based approach)
or if other users with a similar taste preferred it
in the past (collaborative filtering approach). So,
a site using these approaches does not know the
deep reasons why a user prefers an item: it lacks
deep knowledge, i.e. it lacks causal knowledge
linking the user preferences (effects) to the user
goals (causes).
Turning to Bayesian networks, let us face the
problem of building the Bayesian network of the
interests of a user working, or, more generally,
operating, in a well-defined domain of real life, characterized by precise domain knowledge. Let
us think, for example, of the medical domain, the
computer science domain etc. Inside a domain of
knowledge (for short, in the following, domain),
multiple topics may be identified. Let us adopt
the basic assumption that domain topics repre-
sent possible user interests (in the respective
topics). In general, working in a certain field
involves pursuing certain goals, and pursuing a
certain goal involves using certain tools. So we have topics concerning fields (field-topics), goals
(goal-topics) and tools (tool-topics). As a con-
sequence, being interested in a certain field-topic
involves being interested in certain goal-topics,
and being interested in a certain goal-topic
involves being interested in certain tool-topics.
We have therefore implicitly built a causal
network of interests. What we need now is to
quantify uncertainty. So let us ask the domain
expert to provide conditional probabilities, i.e. P(interest in topic X | interest in topic Y). We
have therefore obtained the Bayesian network of
interests in the domain topics for an abstract user.
In order to be able to personalize the network to a
specific user we need to add user-facts, i.e. facts
that affect interests or are consequences of
interests. Among user-facts let us distinguish
between facts concerning tools (tool-facts) and
facts concerning fields (field-facts). A tool-fact
concerns the use of a tool. For example, "the user
uses the software S" is a tool-fact. A tool-fact is
caused by the interest in the related tool, e.g. the
fact that the user uses S results from the fact that
the user is interested in S. Let us ask the expert
to provide the probability of using a tool given
the interest in it, e.g. P(the user uses the tool S |
the user is interested in the tool S).2 A tool-fact
plays the role of a symptom of the presence (in
the mind of the user) of interest in the related
tool-topic. A field-fact concerns a user context.
For example, "the user works in the field of
software development" is a field-fact. A field-fact
affects the related field-interest, e.g. working in
the field of software development affects interest
in the field of software development. Let us ask
the expert to provide the probability that a user
is interested in a field given the fact that the user
works in a certain field, e.g. P(the user is interested in the field of software development |
the user works in the field of software develop-
ment). A field-fact plays the role of a situation
conditioning the probability of the presence (in
the mind of the user) of interest in the related
field-topic (it is the case of anamnesis in the medical
field). In the following, for short, interests
in field-topics, goal-topics and tool-topics will be
called field-interests, goal-interests and tool-
interests respectively. Let us now summarize the considerations made so far by representing in
Figure 1 the conceptual structure of the user-
interests network. For simplicity let us consider
two states for each node, i.e. y ("yes" state), n
("no" state). See Mussi (2003) for more details on
a knowledge-based approach to user profiling.
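As a rough illustration of how such a network can be queried, the following Python sketch encodes a single causal chain field_fact → field_interest → goal_interest → tool_interest → tool_fact and computes posteriors by brute-force enumeration. All probability values, and the function names joint and posterior, are assumptions made for this example; they are not taken from the paper, and a real network would be handled by a dedicated inference engine.

```python
from itertools import product

# Hypothetical probabilities; every number is an illustrative assumption.
P_FIELD_FACT = {True: 0.3, False: 0.7}   # P(field_fact = value)
P_FIELD_INT = {True: 0.9, False: 0.1}    # P(field_interest = y | field_fact)
P_GOAL_INT = {True: 0.8, False: 0.05}    # P(goal_interest = y | field_interest)
P_TOOL_INT = {True: 0.7, False: 0.1}     # P(tool_interest = y | goal_interest)
P_TOOL_FACT = {True: 0.6, False: 0.02}   # P(tool_fact = y | tool_interest)

NODES = ("field_fact", "field_interest", "goal_interest",
         "tool_interest", "tool_fact")


def bernoulli(p_yes, value):
    """Probability of a two-state node taking `value`, given P(y) = p_yes."""
    return p_yes if value else 1.0 - p_yes


def joint(ff, fi, gi, ti, tf):
    """Joint probability of one complete assignment along the causal chain."""
    return (P_FIELD_FACT[ff]
            * bernoulli(P_FIELD_INT[ff], fi)
            * bernoulli(P_GOAL_INT[fi], gi)
            * bernoulli(P_TOOL_INT[gi], ti)
            * bernoulli(P_TOOL_FACT[ti], tf))


def posterior(query, evidence):
    """P(query = y | evidence), summing the joint over all consistent states."""
    num = den = 0.0
    for states in product((True, False), repeat=len(NODES)):
        world = dict(zip(NODES, states))
        if any(world[name] != value for name, value in evidence.items()):
            continue
        p = joint(*states)
        den += p
        if world[query]:
            num += p
    return num / den
```

For instance, posterior("goal_interest", {"tool_fact": True}) is larger than posterior("goal_interest", {}): observing the use of a tool raises the inferred probability of interest in the related goal.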
3. Foreground: question-asking-based user
profiling
Among user-facts we can distinguish between
facts provided by the user without any effort at
all, i.e. facts definitely easy to provide (for short,
easy-facts), and facts provided by the user after a
certain mental process, a certain reasoning ef-
fort, i.e. facts less easy to provide (for short, less-
easy-facts). For simplicity let us limit our analysis
to less-easy-facts concerning tools (i.e. tool-facts less easy to provide). For example,
let us consider the question "Do you use X?". The
answer is easy if X is an object such as, for example,
a certain computer or a certain software
product etc., but it might not be so easy if X
denotes an approach to a problem or a frame-
work etc. In fact the user might have developed a
solution that is not just a mere instantiation of a
well-known approach but, for example, takes
some ideas from a certain approach, some other ideas from other approaches, and then introduces some original variants.3
[Figure 1 diagram; node labels: field_fact, field_interest, goal_interest, tool_interest, tool_fact]
Figure 1: Conceptual structure of the user-interests network. The pure interests network (represented by nodes of type field-interest, goal-interest, tool-interest) has been enriched with nodes of type field-fact and tool-fact. The figure shows an abstract network, with only two nodes for each type. Figure 3 will show a real instance.
2 Let us note that in real-world applications the presence of interest in a tool is not sufficient to make the user use it, especially in cases in which the user has at his/her disposal possible alternative tools for pursuing the same goal.
On the basis of
these considerations we conclude that a website
cannot avoid considering the possibility that a
user be bothered by having to answer a question
that entails a certain mental effort (reasoning,
analysis, assessment etc.). So, whereas easy-facts
may be asked without restrictions, less-easy-
facts should be asked only if it is worth doing it,
taking into account the probability of bothering
the user. In the following, a question for acquir-
ing an easy-fact will be called, for short, an easy-
question, whereas a question for acquiring a
less-easy-fact will be called a less-easy-question.
The probability of bothering the user because of
a less-easy-question is affected by the topic of the
question itself. If the question concerns a topic
the user is not interested in, there is a higher
probability that the user gets bothered by the
question. The fact that the user is not interested
in the question topic might be due to the fact that
the field he/she works in does not concern that topic. In such a case he/she might perceive a sense of logical discontinuity with his/her previously entered easy-facts and might not know
that topic very well or even at all. So in conclusion
he/she is more likely to get bothered if the
question concerns a topic he/she is not interested in. These considerations prompt us to
consider a bother-effect network beside the user-
interests network. The bother-effect network
simply consists of three nodes (each one with
two states y, n): bother, bother_af, interested.
Let us examine their meanings and how they
are connected. The bother node (which stands
for "the user is bothered") and the bother_af
node (which stands for "after the question the
user is bothered") represent the current status (bothered or non-bothered) of the current user
respectively before and after a question is asked.
They are therefore linked by the cause-effect
relation: bother → bother_af. The interested node (which stands for "the user is interested
in the topic of the question") plays the role of a
conditioner node for the conditional probability
table of the bother_af node. In other words, it
modulates the probability that the user is in the
state bothered after the question. So, in formal
terms, we have that P(bother_af = y | bother = n, interested = y) is lower than P(bother_af = y | bother = n, interested = n).4 Let us now summarize the considerations made so far by representing
in Figure 2 the structure of the bother-effect
network.
4. Enriching and refining user profiles
The website starts the profile building process by
asking the user to enter a set of easy-facts and
then by propagating them through the user-
interests network, inferring in this way the user
profile, or in other words inferring, for each
topic, the probability that the interest in the
topic is present in the user's mind. At this stage
the question arises: is the inferred profile sufficiently
defined? That is, have the user-interests been inferred with a sufficiently low uncertainty
margin? In practice, is it necessary or not to ask
one or more less-easy-questions in order to
capture further information from the user and
infer a more accurate profile? The answer is that
less-easy-questions should be asked only if it is
worth doing it. Let us define the concept of "it
is worth doing it". In a broad sense, the purpose
of asking questions is to know more about
user-interests, or, more precisely, to decrease
uncertainty about user-interests.
[Figure 2 diagram; nodes: interested, bother, bother_af; bother → bother_af, with interested conditioning bother_af]
Figure 2: Structure of the bother-effect network.
3 The distinction between easy-facts and less-easy-facts is typical in the medical domain. In fact, information provided by the patient at the beginning of a medical examination is an easy-fact, whereas information coming from a clinical test is a less-easy-fact. For example, the answer to the question "Do you have fever?" is an easy-fact, but the answer to the question "Is the bilirubin level in your blood normal?" is a less-easy-fact (you have to undergo a blood test, you have to pay for it, it takes some time etc.).
4 Obviously, P(bother_af = y | bother = y) = 1.
In order to
simplify the problem we wonder if among all the
user-interests there is a subset of interests for
which the margin of uncertainty is particularly
important to be low. To this end let us notice
that, since goals are at the basis of user actions,
goal-interest nodes (i.e. goal-topics) are strate-
gic, and as a consequence we aim at having
low uncertainty about these nodes. So a less-
easy-question is worth asking if, in spite of the
unavoidable risk of bothering the user, it is
expected to produce a significant decrease of
uncertainty about the goal-interest nodes. Un-
certainty about the states of a node is well
represented by the concept of entropy (Shannon
& Weaver, 1949), which is the hub concept of
information theory. So a less-easy-question is
worth asking if the expected decrease in entropy
of the goal-interest nodes is significant enough to
compensate the unavoidable risk of bothering
the user. The next section will formally define the
problem in the decision theory framework (von
Winterfeldt & Edwards, 1986) based on entropy.
4.1. Benefit expected from a less-easy-question
Let G be a node and let g denote a state of G, for short g ∈ G. The entropy of G is given by
$$\mathrm{ENT}(G) = -\sum_{g \in G} P(g)\,\log_2 P(g) \tag{1}$$
Let us use the value function −ENT, which increases with preference. Let Q be the set of less-easy-questions and let q be a less-easy-question, i.e. q ∈ Q. Let J_q be the set of possible answers to q, and let j_q be an answer to q, i.e. j_q ∈ J_q. The value, with respect to the node G, of the information j_q is given by V_G(j_q):
$$V_G(j_q) = -\mathrm{ENT}(G \mid j_q) \tag{2}$$
The expected value, with respect to the node G, of the question q is given by EV_G(q):
$$EV_G(q) = \sum_{j_q \in J_q} V_G(j_q)\,P(j_q) \tag{3}$$
So the expected benefit of q, with respect to the node G, i.e. the expected decrease in the entropy of G, is given by
$$EB_G(q) = EV_G(q) + \mathrm{ENT}(G) \tag{4}$$
Let us consider a set of nodes G. If we assume that each node has the same weight of importance, the overall expected benefit from a question q is given by
$$EB(q) = \sum_{G} EB_G(q) \tag{5}$$
Let us now turn back to the user-interests
network. Let G denote a goal-interest node. For
each question q ∈ Q let the set J_q of the possible answers be {yes, no}. If the user answers "yes"
("no") to q, we set the related tool-fact node to y
(n). For example, let us consider the less-easy-
question "Do you use X?" and the related tool-
fact node use_X (standing for "the user uses
X"). If the user answers "yes", use_X is set to the
state y; if he/she answers "no", use_X is set to the state n. So (2) becomes
$$V_G(\text{Do you use X?} = \text{yes}) = -\mathrm{ENT}(G \mid \text{use\_X} = y) \tag{2'}$$
$$V_G(\text{Do you use X?} = \text{no}) = -\mathrm{ENT}(G \mid \text{use\_X} = n) \tag{2''}$$
Moreover, the probability of using X given the
interest in X represents the probability of
obtaining from the user the answer "yes" to the
question "Do you use X?" given the interest in X,
i.e. we do not consider lying. So (3) becomes
$$EV_G(q) = -\mathrm{ENT}(G \mid \text{use\_X} = y)\,P(\text{use\_X} = y) - \mathrm{ENT}(G \mid \text{use\_X} = n)\,P(\text{use\_X} = n) \tag{3'}$$
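The quantities of this section can be sketched numerically for a two-state goal-interest node G and a yes/no question. The Python fragment below is only an illustration: the function names and the numbers in the usage note are assumptions, and the posteriors P(G = y | use_X = y) and P(G = y | use_X = n) are supposed to come from propagation in the user-interests network.

```python
import math


def entropy(p_yes):
    """Shannon entropy in bits of a two-state node with P(y) = p_yes."""
    return -sum(p * math.log2(p) for p in (p_yes, 1.0 - p_yes) if p > 0.0)


def expected_benefit(p_g, p_g_given_yes, p_g_given_no, p_yes):
    """Expected entropy decrease of G from asking a yes/no question.

    p_g            -- current P(G = y)
    p_g_given_yes  -- P(G = y | use_X = y)
    p_g_given_no   -- P(G = y | use_X = n)
    p_yes          -- P(use_X = y), i.e. the probability of a "yes" answer
    """
    # Expected value of the answer, using -ENT as the value function.
    ev = (-entropy(p_g_given_yes) * p_yes
          - entropy(p_g_given_no) * (1.0 - p_yes))
    # Benefit = expected value after the answer minus the current value -ENT(G).
    return ev + entropy(p_g)


def overall_expected_benefit(per_node_args):
    """Sum of the per-node benefits, all goal-interest nodes weighted equally."""
    return sum(expected_benefit(*args) for args in per_node_args)
```

With P(G = y) = 0.5, P(G = y | yes) = 0.9, P(G = y | no) = 0.1 and P(yes) = 0.5, the entropy drops from 1 bit to about 0.469 bits in expectation, so expected_benefit(0.5, 0.9, 0.1, 0.5) is about 0.531.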
4.2. Bother-effect expected from a less-easy-
question
From the user-interests network we can obtain,
through (5), the expected benefit of the question
q: "Do you use X?". However, in order to assess if
q is worth asking or not, we have to take into account the probability that, after q is asked,
the mental state of the user is bothered (because
of q). In order to properly calculate this
probability we have to take into account two
influence factors: the first concerns a sort of
cumulative effect in cases of multiple questions,
and the second concerns the influence of the
question topic. Let us examine the first factor.
The probability that the user is bothered
because of q is higher after the second question
than after the first, and so forth. Let us model
this cumulative effect in the following way. Let
us initially start with P(bother = y) = 0; then each time a question is asked let us replace the
probability distribution of the bother node with
that of the bother_af node. Let us now pass to
examine the second factor.
The probability that the user is interested in
the question topic is given by the probability
distribution of the related tool-interest node
(Figure 1). So, given the question q, "Do you use
X?", let us replace the prior probability distribution
of the node interested (Figure 2) with the
probability distribution of the tool-interest node
"the user is interested in X" (Figure 1). After
having assigned the proper probability distribu-
tions to the nodes bother and interested, let us
propagate. The expected bother-effect of q is therefore given by the value of P(bother_af = y) resulting from the propagation. Finally, let us
note that P(bother_af = y) > P(bother = y).
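The propagation in the three-node bother-effect network reduces to a short closed-form computation, sketched below in Python. The two conditional probabilities are hypothetical values chosen only to respect the ordering required in Section 3 (lower when the user is interested in the topic); the function names are likewise assumptions.

```python
# Hypothetical entries of the bother_af conditional probability table.
P_BOTHER_AF_IF_INTERESTED = 0.10      # P(bother_af=y | bother=n, interested=y)
P_BOTHER_AF_IF_NOT_INTERESTED = 0.40  # P(bother_af=y | bother=n, interested=n)


def expected_bother(p_bother, p_interested):
    """P(bother_af = y) after asking one question.

    p_bother      -- current P(bother = y)
    p_interested  -- P(interested = y), copied from the tool-interest node
    Assumes P(bother_af = y | bother = y) = 1: an already bothered user
    stays bothered.
    """
    p_if_not_bothered = (
        p_interested * P_BOTHER_AF_IF_INTERESTED
        + (1.0 - p_interested) * P_BOTHER_AF_IF_NOT_INTERESTED
    )
    return p_bother * 1.0 + (1.0 - p_bother) * p_if_not_bothered


def ask_sequence(interest_probs):
    """Cumulative effect: after each question, bother takes the value of bother_af."""
    p_bother = 0.0  # initially P(bother = y) = 0
    history = []
    for p_interested in interest_probs:
        p_bother = expected_bother(p_bother, p_interested)
        history.append(p_bother)
    return history
```

Under these assumed values, two consecutive questions on topics the user is certainly interested in give P(bother_af = y) of 0.10 and then 0.19, which shows the cumulative effect described above.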
4.3. Profit expected from a less-easy-question
In order to obtain a single number quantifying
the opportunity of asking a less-easy-question
q, we have to combine, in terms of utility,
the expected benefit (entropy decrease) with the expected bother-effect of q. Let us consider
the utility function of the entropy of a node G. If
the entropy of G is minimum, i.e. 0, the utility is
maximum, i.e. 1. If the entropy of G is maxi-
mum, i.e. 1, the utility is minimum, i.e. 0. So,
considering, for simplicity, a linear utility func-
tion, we have that the expected benefit in terms
of utility is equal to the expected benefit in terms
of pure entropy. Let us now pass to consider
the utility function of the bother-effect. If the bother-effect is absent, i.e. the state of the node
bother_af is n, the utility is maximum, i.e. 1. If
the bother-effect is present, i.e. the state of the
node bother_af is y, the utility is minimum, i.e.
0. So, the expected bother-effect in terms of
utility is given by
$$P(\text{bother\_af} = y \mid q) \cdot 0 + P(\text{bother\_af} = n \mid q) \cdot 1 = 1 - P(\text{bother\_af} = y \mid q).$$
Let us call the expected cost of q the quantity
$$EC(q) = P(\text{bother\_af} = y \mid q) \tag{6}$$
Finally, let us notice that the entropy utility
and the bother-effect utility have, in general,
different importance. Let us therefore consider
the importance weight distributions kENT and
kdist, where kENT represents the importance
weight assigned to the entropy and kdist repre-
sents the importance weight assigned to the
bother-effect.5 On the basis of all these con-
siderations, the expected net advantage, which
will be called expected profit (EP), from a
question q is given by
$$EP(q) = EB(q)\,k_{\mathrm{ENT}} - EC(q)\,k_{\mathrm{dist}} \tag{7}$$
where kENT + kdist = 1. In conclusion, a question q is worth asking if EP(q) > 0. Moreover, the question with the maximum EP is the preferred one.
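A minimal numerical sketch of the expected profit follows, reading it as the weighted expected benefit minus the weighted expected cost, with the two weights summing to 1. The default weight of 0.5 and the function names are assumptions made for the example; the paper elicits the weights from a decision maker.

```python
def expected_profit(eb, ec, k_ent=0.5):
    """EP(q) = EB(q) * k_ENT - EC(q) * k_dist, with k_ENT + k_dist = 1.

    eb     -- expected benefit of q (expected entropy decrease)
    ec     -- expected cost of q, i.e. P(bother_af = y | q)
    k_ent  -- importance weight of the entropy term (hypothetical default)
    """
    if not 0.0 <= k_ent <= 1.0:
        raise ValueError("k_ent must lie in [0, 1]")
    k_dist = 1.0 - k_ent
    return eb * k_ent - ec * k_dist


def worth_asking(eb, ec, k_ent=0.5):
    """A question is worth asking only if its expected profit is positive."""
    return expected_profit(eb, ec, k_ent) > 0.0
```

For example, with EB(q) = 0.53, EC(q) = 0.10 and equal weights, the expected profit is 0.265 − 0.05 = 0.215, so the question is worth asking; with EC(q) = 0.60 it is not.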
4.4. Sequential asking algorithm
In general, there is a certain number (> 1) of less-easy-questions. So a website during the process
of solving the target problem (i.e. inferring
the user profile) has to solve the meta-problem of
choosing, under trade-off between benefit and
cost, the most opportune less-easy-question to
ask next in order to acquire a new piece of infor-
mation which will contribute to solving the
target problem, taking into account also the fact
that the answer provided by the user has the side-
effect of changing the benefits expected from
asking other questions. Let us establish that ques-
tions are asked one at a time. This strategy is
called myopic. It addresses the following ques-
tion: if you are allowed to ask at most one
5 The importance weights may be elicited with the probability indifference method. Considering for example kENT we have that the gambling situation is G: [p * {entropy min, bother absent}; (1 − p) * {entropy max, bother present}], whereas the certainty situation is I: [{entropy null, bother present} for certain].
question, which question would you choose? As
is well known the analysis of all the possible
sequences of questions is, in practice, intractable
because the number of sequences grows expo-
nentially with the number of tests (Heckerman
et al., 1992). So, many real-world applications
use the so-called myopic value-of-information
approach. The myopic approach is in practice a
good heuristic (Gorry & Barnett, 1985).
We are now ready, on the basis of the consi-
derations made so far, to define at a conceptual
level the basic algorithm of sequential asking.
Let x denote the current available knowledge,
i.e. x = all the information entered so far.
0. Ask all the easy-questions, acquire the answers and set the related user-fact nodes to the appropriate states.
1. Propagate the entered information through the user-interests network.
2. If all the less-easy-questions have been asked, then EXIT.
3. For each less-easy-question q not yet asked calculate EP(q | x).
4. If each EP(q | x) ≤ 0, then EXIT.
5. Ask the user the q with the maximum EP, acquire the answer and set the related tool-fact node to the appropriate state.
6. Go back to point 1.
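The loop formed by steps 1 to 6 can be sketched as follows in Python. This is a conceptual outline under stated assumptions, not the implementation used in the application: the three callables (expected_profit, ask and propagate) are hypothetical hooks standing for the network computations and the dialogue with the user, and steps 0 and 1 for the easy-questions are assumed to have been carried out by the caller.

```python
def sequential_asking(questions, expected_profit, ask, propagate):
    """Myopic selection of less-easy-questions, one at a time.

    questions        -- iterable of question identifiers
    expected_profit  -- callable q -> EP(q | x) under the current evidence x
    ask              -- callable q -> answer given by the user
    propagate        -- callable (q, answer) -> None; enters the answer as
                        evidence and updates the network
    Returns the list of (question, answer) pairs actually asked.
    """
    pending = set(questions)
    asked = []
    while pending:                                          # step 2
        profits = {q: expected_profit(q) for q in pending}  # step 3
        best = max(profits, key=profits.get)
        if profits[best] <= 0.0:                            # step 4
            break
        answer = ask(best)                                  # step 5
        propagate(best, answer)                             # back to step 1
        pending.remove(best)
        asked.append((best, answer))
    return asked
```

Because the profits are recomputed after every answer, the sketch reflects the fact that each answer changes both the benefit expected from the remaining questions and the probability of bothering the user.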
4.5. Asking over time
After a certain number of questions the prob-
ability that the user is bothered (because of the
questions), i.e. P(bother_af = y | q), increases so much that for each q it happens that EP(q |
x) ≤ 0, and as a consequence the sequential
asking stops. However, as time passes, it seems reasonable to suppose that the probability that
the user is still in the state bothered (because of
the questions) decreases. As a consequence, it
might happen that an EP that was negative in the
past is now positive. In other words, after a cer-
tain time interval there might be some questions
that are worth asking, even if they were not in
the past. Such a situation is modelled by dynamically
calculating the values of the prior probability
distribution of the node bother on the basis of an appropriate decreasing temporal function.
The practical consequence of all this is that the
user profiling process is not accomplished in a
single and initial phase but in a sequence of
sessions over time. In each session the sequential
asking algorithm is activated and, as a conse-
quence, the user profile is enriched and refined.
In practice, when, after a certain time from the
last user profiling session, the user accesses the
website, the website evaluates if there are some
less-easy-questions that are worth asking. If yes,
the website personalizes the user home page with
a message inviting the user to resume the
dialogue in order to better refine his/her profile.
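The paper does not specify the decreasing temporal function. Purely as an illustration, the Python sketch below assumes an exponential decay with a hypothetical half-life of 30 days; both the functional form and the parameter are assumptions of this example.

```python
def decayed_bother(p_bother_last, days_elapsed, half_life_days=30.0):
    """Prior P(bother = y) at the start of a new session.

    p_bother_last   -- P(bother = y) at the end of the previous session
    days_elapsed    -- days since that session
    half_life_days  -- hypothetical half-life of the bother probability
    Exponential decay is an assumption of this sketch; any appropriate
    decreasing function of time could be used instead.
    """
    if days_elapsed < 0:
        raise ValueError("days_elapsed must be non-negative")
    return p_bother_last * 0.5 ** (days_elapsed / half_life_days)
```

With these assumed values, a probability of 0.8 at the end of a session falls to 0.4 after 30 days, so questions whose expected profit was negative may become worth asking again.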
5. An application of the proposed approach
within a supercomputing portal
In this section we will describe a significant
application of the approach so far presented.
The proposal has been applied within the
CILEA6 supercomputing portal:7 a portal de-
voted to disseminating supercomputing culture
and events in the community of researchers
working in the supercomputing field. More pre-
cisely, the proposal has been applied in the con-
text of a news recommender system embedded in
the portal. The proposal has been implemented in a prototype (developed by the author)8 tested
on a PC off-line; then the prototype was inte-
grated into the portal. Supercomputing news is
recommended in a personalized manner both
through a personalized presentation on the home
page and by activating a personalized alerting
service. The application regards a particular sub-
field of supercomputing, that of computational
fluid dynamics (CFD). CFD encompasses many
heterogeneous application fields (aerospace, biomedical etc.), various phenomenological areas
(turbulent flows, porous media flows etc.),
various computational techniques (large eddy
simulation, coupled transport equations etc.),
6 CILEA is an Italian interuniversity consortium for information and communication technologies that provides supercomputing services to its users in order to promote supercomputing culture and services.
7 www.supercomputing.it
8 Tools used to implement the proposal: ASP pages, C language, MySQL database, Hugin inference engine.
specialized software packages (Fluent, Fidap
etc.) and so forth. A CFD user may be an aerospace
engineer working in the aerospace industry;
another may be a biophysicist working in
the biomedical field etc. A biophysicist probably
pursues the goal of developing computational
models of porous media flows, uses the coupled
transport equations technique and runs his/her models with the package Fidap. Conversely, an
aerospace engineer probably focuses on modelling
turbulent flows, and it is probable that
he/she is not interested in the tools used by a biophysicist. So the need for building accurate
user profiles for recommending the right news to
the right users arises. Taking into account the
general network structures of Figures 1 and 2,
the CFD user-interests network and the bother-
effect network illustrated in Figure 3 have been
produced. Notice that the user-interests network
encompasses six nodes (on the top) representing
field-facts, 18 nodes (on the bottom) represent-
ing tool-facts and 30 nodes (in the middle)
representing interests, i.e. topics.
Tool-facts have been distinguished as easy-
facts and less-easy-facts. The facts concerning
the use of software packages (i.e. the nodes
whose names are prefixed by use_s_) have been considered as easy-facts, whereas the facts
concerning the use of computational techniques
(i.e. the nodes whose names are prefixed by
use_t_) have been considered as less-easy-facts.
Referring to the algorithm defined in Section 4.4,
let us start (step 0) by asking the user to enter the
set of easy-facts (Figure 4). The entered facts are
then propagated through the network, inferring
in this way the CFD profile of the current user
(step 1). So now the portal knows, for each of the 30 topics, the probability that the current user is
interested in it. The loop of the remaining steps is
then performed, possibly asking less-easy-ques-
tions depending on the specific current case. For
example, if the user declares that he/she works in the aerospace field and does not use any of
the software packages listed in Figure 4, the
portal finds that it is worth trying to decrease
uncertainty about the goal-interests and begins
to ask less-easy-questions (see Figure 5 for an example).
In this application, user profiles are used by
the portal to recommend CFD news. Such news
is classified by a CFD expert. More precisely,
given a piece of news N, a CFD expert, for each
topic T, defines (through subjective estimation)
the probability that N concerns T. The classified
piece of news is then entered in the website
database. The expected utility EU of each piece of news
N for a user U is calculated according to the
following algorithm:
EU(N_U) = 0
For each T do
    EU(N_U) = EU(N_U) + EU(N_U^T)
End for
where
$$EU(N_U^T) = U(N_U^T \mid N\ \text{concerns}\ T = \text{yes},\ \text{interest in}\ T = \text{yes}) \cdot P(N\ \text{concerns}\ T = \text{yes}) \cdot P(\text{interest in}\ T = \text{yes})$$
where U(N_U^T | N concerns T = yes, interest in T = yes) stands for "utility of N for the user U, given that N concerns T and the user U is interested
in T". For simplicity U(N_U^T | . . .) has been
assumed to be equal to 1 for any piece of news
and any user. The website has therefore at its dis-
posal a personalized expected utility of each piece
of news, calculated on the basis of the inferred
user profile. So, when a user accesses the portal,
the portal uses such EU values to recommend
news to the user. Recommendation is carried out
by presenting news with different emphasis and
in decreasing order of relevance (Figure 6).
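The expected-utility computation and the resulting ordering of the news can be sketched in Python as follows. The dictionary-based data layout, the topic names in the usage note and the choice of treating a topic missing from the profile as having zero interest probability are assumptions of this example; the utility term is fixed to 1, as in the text.

```python
def expected_utility(p_news_concerns, p_interest, utility=1.0):
    """Sum over topics T of U * P(N concerns T = yes) * P(interest in T = yes).

    p_news_concerns -- dict topic -> P(N concerns T), set by the expert
    p_interest      -- dict topic -> P(interest in T), i.e. the user profile
    utility         -- U(...), assumed equal to 1 for every news item and user
    """
    return sum(
        utility * p_concerns * p_interest.get(topic, 0.0)
        for topic, p_concerns in p_news_concerns.items()
    )


def rank_news(news, p_interest):
    """Return the news identifiers in decreasing order of expected utility."""
    scores = {
        news_id: expected_utility(topics, p_interest)
        for news_id, topics in news.items()
    }
    return sorted(scores, key=scores.get, reverse=True)
```

For a profile {"turbulent_flows": 0.9, "porous_media": 0.1}, a piece of news classified as {"turbulent_flows": 0.8} has expected utility 0.72 and is presented before one classified as {"porous_media": 0.9, "turbulent_flows": 0.1}, whose expected utility is 0.18.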
As stated in Section 4.5, user profiles are
enriched and refined over time. In practice, when
a user accesses the portal, the portal evaluates if
there are some less-easy-questions that are worth
asking9 and, if that is the case, places at the
beginning of the news list a message inviting the
user to resume the dialogue (Figure 7).
News recommendation is also accomplished
via alerting e-mail. Users can take advantage of
9 The algorithm in Section 4.4 is performed by starting from step 2.
Figure 3: The figure shows the bother-effect network (top left) and the CFD user-interests network of the application of the proposal to the CILEA supercomputing portal. The interests network instantiates the abstract network of Figure 1. More precisely, starting from the top, the nodes of the top layer (whose names begin with Work_in) are of type field-fact. Coming down, the nodes of the following three layers represent interests and are of type field-interest, goal-interest, tool-interest respectively. The nodes of the bottom layer are of type tool-fact. In particular the eight nodes on the left concern the use of computational techniques (less-easy-facts), whereas the remaining ten on the right concern the use of software packages (easy-facts).
the intelligent alerting service: for each user, the
portal considers the set of news not yet seen bythe user, and for each piece of news calculates
the expected utility the piece of news has for the
user; if the expected utility is greater than a given
threshold, the portal sends the relevant piece of
news to the user via e-mail.
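The alerting rule (unseen news whose expected utility exceeds a threshold is sent by e-mail) can be sketched as a simple filter. This is only an illustration: the function name, the data layout and the threshold value 0.5 are assumptions, since the threshold actually used by the portal is not stated.

```python
from typing import Dict, List, Set


def news_to_alert(eu_by_news: Dict[str, float],
                  seen: Set[str],
                  threshold: float) -> List[str]:
    """Select the unseen news whose expected utility is greater than threshold."""
    return [title for title, eu in eu_by_news.items()
            if title not in seen and eu > threshold]


# Hypothetical expected utilities for one user (illustrative values only).
eu_by_news = {
    "Int. conf. on Aerospace supercomputing": 0.9,
    "Summer school on Fluent": 0.55,
    "Int. Conf. on Biomedical supercomputing": 0.1,
}
already_seen = {"Summer school on Fluent"}
alerts = news_to_alert(eu_by_news, already_seen, threshold=0.5)
print(alerts)
```

The comparison is strict, following the wording "greater than a given threshold".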
6. Experimental results
The implementation of the proposal presented in Section 5 has been working for more than a year inside the supercomputing portal. In order to test the proposal better, several heterogeneous institutions (research centres, university departments, organizations operating in the environmental physics field, industries, CFD software providers etc.) working in the CFD field have been involved. During that period several researchers working with those institutions have used the implementation. They have been interviewed, and their comments have been collected both about the proposed approach to user profiling and about the accuracy of various types of profiles.
6.1. User feedback about the approach
Most users have expressed a favourable impression of the approach. In particular, they have appreciated both the fact that their inferred profiles were ready soon after their initial login and the fact of being directly involved (through specific knowledge-based questions) in the profile refining process over time. They have declared that they are not bothered by being asked questions over time by the website; on the contrary, they have declared that they are pleased to cooperate with the website. They have perceived the question-asking attitude of the website as a sort of constant and competent attention from the website to their needs (the website behaves like an expert that looks after their specialist interests). In fact questions are asked in an unobtrusive manner, both because they are asked over time according to the sequential asking algorithm and because a user is not forced to answer them. A user is first asked if he/she agrees with the website resuming the dialogue. Moreover, for each question the user is offered the possibility of temporarily suspending the answer if he/she does not feel like answering at that moment, or telling the website not to ask that question any longer (see the choice options appearing in Figure 5).
We note that the type of user population that has been considered in this experimental year has the following characteristics. Users are researchers in the CFD field. In general they have used the alerting service to be automatically notified of possibly relevant newly created news, and have not accessed the portal very often. Both users and the website share a very specific field of knowledge (CFD knowledge), and it is precisely this knowledge on which the dialogue between users and the website is based.
Do you use (in your work activity) the computational technique:
Volume of Fluid Methods?
yes
no
I don't feel like answering this question at the moment
I would prefer not to answer this question, even in the future
SEND
Figure 5: An example of a less-easy-question.
If you wish you may enrich your profile record by selecting in the area of Computational Fluid Dynamics the application field/s
you work in
Aerospace (Environmental Control System, Propulsion, Pumps, Rotor-Airframe Interaction, External Aerodynamics ...)
Biomedical (Blood Handling Equipment, Physiological Flows, Toxicology Research ...)
Chemical Process (Combustion, Drying, Emission Control, Filtration, Reaction, Water Treatment ...)
Environment (Atmospheric Plume Dispersion, Coast Erosion, Hydro-Geological Applications, Irrigation Components,
Meteorological Applications, Petroleum Platforms, Sea Technologies, Service Reservoirs, Wastewater Pump-stations, Weirs, ...)
Automotive (Engine Cooling, External Aerodynamics, Intake Valves, Underhood Flow Simulation, Vehicle Interior, Vapor
Dispersion, Windshield Washer Nozzles ...)
Energy (Boilers, Burners, Coal Transport and Classification, Combustor, Hydro-power, Incinerators, Nuclear Reactors,
Turbomachinery ...)
If you wish you may enrich your profile record by selecting the software package/s you use for running your models in the area of
Computational Fluid Dynamics
KIVA
FLUENT
FIDAP
ADINA
CFX
CFX_TASKFLOW
CFX_TURBOGRID
GAMBIT
TGRID
STAR_CD
Figure 4: The form used for entering the initial set of easy-facts, i.e. the subset of six field-facts (top)
followed by the subset of ten tool-facts (bottom).
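The four answer options of a less-easy-question (Figure 5) suggest a small amount of per-user dialogue state. The following sketch shows one possible way of recording it; the class and member names are hypothetical, and the portal's actual data model is not described in the paper.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Set


class Answer(Enum):
    YES = "yes"
    NO = "no"
    NOT_NOW = "I don't feel like answering this question at the moment"
    NEVER = "I would prefer not to answer this question, even in the future"


@dataclass
class DialogueState:
    """Hypothetical per-user record of answered and suppressed questions."""
    facts: Dict[str, bool] = field(default_factory=dict)
    suppressed: Set[str] = field(default_factory=set)

    def record(self, question: str, answer: Answer) -> None:
        if answer in (Answer.YES, Answer.NO):
            # A definite answer becomes a user-fact (evidence for the network).
            self.facts[question] = answer is Answer.YES
        elif answer is Answer.NEVER:
            # The user asked not to see this question any longer.
            self.suppressed.add(question)
        # NOT_NOW stores nothing: the question stays eligible for a later session.

    def may_ask(self, question: str) -> bool:
        return question not in self.facts and question not in self.suppressed


state = DialogueState()
state.record("Volume of Fluid Methods", Answer.NOT_NOW)
state.record("Finite Volume Methods", Answer.NEVER)   # hypothetical question
state.record("Large Eddy Simulation", Answer.YES)     # hypothetical question
```

Keeping a deferred question eligible, while permanently excluding a declined one, mirrors the two non-committal options offered to the user.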
GOOD NEWS FOR YOU!!!
Int. conf. on Aerospace supercomputing
FURTHER NEWS OF PROBABLE INTEREST FOR YOU
Tutorial course on Turbulent Flows modelling
OTHER NEWS YOU MIGHT BE INTERESTED IN
Summer school on Fluent
OTHER NEWS
Advanced course on porous media flows modelling
Int. Conf. on Biomedical supercomputing
Figure 6: An example of personalized news recommendation, given the fact that the user works in
the aerospace field.
Figure 7: An example of a message inviting the user to resume the dialogue.
6.2. User feedback about profile accuracy
Profile accuracy has been checked in an empirical manner. Initially the probability tables of the user-interests network were elicited, with the aid of suitable forms, from the CILEA CFD expert. Then they were tuned with the cooperation of a set of end-users who took part in both 'probability soundness tests' and 'recommendation soundness tests'. In order to explain clearly what these soundness tests consist of, let us consider three sample sets whose definitions are illustrated in the following section.
6.2.1. Sample set definitions  Some end-users belonging to heterogeneous CFD sub-fields have been involved in providing feedback about profile accuracy. Let SEU be the sample set of such end-users.
Some typical hypothetical cases of user-facts have been defined. For example, let us consider a hypothetical case defined by the user-facts 'working in the aerospace field' and 'using the software tool Fluent' (e.g. the case of an aerospace engineer), another defined by the user-facts 'working in the biomedical field' and 'using the software tool Fidap' (e.g. the case of a bioengineer) and so on. Let SHC be the sample set of such hypothetical cases.
Some suitable hypothetical pieces of news have been created. More precisely, for each CFD topic T a hypothetical piece of news N_T, concerning T with probability 1 and the other topics with probability 0, has been created, so that the expected utility of N_T is equal to the probability of 'interest in T = yes'. For example, for the topic 'turbulence flow modelling' news like 'International Conference on Turbulence Flow Modelling' has been created, for the topic 'porous media flow modelling' news like 'Summer School on Porous Media Flow Modelling' has been created, and so forth. Let SHN be the sample set of such hypothetical news.
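The statement that the expected utility of the hypothetical news for topic T equals the probability of interest in T follows directly from the expected-utility formula of Section 5, with the utility U fixed to 1. As a worked step, writing N_T for that piece of news and T' for a generic topic:

```latex
\begin{align*}
EU(N_T) &= \sum_{T'} 1 \times P(N_T \text{ concerns } T' = \text{yes})
           \times P(\text{interest in } T' = \text{yes}) \\
        &= 1 \times P(\text{interest in } T = \text{yes})
           + \sum_{T' \neq T} 0 \times P(\text{interest in } T' = \text{yes}) \\
        &= P(\text{interest in } T = \text{yes})
\end{align*}
```

The emphasis given to each hypothetical piece of news therefore reflects a single inferred interest probability, which is what makes the recommendation soundness test easy to judge.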
6.2.2. Soundness tests  Soundness tests have been performed according to the following algorithm.
For each hypothetical case in SHC:
• enter the related user-facts and propagate them through the user-interests network, in this way producing the related inferred profile;
• activate the recommender system with the SHN set, in this way producing the list of hypothetical news recommended according to the inferred profile;
• ask each end-user in the SEU to examine:
    (i) the inferred profile, to check if the inferred user-interest probabilities could be considered acceptable in the light of the domain common sense, given the entered facts (probability soundness test);
    (ii) the emphasis given to each hypothetical piece of news by the recommender system, to check if the emphasis degree could be considered compatible with the inferred profile in the light of the domain common sense (recommendation soundness test).
The feedback collected from the end-users of the SEU was then used to tune the probability tables of the user-interests network.
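The soundness-test loop can be sketched as follows. The sketch only prepares, for each hypothetical case, the material that the end-users are asked to judge; the judgement itself is human. Bayesian-network propagation is replaced by a hypothetical stand-in function `toy_infer`, and the fact names, topic names and probabilities are illustrative assumptions.

```python
from typing import Callable, Dict, List, Tuple

Profile = Dict[str, float]                 # topic -> P(interest in T = yes)
Ranked = List[Tuple[str, float]]           # (hypothetical news, expected utility)


def soundness_material(cases: Dict[str, Dict[str, bool]],
                       topics: List[str],
                       infer: Callable[[Dict[str, bool]], Profile]
                       ) -> Dict[str, Tuple[Profile, Ranked]]:
    """For each hypothetical case build the inferred profile and the news ranking."""
    material = {}
    for name, facts in cases.items():
        profile = infer(facts)             # stands in for network propagation
        # Each hypothetical news concerns exactly one topic T with probability 1,
        # so its expected utility equals P(interest in T = yes).
        ranked = sorted(((f"News on {t}", profile.get(t, 0.0)) for t in topics),
                        key=lambda pair: pair[1], reverse=True)
        material[name] = (profile, ranked)
    return material


def toy_infer(facts: Dict[str, bool]) -> Profile:
    """Hypothetical stand-in for inference; NOT the paper's network."""
    return {
        "aerospace": 0.9 if facts.get("Work_in_aerospace") else 0.2,
        "biomedical": 0.9 if facts.get("Work_in_biomedical") else 0.1,
    }


cases = {
    "aerospace engineer": {"Work_in_aerospace": True, "Use_Fluent": True},
    "bioengineer": {"Work_in_biomedical": True, "Use_Fidap": True},
}
material = soundness_material(cases, ["aerospace", "biomedical"], toy_infer)
```

Each entry of `material` corresponds to what an end-user in SEU would examine: the inferred profile (probability soundness test) and the ordered hypothetical news (recommendation soundness test).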
7. Strengths of the proposed approach and
suitable application domains
A basic human-computer interaction element underlying the proposed approach is that the main aim of the website is to cooperate with a user (without bothering him/her) to build an accurate profile of him/her. The website aims at building an accurate user profile to create a collaborative user relationship over time, so that the user perceives the website as a collaborator watching over his/her interests and looking after his/her profile over time. In fact, even after the initial profile construction session, the website does not miss future favourable opportunities to invite the user to resume the dialogue in order to better refine his/her profile. This approach of involving the user as a partner in the process of building his/her profile, by asking him/her questions over time (without bothering him/her), is more suitable for websites specialized in certain areas of interest to a very specific community, such as, for example, a specific scientific community. In these cases, in general, the domain knowledge of the application extends more in depth than in breadth, so questions are not excessively numerous and are more specific, the answers are more significant because they refer to a specialized universe of knowledge, and the dialogue between the user and the website resembles a dialogue between two experts operating in the same field. As a consequence, in these cases it is less probable that a user gets bored when the website asks him/her some competent questions.
Another strength of the proposed approach is that the website has at its disposal a complete inferred profile of the user soon after the initial session, where the user enters his/her user-facts. In other words, the website does not need a statistically significant number of accesses from the user to build a complete user profile. This is a very useful feature for those websites that have a low number of accesses, possibly because of the type of population accessing the website, the type of news or services provided by the website etc. For example, a deep-knowledge-based website, concerning a specific field and having the main purpose of promoting and disseminating the culture of that field, may have a user population mostly consisting of professional people operating in the same field (e.g. researchers in the field). In general, such users do not need to access the website for their everyday tasks. They access the website to read recent news. News concerns, in general, cultural events in the field: conferences, courses, books etc. (infrequent events which do not occur daily). Moreover, the website has an alerting service which users might prefer to use instead of directly accessing the website. For all these reasons the website may have a low number of accesses, but nevertheless the website needs to have at its disposal an accurate user profile (for providing personalized services) from the beginning of the user-website relationship. The presented approach represents a solution to such a problem.
Summarizing the considerations made so far, let us conclude that the approach presented in this paper is suitable for websites characterized by the following facts:
• they are based on specialist knowledge (such as for example scientific websites);
• the dialogue with users is on the basis of domain knowledge that the website and the users have in common;
• they may have a low number of accesses, but nevertheless they need to have at their disposal a complete inferred user profile soon after the user enters his/her initial data.
Conversely, in cases of websites dealing with a broad and generic domain of knowledge, a very large population of visitors and a statistically significant number of accesses, such as for example popular commercial websites, famous book-store websites etc., alternative approaches are used. In general, these approaches are based on collaborative filtering techniques or other techniques of user profiling based on user behaviour observation (i.e. implicit data acquisition), that is, techniques that avoid any type of active involvement from users. Some of these techniques will be reviewed in Section 8.
8. Related work and discussion
In this section we will discuss the ideas underlying the proposal in the wider context of human-computer interaction research with a particular focus on user profiling. User profiling on the Web is a topic which has received much attention in several international journals, conferences and books. The topic has been approached in the light of various technologies and has found application in several heterogeneous fields. An exhaustive overview of the state of the art is beyond the scope of this paper. We will limit ourselves to comparing the proposal to some significant approaches. User profiling is typically either knowledge-based or behaviour-based. In knowledge-based approaches information about users is acquired explicitly through questionnaires or single questions. Behaviour-based approaches use implicit observations of user behaviour, commonly using machine-learning techniques to discover useful patterns in the behaviour (data mining techniques are applied to log files to extract patterns). We can also classify works on user profiling from two orthogonal points of view: the technology used to build profiles and the target task the profiles are built for. For example, technologies may be Bayesian networks, case-based reasoning, ontologies, data mining etc., while target tasks may be information retrieval, e-learning, e-commerce, recommender systems etc. According to these points of view, the proposal presented in the present paper may be classified as knowledge-based, using the Bayesian network technology and addressing the recommender system target task.
Godoy et al. (2004) propose two intelligent agents, PersonalSearcher and NewsAgent, that assist users in the tasks of filtering and organizing information available on the Web. PersonalSearcher (a personalized Web searcher) is an interface agent that helps users who are searching the Web for relevant information by filtering a set of documents retrieved from several search engines according to users' interests. NewsAgent (a personalized digital newspaper generator) is an interface agent that selects those articles that are relevant to a user from several online newspapers. The agents incrementally build a hierarchy (a tree) of users' relevant topics by means of a textual case-based reasoning technique (a specialization of case-based reasoning for textual documents). Profiles are adapted as agents interact with users over time. Both agents observe users' behaviour while they are reading Web documents, recording the main features characterizing these experiences. A user profile consists of a set of weighted topics relevant to a user. This approach is therefore behaviour-based, uses the case-based reasoning technology, and addresses the information retrieval and intelligent assistant target tasks. In our proposal too, user profiles are represented in terms of topics, i.e. user interests. In our proposal, however, the topic hierarchy is represented by a network.
Wong and Butz (2000) propose a method for representing a user profile as a Bayesian network. Such a network is learned from a sample of documents that are judged by the user to be relevant or irrelevant. The proposed method addresses the information retrieval target task.
An approach addressing the information retrieval target task and integrating the case-based reasoning and Bayesian network techniques to build user profiles incrementally is presented in Schiaffino and Amandi (2000). Case-based reasoning provides a mechanism to acquire knowledge about user actions that are worth recording to determine his/her habits and preferences. Each case records the attributes or keywords used by a user to perform queries. A query is classified according to its similarity to previously recorded queries. The Bayesian network provides a tool to represent relationships between items of interest. It is built gradually as a user queries the database. Information stored in the form of cases is used to gradually build and update the Bayesian network as the interests of the user change over time. The user profile consists of a statistical profile (type of queries frequently made etc.) and an inferred (via the Bayesian network) profile. A profile is used to suggest the execution of relevant queries to a user at an appropriate moment. The authors focus their attention on users who need data stored in the database to fulfil their everyday tasks, or at least who often use the database. Conversely, our proposal focuses on a population of users who do not access a website very often. In our case the website has at its disposal a general Bayesian network (knowledge base) that is then instantiated to a specific user and generates a specific inferred profile, but does not need to be built incrementally. This is an advantage in applications where the user population does not produce a great number of accesses. In these cases the solution is precisely the knowledge-based one: asking users explicitly for some information, involving the user in a cooperative relationship. So, even soon after the first contact with a new user (registration phase and data entry of Figure 4) the website has at its disposal an inferred profile of the user. Another difference concerns the target task: our proposal addresses the task of news recommendation.
In Nokelainen et al. (2002) an adaptive online questionnaire system, EDUFORM, is presented. The proposal addresses the educational target task and uses Bayesian probabilistic models. The authors face the problem of 'one size fits all' online questionnaires equipped with numerous propositions. In particular, EDUFORM is a Web-based data gathering tool, which performs adaptive and dynamic optimization of the number of questionnaire propositions during the data gathering process. EDUFORM uses probabilistic Bayesian methods to create user profiles that are then used to dynamically optimize the set of propositions that are presented to a user in order to maximize information extraction. In our approach the goal of minimizing the number of questions and maximizing information extraction is achieved by using the value-of-information framework.
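As a reminder of what an entropy-based value-of-information measure looks like, the following generic sketch computes the expected reduction in Shannon entropy of a binary interest variable obtained by asking one yes/no question. It is a textbook formulation under simplifying assumptions (one interest node, one question, no bother effect), not a reproduction of the sequential asking algorithm of Section 4.4; all names and probability values are hypothetical.

```python
import math


def entropy(p: float) -> float:
    """Shannon entropy, in bits, of a binary variable with P(yes) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def expected_entropy_reduction(p_interest: float,
                               p_yes_given_interest: float,
                               p_yes_given_no_interest: float) -> float:
    """Prior entropy of the interest minus its expected posterior entropy."""
    p_yes = (p_yes_given_interest * p_interest
             + p_yes_given_no_interest * (1.0 - p_interest))
    p_no = 1.0 - p_yes
    # Posterior P(interest = yes) after each possible answer (Bayes' rule).
    post_yes = p_yes_given_interest * p_interest / p_yes if p_yes > 0 else 0.0
    post_no = (1.0 - p_yes_given_interest) * p_interest / p_no if p_no > 0 else 0.0
    expected_posterior = p_yes * entropy(post_yes) + p_no * entropy(post_no)
    return entropy(p_interest) - expected_posterior


# Hypothetical candidate questions: (P(yes | interest), P(yes | no interest)).
candidates = {
    "informative question": (0.9, 0.1),
    "uninformative question": (0.5, 0.5),
}
gains = {q: expected_entropy_reduction(0.5, a, b)
         for q, (a, b) in candidates.items()}
best = max(gains, key=gains.get)
```

Under these assumptions the question with the largest expected entropy reduction would be asked first, which is the sense in which few questions can extract much information.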
The approach presented in Adomavicius and Tuzhilin (1999, 2001) is well suited to the e-commerce target task. It is behaviour-based and uses data mining methods. More precisely, the authors present a method for constructing user behavioural profiles using data mining techniques. Profiles are specified with sets of rules learned from transactional histories. Since many rules can be spurious, irrelevant or trivial, a method for validating them, separating good rules from bad ones, is presented.
Data mining techniques are also used in Nasraoui et al. (2002) where the authors present a framework (based on fuzzy relational clustering) for mining typical user profiles from the vast amount of historical data stored in server access logs.
Soltysiak and Crabtree (1998) use a heuristics-based clustering method to generate user interest profiles. The method is applied to the e-mails a user sends or receives, and the WWW pages he/she browses. They use a keyword extraction technique for identifying relevant keywords within a document. A document is then represented as a vector of keywords. The vectors are used as the basis for grouping documents into clusters. A sufficiently large cluster of documents represents a user's interest.
The same authors (Crabtree & Soltysiak, 1998) address the information retrieval target task, focusing in particular on tracking interest themes over time by measuring the similarity of interest themes across time periods.
Heuristics-based techniques are also used in Kostoff et al. (2001) and Rousseau et al. (2004), addressing the intelligent assistant and information retrieval target tasks respectively.
In Esposito et al. (2003) the effectiveness of two supervised methods for learning user profiles, inductive logic programming and a Bayesian classifier, is compared. The comparison focuses on the two different learning strategies used to infer models of user interests from textual book descriptions. Experiments are conducted in the context of a content-based profiling system for a virtual bookshop on the Web.
Recently particular attention has been paid to the use of ontology-based technologies in user profiling. Middleton et al. (2004) explore an ontological approach to user profiling within recommender systems, working on the problem of recommending online academic research papers. They present two experimental systems that create user profiles from unobtrusively monitored behaviour and relevance feedback, representing the profiles in terms of a research-paper topics ontology. Papers are classified using ontological classes. The database of research papers is classified using a research-paper topics ontology and a set of training examples. Recorded Web browsing and relevance feedback elicited from users are used to compute daily profiles of users' research interests. Interest profiles are represented in ontological terms, allowing other interests to be inferred. The interest profiles are visualized to allow elicitation of direct profile feedback.
The same authors address the use of ontologies in recommender systems (Middleton et al., 2001), and in Middleton et al. (2003) they explore the idea of profile visualization to capture further knowledge about user interests.
Let us classify in Table 1 the set of papers considered so far. The papers, along with the present proposal, are located in the table according to the technology and the target task by which they are characterized. Although the set of applications that have been considered is not exhaustive, the table gives an idea of the variety of technological approaches and target tasks of user profiling. It also gives an idea of both the enormous amount of work done in the user profiling field and the difficulties underlying the problem of building accurate user profiles.
Table 1: The works examined, classified according to the technology they use and the target task they address
Target tasks (columns): e-Commerce; Information retrieval; Recommender system; e-Learning and education; Intelligent assistant
[The target task is shown in parentheses only where the running text states it; the remaining column placements are not recoverable from the extracted table.]
Ontologies: Middleton et al., 2004; Middleton et al., 2001; Middleton et al., 2003 (recommender system)
Bayesian networks: Wong & Butz, 2000 (information retrieval); (a) (recommender system)
Case-based reasoning: Godoy et al., 2004 (information retrieval); Godoy et al., 2004 (intelligent assistant)
Case-based reasoning and Bayesian networks: Schiaffino & Amandi, 2000 (information retrieval)
Bayesian probabilistic models: Nokelainen et al., 2002 (e-learning and education)
Data mining and rule discovery: Adomavicius & Tuzhilin, 1999; Adomavicius & Tuzhilin, 2001 (e-commerce)
Web log data mining and fuzzy clustering: Nasraoui et al., 2002
Heuristics-based clustering: Crabtree & Soltysiak, 1998 (information retrieval); Rousseau et al., 2004 (information retrieval); Soltysiak & Crabtree, 1998; Kostoff et al., 2001 (intelligent assistant)
Inductive logic programming: Esposito et al., 2003
Bayesian classification: Esposito et al., 2003
(a) Work presented in this paper.
9. Conclusions
This paper has presented a user profiling approach based on a deep-knowledge model of user interests (i.e. domain topics) and a sequential asking algorithm using the value-of-information theory based on Shannon entropy. The approach has been implemented and tried out for over a year in a real context. Experimental results have indicated that the proposal is particularly suitable for websites addressing domains with specific deep knowledge (like scientific websites) and with a user population that is characterized by sharing the same knowledge and producing a low number of accesses (i.e. low compared with the high number of accesses of a typical commercial website). The approach has been discussed in the context of the scientific literature concerning human-computer interaction and in particular user profiling. The discussion has highlighted that the proposed approach differs from others mostly because of its emphasis on involving users in collaborative processes for building and refining their profiles.
Acknowledgements
In alphabetical order, thanks to Dr Enrico Cavalli for the integration of the prototype into the portal running on the server computer, thanks to Dr Paolo Ramieri for his role of CFD expert providing CFD knowledge, thanks to the anonymous reviewers for their valuable comments, and thanks to the users involved for their cooperation and feedback.
References
ADOMAVICIUS, G. and A. TUZHILIN (1999) User profiling in personalized applications through rule discovery and validation, in Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, CA, 377–381.
ADOMAVICIUS, G. and A. TUZHILIN (2001) Using data mining methods to build customer profiles, IEEE Computer, 34 (2), 74–82.
BILLSUS, D., C.A. BRUNK, C. EVANS, B. GLADISH and M. PAZZANI (2002) Adaptive interfaces for ubiquitous Web access, Communications of the ACM, 45 (5), 34–38.
BRUSILOVSKY, P. and M.T. MAYBURY (2002) From adaptive hypermedia to the adaptive Web, Communications of the ACM, 45 (5), 30–33.
CRABTREE, I.B. and S.J. SOLTYSIAK (1998) Identifying and tracking changing interests, International Journal of Digital Libraries, 2, 38–53.
ESPOSITO, F., G. SEMERARO, S. FERILLI, M. DEGEMMIS, N. DI MAURO, T.M.A. BASILE and P. LOPS (2003) Evaluation and validation of two approaches to user profiling, in Proceedings of the ECML/PKDD-2003 First European Web Mining Forum, B. Berendt, A. Hotho, D. Mladenic, M. van Someren, M. Spiliopoulou and G. Stumme (eds).
FINK, J., J. KOENEMANN, S. NOLLER and I. SCHWAB (2002) Putting personalization into practice, Communications of the ACM, 45 (5), 41–42.
GODOY, D., S. SCHIAFFINO and A. AMANDI (2004) Interface agents personalizing Web-based tasks, Cognitive Systems Research, 5, 207–222.
GORRY, G.A. and G.O. BARNETT (1985) Experience with a model of sequential diagnosis, in Computer-assisted Medical Decision Making, J.A. Reggia and S. Tuhrim (eds), Berlin: Springer, Vol. 1, pp. 206–222.
HECKERMAN, D.E., E.J. HORVITZ and B.N. NATHWANI (1992) Toward normative expert systems, Part I: The Pathfinder project, Methods of Information in Medicine, 31 (2), 90–105.
HIRSH, H., C. BASU and B.D. DAVISON (2000) Learning to personalize, Communications of the ACM, 43 (8), 102–106.
HORVITZ, E., A. JACOBS and D. HOVEL (1999) Attention-sensitive alerting, in Proceedings of UAI '99 Conference on Uncertainty in Artificial Intelligence, 305–313.
JENSEN, F.V. (2001) Bayesian Networks and Decision Graphs, Berlin: Springer.
KOSTOFF, R.N., J.A. DEL RIO, J.A. HUMENIK, E.O. GARCIA and A.M. RAMIREZ (2001) Citation mining: integrating text mining and bibliometrics for research user profiling, Journal of the American Society for Information Science and Technology, 52 (13), 1148–1156.
MIDDLETON, S.E., D.C. DE ROURE and N.R. SHADBOLT (2001) Capturing knowledge of user preferences: ontologies in recommender systems, in Proceedings of the International Conference on Knowledge Capture K-CAP 2001, Victoria, B.C., Canada.
MIDDLETON, S.E., N.R. SHADBOLT and D.C. DE ROURE (2003) Capturing interest through inference and visualization: ontological user profiling in recommender systems, in Proceedings of the International Conference on Knowledge Capture K-CAP 2003, Sundial Beach Resort, Sanibel Island, FL.
MIDDLETON, S.E., N.R. SHADBOLT and D.C. DE ROURE (2004) Ontological user profiling in recommender systems, ACM Transactions on Information Systems, 22 (1), 54–88.
MUSSI, S. (2003) Providing websites with capabilities of one-to-one marketing, Expert Systems, 20 (1), 8–19.
NASRAOUI, O., R. KRISHNAPURAM, A. JOSHI and T. KAMDAR (2002) Automatic Web user profiling and personalization using robust fuzzy relational clustering, in e-Commerce and Intelligent Methods, J. Segovia, P. Szczepaniak and M. Niedzwiedzinski (eds), Studies in Fuzziness and Soft Computing, Berlin: Springer.
NOKELAINEN, P., H. TIRRI, M. MIETTINEN and T. SILANDER (2002) Optimizing and profiling users online with Bayesian probabilistic modeling, in Proceedings of the NL 2002 Conference.
PEARL, J. (1988) Probabilistic Reasoning in Intelligent Systems, San Mateo, CA: Morgan Kaufmann.
ROUSSEAU, B., P. BROWNE, P. MALONE and M. Ó FOGHLÚ (2004) User profiling for content personalization in information retrieval, in Proceedings 19th ACM Symposium on Applied Computing, SAC (Nicosia, Cyprus).
SCHIAFFINO, S.N. and A. AMANDI (2000) User profiling with case-based reasoning and Bayesian networks, in Proceedings International Joint Conference IBERAMIA-SBIA, 12–21.
SHANNON, C.E. and W. WEAVER (1949) The Mathematical Theory of Communication, Urbana, IL: University of Illinois Press.
SOLTYSIAK, S.J. and I.B. CRABTREE (1998) Automatic learning of user profiles towards the personalization of agent services, BT Technology Journal, 16 (3).
VON WINTERFELDT, D. and W. EDWARDS (1986) Decision Analysis and Behavioral Research, Cambridge: Cambridge University Press.
WONG, S.K.M. and C.J. BUTZ (2000) A Bayesian approach to user profiling in information retrieval, Technology Letters, 4 (1), 50–56.
The author
Silvano Mussi
Silvano Mussi graduated in physics in 1975 from the University of Milan, Italy. He worked at ITALTEL for ten years in the fields of software engineering and functional discrete simulations of real-time systems. He has been with CILEA since 1981. He has cooperated in research activities with Milan Polytechnic and Brescia University where for three academic years he was contract-professor of artificial intelligence. For over ten years he has been doing research in the fields of knowledge engineering and expert systems. His current research interests address methods for providing websites with capabilities of decision-making and reasoning under conditions pervaded with uncertainty.