web profiling shannon 2006

8/9/2019 Web Profiling Shannon 2006


User profiling on the Web based on deep knowledge and sequential questioning

Silvano Mussi

CILEA (Interuniversity Consortium for Information and Communication Technologies), via R. Sanzio 4, 20090 Segrate-Mi, Italy

E-mail: [email protected]

Abstract: User profiling on the Web is a topic that has attracted a great number of technological approaches and applications. In most user profiling approaches the website learns profiles from data implicitly acquired from user behaviours, i.e. observing the behaviours of users with a statistically significant number of accesses. This paper presents an alternative approach. In this approach the website explicitly acquires data from users, user interests are represented in a Bayesian network, and user profiles are enriched and refined over time. The profile enrichment is achieved through a sequential asking algorithm based on the value-of-information theory using the Shannon entropy concept. However, what mostly characterizes the approach is the fact that the user is involved in a collaborative process of profile building. The approach has been tried out for over a year in a real application. On the basis of the experimental results the approach turns out to be particularly suitable for applications where the website is strongly based on deep domain knowledge (as for example is the case for scientific websites) and has a community of users that share the same domain knowledge of the website and produce a low number of accesses (low compared to the high number of accesses of a typical commercial website). After presenting the technical aspects of the approach, we discuss the underlying ideas in the light of the experimental results and the literature on human-computer interaction and user profiling.

Keywords: user profiling, deep knowledge, value of information, Bayesian networks, sequential asking, Shannon entropy

1. Introduction

In order for a website to be able to be collaborative with its users (e.g. to be adaptive (Billsus et al., 2002; Brusilovsky & Maybury, 2002), to exhibit a personalized behaviour (Fink et al., 2002), to produce personalized alerting (Horvitz et al., 1999) etc.), it is necessary that the website knows the user interests at a sufficiently fine level of refinement. In other words, the website has to acquire the user profile. This paper presents an approach to the acquisition and construction of user profiles. A plain approach to this problem consists in making the website directly acquire from the user all the information it needs, e.g. by asking the user to fill in a sort of questionnaire such as: 'Are you interested in the topic X? If yes, to what extent?' And so forth, for each possible topic X. However, such an approach could result in bothering the user both because there are too many questions and because some questions might turn out to be not very easy for the user. In order to overcome such a difficulty a website has to be able to infer hypotheses about multiple user interests on the basis of little and simple information easily provided by the user, i.e. information concerning simple facts¹ rather than information entailing a certain mental effort from the user (e.g. information resulting from mental activities like judgment, introspection, abstraction, reasoning etc.). However, it may happen that after these initial questions the information collected is not sufficient to infer user interests with a low margin of uncertainty. A website has therefore to consider the possibility of asking additional questions, questions whose answers may be, in general, not very easy to provide. In other words, because these questions are not as easy as the initial ones, there is the possibility that a user may feel bothered when he/she is asked them. The questions should therefore be asked only if it is necessary, and they cannot be asked all and indistinctly at the same time, given that each question entails a trade-off between expected benefit and expected bother. To solve such a problem, the paradigm of sequential asking is proposed and applied.

¹ If we consider the medical domain, simple facts provided by the patient are, for example, 'I have fever', 'I have a pain in my stomach' etc.

The paper is organized as follows. Section 2 introduces, as a background topic, the role of deep knowledge in user profiling, while Section 3 introduces, as a foreground topic, the role of sequential asking in user profiling. Section 4 presents the technical aspects of the proposal focusing on the paradigm of sequential asking. Section 5 illustrates a significant experimental application, while Section 6 presents the experimental results and gives some remarks on them. In Section 7 some strengths of the proposed approach are pointed out. Section 8 discusses the proposal in the context of human-computer interaction research and related work. Finally, Section 9 draws some conclusions.

2. Background: deep-knowledge-based user profiling

Inferring user interests starting from acquired facts may be carried out by using a deep-knowledge-based approach. In the deep-knowledge-based approach to user profiling, deep knowledge is represented by causal knowledge linking user facts to user interests and organizing the user interests themselves in causal paths. In practice, user interests coincide with the topics of the domain knowledge. Let us notice that, in order to be realistic, interests inference cannot avoid taking uncertainty into account. As a consequence, Bayesian networks (Pearl, 1988; Jensen, 2001) turn out to be an ideal tool to represent deep knowledge and to reason with it under uncertainty. A site provided with deep knowledge does not need a statistically significant number of accesses before exhibiting a personalized behaviour, as is required by similarity-based approaches. Similarity-based approaches like content-based or collaborative filtering approaches (Hirsh et al., 2000) are, in a sense, based on shallow knowledge: an item is supposed to be relevant for the user if its content is similar to the content of the items he/she preferred in the past (content-based approach) or if other users with a similar taste preferred it in the past (collaborative filtering approach). So, a site using these approaches does not know the deep reasons why a user prefers an item: it lacks deep knowledge, i.e. it lacks causal knowledge linking the user preferences (effects) to the user goals (causes).

Turning to Bayesian networks, let us face the problem of building the Bayesian network of the interests of a user working, or, more generally, operating, in a well-defined domain of real life, characterized by precise domain knowledge. Let us think, for example, of the medical domain, the computer science domain etc. Inside a domain of knowledge (for short, in the following, domain), multiple topics may be identified. Let us adopt the basic assumption that domain topics represent possible user interests (in the respective topics). In general, working in a certain field involves pursuing certain goals, and pursuing a certain goal involves using certain tools. So we have topics concerning fields (field-topics), goals (goal-topics) and tools (tool-topics). As a consequence, being interested in a certain field-topic involves being interested in certain goal-topics, and being interested in a certain goal-topic involves being interested in certain tool-topics. We have therefore implicitly built a causal network of interests. What we need now is to quantify uncertainty. So let us ask the domain expert to provide conditional probabilities, i.e. P(interest in topic X | interest in topic Y). We have therefore obtained the Bayesian network of interests in the domain topics for an abstract user. In order to be able to personalize the network to a specific user we need to add user-facts, i.e. facts that affect interests or are consequences of interests. Among user-facts let us distinguish between facts concerning tools (tool-facts) and facts concerning fields (field-facts). A tool-fact concerns the use of a tool. For example, 'the user uses the software S' is a tool-fact. A tool-fact is caused by the interest in the related tool, e.g. the fact that the user uses S results from the fact that the user is interested in S. Let us ask the expert to provide the probability of using a tool given the interest in it, e.g. P(the user uses the tool S | the user is interested in the tool S).² A tool-fact plays the role of a symptom of the presence (in the mind of the user) of interest in the related tool-topic. A field-fact concerns a user context. For example, 'the user works in the field of software development' is a field-fact. A field-fact affects the related field-interest, e.g. working in the field of software development affects interest in the field of software development. Let us ask the expert to provide the probability that a user is interested in a field given the fact that the user works in a certain field, e.g. P(the user is interested in the field of software development | the user works in the field of software development). A field-fact plays the role of a situation conditioning the probability of the presence (in the mind of the user) of interest in the related field-topic (it is the case of anamnesis in the medical field). In the following, for short, interests in field-topics, goal-topics and tool-topics will be called field-interests, goal-interests and tool-interests respectively. Let us now summarize the considerations made so far by representing in Figure 1 the conceptual structure of the user-interests network. For simplicity let us consider two states for each node, i.e. y ('yes' state), n ('no' state). See Mussi (2003) for more details on a knowledge-based approach to user profiling.
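As a rough illustration of how facts propagate through such a network, the sketch below builds a single causal chain (field-fact, field-interest, goal-interest, tool-interest, tool-fact) and computes posteriors by brute-force enumeration. It is only a toy under stated assumptions: the chain has one node per type, every probability is an invented placeholder rather than an expert-elicited value, and a real network would be handled by a proper inference engine.

```python
from itertools import product

STATES = ("y", "n")
# One node per type, ordered along the causal path (a simplifying assumption).
CHAIN = ["field_fact", "field_interest", "goal_interest", "tool_interest", "tool_fact"]

# Hypothetical numbers, for illustration only.
PRIOR = {"y": 0.5, "n": 0.5}  # P(field_fact)
# CPT[child][parent_state] = P(child = y | parent = parent_state)
CPT = {
    "field_interest": {"y": 0.9, "n": 0.2},
    "goal_interest": {"y": 0.8, "n": 0.1},
    "tool_interest": {"y": 0.7, "n": 0.1},
    "tool_fact": {"y": 0.6, "n": 0.0},
}


def joint(assignment):
    """Probability of one full assignment of the five chain nodes."""
    p = PRIOR[assignment["field_fact"]]
    for parent, child in zip(CHAIN, CHAIN[1:]):
        p_yes = CPT[child][assignment[parent]]
        p *= p_yes if assignment[child] == "y" else 1.0 - p_yes
    return p


def posterior(node, evidence):
    """P(node = y | evidence), summing the joint over all consistent assignments."""
    numerator = denominator = 0.0
    for values in product(STATES, repeat=len(CHAIN)):
        assignment = dict(zip(CHAIN, values))
        if any(assignment[k] != v for k, v in evidence.items()):
            continue
        p = joint(assignment)
        denominator += p
        if assignment[node] == "y":
            numerator += p
    return numerator / denominator
```

For instance, `posterior("goal_interest", {"field_fact": "y"})` gives the goal-interest probability after a field-fact has been entered, and adding `"tool_fact": "y"` to the evidence raises it further, which is the 'symptom' role of tool-facts described above.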

[Figure 1 diagram: nodes of type field_fact, field_interest, goal_interest, tool_interest and tool_fact, two of each type]

Figure 1: Conceptual structure of the user-interests network. The pure interests network (represented by nodes of type field-interest, goal-interest, tool-interest) has been enriched with nodes of type field-fact and tool-fact. The figure shows an abstract network, with only two nodes for each type. Figure 3 will show a real instance.

² Let us note that in real-world applications the presence of interest in a tool is not sufficient to make the user use it, especially in cases in which the user has at his/her disposal possible alternative tools for pursuing the same goal.

3. Foreground: question-asking-based user profiling

Among user-facts we can distinguish between facts provided by the user without any effort at all, i.e. facts definitely easy to provide (for short, easy-facts), and facts provided by the user after a certain mental process, a certain reasoning effort, i.e. facts less easy to provide (for short, less-easy-facts). For simplicity let us limit our analysis to consider less-easy-facts concerning tools (i.e. tool-facts less easy to provide). For example, let us consider the question 'Do you use X?'. The answer is easy if X is an object such as for example a certain computer or a certain software product etc., but it might not be so easy if X denotes an approach to a problem or a framework etc. In fact the user might have developed a solution that is not just a mere instantiation of a well-known approach but, for example, takes some ideas from a certain approach, some other ideas from other approaches, and then introduces some original variants.³ On the basis of these considerations we conclude that a website cannot avoid considering the possibility that a user be bothered by having to answer a question that entails a certain mental effort (reasoning, analysis, assessment etc.). So, whereas easy-facts may be asked without restrictions, less-easy-facts should be asked only if it is worth doing it, taking into account the probability of bothering the user. In the following, a question for acquiring an easy-fact will be called, for short, an easy-question, whereas a question for acquiring a less-easy-fact will be called a less-easy-question. The probability of bothering the user because of a less-easy-question is affected by the topic of the question itself. If the question concerns a topic the user is not interested in, there is a higher probability that the user gets bothered by the question. The fact that the user is not interested in the question topic might be due to the fact that the field he/she works in does not concern that topic. In such a case he/she might perceive a sense of logical discontinuity with his/her previously entered easy-facts and might not know that topic very well or even at all. So in conclusion he/she is more likely to get bothered if the question concerns a topic he/she is not interested in. These considerations prompt us to consider a bother-effect network beside the user-interests network. The bother-effect network simply consists of three nodes (each one with two states y, n): bother, bother_af, interested. Let us examine their meanings and how they are connected. The bother node (which stands for 'the user is bothered') and the bother_af node (which stands for 'after the question the user is bothered') represent the current status (bothered or non-bothered) of the current user respectively before and after a question is asked. They are therefore linked by the cause-effect relation: bother → bother_af. The interested node (which stands for 'the user is interested in the topic of the question') plays the role of a conditioner node for the conditional probability table of the bother_af node. In other words, it modulates the probability that the user is in the state bother after the question. So, in formal terms, we have that P(bother_af = y | bother = n, interested = y) is lower than P(bother_af = y | bother = n, interested = n).⁴ Let us now summarize the considerations made so far by representing in Figure 2 the structure of the bother-effect network.
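The effect of the conditioner node can be written out explicitly by marginalizing over the two parent nodes. The expression below is only a sketch: it assumes that bother and interested are independent root nodes, that a user who is already bothered stays bothered (as footnote 4 states), and it introduces the shorthand symbols a and b purely for illustration.

```latex
\[
P(\mathit{bother\_af}=y)
  = P(\mathit{bother}=y)
  + P(\mathit{bother}=n)\,\bigl[\,a\,P(\mathit{interested}=y) + b\,P(\mathit{interested}=n)\,\bigr]
\]
\[
a = P(\mathit{bother\_af}=y \mid \mathit{bother}=n,\ \mathit{interested}=y), \qquad
b = P(\mathit{bother\_af}=y \mid \mathit{bother}=n,\ \mathit{interested}=n), \qquad a < b .
\]
```

Since a < b, a higher probability of interest in the question topic yields a lower probability of being bothered after the question, all else being equal.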

[Figure 2 diagram: nodes interested and bother, each linked to the node bother_af]

Figure 2: Structure of the bother-effect network.

³ The distinction between easy-facts and less-easy-facts is typical in the medical domain. In fact, information provided by the patient at the beginning of a medical examination is an easy-fact, whereas information coming from a clinical test is a less-easy-fact. For example, the answer to the question 'Do you have fever?' is an easy-fact, but the answer to the question 'Is the bilirubin level in your blood normal?' is a less-easy-fact (you have to undergo a blood test, you have to pay for it, it takes some time etc.).

⁴ Obviously, P(bother_af = y | bother = y) = 1.

4. Enriching and refining user profiles

The website starts the profile building process by asking the user to enter a set of easy-facts and then by propagating them through the user-interests network, inferring in this way the user profile, or in other words inferring, for each topic, the probability that the interest in the topic is present in the user's mind. At this stage the question arises: is the inferred profile sufficiently defined? That is, have the user-interests been inferred with a sufficiently low uncertainty margin? In practice, is it necessary or not to ask one or more less-easy-questions in order to capture further information from the user and infer a more accurate profile? The answer is that less-easy-questions should be asked only if it is worth doing it. Let us define the concept of 'it is worth doing it'. In a broad sense, the purpose of asking questions is to know more about user-interests, or, more precisely, to decrease uncertainty about user-interests. In order to simplify the problem we wonder if among all the user-interests there is a subset of interests for which the margin of uncertainty is particularly important to be low. To this end let us notice that, since goals are at the basis of user actions, goal-interest nodes (i.e. goal-topics) are strategic, and as a consequence we aim at having low uncertainty about these nodes. So a less-easy-question is worth asking if, in spite of the unavoidable risk of bothering the user, it is expected to produce a significant decrease of uncertainty about the goal-interest nodes. Uncertainty about the states of a node is well represented by the concept of entropy (Shannon & Weaver, 1949), which is the hub concept of information theory. So a less-easy-question is worth asking if the expected decrease in entropy of the goal-interest nodes is significant enough to compensate the unavoidable risk of bothering the user. The next section will formally define the problem in the decision theory framework (von Winterfeldt & Edwards, 1986) based on entropy.

4.1. Benefit expected from a less-easy-question

Let G be a node and let g denote a state of G, for short g ∈ G. The entropy of G is given by

$$\mathrm{ENT}(G) = -\sum_{g \in G} P(g) \log_2 P(g) \tag{1}$$

Let us use the value function −ENT, which increases with preference. Let Q be the set of less-easy-questions and let q be a less-easy-question, i.e. q ∈ Q. Let J_q be the set of possible answers to q, and let j_q be an answer to q, i.e. j_q ∈ J_q. The value, with respect to the node G, of the information j_q is given by V_G(j_q):

$$V_G(j_q) = -\mathrm{ENT}(G \mid j_q) \tag{2}$$

The expected value, with respect to the node G, of the question q is given by EV_G(q):

$$EV_G(q) = \sum_{j_q \in J_q} V_G(j_q)\, P(j_q) \tag{3}$$

So the expected benefit of q, with respect to the node G, i.e. the expected decrease in the entropy of G, is given by

$$EB_G(q) = EV_G(q) + \mathrm{ENT}(G) \tag{4}$$

Let us consider a set of nodes G. If we assume that each node has the same weight of importance, the overall expected benefit from a question q is given by

$$EB(q) = \sum_{G} EB_G(q) \tag{5}$$

Let us now turn back to the user-interests network. Let G denote a goal-interest node. For each question q ∈ Q let the set J_q of the possible answers be {yes, no}. If the user answers 'yes' ('no') to q, we set the related tool-fact node to y (n). For example, let us consider the less-easy-question 'Do you use X?' and the related tool-fact node use_X (standing for 'the user uses X'). If the user answers 'yes', use_X is set to the state y; if he/she answers 'no', use_X is set to the state n. So (2) becomes

$$V_G(\text{Do you use X?} = \text{yes}) = -\mathrm{ENT}(G \mid \mathit{use\_X} = y) \tag{2$'$}$$

$$V_G(\text{Do you use X?} = \text{no}) = -\mathrm{ENT}(G \mid \mathit{use\_X} = n) \tag{2$''$}$$

Moreover, the probability of using X given the interest in X represents the probability of obtaining from the user the answer 'yes' to the question 'Do you use X?' given the interest in X, i.e. we do not consider lying. So (3) becomes

$$EV_G(q) = -\mathrm{ENT}(G \mid \mathit{use\_X} = y)\, P(\mathit{use\_X} = y) - \mathrm{ENT}(G \mid \mathit{use\_X} = n)\, P(\mathit{use\_X} = n) \tag{3$'$}$$
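A minimal numerical sketch of the expected benefit for a single two-state goal-interest node and a yes/no question is given below. It assumes that the expected benefit is the prior entropy minus the expected posterior entropy; the function names and all numbers are illustrative, and in a real application the posteriors would come from propagation in the user-interests network.

```python
from math import log2


def entropy(p_yes):
    """Shannon entropy, in bits, of a two-state node with P(y) = p_yes."""
    return -sum(p * log2(p) for p in (p_yes, 1.0 - p_yes) if p > 0.0)


def expected_benefit(p_g, p_g_given_yes, p_g_given_no, p_answer_yes):
    """Expected entropy decrease of node G from asking one yes/no question.

    p_g            prior P(G = y)
    p_g_given_yes  P(G = y | answer = yes)
    p_g_given_no   P(G = y | answer = no)
    p_answer_yes   P(answer = yes)
    """
    # Expected value of the question, using the value function -ENT.
    expected_value = -(entropy(p_g_given_yes) * p_answer_yes
                       + entropy(p_g_given_no) * (1.0 - p_answer_yes))
    # Benefit = expected value after asking minus value before asking, -ENT(G).
    return expected_value + entropy(p_g)
```

With a prior of 0.5 and posteriors of 0.9 and 0.1 for equally likely answers, `expected_benefit(0.5, 0.9, 0.1, 0.5)` is roughly 0.53 bits, whereas a question whose answer leaves the posterior at 0.5 yields no benefit.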

4.2. Bother-effect expected from a less-easy-question

From the user-interests network we can obtain, through (5), the expected benefit of the question q: 'Do you use X?'. However, in order to assess if q is worth asking or not, we have to take into account the probability that, after q is asked, the mental state of the user is 'bothered' (because of q). In order to properly calculate this probability we have to take into account two influence factors: the first concerns a sort of cumulative effect in cases of multiple questions, and the second concerns the influence of the question topic. Let us examine the first factor.

The probability that the user is bothered because of q is higher after the second question than after the first, and so forth. Let us model this cumulative effect in the following way. Let us initially start with P(bother = y) = 0; then each time a question is asked let us replace the probability distribution of the bother node with that of the bother_af node. Let us now examine the second factor.

The probability that the user is interested in the question topic is given by the probability distribution of the related tool-interest node (Figure 1). So, given the question q, 'Do you use X?', let us replace the prior probability distribution of the node interested (Figure 2) with the probability distribution of the tool-interest node 'the user is interested in X' (Figure 1). After having assigned the proper probability distributions to the nodes bother and interested, let us propagate. The expected bother-effect of q is therefore given by the value of P(bother_af = y) resulting from the propagation. Finally, let us note that P(bother_af = y) > P(bother = y).
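The two factors can be sketched together in a few lines. The code assumes that bother and interested are independent parents of bother_af and that an already bothered user stays bothered; the two conditional probabilities (0.1 and 0.4) are hypothetical placeholders, not values from the application.

```python
def bother_after(p_bother, p_interested,
                 p_af_interested=0.1, p_af_not_interested=0.4):
    """P(bother_af = y) after one question, given the current priors.

    p_af_interested      assumed P(bother_af = y | bother = n, interested = y)
    p_af_not_interested  assumed P(bother_af = y | bother = n, interested = n)
    """
    p_new = (p_interested * p_af_interested
             + (1.0 - p_interested) * p_af_not_interested)
    # A user who is already bothered remains bothered with probability 1.
    return p_bother + (1.0 - p_bother) * p_new


def cumulative_bother(interest_probabilities):
    """Apply the cumulative rule: start at 0, then feed bother_af back into bother."""
    p_bother = 0.0
    trace = []
    for p_interested in interest_probabilities:
        p_bother = bother_after(p_bother, p_interested)
        trace.append(p_bother)
    return trace
```

Calling `cumulative_bother` with one tool-interest probability per asked question returns a non-decreasing sequence, and questions on topics of low interest push it up faster.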

4.3. Profit expected from a less-easy-question

In order to obtain a single number quantifying the opportunity of asking a less-easy-question q, we have to combine, in terms of utility, the expected benefit (entropy decrease) with the expected bother-effect of q. Let us consider the utility function of the entropy of a node G. If the entropy of G is minimum, i.e. 0, the utility is maximum, i.e. 1. If the entropy of G is maximum, i.e. 1, the utility is minimum, i.e. 0. So, considering, for simplicity, a linear utility function, we have that the expected benefit in terms of utility is equal to the expected benefit in terms of pure entropy. Let us now consider the utility function of the bother-effect. If the bother-effect is absent, i.e. the state of the node bother_af is n, the utility is maximum, i.e. 1. If the bother-effect is present, i.e. the state of the node bother_af is y, the utility is minimum, i.e. 0. So, the expected bother-effect in terms of utility is given by P(bother_af = y | q) * 0 + P(bother_af = n | q) * 1 = 1 − P(bother_af = y | q).

Let us call the expected cost of q the quantity

$$EC(q) = P(\mathit{bother\_af} = y \mid q) \tag{6}$$

Finally, let us notice that the entropy utility and the bother-effect utility have, in general, different importance. Let us therefore consider the importance weight distributions k_ENT and k_dist, where k_ENT represents the importance weight assigned to the entropy and k_dist represents the importance weight assigned to the bother-effect.⁵ On the basis of all these considerations, the expected net advantage, which will be called expected profit (EP), from a question q is given by

$$EP(q) = EB(q)\, k_{ENT} - EC(q)\, k_{dist} \tag{7}$$

where k_ENT + k_dist = 1. In conclusion, a question q is worth asking if EP(q) > 0. Moreover, the question with the maximum EP is the preferred one.
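The combination of benefit and cost can be sketched as follows, under the assumption that the expected cost enters the expected profit with a negative sign and that the two weights sum to 1. The default weight of 0.5 is an arbitrary placeholder; the paper elicits the weights from an expert.

```python
def expected_profit(expected_benefit, p_bother_after, k_ent=0.5):
    """EP(q) = EB(q) * k_ENT - EC(q) * k_dist, with k_ENT + k_dist = 1.

    expected_benefit  EB(q), the expected entropy decrease
    p_bother_after    EC(q) = P(bother_af = y | q)
    k_ent             importance weight of the entropy (hypothetical default)
    """
    k_dist = 1.0 - k_ent
    return expected_benefit * k_ent - p_bother_after * k_dist


def worth_asking(expected_benefit, p_bother_after, k_ent=0.5):
    """A question is worth asking only if its expected profit is positive."""
    return expected_profit(expected_benefit, p_bother_after, k_ent) > 0.0
```

The weights shift the decision: with a benefit of 0.1 and a bother probability of 0.6 the question is rejected at equal weights, but accepted when `k_ent` is raised to 0.9.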

⁵ The importance weights may be elicited with the probability indifference method. Considering for example k_ENT we have that the gambling situation is G: [p * {entropy = min, bother = absent}; (1 − p) * {entropy = max, bother = present}], whereas the certainty situation is I: [{entropy = null, bother = present} for certain].

4.4. Sequential asking algorithm

In general, there is a certain number (> 1) of less-easy-questions. So a website during the process of solving the target problem (i.e. inferring the user profile) has to solve the meta-problem of choosing, under trade-off between benefit and cost, the most opportune less-easy-question to ask next in order to acquire a new piece of information which will contribute to solving the target problem, taking into account also the fact that the answer provided by the user has the side-effect of changing the benefits expected from asking other questions. Let us establish that questions are asked one at a time. This strategy is called 'myopic'. It addresses the following question: if you are allowed to ask at most one question, which question would you choose? As is well known the analysis of all the possible sequences of questions is, in practice, intractable because the number of sequences grows exponentially with the number of tests (Heckerman et al., 1992). So, many real-world applications use the so-called myopic value-of-information approach. The myopic approach is in practice a good heuristic (Gorry & Barnett, 1985).

We are now ready, on the basis of the considerations made so far, to define at a conceptual level the basic algorithm of sequential asking. Let x denote the current available knowledge, i.e. x = all the information entered so far.

0. Ask all the easy-questions, acquire the answers and set the related user-fact nodes to the appropriate states.
1. Propagate the entered information through the user-interests network.
2. If all the less-easy-questions have been asked, then EXIT.
3. For each less-easy-question q not yet asked calculate EP(q | x).
4. If each EP(q | x) ≤ 0, then EXIT.
5. Ask the user the q with the maximum EP, acquire the answer and set the related tool-fact node to the appropriate state.
6. Go back to point 1.
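The loop formed by steps 2 to 6 can be sketched generically as below. The three callables are hypothetical hooks standing in for the real components: `expected_profit(q)` is assumed to return EP(q | x) for the current state of the networks, `ask(q)` to obtain the user's answer, and `record_answer(q, answer)` to set the tool-fact node and propagate (step 1).

```python
def sequential_asking(questions, expected_profit, ask, record_answer):
    """Myopic sequential asking: one question at a time, best expected profit first.

    Returns the list of questions actually asked, in order.
    """
    remaining = list(questions)
    asked = []
    while remaining:                                   # step 2: stop when none left
        profits = {q: expected_profit(q) for q in remaining}   # step 3
        best = max(profits, key=profits.get)
        if profits[best] <= 0:                         # step 4: nothing worth asking
            break
        answer = ask(best)                             # step 5: ask the best question
        record_answer(best, answer)                    # sets the node and propagates
        remaining.remove(best)
        asked.append(best)                             # step 6: loop again
    return asked
```

Because `expected_profit` is re-evaluated on every pass, the ranking of the remaining questions can change after each answer, which is the side-effect mentioned above.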

4.5. Asking over time

After a certain number of questions the probability that the user is bothered (because of the questions), i.e. P(bother_af = y | q), increases so much that for each q it happens that EP(q | x) ≤ 0, and as a consequence the sequential asking stops. However, as time passes, it seems reasonable to suppose that the probability that the user is still in the state 'bothered' (because of the questions) decreases. As a consequence, it might happen that an EP that was negative in the past is now positive. In other words, after a certain time interval there might be some questions that are worth asking, even if they were not in the past. Such a situation is modelled by dynamically calculating the values of the prior probability distribution of the node bother on the basis of an appropriate decreasing temporal function. The practical consequence of all this is that the user profiling process is not accomplished in a single and initial phase but in a sequence of sessions over time. In each session the sequential asking algorithm is activated and, as a consequence, the user profile is enriched and refined. In practice, when, after a certain time from the last user profiling session, the user accesses the website, the website evaluates if there are some less-easy-questions that are worth asking. If yes, the website personalizes the user home page with a message inviting the user to resume the dialogue in order to better refine his/her profile.
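The text does not specify the decreasing temporal function. Purely as an illustration, the sketch below assumes an exponential decay with a hypothetical half-life of 30 days; any other decreasing function could be substituted.

```python
def decayed_bother_prior(p_bother_last, days_elapsed, half_life_days=30.0):
    """Prior P(bother = y) for a new session, decayed from the last session's value.

    Exponential decay and the 30-day half-life are assumptions made for this
    example only; they are not taken from the application.
    """
    return p_bother_last * 0.5 ** (days_elapsed / half_life_days)
```

The decayed value would replace the prior of the bother node before the expected profits are recomputed at the start of a session.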

5. An application of the proposed approach within a supercomputing portal

In this section we will describe a significant application of the approach so far presented. The proposal has been applied within the CILEA⁶ supercomputing portal:⁷ a portal devoted to disseminating supercomputing culture and events in the community of researchers working in the supercomputing field. More precisely, the proposal has been applied in the context of a news recommender system embedded in the portal. The proposal has been implemented in a prototype (developed by the author)⁸ tested on a PC off-line; then the prototype was integrated into the portal. Supercomputing news is recommended in a personalized manner both through a personalized presentation on the home page and by activating a personalized alerting service. The application regards a particular subfield of supercomputing, that of computational fluid dynamics (CFD). CFD encompasses many heterogeneous application fields (aerospace, biomedical etc.), various phenomenological areas (turbulent flows, porous media flows etc.), various computational techniques (large eddy simulation, coupled transport equations etc.), specialized software packages (Fluent, Fidap etc.) and so forth. A CFD user may be an aerospace engineer working in the aerospace industry; another may be a biophysicist working in the biomedical field etc. A biophysicist probably pursues the goal of developing computational models of porous media flows, uses the coupled transport equations technique and runs his/her models with the package Fidap. Conversely, an aerospace engineer probably focuses on modelling turbulent flows, and it is probable that he/she is not interested in the tools used by a biophysicist. So the need for building accurate user profiles for recommending the right news to the right users arises. Taking into account the general network structures of Figures 1 and 2, the CFD user-interests network and the bother-effect network illustrated in Figure 3 have been produced. Notice that the user-interests network encompasses six nodes (on the top) representing field-facts, 18 nodes (on the bottom) representing tool-facts and 30 nodes (in the middle) representing interests, i.e. topics.

⁶ CILEA is an Italian interuniversity consortium for information and communication technologies that provides supercomputing services to its users in order to promote supercomputing culture and services.

⁷ www.supercomputing.it

⁸ Tools used to implement the proposal: ASP pages, C language, MySQL database, Hugin inference engine.

Tool-facts have been distinguished as easy-facts and less-easy-facts. The facts concerning the use of software packages (i.e. the nodes whose names are prefixed by use_s_) have been considered as easy-facts, whereas the facts concerning the use of computational techniques (i.e. the nodes whose names are prefixed by use_t_) have been considered as less-easy-facts. Referring to the algorithm defined in Section 4.4, let us start (step 0) by asking the user to enter the set of easy-facts (Figure 4). The entered facts are then propagated through the network, inferring in this way the CFD profile of the current user (step 1). So now the portal knows, for each of the 30 topics, the probability that the current user is interested in it. The loop of the remaining steps is then performed, possibly asking less-easy-questions depending on the specific current case. For example, if the user declares that he/she works in the aerospace field and does not use any of the software packages listed in Figure 4, the portal finds that it is worth trying to decrease uncertainty about the goal-interests and begins to ask less-easy-questions (see Figure 5 for an example).

In this application, user profiles are used by the portal to recommend CFD news. Such news is classified by a CFD expert. More precisely, given a piece of news N, a CFD expert, for each topic T, defines (through subjective estimation) the probability that N concerns T. The classified piece of news is then entered in the website database. The expected utility EU of each piece of news N for a user U is calculated according to the following algorithm:

EU(N_U) = 0
For each T do
    EU(N_U) = EU(N_U) + EU(N_U^T)
End for

where

$$EU(N_U^T) = U(N_U^T \mid N \text{ concerns } T = \text{yes},\ \text{interest in } T = \text{yes}) \cdot P(N \text{ concerns } T = \text{yes}) \cdot P(\text{interest in } T = \text{yes})$$

where U(N_U^T | N concerns T = yes, interest in T = yes) stands for 'utility of N for the user U, given that N concerns T and the user U is interested in T'. For simplicity U(N_U^T | ...) has been assumed to be equal to 1 for any piece of news and any user. The website has therefore at its disposal a personalized expected utility of each piece of news, calculated on the basis of the inferred user profile. So, when a user accesses the portal, the portal uses such EU values to recommend news to the user. Recommendation is carried out by presenting news with different emphasis and in decreasing order of relevance (Figure 6).
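The expected-utility computation and the resulting ordering can be sketched as follows. The dictionaries, topic names and probabilities are invented for illustration, and the utility term is fixed at 1, as in the simplification stated above.

```python
def expected_utility(p_news_concerns, p_user_interest, utility=1.0):
    """EU of one piece of news for one user, summed over the topics it may concern.

    p_news_concerns  topic -> P(news concerns topic), as set by the expert
    p_user_interest  topic -> P(user is interested in topic), from the profile
    """
    return sum(utility * p_concerns * p_user_interest.get(topic, 0.0)
               for topic, p_concerns in p_news_concerns.items())


def rank_news(news_items, p_user_interest):
    """Return news identifiers in decreasing order of expected utility."""
    return sorted(news_items,
                  key=lambda n: expected_utility(news_items[n], p_user_interest),
                  reverse=True)


# Hypothetical profile and news classification.
profile = {"turbulent_flows": 0.9, "porous_media_flows": 0.1}
news = {
    "news_a": {"turbulent_flows": 0.8},
    "news_b": {"porous_media_flows": 0.9, "turbulent_flows": 0.1},
}
ranking = rank_news(news, profile)
```

Here `news_a` is ranked first because it mostly concerns a topic the hypothetical user is very likely to be interested in.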

As stated in Section 4.5, user profiles are

enriched and refined over time. In practice, when

a user accesses the portal, the portal evaluates if

there are some less-easy-questions that are worth

asking9 and, if that is the case, places at the

beginning of the news list a message inviting the

user to resume the dialogue (Figure 7).

Footnote 9: The algorithm in Section 4.4 is performed by starting from step 2.

Figure 3: The figure shows the bother-effect network (top left) and the CFD user-interests network of the application of the proposal to the CILEA supercomputing portal. The interests network instantiates the abstract network of Figure 1. More precisely, starting from the top, the nodes of the top layer (whose names begin with Work_in) are of type field-fact. Coming down, the nodes of the following three layers represent interests and are of type field-interest, goal-interest, tool-interest respectively. The nodes of the bottom layer are of type tool-fact. In particular the eight nodes on the left concern the use of computational techniques (less-easy-facts), whereas the remaining ten on the right concern the use of software packages (easy-facts).

News recommendation is also accomplished via alerting e-mail. Users can take advantage of the intelligent alerting service: for each user, the portal considers the set of news not yet seen by the user and, for each piece of news, calculates the expected utility that the piece of news has for the user; if the expected utility is greater than a given threshold, the portal sends the relevant piece of news to the user via e-mail.
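The alerting rule can be illustrated with the following sketch. It assumes, as above, a utility term equal to 1; the function name, the threshold value of 0.5 and the sample data are hypothetical, since the text does not state the threshold actually used by the portal.

```python
from typing import Dict, List, Set


def news_to_alert(news: Dict[str, Dict[str, float]],
                  p_interest: Dict[str, float],
                  seen: Set[str],
                  threshold: float) -> List[str]:
    """Select unseen news whose expected utility exceeds the threshold."""
    alerts = []
    for title, p_concerns in news.items():
        if title in seen:
            continue  # only news not yet seen by the user is considered
        # Expected utility with the utility term assumed equal to 1.
        eu = sum(p * p_interest.get(topic, 0.0) for topic, p in p_concerns.items())
        if eu > threshold:
            alerts.append(title)
    return alerts


# Hypothetical data for one user.
profile = {"aerospace": 0.9, "fluent": 0.7, "biomedical": 0.1}
news = {
    "Int. conf. on Aerospace supercomputing": {"aerospace": 1.0},
    "Summer school on Fluent": {"fluent": 1.0},
    "Int. Conf. on Biomedical supercomputing": {"biomedical": 1.0},
}
already_seen = {"Int. conf. on Aerospace supercomputing"}

to_send = news_to_alert(news, profile, already_seen, threshold=0.5)
print(to_send)
```

In this example only the Fluent summer school is selected: the aerospace conference has already been seen, and the biomedical conference falls below the assumed threshold.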

6. Experimental results

The proposal implementation presented in Section 5 has been working for more than a year inside the supercomputing portal. In order to better test the proposal, several heterogeneous institutions (research centres, university departments, organizations operating in the environmental physics field, industries, CFD software providers etc.) working in the CFD field have been involved. During that period, several researchers working with those institutions have used the proposal implementation. They have been interviewed in order to collect their comments both about the proposed approach to user profiling and about the accuracy of various types of profiles.

6.1. User feedback about the approach

Figure 4: The form used for entering the initial set of easy-facts, i.e. the subset of six field-facts (top) followed by the subset of ten tool-facts (bottom). The form lists the application fields Aerospace, Biomedical, Chemical Process, Environment, Automotive and Energy, and the software packages KIVA, FLUENT, FIDAP, ADINA, CFX, CFX_TASKFLOW, CFX_TURBOGRID, GAMBIT, TGRID and STAR_CD.

Figure 5: An example of a less-easy-question. The form asks 'Do you use (in your work activity) the computational technique: Volume of Fluid Methods?' and offers the options 'yes', 'no', 'I don't feel like answering this question at the moment' and 'I would prefer not to answer this question, even in the future'.

Most users have expressed a favourable impression of the approach. In particular, they have appreciated both the fact that their inferred profiles were ready soon after their initial login and the fact of being directly involved (through specific knowledge-based questions) in the profile refining process over time. They have declared that they are not bothered by being asked questions over time by the website; on the contrary, they have declared that they are pleased to cooperate with the website. They have perceived the question-asking attitude of the website as a sort of constant and competent attention to their needs (the website behaves like an expert that looks after their specialist interests). In fact, questions are asked in an unobtrusive manner, both because they are asked over time according to the sequential asking algorithm and because a user is not forced to answer them. A user is first asked whether he/she agrees to the website resuming the dialogue. Moreover, for each question the user is offered the possibility of temporarily suspending the answer if he/she does not feel like answering at that moment, or of telling the website not to ask that question any longer (see the choice options appearing in Figure 5). We note that the user population considered in this experimental year has the following characteristics. Users are researchers in the CFD field. In general they have used the alerting service to be automatically notified of possibly relevant newly created news, and have not accessed the portal very often. Both users and the website share a very specific field of knowledge (CFD knowledge), and it is precisely this knowledge on which the dialogue between users and the website is based.
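The four answer options of Figure 5 imply a simple bookkeeping rule for which questions may still be asked in later sessions. The sketch below is one possible way to represent it; the type and function names are hypothetical, and the second question in the example is an invented placeholder rather than one taken from the portal.

```python
from enum import Enum
from typing import Dict, List, Set


class Answer(Enum):
    YES = "yes"
    NO = "no"
    NOT_NOW = "I don't feel like answering this question at the moment"
    NEVER = "I would prefer not to answer this question, even in the future"


def record_answer(facts: Dict[str, bool], blocked: Set[str],
                  question: str, answer: Answer) -> None:
    """Update the user record according to the chosen option."""
    if answer in (Answer.YES, Answer.NO):
        facts[question] = answer is Answer.YES   # a new user-fact is acquired
    elif answer is Answer.NEVER:
        blocked.add(question)                    # never ask this question again
    # Answer.NOT_NOW: nothing is stored, so the question remains askable later


def askable(questions: List[str], facts: Dict[str, bool],
            blocked: Set[str]) -> List[str]:
    """Questions that are neither answered nor permanently refused."""
    return [q for q in questions if q not in facts and q not in blocked]


questions = ["Volume of Fluid Methods", "Hypothetical technique X"]
facts: Dict[str, bool] = {}
blocked: Set[str] = set()

record_answer(facts, blocked, "Volume of Fluid Methods", Answer.NOT_NOW)
after_postponing = askable(questions, facts, blocked)

record_answer(facts, blocked, "Volume of Fluid Methods", Answer.NEVER)
record_answer(facts, blocked, "Hypothetical technique X", Answer.YES)
after_answering = askable(questions, facts, blocked)
```

Keeping postponed questions askable while excluding refused ones is what allows the dialogue to be resumed later without forcing an answer.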

Figure 6: An example of personalized news recommendation, given the fact that the user works in the aerospace field. The news is grouped under headings of decreasing emphasis: 'GOOD NEWS FOR YOU!!!' (Int. conf. on Aerospace supercomputing), 'FURTHER NEWS OF PROBABLE INTEREST FOR YOU' (Tutorial course on Turbulent Flows modelling), 'OTHER NEWS YOU MIGHT BE INTERESTED IN' (Summer school on Fluent) and 'OTHER NEWS' (Advanced course on porous media flows modelling; Int. Conf. on Biomedical supercomputing).

Figure 7: An example of a message inviting the user to resume the dialogue.

6.2. User feedback about profile accuracy

Profile accuracy has been checked in an empirical manner. Initially the probability tables of the user-interests network were elicited, with the aid of suitable forms, from the CILEA CFD expert. Then they were tuned with the cooperation of a set of end-users, who carried out both probability soundness tests and recommendation soundness tests. In order to explain clearly what these soundness tests consist of, let us consider three sample sets, whose definitions are given in the following section.

6.2.1. Sample set definitions. Some end-users belonging to heterogeneous CFD sub-fields have been involved in providing feedback about profile accuracy. Let SEU be the sample set of such end-users.

Some typical hypothetical cases of user-facts have been defined. For example, let us consider a hypothetical case defined by the user-facts 'working in the aerospace field' and 'using the software tool Fluent' (e.g. the case of an aerospace engineer), another defined by the user-facts 'working in the biomedical field' and 'using the software tool Fidap' (e.g. the case of a bioengineer), and so on. Let SHC be the sample set of such hypothetical cases.

Some suitable hypothetical pieces of news have been created. More precisely, for each CFD topic T, a hypothetical piece of news N_T concerning T with probability 1 and the other topics with probability 0 has been created, so that the expected utility of N_T is equal to the probability P(interest in T = yes). For example, for the topic 'turbulence flow modelling' a piece of news like 'International Conference on Turbulence Flow Modelling' has been created, for the topic 'porous media flow modelling' a piece of news like 'Summer School on Porous Media Flow Modelling' has been created, and so forth. Let SHN be the sample set of such hypothetical news.
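The equality between the expected utility of N_T and the probability of interest in T follows directly from the definition of EU given earlier. A short derivation, in which T' is a summation index introduced here for illustration and the utility term is taken equal to 1 as assumed in the text, reads:

```latex
\begin{align*}
EU(N_T, U) &= \sum_{T'} 1 \cdot P(N_T \text{ concerns } T' = \text{yes}) \cdot P(\text{interest in } T' = \text{yes}) \\
           &= 1 \cdot P(\text{interest in } T = \text{yes}) + \sum_{T' \neq T} 0 \cdot P(\text{interest in } T' = \text{yes}) \\
           &= P(\text{interest in } T = \text{yes})
\end{align*}
```

Ranking the hypothetical news of SHN by expected utility is therefore the same as ranking the topics by their inferred interest probability.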

6.2.2. Soundness tests. Soundness tests have been performed according to the following algorithm.

For each hypothetical case in SHC:
• enter the related user-facts and propagate them through the user-interests network, in this way producing the related inferred profile;
• activate the recommender system with the SHN set, in this way producing the list of hypothetical news recommended according to the inferred profile;
• ask each end-user in the SEU to examine:
    • the inferred profile, to check if the inferred user-interest probabilities could be considered acceptable in the light of the domain common sense, given the entered facts (probability soundness test);
    • the emphasis given to each hypothetical piece of news by the recommender system, to check if the emphasis degree could be considered compatible with the inferred profile in the light of the domain common sense (recommendation soundness test).

The feedback collected from the end-users of the SEU was then used to tune the probability tables of the user-interests network.
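The test procedure above can be sketched as a loop over the three sample sets. The sketch makes several assumptions that are not in the text: the Bayesian network inference is replaced by a toy `toy_propagate` function, the human judgement of an end-user is replaced by a toy `toy_judge` function, and all names and numbers are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Facts = Dict[str, bool]
Profile = Dict[str, float]
Judge = Callable[[Facts, Profile, List[str]], Tuple[bool, bool]]


def run_soundness_tests(cases: Dict[str, Facts],
                        propagate: Callable[[Facts], Profile],
                        topics: List[str],
                        judges: Dict[str, Judge]) -> List[Tuple[str, str, bool, bool]]:
    """Collect (case, end-user, probability_ok, recommendation_ok) records."""
    feedback = []
    for case_name, facts in cases.items():            # each case in SHC
        profile = propagate(facts)                     # inferred profile
        # For the news N_T of SHN, EU equals P(interest in T = yes),
        # so the recommended order is the order of the inferred probabilities.
        ranking = sorted(topics, key=lambda t: profile.get(t, 0.0), reverse=True)
        for user_name, judge in judges.items():        # each end-user in SEU
            prob_ok, rec_ok = judge(facts, profile, ranking)
            feedback.append((case_name, user_name, prob_ok, rec_ok))
    return feedback


def toy_propagate(facts: Facts) -> Profile:
    """Stand-in for propagation through the user-interests network."""
    return {"aerospace": 0.9 if facts.get("Work_in_aerospace") else 0.2,
            "biomedical": 0.9 if facts.get("Work_in_biomedical") else 0.2}


def toy_judge(facts: Facts, profile: Profile, ranking: List[str]) -> Tuple[bool, bool]:
    """Stand-in for an end-user's common-sense judgement."""
    prob_ok = all(0.0 <= p <= 1.0 for p in profile.values())
    rec_ok = facts.get("Work_in_" + ranking[0], False)
    return prob_ok, rec_ok


cases = {"aerospace engineer": {"Work_in_aerospace": True, "Use_Fluent": True},
         "bioengineer": {"Work_in_biomedical": True, "Use_Fidap": True}}
feedback = run_soundness_tests(cases, toy_propagate,
                               ["aerospace", "biomedical"], {"user_1": toy_judge})
```

In the real procedure the two judgements come from people, and negative judgements are what drive the tuning of the probability tables.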

7. Strengths of the proposed approach and suitable application domains

A basic human-computer interaction element underlying the proposed approach is that the main aim of the website is to cooperate with a user (without bothering him/her) to build an accurate profile of him/her. The website aims at building an accurate user profile in order to create a collaborative relationship with the user over time, so that the user perceives the website as a collaborator watching over his/her interests and looking after his/her profile over time. In fact, even after the initial profile construction session, the website does not miss future favourable opportunities to invite the user to resume the dialogue in order to better refine his/her profile. This approach of involving the user as a partner in the process of building his/her profile, by asking him/her questions over time (without bothering him/her), is more suitable for websites specialized in certain areas of interest to a very specific community, such as, for example, a specific scientific community. In these cases, in general, the domain knowledge of the application expands more in depth than in breadth, so questions are not excessively numerous and are more specific, the answers are more significant because they refer to a specialized universe of knowledge, and the dialogue between the user and the website resembles a dialogue between two experts operating in the same field. As a consequence, in these cases it is less probable that a user gets bored when the website asks him/her some competent questions.

Another strength of the proposed approach is that the website has at its disposal a complete inferred profile of the user soon after the initial session, in which the user enters his/her user-facts. In other words, the website does not need a statistically significant number of accesses from the user to build a complete user profile. This is a very useful feature for those websites that have a low number of accesses, possibly because of the type of population accessing the website, the type of news or services provided by the website etc. For example, a deep-knowledge-based website, concerning a specific field and having the main purpose of promoting and disseminating the culture of that field, may have a user population mostly consisting of professional people operating in the same field (e.g. researchers in the field). In general, such users do not need to access the website for their everyday tasks. They access the website to read recent news. News concerns, in general, cultural events in the field (conferences, courses, books etc.), i.e. infrequent events which do not occur daily. Moreover, the website has an alerting service which users might prefer to use instead of directly accessing the website. For all these reasons the website may have a low number of accesses, but nevertheless the website needs to have at its disposal an accurate user profile (for providing personalized services) from the beginning of the user-website relationship. The presented approach represents a solution to such a problem.

Summarizing the considerations made so far, let us conclude that the approach presented in this paper is suitable for websites characterized by the following facts:

• they are based on specialist knowledge (such as for example scientific websites);
• the dialogue with users is on the basis of domain knowledge that the website and the users have in common;
• they may have a low number of accesses, but nevertheless they need to have at their disposal a complete inferred user profile soon after the user enters his/her initial data.

Conversely, in cases of websites dealing with a broad and generic domain of knowledge, a very large population of visitors and a statistically significant number of accesses, such as for example popular commercial websites, famous book-store websites etc., alternative approaches are used. In general, these approaches are based on collaborative filtering techniques or other techniques of user profiling based on user behaviour observation (i.e. implicit data acquisition), i.e. techniques that avoid any type of active involvement from users. Some such techniques will be reviewed in Section 8.

8. Related work and discussion

In this section we will discuss the ideas underlying the proposal in the wider context of human-computer interaction research, with a particular focus on user profiling. User profiling on the Web is a topic which has received much attention in several international journals, conferences and books. The topic has been approached in the light of various technologies and has found application in several heterogeneous fields. An exhaustive overview of the state of the art is beyond the scope of this paper. We will limit ourselves to comparing the proposal to some significant approaches. User profiling is typically either knowledge-based or behaviour-based. In knowledge-based approaches information about users is acquired explicitly through questionnaires or single questions. Behaviour-based approaches use implicit observations of user behaviour, commonly using machine-learning techniques to discover useful patterns in the behaviour (data mining techniques are applied to log files to extract patterns). We can also classify works on user profiling from two orthogonal points of view: the technology used to build profiles and the target task the profiles are built for. For example, technologies may be Bayesian networks, case-based reasoning, ontologies, data mining etc., while target tasks may be information retrieval, e-learning, e-commerce, recommender systems etc. According to these points of view, the proposal presented in the present paper may be classified as knowledge-based, using the Bayesian network technology and addressing the recommender system target task.

Godoy et al. (2004) propose two intelligent agents, PersonalSearcher and NewsAgent, that assist users in the tasks of filtering and organizing information available on the Web. PersonalSearcher (a personalized Web searcher) is an interface agent that helps users who are searching the Web for relevant information by filtering a set of documents retrieved from several search engines according to users' interests. NewsAgent (a personalized digital newspaper generator) is an interface agent that selects those articles that are relevant to a user from several online newspapers. The agents incrementally build a hierarchy (a tree) of users' relevant topics by means of a textual case-based reasoning technique (a specialization of case-based reasoning for textual documents). Profiles are adapted as agents interact with users over time. Both agents observe users' behaviour while they are reading Web documents, recording the main features characterizing these experiences. A user profile consists of a set of weighted topics relevant to a user. This approach is therefore behaviour-based, uses the case-based reasoning technology, and addresses the information retrieval and intelligent assistant target tasks. In our proposal, too, user profiles are represented in terms of topics, i.e. user interests. In our proposal, however, the topic hierarchy is represented by a network.

Wong and Butz (2000) propose a method for representing a user profile as a Bayesian network. Such a network is learned from a sample of documents that are judged by the user to be relevant or irrelevant. The proposed method addresses the information retrieval target task.

An approach addressing the information retrieval target task and integrating the case-based reasoning and Bayesian network techniques to build user profiles incrementally is presented in Schiaffino and Amandi (2000). Case-based reasoning provides a mechanism to acquire knowledge about user actions that are worth recording to determine his/her habits and preferences. Each case records the attributes or keywords used by a user to perform queries. A query is classified according to its similarity to previously recorded queries. The Bayesian network provides a tool to represent relationships between items of interest. It is built gradually as a user queries the database. Information stored in the form of cases is used to gradually build and update the Bayesian network as the interests of the user change over time. The user profile consists of a statistical profile (type of queries frequently made etc.) and an inferred (via the Bayesian network) profile. A profile is used to suggest the execution of relevant queries to a user at an appropriate moment. The authors focus their attention on users who need data stored in the database to fulfil their everyday tasks, or at least who often use the database. Conversely, our proposal focuses on a population of users who do not access a website very often. In our case the website has at its disposal a general Bayesian network (knowledge base) that is then instantiated to a specific user and generates a specific inferred profile, but does not need to be built incrementally. This is an advantage in applications where the user population does not produce a great number of accesses. In these cases the solution is precisely the knowledge-based one: asking users explicitly for some information, involving the user in a cooperative relationship. So, even soon after the first contact with a new user (registration phase and data entry of Figure 4) the website has at its disposal an inferred profile of the user. Another difference concerns the target task: our proposal addresses the task of news recommendation.



In Nokelainen et al. (2002) an adaptive online questionnaire system, EDUFORM, is presented. The proposal addresses the educational target task and uses Bayesian probabilistic models. The authors face the problem of 'one size fits all' on-line questionnaires equipped with numerous propositions. In particular, EDUFORM is a Web-based data gathering tool, which performs adaptive and dynamic optimization of the number of questionnaire propositions during the data gathering process. EDUFORM uses probabilistic Bayesian methods to create user profiles that are then used to dynamically optimize the set of propositions that are presented to a user in order to maximize information extraction. In our approach the goal of minimizing the number of questions and maximizing information extraction is achieved by using the value-of-information framework.
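To give a concrete feeling for how an entropy-based value of information can select the next question, the following sketch scores each candidate question by the expected reduction in the Shannon entropy of a single binary interest variable. It is a generic textbook-style illustration under stated assumptions, not the sequential asking algorithm of Section 4.4: the function names and the joint probabilities are hypothetical, and no cost of bothering the user is modelled.

```python
import math
from typing import Dict, Tuple


def entropy(p: float) -> float:
    """Shannon entropy (in bits) of a binary variable with P(yes) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def expected_entropy_reduction(joint: Dict[Tuple[bool, bool], float]) -> float:
    """joint[(answer, interest)] = P(answer, interest).

    Returns H(interest) minus the expected H(interest | answer).
    """
    p_interest = joint[(True, True)] + joint[(False, True)]
    prior = entropy(p_interest)
    posterior = 0.0
    for answer in (True, False):
        p_answer = joint[(answer, True)] + joint[(answer, False)]
        if p_answer > 0:
            posterior += p_answer * entropy(joint[(answer, True)] / p_answer)
    return prior - posterior


# Two hypothetical candidate questions about the same interest variable.
candidates = {
    # The answer is strongly correlated with the interest.
    "informative": {(True, True): 0.45, (True, False): 0.05,
                    (False, True): 0.05, (False, False): 0.45},
    # The answer is independent of the interest.
    "uninformative": {(True, True): 0.25, (True, False): 0.25,
                      (False, True): 0.25, (False, False): 0.25},
}

gains = {name: expected_entropy_reduction(j) for name, j in candidates.items()}
best = max(gains, key=gains.get)
```

A question whose answer is independent of the interest yields zero expected reduction, so asking it would only cost the user effort; this is the intuition behind asking few but informative questions.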

The approach presented in Adomavicius and Tuzhilin (1999, 2001) is well suited to the e-commerce target task. It is behaviour-based and uses data mining methods. More precisely, the authors present a method for constructing user behavioural profiles using data mining techniques. Profiles are specified with sets of rules learned from transactional histories. Since many rules can be spurious, irrelevant or trivial, a method for validating them, separating good rules from bad ones, is presented.

Data mining techniques are also used in Nasraoui et al. (2002), where the authors present a framework (based on fuzzy relational clustering) for mining typical user profiles from the vast amount of historical data stored in server access logs.

Soltysiak and Crabtree (1998) use a heuristics-based clustering method to generate user interest profiles. The method is applied to the e-mails a user sends or receives, and the WWW pages he/she browses. They use a keyword extraction technique for identifying relevant keywords within a document. A document is then represented as a vector of keywords. The vectors are used as the basis for grouping documents into clusters. A sufficiently large cluster of documents represents a user's interest.

The same authors (Crabtree & Soltysiak, 1998) address the information retrieval target task, focusing in particular on tracking interest themes over time through measuring the similarity of interest themes across time periods.

Heuristics-based techniques are also used in Kostoff et al. (2001) and Rousseau et al. (2004), addressing the intelligent assistant and information retrieval target tasks respectively.

In Esposito et al. (2003) the effectiveness of two supervised methods for learning user profiles, inductive logic programming and a Bayesian classifier, is compared. The comparison focuses on the two different learning strategies used to infer models of user interests from textual book descriptions. Experiments are conducted in the context of a content-based profiling system for a virtual bookshop on the Web.

Recently particular attention has been paid to the use of ontology-based technologies in user profiling. Middleton et al. (2004) explore an ontological approach to user profiling within recommender systems, working on the problem of recommending online academic research papers. They present two experimental systems that create user profiles from unobtrusively monitored behaviour and relevance feedback, representing the profiles in terms of a research-paper topics ontology. Papers are classified using ontological classes. The database of research papers is classified using a research-paper topics ontology and a set of training examples. Recorded Web browsing and relevance feedback elicited from users are used to compute daily profiles of users' research interests. Interest profiles are represented in ontological terms, allowing other interests to be inferred. The interest profiles are visualized to allow elicitation of direct profile feedback.

The same authors address the use of ontologies in recommender systems (Middleton et al., 2001), and in Middleton et al. (2003) they explore the idea of profile visualization to capture further knowledge about user interests.

Let us classify in Table 1 the set of papers considered so far. The papers, along with the present proposal, are located in the table according to the technology and the target task by which they are characterized. Although the set of applications that have been considered is not exhaustive, the table gives an idea of the variety of technological approaches and target tasks of user profiling. It also gives an idea of both the enormous amount of work done in the user profiling field and the difficulties underlying the problem of building accurate user profiles.

Table 1: The works examined classified according to the technology they use and the target task they address

Target tasks (table columns): e-Commerce; Information retrieval; Recommender system; e-Learning and education; Intelligent assistant

Technologies (table rows) and the works listed in each row:
Ontologies: Middleton et al., 2004; Middleton et al., 2001; Middleton et al., 2003
Bayesian networks: Wong & Butz, 2000; (a)
Case-based reasoning: Godoy et al., 2004 (in two target-task columns)
Case-based reasoning and Bayesian networks: Schiaffino & Amandi, 2000
Bayesian probabilistic models: Nokelainen et al., 2002
Data mining and rule discovery: Adomavicius & Tuzhilin, 1999; Adomavicius & Tuzhilin, 2001
Web log data mining and fuzzy clustering: Nasraoui et al., 2002
Heuristics-based clustering: Crabtree & Soltysiak, 1998; Rousseau et al., 2004; Soltysiak & Crabtree, 1998; Kostoff et al., 2001
Inductive logic programming: Esposito et al., 2003
Bayesian classification: Esposito et al., 2003

(a) Work presented in this paper.

9. Conclusions

This paper has presented a user profiling approach based on a deep-knowledge model of user interests (i.e. domain topics) and a sequential asking algorithm using the value-of-information theory based on Shannon entropy. The approach has been implemented and tried out for over a year in a real context. Experimental results have indicated that the proposal is particularly suitable for websites addressing domains with specific deep knowledge (like scientific websites) and with a user population that is characterized by sharing the same knowledge and producing a low number of accesses (i.e. low compared with the high number of accesses of a typical commercial website). The approach has been discussed in the context of the scientific literature concerning human-computer interaction and in particular user profiling. The discussion has highlighted that the proposed approach differs from others mostly because of its emphasis on involving users in collaborative processes for building and refining their profiles.

Acknowledgements

In alphabetical order, thanks to Dr Enrico Cavalli for the integration of the prototype into the portal running on the server computer, thanks to Dr Paolo Ramieri for his role of CFD expert providing CFD knowledge, thanks to the anonymous reviewers for their valuable comments, and thanks to the users involved for their cooperation and feedback.

References

ADOMAVICIUS, G. and A. TUZHILIN (1999) User profiling in personalized applications through rule discovery and validation, in Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, CA, 377-381.

ADOMAVICIUS, G. and A. TUZHILIN (2001) Using data mining methods to build customer profiles, IEEE Computer, 34 (2), 74-82.

BILLSUS, D., C.A. BRUNK, C. EVANS, B. GLADISH and M. PAZZANI (2002) Adaptive interfaces for ubiquitous Web access, Communications of the ACM, 45 (5), 34-38.

BRUSILOVSKY, P. and M.T. MAYBURY (2002) From adaptive hypermedia to the adaptive Web, Communications of the ACM, 45 (5), 30-33.

CRABTREE, I.B. and S.J. SOLTYSIAK (1998) Identifying and tracking changing interests, International Journal of Digital Libraries, 2, 38-53.

ESPOSITO, F., G. SEMERARO, S. FERILLI, M. DEGEMMIS, N. DI MAURO, T.M.A. BASILE and P. LOPS (2003) Evaluation and validation of two approaches to user profiling, in Proceedings of the ECML/PKDD-2003 First European Web Mining Forum, B. Berendt, A. Hotho, D. Mladenic, M. van Someren, M. Spiliopoulou and G. Stumme (eds).

FINK, J., J. KOENEMANN, S. NOLLER and I. SCHWAB (2002) Putting personalization into practice, Communications of the ACM, 45 (5), 41-42.

GODOY, D., S. SCHIAFFINO and A. AMANDI (2004) Interface agents personalizing Web-based tasks, Cognitive Systems Research, 5, 207-222.

GORRY, G.A. and G.O. BARNETT (1985) Experience with a model of sequential diagnosis, in Computer-assisted Medical Decision Making, J.A. Reggia and S. Tuhrim (eds), Berlin: Springer, Vol. 1, pp. 206-222.

HECKERMAN, D.E., E.J. HORVITZ and B.N. NATHWANI (1992) Toward normative expert systems, Part I: The Pathfinder project, Methods of Information in Medicine, 31 (2), 90-105.

HIRSH, H., C. BASU and B.D. DAVISON (2000) Learning to personalize, Communications of the ACM, 43 (8), 102-106.

HORVITZ, E., A. JACOBS and D. HOVEL (1999) Attention-sensitive alerting, in Proceedings of UAI '99 Conference on Uncertainty in Artificial Intelligence, 305-313.

JENSEN, F.V. (2001) Bayesian Networks and Decision Graphs, Berlin: Springer.

KOSTOFF, R.N., J.A. DEL RIO, J.A. HUMENIK, E.O. GARCIA and A.M. RAMIREZ (2001) Citation mining: integrating text mining and bibliometrics for research user profiling, Journal of the American Society for Information Science and Technology, 52 (13), 1148-1156.

MIDDLETON, S.E., D.C. DE ROURE and N.R. SHADBOLT (2001) Capturing knowledge of user preferences: ontologies in recommender systems, in Proceedings of the International Conference on Knowledge Capture K-CAP 2001, Victoria, B.C., Canada.

MIDDLETON, S.E., N.R. SHADBOLT and D.C. DE ROURE (2003) Capturing interest through inference and visualization: ontological user profiling in recommender systems, in Proceedings of the International Conference on Knowledge Capture K-CAP 2003, Sundial Beach Resort, Sanibel Island, FL.

MIDDLETON, S.E., N.R. SHADBOLT and D.C. DE ROURE (2004) Ontological user profiling in recommender systems, ACM Transactions on Information Systems, 22 (1), 54-88.

MUSSI, S. (2003) Providing websites with capabilities of one-to-one marketing, Expert Systems, 20 (1), 8-19.

NASRAOUI, O., R. KRISHNAPURAM, A. JOSHI and T. KAMDAR (2002) Automatic Web user profiling and personalization using robust fuzzy relational clustering, in e-Commerce and Intelligent Methods, J. Segovia, P. Szczepaniak and M. Niedzwiedzinski (eds), Studies in Fuzziness and Soft Computing, Berlin: Springer.

NOKELAINEN, P., H. TIRRI, M. MIETTINEN and T. SILANDER (2002) Optimizing and profiling users online with Bayesian probabilistic modeling, in Proceedings of the NL 2002 Conference.

PEARL, J. (1988) Probabilistic Reasoning in Intelligent Systems, San Mateo, CA: Morgan Kaufmann.

ROUSSEAU, B., P. BROWNE, P. MALONE and M. Ó FOGHLÚ (2004) User profiling for content personalization in information retrieval, in Proceedings 19th ACM Symposium on Applied Computing, SAC (Nicosia, Cyprus).

SCHIAFFINO, S.N. and A. AMANDI (2000) User profiling with case-based reasoning and Bayesian networks, in Proceedings International Joint Conference IBERAMIA-SBIA, 12-21.

SHANNON, C.E. and W. WEAVER (1949) The Mathematical Theory of Communication, Urbana, IL: University of Illinois Press.

SOLTYSIAK, S.J. and I.B. CRABTREE (1998) Automatic learning of user profiles towards the personalization of agent services, BT Technology Journal, 16 (3).

VON WINTERFELDT, D. and W. EDWARDS (1986) Decision Analysis and Behavioral Research, Cambridge: Cambridge University Press.

WONG, S.K.M. and C.J. BUTZ (2000) A Bayesian approach to user profiling in information retrieval, Technology Letters, 4 (1), 50-56.

The author

Silvano Mussi

Silvano Mussi graduated in physics in 1975 from the University of Milan, Italy. He worked at ITALTEL for ten years in the fields of software engineering and functional discrete simulations of real-time systems. He has been with CILEA since 1981. He has cooperated in research activities with Milan Polytechnic and Brescia University, where for three academic years he was contract-professor of artificial intelligence. For over ten years he has been doing research in the fields of knowledge engineering and expert systems. His current research interests address methods for providing websites with capabilities of decision-making and reasoning under conditions pervaded with uncertainty.

