web profiling shannon 2006

Upload: yaron-samid

Post on 30-May-2018

  • 8/9/2019 Web Profiling Shannon 2006

    User profiling on the Web based on deep knowledge and sequential questioning

    Silvano Mussi

    CILEA (Interuniversity Consortium for Information and Communication Technologies), via R.

    Sanzio 4, 20090 Segrate-Mi, Italy

    E-mail: [email protected]

    Abstract: User profiling on the Web is a topic that has attracted a great number of technological approaches and applications. In most user profiling approaches the website learns profiles from data implicitly acquired from

    user behaviours, i.e. observing the behaviours of users with a statistically significant number of accesses. This

    paper presents an alternative approach. In this approach the website explicitly acquires data from users, user

    interests are represented in a Bayesian network, and user profiles are enriched and refined over time. The profile

    enrichment is achieved through a sequential asking algorithm based on the value-of-information theory using the

    Shannon entropy concept. However, what mostly characterizes the approach is the fact that the user is involved in

    a collaborative process of profile building. The approach has been tried out for over a year in a real application.

    On the basis of the experimental results the approach turns out to be particularly suitable for applications where

    the website is strongly based on deep domain knowledge (as for example is the case for scientific websites) and has

    a community of users that share the same domain knowledge of the website and produce a low number of accesses (low compared to the high number of accesses of a typical commercial website). After presenting the

    technical aspects of the approach, we discuss the underlying ideas in the light of the experimental results and the

    literature on human-computer interaction and user profiling.

    Keywords: user profiling, deep knowledge, value of information, Bayesian networks, sequential asking, Shannon entropy

    1. Introduction

    In order for a website to be able to be collabora-

    tive with its users (e.g. to be adaptive (Billsus

    et al., 2002; Brusilovsky & Maybury, 2002), to exhibit a personalized behaviour (Fink et al.,

    2002), to produce personalized alerting (Horvitz

    et al., 1999) etc.), it is necessary that the website

    knows the user interests at a sufficiently fine level

    of refinement. In other words, the website has to

    acquire the user profile. This paper presents an

    approach to the acquisition and construction of

    user profiles. A plain approach to this problem

    consists in making the website directly acquire

    from the user all the information it needs, e.g. by asking the user to fill in a sort of questionnaire such as: "Are you interested in the topic X? If yes, to what extent?" And so forth, for each possible

    topic X. However, such an approach could result

    in bothering the user both because there are too many questions and because some questions

    might turn out to be not very easy for the user.

    In order to overcome such a difficulty a website

    has to be able to infer hypotheses about multiple

    user interests on the basis of little and simple

    information easily provided by the user, i.e.

    information concerning simple facts1 rather than

    information entailing a certain mental effort

    1 If we consider the medical domain, simple facts provided by the patient are, for example, "I have fever", "I have a pain in my stomach" etc.

    from the user (e.g. information resulting from

    mental activities like judgment, introspection,

    abstraction, reasoning etc.). However, it may

    happen that after these initial questions the

    information collected is not sufficient to infer

    user interests with a low margin of uncertainty.

    A website has therefore to consider the possibi-

    lity of asking additional questions, questions

    whose answers may be, in general, not very easy

    to provide. In other words, because these ques-

    tions are not as easy as the initial ones, there is

    the possibility that a user may feel bothered

    when he/she is asked them. The questions should therefore be asked only if it is necessary,

    and they cannot be asked all and indistinctly at

    the same time, given that each question entails a

    trade-off between expected benefit and expected

    bother. To solve such a problem, the paradigm

    of sequential asking is proposed and applied.

    The paper is organized as follows. Section 2

    introduces, as a background topic, the role of

    deep knowledge in user profiling, while Section 3

    introduces, as a foreground topic, the role of

    sequential asking in user profiling. Section 4

    presents the technical aspects of the proposal

    focusing on the paradigm of sequential asking.

    Section 5 illustrates a significant experimental application, while Section 6 presents the experimental results and gives some remarks on them.

    In Section 7 some strengths of the proposed

    approach are pointed out. Section 8 discusses

    the proposal in the context of human-computer

    interaction research and related work. Finally,

    Section 9 draws some conclusions.

    2. Background: deep-knowledge-based user

    profiling

    Inferring user interests starting from acquired

    facts may be carried out by using a deep-

    knowledge-based approach. In the deep-knowl-

    edge-based approach to user profiling, deep

    knowledge is represented by causal knowledge

    linking user facts to user interests and organizing

    the user interests themselves in causal paths. In

    practice, user interests coincide with the topics of the domain knowledge. Let us notice that, in

    order to be realistic, interests inference cannot

    avoid taking uncertainty into account. As a

    consequence, Bayesian networks (Pearl, 1988;

    Jensen, 2001) turn out to be an ideal tool to

    represent deep knowledge and to reason with it

    under uncertainty. A site provided with deep

    knowledge does not need a statistically signifi-

    cant number of accesses before exhibiting a

    personalized behaviour, as is required from

    similarity-based approaches. Similarity-based

    approaches like content-based or collaborative

    filtering approaches (Hirsh et al., 2000) are, in a

    sense, based on shallow knowledge: an item is

    supposed to be relevant for the user if its content

    is similar to the content of the items he/she preferred in the past (content-based approach)

    or if other users with a similar taste preferred it

    in the past (collaborative filtering approach). So,

    a site using these approaches does not know the

    deep reasons why a user prefers an item: it lacks

    deep knowledge, i.e. it lacks causal knowledge

    linking the user preferences (effects) to the user

    goals (causes).

    Turning to Bayesian networks, let us face the

    problem of building the Bayesian network of the

    interests of a user working, or, more generally,

    operating, in a well-defined domain of real life, characterized by precise domain knowledge. Let

    us think, for example, of the medical domain, the

    computer science domain etc. Inside a domain of

    knowledge (for short, in the following, domain),

    multiple topics may be identified. Let us adopt

    the basic assumption that domain topics repre-

    sent possible user interests (in the respective

    topics). In general, working in a certain field

    involves pursuing certain goals, and pursuing a

    certain goal involves using certain tools. So we have topics concerning fields (field-topics), goals

    (goal-topics) and tools (tool-topics). As a con-

    sequence, being interested in a certain field-topic

    involves being interested in certain goal-topics,

    and being interested in a certain goal-topic

    involves being interested in certain tool-topics.

    We have therefore implicitly built a causal

    network of interests. What we need now is to

    quantify uncertainty. So let us ask the domain

    expert to provide conditional probabilities, i.e. P(interest in topic X | interest in topic Y). We

    have therefore obtained the Bayesian network of

    interests in the domain topics for an abstract user.

    In order to be able to personalize the network to a

    specific user we need to add user-facts, i.e. facts

    that affect interests or are consequences of

    interests. Among user-facts let us distinguish

    between facts concerning tools (tool-facts) and

    facts concerning fields (field-facts). A tool-fact

    concerns the use of a tool. For example, "the user uses the software S" is a tool-fact. A tool-fact is

    caused by the interest in the related tool, e.g. the

    fact that the user uses S results from the fact that

    the user is interested in S. Let us ask the expert

    to provide the probability of using a tool given

    the interest in it, e.g. P(the user uses the tool S |

    the user is interested in the tool S).2 A tool-fact

    plays the role of a symptom of the presence (in

    the mind of the user) of interest in the related

    tool-topic. A field-fact concerns a user context.

    For example, "the user works in the field of software development" is a field-fact. A field-fact

    affects the related field-interest, e.g. working in

    the field of software development affects interest

    in the field of software development. Let us ask

    the expert to provide the probability that a user

    is interested in a field given the fact that the user

    works in a certain field, e.g. P(the user is interested in the field of software development |

    the user works in the field of software develop-

    ment). A field-fact plays the role of a situation

    conditioning the probability of the presence (in

    the mind of the user) of interest in the related

    field-topic (it is the case of anamnesis in the me-

    dical field). In the following, for short, interests

    in field-topics, goal-topics and tool-topics will be

    called field-interests, goal-interests and tool-

    interests respectively. Let us now summarize the considerations made so far by representing in

    Figure 1 the conceptual structure of the user-

    interests network. For simplicity let us consider

    two states for each node, i.e. y (yes state), n

    (no state). See Mussi (2003) for more details on

    a knowledge-based approach to user profiling.
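
    The structure summarized in Figure 1 can be illustrated with a minimal sketch, assuming a single chain field_fact -> field_interest -> goal_interest -> tool_interest -> tool_fact of two-state nodes. All probability values below are invented for illustration (in the approach they would be elicited from a domain expert), and the brute-force enumeration only stands in for a real inference engine.

```python
from itertools import product

# Hypothetical numbers, for illustration only.
P_FIELD_FACT = 0.5  # prior P(field_fact = y)

# Each table maps the parent's state to P(child = y | parent).
CPT = {
    "field_interest": {True: 0.9, False: 0.2},   # parent: field_fact
    "goal_interest": {True: 0.8, False: 0.1},    # parent: field_interest
    "tool_interest": {True: 0.7, False: 0.05},   # parent: goal_interest
    "tool_fact": {True: 0.6, False: 0.01},       # parent: tool_interest
}
NODES = ["field_fact", "field_interest", "goal_interest",
         "tool_interest", "tool_fact"]


def joint(state):
    """Joint probability of one full assignment {node: bool} along the chain."""
    p = P_FIELD_FACT if state["field_fact"] else 1.0 - P_FIELD_FACT
    for parent, child in zip(NODES, NODES[1:]):
        p_yes = CPT[child][state[parent]]
        p *= p_yes if state[child] else 1.0 - p_yes
    return p


def posterior(query, evidence):
    """P(query = y | evidence), summing the joint over all assignments."""
    num = den = 0.0
    for values in product([True, False], repeat=len(NODES)):
        state = dict(zip(NODES, values))
        if any(state[node] != value for node, value in evidence.items()):
            continue  # assignment contradicts the entered facts
        p = joint(state)
        den += p
        if state[query]:
            num += p
    return num / den


prior_goal = posterior("goal_interest", {})
with_field = posterior("goal_interest", {"field_fact": True})
with_both = posterior("goal_interest", {"field_fact": True, "tool_fact": True})
print(prior_goal, with_field, with_both)
```

    In this toy chain, entering a field-fact raises the belief in the goal-interest, and a confirming tool-fact raises it further, which is the qualitative behaviour described above.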

    3. Foreground: question-asking-based user

    profiling

    Among user-facts we can distinguish between

    facts provided by the user without any effort at

    all, i.e. facts definitely easy to provide (for short,

    easy-facts), and facts provided by the user after a

    certain mental process, a certain reasoning ef-

    fort, i.e. facts less easy to provide (for short, less-

    easy-facts). For simplicity let us limit our analysis to consider less-easy-facts concerning tools (i.e. tool-facts less easy to provide). For example, let us consider the question "Do you use X?". The

    answer is easy if X is an object such as for ex-

    ample a certain computer or a certain software

    product etc., but it might not be so easy if X

    denotes an approach to a problem or a frame-

    work etc. In fact the user might have developed a

    solution that is not just a mere instantiation of a

    well-known approach but, for example, takes

    some ideas from a certain approach, some other ideas from other approaches, and then introduces some original variants.3

    [Figure 1 diagram; node labels: tool_fact, tool_interest, goal_interest, field_interest, field_fact, tool_fact_1, tool_interest, goal_interest, field_interest, field_fact_1]

    Figure 1: Conceptual structure of the user-interests network. The pure interests network (represented by nodes of type field-interest, goal-interest, tool-interest) has been enriched with nodes of type field-fact and tool-fact. The figure shows an abstract network, with only two nodes for each type. Figure 3 will show a real instance.

    2 Let us note that in real-world applications the presence of interest in a tool is not sufficient to make the user use it, especially in cases in which the user has at his/her disposal possible alternative tools for pursuing the same goal.

    On the basis of

    these considerations we conclude that a website

    cannot avoid considering the possibility that a

    user be bothered by having to answer a question

    that entails a certain mental effort (reasoning,

    analysis, assessment etc.). So, whereas easy-facts

    may be asked without restrictions, less-easy-

    facts should be asked only if it is worth doing it,

    taking into account the probability of bothering

    the user. In the following, a question for acquir-

    ing an easy-fact will be called, for short, an easy-

    question, whereas a question for acquiring a

    less-easy-fact will be called a less-easy-question.

    The probability of bothering the user because of

    a less-easy-question is affected by the topic of the

    question itself. If the question concerns a topic

    the user is not interested in, there is a higher

    probability that the user gets bothered by the

    question. The fact that the user is not interested

    in the question topic might be due to the fact that

    the field he/she works in does not concern that topic. In such a case he/she might perceive a sense of logical discontinuity with his/her previously entered easy-facts and might not know that topic very well or even at all. So in conclusion he/she is more likely to get bothered if the question concerns a topic he/she is not interested in. These considerations prompt us to

    consider a bother-effect network beside the user-

    interests network. The bother-effect network

    simply consists of three nodes (each one with

    two states y, n): bother, bother_af, interest-

    ed. Let us examine their meanings and how they

    are connected. The bother node (which stands for "the user is bothered") and the bother_af node (which stands for "after the question the user is bothered") represent the current status (bothered or non-bothered) of the current user respectively before and after a question is asked. They are therefore linked by the cause-effect relation: bother → bother_af. The interested node (which stands for "the user is interested in the topic of the question") plays the role of a

    conditioner node for the conditional probability

    table of the bother_af node. In other words, it

    modulates the probability that the user is in the

    state bother after the question. So, in formal

    terms, we have that P(bother_af = y | bother = n, interested = y) is lower than P(bother_af = y | bother = n, interested = n).4 Let us now summarize the considerations made so far by representing

    in Figure 2 the structure of the bother-effect

    network.

    4. Enriching and refining user profiles

    The website starts the profile building process by

    asking the user to enter a set of easy-facts and

    then by propagating them through the user-

    interests network, inferring in this way the user

    profile, or in other words inferring, for each

    topic, the probability that the interest in the

    topic is present in the user's mind. At this stage

    the question arises: is the inferred profile sufficiently defined? That is, have the user-interests been inferred with a sufficiently low uncertainty

    margin? In practice, is it necessary or not to ask

    one or more less-easy-questions in order to

    capture further information from the user and

    infer a more accurate profile? The answer is that

    less-easy-questions should be asked only if it is

    worth doing it. Let us define the concept of "it is worth doing it".

    [Figure 2 diagram; node labels: interested, bother, bother_af]

    Figure 2: Structure of the bother-effect network.

    3 The distinction between easy-facts and less-easy-facts is typical in the medical domain. In fact, information provided by the patient at the beginning of a medical examination is an easy-fact, whereas information coming from a clinical test is a less-easy-fact. For example, the answer to the question "Do you have fever?" is an easy-fact, but the answer to the question "Is the bilirubin level in your blood normal?" is a less-easy-fact (you have to undergo a blood test, you have to pay for it, it takes some time etc.).

    4 Obviously, P(bother_af = y | bother = y) = 1.

    In a broad sense, the purpose of asking questions is to know more about user-interests, or, more precisely, to decrease

    uncertainty about user-interests. In order to

    simplify the problem we wonder if among all the

    user-interests there is a subset of interests for

    which the margin of uncertainty is particularly

    important to be low. To this end let us notice

    that, since goals are at the basis of user actions,

    goal-interest nodes (i.e. goal-topics) are strate-

    gic, and as a consequence we aim at having

    low uncertainty about these nodes. So a less-

    easy-question is worth asking if, in spite of the

    unavoidable risk of bothering the user, it is

    expected to produce a significant decrease of

    uncertainty about the goal-interest nodes. Un-

    certainty about the states of a node is well

    represented by the concept of entropy (Shannon

    & Weaver, 1949), which is the hub concept of

    information theory. So a less-easy-question is

    worth asking if the expected decrease in entropy

    of the goal-interest nodes is significant enough to

    compensate the unavoidable risk of bothering

    the user. The next section will formally define the

    problem in the decision theory framework (von

    Winterfeldt & Edwards, 1986) based on entropy.

    4.1. Benefit expected from a less-easy-question

    Let $G$ be a node and let $g$ denote a state of $G$, for short $g \in G$. The entropy of $G$ is given by

    \[ \mathrm{ENT}(G) = -\sum_{g \in G} P(g) \log_2 P(g) \tag{1} \]

    Let us use the value function $-\mathrm{ENT}$, which increases with preference. Let $Q$ be the set of less-easy-questions and let $q$ be a less-easy-question, i.e. $q \in Q$. Let $J_q$ be the set of possible answers to $q$, and let $j_q$ be an answer to $q$, i.e. $j_q \in J_q$. The value, with respect to the node $G$, of the information $j_q$ is given by $V_G(j_q)$:

    \[ V_G(j_q) = -\mathrm{ENT}(G \mid j_q) \tag{2} \]

    The expected value, with respect to the node $G$, of the question $q$ is given by $\mathrm{EV}_G(q)$:

    \[ \mathrm{EV}_G(q) = \sum_{j_q \in J_q} V_G(j_q)\, P(j_q) \tag{3} \]

    So the expected benefit of $q$, with respect to the node $G$, is given by

    \[ \mathrm{EB}_G(q) = \mathrm{EV}_G(q) + \mathrm{ENT}(G) \tag{4} \]

    Let us consider a set of nodes $G$. If we assume that each node has the same weight of importance, the overall expected benefit from a question $q$ is given by

    \[ \mathrm{EB}(q) = \sum_{G} \mathrm{EB}_G(q) \tag{5} \]

    Let us now turn back to the user-interests

    network. Let G denote a goal-interest node. For

    each question $q \in Q$ let the set $J_q$ of the possible answers be {yes, no}. If the user answers "yes" ("no") to $q$, we set the related tool-fact node to y (n). For example, let us consider the less-easy-question "Do you use X?" and the related tool-fact node use_X (standing for "the user uses X"). If the user answers "yes", use_X is set to the state y; if he/she answers "no", use_X is set to the state n. So (2) becomes

    \[ V_G(\text{``Do you use X?''} = \text{yes}) = -\mathrm{ENT}(G \mid \mathit{use\_X} = y) \tag{2$'$} \]

    \[ V_G(\text{``Do you use X?''} = \text{no}) = -\mathrm{ENT}(G \mid \mathit{use\_X} = n) \tag{2$''$} \]

    Moreover, the probability of using X given the interest in X represents the probability of obtaining from the user the answer "yes" to the question "Do you use X?" given the interest in X, i.e. we do not consider lying. So (3) becomes

    \[ \mathrm{EV}_G(q) = -\mathrm{ENT}(G \mid \mathit{use\_X} = y)\, P(\mathit{use\_X} = y) - \mathrm{ENT}(G \mid \mathit{use\_X} = n)\, P(\mathit{use\_X} = n) \tag{3$'$} \]
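
    The quantities of Section 4.1 can be computed directly for a two-state goal-interest node and a yes/no question. The following is a minimal sketch under that assumption; the function names and the numeric values are hypothetical, and in a real system the conditional beliefs would come from propagating each candidate answer through the user-interests network.

```python
import math


def entropy(p_yes):
    """Shannon entropy (in bits) of a two-state node with P(y) = p_yes."""
    return -sum(p * math.log2(p) for p in (p_yes, 1.0 - p_yes) if p > 0.0)


def expected_benefit(p_g, p_g_given_yes, p_g_given_no, p_yes):
    """Expected entropy decrease of node G from asking a yes/no question.

    p_g:            current P(G = y)
    p_g_given_yes:  P(G = y | answer = yes)
    p_g_given_no:   P(G = y | answer = no)
    p_yes:          P(answer = yes)
    """
    # Expected value of the question, using -ENT as the value function.
    ev = -entropy(p_g_given_yes) * p_yes - entropy(p_g_given_no) * (1.0 - p_yes)
    # Benefit = expected value after the answer minus the value before it.
    return ev + entropy(p_g)


# Hypothetical beliefs; p_g must equal the mixture of the two conditionals.
p_yes = 0.4
p_g_given_yes, p_g_given_no = 0.9, 0.2
p_g = p_g_given_yes * p_yes + p_g_given_no * (1.0 - p_yes)
eb = expected_benefit(p_g, p_g_given_yes, p_g_given_no, p_yes)
print(round(eb, 4))
```

    When the answer would not change the belief in G at all (both conditionals equal to the current belief), the expected benefit is zero; for several goal-interest nodes the per-node benefits would simply be summed.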

    4.2. Bother-effect expected from a less-easy-

    question

    From the user-interests network we can obtain,

    through (5), the expected benefit of the question

    q: "Do you use X?". However, in order to assess if q is worth asking or not, we have to take into account the probability that, after q is asked,

    the mental state of the user is bothered (because

    of q). In order to properly calculate this

    probability we have to take into account two

    influence factors: the first concerns a sort of

    cumulative effect in cases of multiple questions,

    and the second concerns the influence of the

    question topic. Let us examine the first factor.

    The probability that the user is bothered

    because of q is higher after the second question

    than after the first, and so forth. Let us model

    this cumulative effect in the following way. Let

    us initially start with P(bother = y) = 0; then each time a question is asked let us replace the

    probability distribution of the bother node with

    that of the bother_af node. Let us now pass to

    examine the second factor.

    The probability that the user is interested in

    the question topic is given by the probability

    distribution of the related tool-interest node

    (Figure 1). So, given the question q, "Do you use X?", let us replace the prior probability distribution of the node interested (Figure 2) with the probability distribution of the tool-interest node "the user is interested in X" (Figure 1). After

    having assigned the proper probability distribu-

    tions to the nodes bother and interested, let us

    propagate. The expected bother-effect of q is therefore given by the value of P(bother_af = y) resulting from the propagation. Finally, let us note that P(bother_af = y) > P(bother = y).
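
    The propagation through the three-node bother-effect network amounts to a marginalization over the two parent nodes. The sketch below assumes a hypothetical conditional probability table (the source does not report its values) and shows the cumulative effect obtained by feeding the resulting P(bother_af = y) back as the next prior of the bother node.

```python
# Hypothetical CPT: P(bother_af = y | bother, interested), keyed by
# (bother, interested). A user who is already bothered stays bothered.
P_BOTHER_AF = {
    (True, True): 1.0,
    (True, False): 1.0,
    (False, True): 0.1,
    (False, False): 0.4,
}


def expected_bother(p_bother, p_interested):
    """P(bother_af = y), marginalizing over the bother and interested nodes."""
    total = 0.0
    for bother in (True, False):
        for interested in (True, False):
            p_b = p_bother if bother else 1.0 - p_bother
            p_i = p_interested if interested else 1.0 - p_interested
            total += P_BOTHER_AF[(bother, interested)] * p_b * p_i
    return total


p_bother = 0.0  # before the first question
history = []
for p_tool_interest in (0.8, 0.8, 0.3):  # hypothetical tool-interest beliefs
    # After each question, bother takes over the distribution of bother_af.
    p_bother = expected_bother(p_bother, p_tool_interest)
    history.append(p_bother)
print(history)
```

    With these invented numbers the probability of bother grows with every question, and it grows faster when the question concerns a topic the user is unlikely to be interested in.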

    4.3. Profit expected from a less-easy-question

    In order to obtain a single number quantifying

    the opportunity of asking a less-easy-question

    q, we have to combine, in terms of utility,

    the expected benefit (entropy decrease) with the expected bother-effect of q. Let us consider

    the utility function of the entropy of a node G. If

    the entropy of G is minimum, i.e. 0, the utility is

    maximum, i.e. 1. If the entropy of G is maxi-

    mum, i.e. 1, the utility is minimum, i.e. 0. So,

    considering, for simplicity, a linear utility func-

    tion, we have that the expected benefit in terms

    of utility is equal to the expected benefit in terms

    of pure entropy. Let us now pass to consider

    the utility function of the bother-effect. If the bother-effect is absent, i.e. the state of the node

    bother_af is n, the utility is maximum, i.e. 1. If

    the bother-effect is present, i.e. the state of the

    node bother_af is y, the utility is minimum, i.e.

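    0. So, the expected bother-effect in terms of utility is given by

    \[ P(\mathit{bother\_af} = y \mid q) \cdot 0 + P(\mathit{bother\_af} = n \mid q) \cdot 1 = 1 - P(\mathit{bother\_af} = y \mid q). \]

    Let us call the expected cost of q the quantity

    \[ \mathrm{EC}(q) = P(\mathit{bother\_af} = y \mid q) \tag{6} \]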

    Finally, let us notice that the entropy utility

    and the bother-effect utility have, in general,

    different importance. Let us therefore consider

    the importance weights $k_{\mathrm{ENT}}$ and $k_{\mathrm{dist}}$, where $k_{\mathrm{ENT}}$ represents the importance weight assigned to the entropy and $k_{\mathrm{dist}}$ represents the importance weight assigned to the bother-effect.5 On the basis of all these considerations, the expected net advantage, which will be called expected profit (EP), from a question q is given by

    \[ \mathrm{EP}(q) = \mathrm{EB}(q)\, k_{\mathrm{ENT}} - \mathrm{EC}(q)\, k_{\mathrm{dist}} \tag{7} \]

    where $k_{\mathrm{ENT}} + k_{\mathrm{dist}} = 1$. In conclusion, a question q is worth asking if EP(q) > 0. Moreover, the question with the maximum EP is the preferred one.
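
    The decision rule of this section reduces to a weighted difference and a maximum. A minimal sketch follows, assuming the expected benefit and expected cost of each candidate question have already been computed; the question names, the numeric values and the weight 0.6 are hypothetical.

```python
def expected_profit(eb, ec, k_ent):
    """Weighted net advantage: EB * k_ENT - EC * k_dist, with k_dist = 1 - k_ENT."""
    if not 0.0 <= k_ent <= 1.0:
        raise ValueError("k_ent must lie in [0, 1]")
    k_dist = 1.0 - k_ent
    return eb * k_ent - ec * k_dist


def best_question(candidates, k_ent):
    """Return (name, EP) of the question with maximum EP, or None.

    candidates: iterable of (name, expected_benefit, expected_cost) triples.
    None is returned when no question has a strictly positive EP.
    """
    scored = [(name, expected_profit(eb, ec, k_ent)) for name, eb, ec in candidates]
    if not scored:
        return None
    name, ep = max(scored, key=lambda item: item[1])
    return (name, ep) if ep > 0.0 else None


# Hypothetical (question, EB, EC) triples.
candidates = [("use_t_A", 0.30, 0.16), ("use_t_B", 0.10, 0.40)]
choice = best_question(candidates, k_ent=0.6)
print(choice)
```

    Raising the weight given to the bother-effect (a smaller k_ent here) makes the rule more reluctant to ask, since fewer questions keep a positive expected profit.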

    4.4. Sequential asking algorithm

    In general, there is a certain number (> 1) of less-easy-questions. So a website during the process of solving the target problem (i.e. inferring

    the user profile) has to solve the meta-problem of

    choosing, under trade-off between benefit and

    cost, the most opportune less-easy-question to

    ask next in order to acquire a new piece of infor-

    mation which will contribute to solving the

    target problem, taking into account also the fact

    that the answer provided by the user has the side-

    effect of changing the benefits expected from

    asking other questions. Let us establish that ques-

    tions are asked one at a time. This strategy is

    called myopic. It addresses the following ques-

    tion: if you are allowed to ask at most one

    5 The importance weights may be elicited with the probability indifference method. Considering for example $k_{\mathrm{ENT}}$ we have that the gambling situation is G: [p * {entropy min, bother absent}; (1 - p) * {entropy max, bother present}], whereas the certainty situation is I: [{entropy null, bother present} for certain].

    question, which question would you choose? As

    is well known the analysis of all the possible

    sequences of questions is, in practice, intractable

    because the number of sequences grows expo-

    nentially with the number of tests (Heckerman

    et al., 1992). So, many real-world applications

    use the so-called myopic value-of-information

    approach. The myopic approach is in practice a

    good heuristic (Gorry & Barnett, 1985).

    We are now ready, on the basis of the consi-

    derations made so far, to define at a conceptual

    level the basic algorithm of sequential asking.

    Let x denote the current available knowledge,

    i.e. x = all the information entered so far.

    0. Ask all the easy-questions, acquire the

    answers and set the related user-fact nodes to the appropriate states.

    1. Propagate the entered information through

    the user-interests network.

    2. If all the less-easy-questions have been

    asked, then EXIT.

    3. For each less-easy-question q not yet asked

    calculate EP(q | x).

    4. If each EP(q | x) ≤ 0, then EXIT.

    5. Ask the user the q with the maximum EP,

    acquire the answer and set the related tool-fact node to the appropriate state.

    6. Go back to point 1.
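
    The loop formed by steps 2 to 6 can be sketched as follows. This is only an illustration under stated assumptions: the network is hidden behind two hypothetical callbacks, `expected_profit` (which is assumed to propagate the evidence and return EP(q | x)) and `ask` (which obtains the user's answer), and the toy scoring function below merely imitates a benefit that is eroded by cumulative bother.

```python
def sequential_asking(questions, expected_profit, ask):
    """Myopic sequential asking over the less-easy-questions.

    questions:       iterable of question identifiers
    expected_profit: callable(question, evidence) -> EP(q | x)
    ask:             callable(question) -> the user's answer, "yes" or "no"
    Returns the answers collected, in the order the questions were asked.
    """
    evidence = {}
    remaining = list(questions)
    while remaining:  # step 2: stop when every question has been asked
        # Step 3: score each question not yet asked, given current evidence.
        scores = {q: expected_profit(q, evidence) for q in remaining}
        best = max(scores, key=scores.get)
        if scores[best] <= 0.0:  # step 4: nothing is worth asking
            break
        evidence[best] = ask(best)  # step 5: ask and record the answer
        remaining.remove(best)  # step 6: loop again with the new evidence
    return evidence


# Toy stand-ins: a fixed benefit per question minus a penalty that grows
# with the number of questions already asked (hypothetical values).
BASE_BENEFIT = {"use_t_A": 0.30, "use_t_B": 0.20, "use_t_C": 0.05}


def toy_expected_profit(question, evidence):
    return BASE_BENEFIT[question] - 0.12 * len(evidence)


answers = sequential_asking(BASE_BENEFIT, toy_expected_profit, ask=lambda q: "yes")
print(answers)
```

    Only one question is selected at each pass, so the scores are recomputed after every answer; this is what makes the strategy myopic rather than a search over whole question sequences.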

    4.5. Asking over time

    After a certain number of questions the prob-

    ability that the user is bothered (because of the

    questions), i.e. P(bother_af = y | q), increases so much that for each q it happens that EP(q | x) ≤ 0, and as a consequence the sequential asking stops. However, as time passes, it seems reasonable to suppose that the probability that

    the user is still in the state bothered (because of

    the questions) decreases. As a consequence, it

    might happen that an EP that was negative in the

    past is now positive. In other words, after a cer-

    tain time interval there might be some questions

    that are worth asking, even if they were not in

    the past. Such a situation is modelled by dynamically calculating the values of the prior probability distribution of the node bother on the basis of an appropriate decreasing temporal function.

    The practical consequence of all this is that the

    user profiling process is not accomplished in a

    single and initial phase but in a sequence of

    sessions over time. In each session the sequential

    asking algorithm is activated and, as a conse-

    quence, the user profile is enriched and refined.

    In practice, when, after a certain time from the

    last user profiling session, the user accesses the

    website, the website evaluates if there are some

    less-easy-questions that are worth asking. If yes,

    the website personalizes the user home page with

    a message inviting the user to resume the

    dialogue in order to better refine his/her profile.
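
    The text does not specify which decreasing temporal function is used for the prior of the bother node. As one possible choice, the sketch below assumes an exponential decay with a hypothetical half-life of 30 days; both the functional form and the parameter are assumptions made purely for illustration.

```python
def decayed_bother(p_bother_last, days_elapsed, half_life_days=30.0):
    """Prior P(bother = y) some days after the last profiling session.

    Assumes exponential decay: the probability halves every half_life_days.
    """
    if days_elapsed < 0:
        raise ValueError("days_elapsed must be non-negative")
    if half_life_days <= 0:
        raise ValueError("half_life_days must be positive")
    return p_bother_last * 0.5 ** (days_elapsed / half_life_days)


# Hypothetical session: the last session ended with P(bother = y) = 0.8.
for days in (0, 30, 90):
    print(days, decayed_bother(0.8, days))
```

    The decayed value would be used as the prior of the bother node when the user next accesses the website, so questions whose expected profit was negative may become worth asking again.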

    5. An application of the proposed approach

    within a supercomputing portal

    In this section we will describe a significant

    application of the approach so far presented.

    The proposal has been applied within the

    CILEA6 supercomputing portal:7 a portal de-

    voted to disseminating supercomputing culture

    and events in the community of researchers

    working in the supercomputing field. More pre-

    cisely, the proposal has been applied in the con-

    text of a news recommender system embedded in

    the portal. The proposal has been implemented in a prototype (developed by the author)8 tested

    on a PC off-line; then the prototype was inte-

    grated into the portal. Supercomputing news is

    recommended in a personalized manner both

    through a personalized presentation on the home

    page and by activating a personalized alerting

    service. The application regards a particular sub-

    field of supercomputing, that of computational

    fluid dynamics (CFD). CFD encompasses many

    heterogeneous application fields (aerospace, biomedical etc.), various phenomenological areas

    (turbulent flows, porous media flows etc.),

    various computational techniques (large eddy

    simulation, coupled transport equations etc.),

    6 CILEA is an Italian interuniversity consortium for information and communication technologies that provides supercomputing services to its users in order to promote supercomputing culture and services.

    7 www.supercomputing.it

    8 Tools used to implement the proposal: ASP pages, C language, MySQL database, Hugin inference engine.

    specialized software packages (Fluent, Fidap

    etc.) and so forth. A CFD user may be an aerospace engineer working in the aerospace industry; another may be a biophysicist working in

    the biomedical field etc. A biophysicist probably

    pursues the goal of developing computational

    models of porous media flows, uses the coupled

    transport equations technique and runs his/her models with the package Fidap. Conversely, an aerospace engineer probably focuses on modelling turbulent flows, and it is probable that he/she is not interested in the tools used by a biophysicist. So the need for building accurate

    user profiles for recommending the right news to

    the right users arises. Taking into account the

    general network structures of Figures 1 and 2,

    the CFD user-interests network and the bother-

    effect network illustrated in Figure 3 have been

    produced. Notice that the user-interests network

    encompasses six nodes (on the top) representing

    field-facts, 18 nodes (on the bottom) represent-

    ing tool-facts and 30 nodes (in the middle)

    representing interests, i.e. topics.

    Tool-facts have been distinguished as easy-

    facts and less-easy-facts. The facts concerning

    the use of software packages (i.e. the nodes

    whose names are prefixed by use_s_) have been considered as easy-facts, whereas the facts

    concerning the use of computational techniques

    (i.e. the nodes whose names are prefixed by

    use_t_) have been considered as less-easy-facts.

    Referring to the algorithm defined in Section 4.4,

    let us start (step 0) by asking the user to enter the

    set of easy-facts (Figure 4). The entered facts are

    then propagated through the network, inferring

    in this way the CFD profile of the current user

    (step 1). So now the portal knows, for each of the 30 topics, the probability that the current user is

    interested in it. The loop of the remaining steps is

    then performed, possibly asking less-easy-ques-

    tions depending on the specific current case. For

    example, if the user declares that he/she works in the aerospace field and does not use any of

    the software packages listed in Figure 4, the

    portal finds that it is worth trying to decrease

    uncertainty about the goal-interests and begins

    to ask less-easy-questions (see Figure 5 for an example).
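    The decision of whether a less-easy-question is worth asking rests on how much it is expected to reduce uncertainty. The algorithm of Section 4.4 is not reproduced here; the following is only a minimal sketch of the underlying Shannon-entropy idea for a single yes/no interest and a single yes/no fact. The function names and the two conditional probabilities are assumptions introduced for illustration, and the sketch ignores the bother-effect network.

```python
import math


def binary_entropy(p: float) -> float:
    """Shannon entropy (in bits) of a yes/no variable with P(yes) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def expected_entropy_reduction(p_interest: float,
                               p_fact_given_interest: float,
                               p_fact_given_no_interest: float) -> float:
    """Expected drop in uncertainty about one interest if one yes/no fact is asked.

    All three arguments are hypothetical inputs; in the paper these quantities
    would come from the Bayesian network, not from hand-set numbers.
    """
    # Marginal probability that the user answers "yes" to the question.
    p_fact = (p_fact_given_interest * p_interest
              + p_fact_given_no_interest * (1.0 - p_interest))
    # Posterior P(interest = yes | answer) by Bayes' rule, for each answer.
    post_yes = (p_fact_given_interest * p_interest / p_fact
                if p_fact > 0.0 else 0.0)
    post_no = ((1.0 - p_fact_given_interest) * p_interest / (1.0 - p_fact)
               if p_fact < 1.0 else 0.0)
    # Entropy expected to remain after the answer is known.
    expected_after = (p_fact * binary_entropy(post_yes)
                      + (1.0 - p_fact) * binary_entropy(post_no))
    return binary_entropy(p_interest) - expected_after
```

    In such a sketch, a question whose answer is equally likely whether or not the user is interested yields a reduction of zero and would not be asked, while a highly discriminating question yields a reduction close to the current entropy of the interest.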

    In this application, user profiles are used by

    the portal to recommend CFD news. Such news

    is classified by a CFD expert. More precisely,

    given a piece of news N, a CFD expert, for each

    topic T, defines (through subjective estimation)

    the probability that N concerns T. The classified

    piece of news is then entered in the website

    database. The expected utility EU of each piece of news

    N for a user U is calculated according to the

    following algorithm:

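    $EU(N_U) \leftarrow 0$

    For each T do

        $EU(N_U) \leftarrow EU(N_U) + EU(N_U^T)$

    End for

    where

    $EU(N_U^T) = U(N_U^T \mid N \text{ concerns } T = \text{yes}, \text{interest in } T = \text{yes}) \times P(N \text{ concerns } T = \text{yes}) \times P(\text{interest in } T = \text{yes})$

    where $U(N_U^T \mid N \text{ concerns } T = \text{yes}, \text{interest in } T = \text{yes})$ stands for the utility of N for the user U, given that N concerns T and the user U is interested in T. For simplicity $U(N_U^T \mid \ldots)$ has been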

    assumed to be equal to 1 for any piece of news

    and any user. The website has therefore at its dis-

    posal a personalized expected utility of each piece

    of news, calculated on the basis of the inferred

    user profile. So, when a user accesses the portal,

    the portal uses such EU values to recommend

    news to the user. Recommendation is carried out

    by presenting news with different emphasis and

    in decreasing order of relevance (Figure 6).
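    The expected-utility calculation and the ordering of news by relevance can be sketched in a few lines. This is an illustrative sketch, not the portal's implementation (which, per the footnote on tools, relies on ASP pages and the Hugin engine): the function names, topic identifiers and probability values below are hypothetical, and the utility term is fixed at 1 as stated above.

```python
def expected_utility(p_concerns: dict, p_interest: dict, utility: float = 1.0) -> float:
    """EU of one piece of news for one user.

    p_concerns maps each topic T to P(N concerns T = yes), as set by the expert.
    p_interest maps each topic T to P(interest in T = yes), i.e. the user profile.
    """
    total = 0.0
    for topic, p_c in p_concerns.items():
        total += utility * p_c * p_interest.get(topic, 0.0)
    return total


def rank_news(news: dict, p_interest: dict) -> list:
    """Return news titles in decreasing order of expected utility."""
    scored = [(expected_utility(p_c, p_interest), title) for title, p_c in news.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored]


# Hypothetical profile and expert classifications, for illustration only.
profile = {"turbulent_flows": 0.8, "porous_media_flows": 0.1}
news = {
    "Advanced course on porous media flows modelling":
        {"turbulent_flows": 0.1, "porous_media_flows": 0.9},
    "Tutorial course on Turbulent Flows modelling":
        {"turbulent_flows": 0.9, "porous_media_flows": 0.0},
}
ranking = rank_news(news, profile)
```

    With these made-up numbers the turbulent-flows tutorial is ranked first, since its expected utility (0.9 x 0.8) exceeds that of the porous-media course (0.1 x 0.8 + 0.9 x 0.1).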

    As stated in Section 4.5, user profiles are

    enriched and refined over time. In practice, when

    a user accesses the portal, the portal evaluates if

    there are some less-easy-questions that are worth

    asking9 and, if that is the case, places at the

    beginning of the news list a message inviting the

    user to resume the dialogue (Figure 7).

    News recommendation is also accomplished

    via alerting e-mail. Users can take advantage of

    9 The algorithm in Section 4.4 is performed by starting from step 2.


    Figure 3: The figure shows the bother-effect network (top left) and the CFD user-interests network of the application of the proposal to the CILEA supercomputing portal. The interests network instantiates the abstract network of Figure 1. More precisely, starting from the top, the nodes of the top layer (whose names begin with Work_in) are of type field-fact. Coming down, the nodes of the following three layers represent interests and are of type field-interest, goal-interest, tool-interest respectively. The nodes of the bottom layer are of type tool-fact. In particular the eight nodes on the left concern the use of computational techniques (less-easy-facts), whereas the remaining ten on the right concern the use of software packages (easy-facts).


    the intelligent alerting service: for each user, the

    portal considers the set of news not yet seen by the user, and for each piece of news calculates

    the expected utility the piece of news has for the

    user; if the expected utility is greater than a given

    threshold, the portal sends the relevant piece of

    news to the user via e-mail.
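    The alerting rule just described amounts to a threshold filter over the news a user has not yet seen. A minimal sketch follows; the threshold value, function names and data are assumptions made for illustration (the paper does not state the threshold used), and sending the e-mail itself is left out.

```python
def expected_utility(p_concerns: dict, p_interest: dict) -> float:
    """Sum over topics of P(N concerns T) * P(interest in T), utility term fixed at 1."""
    return sum(p * p_interest.get(topic, 0.0) for topic, p in p_concerns.items())


def select_alerts(unseen_news: dict, p_interest: dict, threshold: float) -> list:
    """Return the titles of unseen news whose expected utility exceeds the threshold."""
    return [title for title, p_concerns in unseen_news.items()
            if expected_utility(p_concerns, p_interest) > threshold]


# Hypothetical user profile and unseen news, for illustration only.
profile = {"turbulent_flows": 0.8, "porous_media_flows": 0.1}
unseen = {
    "Tutorial course on Turbulent Flows modelling": {"turbulent_flows": 0.9},
    "Advanced course on porous media flows modelling": {"porous_media_flows": 0.9},
}
alerts = select_alerts(unseen, profile, threshold=0.5)
```

    Here only the turbulent-flows tutorial would be e-mailed, because its expected utility (0.72) is above the assumed threshold of 0.5 while that of the porous-media course (0.09) is not.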

    6. Experimental results

    The proposal implementation presented in Section 5 has been working for more than a year

    inside the supercomputing portal. In order to

    better test the proposal, several heterogeneous

    institutions (research centres, university depart-

    ments, organizations operating in the environ-

    mental physics field, industries, CFD software

    providers etc.) working in the CFD field have

    been involved. During that period of time

    several researchers working with those institutions

    have used the proposal implementation. They have been interviewed, collecting their

    comments both about the proposed approach to

    user profiling and about the accuracy of various

    types of profiles.

    6.1. User feedback about the approach

    Most users have expressed a favourable impres-

    sion about the approach. In particular, they

    have appreciated both the fact that their inferred profiles were ready soon after their initial login

    Do you use (in your work activity) the computational technique:

    Volume of Fluid Methods?

    yes

    no

    I don't feel like answering this question at the moment

    I would prefer not to answer this question, even in the future

    SEND

    Figure 5: An example of a less-easy-question.

    If you wish you may enrich your profile record by selecting in the area of Computational Fluid Dynamics the application field/s

    you work in

    Aerospace (Environmental Control System, Propulsion, Pumps, Rotor-Airframe Interaction, External Aerodynamics ...)

    Biomedical (Blood Handling Equipment, Physiological Flows, Toxicology Research ...)

    Chemical Process (Combustion, Drying, Emission Control, Filtration, Reaction, Water Treatment ...)

    Environment (Atmospheric Plume Dispersion, Coast Erosion, Hydro-Geological Applications, Irrigation Components,

    Meteorological Applications, Petroleum Platforms, Sea Technologies, Service Reservoirs, Wastewater Pump-stations, Weirs, ...)

    Automotive (Engine Cooling, External Aerodynamics, Intake Valves, Underhood Flow Simulation, Vehicle Interior, Vapor

    Dispersion, Windshield Washer Nozzles...)

    Energy (Boilers, Burners, Coal Transport and Classification, Combustor, Hydro-power, Incinerators, Nuclear Reactors,

    Turbomachinery ...)

    If you wish you may enrich your profile record by selecting the software package/s you use for running your models in the area of

    Computational Fluid Dynamics

    KIVA

    FLUENT

    FIDAP

    ADINA

    CFX

    CFX_TASKFLOW

    CFX_TURBOGRID

    GAMBIT

    TGRID

    STAR_CD

    Figure 4: The form used for entering the initial set of easy-facts, i.e. the subset of six field-facts (top)

    followed by the subset of ten tool-facts (bottom).


    and the fact of being directly involved (through

    specific knowledge-based questions) in the pro-

    file refining process over time. They have

    declared that they are not bothered by being

    asked questions over time from the website;

    on the contrary, they have declared that they are pleased to cooperate with the website. They have

    perceived the question-asking attitude from the

    website as a sort of constant and competent

    attention from the website to their needs (the

    website behaves like an expert that looks after

    their specialist interests). In fact questions are

    asked in an unobtrusive manner because of both

    the fact that they are asked over time according

    to the sequential asking algorithm and the fact

    that a user is not forced to answer them. A user is first asked if he/she agrees with the website

    resuming the dialogue. Moreover, for each

    question the user is offered the possibility of

    temporarily postponing the answer if he/she does not feel like answering at that moment, or telling

    the website not to ask that question any longer

    (see the choice options appearing in Figure 5). We note that the type of user population that has

    been considered in this experimental year has the

    following characteristics. Users are researchers

    in the CFD field. In general they have used the

    alerting service to be automatically notified of

    possibly relevant newly created news, and have

    not accessed the portal very often. Both users

    and the website share a very specific field of

    knowledge (CFD knowledge), and it is just this

    knowledge on which the dialogue between users and the website is based.

    GOOD NEWS FOR YOU!!!

    Int. conf. on Aerospace supercomputing

    FURTHER NEWS OF PROBABLE INTEREST FOR YOU

    Tutorial course on Turbulent Flows modelling

    OTHER NEWS YOU MIGHT BE INTERESTED IN

    Summer school on Fluent

    OTHER NEWS

    Advanced course on porous media flows modelling

    Int. Conf. on Biomedical supercomputing

    Figure 6: An example of personalized news recommendation, given the fact that the user works in

    the aerospace field.

    Figure 7: An example of a message inviting the user to resume the dialogue.


    6.2. User feedback about profile accuracy

    Profile accuracy has been checked in an empiri-

    cal manner. Initially the probability tables of the

    user-interests network were elicited, with the aid

    of suitable forms, from the CILEA CFD expert.

    Then they were tuned with the cooperation of a set of end-users who had used both probability

    soundness tests and recommendation sound-

    ness tests. In order to clearly explain what these

    soundness tests consist of, let us consider three

    sample sets whose definitions are illustrated in

    the following section.

    6.2.1. Sample set definitions

    Some end-users belonging to heterogeneous CFD sub-fields have been involved in providing feedback about

    profile accuracy. Let SEU be the sample set of

    such end-users.

    Some typical hypothetical cases of user-facts

    have been defined. For example, let us consider a

    hypothetical case defined by the user-facts

    'working in the aerospace field' and 'using the

    software tool Fluent' (e.g. the case of an

    aerospatial engineer), another defined by the

    user-facts 'working in the biomedical field' and 'using the software tool Fidap' (e.g. the case of a

    bioengineer) and so on. Let SHC be the sample

    set of such hypothetical cases.

    Some suitable hypothetical pieces of news have

    been created. More precisely, for each CFD

    topic T, a hypothetical piece of news NT concerning T with

    probability 1 and the other topics with probability 0

    has been created, so that the expected utility

    of NT is equal to P(interest in T = yes).

    For example, for the topic 'turbulence

    flow modelling' news like 'International Conference

    on Turbulence Flow Modelling' has been

    created, for the topic 'porous media flow modelling'

    news like 'Summer School on Porous

    Media Flow Modelling' has been created, and

    so forth. Let SHN be the sample set of such

    hypothetical news.
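    The equality between the expected utility of NT and the probability of interest in T follows from the expected-utility sum of Section 5 under the stated simplification that the utility term equals 1. A short derivation, in which T' is a summation index introduced only for this sketch:

```latex
\begin{align*}
EU(N_T) &= \sum_{T'} 1 \cdot P(N_T \text{ concerns } T' = \text{yes}) \cdot P(\text{interest in } T' = \text{yes}) \\
        &= 1 \cdot P(\text{interest in } T = \text{yes}) + \sum_{T' \neq T} 0 \cdot P(\text{interest in } T' = \text{yes}) \\
        &= P(\text{interest in } T = \text{yes}).
\end{align*}
```

    Ranking the hypothetical news by expected utility is therefore the same as ranking the topics by inferred interest probability, which is what lets end-users judge the recommender's output against the profile itself.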

    6.2.2. Soundness tests

    Soundness tests have been performed according to the following algorithm.

    For each hypothetical case in SHC:

    - enter the related user-facts and propagate them through the user-interests network, in this way producing the related inferred profile;

    - activate the recommender system with the SHN set, in this way producing the list of hypothetical news recommended according to the inferred profile;

    - ask each end-user in the SEU to examine:

      - the inferred profile, to check if the inferred user-interest probabilities could be considered acceptable in the light of the domain common sense, given the entered facts (probability soundness test);

      - the emphasis given to each hypothetical piece of news by the recommender system, to check if the emphasis degree could be considered compatible with the inferred profile in the light of the domain common sense (recommendation soundness test).

    The feedback collected from the end-users of the

    SEU was then used to tune the probability tables

    of the user-interests network.

    7. Strengths of the proposed approach and

    suitable application domains

    A basic human-computer interaction element

    underlying the proposed approach is represent-

    ed by the fact that the main aim of the website is

    that of cooperating with a user (without bothering

    him/her) to build an accurate profile of him/her. The website aims at building an accurate

    user profile to create a collaborative user relationship over time, so that the user perceives the

    website as a collaborator watching over his/her interests and looking after his/her profile over time. In fact, even after the initial profile

    construction session, the website does not miss

    future favourable opportunities to invite the user

    to resume the dialogue in order to better refine

    his/her profile. This approach of involving the user as a partner in the process of building

    his/her profile by asking him/her questions even over time (without bothering him/her) is more


    suitable for websites specialized in certain

    areas of interest to a very specific community, such

    as, for example, a specific scientific community.

    In these cases, in general, the domain knowledge

    of the application expands more in depth than

    in broadness, so questions are not excessively

    numerous, are more specific, the answers are

    more significant because they refer to a specia-

    lized universe of knowledge, and the dialogue

    between the user and the website resembles

    a dialogue between two experts operating in

    the same field. As a consequence in these cases

    it is less probable that a user gets bored when

    the website asks him/her some competent questions.

    Another strength of the proposed approach is

    represented by the fact that the website has at its

    disposal a complete inferred profile of the user

    soon after the initial session, where the user

    enters his/her user-facts. In other words, the website does not need a statistically significant

    number of accesses from the user to build a com-

    plete user profile. This is a very useful feature for

    those websites that have a low number of access-

    es, possibly because of the type of population

    accessing the website, the type of news or services

    provided by the website etc. For example, a deep-knowledge-based website, concerning a

    specific field and having the main purpose of

    promoting and disseminating the specific field

    culture, may have a user population mostly

    consisting of professional people operating in

    the same field (e.g. researchers in the field). In

    general, such users do not need to access the

    website for their everyday tasks. They access the

    website for reading recent news. News concerns,

    in general, cultural events in the field: conferences, courses, books etc., infrequent events

    which do not occur daily. Moreover, the website

    has an alerting service which users might prefer

    to use instead of directly accessing the website.

    For all these reasons the website may have a low

    number of accesses, but nevertheless the website

    needs to have at its disposal an accurate user

    profile (for providing personalized services)

    from the beginning of the user-website relationship.

    The presented approach represents a solution to such a problem.

    Summarizing the considerations made so far,

    let us conclude that the approach presented in

    this paper is suitable for websites characterized

    by the following facts:

    - they are based on specialist knowledge (such as for example scientific websites);

    - the dialogue with users is on the basis of domain knowledge that the website and the users have in common;

    - they may have a low number of accesses, but nevertheless they need to have at their disposal a complete inferred user profile soon after the user enters his/her initial data.

    Conversely, in cases of websites dealing with a

    broad and generic domain of knowledge, a very large population of visitors and a statistically

    significant number of accesses, such as for

    example popular commercial websites, famous

    book-store websites etc., alternative approaches

    are used. In general, these approaches are based

    on collaborative filtering techniques or other

    techniques of user profiling based on user be-

    haviour observation (i.e. implicit data acquisi-

    tion), i.e. techniques that avoid any type of

    active involvement from users. Some of these techniques are reviewed in Section 8.

    8. Related work and discussion

    In this section we will discuss the ideas underlying

    the proposal in the wider context of human-computer

    interaction research with a particular

    focus on user profiling. User profiling on the

    Web is a topic which has received much

    attention in several international journals, conferences and books. The topic has been approached

    in the light of various technologies and

    has found application in several heterogeneous

    fields. An exhaustive overview of the state of the

    art is beyond the scope of this paper. We will

    limit ourselves to comparing the proposal to

    some significant approaches. User profiling is

    typically either knowledge-based or behaviour-

    based. In knowledge-based approaches information

    about users is acquired explicitly through questionnaires or single questions. Behaviour-


    based approaches use implicit observations of

    user behaviour, commonly using machine-learn-

    ing techniques to discover useful patterns in the

    behaviour (data mining techniques are applied

    to log files to extract patterns). We can also clas-

    sify works on user profiling from two orthogonal

    points of view: the technology used to build pro-

    files and the target task the profiles are built for.

    For example, technologies may be Bayesian net-

    works, case-based reasoning, ontologies, data

    mining etc., while target tasks may be informa-

    tion retrieval, e-learning, e-commerce, recom-

    mender systems etc. According to these points of

    view, the proposal presented in the present paper

    may be classified as knowledge-based, using the

    Bayesian network technology and addressing

    the recommender system target task.

    Godoy et al. (2004) propose two intelligent

    agents, PersonalSearcher and NewsAgent, that

    assist users in the tasks of filtering and organizing

    information available on the Web. Personal-

    Searcher (a personalized Web searcher) is an

    interface agent that helps users who are search-

    ing the Web for relevant information by filtering

    a set of documents retrieved from several search

    engines according to users' interests. NewsAgent

    (a personalized digital newspaper generator) is an interface agent that selects those articles that

    are relevant to a user from several online newspapers.

    The agents incrementally build a hierarchy

    (a tree) of users' relevant topics by means

    of a textual case-based reasoning technique (a

    specialization of case-based reasoning for tex-

    tual documents). Profiles are adapted as agents

    interact with users over time. Both agents

    observe users' behaviour while they are reading

    Web documents, recording the main features characterizing these experiences. A user profile

    consists of a set of weighted topics relevant to a

    user. This approach is therefore behaviour-

    based, uses the case-based reasoning technology,

    and addresses the information retrieval and

    intelligent assistant target tasks. In our proposal,

    too, user profiles are represented in terms of topics,

    i.e. user interests. In our proposal, however, a

    topic hierarchy is represented by a network.

    Wong and Butz (2000) propose a method for representing a user profile as a Bayesian network.

    Such a network is learned from a sample

    of documents that are judged by the user to be

    relevant or irrelevant. The proposed method

    addresses the information retrieval target task.

    An approach addressing the information

    retrieval target task and integrating the case-

    based reasoning and Bayesian network techni-

    ques to build user profiles incrementally is

    presented in Schiaffino and Amandi (2000).

    Case-based reasoning provides a mechanism to

    acquire knowledge about user actions that are

    worth recording to determine his/her habits and preferences. Each case records the attributes or

    keywords used by a user to perform queries. A

    query is classified according to its similarity to

    previous recorded queries. The Bayesian net-

    work provides a tool to represent relationships

    between items of interest. It is built gradually as

    a user queries the database. Information stored

    in the form of cases is used to gradually build

    and update the Bayesian network as the interests

    of the user change over time. The user profile

    consists of a statistical profile (type of queries

    frequently made etc.) and an inferred (via the

    Bayesian network) profile. A profile is used to

    suggest the execution of relevant queries to a user

    at an appropriate moment. The authors focus their attention on users who need data stored in

    the database to fulfil their everyday tasks, or at

    least who often use the database. Conversely,

    our proposal focuses on a type of population of

    users who do not access a website very often. In

    our case the website has at its disposal a general

    Bayesian network (knowledge base) that is then

    instantiated to a specific user and generates a

    specific inferred profile, but does not need to be

    built incrementally. This is an advantage in applications where the user population does not

    produce a great number of accesses. In these

    cases the solution is just the knowledge-based

    one: asking users explicitly for some informa-

    tion, involving the user in a cooperative relation-

    ship. So, even soon after the first contact with a

    new user (registration phase and data entry of

    Figure 4) the website has at its disposal an

    inferred profile of the user. Another difference

    concerns the target task: our proposal addresses the task of news recommendation.


    In Nokelainen et al. (2002) an adaptive online

    questionnaire system, EDUFORM, is pre-

    sented. The proposal addresses the educational

    target task and uses Bayesian probabilistic

    models. The authors face the problem of 'one

    size fits all' online questionnaires equipped with

    numerous propositions. In particular, EDU-

    FORM is a Web-based data gathering tool,

    which performs adaptive and dynamic optimi-

    zation of the number of questionnaire proposi-

    tions during the data gathering process.

    EDUFORM uses probabilistic Bayesian meth-

    ods to create user profiles that are then used to

    dynamically optimize the set of propositions

    that are presented to a user in order to maximize

    information extraction. In our approach the

    goal of minimizing the number of questions

    and maximizing information extraction is

    achieved by using the value-of-information

    framework.

    The approach presented in Adomavicius and

    Tuzhilin (1999, 2001) is well suited to the e-

    commerce target task. It is behaviour-based and

    uses data mining methods. More precisely, the

    authors present a method for constructing user

    behavioural profiles using data mining techniques.

    Profiles are specified with sets of rules learned from transactional histories. Since many

    rules can be spurious, irrelevant or trivial, a

    method for validating them separating good

    rules from bad ones is presented.

    Data mining techniques are also used in

    Nasraoui et al. (2002) where the authors present

    a framework (based on fuzzy relational cluster-

    ing) for mining typical user profiles from the

    vast amount of historical data stored in server

    access logs.

    Soltysiak and Crabtree (1998) use a heuristics-

    based clustering method to generate user interest

    profiles. The method is applied to the e-mails a

    user sends or receives, and the WWW pages

    he/she browses. They use a keyword extraction technique for identifying relevant keywords

    within a document. A document is then repre-

    sented as a vector of keywords. The vectors are

    used as the basis of grouping documents into

    clusters. A sufficiently large cluster of documents represents a user's interest.

    The same authors (Crabtree & Soltysiak,

    1998) address the information retrieval target

    task, focusing in particular on tracking interest

    themes over time through measuring the simi-

    larity of interest themes across time periods.

    Heuristics-based techniques are also used in

    Kostoff et al. (2001) and Rousseau et al. (2004)

    addressing the intelligent assistant and informa-

    tion retrieval target tasks respectively.

    In Esposito et al. (2003) a comparison of the

    effectiveness of two supervised methods for

    learning user profiles, inductive logic programming

    and Bayesian classifier, is carried out.

    The comparison focuses on the two different

    learning strategies to infer models of user

    interests from textual book descriptions. Experiments

    are conducted in the context of a

    content-based profiling system for a virtual

    bookshop on the Web.

    Recently particular attention has been paid to

    the use of ontology-based technologies in user

    profiling. Middleton et al. (2004) explore an

    ontological approach to user profiling within

    recommender systems, working on the problem

    of recommending online academic research

    papers. They present two experimental systems

    that create user profiles from unobtrusively monitored behaviour and relevance feedback,

    representing the profiles in terms of a research-

    paper topics ontology. Papers are classified

    using ontological classes. The database of re-

    search papers is classified using a research-paper

    topics ontology and a set of training examples.

    Recorded Web browsing and relevance feedback

    elicited from users are used to compute daily

    profiles of users' research interests. Interest profiles

    are represented in ontological terms, allowing other interests to be inferred. The interest

    profiles are visualized to allow elicitation of

    direct profile feedback.

    The same authors face the theme of the use of

    ontologies in recommender systems (Middleton

    et al., 2001), and in Middleton et al. (2003) they

    explore the idea of profile visualization to

    capture further knowledge about user interests.

    Let us classify in Table 1 the set of papers considered

    so far. The papers, along with the present proposal, are located in the table according to


    Table 1: The works examined classified according to the technology they use and the target task they address

    Technology | Target task | Works

    Ontologies | Recommender system | Middleton et al., 2004; Middleton et al., 2001; Middleton et al., 2003

    Bayesian networks | Information retrieval | Wong & Butz, 2000

    Bayesian networks | Recommender system | (a)

    Case-based reasoning | Information retrieval | Godoy et al., 2004

    Case-based reasoning | Intelligent assistant | Godoy et al., 2004

    Case-based reasoning and Bayesian networks | Information retrieval | Schiaffino & Amandi, 2000

    Bayesian probabilistic models | e-Learning and education | Nokelainen et al., 2002

    Data mining and rule discovery | e-Commerce | Adomavicius & Tuzhilin, 1999; Adomavicius & Tuzhilin, 2001

    Web log data mining and fuzzy clustering | (column not legible) | Nasraoui et al., 2002

    Heuristics-based clustering | Information retrieval | Crabtree & Soltysiak, 1998; Rousseau et al., 2004

    Heuristics-based clustering | (column not legible) | Soltysiak & Crabtree, 1998

    Heuristics-based clustering | Intelligent assistant | Kostoff et al., 2001

    Inductive logic programming | (column not legible) | Esposito et al., 2003

    Bayesian classification | (column not legible) | Esposito et al., 2003

    (a) Work presented in this paper.


    the technology and the target task by which they

    are characterized. Although the set of applica-

    tions that have been considered is not exhaus-

    tive, the table gives an idea of the variety of

    technological approaches and target tasks of

    user profiling. It also gives an idea of both the

    enormous amount of work done in the user

    profiling field and the difficulties underlying the

    problem of building accurate user profiles.

    9. Conclusions

    This paper has presented a user profiling app-

    roach based on a deep-knowledge model of user

    interests (i.e. domain topics) and a sequential asking algorithm using the value-of-information

    theory based on Shannon entropy. The app-

    roach has been implemented and tried out for

    over a year in a real context. Experimental re-

    sults have indicated that the proposal is parti-

    cularly suitable for websites addressing domains

    with specific deep knowledge (like scientific web-

    sites) and with a user population that is charac-

    terized by sharing the same knowledge and

    producing a low number of accesses (i.e. low compared with the high number of accesses of a

    typical commercial website). The approach has

    been discussed in the context of the scientific

    literature concerning human-computer interaction

    and in particular user profiling. The

    discussion has highlighted that the proposed

    approach differs from others mostly because of

    its emphasis on involving users in collaborative

    processes for building and refining their profiles.

    Acknowledgements

    In alphabetical order, thanks to Dr Enrico

    Cavalli for the integration of the prototype into

    the portal running on the server computer,

    thanks to Dr Paolo Ramieri for his role of

    CFD expert providing CFD knowledge, thanks

    to the anonymous reviewers for their valuable

    comments, and thanks to the users involved for their cooperation and feedback.

    References

    ADOMAVICIUS, G. and A. TUZHILIN (1999) User profiling in personalized applications through rule discovery and validation, in Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, CA, 377-381.

    ADOMAVICIUS, G. and A. TUZHILIN (2001) Using data mining methods to build customer profiles, IEEE Computer, 34 (2), 74-82.

    BILLSUS, D., C.A. BRUNK, C. EVANS, B. GLADISH and M. PAZZANI (2002) Adaptive interfaces for ubiquitous Web access, Communications of the ACM, 45 (5), 34-38.

    BRUSILOVSKY, P. and M.T. MAYBURY (2002) From adaptive hypermedia to the adaptive Web, Communications of the ACM, 45 (5), 30-33.

    CRABTREE, I.B. and S.J. SOLTYSIAK (1998) Identifying and tracking changing interests, International Journal of Digital Libraries, 2, 38-53.

    ESPOSITO, F., G. SEMERARO, S. FERILLI, M. DEGEMMIS, N. DI MAURO, T.M.A. BASILE and P. LOPS (2003) Evaluation and validation of two approaches to user profiling, in Proceedings of the ECML/PKDD-2003 First European Web Mining Forum, B. Berendt, A. Hotho, D. Mladenic, M. van Someren, M. Spiliopoulou and G. Stumme (eds).

    FINK, J., J. KOENEMANN, S. NOLLER and I. SCHWAB (2002) Putting personalization into practice, Communications of the ACM, 45 (5), 41-42.

    GODOY, D., S. SCHIAFFINO and A. AMANDI (2004) Interface agents personalizing Web-based tasks, Cognitive Systems Research, 5, 207-222.

    GORRY, G.A. and G.O. BARNETT (1985) Experience with a model of sequential diagnosis, in Computer-assisted Medical Decision Making, J.A. Reggia and S. Tuhrim (eds), Berlin: Springer, Vol. 1, pp. 206-222.

    HECKERMAN, D.E., E.J. HORVITZ and B.N. NATHWANI (1992) Toward normative expert systems, Part I: The Pathfinder project, Methods of Information in Medicine, 31 (2), 90-105.

    HIRSH, H., C. BASU and B.D. DAVISON (2000) Learning to personalize, Communications of the ACM, 43 (8), 102-106.

    HORVITZ, E., A. JACOBS and D. HOVEL (1999) Attention-sensitive alerting, in Proceedings of UAI '99 Conference on Uncertainty in Artificial Intelligence, 305-313.

    JENSEN, F.V. (2001) Bayesian Networks and Decision Graphs, Berlin: Springer.

    KOSTOFF, R.N., J.A. DEL RIO, J.A. HUMENIK, E.O. GARCIA and A.M. RAMIREZ (2001) Citation mining: integrating text mining and bibliometrics for research user profiling, Journal of the American Society for Information Science and Technology, 52 (13), 1148-1156.

  • 8/9/2019 Web Profiling Shannon 2006

    18/19

    MIDDLETON, S.E., D.C. DE ROURE and N.R. SHAD-

    BOLT (2001) Capturing knowledge of user preferences:

    ontologies in recommender systems, in Proceedings ofthe International Conference on Knowledge Capture

    K-CAP 2001, Victoria, B.C., Canada.

    MIDDLETON, S.E., N.R. SHADBOLT and D.C. DE

    ROURE (2003) Capturing interest through inference

    and visualization: ontological user profiling in

    recommender systems, in Proceedings of the Interna-

    tional Conference on Knowledge Capture K-CAP

    2003, Sundial Beach Resort, Sanibel Island, FL.MIDDLETON, S.E., N.R. SHADBOLT and D.C. DE

    ROURE (2004) Ontological user profiling in recom-

    mender systems, ACM Transactions on Information

    Systems, 22 (1), 5488.

    MUSSI, S. (2003) Providing websites with capabilities of

    one-to-one marketing, Expert Systems, 20 (1), 819.NASRAOUI, O., R. KRISHNAPURAM, A. JOSHI and T.

    KAMDAR (2002) Automatic Web user profiling and

    personalization using robust fuzzy relational clus-tering, in e-Commerce and Intelligent Methods, J.

    Segovia, P. Szczepaniak and M. Niedzwiedzinski

    (eds), Studies in Fuzziness and Soft Computing,

    Berlin: Springer.NOKELAINEN, P., H. TIRRI, M . MIETTINEN and T.

    SILANDER (2002) Optimizing and profiling users

    online with Bayesian probabilistic modeling, inProceedings of the NL 2002 Conference.

    PEARL, J. (1988) Probabilistic Reasoning in Intelligent

    Systems, San Mateo, CA: Morgan Kaufmann.ROUSSEAU, B., P. BROWNE, P . MALONE and M.

    OFOGHLu (2004) User profiling for content perso-nalization in information retrieval, in Proceedings 19th ACM Symposium on Applied Computing, SAC

    (Nicosia Cyprus).

    SCHIAFFINO, S.N. and A. AMANDI (2000) User profil-

    ing with case-based reasoning and Bayesian net-

    works, in Proceedings International Joint Confer-

    ence IBERAMIA-SBIA, 1221.

    SHANNON, C.E. and W. WEAVER (1949) The Mathe-

    matical Theory of Communication, Urbana, IL:

    University of Illinois Press.

    SOLTYSIAK, S.J. and I.B. CRABTREE (1998) Automaticlearning of user profiles towards the personaliza-

    tion of agent services, BT Technology Journal, 16 (3).

    VON WINTERFELDT, D. and W. EDWARDS (1986)Decision Analysis and Behavioral Research, Cam-bridge: Cambridge University Press.

    WONG, S.K.M. and C.J. BUTZ (2000) A Bayesian

    approach to user profiling in information retrieval,Technology Letters, 4 (1), 5056.

    The author

    Silvano Mussi

    Silvano Mussi graduated in physics in 1975 from

    the University of Milan, Italy. He worked at

    ITALTEL for ten years in the fields of software

    engineering and functional discrete simulations

    of real-time systems. He has been with CILEA

    since 1981. He has cooperated in research activities

    with Milan Polytechnic and Brescia University,

    where for three academic years he

    was contract professor of artificial intelligence.

    For over ten years he has been doing research in the fields of knowledge engineering

    and expert systems. His current research

    interests address methods for providing websites

    with capabilities of decision-making and

    reasoning under conditions pervaded with uncertainty.
