
A Maximum Entropy Web Recommendation System: Combining Collaborative and Content Features

Xin Jin, Yanzan Zhou, Bamshad Mobasher

Center for Web Intelligence
School of Computer Science, Telecommunication, and Information Systems
DePaul University, Chicago, Illinois, USA

{xjin,yzhou,mobasher}@cs.depaul.edu

ABSTRACT

Web users display their preferences implicitly by navigating through a sequence of pages or by providing numeric ratings to some items. Web usage mining techniques are used to extract useful knowledge about user interests from such data. The discovered user models are then used for a variety of applications such as personalized recommendations.

Web site content or semantic features of objects provide another source of knowledge for deciphering users' needs or interests. We propose a novel Web recommendation system in which collaborative features such as navigation or rating data, as well as the content features accessed by the users, are seamlessly integrated under the maximum entropy principle. Both the discovered user patterns and the semantic relationships among Web objects are represented as sets of constraints that are integrated to fit the model. In the case of content features, we use a new approach based on Latent Dirichlet Allocation (LDA) to discover the hidden semantic relationships among items and derive the constraints used in the model. Experiments on real Web site usage data sets show that this approach can achieve better recommendation accuracy than systems using only usage information. The integration of semantic information also allows for better interpretation of the generated recommendations.

Categories and Subject Descriptors

H.2.8 [Database Management]: Database Applications - Data Mining; I.2.6 [Artificial Intelligence]: Learning; I.5.1 [Pattern Recognition]: Models - Statistical

General Terms

Algorithms

Keywords

Maximum Entropy, Recommendation, Web Usage Mining, User Profiling

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
KDD'05, August 21-24, 2005, Chicago, Illinois, USA.
Copyright 2005 ACM 1-59593-135-X/05/0008 ...$5.00.

1. INTRODUCTION

Web recommendation systems have been widely used to help users locate information on the Web. In many cases, Web users' interests or preferences are not directly accessible; instead, they are implicitly captured during users' interaction with the Web site, such as navigating through a sequence of pages, or reflected through numeric ratings given to items.

Many data mining and machine learning approaches have been used in building Web recommendation systems. Such systems generally take Web users' navigation or rating data as input and apply data mining or machine learning techniques to discover usage patterns that represent aggregate user models. When a new user comes to the site, his or her activity is matched against these patterns to find like-minded users and select potentially interesting items as recommendations.

Typically, recommendation algorithms rely only on user information (collaborative features) such as navigational history or rating data; additional information, such as content features of the items, which may provide a valuable source of complementary knowledge about users' activities, is usually ignored. By incorporating content information with a user's navigation or rating behavior, we may be able to gain a deeper understanding of her underlying interests. For instance, we may find an association pattern "page A → page B" with high support and confidence in navigation data, or a pattern such as "movie C and movie D are always rated similarly" in rating data. The discovery of such patterns, by itself, does not explain the underlying reasons for their existence.

To address this issue, considerable work has been done to enhance traditional recommendation systems by integrating data from other sources such as content attributes, linkage structure, and user demographics [13, 8, 14, 17, 5]. In these approaches, content information such as keywords or attributes of items, or user demographic data, is collected alongside the usage or rating data, and different combination methods are tried to make recommendations more effective and interpretable. Generally, an integrated approach during the mining or model learning phase is preferred, to avoid subjective or ad hoc ways of combining evidence. This is also one of our main motivations behind the maximum entropy recommendation system introduced in this paper.

The maximum entropy model is a powerful statistical model which has been widely applied in many domains such as statistical language learning [15], information retrieval [4] and text mining [10]. The goal of a maximum entropy model is to find a probability distribution which satisfies all the constraints in the observed data while maintaining maximum entropy. One of the advantages of such a model is that it enables the unification of information from multiple knowledge sources in one framework. Each knowledge source can be considered as a set of constraints in the model, and from the intersection of all these constraints, the probability distribution with the highest entropy can be learned. Recently, we have seen applications of maximum entropy in Web recommendation systems [12, 18, 6]. In these systems, however, only statistics from users' navigation or rating data are used as features to train the model; therefore, the advantage of integrating multiple knowledge sources is not fully exploited.

In this paper, we propose a novel Web recommendation system in which users' navigational or rating data and the semantic content features associated with items are seamlessly integrated under the maximum entropy principle. First, we discover statistics from users' navigation data and use them as one set of constraints. Second, for content information, we use a new approach based on Latent Dirichlet Allocation (LDA) [1] to discover the hidden semantic relationships among visited or rated items and specify another set of constraints based on these item association patterns. The LDA model is a proven approach for uncovering hidden relationships among co-occurring objects; examples of such hidden patterns are the "topics" connecting documents and words [1] and the author-topic model of [9]. In our framework, the two sets of constraints are combined to fit the maximum entropy model, based on which we generate dynamic recommendations for active users.

2. RECOMMENDATIONS BASED ON MAXIMUM ENTROPY

In this section, we present our maximum entropy recommendation model and its algorithm. The scalability of the system is also discussed.

2.1 Notation

In this paper, we use the following notation:

- U = {u_1, u_2, ..., u_m}: a set of m users.

- T = {t_1, t_2, ..., t_n}: a set of n pages/items.

- A = {a_1, a_2, ..., a_l}: a set of l attributes for pages/items within a site. In a content-intensive site, these attributes can be keywords from the site. In a database-driven site, each Web page/item represents an object which corresponds to a record in the underlying database; in this case, we extract the attributes of these records. Each page/item can then be represented as an attribute vector t_j = <a_j1, a_j2, ...>.

- For navigation data, each user is represented as a sequence of visited pages: u_i = <t_i1, t_i2, ...>, where t_ik ∈ T. We use a binary representation here, assuming each page is either visited or not visited. (Our model can easily be extended to handle non-binary data as well.)

- For rating data, each user is represented as a set: u_i = {<t_i1, r_i1>, <t_i2, r_i2>, ...}, where t_ik ∈ T and r_ik takes ordinal values, e.g. from {1, 2, 3, 4, 5}.

- H(u_i): the recent navigation/rating history of user i, a subset of u_i. Since the most recent activities have greater influence in reflecting a user's present interests, we use only the last several pages to represent the user's history.
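As a concrete illustration of this notation (a minimal sketch with made-up page and item names, not the authors' code), user activity and the history window H(u_i) might be represented as:

```python
# Illustrative representations of the paper's notation: a navigation
# session as an ordered page sequence, rating data as an {item: rating}
# map, and H(u_i) as the last-k window of a user's activity.

def history(session, k=3):
    """H(u_i): the k most recent pages (or rated items) of user i."""
    return session[-k:]

# Navigation data: u_i is an ordered sequence of visited pages.
u_nav = ["home", "search", "prop12", "prop47", "prop12_photos"]

# Rating data: u_i is a set of (item, ordinal rating) pairs.
u_rat = {"Stargate": 4, "Jaws": 5, "Top Gun": 2}

print(history(u_nav, k=3))  # the last 3 pages form the user's history
```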

2.2 The Maximum Entropy Model

The system consists of two components. The offline component accepts constraints from the navigation/rating data and the Web site content information, and estimates the model parameters. The online component reads an active user session and runs the recommendation algorithm to generate recommendations (a set of pages, or rating predictions for unrated items) for the user.

We use the following approach to generate predictions or recommendations for active users. In the case of navigation data, we estimate the conditional probability Pr(t_d | H(u_i)) of a page t_d being visited next, given a user's recent navigational history H(u_i). In the case of rating data, we estimate the conditional probability Pr(<t_d, r_d> | H(u_i)) of an item t_d receiving rating r_d next, given a user's recent rating history H(u_i).

In our model, we use two sources of knowledge about Web users' navigational behavior, namely features based on item-level usage patterns and features based on item content associations. The features will be presented in Section 3. Based on each feature f_s, we represent a constraint as:

Σ_{u_i} Σ_{t_d ∈ T} Pr(t_d | H(u_i)) f_s(H(u_i), t_d) = Σ_{u_i} f_s(H(u_i), D(H(u_i)))    (1)

where D(H(u_i)) denotes the page following u_i's history in the training data. Each constraint specifies that the expected value of each feature with respect to the model distribution should equal its observed value in the training data. After we have defined a set of features F = {f_1, f_2, ..., f_t} and generated the corresponding constraints, it is guaranteed that, under all the constraints, a unique distribution exists with maximum entropy [3]. This distribution has the form:

Pr(t_d | H(u_i)) = exp(Σ_s λ_s f_s(H(u_i), t_d)) / Z(H(u_i))    (2)

where Z(H(u_i)) = Σ_{t_d ∈ T} exp(Σ_s λ_s f_s(H(u_i), t_d)) is the normalization constant ensuring that the distribution sums to 1, and the λ_s are the parameters to be estimated. Thus, each source of knowledge can be represented as features (and constraints) with associated weights. By using Equation 2, all knowledge sources (represented by various features) are taken into account when making predictions about a user's next action given their past navigation.

Several algorithms can be applied to estimate the λ_s. Here we use Sequential Conditional Generalized Iterative Scaling (SCGIS) [2], which is very efficient in our setting. To further boost efficiency, techniques such as dimensionality reduction using clustering [2, 11], efficient training algorithms [7], and automatic feature selection methods can be used.
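Equation 2 can be sketched as a softmax over summed feature values (a toy illustration with a hypothetical feature function and weight, not the trained model or the authors' code):

```python
# Sketch of Equation (2): given weights lambda_s, the probability of
# each candidate next page is exp(sum_s lambda_s f_s) normalized by Z.
import math

def maxent_prob(hist, pages, features, lambdas):
    """Pr(t_d | H(u_i)) = exp(sum_s lambda_s f_s(H, t_d)) / Z(H)."""
    scores = {t: math.exp(sum(l * f(hist, t)
                              for f, l in zip(features, lambdas)))
              for t in pages}
    Z = sum(scores.values())  # normalization constant Z(H(u_i))
    return {t: s / Z for t, s in scores.items()}

# Toy bigram-style feature: fires when the history ends in "a" and t = "b".
f_ab = lambda H, t: 1.0 if (H and H[-1] == "a" and t == "b") else 0.0
probs = maxent_prob(["a"], ["b", "c"], [f_ab], [2.0])
# probabilities sum to 1; "b" is favored because its feature fired
```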


2.3 Algorithm for Generating Recommendations

After estimating the parameters associated with each feature, we can use Equation 2 to compute the probability of any unvisited page t_i being visited next given a user's history, and then pick the pages with the highest probabilities as recommendations. That is, we compute the conditional probability Pr(t_i | u_a) for an active user u_a, sort all pages by this probability, and pick the top N pages as the recommendation set. The algorithm is as follows:

Input: an active user session u_a, and the parameters λ_s estimated from the training data.
Output: a list of N pages sorted by the probability of being visited next given u_a.

1. Consider the last pages of the active user session. For each page t_i that does not appear in the active session, assume it is the next page to be visited and evaluate all the features based on their definitions. (In this paper, we use bigrams as usage features, so we only look at the last visited page. Higher-order page transitions can be used as features as well, at higher computational cost.)

2. Use Equation 2 to compute Pr(t_i | u_a).

3. Sort all the pages in descending order of Pr(t_i | u_a) and pick the top N pages as the recommendation set.
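The steps above can be sketched as follows (a minimal illustration; the toy probability function stands in for the Equation 2 model):

```python
# Sketch of the recommendation algorithm: score every page not in the
# active session with the model probability, sort, and take the top N.

def recommend(active_session, all_pages, prob_fn, n=3):
    candidates = [t for t in all_pages if t not in active_session]
    scored = [(t, prob_fn(active_session, t)) for t in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # descending Pr
    return [t for t, _ in scored[:n]]

# Toy probability: prefer pages "near" the last visited page id.
pages = ["p1", "p2", "p3", "p4", "p5"]
toy_prob = lambda H, t: 1.0 / (1 + abs(int(t[1:]) - int(H[-1][1:])))
print(recommend(["p1", "p2"], pages, toy_prob, n=2))  # ['p3', 'p4']
```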

To make a rating prediction for a target item t_d, we consider all rated items with their ratings as the user's history H(u_i). We then consider each possible rating (1-5) for the target item and evaluate all the features, choosing the rating r_d with the highest probability Pr(<t_d, r_d> | H(u_i)) as the predicted rating for t_d.

3. IDENTIFICATION OF FEATURES

Under the maximum entropy principle, each knowledge source is treated as a set of features and constraints imposed on the model. In the following, we introduce our methods for identifying features from both usage and content information.

3.1 Features Based on Usage/Rating Information

Users' interests and preferences are implicitly embedded in their navigation/rating activities on Web pages/items. Here we discover page/item associations from users' navigation/rating data and use them as features.

For navigation data, since the order in which pages are visited conveys important information, we define features based on first-order Markov page transitions. For each page transition t_a → t_b where Pr(t_b | t_a) ≥ θ (θ is a pre-defined threshold), we define a feature as:

f_{t_a,t_b}(H(u_i), t_d) = 1 if page t_a is the last page in H(u_i) and t_b = t_d; 0 otherwise.
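Mining these transition features can be sketched as follows (an illustration with toy sessions; the threshold value and counts are made up):

```python
# Sketch of mining first-order transition features: keep a feature
# f_{ta,tb} for every transition whose conditional probability
# Pr(tb | ta) clears a threshold. Sessions here are toy data.
from collections import Counter

def bigram_features(sessions, threshold=0.5):
    pair, first = Counter(), Counter()
    for s in sessions:
        for ta, tb in zip(s, s[1:]):
            pair[(ta, tb)] += 1   # count of transition ta -> tb
            first[ta] += 1        # count of ta as a transition source
    return {(ta, tb) for (ta, tb), c in pair.items()
            if c / first[ta] >= threshold}

feats = bigram_features([["a", "b", "c"], ["a", "b"], ["a", "c"]])
# Pr(b|a) = 2/3 and Pr(c|b) = 1 clear the threshold; Pr(c|a) = 1/3 does not
```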

For rating data, the order of ratings is generally not important, so we resort to other means of selecting highly correlated item pairs as features. Here we use item rating similarity to measure the association between items. To offset differences in rating scales, we adopt the Adjusted Cosine Similarity measure:

sim(t_a, t_b) = Σ_{k=1}^{m} (r_ka − r̄_k)(r_kb − r̄_k) / ( sqrt(Σ_{k=1}^{m} (r_ka − r̄_k)²) · sqrt(Σ_{k=1}^{m} (r_kb − r̄_k)²) )

where r_ka denotes user k's rating of item a, and r̄_k stands for the average rating of user k.
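The Adjusted Cosine Similarity can be sketched as follows (the restriction to co-rating users and the data layout are illustrative choices, not taken from the paper):

```python
# Adjusted cosine similarity: subtract each user's mean rating before
# the cosine, so users with different rating scales compare fairly.
# `ratings` maps user -> {item: rating}; names are illustrative.
import math

def adjusted_cosine(ratings, a, b):
    num = da = db = 0.0
    for user, r in ratings.items():
        if a in r and b in r:
            mean = sum(r.values()) / len(r)  # r-bar_k: user k's mean
            num += (r[a] - mean) * (r[b] - mean)
            da += (r[a] - mean) ** 2
            db += (r[b] - mean) ** 2
    return num / math.sqrt(da * db) if da and db else 0.0

ratings = {"u1": {"a": 5, "b": 5, "c": 1},
           "u2": {"a": 1, "b": 1, "c": 5}}
print(adjusted_cosine(ratings, "a", "b"))  # close to 1.0: co-deviating
```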

For each item pair <t_a, t_b> with sim(t_a, t_b) ≥ θ (θ is a user-specified threshold used to filter out less related item pairs), we define a feature as:

f_{t_a,r_a,t_b,r_b}(H(u_i), t_b, r_b) = 1 if the user rated item t_a with rating r_a and item t_b with rating r_b; 0 otherwise.

3.2 Features Based on Content Information

Web site content provides another kind of information about users' interests and preferences. For example, in a university site, if a user visited several pages containing frequent words like "admission" and "application", we may guess that this user was interested in applying for admission to a program. In a movie site, high ratings for "Indiana Jones" and "Air Force One" may suggest that a user is a Harrison Ford fan who enjoys action-adventure movies. We exploit such content features in our maximum entropy model. In text-oriented Web sites, content features may be terms or phrases extracted from pages. In sites with an underlying relational schema among objects (e.g., products or movies), features can be semantic attributes associated with the objects.

Generally, considering all the attributes in a site is not feasible, due to the huge number of attributes and the irrelevance and redundancy of many of them. In this paper, we propose an attribute selection method based on the Latent Dirichlet Allocation (LDA) model [1], first proposed in the context of text mining to discover the hidden topics underlying a corpus. Here, we assume there exists a set of hidden variables underlying the item-attribute co-occurrence data. The hidden variables can be thought of as "classes" or "types" of these items, with each item being probabilistically associated with multiple classes. The complete probability model is:

θ_{t_i} ~ Dirichlet(α)    (3)
z_i | θ_{t_i} ~ Discrete(θ_{t_i})    (4)
φ_{z_i} ~ Dirichlet(β)    (5)
a_j | z_i, φ_{z_i} ~ Discrete(φ_{z_i})    (6)

Here, z represents the set of hidden classes, θ_{t_i} denotes item t_i's association with the classes, and φ_{z_i} specifies class z_i as a distribution over attributes. α and β are hyperparameters of the priors on θ and φ.


We use the Variational Bayes technique to estimate each item's association with the classes (θ) and the associations between classes and attributes (φ). As shown in [1], these classes themselves can be used as derived attributes for text classification tasks. We observe that most items are strongly associated with only one class (based on the θ vector of each item), so we assign each item to its dominant class. Each class therefore comprises a set of items which are semantically similar. To better interpret these classes, we can easily identify their prominent attributes based on φ.
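This content step can be sketched with an off-the-shelf LDA implementation (scikit-learn here, as an assumed stand-in for the paper's Variational Bayes code; the item-attribute counts are toy data):

```python
# Sketch of the content step: fit LDA on an item-attribute count
# matrix, then assign each item to its dominant class via theta.
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

# Toy item-attribute counts: rows = items, columns = attributes.
# Items 0-1 and items 2-3 use disjoint attribute sets.
X = np.array([[50, 40, 0, 0],
              [40, 50, 0, 0],
              [0, 0, 50, 40],
              [0, 0, 40, 50]])

lda = LatentDirichletAllocation(n_components=2, random_state=0)
theta = lda.fit_transform(X)     # per-item class distributions (theta)
dominant = theta.argmax(axis=1)  # assign each item to one dominant class
# items 0 and 1 share one class; items 2 and 3 share the other
```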

After assigning each item to a class, we can define our features. In the case of navigation data, for each item pair <t_a, t_b> where t_a and t_b both belong to the same class z, we define a feature function as:

f_{t_a,t_b,z}(H(u_i), t_d) = 1 if page t_a is the last page in H(u_i) and t_b = t_d; 0 otherwise.

In the case of rating data, similarly, if t_a and t_b both belong to the same class z, we define a feature as:

f_{t_a,r_a,t_b,r_b,z}(H(u_i), t_b, r_b) = 1 if the user rated item t_a with rating r_a and item t_b with rating r_b; 0 otherwise.

4. EXPERIMENTS

In this section, we report experiments conducted on two data sets with different characteristics. The first is usage data from a real estate Web site, together with the semantic attributes of the real estate properties. The second is the EachMovie rating data often used in the context of collaborative filtering, together with content attributes of the movies.

4.1 Data Sets and Evaluation Metrics

The primary function of the real estate site is to allow prospective buyers to visit pages and information related to some 300 residential properties. The portion of the Web usage data covering the period of analysis contained approximately 24,000 user sessions from 3,800 unique users. The data was filtered to limit the final data set to users who had visited at least 3 properties. We also extracted the content attributes of each property, including price, number of bedrooms, size, school district, etc. We refer to this data set as the Realty data. The navigation data is randomly divided into 10 training and test sets for cross-validation.

Our evaluation metric for this data set is the Hit Ratio, used in the context of top-N recommendation: for each user session in the test set, we take the first K pages as a representation of an active session and generate top-N recommendations. We then compare the recommendations with page K+1 of the test session, counting a match as a hit. The Hit Ratio is the total number of hits divided by the total number of user sessions in the test set. Note that the Hit Ratio increases as N increases; thus, in our experiments, we pay special attention to small numbers of recommendations (between 1 and 10) that result in good hit ratios.
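The Hit Ratio metric can be sketched as follows (a minimal illustration; `recommender` stands in for any top-N recommendation function):

```python
# Sketch of the Hit Ratio metric: use the first K pages of each test
# session as the active session, recommend top N, and count a hit when
# page K+1 is among the recommendations.

def hit_ratio(test_sessions, recommender, k=2, n=10):
    hits = usable = 0
    for s in test_sessions:
        if len(s) <= k:
            continue  # need a page K+1 to test against
        usable += 1
        if s[k] in recommender(s[:k], n):
            hits += 1
    return hits / usable if usable else 0.0

# Toy recommender that always suggests the same pages:
rec = lambda session, n: ["p3", "p7"][:n]
print(hit_ratio([["p1", "p2", "p3"], ["p4", "p5", "p6"]], rec))  # 0.5
```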

Movie Class 8
Movies: Mighty Morphin Power Rangers, Stargate, 2001: A Space Odyssey, 20,000 Leagues Under the Sea, E.T.: The Extraterrestrial, Raiders of the Lost Ark, A Clockwork Orange, Back to the Future, Jaws, Star Trek: Generations, Star Trek: The Motion Picture, and other Star Trek series.
Top attributes (probability): Genre: Adventure (0.290), Genre: Action (0.151), Genre: Sci-Fi (0.087), A: James Doohan (0.023), A: Walter Koenig (0.023), A: William Shatner (0.023), D: Steven Spielberg (0.023), A: DeForest Kelley (0.020).

Movie Class 11
Movies: Sense and Sensibility, Restoration, I.Q., Bitter Moon, Four Weddings and a Funeral, The Englishman Who Went up a Hill But Came Down a Mountain, The Remains of the Day, Sirens, Sleepless in Seattle, When Harry Met Sally, Top Gun, Courage Under Fire.
Top attributes (probability): Genre: Romance (0.073), Genre: Drama (0.039), A: Hugh Grant (0.034), A: Meg Ryan (0.034), A: Emma Thompson (0.027), A: Billy Crystal (0.023), A: Roger Ashton-Griffiths (0.023), A: Rob Reiner (0.012).

Figure 1: Examples of movie classes and their top attributes

The Movie data set contains a total of 2,811,983 numeric ratings (on a 1-6 scale) by 72,916 users for 1,628 different movies. We extracted movie content information from the Internet Movie Database (http://www.imdb.com), including each movie's genre, director, cast, etc. We kept the 900 movies that had complete content information, and chose 5,000 users who had rated at least 20 movies. The evaluation metric for this data set is the standard Mean Absolute Error (MAE): for each user-movie pair in the test set, we make a prediction for the target movie and compute the absolute deviation between the actual rating and the predicted rating. MAE is the sum of all deviations divided by the total number of predictions; lower MAE values represent higher recommendation accuracy.
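The MAE computation can be sketched as follows (a minimal illustration with made-up rating pairs):

```python
# Sketch of the MAE metric: mean absolute deviation between actual and
# predicted ratings over all user-movie test pairs.

def mae(pairs):
    """pairs: iterable of (actual_rating, predicted_rating)."""
    pairs = list(pairs)
    return sum(abs(a - p) for a, p in pairs) / len(pairs)

print(mae([(5, 4), (3, 3), (1, 2)]))  # (1 + 0 + 1) / 3, about 0.667
```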

4.2 Examples of Item Groups

As stated in Section 3, we use LDA to identify item groups from content information. Here we run LDA on the property-attribute and movie-attribute matrices respectively, and assign each item to one dominant class. To illustrate these item groups, we list the top attributes (based on φ) as well as the items of each class.

Figure 1 depicts two of the 20 movie classes we generate. For each class, we show the movies, the top attributes, and the corresponding probabilities (φ) associated with the class. Class 8 consists mostly of Sci-Fi movies, including the Star Trek series; naturally, attributes like "Adventure", "Action", and "Sci-Fi" have dominant probabilities, and actors from the Star Trek series as well as director Steven Spielberg appear near the top of the list. Class 11 appears to be a group of romance/comedy movies featuring Hugh Grant and Meg Ryan; accordingly, the attributes "Romance", "Drama", "Hugh Grant", and "Meg Ryan" have the highest probabilities.

In this example, we see that a user has given a high rating (6) to "Sense and Sensibility", and the system correctly predicts the user's rating (6) for "Four Weddings and a Funeral". From the training data, we find that these two movies have relatively high rating similarity (0.65). At the same time, the prediction algorithm notices that the content features involving them also tend to have high weights, which indicates that they belong to the same movie group and share some common attributes.

Figure 2: Examples of real estate property classes and their top attributes

Similarly, Figure 2 depicts the top attributes associated with two of the discovered classes among the real estate properties. In each case, the (normalized) weight indicating the importance of the attribute in the class is shown on the left. Class 1 represents relatively new 2-story properties in the $200K-$300K range with 4-5 bedrooms, all located in the WDM school district. Class 2, on the other hand, represents much smaller and older properties with 1-2 bedrooms; the most prominent attribute in this group is the DSM school district.

4.3 Quantitative Evaluation

For the Realty data, to exploit the ordering information in users' navigation data, we built a baseline recommendation system based on the standard first-order Markov model to predict and recommend which page to visit next. The Markov-based system models each page as a state in a state diagram, with each state transition labeled by the conditional probability estimated from the actual navigational data in the server logs. It generates recommendations based on the state-transition probabilities of the last page in the active user session.

Figure 3 compares the Hit Ratio measures of these two recommender systems on the Realty data. The experiments show that the maximum entropy recommendation system has a clear overall accuracy advantage over the first-order Markov recommendation system on this data set.

Figure 3: Recommendation accuracy: maximum entropy model vs. Markov model

For the movie data, we implemented a standard item-based collaborative filtering algorithm (Item-based CF) [16] to generate predictions for users in the test data for comparison. For each user, we took 50% of the data as training data and the rest as test data. For Item-based CF, we tried different neighborhood sizes and used the best result achieved (with 40 neighbors). For the maximum entropy model, we ran LDA to generate 20 movie groups and defined features as in Section 3.

MAE Comparison
Item-based CF: 1.13
Maximum Entropy: 1.08

Figure 4: Recommendation accuracy: maximum entropy model vs. Item-Based CF

The results are shown in Figure 4. By examining the cases where the maximum entropy system makes better predictions, we find that in most of them the item-based CF algorithm cannot find similar item neighbors (the highest similarity is only about 0.50-0.60), which makes its predictions based on item neighbors less accurate.

One of the problems associated with some traditional recommendation algorithms stems from the sparsity of the data sets to which they are applied. Data sparsity has a negative impact on the accuracy and predictability of recommendations, and this is one area in which, we believe, the integration of semantic knowledge with rating data can provide a significant advantage. Here we created multiple training/test data sets in which the proportion of training data to the complete ratings data was varied from 90% to 10% (for each user, we randomly selected a certain percentage of ratings as training data and used the rest as test data). These proportions correspond directly to the level of sparsity in the ratings data.
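The per-user split at a varying training proportion can be sketched as follows (an illustration; the exact sampling protocol is an assumption):

```python
# Sketch of the per-user train/test split used in the sparsity study:
# for each user, randomly keep `frac` of their ratings for training
# and hold out the rest for testing. Data shapes are illustrative.
import random

def split_ratings(user_ratings, frac, seed=0):
    rng = random.Random(seed)
    train, test = {}, {}
    for user, items in user_ratings.items():
        pairs = list(items.items())
        rng.shuffle(pairs)
        cut = int(len(pairs) * frac)  # training proportion, e.g. 0.9
        train[user] = dict(pairs[:cut])
        test[user] = dict(pairs[cut:])
    return train, test

data = {"u1": {"m%d" % i: 3 for i in range(10)}}
train, test = split_ratings(data, frac=0.9)
print(len(train["u1"]), len(test["u1"]))  # 9 1
```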

Figure 5: Impact of Data Sparsity on Recommendations

Figure 5 indicates that the advantage of our system over the item-based CF system tends to become more pronounced as the data gets sparser. One possible explanation is that when the training data gets sparser, we are less likely to find movie neighbors with very high rating similarity, making predictions based on neighbor movies' ratings less accurate. In the maximum entropy recommendation system, some features based on movie rating similarity also get lower weights in this case, but features based on movie content similarity are less affected and still contribute substantial weights, enabling accurate predictions.

5. CONCLUSION

In this paper, we have proposed a novel Web recommendation system in which users' navigation or rating data, as well as the content features accessed by the users, are seamlessly integrated under the maximum entropy principle. In the case of content information, we use a new approach based on Latent Dirichlet Allocation (LDA) to discover the hidden semantic relationships among items. The resulting models are used to generate dynamic recommendations for active users. Experiments show that this approach can achieve better recommendation accuracy with a small number of recommendations. Furthermore, the proposed framework provides reasonable prediction accuracy in the face of high data sparsity.

6. REFERENCES

[1] D. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993-1022, 2003.

[2] J. Goodman. Sequential conditional generalized iterative scaling. In Proceedings of NAACL-2002, 2002.

[3] F. Jelinek. Statistical Methods for Speech Recognition. MIT Press, MA, 1998.

[4] J. Jeon and R. Manmatha. Using maximum entropy for automatic image annotation. In Proceedings of the International Conference on Image and Video Retrieval (CIVR-2004), 2004.

[5] X. Jin, Y. Zhou, and B. Mobasher. A unified approach to personalization based on probabilistic latent semantic models of web usage and content. In Proceedings of the AAAI 2004 Workshop on Semantic Web Personalization (SWP'04), San Jose, CA, 2004.

[6] X. Jin, Y. Zhou, and B. Mobasher. Task-oriented web user modeling for recommendation. In Proceedings of the 10th International Conference on User Modeling (UM'05), UK, April 2005.

[7] R. Malouf. A comparison of algorithms for maximum entropy parameter estimation. In Proceedings of the Sixth Conference on Natural Language Learning (2002), 2002.

[8] B. Mobasher, H. Dai, T. Luo, Y. Sun, and J. Zhu. Integrating web usage and content mining for more effective personalization. In E-Commerce and Web Technologies: Proceedings of the EC-WEB 2000 Conference, Lecture Notes in Computer Science (LNCS) 1875, pages 165-176. Springer, September 2000.

[9] M. Steyvers, P. Smyth, M. Rosen-Zvi, and T. Griffiths. Probabilistic author-topic models for information discovery. In Proceedings of the International Conference on Knowledge Discovery and Data Mining, 2004.

[10] K. Nigam, J. Lafferty, and A. McCallum. Using maximum entropy for text classification. In Proceedings of IJCAI-1999, 1999.

[11] D. Pavlov, E. Manavoglu, D. Pennock, and C. Giles. Collaborative filtering with maximum entropy. IEEE Intelligent Systems, Special Issue on Mining the Web for Actionable Knowledge, 2004.

[12] D. Pavlov and D. Pennock. A maximum entropy approach to collaborative filtering in dynamic, sparse, high-dimensional domains. In Proceedings of Neural Information Processing Systems (2002), 2002.

[13] M. Pazzani. A framework for collaborative, content-based and demographic filtering. Artificial Intelligence Review, 13(5-6):393-408, 1999.

[14] A. Popescul, L. Ungar, D. Pennock, and S. Lawrence. Probabilistic models for unified collaborative and content-based recommendation in sparse-data environments. In Proceedings of the 17th UAI, Seattle, WA, 2001.

[15] R. Rosenfeld. Adaptive statistical language modeling: A maximum entropy approach. PhD dissertation, CMU, 1994.

[16] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International WWW Conference, Hong Kong, May 2001.

[17] K. Yu, A. Schwaighofer, V. Tresp, W. Ma, and H. Zhang. Collaborative ensemble learning: Combining collaborative and content-based information filtering. In Proceedings of the 19th UAI, 2003.

[18] C. Zitnick and T. Kanade. Maximum entropy for collaborative filtering. In Proceedings of the 20th International Conference on Uncertainty in Artificial Intelligence (UAI'04), Banff, Canada, July 2004.