humanities research recommendations via …cs229.stanford.edu/proj2016spr/poster/008.pdfempirical...
Humanities Research Recommendations via Collaborative Topic Modeling
Nitya Mani and Andy Chen
Overview of Recommendation Algorithms
Recommendation Algorithms:
Content-Based:
• Item Keywords
• Probabilistic Topic Modeling
• Cluster Analysis
• Scientific Articles
• Music + Movies
Filtering:
• User Ratings
• Nearest neighbor
• Implicit + Explicit
• Current News
• Shopping + Social Networks
Hybrid:
• Collaborative
• Knowledge-Based
• Content-Based
• Weighted
• Mixed
• Netflix
Research Recommendations:
• Datasets: CS/STEM research publications
• Content-Focused: keywords imply readership
• Filtering: extensive user feedback required
• Hybrid: imbalanced + large datasets needed
Dataset
Humanities research publications
• Topic modeling less effective
• International Journal of Comparative Psychology
Small amounts of user feedback
• Author-driven interest
• CiteULike user libraries
Goal
• Adapt hybrid modeling algorithm
• Effective for little + no user feedback
Collaborative Filtering: Matrix Factorization
Setup
• $I$ users $U = \{u_1, \dots, u_I\}$ and $J$ items $V = \{v_1, \dots, v_J\}$
• User $i$ recommends item $j$: $r_{ij} = 1$ (else $0$)
• Fix hyperparameters: $\lambda_u$, $\lambda_v$
Collaborative Topic Regression
Model Overview:
1. Users have interests (implicit article recs)
2. Documents have topics (LDA), some of which explain readership
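Initialize:
1. For each user $i = 1, \dots, I$: $u_i \sim \mathcal{N}(0, \lambda_u^{-1} I_K)$
2. For each item $j = 1, \dots, J$: $v_j \sim \mathcal{N}(\theta_j, \lambda_v^{-1} I_K)$
3. Assume $r_{ij} \sim \mathcal{N}(u_i^T v_j, c_{ij}^{-1})$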
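The three sampling assumptions above can be simulated directly. The following is a minimal sketch, assuming NumPy; the function name `sample_ctr`, the toy sizes, and the Dirichlet draws used as stand-ins for LDA topic proportions $\theta_j$ are illustrative assumptions, not part of the poster.

```python
import numpy as np

def sample_ctr(theta, n_users, lam_u, lam_v, c, rng):
    """Draw U, V, R from the generative assumptions listed above (sketch).

    theta: (J, K) topic proportions per item (hypothetical stand-in for LDA output).
    c:     (I, J) confidence (precision) parameters c_ij.
    """
    n_items, k = theta.shape
    # u_i ~ N(0, (1/lam_u) I_K)
    U = rng.normal(0.0, np.sqrt(1.0 / lam_u), size=(n_users, k))
    # v_j ~ N(theta_j, (1/lam_v) I_K): topic proportions plus a latent offset
    V = theta + rng.normal(0.0, np.sqrt(1.0 / lam_v), size=(n_items, k))
    # r_ij ~ N(u_i^T v_j, 1/c_ij)
    R = U @ V.T + rng.normal(size=(n_users, n_items)) / np.sqrt(c)
    return U, V, R

rng = np.random.default_rng(0)
theta = rng.dirichlet(np.ones(5), size=8)   # 8 items, K = 5 topics (toy values)
c = np.full((4, 8), 1.0)                    # 4 users, uniform confidence
U, V, R = sample_ctr(theta, n_users=4, lam_u=0.01, lam_v=100.0, c=c, rng=rng)
```

A large $\lambda_v$ keeps each $v_j$ close to its topic proportions $\theta_j$, so content dominates; a small $\lambda_v$ lets the item vector drift toward whatever the user feedback suggests.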
Learning:
1. Model latent document vector with content
2. Find MAP estimates of U and V given R (coordinate ascent)
3. Minimize regularized least squares
Coordinate Ascent:
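The updates can be written out as follows. This is a sketch based on the standard collaborative topic regression formulation, assuming the topic proportions $\theta_j$ are held fixed at their LDA estimates. The symbols $C_i$, $C_j$, $R_i$, $R_j$ and the matrix forms of $U$ and $V$ are notation introduced here for illustration: $U$ and $V$ are the $K \times I$ and $K \times J$ matrices with columns $u_i$ and $v_j$, $C_i = \mathrm{diag}(c_{i1}, \dots, c_{iJ})$, $R_i = (r_{i1}, \dots, r_{iJ})^T$, and $C_j$, $R_j$ are defined analogously over users.

```latex
% Regularized least-squares objective implied by the Gaussian assumptions
\min_{U, V} \;
  \sum_{i,j} \frac{c_{ij}}{2} \left( r_{ij} - u_i^T v_j \right)^2
  + \frac{\lambda_u}{2} \sum_{i} u_i^T u_i
  + \frac{\lambda_v}{2} \sum_{j} (v_j - \theta_j)^T (v_j - \theta_j)

% Coordinate updates: set the gradient in u_i (resp. v_j) to zero
u_i \leftarrow \left( V C_i V^T + \lambda_u I_K \right)^{-1} V C_i R_i
v_j \leftarrow \left( U C_j U^T + \lambda_v I_K \right)^{-1}
               \left( U C_j R_j + \lambda_v \theta_j \right)
```

The $\lambda_v \theta_j$ term is what ties an item's latent vector to its content: an article with no recorded readers has $v_j = \theta_j$ and can still be ranked.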
Empirical Study: Simulating User Feedback
• Often no access to user feedback
• Simulate user-item interactions to improve recs
• Users: lists of original recommendations
• Updated using CTR and cross-validation
• International Journal of Comparative Psychology
• 4827 articles, 580 "users", 20 recs/user
Empirical Study: Humanities Research
• Sparser datasets (fewer users, recommendations)
• Topic models less accurate/relevant
• Non-content-focused abstracts
• CiteULike: Users studying Eastern and European languages, History, Linguistics, Classics, Politics
• 1269 articles, 220 users, 715 user-item interactions
Making and Validating Recommendations
• Article Recommendation
  • Recommendation rating is expected value of $u_i^T v_j$
  • Provide at least 10 recommendations if user has provided at least 20 recommended articles
• Ranking Articles
  • Rank articles by the predicted recommendation $u_i^T v_j$
  • Chose prediction bar 0.75/0.9 (confidence to recommend)
• Recommendation Validation
  • Precision
    • Predicts the hidden original article (simulating user feedback)
    • Predicts relevant withheld recommendations
  • Recall
    • Recalls the original provided user-item interactions with high confidence (rating over 0.9)
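The ranking and validation steps above can be sketched in a few lines. This is a minimal illustration, assuming NumPy and already-fitted factor matrices; the function names `recommend` and `precision_recall`, the `top_n` cap, and the toy vectors are hypothetical and not taken from the poster.

```python
import numpy as np

def recommend(U, V, user, provided, bar=0.9, top_n=10):
    """Rank articles the user has not provided by predicted rating u_i^T v_j."""
    scores = V @ U[user]                 # predicted rating for every article
    order = np.argsort(-scores)          # highest predicted rating first
    picks = [int(j) for j in order
             if j not in provided and scores[j] >= bar]
    return picks[:top_n], scores

def precision_recall(recommended, relevant):
    """Precision and recall of a recommendation list against relevant items."""
    hits = len(set(recommended) & set(relevant))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Toy example: one user, four articles, article 0 already provided.
U = np.array([[1.0, 0.0]])
V = np.array([[0.95, 0.0], [0.5, 0.0], [0.92, 0.3], [0.1, 1.0]])
picks, scores = recommend(U, V, user=0, provided={0}, bar=0.9)
precision, recall = precision_recall(picks, relevant=[2, 3])
# picks == [2]; precision == 1.0; recall == 0.5
```

Lowering `bar` to 0.75 trades precision for coverage, which is presumably why two prediction bars were compared.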
Data Overview + Analysis
Simulating Implicit User Feedback
• Hyperparameter search: optimal precision + recall at $K = 100$, $\lambda_u = 0.01$, $\lambda_v = 0.1$, $c_{ij} = 1, 0.001$
• 97% accuracy in recommending the hidden original article with > 0.9 confidence and within top 10 recommendations
• 99.9% recall in rating provided recommendations within top 20 and confidence > 90%
• 95% precision in relevance of random sample of recommendations
CiteULike Humanities Researchers
• Hyperparameter search: $K = 40$, $\lambda_u = 0.01$, $\lambda_v = 100$, $c_{ij} = 1, 0.01$
• Evaluate accuracy on users with > 20 recommendations
• 92% accuracy for training user-article recommendations
• Predicted 64% of entire user recommendations (half hidden); extremely unlikely by chance, ~ Bin(20, 1/715)
• Precision based on random sample: 89% (for both prediction bars)
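To give a sense of scale for the Bin(20, 1/715) chance baseline, the tail probability can be computed directly. This is a rough sketch using only the Python standard library; reading "64%" as 13 of 20 matches and treating the 20 trials as independent are assumptions made here, and the helper name `binom_tail` is hypothetical.

```python
from math import comb

def binom_tail(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k, n + 1))

n, p = 20, 1 / 715
expected_hits = n * p                   # expected number of chance matches
p_at_least_one = binom_tail(n, p, 1)    # chance of even a single match
p_at_least_13 = binom_tail(n, p, 13)    # 13 of 20 is roughly 64% (assumed reading)
```

Under these assumptions the expected number of chance matches is well below one, and the probability of 13 or more is smaller than $10^{-30}$.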
Applications + Current Work
• Application of LDA + CF can improve content-based recommendations for platforms without access to user feedback
• Hybrid models can effectively recommend on small datasets
• Articles with large proportion of out-of-vocab, non-English words
• Current project work:
  • Diversifying topics in article dataset
  • Running LDA on introduction rather than abstract
  • Applying HMM with LDA rather than using bag-of-words
  • Updating parameters based on article citations and authors
Sample Data (CiteULike)
Eastern Languages Users:
Probabilistic Topic Modeling: Latent Dirichlet Allocation
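Setup
• $M$ documents $W_1, \dots, W_M$ in corpus
• $K$ topics $\beta_1, \beta_2, \dots, \beta_K$; each a distribution over vocabulary $V$
• Fix hyperparameters $\alpha$, $\beta$, $\xi$
Initialize for each $W_i$:
1. Document length in words: $N_i \sim \mathrm{Poisson}(\xi)$
2. Topic distribution: $\theta_i \sim \mathrm{Dirichlet}(\alpha)$ over $K$ topics
For each word $w_{ij} \in W_i$:
1. Choose a topic: $z_{ij} \sim \mathrm{Multinomial}(\theta_i)$
2. Choose the word: $w_{ij} \sim p(w_{ij} \mid \beta_{z_{ij}})$, conditioned on the topic
Maximize likelihood:
1. Given value of parameter $\alpha$
2. EM algorithm to learn $\beta_1, \dots, \beta_K$ and topics $\theta_1, \dots, \theta_M$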
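The generative process above can be simulated to make the roles of the variables concrete. This is a minimal sketch, assuming NumPy and a symmetric Dirichlet prior (a single scalar `alpha`); the function name `generate_corpus` and the toy sizes are illustrative assumptions. It covers only the sampling side, not the EM fit.

```python
import numpy as np

def generate_corpus(n_docs, beta, alpha, xi, rng):
    """Sample documents from the LDA generative process (illustrative sketch)."""
    n_topics, vocab_size = beta.shape
    docs, thetas = [], []
    for _ in range(n_docs):
        n_words = rng.poisson(xi)                         # N_i ~ Poisson(xi)
        theta = rng.dirichlet(np.full(n_topics, alpha))   # theta_i ~ Dirichlet(alpha)
        z = rng.choice(n_topics, size=n_words, p=theta)   # z_ij ~ Multinomial(theta_i)
        # w_ij drawn from the vocabulary distribution of its topic, beta_{z_ij}
        words = [int(rng.choice(vocab_size, p=beta[k])) for k in z]
        docs.append(words)
        thetas.append(theta)
    return docs, np.array(thetas)

rng = np.random.default_rng(0)
beta = rng.dirichlet(np.ones(30), size=3)   # 3 topics over a 30-word vocabulary
docs, thetas = generate_corpus(n_docs=5, beta=beta, alpha=0.5, xi=40, rng=rng)
```

Each row of `thetas` is the per-document topic mixture that later serves as the content prior for an article's latent vector.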
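Matrix factorization model (collaborative filtering):
Initialize:
1. For each user $i = 1, \dots, I$: $u_i \sim \mathcal{N}(0, \lambda_u^{-1} I_K)$
2. For each item $j = 1, \dots, J$: $v_j \sim \mathcal{N}(0, \lambda_v^{-1} I_K)$
For all user-item pairs $(i, j)$:
1. Assign a rating $r_{ij} \sim \mathcal{N}(u_i^T v_j, c_{ij}^{-1})$
2. Fix precision parameters $c_{ij}$ to reflect confidence
Optimize U, V:
1. Minimize regularized least-squares error over all user-article pairs
2. Predict rating $u_i^T v_j$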
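The optimization step above can be sketched with alternating least squares. This is a minimal illustration, assuming NumPy; the function name `fit_mf`, the iteration count, the toy rating matrix, and the choice of $c_{ij} = 1$ for recommended pairs and $0.01$ otherwise are assumptions made here for the example.

```python
import numpy as np

def fit_mf(R, C, k, lam_u, lam_v, n_iters=20, seed=0):
    """Confidence-weighted matrix factorization by alternating least squares.

    Minimizes sum_ij c_ij/2 (r_ij - u_i^T v_j)^2
              + lam_u/2 sum_i ||u_i||^2 + lam_v/2 sum_j ||v_j||^2.
    """
    n_users, n_items = R.shape
    rng = np.random.default_rng(seed)
    U = rng.normal(scale=0.1, size=(n_users, k))
    V = rng.normal(scale=0.1, size=(n_items, k))
    for _ in range(n_iters):
        for i in range(n_users):            # update each user with V fixed
            Ci = C[i]
            A = (V * Ci[:, None]).T @ V + lam_u * np.eye(k)
            U[i] = np.linalg.solve(A, V.T @ (Ci * R[i]))
        for j in range(n_items):            # update each item with U fixed
            Cj = C[:, j]
            A = (U * Cj[:, None]).T @ U + lam_v * np.eye(k)
            V[j] = np.linalg.solve(A, U.T @ (Cj * R[:, j]))
    return U, V

R = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1]], dtype=float)
C = np.where(R == 1, 1.0, 0.01)   # higher confidence for observed recommendations
U, V = fit_mf(R, C, k=2, lam_u=0.01, lam_v=0.01)
pred = U @ V.T                    # predicted ratings u_i^T v_j
```

Each inner update is a small ridge regression, so every step has a closed form and the objective never increases.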
Sample Data (User Simulation)
User 6wq9p6zn
Rating confidence percent of provided and withheld article recommendations ($K = 25$, $\lambda_u = \lambda_v = 0.01$, $c_{ij} = 1, 0.01$)
| Titles of Provided Article Recommendations | Class | Rank |
|---|---|---|
| Play and Developmental Outcomes in Infant Siblings of Children with Autism | + | 1 |
| Teaching to Play or Playing to Teach: An examination of play targets and generalization in two interventions for children with autism | + | 3 |
| The Development of Strain Typical Defensive Patterns in the Play Fighting of Laboratory Rats | + | 6 |
| A Novel Teacher Implemented Protocol to Assess Early Social Communication and Play Skills in Preschool Children with Autism | + | 7 |
| Role of Peers in Cultural Innovation and Cultural Transmission: Evidence from the Play of Dolphin Calves | + | 9 |
| A normative model of peer review: qualitative assessment of manuscript reviewers’ attitudes towards peer review | - | 10 |
| Impacts of Ferry Terminals on Juvenile Salmon Movement along Puget Sound Shorelines | - | 14 |
| Securing Resources in Collaborative Environments: A Peer-to-peer Approach | - | 15 |
| Peer-mediated inference making intervention for students with autism spectrum disorders | - | 16 |
| Towards Distributed Data Collection and Peer-to-Peer Data Sharing | - | 17 |
| Titles of Withheld Article Recommendations | Class | Rank |
|---|---|---|
| Pretend Play of Young Children in North Tehran: A Descriptive Cultural Study of Children's Play and Maternal Values | + | 2 |
| More than a Child’s Work: Framing Teacher Discourse about Play | + | 4 |
| Integrated Drama Groups: Promoting Symbolic Play, Empathy, and Social Engagement With Peers in Children with Autism | + | 5 |
| Comparing Object Play in Captive and Wild Dolphins | + | 19 |
| Development of “Anchoring” in the Play Fighting of Rats: Evidence for an Adaptive Age-Reversal in the Juvenile Phase | + | 20 |
| Normative model of peer review - Qualitative assessment | - | NP |
| Strategic defense and the global public good | - | NP |
| Gender-Typed Play Behavior in Early Childhood: Adopted Children with Lesbian, Gay, and Heterosexual Parents | + | 28/NP |
| Japan’s Defense White Paper as a Tool for Promoting Defense Transparency | - | NP |
| Normative model of peer review - Qualitative assessment | - | NP |
| Titles of New Article Recommendations | Class | Rank |
|---|---|---|
| The Development of Juvenile-Typical Patterns of Play Fighting in Juvenile Rats does not Depend on Peer-Peer Play Experience in the Peri-Weaning Period | + | 8 |
| Sacred Playground: Adult Play and Transformation at Burning Man | + | 11 |
| Altruism in Animal Play and Human Ritual | + | 12 |
| How Studies of Wild and Captive Dolphins Contribute to our Understanding of Individual Differences and Personality | + | 13 |
| The Behavioral Development of Two Beluga Calves During the First Year of Life | + | 18 |
LDA Topic Model
Figure: LDA Topic Model Visualization for K = 25 (CiteULike Humanities Research)
Sample Topics (IJCP)
• 'health risk methods factors'
• 'cultural american historical'
• 'expression genetic function'
• 'species patterns california populations habitat'
• 'brain activity neural cell'
• 'public policy economic state'
Figure: Accuracy on training and testing data with varied numbers of topics K