Computational Journalism at Columbia, Fall 2013, Lecture 5: Hybrid Filtering
TRANSCRIPT
-
7/27/2019 Computational Journalism at Columbia, Fall 2013, Lecture 5: Hybrid Filtering
Frontiers of Computational Journalism
Columbia Journalism School
Week 5: Hybrid Filters
October 2, 2013
-
Week 5: Hybrid Filtering
Filtering Comments by Voting
User-item recommendation systems
General Hybrid Filters
-
Filtering Comments
Thousands of comments, what are the good ones?
-
Comment voting
Problem: putting comments with most votes at top doesn't work. Why?
-
Reddit Comment Ranking
Hypothetically, suppose all users voted on the comment, and v out of N up-voted. Then we could sort by proportion p = v/N of upvotes.
N = 16
v = 11
p = 11/16 = 0.6875
-
Reddit Comment Ranking
Actually, only n users out of N vote, giving an observed approximate proportion p' = v'/n
n = 3
v' = 1
p' = 1/3 = 0.333
-
Reddit Comment Ranking
Limited sampling can rank votes wrong when we don't have enough data.
p' = 0.333 (observed), p = 0.6875 (true)
p' = 0.75 (observed), p = 0.1875 (true)
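
A minimal sketch of the mis-ranking, using the four proportions from this slide. The labels "A" and "B" are hypothetical, and pairing 0.333 with 0.6875 and 0.75 with 0.1875 is an assumption read off the preceding slides (1/3 observed vs. 11/16 true, and 3/4 observed vs. 3/16 true).

```python
# Two comments: true proportion (if all N users voted) and observed
# proportion (from the few users who actually voted).
comments = {
    "A": {"true_p": 11 / 16, "observed_p": 1 / 3},
    "B": {"true_p": 3 / 16, "observed_p": 3 / 4},
}

def rank(key):
    """Return comment names sorted best-first by the given proportion."""
    return sorted(comments, key=lambda name: comments[name][key], reverse=True)

true_order = rank("true_p")          # ranking if everyone had voted
observed_order = rank("observed_p")  # ranking from the votes we have

print("true ranking:    ", true_order)
print("observed ranking:", observed_order)
```

With only a handful of votes per comment, the observed order comes out reversed relative to the true order.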
-
Random error in sampling
If we observe a proportion p' of upvotes from n random users, what is the distribution of the true proportion p?
Distribution of p' when p = 0.5
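
The plot on this slide did not survive extraction. As a rough stand-in, the sketch below computes the binomial distribution of the observed proportion when the true proportion is 0.5. The sample size n = 10 and the function name are assumptions for illustration, not values from the lecture.

```python
from math import comb

def observed_proportion_pmf(n, p):
    """Map each possible observed proportion k/n to its binomial probability."""
    return {k / n: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

pmf = observed_proportion_pmf(n=10, p=0.5)

# Crude text histogram: one '#' per percentage point of probability.
for proportion, prob in pmf.items():
    print(f"{proportion:.1f}  {prob:.3f}  {'#' * round(prob * 100)}")
```

Even with a true proportion of exactly 0.5, a sample of ten voters lands on an observed proportion other than 0.5 most of the time.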
-
Confidence interval
Given the observed p', an interval that the true p has a stated probability of lying inside.
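
The slide does not say which interval is used. One common choice for a proportion is the Wilson score interval; the sketch below assumes it, with z = 1.96 for roughly 95% confidence. The function name and defaults are illustrative.

```python
from math import sqrt

def wilson_interval(upvotes, n, z=1.96):
    """Wilson score interval for the true upvote proportion.

    upvotes: observed number of upvotes
    n:       number of users who voted
    z:       normal quantile (1.96 is roughly 95% confidence)
    """
    if n == 0:
        return (0.0, 1.0)  # no votes: the proportion could be anything
    p_hat = upvotes / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return (center - half_width, center + half_width)

few_votes = wilson_interval(1, 3)       # same observed proportion...
many_votes = wilson_interval(100, 300)  # ...but far more voters
print(few_votes, many_votes)
```

Both calls have an observed proportion of 1/3, but the interval for 300 voters is much narrower. Sorting comments by the lower end of the interval is one way to avoid rewarding comments that merely have few votes.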
-
Week 5: Hybrid Filtering
Filtering Comments by Voting
User-item recommendation systems
General Hybrid Filters
-
User-item matrix
Stores rating of each user for each item. Could also be binary variable that says whether user clicked, liked, starred, shared, purchased...
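
A small sketch of such a matrix, built from a list of rating events. The user names, item names, and star values are made up for illustration.

```python
# Hypothetical rating events: (user, item, stars).
ratings = [
    ("ann", "movie_a", 5),
    ("ann", "movie_b", 3),
    ("bob", "movie_a", 4),
    ("cat", "movie_c", 2),
]

users = sorted({user for user, _, _ in ratings})
items = sorted({item for _, item, _ in ratings})

# Rows are users, columns are items, None marks "no rating".
matrix = [[None] * len(items) for _ in users]
for user, item, stars in ratings:
    matrix[users.index(user)][items.index(item)] = stars

# Binary variant: 1 if the user interacted with the item at all, else 0.
binary = [[0 if cell is None else 1 for cell in row] for row in matrix]

for user, row in zip(users, matrix):
    print(user, row)
```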
-
User-item matrix
No content analysis. We know nothing about what is in each item.
Typically very sparse: a user hasn't watched even 1% of all movies.
Filtering problem is guessing unknown entry in matrix. High guessed values are things user would want to see.
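
To make the sparsity and the "unknown entry" framing concrete, here is a sketch that stores only the known ratings and lists the cells a filter would have to guess. All names and values are hypothetical.

```python
# Sparse store: only known ratings are kept, keyed by (user, item).
known = {
    ("ann", "movie_a"): 5,
    ("ann", "movie_b"): 3,
    ("bob", "movie_a"): 4,
    ("cat", "movie_c"): 2,
}

users = sorted({user for user, _ in known})
items = sorted({item for _, item in known})

total_cells = len(users) * len(items)
density = len(known) / total_cells  # fraction of the matrix that is filled in

# The filtering problem: every missing (user, item) pair needs a guess.
unknown = [(u, i) for u in users for i in items if (u, i) not in known]

print(f"{len(known)} of {total_cells} cells known ({density:.0%})")
print("cells to guess:", unknown)
```

This toy matrix is far denser than a real one; with thousands of items, the known fraction per user is typically tiny.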
-
Filtering process
-
How to guess unknown rating?
Basic idea: suggest similar items.
Similar items are rated in a similar way by many different users.
Remember, rating could be a click, a like, a purchase.
Users who bought A also bought B...
Users who clicked A also clicked B...
Users who shared A also shared B...
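
The "users who bought A also bought B" idea can be sketched as simple co-occurrence counting. The baskets below are invented, and raw counts are only one possible notion of "similar".

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase history: one set of items per user.
baskets = {
    "ann": {"A", "B", "C"},
    "bob": {"A", "B"},
    "cat": {"B", "C"},
    "dan": {"A"},
}

# Count how many users have each pair of items in common.
co_counts = Counter()
for items in baskets.values():
    for x, y in combinations(sorted(items), 2):
        co_counts[(x, y)] += 1
        co_counts[(y, x)] += 1

def also_bought(item, top=3):
    """Items most often bought by the same users who bought `item`."""
    pairs = [(other, c) for (first, other), c in co_counts.items() if first == item]
    return sorted(pairs, key=lambda pair: (-pair[1], pair[0]))[:top]

print(also_bought("A"))
```

The same counting works unchanged if the sets hold clicks or shares instead of purchases.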
-
Similar items
-
Item similarity
Cosine similarity!
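
A minimal sketch of cosine similarity between items, treating each item as a vector with one entry per user. The ratings are invented, and encoding "not rated" as 0 is a simplifying assumption.

```python
from math import sqrt

# Each item is a column of the user-item matrix: one rating per user.
# User order:   ann, bob, cat   (0 = not rated)
item_vectors = {
    "A": [5, 4, 0],
    "B": [4, 5, 0],
    "C": [0, 1, 5],
}

def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length rating vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = sqrt(sum(a * a for a in x))
    norm_y = sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # an item nobody rated is similar to nothing
    return dot / (norm_x * norm_y)

sim_ab = cosine_similarity(item_vectors["A"], item_vectors["B"])
sim_ac = cosine_similarity(item_vectors["A"], item_vectors["C"])
print(f"A~B: {sim_ab:.3f}   A~C: {sim_ac:.3f}")
```

Items A and B are rated by the same users in much the same way, so they score far higher than A and C.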
-
Other distance measures
adjusted cosine similarity
Subtracts average rating for each user, to compensate for general enthusiasm ("most movies suck" vs. "most movies are great")
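
A sketch of adjusted cosine similarity under one common formulation: subtract each user's mean rating, then take the cosine over users who rated both items. The users and ratings are hypothetical, and the slide's exact formula was not recoverable.

```python
from math import sqrt

# ratings[user][item]; a missing key means "not rated".
ratings = {
    "grump": {"A": 1, "B": 2, "C": 1},  # rates everything low
    "fan":   {"A": 4, "B": 5, "C": 4},  # rates everything high
    "mixed": {"A": 2, "B": 4, "C": 5},
}

def adjusted_cosine(item_i, item_j):
    """Cosine similarity of two items after removing each user's mean rating."""
    dot = norm_i = norm_j = 0.0
    for user_ratings in ratings.values():
        if item_i not in user_ratings or item_j not in user_ratings:
            continue  # only users who rated both items contribute
        mean = sum(user_ratings.values()) / len(user_ratings)
        di = user_ratings[item_i] - mean
        dj = user_ratings[item_j] - mean
        dot += di * dj
        norm_i += di * di
        norm_j += dj * dj
    if norm_i == 0 or norm_j == 0:
        return 0.0
    return dot / (sqrt(norm_i) * sqrt(norm_j))

print(f"A~B adjusted: {adjusted_cosine('A', 'B'):.3f}")
```

On this data, plain cosine similarity between A and B is close to 1, because the grump's low scores and the fan's high scores line up across items. After centering, every user turns out to rate A below their own average and B above it, so the adjusted similarity is negative.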
-
Generating a recommendation
Weighted average of item ratings by their similarity.
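
The weighted average can be sketched as follows. The function name, the data, and the choice to skip items with non-positive similarity are all assumptions for illustration.

```python
def predict_rating(user_ratings, similarities, target):
    """Similarity-weighted average of a user's ratings on other items.

    user_ratings: {item: rating} for items this user has already rated
    similarities: {(item_a, item_b): similarity}, precomputed and symmetric
    target:       the item whose rating we want to guess
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for item, rating in user_ratings.items():
        sim = similarities.get((target, item), similarities.get((item, target), 0.0))
        if sim <= 0:
            continue  # this sketch ignores dissimilar or unrelated items
        weighted_sum += sim * rating
        weight_total += sim
    return weighted_sum / weight_total if weight_total else None

# The user rated A and B; we want a guess for C.
user_ratings = {"A": 5, "B": 2}
similarities = {("C", "A"): 0.9, ("C", "B"): 0.1}
guess = predict_rating(user_ratings, similarities, "C")
print(guess)
```

Because C is much more similar to A than to B, the guess is pulled toward the user's rating of A. The function returns None when no rated item is similar to the target.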
-
Week 5: Hybrid Filtering
Filtering Comments by Voting
User-item recommendation systems
General Hybrid Filters
-
Different Filtering Systems
Pure algorithmic: Newsblaster analyzes the topics in the documents. No concept of users.
Pure social: What I see on Twitter determined by who I follow. No content analysis.
Hybrid: Reddit comments filtered by an algorithm that takes votes as input.
Hybrid: Items recommended based on co-consumption by all users.
What else is possible?
-
Item Content: text analysis, topic modeling, clustering...
My Data: who I follow, what I've read/liked
Other Users' Data: social network structure, other users' likes
-
How to evaluate/optimize the filter?
Netflix: try to predict the rating that the user gives a movie after watching it.
Amazon: sell more stuff.
Google web search: human raters, A/B test every change
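
For the rating-prediction case, one way to score a filter is root-mean-square error between predicted and actual ratings, the metric used in the Netflix Prize. The slide does not name a metric, so this is an assumption, and the numbers are invented.

```python
from math import sqrt

def rmse(predicted, actual):
    """Root-mean-square error between predicted and actual ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return sqrt(sum(squared_errors) / len(squared_errors))

predicted = [4.0, 3.0, 5.0, 2.0]  # what the filter guessed
actual = [5, 3, 4, 2]             # what users rated after watching
error = rmse(predicted, actual)
print(f"RMSE: {error:.3f}")
```

Lower is better, and 0 means every guess was exact. Goals like "sell more stuff" cannot be scored offline this way, which is where A/B testing comes in.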