Preference Amplification in Recommender Systems

Dimitris Kalimeris
Harvard University
Smriti Bhagat
Shankar Kalyanaraman
Udi Weinsberg
ABSTRACT
Recommender systems have become increasingly accurate in suggesting content to users, resulting in users primarily consuming content through recommendations. This can cause the user's interest to narrow toward the recommended content, something we refer to as preference amplification. While this can contribute to increased engagement, it can also lead to negative experiences such as lack of diversity and echo chambers. We propose a theoretical framework for studying such amplification in a matrix factorization based recommender system. We model the dynamics of the system, where users interact with the recommender system and gradually "drift" toward the recommended content, with the recommender system adapting, based on user feedback, to the updated preferences. We study the conditions under which preference amplification manifests, and validate our results with simulations. Finally, we evaluate mitigation strategies that prevent the adverse effects of preference amplification and present experimental results using a real-world large-scale video recommender system showing that by reducing exposure to potentially objectionable content we can increase user engagement by up to 2%.
CCS CONCEPTS
• Information systems → Recommender systems; • Computing methodologies → Learning from implicit feedback.

KEYWORDS
Recommender systems; echo chambers; filter bubbles; fixed point

ACM Reference Format:
Dimitris Kalimeris, Smriti Bhagat, Shankar Kalyanaraman, and Udi Weinsberg. 2021. Preference Amplification in Recommender Systems. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '21), August 14-18, 2021, Virtual Event, Singapore. ACM, New York, NY, USA, 11 pages. https://doi.org/10.1145/3447548.3467298
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
KDD '21, August 14-18, 2021, Virtual Event, Singapore.
© 2021 Association for Computing Machinery.
ACM ISBN 978-1-4503-8332-5/21/08...$15.00
https://doi.org/10.1145/3447548.3467298

1 INTRODUCTION
Recommender systems have grown to dominate how people navigate through and consume the vast amounts of content that exists on the Internet. As users interact with the recommendations they
are given, they provide feedback that is used by the system to
improve its understanding of their preferences, and subsequently
improve the quality of future recommendations. This iterative pro-
cess of fine-tuning new recommendations based on user feedback
results in a feedback loop. Such feedback loops may occasionally
cause negative experiences for users.
For the purpose of illustration, consider a user who is curious
about videos that might be considered objectionable. If a recom-
mender system repeatedly presents them with such videos it will
likely receive positive feedback and learn about the user's preference for this content. Hence, over time the system may converge
to primarily showing them objectionable videos, and most impor-
tantly, the user also might end up actively seeking and liking such
content. These phenomena of progressive reinforcement of one's own views as a result of the feedback loop, and of narrowing exposure to different types of content, have been referred to as echo chambers and filter bubbles respectively, and have received much
recent attention [18, 24, 31, 38, 39]. Given the impact that recom-
mender systems have on our lives, it is important to understand
when recommendation dynamics result in echo chambers or filter
bubbles, and how to mitigate any negative experiences to the user.
In this work, we study echo chambers from a theoretical point
of view. We consider a matrix factorization-based recommender
system and discuss the formation of feedback loops, resulting from
users exhibiting a drift (or affinity) toward certain item categories
recommended to them. Essentially, drift captures the intuition that a user might become more interested in an item after exposure to it; for example, after reading an article they really enjoyed, they might want to read more similar articles, shifting their preferences toward that type of content.
The most natural solution concept that allows us to predict the long-term behavior of the dynamical interaction between the recommender system and a user is that of a stable fixed point, i.e., a situation in which the user preferences do not change in response to the system's recommendations. We prove that under mild assumptions such a stable fixed point does not exist at all, implying
that if users experience drift towards certain item categories, re-
peated exposure to content from those categories could result in
increasing the userβs preference for them, absent any intervention.
Moreover, this is true even if the prevalence of these item categories
is low. We then apply this theoretical formulation in simulations
involving integrity scenarios, where the item categories could be
representative of problematic content. We show using both syn-
thetic and real-world datasets that if the user has initial affinity
Figure 1: Interaction between the user and the system. The system uses u_t to recommend item x_t to the user. The system then updates its parameters u_t to reflect the feedback y_t. The user also slightly shifts their preferences π_t after interacting with the recommended item.
towards problematic content, the system will move towards show-
ing more problematic content to them. We propose strategies for
mitigating these outcomes, and present results from simulations as
well as investigations on real-world datasets. Finally, we present
supporting evidence for these strategies from a large user experiment run on a real-world content recommender system, and observe an improvement in the metrics used to measure user engagement. In summary, our contributions are:
(1) a theoretical characterization of the dynamic interactions
between a user and the recommender system,
(2) evidence through simulations on synthetic and real-world
data of the divergence in user preferences and the existence
of echo chambers in matrix factorization systems,
(3) results from a large user experiment conducted at Facebook
showing how applying personalized mitigation can increase
user engagement.
1.1 Related work
The study of feedback loops and echo chambers in recommender
systems has received a lot of attention [19, 23, 29] and has been cata-
logued in a survey by Chen et al. [8] who discuss feedback loops and
present a taxonomy for biases that exist in recommender systems.
In investigating how users interact with recommender systems,
some earlier work has focused on the static (single snapshot) per-
formance of the system [11, 36, 37], while others have modeled these
interactions in a dynamic, sequential setting, seeking to understand
the effect that the past ratings of a user have on the future content
that is available to them [1, 5, 7, 12, 13, 25, 40, 41]. Along the same
line of work, in [28, 32] the authors propose a general framework
to predict the long-term behavior of machine learning models in
cases where the model predictions directly influence the observed
data. Recommender systems broadly fall in that category; however
many real systems, for example matrix factorization ones, do not
satisfy the requirements of their framework, thereby warranting a
different approach. Echo chambers have also been extensively stud-
ied in recent years in a variety of contexts besides recommender
systems, like online social networks [3, 6, 9, 10, 15] and opinion
dynamics [4, 22, 33, 35], with content diversity being proposed most
commonly to address user polarization [2, 21, 23, 27, 30].
Finally, the work that is conceptually closest to ours is that of
Jiang et al. [20]. The authors provide a theoretical model to study
echo chambers which, however, fails to capture core aspects of real
recommender systems. First, they do not consider any systematic
way through which the recommender system recommends items
to the user (their assumption is that each item is recommended
infinitely often). Second, in their model items are independent
from each other in the sense that reacting to an item does not
provide any information about the possible reaction to a similar
item. This assumption violates the essence of how collaborative
filtering works. In this work, we study similar phenomena in a more
complete and realistic setting, where there is an underlying matrix
factorization-based recommender system.
1.2 Outline
We formalize the model for the recommender system and the user
behavior in Section 2. In Section 3, we present our main theoretical
results showing the lack of stable fixed points in the user-system
interaction, hinting at the existence of echo chambers in matrix-
factorization systems. In Section 4, we evaluate our results through
simulations on real and synthetic data. We discuss mitigation strate-
gies in Section 4.2, and describe a real-world large user experiment
applying these strategies conducted at Facebook in Section 4.3.
2 PROBLEM SETTING
Consider a collection of n items that are available on a platform. At each time step t, a recommender system presents an item to a user and receives feedback y ∈ {0, 1} indicating whether they liked, clicked on, watched, etc. the item that was suggested. This feedback is utilized to improve subsequent recommendations. We provide an illustration of the user-system interaction in Figure 1 and a notation summary in Table 1.
Linear scores. Our work focuses on stochastic matrix factorization recommender systems. In such systems, each item is assigned a linear score s_x = u^T x, where u, x ∈ R^d are feature vectors for the user and for the item, respectively. Here d ≪ n, i.e., the user and item representations are low-dimensional. Following standard logistic regression modeling, the system then predicts the probability of the user liking an item x at time t as q_x^t = σ(s_x^t), where σ(·) denotes the sigmoid function. We assume that the item features are a static description of the item, i.e., they correspond to scores coming from some classifiers or represent fixed characteristics of the item that do not change after an interaction with the user. The user features, on the other hand, are dynamic and are updated based on the user's feedback, to reflect the true user preferences more accurately. For example, for a movie recommendation, x can describe the genre of the movie, its duration, its language, etc., while u describes how much the user values each of these. As a result, the item scores are time-varying: s_x^t = u_t^T x for each time step t.
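As a minimal illustration of this scoring scheme (our own sketch, not the paper's code; the function name and toy vectors are ours):

```python
import numpy as np

def score_and_predict(u, X):
    """Linear scores s_x = u^T x for every item, mapped through the
    sigmoid to the predicted like-probability q_x = sigmoid(s_x)."""
    scores = X @ u                          # one score per item row
    probs = 1.0 / (1.0 + np.exp(-scores))   # sigmoid
    return scores, probs

u = np.array([0.5, -0.2])                   # user vector, d = 2
X = np.array([[1.0, 0.0],                   # three static item vectors
              [0.0, 1.0],
              [-1.0, 0.5]])
scores, probs = score_and_predict(u, X)
```

Only u is updated over time; the rows of X stay fixed, matching the static-item assumption above.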
Stochastic recommendation. We consider stochastic recommender systems, a relaxation of the classical Top-k recommender system [12]. Such a system maps the score of each item to a selection probability through the softmax function. Hence, higher scores correspond to higher selection probabilities. Specifically, at each iteration, an item x is chosen as x ∼ p_t, where

p_t(x) = e^{β s_x^t} / Σ_{x'} e^{β s_{x'}^t}    (1)
Table 1: Notation Summary

Symbol            Meaning
n                 number of items
u_t ∈ R^d         inferred user feature vector at time t
x ∈ R^d           item feature vector (static)
s_x^t             score of item x at time t: s_x^t = u_t^T x
q_x^t = σ(s_x^t)  predicted probability that the user will like item x
π_t(x)            true probability that the user will like x at time t
p_t               softmax of item scores at time t
p_t(x)            probability of recommending x at time t
β > 0             system sensitivity, determines the "steepness" of p_t
γ ∈ (0, 1)        sensitivity parameter of the user
α_{t,x}           rescaling coefficient for item x at time t (see Def. 1)
Notice that the selection probability of an item, p_t(x), which corresponds to the probability of presenting x to the user, is different from the probability q_x^t, which is the probability that the user likes item x upon being presented with it. Using different values for the sensitivity parameter β in p_t we can induce different behaviors in the system¹. In particular, a high β results in a steep p_t where only the items with the highest scores have non-negligible probability of being recommended; if β → ∞ then p_t is exactly a Top-1 recommender system. On the other hand, small values of β lead to a more uniform behavior where all items have a similar probability of being presented to the user. See Section 4 for more details on the effect of β and the shape of p_t (e.g., Fig. 6).
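The effect of β on the selection distribution of (1) can be sketched as follows (a max-subtracted softmax for numerical stability; names are ours):

```python
import numpy as np

def recommendation_probs(scores, beta):
    """Softmax of eq. (1): p(x) is proportional to exp(beta * s_x)."""
    z = beta * scores
    z = z - z.max()          # stabilize before exponentiating
    w = np.exp(z)
    return w / w.sum()

scores = np.array([2.0, 1.0, 0.0, -1.0])
p_explore = recommendation_probs(scores, beta=0.5)   # near-uniform
p_top1 = recommendation_probs(scores, beta=20.0)     # approximates Top-1
```

With small β the mass is spread across all items; with large β almost all of it sits on the highest-scoring item, recovering Top-1 behavior.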
System update. Upon receiving feedback y_t for the item x_t that was recommended to the user, the system updates u_t to account for it, attempting to learn the true user preferences accurately. We assume that the update of u_t happens through gradient descent, a standard algorithm with strong performance guarantees [17] that is employed in practice for online learning with convex loss functions. Specifically, at iteration t, the expected update of the system is:

u_{t+1} = u_t − η_t · E_{x∼p_t}[∇_u ℓ(u_t, (x, y))]    (2)

where ℓ is the negative log-likelihood loss² ℓ(u, (x, y)) = −y ln σ(u^T x) − (1 − y) ln(1 − σ(u^T x)).
Remark. Notice that in (2) the gradient of the loss is inside the expectation and their order cannot be exchanged, since the distribution p_t according to which x is chosen depends on u_t. This is one of the main reasons why the analysis of such a dynamical model differs from the analysis of gradient descent dynamics in standard settings: the distribution of our data (the n items in the system, in our case) is dynamically adapting and depends on the parameters of the model in the respective iteration.
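In expectation over y ∼ Bernoulli(π_t(x)), the gradient of the logistic loss is (σ(u^T x) − π_t(x)) · x, so one expected step of (2) can be sketched as below. The vector pi is supplied explicitly purely for illustration; the real system never observes the true preference function:

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def expected_update(u, X, pi, beta, eta):
    """One expected gradient-descent step of eq. (2).

    Averages the per-item gradient (sigmoid(s_x) - pi(x)) * x over the
    recommendation distribution p_t (softmax of the scores)."""
    s = X @ u
    z = beta * s
    p = np.exp(z - z.max()); p = p / p.sum()
    grad = X.T @ (p * (sigmoid(s) - pi))   # E_{x~p_t, y}[grad_u l]
    return u - eta * grad

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(50, 2))
u0 = np.array([0.2, -0.1])
pi = sigmoid(1.5 * (X @ u0))               # a stand-in "true" preference
u1 = expected_update(u0, X, pi, beta=1.0, eta=0.5)
```

If pi happens to coincide with the system's own prediction σ(s), the expected gradient vanishes and u does not move, which is exactly the fixed-point notion discussed next.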
Fixed points. The most natural solution concept to characterize the interaction between the user and a dynamically-adjusting recommender system is that of a fixed point, i.e., a user vector u_* such that, if the recommender system presents items according to the distribution p_* induced by it (according to (1)), there will be no more updates to the user vector u_t; the user and the system have reached equilibrium. Formally, u_* corresponds to a fixed point if:

u_* = argmin_u E_{x∼p_*, y}[ℓ(u, (x, y))]

¹We call β the sensitivity parameter because it determines how "sensitive" p_t is to changes of the score. The smaller the β, the more uniform its behavior.
²This loss is standard for classification problems with binary labels.

Figure 2: The shape of the drift function (on the left) indicates that users do not reinforce their preferences for an item uniformly, but rather depending on how much they already like it. After some threshold (peak), this reinforcement fades, since they already have a firm opinion of the particular item (and similar ones) and as a result it is unlikely that they will significantly alter their preferences for it. On the right we see the difference between the true and the predicted probability of liking an item, in red and blue respectively. The difference between the two is attributed to the drift function. More vivid red corresponds to more sensitive users, who are more aggressive in changing their preference for an item.
Fixed points can be further categorized with respect to their long-term behavior. Specifically, a fixed point u_* is called:
• stable, if for every neighborhood U of u_* there exists a neighborhood U' of u_* such that u_t ∈ U for every t if u_0 ∈ U', i.e., if we start close to u_* we will not move far from it.
• attracting, if there exists a neighborhood U of u_* for which u_t → u_* if u_0 ∈ U, i.e., we converge to u_* if we start close.
• asymptotically stable, if it is both stable and attracting.
Naturally, asymptotically stable fixed points correspond to the
most desirable solution since they allow us to characterize and
predict the limit behavior of a time-evolving system easily. However,
it is easy to see that some assumptions on the user behavior are
required even for the user-system interaction to have a fixed point.
Otherwise, if the user arbitrarily changes the way they interact
with items in every iteration, the system will always try to adapt
by updating π’π‘ . In the next subsection, we formalize this idea by
making specific assumptions on the true user preferences.
Fixed points, echo chambers and filter bubbles. Note that the notion of a fixed point is orthogonal to the ideas of an echo chamber or a filter bubble. Echo chambers correspond to a situation where the true preferences of a user are constantly reinforced, leading to polarization: the user likes more of what they previously endorsed and less of what they did not. Similarly, in a filter bubble the system actively narrows down the categories of items it presents to a user. However, by definition, at a fixed point there is no update on u_*, hence such reinforcement does not occur.
2.1 User behavior
The behavior of the user is determined by a preference function π_t : R^d → [0, 1], unknown to the recommender system. It denotes their real preference for each item at time t and is time-dependent, meaning that it can be modified based on the recommendations that the user receives or on exogenous factors. The user reacts positively (like, watch, etc.) to an item x that they are presented with at time t with probability π_t, i.e., y ∼ Bernoulli(π_t(x)).
Our main assumption is that the likelihood of a user reacting to a particular item slightly increases if the score of the item is positive and slightly decreases if it is negative. Recall our example in Section 1, where users have a slight inclination to reinforce their opinion, i.e., increase their preference toward articles that they correlate well with, and decrease it otherwise³. This deviation in user behavior is formally modeled by a drift function. Specifically, a drift function is a function f : R → [−1, 1] that relates the user's shift in the probability of liking an item to its score s_x^t. The higher (resp. lower) the drift of a particular item, the more the user will positively (resp. negatively) reinforce their opinion of this item. In this work we use f(z) = C · z · e^{−z²}, where C ≈ 2.33 is just a normalizing constant ensuring that f(z) ∈ [−1, 1]. The shape of this function, along with its effect on the user preferences, can be seen in Figure 2. It captures core properties of the user dynamics, as explained in the caption. Similar functions have been used to model opinion dynamics, for example regarding agreement with political parties [35].
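The drift function can be written down directly; the normalizer C = √2 · e^{1/2} ≈ 2.33 scales the peak of z · e^{−z²} (attained at z = 1/√2) to exactly 1:

```python
import numpy as np

C = np.sqrt(2.0) * np.exp(0.5)      # ~2.3316, normalizes the peak to 1

def drift(z):
    """f(z) = C * z * exp(-z^2): reinforcement is strongest near
    |z| = 1/sqrt(2) and fades for items with firmly held opinions."""
    return C * z * np.exp(-z ** 2)

z = np.linspace(-3.0, 3.0, 601)
values = drift(z)
```

The function is odd, bounded in [−1, 1], and vanishes both at z = 0 and for large |z|, matching the left panel of Figure 2.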
Finally, we define user sensitivity to formalize the intuition that, in order to reach an equilibrium, the behavior of the user cannot be decoupled from the recommender system. Essentially, sensitivity constitutes a bound on how much the true preference function of a user can deviate from the recommender system's prediction. The magnitude of this deviation depends on the drift function. The name "sensitivity" refers to the fact that a sensitive user will significantly reinforce their opinion after being presented with an item, i.e., they are not firm in their preferences and are prone to influence by the system/environment; the more sensitive they are (larger γ as defined below), the higher the influence.

Definition 1 (γ-sensitive user). A user is γ-sensitive with respect to a drift function f, for γ ∈ (0, 1), if for all time steps t and items x the difference between the true user preferences and the preferences perceived by the recommender system is bounded as follows:

π_t(x) − σ(s_x^t) = γ · α_{t,x} · f(s_x^t)

where

α_{t,x} = 1 − σ(s_x^t), if s_x^t ≥ 0;    α_{t,x} = σ(s_x^t), if s_x^t < 0

is a scaling factor that ensures π_t(x) ∈ [0, 1] for all items x.

In the following, we slightly abuse notation and refer to the γ function associated with a γ-sensitive user as γ(s_x) = γ · α_{t,x} · f(s_x) for all items x.
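Definition 1 can be checked numerically: with α_{t,x} chosen this way, the perturbed preference σ(s) + γ · α · f(s) stays inside [0, 1] for every score and every γ ∈ (0, 1). A sketch (function names ours):

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
C = np.sqrt(2.0) * np.exp(0.5)
drift = lambda z: C * z * np.exp(-z ** 2)

def true_preference(s, gamma):
    """pi_t(x) = sigmoid(s) + gamma * alpha_{t,x} * f(s) (Definition 1)."""
    q = sigmoid(s)
    alpha = np.where(s >= 0, 1.0 - q, q)   # rescaling coefficient
    return q + gamma * alpha * drift(s)

s = np.linspace(-4.0, 4.0, 801)
pi = true_preference(s, gamma=0.8)
```

For positive scores the true preference sits above the system's prediction (the user over-likes) and for negative scores below it, as in the right panel of Figure 2.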
3 EVOLUTION OF USER PREFERENCES
In the previous section, we defined asymptotically stable fixed points and argued that they constitute the gold standard for the long-term behavior of a dynamically evolving recommender system, ruling out echo chamber formation. Hence, the natural question is:

Are there asymptotically stable fixed points in the user-system interaction?

In this section, we present the main theoretical results of our paper. We answer the above question negatively and formally prove that the matrix factorization recommender system we defined will not reach a fixed point, hinting at the existence of echo chambers. We start by proving that, under a mild condition on the items to be recommended, there is a unique fixed point of the system dynamics. We defer the proof to Appendix A. All the results in this section hold for any choice of learning rates {η_t}_{t=1}^∞.

³We measure this correlation using the cosine similarity between u_t and x.

Lemma 1 (Existence of Fixed Points). Let X ∈ R^{d×n} be the matrix of the items available for recommendation and XX^T = Σ_x xx^T the respective covariance matrix. If rank(XX^T) = d, i.e., the covariance matrix has full rank, then u_* = 0 is the unique fixed point of (2).
In the following lemma, we answer the second part of the question regarding the stability of that fixed point. Specifically, we prove that the norm of the user vector u_t always increases, ruling out the possibility for u_* = 0 to be a stable equilibrium between the user and the recommender system (Corollary 1).

Lemma 2. If rank(XX^T) = d and u_0 ≠ 0, then ∥u_{t+1}∥ > ∥u_t∥ for all t. That is, the norm of the user preference vector u_t strictly increases in every iteration.
Proof. Following similar calculations to those in the proof of Lemma 1 (shown analytically in Appendix A) we get

u_{t+1} = u_t − η_t · E_{x∼p_t}[∇_u ℓ(u_t, (x, y))]
        = u_t + η_t · Σ_x x · P[x] · γ(u_t^T x)
        = u_t + η_t · Σ_x x · P[x] · γ · α_{t,x} · C · (u_t^T x) · e^{−(u_t^T x)²}

The last equality follows from the definition of the γ function with f(z) = C · z · e^{−z²}. To simplify notation, let c_{t,x} = η_t · γ · C · P[x] · α_{t,x} · e^{−(u_t^T x)²}. Notice that c_{t,x} > 0 for all t, x. Hence, we have:

u_{t+1} = u_t + (Σ_x c_{t,x} · xx^T) u_t = (I_d + Σ_x c_{t,x} · xx^T) u_t

Consider the n × n diagonal matrix with the values c_{t,x} on the diagonal: C_t = diag([c_{t,x_i}]_{i=1}^n). Then we can rewrite Σ_x c_{t,x} · xx^T = X C_t X^T. Since all the terms c_{t,x} are strictly positive, C_t is positive definite. Now, if rank(XX^T) = d, we have that for any v ∈ R^d s.t. v ≠ 0:

v^T X C_t X^T v = (X^T v)^T C_t (X^T v) > 0,

since X^T v = 0 has the unique solution v = 0. That is because rank(XX^T) = d ⇒ rank(X^T) = d. This implies that X C_t X^T is positive definite, and thus all of its eigenvalues are strictly positive.

Let A_t := I_d + X C_t X^T, which is symmetric and hence diagonalizable as A_t = Q_t Λ_t Q_t^T, where Q_t is an orthogonal matrix. Moreover, all of its eigenvalues, i.e., the diagonal elements of Λ_t, are strictly greater than 1 for every t, since they correspond to the eigenvalues of I_d (all 1) plus the eigenvalues of X C_t X^T (all strictly positive).

Now, for each t consider the transformation z_t = Q_t^T u_t. Then

Q_t^T u_{t+1} = Q_t^T A_t u_t = Q_t^T Q_t Λ_t Q_t^T u_t = Λ_t z_t

Starting with u_0 ≠ 0 yields u_t ≠ 0, and hence z_t ≠ 0, for all t, since Λ_t is a diagonal matrix with non-zero diagonal elements. Moreover, since all the diagonal elements of Λ_t are strictly greater than 1, we have ∥Λ_t z_t∥ > ∥z_t∥. Finally, since Q_t is orthogonal, ∥Q_t^T u_{t+1}∥ = ∥u_{t+1}∥ and ∥z_t∥ = ∥u_t∥, so ∥u_{t+1}∥ = ∥Λ_t z_t∥ > ∥z_t∥ = ∥u_t∥, completing the proof of the lemma. □
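Lemma 2 is easy to check numerically. The sketch below runs the expected dynamics under the γ-sensitive user model (full-rank random items; all names and hyperparameters are ours) and the norm of u_t increases at every step:

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
C = np.sqrt(2.0) * np.exp(0.5)

def expected_step(u, X, beta=1.0, gamma=0.5, eta=1.0):
    """u_{t+1} = u_t + eta * sum_x p(x) * gamma(s_x) * x, as in the proof."""
    s = X @ u
    z = beta * s
    p = np.exp(z - z.max()); p = p / p.sum()
    alpha = np.where(s >= 0, 1.0 - sigmoid(s), sigmoid(s))
    kappa = gamma * alpha * C * s * np.exp(-s ** 2)   # the gamma(s_x) term
    return u + eta * X.T @ (p * kappa)

rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(1000, 5))    # full column rank w.h.p.
u = rng.uniform(-0.3, 0.3, size=5)
norms = [np.linalg.norm(u)]
for _ in range(50):
    u = expected_step(u, X)
    norms.append(np.linalg.norm(u))
```

Because each step multiplies u by I_d plus a positive definite matrix, the sequence of norms is strictly increasing, exactly as the lemma predicts.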
Corollary 1 (Stability of FPs). Assume that rank(XX^T) = d. Then the unique fixed point u_* = 0 is not asymptotically stable.
The two lemmas above essentially show that the unique fixed point is irrelevant as a solution concept for the user-system interaction, hinting at the existence of echo chambers in matrix factorization recommender systems. Moreover, in the case where the features of the items in the system are transformed to be decorrelated, i.e., have identity covariance matrix⁴, following the proof of Lemma 2 we can formally prove the amplification of the item scores. The proof is deferred to Appendix A.

Lemma 3 (Score Amplification and Echo Chambers). If XX^T = I_d, then for every item x s.t. s_x^0 ≠ 0 it holds that |s_x^{t+1}| > |s_x^t| and s_x^{t+1} · s_x^t > 0. That is, each score gets amplified with time.

In the next section we provide simulations to show that even if the data is not decorrelated, i.e., XX^T ≠ I_d, the average absolute value of the score increases with time, essentially creating two well-separated groups of items in the system: the ones the user likes and the ones that they do not.
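The decorrelation assumed in Lemma 3 is a standard whitening step, x̃ = W^{−1}(x − μ); the sketch below uses the Cholesky factor of the empirical covariance as one valid choice of W (names ours):

```python
import numpy as np

def decorrelate(X):
    """Whiten item features so their empirical covariance is the identity.

    Applies x_tilde = W^{-1} (x - mu) with W the Cholesky factor of the
    covariance; any W with W W^T equal to the covariance works."""
    mu = X.mean(axis=0)
    Xc = X - mu
    cov = Xc.T @ Xc / len(X)
    W = np.linalg.cholesky(cov)              # requires full-rank features
    return np.linalg.solve(W, Xc.T).T

rng = np.random.default_rng(2)
raw = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))  # correlated
white = decorrelate(raw)
cov_white = white.T @ white / len(white)
```

After the transform the covariance of the items is the identity matrix up to floating-point error, which is the setting in which Lemma 3 applies exactly.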
4 SIMULATIONS
In the previous section we proved that there is no asymptotically stable fixed point in the user-system interaction, giving evidence for the existence of echo chambers and filter bubbles even in simple matrix factorization recommender systems. In this section, we conduct simulations on real and synthetic datasets to further investigate these phenomena by looking at the evolution of the user preferences u_t and the shape of the recommender system p_t, under different model sizes d, system parameters β, user sensitivities γ, and item distributions.

Metrics. We use the following metrics to evaluate the behavior of the recommender system and the evolution of the user preferences:
(1) Norm of u_t. We expect that the ℓ2 norm of the user vector will be driven away from 0, leading to a shift in the user preferences and a divergence of the item scores, as indicated by Lemma 2.

⁴This is a common data preprocessing procedure, often applied for efficient optimization. It can be easily done by applying the transformation x̃ = W^{−1}(x − μ) to each item x, where μ = (1/n) Σ_x x and W is any matrix for which WW^T = XX^T. Such a matrix exists if rank(XX^T) = d.
(2) Average probability per item type. We partition the items into "likable" and "not likable" based on whether their assigned score s_x^t at time t is positive or negative. We measure the average q_x^t for likable and not likable items.
(3) Probability mass on items correlating well with the initial preference vector u_0. We consider the top 5% of items having the highest cosine similarity with u_0. These are the items the system initially believes the user will like. We report the total probability mass assigned to them by p_t for each t.
Experimental Setup. We fix the dimension of the system to be d = 15, but we found consistent results across a range of model sizes (relevant plots can be found in Appendix B.1). We run each simulation for 200 iterations for the synthetic datasets and for 750 for the real ones, where the user vector u_t is updated as in equation (2) with learning rate η_t = 1. In Appendix B.2 we include similar experiments using Stochastic Gradient Descent (SGD) updates, which essentially corresponds to the case where the user receives k recommendations at each time step. We consider a range of different values β ∈ {0.5, 1, 1.5, 2} that lead to different behaviors of the recommender system, as well as different user sensitivities γ ∈ {0.2, 0.4, 0.6, 0.8}. Below, we describe the datasets that we use.

Synthetic datasets. We synthetically generate 100,000 items. The features of each item are generated from a fixed distribution D_item. We consider the following distributions:
features of each item are generated from a fixed distribution πitem.
We consider the following distributions:
• Uniform in the hypercube. Each item feature is generated independently and uniformly in [−1, 1]: x ∼ Uniform([−1, 1]^d).
• Mixture of two uniforms. In many real platforms there is a dichotomy in the items present in the system, for example, regular content vs. content that violates some of the platform's policies. Also, some of the item features often correspond to classifier scores, with higher values indicating that the content is problematic (e.g., such classifiers can predict the probability of violent content, nudity, etc.). We model this effect using a mixture distribution. Specifically, with probability 1 − α, x ∼ Uniform([−1, b]^d) (regular items), while with probability α, x ∼ Uniform([b, 1]^d) (problematic items with high feature values). Here, α is the mixture coefficient and b is a threshold that we set to 0.8 for the simulations. We set α = 1%, which corresponds to the estimated prevalence of problematic content in real platforms, see e.g. [14].

For the synthetic datasets we initialize the user preference vector u_0 ∼ Uniform([−c, c]^d) for c = 0.3. Note that for small ∥u_0∥ all scores are close to 0, so p_0 is nearly uniform, i.e., the recommender system initially selects items almost uniformly at random to present to the user.
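The mixture distribution above can be sampled as follows (a sketch with the stated parameters α = 1%, b = 0.8, c = 0.3; function names ours):

```python
import numpy as np

def sample_items(n, d, alpha=0.01, b=0.8, rng=None):
    """With prob. 1 - alpha draw a regular item x ~ U([-1, b]^d);
    with prob. alpha a problematic one x ~ U([b, 1]^d)."""
    if rng is None:
        rng = np.random.default_rng()
    problematic = rng.random(n) < alpha
    X = rng.uniform(-1.0, b, size=(n, d))
    X[problematic] = rng.uniform(b, 1.0, size=(int(problematic.sum()), d))
    return X, problematic

rng = np.random.default_rng(3)
X, flag = sample_items(100_000, 15, rng=rng)
u0 = rng.uniform(-0.3, 0.3, size=15)   # initial user vector, c = 0.3
```

Roughly 1% of the generated items land in the problematic block [b, 1]^d, matching the stated prevalence.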
Real datasets. We also consider the more realistic scenario of obtaining the initial user and the item features from real datasets. We use the MovieLens 10M [16] and the Yahoo [42] datasets. MovieLens contains movie ratings from 69,878 users for 10,677 different movies, while Yahoo contains song ratings from 5,400 surveyed users for 1,000 songs. We selected the 50 users with the highest number of ratings in each case, corresponding to 10,208 movies for MovieLens and 982 songs for Yahoo. Since we use binary labels in our model, we mark each movie/song that was rated at least 3.5/5 as 1, indicating that the user liked it, and the rest as 0. We used a Python library [26] that employs kernel matrix factorization [34] to obtain the user and item features as R ≈ U · V, where R is the binarized ratings matrix, U is the matrix of initial features for each user, and V is the matrix of item features.

Figure 3: Evolution of ∥u_t∥ with t. In all cases the user preferences diverge, as predicted by Lemma 2.

Figure 4: Evolution of the average probability of a likable vs. a non-likable item with t. Solid lines indicate the likable items, whose probability of receiving a positive reaction from the user, π_t(x), is above 0.5, while the dashed lines indicate the non-likable items. The model size is d = 15 and γ = 0.5. The probability of clicking on an item moves toward 1 if its score is positive and toward 0 otherwise.

Figure 5: Total probability mass assigned by p_t to the items with the highest cosine similarity with u_0. The model size is d = 15 and γ = 0.5. There is a clear tendency to recommend items that the system believes the user already likes, and this tendency becomes more pronounced for higher values of β, making p_t steeper and approximating a Top-1 recommender system.
4.1 ResultsWe evaluate the proposed metrics on the real and synthetic datasets.
For the synthetic datasets we repeat each simulation 10 times with
different vectors π’0, while for the real ones we iterate over each
of the 50 different subsampled users. We report the means and the
standard deviations of the run in Figs 3- 5, 7. Additional plots for
different model sizes and SGD updates are presented in Appendix B.
Echo chambers and filter bubbles. First, we observe that the norm of u_t clearly diverges with time for different models and distributions, as predicted by Lemma 2, indicating that a fixed point is never reached. As a result, the scores for each item are pushed towards the extremes, as can be seen in Figure 4. Hence, every item tends to be deterministically liked or not liked by the user, suggesting an echo chamber. Secondly, in Figure 5 we demonstrate the filter
Figure 6: The effect of different β on the recommender system p_t. Consider the uniform distribution on the unit circle and u_0 > 0 coordinate-wise. The left two pictures show the distribution p_t in the beginning and after 100 steps of interaction with the user for β = 0.5. Brighter colors indicate higher probability mass. The two pictures on the right correspond to β = 2. It is evident that for larger β the transformation of p_t is much stronger, with all of its mass concentrated around items with the highest possible scores that correlate well with u_0. In other words, the system tries to recommend items that have the highest probability of being liked. For smaller β the behavior is more explorative, and items with smaller scores have a non-trivial probability of being recommended.
bubble effect that can occur in a matrix factorization recommender system. Specifically, we see that a small subset of items with high cosine similarity to u_0, i.e., items that the system initially believes the user will like, are the ones that end up being recommended with overwhelmingly high probability, narrowing down the exposure to any other type of content.
The effect of model sensitivity β. In Figures 3, 4 and 5 we plot our metrics for different values of β and fixed γ = 0.5. For the synthetic distributions, we see that higher values of β lead to slower divergence of the user preferences u_t, while the opposite is true for the real ones. However, preference amplification and the convergence of p_t to the items that correlate well with u_0 are faster. This is explained by the fact that for higher values of β, the stochastic recommender system of (1) turns into a Top-1 recommender system, and is therefore more likely to present users with items that they already liked. This leads to a stronger reinforcement of their preferences and increases the filter bubble effect.
Visually, this difference in the recommender system for different values of β can be seen in Figure 6 (see also Fig. 14 in Appendix B), where we plot p_t, for two-dimensional items, at t = 0 and t = 100. We use two different values of β, β = 0.5 and β = 2, and we initialize with u_0 > 0 (coordinate-wise positive). It is evident that higher β leads to significantly higher concentration in the areas corresponding to items with high initial scores, i.e., items with large positive values in both coordinates (since u_0 > 0).
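The sharpening effect of β can be illustrated in a few lines. The softmax form below is an assumption standing in for the stochastic recommender of (1); larger β concentrates the mass on the top-scoring item, approaching Top-1 behavior.

```python
import numpy as np

def recommender_distribution(scores, beta):
    """Softmax over item scores. This exponential form is an assumption
    standing in for the stochastic recommender p_t of Eq. (1)."""
    z = beta * scores
    z = z - z.max()          # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()

scores = np.array([2.0, 1.0, 0.5, -1.0])
for beta in (0.5, 2.0, 8.0):
    print(beta, recommender_distribution(scores, beta).round(3))
# As beta grows, the mass concentrates on the top-scoring item,
# approaching a Top-1 recommender; small beta is more explorative.
```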
Effect of user sensitivity γ. We consider β = 1 and varying values of γ. In Figure 7 we include the relevant plots for the uniform distribution and defer the results for the rest of the datasets to Figure 13 in Appendix B. Users with higher γ, i.e., those more prone to reinforce their previous opinion after being presented with an item, are the ones for whom the norm of the preference vector diverges the fastest, and they also experience the largest echo chamber and filter bubble effects. This implies that while there is a vulnerability in the recommender system, how much these effects manifest is ultimately up to the user, matching findings in [3].
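The role of γ in the feedback loop can be shown with a small simulation. Everything here is a toy stand-in (softmax recommender, sigmoid click model, additive drift u ← u + γ·x on positive feedback), not the paper's exact update equations; it only illustrates that larger γ makes ‖u_t‖ grow faster.

```python
import numpy as np

def simulate(gamma, steps=300, beta=1.0, d=15, n_items=1000, seed=1):
    """Toy feedback loop: softmax recommendation, a Bernoulli click through
    a sigmoid, then a drift of u toward positively received items. The drift
    rule u <- u + gamma * x is a simplified stand-in for the paper's
    gamma-sensitive user model."""
    rng = np.random.default_rng(seed)
    items = rng.uniform(-1, 1, size=(n_items, d))
    u = rng.uniform(0.0, 0.1, size=d)
    for _ in range(steps):
        scores = items @ u
        p = np.exp(beta * (scores - scores.max()))
        p /= p.sum()
        x = items[rng.choice(n_items, p=p)]            # recommended item
        click_prob = 1.0 / (1.0 + np.exp(-np.clip(x @ u, -500, 500)))
        if rng.random() < click_prob:                  # positive feedback
            u = u + gamma * x                          # preferences drift
    return np.linalg.norm(u)

print(simulate(0.25), simulate(1.0))
# Higher gamma makes ||u_t|| grow faster.
```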
Figure 7: Effect of user sensitivity γ. On the left we plot ‖u_t‖, in the middle the average probability for likable (solid) and non-likable items (dashed), and on the right the probability mass on the items that correlate well with u_0. The model size is d = 15 and β = 1. More sensitive users experience larger movements in u_t.
Figure 8: On the left we see the total probability assigned to the Type-2, low-prevalence items. The different lines correspond to different prevalences of such items in the system, ranging from 1% to 0.1%. On the right we plot the probability on the Type-1 items that have high cosine similarity with, i.e., correlate well with, the initial preference vector u_0. These items act as a "bridge" for the probability mass to concentrate on the low-prevalence items.
4.2 Mitigation strategies
Consider the synthetic mixture distribution that we defined at the beginning of the section to model two different types of content (e.g., regular vs. problematic). We will show experimentally that even if the prevalence of one of the two types is very low, the recommender system is able to discover items of this type efficiently and present them to a user who shows interest in them. While this is generally a positive outcome, it can become an issue if the system contains problematic content. In this section, we discuss two strategies, global and personalized demotion, to alleviate preference amplification, and in Section 4.3 we show the effect of such mitigation on the user's experience for borderline problematic content in a real large-scale recommender system.
Experimental Setup. Our basic mixture distribution consists of 100,000 items. They can be of Type-1, sampled as x ∼ Uniform([−1, 0.8]^d) with probability 1 − α = 0.99, or of Type-2, sampled as x ∼ Uniform([0.8, 1]^d) with probability α = 0.01. We initialize u_0 > 0 (coordinate-wise positive) to focus on users that have high correlation with the low-prevalence Type-2 content in the system, and hence are most interested in these items. We repeat each simulation 10 times.
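The mixture can be sampled directly; a sketch with d = 15, where the seed and variable names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
d, n, alpha = 15, 100_000, 0.01   # model size, item count, Type-2 prevalence

# Type-1: Uniform([-1, 0.8]^d) with prob. 1 - alpha; Type-2: Uniform([0.8, 1]^d).
is_type2 = rng.random(n) < alpha
items = np.where(is_type2[:, None],
                 rng.uniform(0.8, 1.0, size=(n, d)),
                 rng.uniform(-1.0, 0.8, size=(n, d)))

# u_0 coordinate-wise positive, so the user correlates most with Type-2 items.
u0 = rng.uniform(0.0, 1.0, size=d)
print(is_type2.mean())  # empirical prevalence, close to 0.01
print((items[is_type2] @ u0).mean() > (items[~is_type2] @ u0).mean())  # True
```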
Global Action. In this case, we actively remove items to reduce
the prevalence of the Type-2 content in the system. We use two
metrics to evaluate how the recommender system adapts to that:
(1) Probability mass on Type-2 items.
(2) Probability mass on Type-1 items that have high cosine similarity with the initial preference vector u_0. These are Type-1 items that the system initially believes the user likes.
In Figure 8 we adjust the count of Type-2 items in the system so that their prevalence is α% for α ∈ {1, 0.75, 0.5, 0.25, 0.1}. It can be seen that the probability of a low-prevalence item being presented to the user keeps increasing as the user interacts more with the system, no matter how low the actual prevalence of that content is. This indicates that matrix factorization recommender systems are able to surface items if they are relevant to the user.
Additionally, we focus on Type-1 items that appear attractive to the user in the beginning and observe that they act as a "bridge" for the probability mass of p_t to transfer to the low-prevalence Type-2 items. Intuitively, on a real platform these could correspond to borderline problematic items that introduce the user to more problematic content.
Personalized Action. A recommender system applying personalized demotion avoids showing the user excessive amounts of a particular type of content without explicitly removing it from the system. In other words, it specifically targets the personalized recommendation distribution p_t of a user and demotes the selection probability of certain item types. One way to achieve this is to use more reluctant updates of the preference vector u_t upon receiving user feedback. For example, a decaying learning rate schedule can be employed in (2) to slow down the preference adjustment and discourage "overfitting" to the user's interests, hence preventing the creation of a filter bubble; alternatively, a different learning rate can be used for each feature i, with magnitude inversely proportional to (u_t)_i. In Figure 9 we consider four different learning rate schedules for the system update (2). In each case, η_t is adjusted every t_adj = 20 iterations. We use t//t_adj to denote integer division:
(1) constant learning rate: η_t = 1,
(2) decaying learning rate: η_t = 1/(1 + t//t_adj),
(3) decaying learning rate: η_t = 1/(1 + √(t//t_adj)),
(4) feature-specific learning rate: here the learning rate corresponds to the matrix η_t = diag([1/(u_t)_i]_{i=1}^d), yielding a different update for each coordinate. In particular, large coordinates are modified at a slower rate.
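The four schedules can be written down directly. The sketch below adds a small numerical floor in the feature-specific case as a guard against division by zero; that guard is ours, not the paper's.

```python
import numpy as np

t_adj = 20  # the step size is adjusted every t_adj = 20 iterations

def eta_constant(t, u):
    return 1.0

def eta_decay_linear(t, u):
    return 1.0 / (1 + t // t_adj)

def eta_decay_sqrt(t, u):
    return 1.0 / (1 + np.sqrt(t // t_adj))

def eta_feature_specific(t, u):
    # diag(1 / (u_t)_i): large coordinates are updated more slowly.
    # The small floor is a numerical guard added here, not in the paper.
    return np.diag(1.0 / np.maximum(np.abs(u), 1e-8))

u_t = np.array([0.1, 2.0, 0.5])
print(eta_decay_linear(45, u_t))              # 1 / (1 + 45 // 20) = 1/3
print(np.diag(eta_feature_specific(0, u_t)))
```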
It can be seen that more reluctant updates of the preference vector u_t mitigate the echo chamber and filter bubble effects, since the system does not actively change the content that it presents to the user, hence keeping it more diverse despite feedback loops.
4.3 Mitigation in Facebook recommendations
We now present an experiment that we conducted to study mitigation in a real-world large-scale video recommender system at Facebook. While this recommender system is complex and beyond the scope of this work, we describe a few relevant details here. Users
Figure 9: Evolution of the user preference vector u_t and the recommender system for different learning rates. More reluctant updates of the preference vector u_t mitigate the echo chamber and filter bubble effects.
Figure 10: The difference in the average impressions of borderline problematic content viewed by the treatment and control groups. As time progresses, users that received the treatment see progressively less problematic content.
can engage with videos in several ways, including liking, commenting, and flagging them as "problematic". Videos have several features; the most relevant to our experiment is the probability that the video contains borderline nudity. We note that this content is mild and does not violate any enforcement standards; however, it may be considered problematic, especially after repeated exposure.
We first identify users that are repeatedly exposed to such con-
tent. Specifically, using a classifier, we compute, for each user, the
percentage of problematic videos (classifier output above a thresh-
old) that they were exposed to. If it averages over 25% for a week,
we reduce their exposure to this content for future sessions, hence
attempting to reduce the echo chamber effect proactively and in-
crease the diversity of the content recommended to them.
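The identification rule can be sketched as follows. The function name and the 0.8 classifier threshold are hypothetical; only the over-25%-weekly-average criterion comes from the text.

```python
def needs_mitigation(video_scores, classifier_threshold=0.8,
                     exposure_threshold=0.25):
    """Return True if the share of a user's weekly impressions classified as
    borderline problematic (classifier output above a threshold) averages
    over 25%. The 0.8 classifier cutoff is illustrative."""
    if not video_scores:
        return False
    problematic = sum(s >= classifier_threshold for s in video_scores)
    return problematic / len(video_scores) > exposure_threshold

print(needs_mitigation([0.9, 0.1, 0.95, 0.2]))  # 2/4 = 50% -> True
print(needs_mitigation([0.9, 0.1, 0.2, 0.3]))   # 1/4 = 25%, not over -> False
```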
In the previous section, we illustrated personalized mitigation by altering the learning rate of the recommender system. However, changing the parameters of the recommender system is infeasible for a system in production. Instead, we adopt a simple personalized mitigation by showing less borderline content. This is achieved by lowering the score of borderline problematic content (an instantiation of s_t(x)) by multiplying it by a factor of 0.1 for users in the experiment. The factor of 0.1 was chosen after careful consideration of the item score distribution.
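The demotion itself is a one-line scaling; a sketch with the 0.1 factor from the experiment (the function name is ours):

```python
def demote_borderline(score, is_borderline, factor=0.1):
    """Scale down the ranking score of borderline problematic items for
    users in the treatment group; other scores are left untouched."""
    return score * factor if is_borderline else score

print(demote_borderline(4.0, True))   # 0.4
print(demote_borderline(4.0, False))  # 4.0
```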
We ran this experiment for three weeks, with 37K users in each
of the treatment and control groups, where the users in treatment
received the mitigation. Figure 10 shows the difference in the average impressions of borderline problematic content viewed by users in control and those in treatment. At the start of the experiment, the two groups viewed about the same amount of problematic content; however, over time, the treatment group saw less, with the difference being statistically significant. Moreover, we observed
that the overall interaction with content (likes, comments, etc.) by
users in the treatment group increased by up to 2%. In conclusion,
this experiment shows that echo chambers related to specific item
features can create a negative user experience. Explicitly mitigating
such effects can have an overall positive impact.
5 DISCUSSION AND FUTURE WORK
In this paper we examined, from a theoretical point of view, the ability of stochastic matrix factorization recommender systems to reinforce user preferences, creating echo chambers and filter bubbles of content. This work serves as a step towards understanding and addressing such vulnerabilities, with several directions to explore further. A key question is whether we can prove quantitative rates for the divergence of ‖u_t‖, hence understanding the exact dependence of echo chamber formation on the parameters of the system. Secondly, a natural question is: what is the largest class of true preference functions π_t(·) of the user for which there is no asymptotically stable fixed point? Finally, an interesting effect to take into account when applying mitigation strategies is whether there are trade-offs in user engagement with the system. In Section 4.3, we provide preliminary evidence that this is not the case in real systems, but further investigation is required.
REFERENCES
[1] Guy Aridor, Duarte Goncalves, and Shan Sikdar. 2020. Deconstructing the Filter
Bubble: User Decision-Making And Recommender Systems. In Fourteenth ACMConference on Recommender Systems (RecSys β20).
[2] Çiğdem Aslay, Antonis Matakos, Esther Galbrun, and Aristides Gionis. 2018.
Maximizing the Diversity of Exposure in a Social Network. In IEEE InternationalConference on Data Mining, ICDM 2018, Singapore, November 17-20, 2018.
[3] E. Bakshy, S. Messing, and L. A. Adamic. 2015. Exposure to ideologically diverse news and opinion on Facebook. Science 348, 1130–1132.
[4] S. Banisch and E. Olbrich. 2019. Opinion polarization by learning from social feedback. The Journal of Mathematical Sociology 43, 2 (2019), 76–103.
[5] Andrea Barraza-Urbina and Dorota Glowacka. 2020. Introduction to Bandits in
Recommender Systems. In Fourteenth ACM Conference on Recommender Systems (RecSys '20). Association for Computing Machinery, New York, NY, USA, 748–750.
[6] Stephen Bonner and Flavian Vasile. 2018. Causal embeddings for recommendation.
In Proceedings of the 12th ACM Conference on Recommender Systems, RecSys 2018.[7] Djallel Bouneffouf, Amel Bouzeghoub, and Alda Gançarski. 2012. A Contextual
Bandit Algorithm for Mobile Context-Aware Recommender System. NeuralInformation Processing Systems.
[8] Jiawei Chen, Hande Dong, XiangWang, Fuli Feng, MengWang, and Xiangnan He.
2020. Bias and Debias in Recommender System: A Survey and Future Directions.
CoRR abs/2010.03240 (2020). arXiv:2010.03240
[9] Uthsav Chitra and Christopher Musco. 2020. Analyzing the Impact of Filter
Bubbles on Social Network Polarization. In Proceedings of the 13th InternationalConference on Web Search and Data Mining (Houston, TX, USA) (WSDM β20).
[10] Matteo Cinelli, Gianmarco De Francisci Morales, Alessandro Galeazzi, Walter
Quattrociocchi, and Michele Starnini. 2020. Echo Chambers on Social Media: A
comparative analysis. CoRR abs/2004.09603 (2020). arXiv:2004.09603
[11] Mark A. Davenport, Yaniv Plan, Ewout van den Berg, and Mary Wootters. 2012.
1-Bit Matrix Completion. CoRR abs/1209.3672 (2012). arXiv:1209.3672
[12] SarahDean,Mihaela Curmei, and Benjamin Recht. 2020. Designing Recommender
Systems with Reachability in Mind. In Participatory Approaches to Machine Learn-ing workshop at ICML.
[13] Sarah Dean, Sarah Rich, and Benjamin Recht. 2020. Recommendations and user
agency: the reachability of collaboratively-filtered information. In FAT* β20: ACMConference on Fairness, Accountability, and Transparency, Barcelona, Spain.
[14] Facebook. 2020. Community Standards Enforcement Report. (2020). https:
//transparency.facebook.com/community-standards-enforcement
[15] Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael
Mathioudakis. 2018. Political Discourse on Social Media: Echo Chambers, Gate-
keepers, and the Price of Bipartisanship. In Proceedings of the 2018 World WideWeb Conference, WWW 2018, Lyon, France.
[16] F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Trans. Interact. Intell. Syst. 5, 4 (2015).
[17] Elad Hazan. 2019. Introduction to Online Convex Optimization. CoRR abs/1909.05207 (2019). arXiv:1909.05207
[18] Homa Hosseinmardi, Amir Ghasemian, Aaron Clauset, David Rothschild, Markus Mobius, and Duncan Watts. 2020. Evaluating the scale, growth, and origins of
right-wing echo chambers on YouTube. https://arxiv.org/abs/2011.12843
[19] Olivier Jeunen. 2019. Revisiting offline evaluation for implicit-feedback recom-
mender systems. In Proceedings of the 13th ACM Conference on RecommenderSystems. 596β600.
[20] Ray Jiang, Silvia Chiappa, Tor Lattimore, András György, and Pushmeet Kohli.
2019. Degenerate Feedback Loops in Recommender Systems. In Proceedings of theAAAI/ACM Conference on AI, Ethics, and Society, AIES 2019, Honolulu, HI, USA.
[21] Marius Kaminskas and Derek Bridge. 2016. Diversity, Serendipity, Novelty, and
Coverage: A Survey and Empirical Analysis of Beyond-Accuracy Objectives in
Recommender Systems. ACM Trans. Interact. Intell. Syst., Article 2 (Dec. 2016).
[22] Tyll Krueger, Janusz Szwabiński, and Tomasz Weron. 2017. Conformity, Anti-
conformity and Polarization of Opinions: Insights from a Mathematical Model of
Opinion Dynamics. Entropy 19, 7 (Jul 2017), 371.
[23] Matevz Kunaver and Tomaz Pozrl. 2017. Diversity in recommender systems - A
survey. Knowl. Based Syst. 123 (2017), 154β162.[24] Mark Ledwich and Anna Zaitsev. 2020. Algorithmic extremism: Examining
YouTubeβs rabbit hole of radicalization. First Monday (02 2020).
[25] Y. Lei and W. Li. 2019. When Collaborative Filtering Meets Reinforcement
Learning. ArXiv abs/1902.00715 (2019).
[26] Kernel Matrix Factorization Library. 2020. https://pypi.org/project/matrix-
factorization/
[27] Antonis Matakos and Aristides Gionis. 2018. Tell me Something My Friends
do not Know: Diversity Maximization in Social Networks. In IEEE InternationalConference on Data Mining, ICDM 2018, Singapore, November 17-20, 2018. IEEEComputer Society, 327β336.
[28] Celestine Mendler-Dünner, Juan C. Perdomo, Tijana Zrnic, and Moritz Hardt.
2020. Stochastic Optimization for Performative Prediction. In Annual Conferenceon Neural Information Processing Systems 2020, NeurIPS 2020, virtual.
[29] Judith Möller, Damian Trilling, Natali Helberger, and Bram van Es. 2018. Do not
blame it on the algorithm: an empirical assessment of multiple recommender
systems and their impact on content diversity. Information, Communication &Society 21, 7 (2018), 959β977.
[30] Tien T. Nguyen, Pik-Mai Hui, F. Maxwell Harper, Loren Terveen, and Joseph A.
Konstan. 2014. Exploring the Filter Bubble: The Effect of Using Recommender
Systems on Content Diversity. In Proceedings of the 23rd International Conferenceon World Wide Web (WWW β14).
[31] Jack Nicas. 2018. How YouTube drives people to the internet's darkest corners.
Wall Street Journal, February 2018 (2018). https://www.wsj.com/articles/how-
youtube-drives-viewers-to-the-internets-darkest-corners-1518020478
[32] Juan C. Perdomo, Tijana Zrnic, Celestine Mendler-Dünner, and Moritz Hardt.
[n.d.]. Performative Prediction. In Proceedings of the 37th International Conferenceon Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event.
[33] N. Perra and L. E. C. Rocha. 2019. Modelling opinion dynamics in the age of algorithmic personalisation. Sci Rep (2019). https://doi.org/10.1038/s41598-019-43830-2
[34] Steffen Rendle and Lars Schmidt-Thieme. 2008. Online-Updating Regularized
Kernel Matrix Factorization Models for Large-Scale Recommender Systems. In
Proceedings of the 2008 ACM Conference on Recommender Systems (Lausanne,Switzerland) (RecSys β08).
[35] David Sabin-Miller and Daniel M. Abrams. 2020. When pull turns to shove: A
continuous-time model for opinion dynamics. Phys. Rev. Research 2 (Oct 2020),
043001. Issue 4. https://doi.org/10.1103/PhysRevResearch.2.043001
[36] J. Ben Schafer, Dan Frankowski, Jonathan L. Herlocker, and Shilad Sen. 2007.
Collaborative Filtering Recommender Systems. In The Adaptive Web, Methods andStrategies of Web Personalization (Lecture Notes in Computer Science). Springer.
[37] J. Ben Schafer, Joseph A. Konstan, and John Riedl. 2001. E-Commerce Recommendation Applications. Data Min. Knowl. Discov. 5, 1/2 (2001), 115–153.
[38] Joan E. Solsman. 2018. YouTube's AI is the puppet master over most of what you
watch. CNET (2018). www.cnet.com/news/youtube-ces-2018-neal-mohan/
[39] Zeynep Tufekci. 2018. YouTube, the Great Radicalizer. The New York Times, March (2018). www.nytimes.com/2018/03/10/opinion/sunday/youtube-politics-radical
[40] Berk Ustun, Alexander Spangher, and Yang Liu. 2019. Actionable Recourse in
Linear Classification. In Proceedings of the Conference on Fairness, Accountability,and Transparency (Atlanta, GA, USA) (FAT* β19). Association for Computing
Machinery, New York, NY, USA, 10β19. https://doi.org/10.1145/3287560.3287566
[41] Zeng Wei, Jun Xu, Yanyan Lan, Jiafeng Guo, and Xueqi Cheng. 2017. Reinforce-
ment Learning to Rank with Markov Decision Process.
[42] Yahoo! 2006. R3 - Music ratings for User Selected and Randomly Selected songs.
https://webscope.sandbox.yahoo.com/catalog.php?datatype=r
A OMITTED PROOFS
Proof of Lemma 1. In order for $u_t$ to be at an exact fixed point we need $u_{t+1} = u_t$, i.e.,
$$\mathbb{E}_{x,y}\big[\nabla \ell\big(u_t, (x,y)\big)\big] = 0.$$
In the following, we write $\nabla \ell_u(x,y)$ instead of $\nabla \ell\big(u_t,(x,y)\big)$ to simplify notation. Using the definition of the negative log-likelihood function $\ell$, we have that
$$\begin{aligned}
\mathbb{E}_{x,y}\big[\nabla \ell_u(x,y)\big]
&= \sum_{x,y} \mathbb{P}[x]\,\mathbb{P}[y \mid x]\,\nabla \ell_u(x,y) \\
&= \sum_{x} \mathbb{P}[x]\big(\mathbb{P}[y=1 \mid x]\,\nabla \ell_u(x,1) + \mathbb{P}[y=0 \mid x]\,\nabla \ell_u(x,0)\big) \\
&= \sum_{x} x \cdot \mathbb{P}[x] \cdot \big(\sigma(u_t^\top x) - \pi(u_t, x)\big) \\
&= -\sum_{x} x \cdot \mathbb{P}[x] \cdot K(u_t^\top x)
\end{aligned}$$
where the last line follows by the definition of a $\gamma$-sensitive user. Based on the definition of the function $K$ we can expand the fixed-point equation as follows:
$$\sum_{x} x \cdot \mathbb{P}[x] \cdot \gamma \cdot \alpha_{u_t,x} \cdot (u_t^\top x) \cdot e^{-(u_t^\top x)^2} = 0.$$
Letting $\pi_{t,x} = \gamma \cdot \mathbb{P}[x] \cdot \alpha_{u_t,x} \cdot e^{-(u_t^\top x)^2}$ we have:
$$\Big(\sum_{x} \pi_{t,x} \cdot x x^\top\Big) u_t = 0.$$
The above system has the unique solution $u_t = 0$ iff $\sum_x \pi_{t,x} \cdot x x^\top$ has full rank. Consider the $n \times n$ diagonal matrix with the values $\pi_{t,x}$ on the diagonal: $C_t = \mathrm{diag}([\pi_{t,x_i}]_{i=1}^n)$. We can rewrite $\sum_x \pi_{t,x} \cdot x x^\top = V C_t V^\top$. Also, notice that $\pi_{t,x} > 0$ for all $t, x$, hence $C_t$ is positive definite.
Now if $\mathrm{rank}(V V^\top) = d$ we have that for any $v \in \mathbb{R}^d$ s.t. $v \neq 0$:
$$v^\top V C_t V^\top v = (V^\top v)^\top C_t (V^\top v) > 0,$$
since $V^\top v = 0$ has the unique solution $v = 0$ because $\mathrm{rank}(V^\top) = d$ as well. This implies that $V C_t V^\top$ is positive definite and hence also full rank: $\mathrm{rank}(V C_t V^\top) = d$, completing the proof of the lemma. □
Proof of Lemma 3. Let $s^t = (s^t_x)_x$ be the vector of the scores of all the items at time $t$. Using the notation and the derivations of the proof of Lemma 1 we have:
$$\begin{aligned}
s^{t+1} &= V^\top u_{t+1} \\
&= V^\top \Big(I_d + \sum_x \pi_{t,x}\, x x^\top\Big) u_t \\
&= V^\top \big(I_d + V C_t V^\top\big) u_t \\
&= V^\top V \big(I_n + C_t\big) V^\top u_t = \big(I_n + C_t\big) s^t
\end{aligned}$$
where we used the fact that $V V^\top = I_d$. Now, since $C_t$ is diagonal with positive diagonal elements, each score gets amplified, meaning that it keeps the same sign and its absolute value increases, and the lemma follows. □
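The amplification step of Lemma 3 can be checked numerically: with $C_t$ diagonal and positive, the update $(I + C_t)s$ preserves the sign of every score and strictly increases its magnitude. A small sketch (a handful of scores stand in for the full item set):

```python
import numpy as np

rng = np.random.default_rng(3)
d = 5
s = rng.standard_normal(d)               # current score vector s^t
C = np.diag(rng.uniform(0.1, 1.0, d))    # C_t: diagonal with positive entries

s_next = (np.eye(d) + C) @ s             # the update s^{t+1} = (I + C_t) s^t

# Every score keeps its sign and strictly grows in magnitude.
print(np.all(np.sign(s_next) == np.sign(s)),
      np.all(np.abs(s_next) > np.abs(s)))
# True True
```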
B ADDITIONAL PLOTS
B.1 Model Size d
Figure 11: Effect of different model sizes d for the uniform (first row) and mixture distributions (second row), the MovieLens dataset (third row), and the Yahoo dataset (fourth row).
We experiment with different model sizes d ∈ {5, 15, 30, 60} for fixed β = 1 and γ = 0.5. We repeat the simulation for each model size 10 times and report the mean and standard deviations of our metrics for all four datasets in Figure 11. It is evident from the plots that the effect of model size heavily depends on the distribution of the items in the system. If the distribution is not heavily skewed towards some particular direction, then higher model sizes lead to faster divergence of the user preferences, but the effect is not particularly strong. However, in the case of the mixture distribution, where by construction we initialize u_0 > 0, the initial scores of the Type-2 items are very high. As a result, all the mass is already concentrated there, leading to essentially no movement in u_t.
B.2 Stochastic Gradient Descent
By analyzing the gradient descent update in (2) we can study the expected trajectory of the vector u_t and the user-system interaction. However, this update is difficult to implement, and in practice the common approach is to substitute it with SGD updates. In that case,
Figure 13: Effect of γ for the mixture distribution (first row), the MovieLens dataset (second row), and the Yahoo dataset (third row). With the γ parameter controlling the amount of preference reinforcement that a user exhibits, it is expected that in all cases higher values of γ lead to faster divergence of the user preferences, as well as a higher separation between likable and non-likable items.
Figure 14: Evolution of the recommender system p_t for a user interested in low-prevalence content. Consider the two-dimensional item distribution where the first feature is uniform in [−1, 1], while the second feature is uniform in [−1, 0.8] with probability 90% and uniform in (0.8, 1] with the remaining 10%. Hence, all items for which x_2 > 0.8 are significantly more rare in the system. We consider a user with u_0 whose second coordinate is positive. The two pictures on the left show the distribution p_t in the beginning and after 100 steps of interaction with the user for β = 0.5. Brighter colors indicate higher probability mass. The two pictures on the right correspond to β = 2. In the case where β = 0.5 the initial surface is virtually identical to the actual distribution of items in the system, while in the case where β = 2 there is a noticeable difference. At t = 100, in both cases the probability mass concentrates on the items for which x_2 is large, and the effect is much more pronounced for higher β. In particular, for β = 2, already after 100 iterations the mass starts concentrating on the low-prevalence items, while for β = 0.5 the mass concentrates on the barrier between the high-prevalence and low-prevalence items, without yet being able to efficiently surface the low-prevalence items.
the vector $u_t$ evolves as:
$$u_{t+1} = u_t - \eta_t \cdot \frac{1}{k}\sum_{i=1}^{k} \nabla \ell\big(u_t, (x_i^t, y_i^t)\big)$$
where $k$ can be thought of as the number of items that are presented to the user at each iteration. The user responds with feedback $y_i^t \in \{0, 1\}$ to each of them. We set $k = 30$ for the experiments. For
the synthetic distributions we run the experiments with 20 different initializations, and for each initialization we run it 15 times. For the real datasets we just run SGD updates for the 50 selected users as before. We report the mean and standard deviations of the runs for different values of the parameter β in Figure 12. As before, we run the simulation for 200 iterations for the synthetic datasets and for 750 for the real ones, and we use η = 0.1.
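A single SGD update of this form can be sketched as follows. The logistic gradient is an assumption consistent with the negative log-likelihood loss in Appendix A, and all numerical values here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
d, k = 15, 30   # model size and number of items shown per iteration (k = 30)
eta = 0.1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nll(u, X, y):
    """Average negative log-likelihood of the observed feedback."""
    p = sigmoid(X @ u)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p)).mean()

def sgd_step(u, X, y, eta):
    """One SGD update averaged over the k presented items, using the
    logistic-loss gradient (sigmoid(u^T x) - y) * x."""
    grad = ((sigmoid(X @ u) - y)[:, None] * X).mean(axis=0)
    return u - eta * grad

u0 = rng.uniform(0.0, 0.1, size=d)
X = rng.uniform(-1, 1, size=(k, d))                   # presented items x_i^t
y = (rng.random(k) < sigmoid(X @ u0)).astype(float)   # feedback y_i^t in {0,1}
u1 = sgd_step(u0, X, y, eta)
print(nll(u1, X, y) < nll(u0, X, y))  # the step decreases the batch loss
```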
As in the case of the expected loss (Figures 3-5), the norm of u_t diverges in all cases and the user's preferences get polarized, with the noise of SGD having limited effect. On the other hand, the filter bubble phenomenon, i.e., how fast the recommender system converges to showing the user only items that they like, is less evident here, and the respective standard deviations are very large, indicating that the trajectory of p_t is very sensitive to the initial conditions u_0.
Figure 12: SGD with different β parameters for the uniform distribution (first row), the mixture distribution (second row), MovieLens (third row), and Yahoo (fourth row).