Preference Amplification in Recommender Systems

Dimitris Kalimeris
Harvard University
Smriti Bhagat
Shankar Kalyanaraman
Udi Weinsberg
ABSTRACT
Recommender systems have become increasingly accurate in suggesting content to users, resulting in users primarily consuming content through recommendations. This can cause the user's interest to narrow toward the recommended content, something we refer to as preference amplification. While this can contribute to increased engagement, it can also lead to negative experiences such as lack of diversity and echo chambers. We propose a theoretical framework for studying such amplification in a matrix factorization based recommender system. We model the dynamics of the system, where users interact with the recommender system and gradually "drift" toward the recommended content, with the recommender system adapting, based on user feedback, to the updated preferences. We study the conditions under which preference amplification manifests, and validate our results with simulations. Finally, we evaluate mitigation strategies that prevent the adverse effects of preference amplification and present experimental results using a real-world large-scale video recommender system showing that by reducing exposure to potentially objectionable content we can increase user engagement by up to 2%.
CCS CONCEPTS
• Information systems → Recommender systems; • Computing methodologies → Learning from implicit feedback.

KEYWORDS
Recommender systems; echo chambers; filter bubbles; fixed point

ACM Reference Format:
Dimitris Kalimeris, Smriti Bhagat, Shankar Kalyanaraman, and Udi Weinsberg. 2021. Preference Amplification in Recommender Systems. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '21), August 14-18, 2021, Virtual Event, Singapore. ACM, New York, NY, USA, 11 pages. https://doi.org/10.1145/3447548.3467298
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
KDD '21, August 14-18, 2021, Virtual Event, Singapore.
© 2021 Association for Computing Machinery.
ACM ISBN 978-1-4503-8332-5/21/08...$15.00
https://doi.org/10.1145/3447548.3467298

1 INTRODUCTION
Recommender systems have grown to dominate how people navigate through and consume the vast amounts of content that exists on the Internet. As users interact with the recommendations they
are given, they provide feedback that is used by the system to
improve its understanding of their preferences, and subsequently
improve the quality of future recommendations. This iterative pro-
cess of fine-tuning new recommendations based on user feedback
results in a feedback loop. Such feedback loops may occasionally
cause negative experiences for users.
For the purpose of illustration, consider a user who is curious
about videos that might be considered objectionable. If a recom-
mender system repeatedly presents them with such videos it will
likely receive positive feedback and learn about the user's preference for this content. Hence, over time the system may converge
to primarily showing them objectionable videos, and most impor-
tantly, the user also might end up actively seeking and liking such
content. These phenomena of progressive reinforcement of one's own views as a result of the feedback loop, and of narrowing exposure to different types of content, have been referred to as echo chambers and filter bubbles respectively, and have received much
recent attention [18, 24, 31, 38, 39]. Given the impact that recom-
mender systems have on our lives, it is important to understand
when recommendation dynamics result in echo chambers or filter
bubbles, and how to mitigate any negative experiences to the user.
In this work, we study echo chambers from a theoretical point
of view. We consider a matrix factorization-based recommender
system and discuss the formation of feedback loops, resulting from
users exhibiting a drift (or affinity) toward certain item categories
recommended to them. Essentially, drift captures the intuition that a user might become more interested in an item after exposure to it; for example, after reading an article they really enjoyed, they might want to read more similar articles, shifting their preferences toward that type of content.
The most natural solution concept that allows us to predict the long-term behavior of the dynamical interaction between the recommender system and a user is that of a stable fixed point, i.e., a situation in which the user preferences do not change in response to the system's recommendations. We prove that under mild assumptions such a stable fixed point does not exist at all, implying
that if users experience drift towards certain item categories, re-
peated exposure to content from those categories could result in
increasing the userβs preference for them, absent any intervention.
Moreover, this is true even if the prevalence of these item categories
is low. We then apply this theoretical formulation in simulations
involving integrity scenarios, where the item categories could be
representative of problematic content. We show using both syn-
thetic and real-world datasets that if the user has initial affinity
Figure 1: Interaction between the user and the system. The system uses u_t to recommend item x_t to the user. The system then updates its parameters u_t to reflect the feedback y_t. The user also slightly shifts their preferences π_t after interacting with the recommended item.
towards problematic content, the system will move towards show-
ing more problematic content to them. We propose strategies for
mitigating these outcomes, and present results from simulations as
well as investigations on real-world datasets. Finally, we present
supporting evidence for these strategies from a large user experiment run on a real-world content recommender system, and observe an improvement in the metrics used to measure user engagement. In summary, our contributions are:
(1) a theoretical characterization of the dynamic interactions
between a user and the recommender system,
(2) evidence through simulations on synthetic and real-world
data of the divergence in user preferences and the existence
of echo chambers in matrix factorization systems,
(3) results from a large user experiment conducted at Facebook
showing how applying personalized mitigation can increase
user engagement.
1.1 Related work
The study of feedback loops and echo chambers in recommender
systems has received a lot of attention [19, 23, 29] and has been cata-
logued in a survey by Chen et al. [8] who discuss feedback loops and
present a taxonomy for biases that exist in recommender systems.
In investigating how users interact with recommender systems,
some earlier work has focused on the static (single snapshot) per-
formance of the system [11, 36, 37], while others have modeled these
interactions in a dynamic, sequential setting, seeking to understand
the effect that the past ratings of a user have on the future content
that is available to them [1, 5, 7, 12, 13, 25, 40, 41]. Along the same
line of work, in [28, 32] the authors propose a general framework
to predict the long-term behavior of machine learning models in
cases where the model predictions directly influence the observed
data. Recommender systems broadly fall in that category; however
many real systems, for example matrix factorization ones, do not
satisfy the requirements of their framework, thereby warranting a
different approach. Echo chambers have also been extensively stud-
ied in recent years in a variety of contexts besides recommender
systems, like online social networks [3, 6, 9, 10, 15] and opinion
dynamics [4, 22, 33, 35], with content diversity being proposed most
commonly to address user polarization [2, 21, 23, 27, 30].
Finally, the work that is conceptually closest to ours is that of
Jiang et al. [20]. The authors provide a theoretical model to study
echo chambers which, however, fails to capture core aspects of real
recommender systems. First, they do not consider any systematic
way through which the recommender system recommends items
to the user (their assumption is that each item is recommended
infinitely often). Second, in their model items are independent
from each other in the sense that reacting to an item does not
provide any information about the possible reaction to a similar
item. This assumption violates the essence of how collaborative
filtering works. In this work, we study similar phenomena in a more
complete and realistic setting, where there is an underlying matrix
factorization-based recommender system.
1.2 Outline
We formalize the model for the recommender system and the user
behavior in Section 2. In Section 3, we present our main theoretical
results showing the lack of stable fixed points in the user-system
interaction, hinting at the existence of echo chambers in matrix-
factorization systems. In Section 4, we evaluate our results through
simulations on real and synthetic data. We discuss mitigation strate-
gies in Section 4.2, and describe a real-world large user experiment
applying these strategies conducted at Facebook in Section 4.3.
2 PROBLEM SETTING
Consider a collection of n items that are available on a platform. At each time step t, a recommender system presents an item to a user and receives feedback y ∈ {0, 1} indicating whether they liked, clicked on, watched, etc. the item that was suggested. This feedback is utilized to improve subsequent recommendations. We provide an illustration of the user-system interaction in Figure 1 and a notation summary in Table 1.
Linear scores. Our work focuses on stochastic matrix factorization recommender systems. In such systems, each item is assigned a linear score s_x = u^T x, where u, x ∈ R^d are feature vectors for the user and for the item, respectively. Here d ≪ n, i.e., the user and item representations are low-dimensional. Following standard logistic regression modeling, the system then predicts the probability of the user liking an item x at time t as q_x^t = σ(s_x^t), where σ(·) denotes the sigmoid function. We assume that the item features are a static description of the item, i.e., they correspond to scores coming from some classifiers or represent fixed characteristics of the item that do not change after an interaction with the user. The user features, on the other hand, are dynamic and are updated based on the user's feedback, to reflect the true user preferences more accurately. For example, for a movie recommendation, x can describe the genre of the movie, its duration, its language, etc., while u describes how much the user values each of these. As a result, the item scores are time-varying: s_x^t = u_t^T x for each time step t.
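As a minimal illustration of this scoring scheme (our own sketch, not the paper's code; the function name and toy vectors are ours):

```python
import numpy as np

def score_and_predict(u, X):
    """Linear scores s_x = u^T x for every item, mapped through the
    sigmoid to the predicted like-probability q_x = sigmoid(s_x)."""
    scores = X @ u                          # one score per item row
    probs = 1.0 / (1.0 + np.exp(-scores))   # sigmoid
    return scores, probs

u = np.array([0.5, -0.2])                   # user vector, d = 2
X = np.array([[1.0, 0.0],                   # three static item vectors
              [0.0, 1.0],
              [-1.0, 0.5]])
scores, probs = score_and_predict(u, X)
```

Only u is updated over time; the rows of X stay fixed, matching the static-item assumption above.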
Stochastic recommendation. We consider stochastic recommender systems, a relaxation of the classical Top-k recommender system [12]. Such a system maps the score of each item to a selection probability through the softmax function. Hence, higher scores correspond to higher selection probabilities. Specifically, at each iteration, an item x is chosen as x ∼ p_t, where

p_t(x) = e^{β s_x^t} / Σ_{x'} e^{β s_{x'}^t}    (1)
Table 1: Notation Summary

Symbol            Meaning
n                 number of items
u_t ∈ R^d         inferred user feature vector at time t
x ∈ R^d           item feature vector (static)
s_x^t             score of item x at time t: s_x^t = u_t^T x
q_x^t = σ(s_x^t)  predicted probability that the user will like item x
π_t(x)            true probability that the user will like x at time t
p_t               softmax of item scores at time t
p_t(x)            probability of recommending x at time t
β > 0             system sensitivity, determines the "steepness" of p_t
γ ∈ (0, 1)        sensitivity parameter of the user
α_{t,x}           rescaling coefficient for item x at time t (see Def. 1)
Notice that the selection probability of an item, p_t(x), which corresponds to the probability of presenting x to the user, is different from the probability q_x^t, which is the probability that the user likes item x upon being presented with it. Using different values for the sensitivity parameter β in p_t we can induce different behaviors in the system¹. In particular, a high β results in a steep p_t where only the items with the highest scores have non-negligible probability of being recommended; if β → ∞ then p_t is exactly a Top-1 recommender system. On the other hand, small values of β lead to a more uniform behavior where all items have a similar probability of being presented to the user. See Section 4 for more details on the effect of β and the shape of p_t (e.g., Fig. 6).
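The effect of β on the selection distribution of (1) can be sketched as follows (a max-subtracted softmax for numerical stability; names are ours):

```python
import numpy as np

def recommendation_probs(scores, beta):
    """Softmax of eq. (1): p(x) is proportional to exp(beta * s_x)."""
    z = beta * scores
    z = z - z.max()          # stabilize before exponentiating
    w = np.exp(z)
    return w / w.sum()

scores = np.array([2.0, 1.0, 0.0, -1.0])
p_explore = recommendation_probs(scores, beta=0.5)   # near-uniform
p_top1 = recommendation_probs(scores, beta=20.0)     # approximates Top-1
```

With small β the mass is spread across all items; with large β almost all of it sits on the highest-scoring item, recovering Top-1 behavior.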
System update. Upon receiving feedback y_t for the item x_t that was recommended to the user, the system updates u_t to account for it, attempting to learn the true user preferences accurately. We assume that the update of u_t happens through gradient descent, a standard algorithm with strong performance guarantees [17] that is employed in practice for online learning with convex loss functions. Specifically, at iteration t, the expected update of the system is:

u_{t+1} = u_t − η_t · E_{x∼p_t}[∇_u ℓ(u_t, (x, y))]    (2)

where ℓ is the negative log-likelihood loss² ℓ(u, (x, y)) = −y ln σ(u^T x) − (1 − y) ln(1 − σ(u^T x)).
Remark. Notice that in (2) the gradient of the loss is inside the expectation and their order cannot be exchanged, since the distribution p_t according to which x is chosen depends on u_t. This is one of the main reasons why the analysis of such a dynamical model differs from the analysis of gradient descent dynamics in standard settings: the distribution of our data (the n items in the system, in our case) is dynamically adapting and depends on the parameters of the model in the respective iteration.
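In expectation over y ∼ Bernoulli(π_t(x)), the gradient of the logistic loss is (σ(u^T x) − π_t(x)) · x, so one expected step of (2) can be sketched as below. The vector pi is supplied explicitly purely for illustration; the real system never observes the true preference function:

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def expected_update(u, X, pi, beta, eta):
    """One expected gradient-descent step of eq. (2).

    Averages the per-item gradient (sigmoid(s_x) - pi(x)) * x over the
    recommendation distribution p_t (softmax of the scores)."""
    s = X @ u
    z = beta * s
    p = np.exp(z - z.max()); p = p / p.sum()
    grad = X.T @ (p * (sigmoid(s) - pi))   # E_{x~p_t, y}[grad_u l]
    return u - eta * grad

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(50, 2))
u0 = np.array([0.2, -0.1])
pi = sigmoid(1.5 * (X @ u0))               # a stand-in "true" preference
u1 = expected_update(u0, X, pi, beta=1.0, eta=0.5)
```

If pi happens to coincide with the system's own prediction σ(s), the expected gradient vanishes and u does not move, which is exactly the fixed-point notion discussed next.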
Fixed points. The most natural solution concept to characterize the interaction between the user and a dynamically-adjusting recommender system is that of a fixed point, i.e., a user vector u_* such that, if the recommender system presents items according to the distribution p_* induced by it (according to (1)), there will be no more updates to the user vector u_t; the user and the system have reached equilibrium. Formally, u_* corresponds to a fixed point if:

u_* = argmin_u E_{x∼p_*, y}[ℓ(u, (x, y))]

¹We call β the sensitivity parameter because it determines how "sensitive" p_t is to changes of the score. The smaller the β, the more uniform its behavior.
²This loss is standard for classification problems with binary labels.

Figure 2: The shape of the drift function (on the left) indicates that users do not reinforce their preferences for an item uniformly, but rather depending on how much they already like it. After some threshold (peak), this reinforcement fades, since they already have a firm opinion of the particular item (and similar ones) and as a result it is unlikely that they will significantly alter their preferences for it. On the right we see the difference between the true and the predicted probability of liking an item, in red and blue respectively. The difference between the two is attributed to the drift function. More vivid red corresponds to more sensitive users, who are more aggressive in changing their preference for an item.
Fixed points can be further categorized with respect to their long-term behavior. Specifically, a fixed point u_* is called:
• stable, if for every neighborhood U of u_* there exists a neighborhood U' of u_* such that u_t ∈ U for every t if u_0 ∈ U', i.e., if we start close to u_* we will not move far from it.
• attracting, if there exists a neighborhood U of u_* for which u_t → u_* if u_0 ∈ U, i.e., we converge to u_* if we start close.
• asymptotically stable, if it is both stable and attracting.
Naturally, asymptotically stable fixed points correspond to the
most desirable solution since they allow us to characterize and
predict the limit behavior of a time-evolving system easily. However,
it is easy to see that some assumptions on the user behavior are
required even for the user-system interaction to have a fixed point.
Otherwise, if the user arbitrarily changes the way they interact
with items in every iteration, the system will always try to adapt
by updating π’π‘ . In the next subsection, we formalize this idea by
making specific assumptions on the true user preferences.
Fixed points, echo chambers and filter bubbles. Note that the notion of a fixed point is orthogonal to the ideas of an echo chamber or a filter bubble. Echo chambers correspond to a situation where the true preferences of a user are constantly reinforced, leading to polarization: the user likes more of what they previously endorsed and less of what they did not. Similarly, in a filter bubble the system actively narrows down the categories of items it presents to a user. However, by definition, at a fixed point there is no update on u_*, hence such reinforcement does not occur.
2.1 User behavior
The behavior of the user is determined by a preference function π_t : R^d → [0, 1], unknown to the recommender system. It denotes their real preference for each item at time t and is time-dependent, meaning that it can be modified based on the recommendations that the user receives or on exogenous factors. The user reacts positively (like, watch, etc.) to an item x that they are presented with at time t with probability π_t, i.e., y ∼ Bernoulli(π_t(x)).
Our main assumption is that the likelihood of a user reacting to a particular item slightly increases if the score of the item is positive and slightly decreases if it is negative. Recall our example in Section 1, where users have a slight inclination to reinforce their opinion, i.e., increase their preference toward articles that they correlate well with, and decrease it otherwise³. This deviation in user behavior is formally modeled by a drift function. Specifically, a drift function is a function f : R → [−1, 1] that relates the user's shift in the probability of liking an item to its score s_x^t. The higher (resp. lower) the drift of a particular item, the more the user will positively (resp. negatively) reinforce their opinion of this item. In this work we use f(z) = C · z · e^{−z²}, where C ≈ 2.33 is just a normalizing constant ensuring that f(z) ∈ [−1, 1]. The shape of this function, along with its effect on the user preferences, can be seen in Figure 2. It captures core properties of the user dynamics, as explained in the caption. Similar functions have been used to model opinion dynamics, for example regarding agreement with political parties [35].
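The drift function can be written down directly; the normalizer C = √2 · e^{1/2} ≈ 2.33 scales the peak of z · e^{−z²} (attained at z = 1/√2) to exactly 1:

```python
import numpy as np

C = np.sqrt(2.0) * np.exp(0.5)      # ~2.3316, normalizes the peak to 1

def drift(z):
    """f(z) = C * z * exp(-z^2): reinforcement is strongest near
    |z| = 1/sqrt(2) and fades for items with firmly held opinions."""
    return C * z * np.exp(-z ** 2)

z = np.linspace(-3.0, 3.0, 601)
values = drift(z)
```

The function is odd, bounded in [−1, 1], and vanishes both at z = 0 and for large |z|, matching the left panel of Figure 2.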
Finally, we define user sensitivity to formalize the intuition that, in order to reach an equilibrium, the behavior of the user cannot be decoupled from the recommender system. Essentially, sensitivity constitutes a bound on how much the true preference function of a user can deviate from the recommender system's prediction. The magnitude of this deviation depends on the drift function. The name "sensitivity" refers to the fact that a sensitive user will significantly reinforce their opinion after being presented with an item, i.e., they are not firm in their preferences and are prone to influence by the system/environment; the more sensitive they are (larger γ as defined below), the higher the influence.

Definition 1 (γ-sensitive user). A user is γ-sensitive with respect to a drift function f, for γ ∈ (0, 1), if for all time steps t and items x the difference between the true user preferences and the preferences perceived by the recommender system is bounded as follows:

π_t(x) − σ(s_x^t) = γ · α_{t,x} · f(s_x^t)

where

α_{t,x} = 1 − σ(s_x^t), if s_x^t ≥ 0;    α_{t,x} = σ(s_x^t), if s_x^t < 0

is a scaling factor that ensures π_t(x) ∈ [0, 1] for all items x.

In the following, we slightly abuse notation and refer to the γ function associated with a γ-sensitive user as γ(s_x) = γ · α_{t,x} · f(s_x) for all items x.
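Definition 1 can be checked numerically: with α_{t,x} chosen this way, the perturbed preference σ(s) + γ · α · f(s) stays inside [0, 1] for every score and every γ ∈ (0, 1). A sketch (function names ours):

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
C = np.sqrt(2.0) * np.exp(0.5)
drift = lambda z: C * z * np.exp(-z ** 2)

def true_preference(s, gamma):
    """pi_t(x) = sigmoid(s) + gamma * alpha_{t,x} * f(s) (Definition 1)."""
    q = sigmoid(s)
    alpha = np.where(s >= 0, 1.0 - q, q)   # rescaling coefficient
    return q + gamma * alpha * drift(s)

s = np.linspace(-4.0, 4.0, 801)
pi = true_preference(s, gamma=0.8)
```

For positive scores the true preference sits above the system's prediction (the user over-likes) and for negative scores below it, as in the right panel of Figure 2.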
3 EVOLUTION OF USER PREFERENCES
In the previous section, we defined asymptotically stable fixed points and argued that they constitute the gold standard for the long-term behavior of a dynamically evolving recommender system, ruling out echo chamber formation. Hence, the natural question is:

Are there asymptotically stable fixed points in the user-system interaction?

In this section, we present the main theoretical results of our paper. We answer the above question negatively and formally prove that the matrix factorization recommender system we defined will not reach a fixed point, hinting at the existence of echo chambers. We start by proving that, under a mild condition on the items to be recommended, there is a unique fixed point of the system dynamics. We defer the proof to Appendix A. All the results in this section hold for any choice of learning rates {η_t}_{t=1}^∞.

³We measure this correlation using the cosine similarity between u_t and x.

Lemma 1 (Existence of Fixed Points). Let X ∈ R^{d×n} be the matrix of the items available for recommendation and XX^T = Σ_x xx^T the respective covariance matrix. If rank(XX^T) = d, i.e., the covariance matrix has full rank, then u_* = 0 is the unique fixed point of (2).
In the following lemma, we answer the second part of the question regarding the stability of that fixed point. Specifically, we prove that the norm of the user vector u_t always increases, ruling out the possibility for u_* = 0 to be a stable equilibrium between the user and the recommender system (Corollary 1).

Lemma 2. If rank(XX^T) = d and u_0 ≠ 0, then ∥u_{t+1}∥ > ∥u_t∥ for all t. That is, the norm of the user preference vector u_t strictly increases in every iteration.
Proof. Following similar calculations to those in the proof of Lemma 1 (shown analytically in Appendix A) we get

u_{t+1} = u_t − η_t · E_{x∼p_t}[∇_u ℓ(u_t, (x, y))]
        = u_t + η_t · Σ_x x · P[x] · γ(u_t^T x)
        = u_t + η_t · Σ_x x · P[x] · γ · α_{t,x} · C · (u_t^T x) · e^{−(u_t^T x)²}

The last equality follows from the definition of the γ function with f(z) = C · z · e^{−z²}. To simplify notation, let c_{t,x} = η_t · γ · C · P[x] · α_{t,x} · e^{−(u_t^T x)²}. Notice that c_{t,x} > 0 for all t, x. Hence, we have:

u_{t+1} = u_t + (Σ_x c_{t,x} · xx^T) u_t = (I_d + Σ_x c_{t,x} · xx^T) u_t

Consider the n × n diagonal matrix with the values c_{t,x} on the diagonal: C_t = diag([c_{t,x_i}]_{i=1}^n). Then we can rewrite Σ_x c_{t,x} · xx^T = X C_t X^T. Since all the terms c_{t,x} are strictly positive, C_t is positive definite. Now, if rank(XX^T) = d, we have that for any v ∈ R^d s.t. v ≠ 0:

v^T X C_t X^T v = (X^T v)^T C_t (X^T v) > 0,

since X^T v = 0 has the unique solution v = 0. That is because rank(XX^T) = d ⇒ rank(X^T) = d. This implies that X C_t X^T is positive definite, and thus all of its eigenvalues are strictly positive.

Let A_t := I_d + X C_t X^T, which is symmetric and hence diagonalizable as A_t = Q_t Λ_t Q_t^T, where Q_t is an orthogonal matrix. Moreover, all of its eigenvalues, i.e., the diagonal elements of Λ_t, are strictly greater than 1 for every t, since they correspond to the eigenvalues of I_d (all 1) plus the eigenvalues of X C_t X^T (all strictly positive).

Now, for each t consider the transformation z_t = Q_t^T u_t. Then

Q_t^T u_{t+1} = Q_t^T A_t u_t = Q_t^T Q_t Λ_t Q_t^T u_t = Λ_t z_t

Starting with u_0 ≠ 0 yields u_t ≠ 0, and hence z_t ≠ 0, for all t, since Λ_t is a diagonal matrix with non-zero diagonal elements. Moreover, since all the diagonal elements of Λ_t are strictly greater than 1, we have ∥Λ_t z_t∥ > ∥z_t∥. Finally, since Q_t is orthogonal, ∥Q_t^T u_{t+1}∥ = ∥u_{t+1}∥ and ∥z_t∥ = ∥u_t∥, so ∥u_{t+1}∥ = ∥Λ_t z_t∥ > ∥z_t∥ = ∥u_t∥, completing the proof of the lemma. □
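Lemma 2 is easy to check numerically. The sketch below runs the expected dynamics under the γ-sensitive user model (full-rank random items; all names and hyperparameters are ours) and the norm of u_t increases at every step:

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
C = np.sqrt(2.0) * np.exp(0.5)

def expected_step(u, X, beta=1.0, gamma=0.5, eta=1.0):
    """u_{t+1} = u_t + eta * sum_x p(x) * gamma(s_x) * x, as in the proof."""
    s = X @ u
    z = beta * s
    p = np.exp(z - z.max()); p = p / p.sum()
    alpha = np.where(s >= 0, 1.0 - sigmoid(s), sigmoid(s))
    kappa = gamma * alpha * C * s * np.exp(-s ** 2)   # the gamma(s_x) term
    return u + eta * X.T @ (p * kappa)

rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(1000, 5))    # full column rank w.h.p.
u = rng.uniform(-0.3, 0.3, size=5)
norms = [np.linalg.norm(u)]
for _ in range(50):
    u = expected_step(u, X)
    norms.append(np.linalg.norm(u))
```

Because each step multiplies u by I_d plus a positive definite matrix, the sequence of norms is strictly increasing, exactly as the lemma predicts.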
Corollary 1 (Stability of FPs). Assume that rank(XX^T) = d. Then the unique fixed point u_* = 0 is not asymptotically stable.
The two lemmas above essentially show that the unique fixed point is irrelevant as a solution concept for the user-system interaction, hinting at the existence of echo chambers in matrix factorization recommender systems. Moreover, in the case where the features of the items in the system are transformed to be decorrelated, i.e., have identity covariance matrix⁴, following the proof of Lemma 2 we can formally prove the amplification of the item scores. The proof is deferred to Appendix A.

Lemma 3 (Score Amplification and Echo Chambers). If XX^T = I_d, then for every item x s.t. s_x^0 ≠ 0 it holds that |s_x^{t+1}| > |s_x^t| and s_x^{t+1} · s_x^t > 0. That is, each score gets amplified with time.

In the next section we provide simulations to show that even if the data is not decorrelated, i.e., XX^T ≠ I_d, the average absolute value of the score increases with time, essentially creating two well-separated groups of items in the system: the ones the user likes and the ones that they do not.
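The decorrelation assumed in Lemma 3 is a standard whitening step, x̃ = W^{−1}(x − μ); the sketch below uses the Cholesky factor of the empirical covariance as one valid choice of W (names ours):

```python
import numpy as np

def decorrelate(X):
    """Whiten item features so their empirical covariance is the identity.

    Applies x_tilde = W^{-1} (x - mu) with W the Cholesky factor of the
    covariance; any W with W W^T equal to the covariance works."""
    mu = X.mean(axis=0)
    Xc = X - mu
    cov = Xc.T @ Xc / len(X)
    W = np.linalg.cholesky(cov)              # requires full-rank features
    return np.linalg.solve(W, Xc.T).T

rng = np.random.default_rng(2)
raw = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))  # correlated
white = decorrelate(raw)
cov_white = white.T @ white / len(white)
```

After the transform the covariance of the items is the identity matrix up to floating-point error, which is the setting in which Lemma 3 applies exactly.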
4 SIMULATIONS
In the previous section we proved that there is no asymptotically stable fixed point in the user-system interaction, giving evidence for the existence of echo chambers and filter bubbles even in simple matrix factorization recommender systems. In this section, we conduct simulations on real and synthetic datasets to further investigate these phenomena by looking at the evolution of the user preferences u_t and the shape of the recommender system p_t, under different model sizes d, system parameters β, user sensitivities γ, and item distributions.

Metrics. We use the following metrics to evaluate the behavior of the recommender system and the evolution of the user preferences:
(1) Norm of u_t. We expect that the ℓ2 norm of the user vector will be driven away from 0, leading to a shift in the user preferences and a divergence of the item scores, as indicated by Lemma 2.

⁴This is a common data preprocessing procedure, often applied for efficient optimization. It can be easily done by applying the transformation x̃ = W^{−1}(x − μ) to each item x, where μ = (1/n) Σ_x x and W is any matrix for which WW^T = XX^T. Such a matrix exists if rank(XX^T) = d.
(2) Average probability per item type. We partition the items into "likable" and "not likable" based on whether their assigned score s_x^t at time t is positive or negative. We measure the average q_x^t for likable and not likable items.
(3) Probability mass on items correlating well with the initial preference vector u_0. We consider the top 5% of items having the highest cosine similarity with u_0. These are the items the system initially believes the user will like. We report the total probability mass assigned to them by p_t for each t.
Experimental Setup. We fix the dimension of the system to be d = 15, but we found consistent results across a range of model sizes (relevant plots can be found in Appendix B.1). We run each simulation for 200 iterations for the synthetic datasets and for 750 for the real ones, where the user vector u_t is updated as in equation (2) with learning rate η_t = 1. In Appendix B.2 we include similar experiments using Stochastic Gradient Descent (SGD) updates, which essentially corresponds to the case where the user receives k recommendations at each time step. We consider a range of different values β ∈ {0.5, 1, 1.5, 2} that lead to different behaviors of the recommender system, as well as different user sensitivities γ ∈ {0.2, 0.4, 0.6, 0.8}. Below, we describe the datasets that we use.

Synthetic datasets. We synthetically generate 100,000 items. The features of each item are generated from a fixed distribution D_item. We consider the following distributions:
features of each item are generated from a fixed distribution πitem.
We consider the following distributions:
• Uniform in the hypercube. Each item feature is generated independently and uniformly in [−1, 1]: x ∼ Uniform([−1, 1]^d).
• Mixture of two uniforms. In many real platforms there is a dichotomy in the items present in the system, for example, regular content vs. content that violates some of the platform's policies. Also, some of the item features often correspond to classifier scores, with higher values indicating that the content is problematic (e.g., such classifiers can predict the probability of violent content, nudity, etc.). We model this effect using a mixture distribution. Specifically, with probability 1 − α, x ∼ Uniform([−1, b]^d) (regular items), while with probability α, x ∼ Uniform([b, 1]^d) (problematic items with high feature values). Here, α is the mixture coefficient and b is a threshold that we set to 0.8 for the simulations. We set α = 1%, which corresponds to the estimated prevalence of problematic content in real platforms, see e.g. [14].

For the synthetic datasets we initialize the user preference vector u_0 ∼ Uniform([−c, c]^d) for c = 0.3. Note that for small ∥u_0∥ all scores are close to 0, so p_0 is nearly uniform, i.e., the recommender system initially selects items almost uniformly at random to present to the user.
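The mixture distribution above can be sampled as follows (a sketch with the stated parameters α = 1%, b = 0.8, c = 0.3; function names ours):

```python
import numpy as np

def sample_items(n, d, alpha=0.01, b=0.8, rng=None):
    """With prob. 1 - alpha draw a regular item x ~ U([-1, b]^d);
    with prob. alpha a problematic one x ~ U([b, 1]^d)."""
    if rng is None:
        rng = np.random.default_rng()
    problematic = rng.random(n) < alpha
    X = rng.uniform(-1.0, b, size=(n, d))
    X[problematic] = rng.uniform(b, 1.0, size=(int(problematic.sum()), d))
    return X, problematic

rng = np.random.default_rng(3)
X, flag = sample_items(100_000, 15, rng=rng)
u0 = rng.uniform(-0.3, 0.3, size=15)   # initial user vector, c = 0.3
```

Roughly 1% of the generated items land in the problematic block [b, 1]^d, matching the stated prevalence.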
Real datasets. We also consider the more realistic scenario of obtaining the initial user and the item features from real datasets. We use the MovieLens 10M [16] and the Yahoo [42] datasets. MovieLens contains movie ratings from 69,878 users for 10,677 different movies, while Yahoo contains song ratings from 5,400 surveyed users for 1,000 songs. We selected the 50 users with the highest number of ratings in each case, corresponding to 10,208 movies for MovieLens and 982 songs for Yahoo. Since we use binary labels in our model, we mark each movie/song that was rated at least 3.5/5 as 1, indicating that the user liked it, and the rest as 0. We used a Python library [26] that employs kernel matrix factorization [34] to obtain the user and item features as R ≈ U · V, where R is the binarized ratings matrix, U is the matrix of initial features for each user, and V is the matrix of item features.

Figure 3: Evolution of ∥u_t∥ with t. In all cases the user preferences diverge, as predicted by Lemma 2.

Figure 4: Evolution of the average probability of a likable vs. a non-likable item with t. Solid lines indicate the likable items, whose probability of receiving a positive reaction from the user, π_t(x), is above 0.5, while the dashed lines indicate the non-likable items. The model size is d = 15 and γ = 0.5. The probability of clicking on an item moves toward 1 if its score is positive and toward 0 otherwise.

Figure 5: Total probability mass assigned by p_t to the items with the highest cosine similarity with u_0. The model size is d = 15 and γ = 0.5. There is a clear tendency to recommend items that the system believes the user already likes, and this tendency becomes more pronounced for higher values of β, making p_t steeper and approximating a Top-1 recommender system.
4.1 ResultsWe evaluate the proposed metrics on the real and synthetic datasets.
For the synthetic datasets we repeat each simulation 10 times with
different vectors π’0, while for the real ones we iterate over each
of the 50 different subsampled users. We report the means and the
standard deviations of the run in Figs 3- 5, 7. Additional plots for
different model sizes and SGD updates are presented in Appendix B.
Echo chambers and filter bubbles. First, we observe that the norm of u_t clearly diverges with time for different models and distributions, as predicted by Lemma 2, indicating that a fixed point is never reached. As a result, the scores for each item are pushed towards the extremes, as can be seen in Figure 4. Hence, every item tends to be deterministically liked or not liked by the user, suggesting an echo chamber. Secondly, in Figure 5 we demonstrate the filter
Figure 6: The effect of different β on the recommender system p_t. Consider the uniform distribution on the unit circle and u_0 > 0 coordinate-wise. The left two pictures show the distribution p_t in the beginning and after 100 steps of interaction with the user for β = 0.5. Brighter colors indicate higher probability mass. The two pictures on the right correspond to β = 2. It is evident that for larger β the transformation of p_t is much stronger, with all of its mass concentrated around items with the highest possible scores that correlate well with u_0. In other words, the system tries to recommend items that have the highest probability of being liked. For smaller β the behavior is more explorative, and items with smaller scores have a non-trivial probability of being recommended.
bubble effect that can occur in a matrix factorization recommender system. Specifically, we see that a small subset of items with high cosine similarity to u_0, i.e., items that the system initially believes the user will like, are the ones that end up being recommended with overwhelmingly high probability, narrowing down the exposure to any other type of content.
The effect of model sensitivity β. In Figures 3, 4 and 5 we plot our metrics for different values of β and fixed γ = 0.5. For the synthetic distributions, we see that higher values of β lead to slower divergence of the user preferences u_t, while the opposite is true for the real ones. However, preference amplification and the convergence of p_t to the items that correlate well with u_0 are faster. This is explained by the fact that for higher values of β, the stochastic recommender system of (1) turns into a Top-1 recommender system, and is therefore more likely to present users with items that they already liked. This leads to a stronger reinforcement of their preferences and increases the filter bubble effect.
Visually, this difference in the recommender system for different values of β can be seen in Figure 6 (see also Fig. 14 in Appendix B), where we plot p_t, for two-dimensional items, at t = 0 and t = 100. We use two different values of β, β = 0.5 and β = 2, and we initialize with u_0 > 0 (coordinate-wise positive). It is evident that higher β leads to significantly higher concentration in the areas corresponding to items with high initial scores, i.e., items with large positive values in both coordinates (since u_0 > 0).
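The sharpening effect of β can be illustrated in a few lines. The softmax form below is an assumption standing in for the stochastic recommender of (1); larger β concentrates the mass on the top-scoring item, approaching Top-1 behavior.

```python
import numpy as np

def recommender_distribution(scores, beta):
    """Softmax over item scores. This exponential form is an assumption
    standing in for the stochastic recommender p_t of Eq. (1)."""
    z = beta * scores
    z = z - z.max()          # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()

scores = np.array([2.0, 1.0, 0.5, -1.0])
for beta in (0.5, 2.0, 8.0):
    print(beta, recommender_distribution(scores, beta).round(3))
# As beta grows, the mass concentrates on the top-scoring item,
# approaching a Top-1 recommender; small beta is more explorative.
```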
Effect of user sensitivity γ. We consider β = 1 and varying values of γ. In Figure 7 we include the relevant plots for the uniform distribution and defer the results for the rest of the datasets to Figure 13 in Appendix B. Users with higher γ, i.e., those more prone to reinforce their previous opinion after being presented with an item, are the ones for whom the norm of the preference vector diverges the fastest, and they also experience the largest echo chamber and filter bubble effects. This implies that while there is a vulnerability in the recommender system, how much these effects manifest is ultimately up to the user, matching findings in [3].
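The role of γ in the feedback loop can be shown with a small simulation. Everything here is a toy stand-in (softmax recommender, sigmoid click model, additive drift u ← u + γ·x on positive feedback), not the paper's exact update equations; it only illustrates that larger γ makes ‖u_t‖ grow faster.

```python
import numpy as np

def simulate(gamma, steps=300, beta=1.0, d=15, n_items=1000, seed=1):
    """Toy feedback loop: softmax recommendation, a Bernoulli click through
    a sigmoid, then a drift of u toward positively received items. The drift
    rule u <- u + gamma * x is a simplified stand-in for the paper's
    gamma-sensitive user model."""
    rng = np.random.default_rng(seed)
    items = rng.uniform(-1, 1, size=(n_items, d))
    u = rng.uniform(0.0, 0.1, size=d)
    for _ in range(steps):
        scores = items @ u
        p = np.exp(beta * (scores - scores.max()))
        p /= p.sum()
        x = items[rng.choice(n_items, p=p)]            # recommended item
        click_prob = 1.0 / (1.0 + np.exp(-np.clip(x @ u, -500, 500)))
        if rng.random() < click_prob:                  # positive feedback
            u = u + gamma * x                          # preferences drift
    return np.linalg.norm(u)

print(simulate(0.25), simulate(1.0))
# Higher gamma makes ||u_t|| grow faster.
```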
Figure 7: Effect of user sensitivity γ. On the left we plot ‖u_t‖, in the middle the average probability for likable (solid) and non-likable items (dashed), and on the right the probability mass on the items that correlate well with u_0. The model size is d = 15 and β = 1. More sensitive users experience larger movements in u_t.
Figure 8: On the left we see the total probability assigned to the Type-2, low-prevalence items. The different lines correspond to different prevalences of such items in the system, ranging from 1% to 0.1%. On the right we plot the probability on the Type-1 items that have high cosine similarity with, i.e., correlate well with, the initial preference vector u_0. These items act as a "bridge" for the probability mass to concentrate on the low-prevalence items.
4.2 Mitigation strategies
Consider the synthetic mixture distribution that we defined at the beginning of the section to model two different types of content (e.g., regular vs. problematic). We will show experimentally that even if the prevalence of one of the two types is very low, the recommender system is able to discover items of this type efficiently and present them to a user who shows interest in them. While this is generally a positive outcome, it can become an issue if the system contains problematic content. In this section, we discuss two strategies, global and personalized demotion, to alleviate preference amplification, and in Section 4.3 we show the effect of such mitigation on the user's experience for borderline problematic content in a real large-scale recommender system.
Experimental Setup. Our basic mixture distribution consists of 100,000 items. They can be of Type-1, sampled as x ∼ Uniform([−1, 0.8]^d) with probability 1 − α = 0.99, or of Type-2, sampled as x ∼ Uniform([0.8, 1]^d) with probability α = 0.01. We initialize u_0 > 0 (coordinate-wise positive) to focus on users that have high correlation with the low-prevalence Type-2 content in the system, and hence are most interested in these items. We repeat each simulation 10 times.
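The mixture can be sampled directly; a sketch with d = 15, where the seed and variable names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
d, n, alpha = 15, 100_000, 0.01   # model size, item count, Type-2 prevalence

# Type-1: Uniform([-1, 0.8]^d) with prob. 1 - alpha; Type-2: Uniform([0.8, 1]^d).
is_type2 = rng.random(n) < alpha
items = np.where(is_type2[:, None],
                 rng.uniform(0.8, 1.0, size=(n, d)),
                 rng.uniform(-1.0, 0.8, size=(n, d)))

# u_0 coordinate-wise positive, so the user correlates most with Type-2 items.
u0 = rng.uniform(0.0, 1.0, size=d)
print(is_type2.mean())  # empirical prevalence, close to 0.01
print((items[is_type2] @ u0).mean() > (items[~is_type2] @ u0).mean())  # True
```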
Global Action. In this case, we actively remove items to reduce
the prevalence of the Type-2 content in the system. We use two
metrics to evaluate how the recommender system adapts to that:
(1) Probability mass on Type-2 items.
(2) Probability mass on Type-1 items that have high cosine similarity with the initial preference vector u_0. These are Type-1 items that the system initially believes the user likes.
In Figure 8 we adjust the count of Type-2 items in the system so that their prevalence is α% for α ∈ {1, 0.75, 0.5, 0.25, 0.1}. It can be seen that the probability of a low-prevalence item being presented to the user keeps increasing as the user interacts more with the system, no matter how low the actual prevalence of that content is. This indicates that matrix factorization recommender systems are able to surface items if they are relevant to the user.
Additionally, we focus on Type-1 items that appear attractive to the user in the beginning and observe that they act as a "bridge" for the probability mass of p_t to transfer to the low-prevalence Type-2 items. Intuitively, on a real platform these could correspond to borderline problematic items that introduce the user to more problematic content.
Personalized Action. A recommender system applying personalized demotion avoids showing the user excessive amounts of a particular type of content without explicitly removing it from the system. In other words, it specifically targets the personalized recommendation distribution p_t of a user and demotes the selection probability of certain item types. One way to achieve this is to use more reluctant updates of the preference vector u_t upon receiving user feedback. For example, a decaying learning rate schedule can be employed in (2) to slow down the preference adjustment and discourage "overfitting" to the user's interests, hence preventing the creation of a filter bubble; alternatively, a different learning rate can be used for each feature i, with magnitude inversely proportional to (u_t)_i. In Figure 9 we consider four different learning rate schedules for the system update (2). In each case, η_t is adjusted every t_adj = 20 iterations. We use t//t_adj to denote integer division:
(1) constant learning rate: η_t = 1,
(2) decaying learning rate: η_t = 1/(1 + t//t_adj),
(3) decaying learning rate: η_t = 1/(1 + √(t//t_adj)),
(4) feature-specific learning rate: here the learning rate corresponds to the matrix η_t = diag([1/(u_t)_i]_{i=1}^d), yielding a different update for each coordinate. In particular, large coordinates are modified at a slower rate.
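The four schedules can be written down directly. The sketch below adds a small numerical floor in the feature-specific case as a guard against division by zero; that guard is ours, not the paper's.

```python
import numpy as np

t_adj = 20  # the step size is adjusted every t_adj = 20 iterations

def eta_constant(t, u):
    return 1.0

def eta_decay_linear(t, u):
    return 1.0 / (1 + t // t_adj)

def eta_decay_sqrt(t, u):
    return 1.0 / (1 + np.sqrt(t // t_adj))

def eta_feature_specific(t, u):
    # diag(1 / (u_t)_i): large coordinates are updated more slowly.
    # The small floor is a numerical guard added here, not in the paper.
    return np.diag(1.0 / np.maximum(np.abs(u), 1e-8))

u_t = np.array([0.1, 2.0, 0.5])
print(eta_decay_linear(45, u_t))              # 1 / (1 + 45 // 20) = 1/3
print(np.diag(eta_feature_specific(0, u_t)))
```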
It can be seen that more reluctant updates of the preference vector u_t mitigate the echo chamber and filter bubble effects, since the system does not actively change the content that it presents to the user, hence keeping it more diverse despite feedback loops.
4.3 Mitigation in Facebook recommendations
We now present an experiment that we conducted to study mitigation in a real-world large-scale video recommender system at Facebook. While this recommender system is complex and beyond the scope of this work, we describe a few relevant details here. Users
Figure 9: Evolution of the user preference vector u_t and the recommender system for different learning rates. More reluctant updates of the preference vector u_t mitigate the echo chamber and filter bubble effects.
Figure 10: The difference in the average impressions of borderline problematic content viewed by the treatment and control groups. As time progresses, users that received the treatment see progressively less problematic content.
can engage with videos in several ways, including liking, commenting, and flagging them as "problematic". Videos have several features; the most relevant to our experiment is the probability that the video contains borderline nudity. We note that this content is mild and does not violate any enforcement standards; however, it may be considered problematic, especially after repeated exposure.
We first identify users that are repeatedly exposed to such con-
tent. Specifically, using a classifier, we compute, for each user, the
percentage of problematic videos (classifier output above a thresh-
old) that they were exposed to. If it averages over 25% for a week,
we reduce their exposure to this content for future sessions, hence
attempting to reduce the echo chamber effect proactively and in-
crease the diversity of the content recommended to them.
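The identification rule can be sketched as follows. The function name and the 0.8 classifier threshold are hypothetical; only the over-25%-weekly-average criterion comes from the text.

```python
def needs_mitigation(video_scores, classifier_threshold=0.8,
                     exposure_threshold=0.25):
    """Return True if the share of a user's weekly impressions classified as
    borderline problematic (classifier output above a threshold) averages
    over 25%. The 0.8 classifier cutoff is illustrative."""
    if not video_scores:
        return False
    problematic = sum(s >= classifier_threshold for s in video_scores)
    return problematic / len(video_scores) > exposure_threshold

print(needs_mitigation([0.9, 0.1, 0.95, 0.2]))  # 2/4 = 50% -> True
print(needs_mitigation([0.9, 0.1, 0.2, 0.3]))   # 1/4 = 25%, not over -> False
```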
In the previous section, we illustrated personalized mitigation by altering the learning rate of the recommender system. However, changing the parameters of the recommender system is infeasible for a system in production. Instead, we adopt a simple personalized mitigation by showing less borderline content. This is achieved by lowering the score of borderline problematic content (an instantiation of s_t(x)) by multiplying it by a factor of 0.1 for users in the experiment. The factor of 0.1 was chosen after careful consideration of the item score distribution.
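The demotion itself is a one-line scaling; a sketch with the 0.1 factor from the experiment (the function name is ours):

```python
def demote_borderline(score, is_borderline, factor=0.1):
    """Scale down the ranking score of borderline problematic items for
    users in the treatment group; other scores are left untouched."""
    return score * factor if is_borderline else score

print(demote_borderline(4.0, True))   # 0.4
print(demote_borderline(4.0, False))  # 4.0
```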
We ran this experiment for three weeks, with 37K users in each
of the treatment and control groups, where the users in treatment
received the mitigation. Figure 10 shows the difference in the average impressions of borderline problematic content viewed by users in control and those in treatment. At the start of the experiment, the two groups viewed about the same amount of problematic content; however, over time, the treatment group saw less, with the difference being statistically significant. Moreover, we observed
that the overall interaction with content (likes, comments, etc.) by
users in the treatment group increased by up to 2%. In conclusion,
this experiment shows that echo chambers related to specific item
features can create a negative user experience. Explicitly mitigating
such effects can have an overall positive impact.
5 DISCUSSION AND FUTURE WORK
In this paper we examined, from a theoretical point of view, the ability of stochastic matrix factorization recommender systems to reinforce user preferences, creating echo chambers and filter bubbles of content. This work serves as a step towards understanding and addressing such vulnerabilities, with several directions to explore further. A key question is whether we can prove quantitative rates for the divergence of ‖u_t‖, hence understanding the exact dependence of echo chamber formation on the parameters of the system. Secondly, a natural question is: what is the largest class of true preference functions π_t(·) of the user for which there is no asymptotically stable fixed point? Finally, an interesting effect to take into account when applying mitigation strategies is whether there are trade-offs in user engagement with the system. In Section 4.3, we provide preliminary evidence that this is not the case in real systems, but further investigation is required.
REFERENCES
[1] Guy Aridor, Duarte Goncalves, and Shan Sikdar. 2020. Deconstructing the Filter
Bubble: User Decision-Making And Recommender Systems. In Fourteenth ACMConference on Recommender Systems (RecSys β20).
[2] Çiğdem Aslay, Antonis Matakos, Esther Galbrun, and Aristides Gionis. 2018.
Maximizing the Diversity of Exposure in a Social Network. In IEEE InternationalConference on Data Mining, ICDM 2018, Singapore, November 17-20, 2018.
[3] E. Bakshy, S. Messing, and L. A. Adamic. 2015. Exposure to ideologically diverse news and opinion on Facebook. Science 348, 1130–1132.
[4] S. Banisch and E. Olbrich. 2019. Opinion polarization by learning from social feedback. The Journal of Mathematical Sociology 43, 2 (2019), 76–103.
[5] Andrea Barraza-Urbina and Dorota Glowacka. 2020. Introduction to Bandits in
Recommender Systems. In Fourteenth ACM Conference on Recommender Systems (RecSys '20). Association for Computing Machinery, New York, NY, USA, 748–750.
[6] Stephen Bonner and Flavian Vasile. 2018. Causal embeddings for recommendation.
In Proceedings of the 12th ACM Conference on Recommender Systems, RecSys 2018.[7] Djallel Bouneffouf, Amel Bouzeghoub, and Alda Gançarski. 2012. A Contextual
Bandit Algorithm for Mobile Context-Aware Recommender System. NeuralInformation Processing Systems.
[8] Jiawei Chen, Hande Dong, XiangWang, Fuli Feng, MengWang, and Xiangnan He.
2020. Bias and Debias in Recommender System: A Survey and Future Directions.
CoRR abs/2010.03240 (2020). arXiv:2010.03240
[9] Uthsav Chitra and Christopher Musco. 2020. Analyzing the Impact of Filter
Bubbles on Social Network Polarization. In Proceedings of the 13th InternationalConference on Web Search and Data Mining (Houston, TX, USA) (WSDM β20).
[10] Matteo Cinelli, Gianmarco De Francisci Morales, Alessandro Galeazzi, Walter
Quattrociocchi, and Michele Starnini. 2020. Echo Chambers on Social Media: A
comparative analysis. CoRR abs/2004.09603 (2020). arXiv:2004.09603
[11] Mark A. Davenport, Yaniv Plan, Ewout van den Berg, and Mary Wootters. 2012.
1-Bit Matrix Completion. CoRR abs/1209.3672 (2012). arXiv:1209.3672
[12] SarahDean,Mihaela Curmei, and Benjamin Recht. 2020. Designing Recommender
Systems with Reachability in Mind. In Participatory Approaches to Machine Learn-ing workshop at ICML.
[13] Sarah Dean, Sarah Rich, and Benjamin Recht. 2020. Recommendations and user
agency: the reachability of collaboratively-filtered information. In FAT* β20: ACMConference on Fairness, Accountability, and Transparency, Barcelona, Spain.
[14] Facebook. 2020. Community Standards Enforcement Report. (2020). https:
//transparency.facebook.com/community-standards-enforcement
[15] Kiran Garimella, Gianmarco De Francisci Morales, Aristides Gionis, and Michael
Mathioudakis. 2018. Political Discourse on Social Media: Echo Chambers, Gate-
keepers, and the Price of Bipartisanship. In Proceedings of the 2018 World WideWeb Conference, WWW 2018, Lyon, France.
[16] F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Trans. Interact. Intell. Syst. 5, 4 (2015).
[17] Elad Hazan. 2019. Introduction to Online Convex Optimization. CoRR abs/1909.05207 (2019). arXiv:1909.05207
[18] Homa Hosseinmardi, Amir Ghasemian, Aaron Clauset, David Rothschild, Markus Mobius, and Duncan Watts. 2020. Evaluating the scale, growth, and origins of
right-wing echo chambers on YouTube. https://arxiv.org/abs/2011.12843
[19] Olivier Jeunen. 2019. Revisiting offline evaluation for implicit-feedback recom-
mender systems. In Proceedings of the 13th ACM Conference on RecommenderSystems. 596β600.
[20] Ray Jiang, Silvia Chiappa, Tor Lattimore, András György, and Pushmeet Kohli.
2019. Degenerate Feedback Loops in Recommender Systems. In Proceedings of theAAAI/ACM Conference on AI, Ethics, and Society, AIES 2019, Honolulu, HI, USA.
[21] Marius Kaminskas and Derek Bridge. 2016. Diversity, Serendipity, Novelty, and
Coverage: A Survey and Empirical Analysis of Beyond-Accuracy Objectives in
Recommender Systems. ACM Trans. Interact. Intell. Syst., Article 2 (Dec. 2016).
[22] Tyll Krueger, Janusz Szwabiński, and Tomasz Weron. 2017. Conformity, Anti-
conformity and Polarization of Opinions: Insights from a Mathematical Model of
Opinion Dynamics. Entropy 19, 7 (Jul 2017), 371.
[23] Matevz Kunaver and Tomaz Pozrl. 2017. Diversity in recommender systems - A
survey. Knowl. Based Syst. 123 (2017), 154β162.[24] Mark Ledwich and Anna Zaitsev. 2020. Algorithmic extremism: Examining
YouTubeβs rabbit hole of radicalization. First Monday (02 2020).
[25] Y. Lei and W. Li. 2019. When Collaborative Filtering Meets Reinforcement
Learning. ArXiv abs/1902.00715 (2019).
[26] Kernel Matrix Factorization Library. 2020. https://pypi.org/project/matrix-
factorization/
[27] Antonis Matakos and Aristides Gionis. 2018. Tell me Something My Friends
do not Know: Diversity Maximization in Social Networks. In IEEE InternationalConference on Data Mining, ICDM 2018, Singapore, November 17-20, 2018. IEEEComputer Society, 327β336.
[28] Celestine Mendler-Dünner, Juan C. Perdomo, Tijana Zrnic, and Moritz Hardt.
2020. Stochastic Optimization for Performative Prediction. In Annual Conferenceon Neural Information Processing Systems 2020, NeurIPS 2020, virtual.
[29] Judith Möller, Damian Trilling, Natali Helberger, and Bram van Es. 2018. Do not
blame it on the algorithm: an empirical assessment of multiple recommender
systems and their impact on content diversity. Information, Communication &Society 21, 7 (2018), 959β977.
[30] Tien T. Nguyen, Pik-Mai Hui, F. Maxwell Harper, Loren Terveen, and Joseph A.
Konstan. 2014. Exploring the Filter Bubble: The Effect of Using Recommender
Systems on Content Diversity. In Proceedings of the 23rd International Conferenceon World Wide Web (WWW β14).
[31] Jack Nicas. 2018. How YouTube drives people to the internet's darkest corners.
Wall Street Journal, February 2018 (2018). https://www.wsj.com/articles/how-
youtube-drives-viewers-to-the-internets-darkest-corners-1518020478
[32] Juan C. Perdomo, Tijana Zrnic, Celestine Mendler-Dünner, and Moritz Hardt.
[n.d.]. Performative Prediction. In Proceedings of the 37th International Conferenceon Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event.
[33] N. Perra and L. E. C. Rocha. 2019. Modelling opinion dynamics in the age of algorithmic personalisation. Sci Rep (2019). https://doi.org/10.1038/s41598-019-43830-2
[34] Steffen Rendle and Lars Schmidt-Thieme. 2008. Online-Updating Regularized
Kernel Matrix Factorization Models for Large-Scale Recommender Systems. In
Proceedings of the 2008 ACM Conference on Recommender Systems (Lausanne,Switzerland) (RecSys β08).
[35] David Sabin-Miller and Daniel M. Abrams. 2020. When pull turns to shove: A
continuous-time model for opinion dynamics. Phys. Rev. Research 2 (Oct 2020),
043001. Issue 4. https://doi.org/10.1103/PhysRevResearch.2.043001
[36] J. Ben Schafer, Dan Frankowski, Jonathan L. Herlocker, and Shilad Sen. 2007.
Collaborative Filtering Recommender Systems. In The Adaptive Web, Methods andStrategies of Web Personalization (Lecture Notes in Computer Science). Springer.
[37] J. Ben Schafer, Joseph A. Konstan, and John Riedl. 2001. E-Commerce Recommendation Applications. Data Min. Knowl. Discov. 5, 1/2 (2001), 115–153.
[38] Joan E. Solsman. 2018. YouTube's AI is the puppet master over most of what you
watch. CNET (2018). www.cnet.com/news/youtube-ces-2018-neal-mohan/
[39] Zeynep Tufekci. 2018. YouTube, the Great Radicalizer. The New York Times, March (2018). www.nytimes.com/2018/03/10/opinion/sunday/youtube-politics-radical
[40] Berk Ustun, Alexander Spangher, and Yang Liu. 2019. Actionable Recourse in
Linear Classification. In Proceedings of the Conference on Fairness, Accountability,and Transparency (Atlanta, GA, USA) (FAT* β19). Association for Computing
Machinery, New York, NY, USA, 10β19. https://doi.org/10.1145/3287560.3287566
[41] Zeng Wei, Jun Xu, Yanyan Lan, Jiafeng Guo, and Xueqi Cheng. 2017. Reinforce-
ment Learning to Rank with Markov Decision Process.
[42] Yahoo! 2006. R3 - Music ratings for User Selected and Randomly Selected songs.
https://webscope.sandbox.yahoo.com/catalog.php?datatype=r
A OMITTED PROOFS
Proof of Lemma 1. In order for $u_t$ to be at an exact fixed point we need $u_{t+1} = u_t$, i.e.,
$$\mathbb{E}_{x,y}\big[\nabla \ell\big(u_t, (x,y)\big)\big] = 0.$$
In the following, we write $\nabla \ell_u(x,y)$ instead of $\nabla \ell\big(u_t,(x,y)\big)$ to simplify notation. Using the definition of the negative log-likelihood function $\ell$, we have that
$$\begin{aligned}
\mathbb{E}_{x,y}\big[\nabla \ell_u(x,y)\big]
&= \sum_{x,y} \mathbb{P}[x]\,\mathbb{P}[y \mid x]\,\nabla \ell_u(x,y) \\
&= \sum_{x} \mathbb{P}[x]\big(\mathbb{P}[y=1 \mid x]\,\nabla \ell_u(x,1) + \mathbb{P}[y=0 \mid x]\,\nabla \ell_u(x,0)\big) \\
&= \sum_{x} x \cdot \mathbb{P}[x] \cdot \big(\sigma(u_t^\top x) - \pi(u_t, x)\big) \\
&= -\sum_{x} x \cdot \mathbb{P}[x] \cdot K(u_t^\top x)
\end{aligned}$$
where the last line follows by the definition of a $\gamma$-sensitive user. Based on the definition of the function $K$ we can expand the fixed-point equation as follows:
$$\sum_{x} x \cdot \mathbb{P}[x] \cdot \gamma \cdot \alpha_{u_t,x} \cdot (u_t^\top x) \cdot e^{-(u_t^\top x)^2} = 0.$$
Letting $\pi_{t,x} = \gamma \cdot \mathbb{P}[x] \cdot \alpha_{u_t,x} \cdot e^{-(u_t^\top x)^2}$ we have:
$$\Big(\sum_{x} \pi_{t,x} \cdot x x^\top\Big) u_t = 0.$$
The above system has the unique solution $u_t = 0$ iff $\sum_x \pi_{t,x} \cdot x x^\top$ has full rank. Consider the $n \times n$ diagonal matrix with the values $\pi_{t,x}$ on the diagonal: $C_t = \mathrm{diag}([\pi_{t,x_i}]_{i=1}^n)$. We can rewrite $\sum_x \pi_{t,x} \cdot x x^\top = V C_t V^\top$. Also, notice that $\pi_{t,x} > 0$ for all $t, x$, hence $C_t$ is positive definite.
Now if $\mathrm{rank}(V V^\top) = d$ we have that for any $v \in \mathbb{R}^d$ s.t. $v \neq 0$:
$$v^\top V C_t V^\top v = (V^\top v)^\top C_t (V^\top v) > 0,$$
since $V^\top v = 0$ has the unique solution $v = 0$ because $\mathrm{rank}(V^\top) = d$ as well. This implies that $V C_t V^\top$ is positive definite and hence also full rank: $\mathrm{rank}(V C_t V^\top) = d$, completing the proof of the lemma. □
Proof of Lemma 3. Let $s^t = (s^t_x)_x$ be the vector of the scores of all the items at time $t$. Using the notation and the derivations of the proof of Lemma 1 we have:
$$\begin{aligned}
s^{t+1} &= V^\top u_{t+1} \\
&= V^\top \Big(I_d + \sum_x \pi_{t,x}\, x x^\top\Big) u_t \\
&= V^\top \big(I_d + V C_t V^\top\big) u_t \\
&= V^\top V \big(I_n + C_t\big) V^\top u_t = \big(I_n + C_t\big) s^t
\end{aligned}$$
where we used the fact that $V V^\top = I_d$. Now, since $C_t$ is diagonal with positive diagonal elements, each score gets amplified, meaning that it keeps the same sign and its absolute value increases, and the lemma follows. □
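The amplification step of Lemma 3 can be checked numerically: with $C_t$ diagonal and positive, the update $(I + C_t)s$ preserves the sign of every score and strictly increases its magnitude. A small sketch (a handful of scores stand in for the full item set):

```python
import numpy as np

rng = np.random.default_rng(3)
d = 5
s = rng.standard_normal(d)               # current score vector s^t
C = np.diag(rng.uniform(0.1, 1.0, d))    # C_t: diagonal with positive entries

s_next = (np.eye(d) + C) @ s             # the update s^{t+1} = (I + C_t) s^t

# Every score keeps its sign and strictly grows in magnitude.
print(np.all(np.sign(s_next) == np.sign(s)),
      np.all(np.abs(s_next) > np.abs(s)))
# True True
```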
B ADDITIONAL PLOTS
B.1 Model Size d
Figure 11: Effect of different model sizes d for the uniform (first row) and mixture distributions (second row), the MovieLens dataset (third row), and the Yahoo dataset (fourth row).
We experiment with different model sizes d ∈ {5, 15, 30, 60} for fixed β = 1 and γ = 0.5. We repeat the simulation for each model size 10 times and report the mean and standard deviations of our metrics for all four datasets in Figure 11. It is evident from the plots that the effect of model size heavily depends on the distribution of the items in the system. If the distribution is not heavily skewed towards some particular direction, then higher model sizes lead to faster divergence of the user preferences, but the effect is not particularly strong. However, in the case of the mixture distribution, where by construction we initialize u_0 > 0, the initial scores of the Type-2 items are very high. As a result, all the mass is already concentrated there, leading to essentially no movement in u_t.
B.2 Stochastic Gradient Descent
By analyzing the gradient descent update in (2) we can study the expected trajectory of the vector u_t and the user-system interaction. However, this update is difficult to implement, and in practice the common approach is to substitute it with SGD updates. In that case,
Figure 13: Effect of γ for the mixture distribution (first row), the MovieLens dataset (second row), and the Yahoo dataset (third row). With the γ parameter controlling the amount of preference reinforcement that a user exhibits, it is expected that in all cases higher values of γ lead to faster divergence of the user preferences, as well as a higher separation between likable and non-likable items.
Figure 14: Evolution of the recommender system p_t for a user interested in low-prevalence content. Consider the two-dimensional item distribution where the first feature is uniform in [−1, 1], while the second feature is uniform in [−1, 0.8] with probability 90% and uniform in (0.8, 1] with the remaining 10%. Hence, all items for which x_2 > 0.8 are significantly more rare in the system. We consider a user with u_0 whose second coordinate is positive. The two pictures on the left show the distribution p_t in the beginning and after 100 steps of interaction with the user for β = 0.5. Brighter colors indicate higher probability mass. The two pictures on the right correspond to β = 2. In the case where β = 0.5 the initial surface is virtually identical to the actual distribution of items in the system, while in the case where β = 2 there is a noticeable difference. At t = 100, in both cases the probability mass concentrates on the items for which x_2 is large, and the effect is much more pronounced for higher β. In particular, for β = 2, already after 100 iterations the mass starts concentrating on the low-prevalence items, while for β = 0.5 the mass concentrates on the barrier between the high-prevalence and low-prevalence items, without yet being able to efficiently surface the low-prevalence items.
the vector $u_t$ evolves as:
$$u_{t+1} = u_t - \eta_t \cdot \frac{1}{k}\sum_{i=1}^{k} \nabla \ell\big(u_t, (x_i^t, y_i^t)\big)$$
where $k$ can be thought of as the number of items that are presented to the user at each iteration. The user responds with feedback $y_i^t \in \{0, 1\}$ to each of them. We set $k = 30$ for the experiments. For
the synthetic distributions we run the experiments with 20 different initializations, and for each initialization we run it 15 times. For the real datasets we just run SGD updates for the 50 selected users as before. We report the mean and standard deviations of the runs for different values of the parameter β in Figure 12. As before, we run the simulation for 200 iterations for the synthetic datasets and for 750 for the real ones, and we use η = 0.1.
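A single SGD update of this form can be sketched as follows. The logistic gradient is an assumption consistent with the negative log-likelihood loss in Appendix A, and all numerical values here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
d, k = 15, 30   # model size and number of items shown per iteration (k = 30)
eta = 0.1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nll(u, X, y):
    """Average negative log-likelihood of the observed feedback."""
    p = sigmoid(X @ u)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p)).mean()

def sgd_step(u, X, y, eta):
    """One SGD update averaged over the k presented items, using the
    logistic-loss gradient (sigmoid(u^T x) - y) * x."""
    grad = ((sigmoid(X @ u) - y)[:, None] * X).mean(axis=0)
    return u - eta * grad

u0 = rng.uniform(0.0, 0.1, size=d)
X = rng.uniform(-1, 1, size=(k, d))                   # presented items x_i^t
y = (rng.random(k) < sigmoid(X @ u0)).astype(float)   # feedback y_i^t in {0,1}
u1 = sgd_step(u0, X, y, eta)
print(nll(u1, X, y) < nll(u0, X, y))  # the step decreases the batch loss
```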
As in the case of the expected loss (Figures 3-5), the norm of u_t diverges in all cases and the user's preferences get polarized, with the noise of SGD having limited effect. On the other hand, the filter bubble phenomenon, i.e., how fast the recommender system converges to showing the user only items that they like, is less evident here, and the respective standard deviations are very large, indicating that the trajectory of p_t is very sensitive to the initial conditions u_0.
Figure 12: SGD with different β parameters for the uniform distribution (first row), the mixture distribution (second row), MovieLens (third row), and Yahoo (fourth row).