
TEL-AVIV UNIVERSITYRAYMOND AND BEVERLY SACKLER

FACULTY OF EXACT SCIENCESBLAVATNIK SCHOOL OF COMPUTER SCIENCE

Boolean functions whose Fouriertransform is concentrated on pair-wise

disjoint subsets of the inputs

Thesis submitted in partial fulfillment of the requirements for the M.Sc.degree in the School of Computer Science, Tel-Aviv University

by

Aviad Rubinstein

The research for this thesis has been carried out at Tel-Aviv Universityunder the supervision of Prof. Muli Safra

October 2012


Acknowledgements

I would like to thank:

Muli for teaching me so much about math and life and for giving more than I could haveasked for;

Jarett Schwartz for comments on an earlier version of this thesis;

Shai Vardi for help with editing;

and my family and friends for everything else.


Abstract

We consider Boolean functions f : {±1}^m → {±1} that are close to a sum of independent functions f_j on mutually exclusive subsets I_j ⊆ [m] of the variables. We prove that any such function is close to just a single function f_k on a single subset I_k.

We also consider Boolean functions f : R^n → {±1} that are close, with respect to any product distribution over R^n, to a sum of their variables. We prove that any such function is close to one of the variables.

Both our results are independent of the number of variables, but depend on the variance of f. I.e., if f is (ε · Var f)-close to a sum of independent functions or random variables, then it is O(ε)-close to one of the independent functions or random variables, respectively. We prove that this dependence on Var f is tight.

Our results are a generalization of [16], which proved a similar statement for functions f : {±1}^n → {±1} that are close to a linear combination of uniformly distributed Boolean variables.


Contents

1 Introduction
  1.1 The long code and related works
  1.2 Our results
  1.3 Organization

2 Preliminaries
  2.1 Fourier transform over {±1}^n
  2.2 L2-squared semi-metric
  2.3 Variance

3 Related Works
  3.1 Related Works on Low Influence Functions
  3.2 Related Works on Almost Linear Functions

4 Our Main Results
  4.1 From our results to the FKN Theorem

5 High-Level Outline of the Proof

6 Proofs
  6.1 From variance of absolute value to variance of absolute value of sum: proof of lemma 13
  6.2 From variance of absolute value of sum to variance: proof of lemma 12
  6.3 Proof of the main theorem
  6.4 Proof of the extension to FKN Theorem

7 Proofs of technical claims
  7.1 Convexity of the variance function: proof of claim 16
  7.2 Expected squared distance: proof of claim 17
  7.3 Constant absolute value: proof of claim 18

8 Tightness of results
  8.1 Tightness of the main result
  8.2 Tightness of lemma 13

9 Discussion
  9.1 Hypercontractivity
  9.2 Open problems
    9.2.1 Interesting Conjectures


Chapter 1

Introduction

We consider n-variate Boolean functions f : {±1}^n → {±1}. The study of Boolean functions has prospered in recent decades thanks to its importance in many areas of mathematics and theoretical computer science.

For example, Boolean functions can be used to model voting mechanisms. Consider a population of n voters, where each can vote in favor (+1) or against (−1) some action, party, candidate, etc. A voting mechanism f takes these voters' preferences (a binary string of length n) and aggregates them into a single communal decision. Examples of interesting voting mechanisms include:

• dictatorship - the outcome is determined by one voter's vote (corresponding to f(X) = x_i);

• majority - the outcome is decided in accordance with the majority of the population (corresponding to f(X) = sign(∑ x_i));

• and tribes [5] - the population is divided into smaller "tribes"; the decision is accepted if at least one tribe is unanimously in favor of accepting.

How do we compare mechanisms? Which of these mechanisms is the best? Can we come up with a better mechanism? In attempts to improve our understanding of these questions, several properties of voting mechanisms have been studied, such as:

• balance - we say that a mechanism is balanced if, given that the votes are chosen uniformly at random, each outcome (+1 or −1) is equally likely;

• stability - we say that a mechanism is stable if independent random changes in voters' votes are unlikely to change the outcome of the vote;

• the influence of the ith voter is the likelihood that a change in her vote will change the final outcome of the mechanism;

• and junta - we say that a mechanism is a junta if its outcome depends on only a small number of votes.

Using deep insights from Fourier analysis, interesting conclusions can be drawn about these voting mechanisms. For example, it can be mathematically proven that dictatorship is the balanced voting mechanism that maximizes stability. On the other hand, tribes is the balanced function that minimizes the maximal individual influence [22]. Finally, majority achieves both properties, as it maximizes stability amongst all balanced functions with low maximal individual influence [32].

Results on Boolean functions also find many applications in learning theory, where one often wishes to learn the behavior of an unknown Boolean function. A practical example may be training a Boolean classifier that decides whether a given n-bit string (e-mail) should be classified as positive (an important email) or negative (spam).

Boolean functions come up again in random graph theory: the adjacency matrix of the graph can be viewed as a |V|²-bit string, and one can ask a variety of Boolean-valued questions about the graph's properties, such as: is the graph connected? is it k-colorable? is it λ-expanding? etc.

1.1 The long code and related works

One of the most important historical driving forces in the study of Boolean functions has been their applications to the testing of error correcting codes [7]. In particular, the long code [4] can be viewed as evaluations of dictatorship functions. Each codeword in the long code corresponds to the evaluation of a dictatorship f(X) = x_i on all the points of the n-dimensional Boolean hypercube. Indeed, the long code is highly inefficient - since there are only n possible dictatorships, the long code encodes log n bits of information in a 2^n-bit codeword. Despite its low rate, the long code is an important tool in many results on hardness of approximation and probabilistically checkable proofs (such as [4, 19, 18, 27, 11, 26, 9, 8, 25, 2]).

The great virtue of the long code is that it is a locally testable code: it is possible to distinguish, with high probability, between a legal codeword and a string that is far from any legal codeword, by querying just a few random bits of the string. Naturally, this is a highly desirable property when constructing probabilistically checkable proofs, which are proofs that must be verified by reading a few random bits of the proof.

To see that the long code is indeed locally testable, recall that every word is encoded by the evaluations of a dictatorship. A naive approach to testing dictatorship would first find the dictator variable and then go through all the other variables to make sure that the function does not depend on them. Each query to a Boolean function can give at most one bit of information about the function. Since there are n candidate variables, finding the dictator variable requires at least log n queries. Clearly, this naive strategy could not be implemented using just a few queries.

A function is called anti-symmetric (sometimes also "odd" or "folded") if it satisfies f(X) = −f(−X) for every input X. For example, the dictatorship f(X) = x_i is anti-symmetric. Observe that any anti-symmetric function is in particular balanced: for uniformly distributed Boolean random inputs, X and −X are equally likely, so the probability that f(X) = 1 is the same as the probability that f(−X) = 1, which by anti-symmetry is equal to the probability that f(X) = −1.

Instead of using naive testing for dictatorship, recall that dictatorship is the stablest balanced Boolean function; thus in particular it is the stablest anti-symmetric Boolean function. Stability and anti-symmetry are local properties that one can test by querying the function at only two points. A simple test for anti-symmetry and stability takes a random point X and a close point Y, and verifies that f(X) = −f(−Y). If f is stable, then f(X) = f(Y) with high probability; if f is anti-symmetric then f(Y) = −f(−Y) for all Y. Thus a stable anti-symmetric function would pass the test with high probability, whereas any function that is far from stable and anti-symmetric would not.

Since our test only probabilistically approximates the stability, we cannot guarantee that the function tested is an exact dictatorship. However, a theorem of Bourgain says that it is sufficient to know that an anti-symmetric function has relatively high stability to conclude that it depends almost completely on a small number of variables.

Theorem. (Bourgain's Junta Theorem, Informal [7]) Every balanced Boolean function with high stability is close to a junta.

In many applications of the long code, knowing that the codeword is close to a junta suffices, because it allows us to list-decode the codeword. In general, list-decoding means computing a short list of candidate codewords which are close to a given string; in the context of the long code, list-decoding means producing a short list of candidate variables with high influence that may be close to dictators.

Another property, closely related to stability, is the total influence of a function, i.e. the sum of the influences of its variables. Intuitively, the total influence answers the question "on average, how many different bits / voters can I change, such that each change (separately) would cause a change in the output of the function / mechanism?" Hence the total influence is sometimes also called average sensitivity.

Unlike stability, the total influence is not a local property of a function in the sense of property testing. In fact, even for monotone functions, approximating the total influence requires more than a few queries [34]. Rather, the total influence is local in the sense that it can be forced to be low using local constraints (e.g. see the construction in [11]).

Among balanced functions, dictatorships minimize the total influence. For some applications of the long code (e.g. [11, 8, 25, 2]) the total influence, rather than the stability, is bounded in order to show that a function is close to a dictatorship. The following theorem by Friedgut can be used:

Theorem. (Friedgut's Lemma, Informal [13]) Every balanced Boolean function with low total influence is close to a junta.

Every function over the n-dimensional Boolean hypercube can also be represented as a polynomial in the variables x_1, x_2, . . . , x_n. The dictator function f(X) = x_i is the only balanced Boolean function that is linear¹. Using local queries, it is also possible to estimate whether a Boolean function is close to a linear one. These properties can also be used by long-code testers [9] together with the following theorem by Friedgut, Kalai, and Naor:

Theorem. (FKN Theorem, Informal [16]) Every balanced Boolean function that is almost linear is almost a dictatorship.

Intuitively, one may expect such results to be true, because a linear combination that is "well-spread" among many independent variables (i.e. far from a dictatorship of one variable) should be distributed similarly to a "bell-curved" Gaussian; in particular, it should be far from the ±1 distribution of a Boolean function, which is bimodal, i.e. has two distinct modes or "peaks" at −1 and +1.

¹Throughout the paper we allow "linear" functions to have a nonzero constant term. In other contexts these functions are called "affine" to distinguish them from functions that do not have a constant term.

Often, it is easy to show that a function behaves "nicely" in a local sense. The three theorems mentioned above, as well as many others in the field, share a common intuitive theme: proving that if a Boolean function satisfies a local property (high stability, low total influence, closeness to linear, etc.), then it must be simple in a global sense. This ability to deduce global properties of a function from its local behaviour is useful in many applications in social choice theory, learning theory, graph theory, complexity theory, and more.

1.2 Our results

In this work we extend the intuition from the FKN Theorem, that a well-spread sum of independent variables must be far from Boolean. In particular we ask the following questions:

1. What happens when the variables are not uniformly distributed over {±1}? In particular, we consider variables which are not even Boolean or symmetric.

In a social choice setting, it may be intuitive to consider a mechanism that takes into account how strong each voter's preference is. For example, in some countries the elections are known to be highly influenced by the donations the candidates manage to collect ("argentocracy").

In the context of computational complexity, Boolean analysis theorems that consider non-uniform distributions have proven very useful. In particular, Dinur and Safra use the p-biased long code in their proof of NP-hardness of approximation of the Vertex Cover problem [11]. In the p-biased long code each codeword corresponds to a dictatorship, in a population where each voter independently chooses −1 with probability 0 < p < 1/2 and +1 with probability 1 − p. Friedgut's extension of his Lemma to such non-uniform product distributions was key to Dinur and Safra's proof.

In this work we prove that even when the variables are not uniformly distributed over {±1}, every Boolean function that is close to their sum must be close to one of them:

Theorem. (Theorem 9 for balanced functions, Informal) Every balanced Boolean function that is almost a linear combination of independent variables (not necessarily Boolean or symmetric) is almost a dictatorship.

2. What happens when, rather than a sum of variables, we have a sum of functions over independent subsets of Boolean variables?

In a social choice setting, it may be intuitive to consider a situation where the population is divided into tribes; each tribe has an arbitrarily complex internal mechanism, but the outcomes of all the tribes must be aggregated into one communal decision by a simple (i.e. almost linear) mechanism.

Our main hope is that this setting will also find interesting applications in computational theory settings where such special structures arise.

Observe that this question is tightly related to the previous question because each arbitrary function over a subset of Boolean variables can be viewed as an arbitrarily-distributed random variable.


In this work we prove that any balanced function that is close to a sum of functions over independent subsets of its variables is almost completely determined by a function on a single subset:

Theorem. (Corollary 10 for balanced functions, Informal) Every balanced Boolean function that is close to a sum of functions on mutually exclusive subsets of the population is close to a dictatorship by one subset of the population.

As we will see later, the precise statement of the FKN Theorem does not require the Boolean function to be balanced. If we do not require the function to be linear, there is an obvious exception to the theorem - the constant functions f(X) = 1 and f(X) = −1 are not dictatorships but are considered linear. The general statement of the FKN Theorem says that a Boolean function that is almost linear is either almost a dictatorship or almost a constant function. More precisely, it says that the distance of any Boolean function from the nearest linear (not necessarily Boolean) function is smaller by at most a constant multiplicative factor than the distance from either a dictatorship or a constant function.

One may hope to extend this relaxation to non-Boolean random variables or subsets of Boolean random variables. E.g. we would like to claim that the distance of any Boolean function from a sum of functions on mutually exclusive subsets of the variables is at most the distance from a function on a single subset or a constant function. However, it turns out that this is not the case - in Lemma 11 we show that this naive extension of the FKN Theorem is false!

The variance of a Boolean function measures how far it is from a constant (either −1 or 1). For example, the variance of any balanced Boolean function is 1, whereas any constant function has a variance of 0. In order to extend our results to non-balanced Boolean functions, we have to correct for the low variance. In Theorem 9 and Corollary 10 we prove that the above two theorems extend to non-balanced functions relative to the variance:

Theorem. (Theorem 9, Informal) Every Boolean function that is (ε × variance)-close to a sum of independent variables (not necessarily Boolean or symmetric) is ε-close to a dictatorship.

Theorem. (Corollary 10, Informal) Every Boolean function that is (ε × variance)-close to a sum of functions on mutually exclusive subsets of the population is ε-close to a dictatorship by one subset of the population.

Intuitively, these amendments to our main theorems mean that in order to prove that a Boolean function is close to a dictatorship, we must show that it is very close to linear.

Finally, in Lemma 11 we show that this dependence on the variance is necessary and tight.

1.3 Organization

We begin with some preliminaries in section 2. In section 3 we give a brief survey of related works. In section 4 we formally state our results. In section 5 we give an intuitive sketch of the proof strategy. The interesting ingredients of the proof appear in section 6, whereas some of the more tedious case analyses are postponed to section 7. Tightness for some of the results is shown in section 8. Finally, in section 9 we make some concluding comments and discuss possible extensions.

²For ease of introduction, we use the word "distance" in an intuitive manner throughout this section. However, formally we will use the squared-L2 semidistance. See Section 2.2 for more details.


Chapter 2

Preliminaries

2.1 Fourier transform over {±1}^n

Every function over the n-dimensional Boolean hypercube can also be represented as a polynomial in the n-tuple of variables X = (x_1, x_2, . . . , x_n). Since each variable can take only two possible values, it suffices to consider only the zeroth and first powers of each variable. In particular, notice that for x_i ∈ {±1} we have x_i^{2k} = 1^k = 1 for any even power, while x_i^{2k+1} = x_i for any odd power. Thus, we can represent any function f : {±1}^n → R as a multilinear polynomial in x_1, x_2, . . . , x_n, i.e. a linear combination of monomials, called characters, of the form χ_S(X) = ∏_{i∈S} x_i for every subset S of [n] = {1, 2, . . . , n}. We denote the coefficient of χ_S by f̂(S); this is called the Fourier coefficient of the subset S. Thus we have the Fourier representation of f:

f(X) = ∑_{S⊆[n]} f̂(S) χ_S(X) = ∑_{S⊆[n]} f̂(S) ∏_{i∈S} x_i

The main tool in the analysis of Boolean functions is the study of their Fourier coefficients. All the terms mentioned in the introduction with respect to voting mechanisms can be redefined in terms of the Fourier coefficients; for example, a function is balanced iff its empty character has a zero coefficient (f̂(∅) = 0).

The Fourier representation is particularly useful because the Fourier characters form an orthonormal basis for the real- (or complex-) valued functions over {±1}^n:

〈χ_S, χ_T〉 = E_X[χ_S(X) χ̄_T(X)] = E_X[χ_{S△T}(X)] = { 1 if S = T;  0 if S ≠ T }

The Fourier coefficients are given by the projection of the function onto this basis:

f̂(S) = E_{X∈{±1}^n}[f(X) · χ_S(X)]

Note that by Parseval's Identity, for every Boolean function, the squares of the Fourier coefficients sum to one.

Fact 1. For f : {±1}^n → {±1},

∑_{S⊆[n]} f̂(S)² = 1


This fact is particularly useful because it means that the Fourier weights {f̂(S)²}_{S⊆[n]} define a probability distribution over P([n]).
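These definitions are easy to verify by brute force on a small hypercube. The following sketch is illustrative and not from the thesis (the choice of 3-bit majority is an assumption); it computes every Fourier coefficient by direct averaging and checks Fact 1:

```python
import math
from itertools import product

n = 3
majority = lambda x: 1 if sum(x) > 0 else -1  # a balanced Boolean function
points = list(product([-1, 1], repeat=n))     # the hypercube {-1, +1}^n

def fourier_coefficient(f, S):
    """hat f(S) = E_X[f(X) * chi_S(X)], where chi_S(X) = prod_{i in S} x_i."""
    return sum(f(x) * math.prod(x[i] for i in S) for x in points) / len(points)

subsets = [tuple(i for i in range(n) if m >> i & 1) for m in range(2 ** n)]
coeffs = {S: fourier_coefficient(majority, S) for S in subsets}

# Parseval (Fact 1): for a Boolean function, the squared coefficients sum to 1.
parseval = sum(c ** 2 for c in coeffs.values())
print(parseval)     # 1.0
# Majority is balanced, so the empty character has a zero coefficient:
print(coeffs[()])   # 0.0
```

For 3-bit majority the nonzero coefficients are the three singletons (each 1/2) and the full set {1,2,3} (−1/2), whose squares indeed sum to 1.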

2.2 L2-squared semi-metric

Throughout the paper, we define "closeness" of random variables using the squared L2-norm:

‖X − Y‖₂² = E[(X − Y)²]

It is important to note that this is a semi-metric, as it does not satisfy the triangle inequality. Instead, we will use the 2-relaxed triangle inequality:

Fact 2.

‖X − Y‖₂² + ‖Y − Z‖₂² ≥ (1/2) ‖X − Z‖₂²

Proof.

‖X − Y‖₂² + ‖Y − Z‖₂² ≥ (1/2) (‖X − Y‖₂ + ‖Y − Z‖₂)² ≥ (1/2) ‖X − Z‖₂²

(The first inequality is a² + b² ≥ (a + b)²/2; the second is the triangle inequality for the L2-norm.)

Although it is not a metric, the squared L2-norm has some advantages when analyzing Boolean functions. In particular, when comparing two Boolean functions, the squared L2-norm does satisfy the triangle inequality because it is simply four times the Hamming distance, and also twice the L1-norm ("Manhattan distance"): ‖f − g‖₂² = 4 · Pr[f ≠ g] = 2 ‖f − g‖₁. Additionally, the squared L2-norm behaves "nicely" with respect to the Fourier transform:

Fact 3.

‖f − g‖₂² = ∑_S (f̂(S) − ĝ(S))²

Proof. By Parseval applied to f − g, whose Fourier coefficient on S is f̂(S) − ĝ(S),

‖f − g‖₂² = ∑_S (f̂(S) − ĝ(S))²
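Both identities in this subsection can be checked numerically. The sketch below is illustrative (the comparison of a dictatorship with 3-bit majority is an assumption, not from the thesis); it verifies that ‖f − g‖₂² equals 4 · Pr[f ≠ g] and equals the coefficient-wise sum of Fact 3:

```python
import math
from itertools import product

n = 3
f = lambda x: x[0]                       # dictatorship of x_1
g = lambda x: 1 if sum(x) > 0 else -1    # 3-bit majority
points = list(product([-1, 1], repeat=n))

# Direct squared L2-distance and the Hamming form.
dist_sq = sum((f(x) - g(x)) ** 2 for x in points) / len(points)
hamming = 4 * sum(f(x) != g(x) for x in points) / len(points)

# Fact 3: the same quantity from the Fourier side.
def hat(h, S):
    return sum(h(x) * math.prod(x[i] for i in S) for x in points) / len(points)

subsets = [tuple(i for i in range(n) if m >> i & 1) for m in range(2 ** n)]
fact3 = sum((hat(f, S) - hat(g, S)) ** 2 for S in subsets)

print(dist_sq, hamming, fact3)  # 1.0 1.0 1.0
```

Here f and g disagree on exactly 2 of the 8 inputs, so Pr[f ≠ g] = 1/4 and all three quantities equal 1.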

2.3 Variance

The variance of a random variable X is defined as

Var X = E[X²] − (EX)²

Observe that for a function f the variance can also be defined in terms of its Fourier coefficients:

Fact 4.

Var f = ∑_{S≠∅} f̂(S)²


Proof.

Var f = E[f²] − (Ef)²
      = E[(∑_S f̂(S) χ_S)²] − (E ∑_S f̂(S) χ_S)²
      = ∑_{S,T} f̂(S) f̂(T) E[χ_S χ_T] − (∑_S f̂(S) E[χ_S])²
      = ∑_S f̂(S)² − f̂(∅)²

Another useful way to define the variance is as the expected squared distance between two random evaluations:

Fact 5. For any random variable X,

Var X = (1/2) · E_{x₁,x₂ ∼ X×X} (x₁ − x₂)²

Proof.

E_{x₁,x₂ ∼ X×X} (x₁ − x₂)² = E x₁² + E x₂² − 2 E x₁ x₂ = 2 (E X² − (EX)²) = 2 Var X

We can also view the variance as the L2-squared semidistance from the expectation:

Fact 6.

Var X = ‖X − EX‖₂²

Proof.

E[X²] − (EX)² = E[X²] − 2 E[X · EX] + E[(EX)²] = E[(X − EX)²]

Recall also that the expectation EX minimizes the semi-distance ‖X − E‖₂² over constants E:

Fact 7.

Var X = min_{E∈R} ‖X − E‖₂²

Proof. Differentiate twice with respect to E:

d/dE ‖X − E‖₂² = −2 (EX − E)

d²/dE² ‖X − E‖₂² = 2

so ‖X − E‖₂² is convex in E and minimized at E = EX, where by Fact 6 it equals Var X.


Finally, using the 2-relaxed triangle inequality (Fact 2), we can bound the difference in variance in terms of the L2-squared semimetric:

Fact 8.

Var f ≥ (1/2) Var g − ‖f − g‖₂²

Proof.

Var f + ‖f − g‖₂² = ‖f − Ef‖₂² + ‖f − g‖₂² ≥ (1/2) ‖g − Ef‖₂² ≥ (1/2) ‖g − Eg‖₂²

where the last inequality uses Fact 7.


Chapter 3

Related Works

3.1 Related Works on Low Influence Functions

The influence of a variable x_i on a function is defined as the expected variance of f restricted to a fixed assignment on all other variables:

Inf_i[f] = E_{x_{[n]∖{i}}} [ Var_{x_i} [ f(X) | x_{[n]∖{i}} ] ]

In other words, one may think of the influence of the ith coordinate on f as the likelihood that a flip in the assignment to x_i causes a flip in the value of f.

The total influence of the function f (sometimes also called average sensitivity) is the sum of the individual influences of all its variables:

Inf[f] = ∑_{i=1}^n Inf_i[f]

When f is a Boolean function, the squared coefficients of its Fourier transform sum to 1, and thus define a distribution. An alternative and equivalent definition of the total influence of a Boolean f is the expected degree of a character of f, with respect to the distribution defined by the squared coefficients:

Inf[f] = E_{S ∼ f̂(S)²} |S|

Intuitively, a Boolean function with low total influence has a low degree "on average". A sequence of works shows the intuitive statement that functions with low total influence are (almost) only dependent on a small number of coordinates. A function that depends on only k variables is called a k-junta.
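The two definitions of total influence agree, which is easy to confirm by enumeration. The sketch below is illustrative (3-bit majority, whose total influence is 3/2, is an assumed example, not from the thesis); it computes both forms:

```python
import math
from itertools import product

n = 3
f = lambda x: 1 if sum(x) > 0 else -1  # 3-bit majority
points = list(product([-1, 1], repeat=n))

def influence(i):
    """For Boolean f, Inf_i[f] = Pr over X that flipping bit i flips f(X)."""
    flips = 0
    for x in points:
        y = list(x)
        y[i] = -y[i]
        flips += f(x) != f(tuple(y))
    return flips / len(points)

total_inf = sum(influence(i) for i in range(n))

# Fourier form: Inf[f] = E_{S ~ hat f(S)^2} |S|.
def hat(S):
    return sum(f(x) * math.prod(x[i] for i in S) for x in points) / len(points)

subsets = [tuple(i for i in range(n) if m >> i & 1) for m in range(2 ** n)]
fourier_inf = sum(hat(S) ** 2 * len(S) for S in subsets)

print(total_inf, fourier_inf)  # 1.5 1.5
```

Each of the three variables is pivotal exactly when the other two disagree (probability 1/2), so each individual influence is 1/2 and the total is 3/2, matching the Fourier computation.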

The seminal theorem of Friedgut [13] states that a Boolean function with total influence k must be ε-close to an e^{O(k/ε)}-junta, i.e. a function that depends on e^{O(k/ε)} variables:

Theorem. (Friedgut's Lemma [13]) For any Boolean function f : {±1}^n → {±1}, any integer k > 0, and real number ε > 0, if Inf[f] ≤ k, then f is ε-close to an e^{O(k/ε)}-junta.

Bourgain [7] has a strong variant of this theorem, where the premise assumes that the function is O(k^{−1/2−γ})-close to a degree-k function:

Theorem. (Bourgain's Junta Theorem [7]) For any Boolean function f : {±1}^n → {±1}, any integer k > 0, and real numbers γ, ε > 0, there exists a constant c_{γ,ε} s.t. if

∑_{|S|>k} f̂(S)² < c_{γ,ε} · k^{−1/2−γ}

then f is ε-close to an e^{O(k²)}/ε-junta.

(The exact parameters of Bourgain's Junta Theorem have been slightly improved in the works of Khot and Naor, Hatami, and Kindler and O'Donnell [24, 20, 28].)

The theorems of Friedgut and Bourgain have had many applications in the last decade, including NP-hardness [11] and UG-hardness [26, 8, 25, 2] of approximation, embeddability theory [26, 30], and learning theory [33].

There are also several results proving different variants of these statements. Kindler and Safra, and later also Hatami [29, 27, 20], generalize Bourgain's theorem to general p-biased product distributions, with somewhat weaker parameters. Dinur, Kindler, Friedgut, and O'Donnell [10] prove a statement similar to Bourgain's theorem for functions with a continuous bounded range [−1, 1]. Sachdeva and Tulsiani [35] prove a generalization of Friedgut's theorem, where the domain is the product of an arbitrary graph G^n, rather than the Boolean hypercube {±1}^n. In a recent result, Kindler and O'Donnell [28] prove a variant of Bourgain's theorem for functions of variables sampled from a Gaussian distribution; they also provide a new proof of Bourgain's theorem using "elementary" methods.

3.2 Related Works on Almost Linear Functions

A particularly interesting case of low degree functions are those that are close to linear. In their seminal paper [16], Friedgut, Kalai, and Naor prove that if a Boolean function is ε-close to linear, then it must be (K · ε)-close to a dictatorship or a constant function.

Theorem. (FKN Theorem [16]) Let f : {±1}^n → {±1} be a Boolean function, and suppose that f's Fourier transform is concentrated on the first two levels:

∑_{|S|≤1} f̂²(S) ≥ 1 − ε

Then for some universal constant K:

1. either f is (K · ε)-close to a constant function; i.e. for some σ ∈ {±1}:

‖f − σ‖₂² ≤ K · ε

2. or f is (K · ε)-close to a dictatorship; i.e. there exists k ∈ [n] and σ ∈ {±1} s.t.:

‖f − σ · x_k‖₂² ≤ K · ε


While the original motivation for proving the FKN Theorem was applications in social choice theory [23], it has since found important applications in other fields, such as in Dinur's combinatorial proof of the PCP theorem [9].

There are also many works on generalizations of the FKN Theorem. Alon et al. [1] and Ghandehari and Hatami [17] prove generalizations for functions with domain Z_r^n for r ≥ 2. Friedgut [14] proves a similar theorem that also holds for Boolean functions of higher degrees and over non-uniform distributions; however, this theorem requires bounds on the expectation of the Boolean function. In [31], Montanaro and Osborne prove quantum variants of the FKN Theorem for "quantum Boolean functions", i.e. any unitary operator f s.t. f² is the identity operator. Falik and Friedgut [12] prove a representation-theory version of the FKN Theorem, for functions which are close to a linear combination of an irreducible representation of elements of the symmetric group.

The FKN Theorem is an easy corollary once the following proposition is proven: if the absolute value of a linear combination of Boolean variables has small variance, then it must be concentrated on a single variable. Formally,

Proposition. (FKN Proposition [16]) Let (X_i)_{i=1}^n be a sequence of independent variables with supports {±a_i} s.t. ∑ a_i² = 1. For some universal constant K, if

    Var|∑_i X_i| ≤ ε

then for some k ∈ [n]:

    a_k > 1 − K · ε

Intuitively, this proposition says that if the variance were spread among many of the variables, i.e. if the "weights" (a_i) were somewhat evenly distributed, then one would expect the sum of such independent variables to be closer to a Gaussian than to a bimodal distribution around ±1.
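This intuition can be checked numerically. The following sketch (illustrative code, not part of the thesis) computes Var|∑ a_i·ε_i| exactly over uniform signs ε ∈ {−1, 1}^n: evenly spread weights leave the absolute value of the sum with substantial variance, while a single dominant weight makes it nearly constant, as the FKN Proposition predicts.

```python
import itertools
import math

def var_abs(a):
    """Exact Var|sum_i a_i * eps_i| over uniform signs eps in {-1, 1}^n."""
    vals = [abs(sum(e * ai for e, ai in zip(eps, a)))
            for eps in itertools.product((-1, 1), repeat=len(a))]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)

spread = [1 / math.sqrt(10)] * 10              # weight spread over 10 variables
dominant = [math.sqrt(0.97), 0.1, 0.1, 0.1]    # one dominant weight, sum a_i^2 = 1

print(var_abs(spread))    # substantial: |sum| behaves like the abs of a Gaussian
print(var_abs(dominant))  # tiny: |sum| is nearly constant
```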

This proposition has been generalized in several ways in a sequence of fairly recent works by Wojtaszczyk and by Jendrej et al. ([36], [21]), which are of particular interest to us. Jendrej et al. prove extensions of the FKN Proposition to the following cases:

1. The case where the X_i's are independent and symmetric:

Theorem. ([21]) Let (X_i)_{i=1}^n be a sequence of independent symmetric variables. Then there exists a universal constant K, s.t. for some k ∈ [n]

    inf_{E∈ℝ} Var|∑_i X_i + E| ≥ (Var ∑_{i≠k} X_i) / K

2. The case where all the X_i's are identically distributed:

Theorem. ([21]) Let (X_i)_{i=1}^n be a sequence of i.i.d. variables which are not a.s. constant. Then there exists a constant K_X, which depends only on the distribution from which the X_i's are drawn, s.t. for any sequence of real numbers (a_i)_{i=1}^n, for some k ∈ [n]

    inf_{E∈ℝ} Var|E + ∑_i a_i X_i| ≥ (∑_{i≠k} a_i²) / K_X

Remark. Although it may be intuitive to think of "almost linear" as a stricter condition than "low total influence", observe that these conditions are incomparable. In fact, there are functions which are ε-close to linear, yet have total influence of Ω(εn). For example, consider a function f that is determined by a xor of Ω(n) variables with probability ε, and by a single variable with probability 1 − ε:

    f = ( ∧_{i=1}^{log(1/ε)} x_i ∧ ⊕_{j=1}^{n/2} y_j ) ∨ ( ¬∧_{i=1}^{log(1/ε)} x_i ∧ z )

In the other direction, one can of course consider the xor of two variables, which has a total influence of 2, yet is Ω(1)-far from linear. Nonetheless, results such as Friedgut's Lemma [13] extend trivially to Boolean functions which are close to low-influence functions, because they only claim that the Boolean function is close to a junta.
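The incomparability is easy to verify by brute force for small functions, using the standard formulas I(f) = ∑_S |S|·f̂²(S) for total influence and ∑_{|S|≤1} f̂²(S) for the Fourier weight relevant to closeness to linear. The sketch below (illustrative code, not from the thesis) confirms that the xor of two variables has total influence 2 but no Fourier weight on the first two levels, while a dictator has influence 1 and full low-level weight.

```python
import itertools
from math import prod

def fourier(f, n):
    """All Fourier coefficients of f: {-1,1}^n -> {-1,1}, by direct summation."""
    cube = list(itertools.product((-1, 1), repeat=n))
    subsets = itertools.chain.from_iterable(
        itertools.combinations(range(n), k) for k in range(n + 1))
    return {S: sum(f(x) * prod(x[i] for i in S) for x in cube) / len(cube)
            for S in subsets}

def total_influence(coeffs):          # I(f) = sum_S |S| * fhat(S)^2
    return sum(len(S) * c * c for S, c in coeffs.items())

def low_level_weight(coeffs):         # sum over |S| <= 1 of fhat(S)^2
    return sum(c * c for S, c in coeffs.items() if len(S) <= 1)

xor2 = fourier(lambda x: x[0] * x[1], 2)       # far from linear, influence 2
dictator = fourier(lambda x: x[0], 2)          # exactly linear, influence 1
print(total_influence(xor2), low_level_weight(xor2))          # 2.0 0.0
print(total_influence(dictator), low_level_weight(dictator))  # 1.0 1.0
```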


Chapter 4

Our Main Results

In this work we consider the following relaxation of linearity in the premise of the FKN Theorem: given a partition {I_j} of the variables and a function f_j (not necessarily Boolean or symmetric) on the variables in each subset, we look at the premise that the Boolean function f is close to a linear combination of the f_j's. Our main result (Corollary 10) states, loosely, that any such f must be close to being completely dictated by its restriction f_k to the variables in a single subset I_k of the partition.

While making a natural generalization of the well-known FKN Theorem, our work also has a surprising side: in the FKN Theorem and similar results, if a function is ε-close to linear then it is (K · ε)-close to a dictatorship, for some constant K. We prove that while this is true in the partition case for balanced functions, it does not hold in general. In particular, we require f to be (ε · Varf)-close to linear in the f_j's in order to prove that it is only (K · ε)-close to being dictated by some f_k. We show (Lemma 11) that this dependence on Varf is tight.

Our first result is a somewhat technical theorem, generalizing the FKN Proposition. We consider the sum ∑_{i=1}^n X_i of a sequence of independent random variables. In particular, we do not assume that the variables are Boolean, symmetric, balanced, or identically distributed. Our main technical theorem, which generalizes the FKN Proposition, states that if this sum does not "behave like" any single variable X_k, then it is also far from Boolean. In other words, if a sum of independent random variables is close to a Boolean function, then most of its variance comes from only one variable.

We show that ∑X_i is far from Boolean by proving a lower bound on the variance of its absolute value, Var|∑X_i|. Note that for any Boolean function f, |f| = 1 everywhere, and thus Var|f| = 0. Therefore the lower bound on Var|∑X_i| is in fact also a lower bound on the (semi-)distance from the nearest Boolean function:

    Var|∑X_i| ≤ min_{f Boolean} ‖f − ∑X_i‖₂²
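For a finite-valued Z this inequality can be checked directly: for any ±1-valued f we have |f(z) − z| ≥ ||z| − 1|, and E(|Z| − 1)² ≥ Var|Z| since the mean minimizes the expected squared deviation. A minimal sketch, assuming a hypothetical Z uniform over a small list of values:

```python
vals = [0.9, -1.1, 0.2, -0.7, 1.4]   # hypothetical support of Z, uniform weights

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

var_abs = variance([abs(v) for v in vals])
# The pointwise-nearest Boolean value to z is sign(z), so this is the minimum
# of E(f - Z)^2 over Boolean functions f of the same inputs.
dist_to_boolean = sum((v - (1 if v >= 0 else -1)) ** 2 for v in vals) / len(vals)
assert var_abs <= dist_to_boolean
```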

By saying that the sum ∑_{i=1}^n X_i "behaves like" a single variable X_k, we mean that their difference is almost a constant function; i.e. that

    min_{c∈ℝ} ‖X_k − ∑_i X_i − c‖₂² = ‖∑_{i≠k} X_i − E∑_{i≠k} X_i‖₂² = Var ∑_{i≠k} X_i

is small.


Furthermore, the definition of "small" depends on the expectation and variance of the sum of the sequence, which we denote by E and V:

    E = E∑X_i = ∑EX_i
    V = Var∑X_i = ∑VarX_i

Formally, our main technical theorem states that

Theorem 9. Let (X_i)_{i=1}^n be a sequence of independent (not necessarily symmetric) random variables, and let E and V be the expectation and variance of their sum, respectively. Then for some universal constant K₂ ≤ 13104 and some k ∈ [n] we have

    Var|∑_i X_i| ≥ V · Var∑_{i≠k} X_i / (K₂(V + E²))

The main motivation for proving this theorem is that it implies the following generalization of the FKN Theorem.

Intuitively, while the FKN Theorem holds for Boolean functions that are almost linear in individual variables, we generalize to functions that are almost linear with respect to a partition of the variables.

Formally, let f : {±1}^m → {±1} and let I₁, ..., I_n be a partition of [m]; denote by f_j the restriction of f to each subset of variables:

    f_j = ∑_{φ≠S⊆I_j} f̂(S)·χ_S

Our main corollary states that if f behaves like the sum of the f_j's, then it behaves like some single f_k:

Corollary 10. Let f, the I_j's, and the f_j's be as defined above. Suppose that f is concentrated on coefficients that do not cross the partition, i.e.:

    ∑_{S : ∃j, S⊆I_j} f̂²(S) ≥ 1 − (ε · Varf)

Then for some k ∈ [n], f is close to f_k + f̂(φ):

    ‖f − f_k − f̂(φ)‖₂² ≤ K₂ · ε

In particular, notice that this implies that f is concentrated on the variables in a single subset I_k.

Unlike the FKN Theorem and many similar statements, it does not suffice to assume that f is ε-close to linear. Our main results require a dependence on the variance of f. We prove in Section 8.1 that this dependence is tight up to a constant factor, by constructing an example for which Varf = o(1) and f is (ε · Varf)-close to linear with respect to a partition, but f is still Ω(ε)-far from being dictated by any subset.


Lemma 11. Corollary 10 is tight up to a constant factor. In particular, the division by Varf is necessary.

More precisely, there exists a series of functions f^(m) : {±1}^{2m} → {±1} and a partition (I₁^(m), I₂^(m)) s.t. the restrictions (f₁^(m), f₂^(m)) of f^(m) to the variables in I_j^(m) satisfy

    ∑_{S : ∃j, S⊆I_j^(m)} f̂²(S) = 1 − O(2^{−m} · Varf)

but for every j ∈ {1, 2}

    ‖f^(m) − f_j^(m) − f̂_j^(m)(φ)‖₂² = Θ(2^{−m}) = ω(2^{−m} · Varf)

4.1 From our results to the FKN Theorem

We claim that our results generalize the FKN Theorem. For constant variance, the FKN Theorem indeed follows immediately from Corollary 10 (for some worse constant K_FKN ≥ K₂/Varf). However, because the premise of Corollary 10 depends on the variance, it may not be obvious how to obtain the FKN Theorem for the general case, where the variance may go to zero. Nonetheless, we note that thanks to an observation by Guy Kindler [15, 36], the FKN Theorem follows easily once the special case of balanced functions is proven:

Given a Boolean function f : {±1}^n → {±1}, we define a balanced Boolean function g : {±1}^{n+1} → {±1} that will be as close to linear as f:

    g(x₁, x₂, ..., x_n, x_{n+1}) = x_{n+1} · f(x_{n+1}·x₁, x_{n+1}·x₂, ..., x_{n+1}·x_n)

First, notice that g is indeed balanced, because:

    2·E[g(X; x_{n+1})] = E[f(X)] − E[f(−X)] = E[f(X)] − E[f(X)] = 0

where the second equality holds because under the uniform distribution, taking the expectation over X is the same as taking the expectation over −X.

Observe also that every monomial f̂(S)·χ_S(X) in the Fourier representation of f(X) is multiplied by x_{n+1}^{|S|+1} in the Fourier transform of g(X; x_{n+1}). (The |S|+1 in the exponent comes from |S| for all the variables that appear in the monomial, and another 1 for the x_{n+1} outside the function.) Since x_{n+1} ∈ {±1}, for odd |S| we have x_{n+1}^{|S|+1} = 1, and the monomial does not change, i.e. ĝ(S) = f̂(S); for even |S|, x_{n+1}^{|S|+1} = x_{n+1}, so ĝ(S ∪ {n+1}) = f̂(S). In particular, the total weight on the zeroth and first levels of the Fourier representation is preserved, because

    ĝ({i}) = f̂({i});  ĝ({n+1}) = f̂(φ)   (4.1)

If f satisfies the premise of the FKN Theorem, i.e. if ∑_{|S|≤1} f̂²(S) ≥ 1 − ε, then from (4.1) it is clear that the same also holds for g. From the FKN Theorem for the balanced special case we deduce that g is (K · ε)-close to a dictatorship, i.e. there exists k ∈ [n+1] such that ĝ({k})² ≥ 1 − (K · ε). Therefore by (4.1), f is also (K · ε)-close to either a dictatorship (when k ∈ [n]) or a constant function (when k = n + 1). The FKN Theorem for balanced functions follows as a special case of our main results, and therefore this work also provides an alternative proof of the FKN Theorem.
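The transformation can be verified by brute force for a small random f. The sketch below (illustrative code, not from the thesis) builds g from a random Boolean function on 4 variables and checks that g is exactly balanced and that the level-0/1 coefficients transfer as in (4.1).

```python
import itertools
import random
from math import prod

random.seed(1)
n = 4
cube = list(itertools.product((-1, 1), repeat=n))
table = {x: random.choice((-1, 1)) for x in cube}   # a random Boolean f

def f(x):
    return table[x]

def g(x, xlast):
    # Kindler's balancing transformation
    return xlast * f(tuple(xlast * xi for xi in x))

bigcube = [(x, s) for x in cube for s in (-1, 1)]
assert sum(g(x, s) for x, s in bigcube) == 0         # g is balanced

def fhat(S):
    return sum(f(x) * prod(x[i] for i in S) for x in cube) / len(cube)

def ghat(S, with_last):
    return sum(g(x, s) * prod(x[i] for i in S) * (s if with_last else 1)
               for x, s in bigcube) / len(bigcube)

for i in range(n):
    assert ghat((i,), False) == fhat((i,))           # ghat({i}) = fhat({i})
assert ghat((), True) == fhat(())                    # ghat({n+1}) = fhat(empty set)
```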


Chapter 5

High-Level Outline of the Proof

Before proving Theorem 9 for a sequence of n variables, we begin with the special case of only two random variables:

Lemma 12. Let X, Y be any two independent random variables, and let E and V be the expectation and variance of their sum, respectively. Then for some universal constant K₁ ≤ 4368,

    Var|X + Y| ≥ V · min{VarX, VarY} / (K₁(V + E²))   (5.1)

Intuitively, on the left-hand side of (5.1) we consider the sum of two independent variables, which we may expect to vary more than each variable separately. Per contra, the same side of (5.1) also takes the variance of the absolute value, which is in general smaller than the variance alone (without the absolute value). Lemma 12 bounds this loss of variance.

On a high level, the main idea of the proof of Lemma 12 is a separation into two cases, depending on the variance of the absolute value of the random variables, relative to the original variance of the variables (without absolute value):

1. If both |X + EY| and |Y + EX| have relatively small variance, then X + EY and Y + EX can both be approximated by random variables with constant absolute values. In this case we prove the result by a case analysis.

2. If either |X + EY| or |Y + EX| has a relatively large variance, we prove an auxiliary lemma which states that the variance of the absolute value of the sum, Var|X + Y|, is not much smaller than the variance of the absolute value of either variable (Var|X + EY|, Var|Y + EX|):

Lemma 13. Let X, Y be any two independent random variables, and let E be the expectation of their sum. Then for some universal constant K₀ ≤ 4,

    Var|X + Y| ≥ max{Var|X + EY|, Var|Y + EX|} / K₀


Note that in this lemma, unlike the former statements discussed so far, the terms on the right-hand side also appear in absolute value. In particular, this makes the inequality hold with respect to the maximum of the two variances.

We find it of separate interest to note that this lemma is tight in the sense that it is necessary to take a non-trivial constant K₀ > 1:

Claim 14. A non-trivial constant is necessary for Lemma 13. More precisely, there exist two independent balanced random variables X, Y such that the following inequality does not hold for any value K₀ < 4/3:

    Var|X + Y| ≥ max{Var|X|, Var|Y|} / K₀

(In particular, it is interesting to note that K₀ > 1.)

Discussion and proof appear in section 8.2.


Chapter 6

Proofs

6.1 From variance of absolute value to variance of absolute value of sum: proof of lemma 13

We begin with the proof of a slightly more general form of Lemma 13:

Lemma 15. Let X, Y be any two independent balanced random variables, and let E be any real number. Then for some universal constant K₀ ≤ 4,

    Var|X + Y + E| ≥ max{Var|X + E|, Var|Y + E|} / K₀

(Lemma 13 follows easily by taking E = E[X + Y].)

Proof. This lemma is relatively easy to prove, partly because the right-hand side contains the maximum of the two variances. Thus, it suffices to prove separately that the left-hand side is greater than or equal to Var|X + E|/K₀ and to Var|Y + E|/K₀. Wlog we will prove:

    Var|X + Y + E| ≥ Var|X + E| / K₀   (6.1)

Separating into two inequalities is particularly helpful, because now Y no longer appears on the right-hand side. Using the convexity of the variance, we can reduce the proof of the above inequality to the special case where Y is a balanced variable with only two values in its support.

Claim 16. For any balanced random variable Y there exists a sequence of balanced random variables {Y_k}, each with a support of size at most two, and a probability distribution λ_k over the sequence, s.t.

    Var|X + Y + E| ≥ E_{k∼λ_k}[Var|X + Y_k + E|]

The proof appears in section 7.1.

It follows from Claim 16 that Var|X + Y + E| is in particular greater than or equal to Var|X + Y_k + E| for some k. Therefore, in order to prove Lemma 15, it suffices to prove the lower bound on Var|X + Y + E| (with respect to Var|X + E|) for every balanced Y with only two possible values.

Recall (fact 5) that we can express the variances of |X + Y + E| and |X + E| in terms of the expected squared distance between two random evaluations. We use a simple case analysis to prove that adding any balanced Y with support of size two preserves (up to a factor of 1/4) the expected squared distance between any two possible evaluations of |X + E|.

Claim 17. For every two possible evaluations x₁, x₂ in the support of (X + E),

    E_{(y₁,y₂)∼Y×Y}(|x₁ + y₁| − |x₂ + y₂|)² ≥ (1/4)·(|x₁| − |x₂|)²

The proof appears in section 7.2.

Finally, in order to achieve the bound on the variances (inequality (6.1)), take the expectation over all choices of (x₁, x₂) ∼ (X + E) × (X + E):

    Var|X + Y + E| = (1/2)·E_{x₁,x₂,y₁,y₂}(|x₁ + y₁| − |x₂ + y₂|)²
                   ≥ (1/8)·E_{x₁,x₂}(|x₁| − |x₂|)²
                   = (1/4)·Var|X + E|

(Where the two equalities follow by fact 5, and the inequality by Claim 17.)
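As a sanity check (illustrative code, not from the thesis), the resulting inequality Var|X + Y + E| ≥ Var|X + E|/4 can be verified exactly for random symmetric two-point variables, which are a special case of balanced variables:

```python
import random

random.seed(0)

def variance(vals):
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals) / len(vals)

for _ in range(1000):
    a = random.uniform(0.1, 2.0)
    b = random.uniform(0.1, 2.0)
    E = random.uniform(-2.0, 2.0)
    X, Y = (a, -a), (b, -b)           # uniform two-point balanced variables
    lhs = variance([abs(x + y + E) for x in X for y in Y])   # Var|X + Y + E|
    rhs = variance([abs(x + E) for x in X])                  # Var|X + E|
    assert lhs >= rhs / 4 - 1e-9
```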

6.2 From variance of absolute value of sum to variance: proof of lemma 12

We advance to the more interesting Lemma 12, where we bound the variance of |X + Y| with respect to the minimum of VarX and VarY. Intuitively, on the left-hand side of (6.2) we consider the sum of two independent variables, which we may expect to vary more than each variable separately. Per contra, the same side of (6.2) also takes the variance of the absolute value, which is in general smaller than the variance alone (without the absolute value). We will now bound this loss of variance.

Lemma. (Lemma 12) Let X, Y be any two independent random variables, and let E and V be the expectation and variance of their sum, respectively. Then for some universal constant K₁ ≤ 4368,

    Var|X + Y| ≥ V · min{VarX, VarY} / (K₁(V + E²))   (6.2)

Proof. We change variables by subtracting the expectations of X and Y:

    X̄ = X − EX
    Ȳ = Y − EY

Note that the new variables X̄, Ȳ are balanced. Also observe that we are now interested in showing a lower bound for

    Var|X + Y| = Var|X̄ + Ȳ + E|

On a high level, the main idea of the proof is a separation into two cases, depending on the variance of the absolute value of the random variables, relative to the original variance of the variables (without absolute value):

1. If either Var|X̄ + E| or Var|Ȳ + E| is relatively large, we can simply apply Lemma 15, which states that the variance of the absolute value of the sum, Var|X̄ + Ȳ + E|, is not much smaller than the variance of the absolute value of either variable (Var|X̄ + E| and Var|Ȳ + E|).

2. If both Var|X̄ + E| and Var|Ȳ + E| are relatively small, then X̄ + E and Ȳ + E can both be approximated by random variables with constant absolute values (i.e. random variables with supports {±d_X} and {±d_Y} for some reals d_X and d_Y, respectively). For this case we prove the result by a case analysis.

Formally, let 0 < a < 1 be some parameter to be determined later, and denote

    M_XY = min{VarX̄, VarȲ} = min{VarX, VarY}

1. If either of the variances of the absolute values is large, i.e.

    max{Var|X̄ + E|, Var|Ȳ + E|} ≥ a · M_XY

then we can simply apply Lemma 15 to obtain:

    Var|X̄ + Ȳ + E| ≥ max{Var|X̄ + E|, Var|Ȳ + E|} / K₀ ≥ a · M_XY / K₀

2. On the other hand, if both the variances of the absolute values are small, i.e.

    max{Var|X̄ + E|, Var|Ȳ + E|} < a · M_XY   (6.3)

then X̄ + E and Ȳ + E are almost constant in absolute value.

In particular, let the variables X′ and Y′ be the constant-absolute-value approximations of X̄ + E and Ȳ + E, respectively:

    X′ = sign(X̄ + E) · E|X̄ + E|
    Y′ = sign(Ȳ + E) · E|Ȳ + E|

From the precondition (6.3) it follows that (X̄ + E) and (Ȳ + E) are close to X′ and Y′, respectively:

    ‖X̄ − (X′ − E)‖₂² < a · M_XY
    ‖Ȳ − (Y′ − E)‖₂² < a · M_XY

In particular, by the 2-relaxed triangle inequality (facts 2 and 8) we have that the variances are similar:

    VarX̄ > (1/2)·Var(X′ − E) − a·VarX̄   (6.4)
    VarȲ > (1/2)·Var(Y′ − E) − a·VarȲ   (6.5)

    Var|X̄ + Ȳ + E| ≥ (1/2)·Var|X′ + Y′ − E| − ‖|X̄ + Ȳ + E| − |X′ + Y′ − E|‖₂²
                   > (1/2)·Var|X′ + Y′ − E| − 4a·M_XY   (6.6)

Hence, it will be useful to obtain a bound equivalent to (6.2), but in terms of the approximating variables X′, Y′. We will then use the similarity of the variances to extend the bound to X̄, Ȳ and complete the proof of the lemma.

We use a case analysis over the possible evaluations of X′ and Y′ to prove the following claim:

Claim 18. Let X̄, Ȳ be balanced random variables and let X′, Y′ be the constant-absolute-value approximations of X̄, Ȳ, respectively:

    X′ = sign(X̄ + E) · E|X̄ + E|
    Y′ = sign(Ȳ + E) · E|Ȳ + E|

Then the variance of the absolute value of X′ + Y′ − E is relatively large:

    Var|X′ + Y′ − E| ≥ Var(X′ − E)·Var(Y′ − E) / (16(VarX̄ + E²))

The proof appears in section 7.3.

Now we use the closeness of the approximating variables X′, Y′ to return to a bound for the balanced variables X̄, Ȳ:

    Var|X̄ + Ȳ + E| ≥ (1/2)·Var|X′ + Y′ − E| − 4a·M_XY
                   ≥ Var(X′ − E)·Var(Y′ − E) / (32(V + E²)) − 4a·M_XY
                   ≥ ((1 − 2a)²/128)·(VarX̄·VarȲ)/(V + E²) − 4a·M_XY
                   ≥ ((1 − 4a)/128)·(VarX̄·VarȲ)/(V + E²) − 4a·M_XY

(Where the first line follows by equation (6.6); the second line from Claim 18; the third from (6.4) and (6.5); and the fourth is true because (1 − 2a)² ≥ 1 − 4a.)

Combining the two cases, we have that

    Var|X̄ + Ȳ + E| ≥ min{ a·M_XY/K₀ , ((1 − 4a)/128)·(VarX̄·VarȲ)/(V + E²) − 4a·M_XY }

Finally, to optimize the choice of the parameter a, set

    a/K₀ = ((1 − 4a)/128)·max{VarX̄, VarȲ}/(V + E²) − 4a ≥ ((1 − 4a)/256)·V/(V + E²) − 4a

    (1024 + 4V/(V + E²) + 256/K₀)·a ≥ V/(V + E²)

    a ≥ (K₀/(1028·K₀ + 256))·V/(V + E²)

Therefore,

    Var|X̄ + Ȳ + E| ≥ a·M_XY/K₀ ≥ (1/(1028·K₀ + 256))·(V/(V + E²))·M_XY

and thus (5.1) holds for K₁ = 1028·K₀ + 256 ≤ 4368.

6.3 Proof of the main theorem

Lemma 12 bounds the variance of the absolute value of the sum of two independent variables, Var|X + Y|, in terms of the variance of each variable. In the following theorem we generalize this claim to a sequence of n independent variables.

Theorem. (Theorem 9) Let (X_i)_{i=1}^n be a sequence of independent (not necessarily symmetric) random variables, and let E and V be the expectation and variance of their sum, respectively. Then for some universal constant K₂ ≤ 13104 and some k ∈ [n] we have

    Var|∑_i X_i| ≥ V · Var∑_{i≠k} X_i / (K₂(V + E²))

Proof. In order to generalize Lemma 12 to n variables, consider the two possible cases:

1. If for every i, VarX_i ≤ V/3, then we can partition [n] into two sets A, B s.t.

    V/3 ≤ ∑_{i∈A} VarX_i, ∑_{i∈B} VarX_i ≤ 2V/3

Thus, substituting X = ∑_{i∈A} X_i and Y = ∑_{i∈B} X_i in Lemma 12, we have that for every k

    Var|∑_i X_i| = Var|∑_{i∈A} X_i + ∑_{i∈B} X_i| ≥ V·(∑_{i∈B} VarX_i)/(K₁(V + E²)) ≥ V·(Var∑_{i≠k} X_i)/(3·K₁(V + E²))

2. Otherwise, if VarX_k > V/3 for some k, apply Lemma 12 with X = X_k and Y = ∑_{i≠k} X_i to get:

    Var|∑_i X_i| = Var|X_k + ∑_{i≠k} X_i| ≥ (V/3)·(Var∑_{i≠k} X_i)/(K₁(V + E²))

In both cases Var|∑_i X_i| ≥ V·Var∑_{i≠k} X_i/(3·K₁(V + E²)), so the theorem holds with K₂ = 3·K₁ ≤ 13104.
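The partition in case 1 can be found greedily: keep adding variables to A while its running sum stays at most 2V/3; since every individual variance is at most V/3, the greedy sum cannot end below V/3. A minimal sketch (illustrative code, not from the thesis):

```python
def balanced_partition(variances):
    """Given v_i >= 0 with every v_i <= V/3, greedily build A with
    V/3 <= sum_{i in A} v_i <= 2V/3; B is the complement.
    (If some skipped v would overshoot 2V/3 while the running sum is
    still below V/3, that v would exceed V/3 -- a contradiction.)"""
    V = sum(variances)
    A, sA = [], 0.0
    for i, v in enumerate(variances):
        if sA + v <= 2 * V / 3:
            A.append(i)
            sA += v
    B = [i for i in range(len(variances)) if i not in A]
    return A, B

A, B = balanced_partition([0.2, 0.2, 0.2, 0.2, 0.2])  # V = 1, each v_i <= V/3
print(A, B)
```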

6.4 Proof of the extension to the FKN Theorem

Corollary 10, the generalization of the FKN Theorem, follows easily from Theorem 9.


Corollary. (Corollary 10) Let f : {±1}^m → {±1} be a Boolean function and (I_j)_{j=1}^n a partition of [m]. Also, for each I_j let f_j be the restriction of f to the variables with indices in I_j. Suppose that f is concentrated on coefficients that do not cross the partition, i.e.:

    ∑_{S : ∃j, S⊆I_j} f̂²(S) ≥ 1 − (ε · Varf)

Then for some k ∈ [n], f is close to f_k + f̂(φ):

    ‖f − f_k − f̂(φ)‖₂² ≤ K₂ · ε

Proof. From the premise it follows that f is (ε · Varf)-close to the sum of the f_j's and the empty character:

    ‖f − ∑_j f_j − f̂(φ)‖₂² ≤ ε · Varf

Since f is Boolean, this implies in particular that

    Var|∑_j f_j + f̂(φ)| ≤ ε · Varf

Thus by the main theorem, for some k ∈ [n],

    (Var∑_j f_j · Var∑_{j≠k} f_j) / (K₂·(Var∑_j f_j + f̂(φ)²)) ≤ ε · Varf

    Var∑_{j≠k} f_j ≤ K₂ · ε


Chapter 7

Proofs of technical claims

7.1 Convexity of the variance function: proof of claim 16

Recall that we are trying to bound the variance of |X + Y + E| for balanced X and Y from below. In order for Y to be balanced, its support must be of size at least two¹. We claim that because of the convexity property of the variance function, considering random variables Y of this minimal support size suffices for proving a lower bound on Var|X + Y + E|.

Claim. (Claim 16) For any balanced random variable Y there exists a set of balanced random variables {Y_α}, each with a support of size at most two, and a probability distribution g_Y over it, s.t.

    Var|X + Y + E| ≥ E_{α∼g_Y}[Var|X + Y_α + E|]

Proof. Let f_Y be the probability density function of Y. For every a, b in the support of Y s.t. sign(a) ≠ sign(b), let Y_{a,b} be the unique balanced variable with support {a, b}. We show that the distribution of Y is equal to a mixture of the distributions of the Y_{a,b}.

Define the random variable Z to be sampled from the distribution of Y_{a,b} with probability defined by the density function

    g_Y(a, b) = (|a| + |b|)·f_Y(a)·f_Y(b) / ∫_{c≥0} f_Y(c)·|c| dc

Because Y is balanced, it follows that also

    g_Y(a, b) = (|a| + |b|)·f_Y(a)·f_Y(b) / ∫_{c<0} f_Y(c)·|c| dc

¹ We assume for simplicity of notation that 0 is not in the support of Y; the result for the general case follows trivially.


Observe that Y and Z have the same distribution:

    f_Z(a) = ∫_{b : sign(b)≠sign(a)} g_Y(a, b) · Pr[Y_{a,b} = a] db
           = ∫_{b : sign(b)≠sign(a)} [(|a| + |b|)·f_Y(a)·f_Y(b) / ∫_{c : sign(c)≠sign(a)} f_Y(c)·|c| dc] · [|b|/(|a| + |b|)] db
           = f_Y(a) · [∫_{b : sign(b)≠sign(a)} f_Y(b)·|b| db / ∫_{c : sign(c)≠sign(a)} f_Y(c)·|c| dc]
           = f_Y(a)

Therefore,

    Var|X + Y + E| = Var|X + Z + E|
                   = E(|X + Z + E| − E|X + Z + E|)²
                   = E_{a,b∼g_Y}[E(|X + Y_{a,b} + E| − E|X + Z + E|)²]
                   ≥ E_{a,b∼g_Y}[E(|X + Y_{a,b} + E| − E|X + Y_{a,b} + E|)²]
                   = E_{a,b∼g_Y}[Var|X + Y_{a,b} + E|]
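The discrete analogue of this mixture can be verified exactly. The sketch below (illustrative code, using a hypothetical three-point balanced Y) builds the weights g_Y(a, b) ∝ (|a| + |b|)·f_Y(a)·f_Y(b) and checks that the mixture of the two-point balanced variables Y_{a,b}, with Pr[Y_{a,b} = a] = |b|/(|a| + |b|), reproduces the distribution of Y.

```python
from fractions import Fraction as F

# a hypothetical balanced Y: E[Y] = -2*(1/4) - 1*(1/2) + 4*(1/4) = 0
Y = {-2: F(1, 4), -1: F(1, 2), 4: F(1, 4)}

pos = {a: p for a, p in Y.items() if a > 0}
neg = {b: p for b, p in Y.items() if b < 0}
norm = sum(p * a for a, p in pos.items())   # equals sum_{b<0} p_b*|b|, as Y is balanced

# mixture weights g_Y(a, b) proportional to (|a| + |b|) f_Y(a) f_Y(b)
g = {(a, b): (a + abs(b)) * pa * pb / norm
     for a, pa in pos.items() for b, pb in neg.items()}
assert sum(g.values()) == 1

# each Y_{a,b} is balanced: Pr[Y_{a,b} = a] = |b| / (|a| + |b|)
fZ = {v: F(0) for v in Y}
for (a, b), w in g.items():
    fZ[a] += w * F(abs(b), a + abs(b))
    fZ[b] += w * F(a, a + abs(b))
assert fZ == Y   # the mixture has exactly the distribution of Y
```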

7.2 Expected squared distance: proof of claim 17

We use case analysis to prove that adding any balanced Y with support of size two preserves (up to a factor of 1/4) the expected squared distance between any two possible evaluations of |X + E|.

Claim. (Claim 17) For every two possible evaluations x₁, x₂ in the support of (X + E),

    E_{(y₁,y₂)∼Y×Y}(|x₁ + y₁| − |x₂ + y₂|)² ≥ (1/4)·(|x₁| − |x₂|)²

Proof. Denote

    p_Y = Pr[Y ≥ 0]

Because Y is balanced, its two possible values must be of the form {d/p_Y, −d/(1 − p_Y)}, for some d ≥ 0. Assume wlog that x₁ ≥ 0 and |x₁| ≥ |x₂|.

We divide into cases based on the value of p_Y (see also figure 7.1):

1. If p_Y ≥ 1/2 then with probability at least 1/4 both evaluations of Y are non-negative, in which case the distance between |x₁| and |x₂| can only increase:

    E_{y₁,y₂}(|x₁ + y₁| − |x₂ + y₂|)² ≥ Pr[y₁, y₂ ≥ 0]·(x₁ + d/p_Y − |x₂ + d/p_Y|)²
                                     ≥ Pr[y₁, y₂ ≥ 0]·(x₁ + d/p_Y − |x₂| − d/p_Y)²
                                     ≥ (1/4)·(|x₁| − |x₂|)²


Figure 7.1: Case analysis in the proof of claim 17. The top figure corresponds to the case where both y₁ and y₂ are non-negative, i.e. y₁ = y₂ = d/p_Y ≥ 0; notice that the distance between |x₁ + y₁| and |x₂ + y₂| is at least the distance between |x₁| and |x₂|. This case occurs with probability p_Y². The bottom figure corresponds to the case where p_Y < 1/2 and y₁ = d/p_Y > d/(1 − p_Y) = |y₂|; notice that in this case the distance also cannot decrease. In particular, when p_Y < 1/4, we have y₁ = d/p_Y > 3d/(1 − p_Y) = 3|y₂|, and therefore the distance actually increases by a significant amount. This case occurs with probability p_Y(1 − p_Y).

2. If 1/4 ≤ p_Y < 1/2 then with probability at least 1/4, y₁ is non-negative; we also use that d/p_Y ≥ |−d/(1 − p_Y)| implies y₁ ≥ |y₂|:

    E_{y₁,y₂}(|x₁ + y₁| − |x₂ + y₂|)² ≥ Pr[y₁ ≥ 0]·(|x₁| − |x₂| + y₁ − |y₂|)²
                                     ≥ Pr[y₁ ≥ 0]·(|x₁| − |x₂|)²
                                     ≥ (1/4)·(|x₁| − |x₂|)²

3. Else, if p_Y < 1/4, we need to sum over the possible signs of y₁, y₂, and use the fact that d/p_Y is much greater than |−d/(1 − p_Y)|:

    E_{y₁,y₂}(|x₁ + y₁| − |x₂ + y₂|)²
      ≥ Pr[y₁ ≥ 0, y₂ ≥ 0]·(|x₁| − |x₂|)²
      + Pr[y₁ ≥ 0, y₂ < 0]·(|x₁| − |x₂| + (d/p_Y − d/(1 − p_Y)))²
      + Pr[y₁ < 0, y₂ < 0]·(|x₁| − |x₂| − 2d/(1 − p_Y))²
      ≥ p_Y²·(|x₁| − |x₂|)²
      + p_Y(1 − p_Y)·((|x₁| − |x₂|)² + 2·(d/p_Y − d/(1 − p_Y))·(|x₁| − |x₂|))
      + (1/4)·(1 − p_Y)²·((|x₁| − |x₂|)² − 2·(2d/(1 − p_Y))·(|x₁| − |x₂|))
      ≥ (1/4)·(|x₁| − |x₂|)² + 2d·((1 − 2·(1/4))·(1 − p_Y) − p_Y)·(|x₁| − |x₂|)
      ≥ (1/4)·(|x₁| − |x₂|)²

(The last inequality holds since p_Y < 1/4 makes the linear term non-negative.)

7.3 Constant absolute value: proof of claim 18

We use a brute-force case analysis to prove a relative lower bound on the variance of the absolute value of a sum of two variables with constant absolute values:

Claim. (Claim 18) Let X, Y be balanced random variables and let X′, Y′ be the constant-absolute-value approximations of X, Y, respectively:

    X′ = sign(X + E) · E|X + E|
    Y′ = sign(Y + E) · E|Y + E|

Then the variance of the absolute value of X′ + Y′ − E is relatively large:

    Var|X′ + Y′ − E| ≥ Var(X′ − E)·Var(Y′ − E) / (16(VarX + E²))

Proof. Denote p_X = Pr[X + E ≥ 0] and d_X = E|X + E| (and analogously for p_Y, d_Y). Observe that

    d_X ≤ √(E[(X + E)²]) = √(Var(X + E) + (E[X + E])²) = √(VarX + E²)

Thus we can bound (1 − p_X)·p_X from below by:

    VarX′ = (2d_X)²·Pr[X′ ≤ 0]·Pr[X′ > 0] ≤ 4(VarX + E²)·(1 − p_X)·p_X

    (1 − p_X)·p_X ≥ VarX′ / (4(VarX + E²))   (7.1)


Figure 7.2: Case analysis in the proof of claim 18

• With probability 2(1 − p_X)²·(1 − p_Y)p_Y both x's are negative, and the y's are of opposite signs. Notice that since we assume wlog E ≥ 0 and d_X ≥ d_Y, the distance between |−d_X − d_Y − E| and |−d_X + d_Y − E| is the same as the distance between −d_X − d_Y − E and −d_X + d_Y − E (marked by the dashed line in both figures); it is therefore always 2d_Y.

• With probability 2p_X²·(1 − p_Y)p_Y both x's are positive, and the y's are of opposite signs. Notice that the distance between |d_X + d_Y − E| and |d_X − d_Y − E| (marked by the solid lines) is either 2d_Y (as in the top figure) or |2d_X − 2E| when d_X − d_Y − E ≤ 0 (as in the bottom figure).

• With probability 2(1 − p_X)p_X·(1 − p_Y)p_Y the x's and y's have correlated signs. Notice that the distance between |d_X + d_Y − E| and |−d_X − d_Y − E| (marked by the dotted lines) is either 2E (as in both figures) or 2d_X + 2d_Y when d_X + d_Y − E ≤ 0 (not shown).

Also, for Y′ we have

    VarY′ = p_Y·(1 − p_Y)·(2d_Y)²

Assume wlog that d_Y < d_X and E > 0. Recall (fact 5) that we can write the variance in terms of the expected squared distance between evaluations. Then, summing over the different


possible signs of X′ and Y′ we have

    Var|X′ + Y′ − E| = (1/2)·E_{x₁,x₂,y₁,y₂∼X′×X′×Y′×Y′}(|x₁ + y₁ − E| − |x₂ + y₂ − E|)²
      ≥ (1 − p_X)²·(1 − p_Y)p_Y·(|−d_X − d_Y − E| − |−d_X + d_Y − E|)²
      + p_X²·(1 − p_Y)p_Y·(|d_X − d_Y − E| − |d_X + d_Y − E|)²
      + (1 − p_X)p_X·(1 − p_Y)p_Y·(|−d_X − d_Y − E| − |d_X + d_Y − E|)²
      ≥ ((1 − p_X)² + p_X²)·(1 − p_Y)p_Y·min{|2d_X − 2E|, 2d_Y}²
      + (1 − p_X)p_X·p_Y(1 − p_Y)·min{2E, 2d_X + 2d_Y}²
      ≥ (1 − p_X)p_X·p_Y(1 − p_Y)·d_Y²
      ≥ [VarX′/(4(VarX + E²))]·[p_Y(1 − p_Y)·(2d_Y)²/4]
      = VarX′·VarY′ / (16(VarX + E²))
      = Var(X′ − E)·Var(Y′ − E) / (16(VarX + E²))

(Where the first inequality follows by taking the expectation over the different possible signs of X′, Y′ (see also figure 7.2); the second follows by taking the minimum over the possible signs of the quantities in absolute values; the third by simple algebra; and the fourth by (7.1).)


Chapter 8

Tightness of results

8.1 Tightness of the main result

The premise of the main result, Corollary 10, requires $f$ to be $(\varepsilon \cdot \operatorname{Var} f)$-close to a sum of independent functions. One may hope to avoid this factor of $\operatorname{Var} f$ and achieve a constant ratio between the $\varepsilon$ in the premise and the $K \cdot \varepsilon$ in the conclusion, as in the FKN Theorem. However, we show that the dependence on $\operatorname{Var} f$ is necessary.

Lemma (Lemma 11). Corollary 10 is tight up to a constant factor. In particular, the division by $\operatorname{Var} f$ is necessary.

More precisely, there exists a sequence of functions $f^{(m)}: \{\pm 1\}^{2m} \to \{\pm 1\}$ and partitions $\left(I_1^{(m)}, I_2^{(m)}\right)$ s.t. the restrictions $\left(f_1^{(m)}, f_2^{(m)}\right)$ of $f^{(m)}$ to the variables in $I_j^{(m)}$ satisfy
$$\sum_{S \,:\, \exists j,\ S \subseteq I_j^{(m)}} \hat{f}^2(S) = 1 - O\!\left(2^{-m} \cdot \operatorname{Var} f\right)$$
but for every $j \in \{1, 2\}$
$$\left\| f^{(m)} - f_j^{(m)} - \hat{f}_j^{(m)}(\phi) \right\|_2^2 = \Theta\!\left(2^{-m}\right) = \omega\!\left(2^{-m} \cdot \operatorname{Var} f\right)$$

Proof. By example. Let

$$X = \bigwedge_{i=1}^{m} x_i \qquad Y = \bigwedge_{i=1}^{m} y_i \qquad f = X \vee Y$$

Now the variance of $f$ is $\Theta\left(2^{-m}\right)$:

$$\operatorname{Var} f = 4 \Pr[f = 1] \Pr[f = -1] = 4\left(1 - \Theta\left(2^{-m}\right)\right)\Theta\left(2^{-m}\right) = \Theta\left(2^{-m}\right)$$

Also, $f$ is $O\left(2^{-2m}\right)$-close to a sum of independent functions:

$$f = \frac{X + Y + X \cdot Y - 1}{2}$$
$$\left\| f - (X + Y - 1) \right\|_2^2 = \left\| \frac{(X-1)(Y-1)}{2} \right\|_2^2 = 4 \cdot 2^{-2m}$$

Yet, $f$ is $\Omega\left(2^{-m}\right)$-far from any function that depends on only the $x_i$'s or only the $y_i$'s.
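This example is small enough to verify by brute force. The following sketch (an illustration, not from the thesis; it adopts the convention that $-1$ plays the role of True, under which $f = \frac{X+Y+XY-1}{2}$ holds as an exact identity) enumerates all $2^{2m}$ inputs and confirms both computations:

```python
from itertools import product

def AND(bits):                     # -1 plays the role of True
    return -1 if all(b == -1 for b in bits) else 1

def OR(a, b):
    return -1 if -1 in (a, b) else 1

def check_tribes(m):
    N = 2 ** (2 * m)
    sq_dist = 0
    n_true = 0
    for zs in product((-1, 1), repeat=2 * m):
        X, Y = AND(zs[:m]), AND(zs[m:])
        f = OR(X, Y)
        assert 2 * f == X + Y + X * Y - 1       # f = (X + Y + XY - 1)/2
        sq_dist += (f - (X + Y - 1)) ** 2       # summand of ||f - (X+Y-1)||_2^2
        n_true += (f == -1)
    assert n_true == 2 ** (m + 1) - 1           # Pr[f = -1] = Theta(2^-m)
    assert sq_dist / N == 4 * 2 ** (-2 * m)     # = ||(X-1)(Y-1)/2||_2^2

for m in (1, 2, 3, 4):
    check_tribes(m)
```

Both checks are exact: the only input where $f - (X + Y - 1)$ is nonzero is the one where all $2m$ coordinates equal $-1$, contributing $2^2 = 4$ with probability $2^{-2m}$.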


8.2 Tightness of Lemma 13

Lemma 13 compares the variance of the absolute value of a sum of independent variables to the variance of the absolute value of each variable. Since both sides of the inequality consider absolute values, it may seem as if summing independent variables should only add variance to the left side. In particular, one may hope that the inequality holds trivially, with $K_0 = 1$. We show that this is not the case.

Claim (Claim 14). A non-trivial constant is necessary for Lemma 13. More precisely, there exist two independent balanced random variables $X, Y$ such that the following inequality does not hold for any value $K_0 < 4/3$:
$$\operatorname{Var}|X + Y| \ge \frac{\max\left\{\operatorname{Var}|X|,\ \operatorname{Var}|Y|\right\}}{K_0}$$
(In particular, it is interesting to note that $K_0 > 1$.)

Proof. By example. Let
$$\Pr[X = 0] = \frac{1}{2} \qquad \Pr[X = \pm 2] = \frac{1}{4} \qquad \Pr[Y = \pm 1] = \frac{1}{2}$$

Then we have that $\mathbb{E}X = \mathbb{E}Y = 0$,
$$\Pr\left[|X + Y| = 1\right] = \frac{3}{4} \qquad \Pr\left[|X + Y| = 3\right] = \frac{1}{4}$$
and therefore
$$\operatorname{Var}|X + Y| = \frac{3}{4} = \frac{3}{4}\operatorname{Var}|X|$$
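The claim's arithmetic can be checked by exact enumeration. This short sketch (an illustration, not part of the thesis) computes the three variances directly from the two distributions:

```python
from itertools import product

# The two balanced variables from the proof, as (value, probability) lists.
X = [(0, 0.5), (2, 0.25), (-2, 0.25)]
Y = [(1, 0.5), (-1, 0.5)]

def var_of_abs(dist):
    """Var|Z| for a finite distribution of Z given as (value, probability) pairs."""
    m1 = sum(abs(v) * p for v, p in dist)
    m2 = sum(v * v * p for v, p in dist)     # |v|^2 = v^2
    return m2 - m1 ** 2

# Distribution of X + Y, by independence.
XY = [(x + y, px * py) for (x, px), (y, py) in product(X, Y)]

assert sum(v * p for v, p in X) == 0 and sum(v * p for v, p in Y) == 0  # balanced
assert abs(var_of_abs(X) - 1.0) < 1e-12      # Var|X| = 1
assert abs(var_of_abs(Y) - 0.0) < 1e-12      # Var|Y| = 0 (|Y| is constant)
assert abs(var_of_abs(XY) - 0.75) < 1e-12    # Var|X+Y| = 3/4 = (3/4) Var|X|
```

So $\operatorname{Var}|X|/\operatorname{Var}|X+Y| = 4/3$, which is exactly why no $K_0 < 4/3$ can work.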


Chapter 9

Discussion

In this work, we extend the FKN Theorem to Boolean functions of variables that are not uniformly distributed over $\{\pm 1\}$. In particular, we consider variables which are neither Boolean nor symmetric. More importantly, we extend the FKN Theorem to functions which are almost linear with respect to a partition of the variables, i.e. Boolean functions that are close to a sum of independent functions. Both are interesting generalizations of an important theorem and improve our understanding of the behaviour of Boolean functions.

9.1 Hypercontractivity

Many theorems about Boolean functions rely on hypercontractivity theorems such as the Bonami-Beckner Inequality ([6, 3]). Writing a real-valued function over $\{\pm 1\}^n$ as a polynomial yields a distribution over the monomials' degrees $\{0, \ldots, n\}$, where the weight of $k$ is the sum of the relative weights of the monomials of degree $k$. Hypercontractivity inequalities bound the ratios between norms of real-valued functions over $\{\pm 1\}^n$ in terms of this distribution of weights over their monomials' degrees. In this work it is not clear how to use such inequalities, because the functions in question may have an arbitrary weight on high degrees within each subset.
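To make the degree-weight distribution concrete, the following sketch (an illustration, not part of the thesis) computes each Fourier coefficient $\hat f(S) = \mathbb{E}\left[f(x)\chi_S(x)\right]$ by direct summation and collects the squared coefficients by degree, shown here for the 3-bit majority function:

```python
from itertools import combinations, product

def degree_weights(f, n):
    """Fourier weight at each degree: W_k = sum over |S|=k of fhat(S)^2."""
    pts = list(product((-1, 1), repeat=n))
    weights = [0.0] * (n + 1)
    for k in range(n + 1):
        for S in combinations(range(n), k):
            total = 0
            for x in pts:
                chi = 1
                for i in S:
                    chi *= x[i]                # chi_S(x) = prod_{i in S} x_i
                total += f(x) * chi
            coef = total / len(pts)            # fhat(S) = E[f(x) chi_S(x)]
            weights[k] += coef ** 2
    return weights

maj3 = lambda x: 1 if sum(x) > 0 else -1
w = degree_weights(maj3, 3)
# Maj3 = (x1 + x2 + x3 - x1 x2 x3)/2, so all weight sits at degrees 1 and 3.
assert abs(w[1] - 0.75) < 1e-12 and abs(w[3] - 0.25) < 1e-12
assert abs(sum(w) - 1.0) < 1e-12               # Parseval: total weight 1 for Boolean f
```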

All of the proofs presented in this work are completely self-contained and based on elementary methods. In particular, we do not use any hypercontractivity theorem. This simplicity makes our work more accessible and intuitive. This trend is exhibited by some recent related works, e.g. [36, 21, 28], which also present proofs that do not use hypercontractivity.

9.2 Open problems

It would be interesting to extend the notion of analysis of Boolean functions over partitions of the variables to some of the related works mentioned in the introduction. For example, one may define an appropriate measure of "low degree" over partitions, extend Friedgut's Lemma or Bourgain's Junta Theorem, and ask whether functions that are close to low degree with respect to a partition of the variables are close to juntas with respect to the same partition. Alternatively, one may look at extending Corollary 10 to biased or other non-uniform distributions over the Boolean hypercube.


9.2.1 Interesting Conjectures

While the dependence on the variance is tight, it seems counter-intuitive. We believe that it is possible to come up with a structural characterization instead.

Observe that the function used for the counterexample in Lemma 11 is essentially the (non-balanced) tribes function, i.e. an OR of two ANDs. All the extreme examples we have discovered so far share a similar structure: an independent Boolean function on each subset of the variables (e.g. an AND on a subset of the variables), and then a "central" Boolean function that takes as inputs the outputs of the independent functions (e.g. an OR of all the ANDs).

We conjecture that such a composition of Boolean functions is essentially the only way to construct counterexamples to the "naive extension" of the FKN Theorem. In other words, if a Boolean function is close to linear with respect to a partition of the variables, then it is close to the application of a central Boolean function $g$ to the outputs of independent Boolean functions $h_j$, one over each subset $I_j$. Formally,

Conjecture. Let $f: \{\pm 1\}^m \to \{\pm 1\}$ be a Boolean function and $(I_j)_{j=1}^n$ a partition of $[m]$. Suppose that $f$ is concentrated on coefficients that do not cross the partition, i.e.:
$$\sum_{S \,:\, \exists j,\ S \subseteq I_j} \hat{f}^2(S) \ge 1 - \varepsilon$$
Then there exist a "central" Boolean function $g: \{\pm 1\}^n \to \{\pm 1\}$ and a Boolean function on each subset, $h_j: \{\pm 1\}^{|I_j|} \to \{\pm 1\}$, s.t. the composition of $g$ with the $h_j$'s is a good approximation of $f$. I.e., for some universal constant $K$,
$$\left\| f(X) - g\left(h_1\left((x_i)_{i \in I_1}\right), h_2\left((x_i)_{i \in I_2}\right), \ldots, h_n\left((x_i)_{i \in I_n}\right)\right) \right\|_2^2 \le K \cdot \varepsilon$$

Intuitively, this conjecture claims that the central function only needs to know one bit of information on each subset in order to approximate $f$.

We believe that such a conjecture could have useful applications, because one can often deduce the properties of a composition of independent functions $f = g\left(h(x_{I_1}), h(x_{I_2}), \ldots, h(x_{I_n})\right)$ from the properties of the composed functions $g$ and $h$. For example, if $f$, $g$, and $h$ are as above, then the total influence of $f$ is the product of the total influences of $g$ and $h$.
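As an illustration of this product rule, the sketch below (not from the thesis) verifies it by brute force in one small case: $g$ is the 3-bit majority and $h$ is the 2-bit XOR. Note that XOR is balanced, so its outputs are uniform bits; this balancedness is an assumption the sketch relies on, not a condition stated here.

```python
from itertools import product

def total_influence(f, n):
    """I(f) = sum_i Pr_x[f(x) != f(x^i)], where x^i is x with coordinate i flipped."""
    pts = list(product((-1, 1), repeat=n))
    return sum(
        sum(f(x) != f(x[:i] + (-x[i],) + x[i + 1:]) for x in pts) / len(pts)
        for i in range(n)
    )

maj3 = lambda z: 1 if sum(z) > 0 else -1        # outer function g
xor2 = lambda z: z[0] * z[1]                    # inner function h (balanced)
f = lambda x: maj3((xor2(x[0:2]), xor2(x[2:4]), xor2(x[4:6])))

I_g = total_influence(maj3, 3)                  # 1.5
I_h = total_influence(xor2, 2)                  # 2.0
I_f = total_influence(f, 6)                     # 3.0
assert abs(I_f - I_g * I_h) < 1e-12             # I(f) = I(g) * I(h)
```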

In fact, we believe that an even stronger claim holds. It seems that for all the Boolean functions that are almost linear with respect to a partition of the variables, the "central" function $g$ is either an OR or an AND of some of the functions on the subsets $h_j$. Formally,

Conjecture (Stronger variant). Let $f: \{\pm 1\}^m \to \{\pm 1\}$ be a Boolean function and $(I_j)_{j=1}^n$ a partition of $[m]$. Suppose that $f$ is concentrated on coefficients that do not cross the partition, i.e.:
$$\sum_{S \,:\, \exists j,\ S \subseteq I_j} \hat{f}^2(S) \ge 1 - \varepsilon$$
Then there exists a Boolean function $h_j: \{\pm 1\}^{|I_j|} \to \{\pm 1\}$ for each $j \in [n]$ s.t. either the OR or the AND of those $h_j$'s is a good approximation of $f$. I.e., for some universal constant $K$,
$$\left\| f(X) - \mathrm{OR}_{j \in [n]}\left(h_j\left((x_i)_{i \in I_j}\right)\right) \right\|_2^2 \le K \cdot \varepsilon$$
or
$$\left\| f(X) - \mathrm{AND}_{j \in [n]}\left(h_j\left((x_i)_{i \in I_j}\right)\right) \right\|_2^2 \le K \cdot \varepsilon$$


Bibliography

[1] Noga Alon, Irit Dinur, Ehud Friedgut, and Benny Sudakov. Graph products, Fourier analysis and spectral techniques. GAFA, 14:913–940, 2004.

[2] Nikhil Bansal and Subhash Khot. Optimal long code test with one free bit. In FOCS, pages 453–462, 2009.

[3] William Beckner. Inequalities in Fourier analysis. Annals of Mathematics, 102:159–182, 1975.

[4] Mihir Bellare, Oded Goldreich, and Madhu Sudan. Free bits, PCPs, and nonapproximability: towards tight results. SIAM J. Comput., 27(3):804–915, 1998.

[5] Michael Ben-Or and Nathan Linial. Collective coin flipping, robust voting schemes and minima of Banzhaf values. In 26th Annual Symposium on Foundations of Computer Science, pages 408–416, Oct. 1985.

[6] Aline Bonami. Étude des coefficients de Fourier des fonctions de $L^p(G)$. Annales de l'institut Fourier, 20:335–402, 1970.

[7] J. Bourgain. On the distribution of the Fourier spectrum of Boolean functions. Israel Journal of Mathematics, 131:269–276, 2002. 10.1007/BF02785861.

[8] Shuchi Chawla, Robert Krauthgamer, Ravi Kumar, Yuval Rabani, and D. Sivakumar. On the hardness of approximating multicut and sparsest-cut. Computational Complexity, 15(2):94–114, 2006.

[9] Irit Dinur. The PCP theorem by gap amplification. In STOC, pages 241–250, 2006.

[10] Irit Dinur, Ehud Friedgut, Guy Kindler, and Ryan O'Donnell. On the Fourier tails of bounded functions over the discrete cube. In Proceedings of the Thirty-Eighth Annual ACM Symposium on Theory of Computing, STOC '06, pages 437–446, New York, NY, USA, 2006. ACM.

[11] Irit Dinur and Samuel Safra. On the hardness of approximating minimum vertex cover. Annals of Mathematics, 162(1):439–485, 2005.

[12] Dvir Falik and Ehud Friedgut. An algebraic proof of a robust social choice impossibility theorem. In FOCS, pages 413–422, 2011.

[13] Ehud Friedgut. Boolean functions with low average sensitivity depend on few coordinates. Combinatorica, 18(1):27–35, 1998.

[14] Ehud Friedgut. On the measure of intersecting families, uniqueness and stability. Combinatorica, 28(5):503–528, 2008.

[15] Ehud Friedgut, Gil Kalai, and Assaf Naor. Boolean functions whose Fourier transform is concentrated on the first two levels. Online version.

[16] Ehud Friedgut, Gil Kalai, and Assaf Naor. Boolean functions whose Fourier transform is concentrated on the first two levels. Advances in Applied Mathematics, 29(3):427–437, 2002.

[17] Mahya Ghandehari and Hamed Hatami. Fourier analysis and large independent sets in powers of complete graphs. J. Comb. Theory, Ser. B, 98(1):164–172, 2008.

[18] Johan Håstad. Clique is hard to approximate within $n^{1-o(1)}$. In Proceedings of the 37th Annual Symposium on Foundations of Computer Science, FOCS '96, pages 627–, Washington, DC, USA, 1996. IEEE Computer Society.

[19] Johan Håstad. Some optimal inapproximability results. J. ACM, 48(4):798–859, July 2001.

[20] Hamed Hatami. A remark on Bourgain's distributional inequality on the Fourier spectrum of Boolean functions. Online Journal of Analytic Combinatorics, 1, 2006.

[21] Jacek Jendrej, Krzysztof Oleszkiewicz, and Jakub O. Wojtaszczyk. On some extensions to the FKN theorem. In Phenomena in high dimensions in geometric analysis, random matrices, and computational geometry, 2012.

[22] J. Kahn, G. Kalai, and N. Linial. The influence of variables on Boolean functions. In Proceedings of the 29th Annual Symposium on Foundations of Computer Science, SFCS '88, pages 68–80, Washington, DC, USA, 1988. IEEE Computer Society.

[23] Gil Kalai. A Fourier-theoretic perspective for the Condorcet paradox and Arrow's theorem. Discussion Paper Series dp280, The Center for the Study of Rationality, Hebrew University, Jerusalem, November 2001.

[24] Subhash Khot and Assaf Naor. Nonembeddability theorems via Fourier analysis. In FOCS, pages 101–112, 2005.

[25] Subhash Khot and Oded Regev. Vertex cover might be hard to approximate to within $2 - \varepsilon$. J. Comput. Syst. Sci., 74(3):335–349, 2008.

[26] Subhash Khot and Nisheeth K. Vishnoi. The unique games conjecture, integrality gap for cut problems and embeddability of negative type metrics into $\ell_1$. In FOCS, pages 53–62, 2005.

[27] Guy Kindler. Property Testing, PCP, and Juntas. PhD thesis, Tel-Aviv University, 2002.

[28] Guy Kindler and Ryan O'Donnell. Gaussian noise sensitivity and Fourier tails. In IEEE Conference on Computational Complexity, pages 137–147, 2012.

[29] Guy Kindler and Shmuel Safra. Noise-resistant Boolean functions are juntas. 2003.


[30] Robert Krauthgamer and Yuval Rabani. Improved lower bounds for embeddings into $\ell_1$. SIAM J. Comput., 38(6):2487–2498, 2009.

[31] Ashley Montanaro and Tobias J. Osborne. Quantum Boolean functions, 2009.

[32] Elchanan Mossel, Ryan O'Donnell, and Krzysztof Oleszkiewicz. Noise stability of functions with low influences: invariance and optimality. CoRR, abs/math/0503503, 2005.

[33] Ryan O'Donnell and Rocco A. Servedio. Learning monotone decision trees in polynomial time. SIAM J. Comput., 37(3):827–844, 2007.

[34] Dana Ron, Ronitt Rubinfeld, Muli Safra, and Omri Weinstein. Approximating the influence of monotone Boolean functions in $O(\sqrt{n})$ query complexity. In APPROX-RANDOM, pages 664–675, 2011.

[35] Sushant Sachdeva and Madhur Tulsiani. Cuts in Cartesian products of graphs. CoRR, abs/1105.3383, 2011.

[36] Jakub Onufry Wojtaszczyk. Sums of independent variables approximating a Boolean function. Submitted, 2010.