
Learning with Memory and Communication Constraints

Jacob Steinhardt*
Stanford University
[email protected]

July 30, 2015

*with John Duchi, Gregory Valiant, and Stefan Wager

Motivation

Computational constraints are becoming the bottleneck in many systems.

There is not yet a good theory of computationally bounded statistics.

Study the sample complexity of resource-constrained learning algorithms.

(Cover, 1969; Hellman & Cover, 1970; Ben-David & Dichterman, 1998; Balcan et al., 2012; Berthet & Rigollet, 2013; Chandrasekaran & Jordan, 2013; Duchi, Jordan, & Wainwright, 2013; Zhang et al., 2013; Zhang, Wainwright, & Jordan, 2014; Christiano, 2014; Daniely, Linial, & Shalev-Shwartz, 2014; Garg, Ma, & Nguyen, 2014; Shamir, 2014; Braverman et al., 2015; S. & Duchi, 2015; S., Valiant, & Wager, 2015)

This work: memory, communication.

1 Memory, Communication, and Statistical Queries

2 Memory-Constrained Sparse Regression

Setting

Assume: a polynomial number of i.i.d. samples (x, ℓ(x)) ∈ X × {−1, +1}, with the labeling function ℓ in some concept class F.

COM(b): each sample is held by a separate party; each party can interactively broadcast up to b bits.

COM(b, k): each party gets k samples (instead of 1).

MEM(b): access the data in a stream, storing at most b bits of state.

Relate both classes to the well-studied statistical query model:

SQ: can query E[ψ(x, ℓ(x))] for any function ψ : X × {±1} → [−1, 1]; the answer is accurate to tolerance τ = 1/poly(n).
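
To make the SQ model concrete, here is a minimal sketch (not from the talk) of simulating a statistical-query oracle from i.i.d. samples: answer each query with an empirical mean, which by Hoeffding's inequality is within roughly 1/√n of E[ψ(x, ℓ(x))] with high probability. The helper names and the toy concept below are illustrative assumptions.

```python
import random

def sq_oracle(samples, psi):
    """Answer a statistical query with the empirical mean of psi over the samples.

    With n samples the answer is within roughly 1/sqrt(n) of E[psi(x, label(x))]
    with high probability, so tolerance 1/poly(n) is reachable from polynomially
    many samples.
    """
    return sum(psi(x, y) for (x, y) in samples) / len(samples)

def draw_samples(n, dim=5):
    """Toy data source: x uniform in {0,1}^dim, label determined by x[0]."""
    samples = []
    for _ in range(n):
        x = tuple(random.randint(0, 1) for _ in range(dim))
        y = +1 if x[0] == 1 else -1
        samples.append((x, y))
    return samples

if __name__ == "__main__":
    data = draw_samples(10000)
    # Query E[I[y = +1]]; for this toy concept the true value is 0.5.
    print(sq_oracle(data, lambda x, y: 1.0 if y == +1 else 0.0))
```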

Main Results: Communication

Theorem. If F is learnable with m samples and b bits of communication, then it is learnable with O(bm) statistical queries of tolerance τ = Ω(1/(2^b m)).

Implications of the theorem:

For any constant C > 0, COM(1) = COM(C log(n)) = SQ.

Let PARITY(n) be the problem where x ∼ Uniform({0,1}^n) and ℓ(x) = (−1)^(c⊤x) for an unknown c ∈ {0,1}^n.

Then PARITY(n) ∉ COM(n/4).

In addition, PARITY(n) ∉ COM(n/16, n/4).

Open Problem. Can PARITY(n) be solved with n²/4 bits of memory?
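
As a concrete illustration of the PARITY(n) concept class (the secret vector below is made up for illustration):

```python
import random

def draw_parity_example(c):
    """One labelled example: x ~ Uniform({0,1}^n), label = (-1)^(c . x mod 2)."""
    x = [random.randint(0, 1) for _ in range(len(c))]
    dot = sum(ci * xi for ci, xi in zip(c, x)) % 2
    return x, (-1 if dot == 1 else +1)

if __name__ == "__main__":
    c = [1, 0, 1, 1]              # unknown parity vector (illustrative)
    print([draw_parity_example(c) for _ in range(3)])
```

Without resource constraints, about n such examples pin down c by Gaussian elimination over GF(2); the theorem says this is impossible when each party may broadcast only n/4 bits.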

Main Results: Memory

Theorem. If F can be learned with m statistical queries of tolerance τ, then it can be learned with

O(log|F| · log(m/τ)) bits of state and

O(m log|F| / τ²) samples.

Caveat: the reduction is not computationally efficient.

Implications of the theorem:

Let REP be the class of efficiently representable problems: log|F| = O(n).

Then SQ ∩ REP ⊆ MEM(O(n)).

k-sparse linear regression in d dimensions can be solved with k · polylog(d) bits of state and d · poly(k) samples.

If the covariates are r-sparse, then only poly(r, k) samples are needed.

Reduction: Communication

Goal: reduce a communication-constrained algorithm to an SQ algorithm.

Idea: use queries to estimate the probability that the next communicated bit is 0 or 1.

Consider an intermediate state of the algorithm [figure: five parties, each holding one sample; the first few have broadcast bits c1, c2, ..., and the next bit is still unknown]:

p(c3 = 1 | c1:2 = 10) = p(c1:3 = 101) / p(c1:2 = 10)
                      = E[I[c1:3 = 101]] / p(c1:2 = 10),

where the numerator E[I[c1:3 = 101]] is a statistical query.

Error: τ / p(c1:2).

E[τ / p(c1:2)] = τ · Σ_c [p(c) · 1/p(c)] = 4τ (in general, 2^b τ).

Cumulative error: m 2^b τ ⟹ okay as long as τ ≪ 1/(m 2^b)!
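
A minimal sketch of the reduction's inner loop, assuming we are handed an SQ oracle `sq` and the protocol's (public, deterministic) message function; this is illustrative code, not the talk's exact construction. Each transcript bit is drawn from its conditional distribution, estimated as a ratio of a statistical query to the running probability of the bits already committed, exactly as in the displayed formula.

```python
import random

def simulate_party_message(sq, message_of, public_prefix, b):
    """Sample one party's b-bit broadcast using only statistical queries.

    sq(psi): statistical-query oracle; returns E[psi(x, y)] up to tolerance tau.
    message_of(x, y, public_prefix): the b-bit string this party would broadcast
        if its sample were (x, y), given the public transcript so far.
    """
    sent = ""
    prob_sent = 1.0                      # running estimate of p(bits sent so far)
    for _ in range(b):
        target = sent + "1"
        # SQ: probability a fresh sample would produce `target` as its first bits.
        q = sq(lambda x, y, t=target: 1.0
               if message_of(x, y, public_prefix).startswith(t) else 0.0)
        p1 = min(max(q / max(prob_sent, 1e-12), 0.0), 1.0)
        bit = "1" if random.random() < p1 else "0"
        prob_sent *= p1 if bit == "1" else (1.0 - p1)   # error per step ~ tau / p(prefix)
        sent += bit
    return sent
```

Over m samples with b bits each, the per-step errors accumulate to about m 2^b τ, which is why tolerance τ ≪ 1/(m 2^b) suffices.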

Reduction: Memory

Goal: represent an SQ algorithm in a memory-efficient way.

Step 1: replace queries with threshold queries (i.e., "Is E[ψ] > t?").

The algorithm is now a decision tree of depth m [figure: root query ψ with children ψ0, ψ1 and grandchildren ψ00, ψ01, ψ10, ψ11, branching on the answers 0/1].

Issue: naively remembering the position in the tree requires Θ(m) memory.

Can we somehow identify "important" queries?

Idea: Normalizing Queries

Consider a threshold query (ψ, t) of tolerance τ:

SQ(ψ, t) = 1 if E[ψ] > t + τ,
           0 if E[ψ] < t − τ,
           arbitrary otherwise.

[figure: the interval [t − τ, t + τ] around t on the E[ψ] axis, with the answer forced to 0 to its left and forced to 1 to its right]

Call (ψ, t, τ) "good" if at least one of the answers 0, 1 narrows down F by a factor of 1/2.

Consider (ψ, t − τ/2, τ/2) and (ψ, t + τ/2, τ/2). At least one must be good (one of the two sets {f : E_f[ψ] < t}, {f : E_f[ψ] ≥ t} contains at most half of F, and the corresponding shifted query has an answer that pins E[ψ] to that side).

So we can always normalize queries to be good!

Compression Scheme

Normalize all queries to be good.

At each node, color the child edge whose answer reduces F by at least a factor of 1/2 [figure: the decision tree from before, with one outgoing edge colored at each node].

Note: any path has at most log|F| colored edges.

So we can remember the indices of the colored edges with log|F| · log(m) bits of memory!
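
A minimal sketch of the bookkeeping behind this compression (illustrative, not the talk's exact construction). The query tree is fixed in advance and exactly one child edge of every node is colored, so the current position is determined by the depths at which the colored edge was taken, and the counting argument above says any path contains at most log|F| such depths.

```python
def encode_path(answers, colored_child):
    """Compress a root-to-node path of a known decision tree.

    answers: the 0/1 query answers along the path (length <= m).
    colored_child(depth, prefix): which child (0 or 1) is colored at the node
        reached by `prefix`; known in advance because the tree is fixed.

    Returns the depths where the colored edge was taken; each fits in log2(m) bits.
    """
    colored_depths, prefix = [], []
    for depth, a in enumerate(answers):
        if a == colored_child(depth, tuple(prefix)):
            colored_depths.append(depth)
        prefix.append(a)
    return colored_depths

def decode_path(colored_depths, colored_child, length):
    """Invert encode_path: colored edge at stored depths, the other edge elsewhere."""
    answers, prefix = [], []
    for depth in range(length):
        c = colored_child(depth, tuple(prefix))
        a = c if depth in colored_depths else 1 - c
        answers.append(a)
        prefix.append(a)
    return answers
```

For any colored_child function cc, decode_path(encode_path(answers, cc), cc, len(answers)) recovers answers exactly, so storing only the colored depths suffices to resume the algorithm.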

Summary

COM → SQ: simulate the conditional probabilities of messages with statistical queries.

SQ → MEM: normalize queries, store a compressed representation of the decision path.

Next: study sparse regression in more detail.

1 Memory, Communication, and Statistical Queries

2 Memory-Constrained Sparse Regression

Setting

Sparse linear regression in R^d:

Y^(i) = ⟨w*, X^(i)⟩ + ε^(i),    ‖w*‖₀ = k,    k ≪ d.

Memory constraint:

(X^(i), Y^(i)) observed as a read-only stream.

Only b bits of state Z^(i) may be kept between successive observations.
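
For concreteness, a toy instance of this streaming setup (the dimensions, sparsity, and noise level below are arbitrary illustrative choices):

```python
import numpy as np

d, k, noise = 1000, 5, 0.1
rng = np.random.default_rng(0)

# k-sparse ground truth w*
w_star = np.zeros(d)
support = rng.choice(d, size=k, replace=False)
w_star[support] = rng.choice([-1.0, 1.0], size=k)

def next_observation():
    """One element (X^(i), Y^(i)) of the read-only stream."""
    x = rng.standard_normal(d)
    y = x @ w_star + noise * rng.standard_normal()
    return x, y

# A memory-bounded learner may keep only b bits of state between calls to
# next_observation(); it never revisits past (x, y) pairs.
```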

Problem Statement

How much data n is needed to obtain an estimator ŵ with E[‖ŵ − w*‖₂²] ≤ ε?

Classical case (no memory constraint):

Theorem (Wainwright, 2009). log(d) ≲ n ≲ (k/ε) · log(d).

Achievable with O(d) memory (Agarwal et al., 2012; S., Wager, & Liang, 2015).

With a memory constraint of b bits:

Theorem (S. & Duchi, 2015). d/b ≲ n ≲ (k/ε²) · (d/b).

[Note: up to log factors; assumes k log(d) ≪ b ≤ d.]

Exponential increase if b ≪ d!
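
To get a feel for the gap, a rough back-of-the-envelope comparison, ignoring constants and log factors as the theorems do; the specific numbers are purely illustrative.

```python
import math

d, k, eps, b = 10**6, 10, 0.1, 10**4           # illustrative values

n_classical = (k / eps) * math.log(d)          # ~ (k/eps) * log d
n_memory_bounded = (k / eps**2) * (d / b)      # ~ (k/eps^2) * d/b

print(f"classical:      n ~ {n_classical:.0f}")       # about 1.4e3
print(f"memory-bounded: n ~ {n_memory_bounded:.0f}")  # about 1e5
```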

Proof Overview

Lower bound: information-theoretic; strong data-processing inequality.

[Diagram: Markov chain W* → (X, Y) → Z, where X has d coordinates, Y is a single label, and Z is limited to b bits.]

Main challenge: dependence between X and Y.

Upper bound: count-min sketch + ℓ₁-regularized dual averaging; more regularization → easier sketching problem.

Lower Bound Construction

Split coordinates into k blocks of size d/k.

w* in each block: single non-zero coordinate J, equal to ±δ with equal probability.

Direct sum argument: reduce to k = 1.

[Diagram: one block of d/k coordinates, with the non-zero coordinate (here J = 2) highlighted.]

Estimation to testing:

E[‖w* − ŵ‖₂²] ≥ (δ²/2) · P[Ĵ ≠ J]

Looking ahead: bound the KL divergence between P_j and the base distribution P_0.
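The estimation-to-testing inequality above follows from a short calculation. Here is a sketch of the standard argument, under the convention (implicit on the slide) that Ĵ is taken to be the index of the largest |ŵ_j| within the block:

% Sketch: take \hat{J} = \arg\max_j |\hat{w}_j|. On the event \hat{J} \neq J,
% write a = |\hat{w}_J|, so that |\hat{w}_{\hat{J}}| \ge a, and (reverse triangle inequality)
\|\hat{w} - w^*\|_2^2
  \;\ge\; (\hat{w}_J - w^*_J)^2 + \hat{w}_{\hat{J}}^2
  \;\ge\; (\delta - a)^2 + a^2
  \;\ge\; \tfrac{1}{2}\delta^2,
% since (\delta - a)^2 + a^2 is minimized at a = \delta/2. Taking expectations,
\mathbb{E}\bigl[\|\hat{w} - w^*\|_2^2\bigr] \;\ge\; \tfrac{\delta^2}{2}\,\mathbb{P}[\hat{J} \neq J].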

Some Information Theory

Let X ∼ Uniform({±1}^d).

Let P_j(Z^(1:n)) be the distribution conditioned on J = j.

Let P_0(Z^(1:n)) be the distribution in which Y is independent of X.

Assouad's method:

P[Ĵ ≠ J] ≥ 1/2 − sqrt( (1/d) Σ_{j=1}^d D_kl( P_0(Z^(1:n)) || P_j(Z^(1:n)) ) )

[Diagram: the coordinate X_j takes the values −1 and +1.]

Key fact: (Y, X_j) is independent of X_{¬j} under P_j.

Intuition: D_kl(P_0 || P_j) is small unless Z stores information about X_j; Z would need to store the majority of the X_j to make the average D_kl large.
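To put a number on this intuition: each observation carries only O(δ²) nats of information about X_j. The slides do not spell out the joint law of (X, Y), so the Bernoulli model below is purely a hypothetical stand-in: suppose that under P_j the label satisfies Y = X_j with probability (1 + δ)/2, while under P_0 it is a fair coin independent of X. The per-observation KL divergence is then ≈ δ²/2, consistent with the 4δ² factor on the next slide.

# Hypothetical illustration (the talk does not specify the (X, Y) distribution):
# under P_j, Y = X_j with probability (1+δ)/2; under P_0, Y is a fair ±1 coin.
# The per-observation KL divergence is Θ(δ²).
import math

def kl_bernoulli(p: float, q: float) -> float:
    """KL divergence D(Ber(p) || Ber(q)) in nats."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

for delta in (0.3, 0.1, 0.03, 0.01):
    kl = kl_bernoulli(0.5, (1 + delta) / 2)   # D_kl(P_0(Y | X_j) || P_j(Y | X_j))
    print(f"δ = {delta:5.2f}:  KL = {kl:.6f}   δ²/2 = {delta**2 / 2:.6f}")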

Strong Data-Processing Inequality

Focus on a single index i, so Z = Z^(i), with the past messages Z̄ = Z^(1:i−1) fixed to z̄.

Proposition. For any z̄,

D_kl( P_0(Z | z̄) || P_j(Z | z̄) ) ≤ 4δ² · I(X_j; Z | Y, Z̄ = z̄)
                                 ≤ 4δ² · I(X_j; Z, Y | Z̄ = z̄)

Plug into Assouad:

(1/d) Σ_{j=1}^d D_kl(P_0 || P_j) ≤ (4δ²/d) Σ_{j=1}^d I(X_j; Z, Y | Z̄)
                                 ≤ (4δ²/d) · I(X; Z, Y | Z̄),   where I(X; Z, Y | Z̄) ≤ b + O(1).

Only get 4δ²b/d bits per round!
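Putting the pieces together for a single block (k = 1); this is a sketch only, with constants and the direct-sum bookkeeping over the k blocks elided:

% Chain rule for KL over the n rounds, then the per-round bound above:
\frac{1}{d}\sum_{j=1}^d D_{\mathrm{kl}}\bigl(P_0(Z^{(1:n)}) \,\|\, P_j(Z^{(1:n)})\bigr)
  \;\lesssim\; \frac{4\delta^2 (b + O(1))\, n}{d}.
% Assouad's method then gives
\mathbb{P}[\hat{J} \neq J] \;\ge\; \frac12 - \sqrt{\frac{4\delta^2 b\, n}{d}},
% while estimation-to-testing with \delta^2 \asymp \varepsilon says that any estimator with
% E\|\hat{w} - w^*\|_2^2 \le \varepsilon keeps \mathbb{P}[\hat{J} \neq J] below a small constant.
% The two displays are compatible only if
n \;\gtrsim\; \frac{d}{\delta^2 b} \;\asymp\; \frac{d}{\varepsilon b},
% and the direct-sum argument over the k blocks yields the stated k d/(b\varepsilon) lower bound.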

Upper Bound

Solve the ℓ₁-regularized dual averaging problem (Xiao, 2010), with λ ≫ 1:

w^(i) = argmin_w { ⟨θ^(i), w⟩ + λ√n ‖w‖₁ + (1/(2η)) ‖w‖₂² },   θ^(i) = Σ_{i′=1}^{i−1} x^(i′) (y^(i′) − ⟨w^(i′), x^(i′)⟩).

Hard part: determine the support of w^(i).

Need to distinguish |θ_j| ≥ λ√n (signal) from |θ_j| ≈ √n (noise).

Can use a count-min sketch; memory usage ≈ d log(d)/λ².

⇒ Regularization decreases computation; seen before in the ℓ₂ case (Shalev-Shwartz & Zhang, 2013; Bruer et al., 2014).
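The following is a minimal, schematic sketch of ℓ₁-regularized dual averaging for sparse regression, not the memory-bounded algorithm of the talk: θ is kept as a dense vector here, whereas the talk replaces it with a count-min sketch so that only coordinates with |θ_j| ≳ λ√n (the candidate support) are tracked. Signs follow the usual RDA convention (θ accumulates gradients), which may differ from the slide's display by a sign, and all parameter values are illustrative.

import numpy as np

def l1_rda(X, y, lam, eta):
    """One streaming pass of ℓ1-regularized dual averaging over (X, y)."""
    n, d = X.shape
    theta = np.zeros(d)        # running sum of gradients (dense here; sketched in the talk)
    w = np.zeros(d)
    thresh = lam * np.sqrt(n)  # ℓ1 threshold λ√n
    for i in range(n):
        g = X[i] * (X[i] @ w - y[i])   # gradient of ½(⟨w, x⟩ − y)² at the current w
        theta += g
        # Closed-form RDA step: coordinates with |θ_j| ≤ λ√n are set exactly to zero.
        # This is the support-identification step that the count-min sketch approximates.
        w = -eta * np.sign(theta) * np.maximum(np.abs(theta) - thresh, 0.0)
    return w

# Tiny synthetic check with hypothetical parameters.
rng = np.random.default_rng(0)
n, d, k = 4000, 400, 4
w_star = np.zeros(d)
w_star[:k] = 1.0
X = rng.choice([-1.0, 1.0], size=(n, d))
y = X @ w_star + 0.1 * rng.standard_normal(n)
w_hat = l1_rda(X, y, lam=5.0, eta=1.0 / n)
print("largest coordinates of w_hat:", np.argsort(-np.abs(w_hat))[:k])  # ideally the true support
print("true support:", np.nonzero(w_star)[0])

The large λ plays the same role as on the slide: the bigger the threshold λ√n, the fewer coordinates ever leave zero, so the less state an approximate (sketched) version of θ has to resolve accurately.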

Discussion

Summary:

Upper and lower bounds on memory-constrained regression

Lower bound: extend the data-processing inequality to handle covariates

Upper bound: use an ℓ₁ regularizer to reduce to sketching

Future work:

Close the gap (kd/(bε) vs. kd/(bε²))

Weaken the assumptions in the upper bound