Optimal Quantum Sample Complexity of Learning Algorithms

Srinivasan Arunachalam
(Joint work with Ronald de Wolf)

Machine learning

Classical machine learning
Grand goal: enable AI systems to improve themselves
Practical goal: learn "something" from given data
Recent success: deep learning is extremely good at image recognition, natural language processing, even the game of Go
Why the recent interest? Flood of available data, increasing computational power, growing progress in algorithms

Quantum machine learning
What can quantum computing do for machine learning?
The learner will be quantum, the data may be quantum
Some examples are known of reduction in time complexity:
clustering (Aïmeur et al. '06)
principal component analysis (Lloyd et al. '13)
perceptron learning (Wiebe et al. '16)
recommendation systems (Kerenidis & Prakash '16)

Probably Approximately Correct (PAC) learning

Basic definitions
Concept class C: collection of Boolean functions on n bits (Known)
Target concept c: some function c ∈ C (Unknown)
Distribution D : {0,1}^n → [0,1] (Unknown)
Labeled example for c ∈ C: (x, c(x)) where x ∼ D

Formally: A theory of the learnable (Valiant'84)
Using i.i.d. labeled examples, a learner for C should output a hypothesis h that is Probably Approximately Correct
Error of h w.r.t. the target c: err_D(c, h) = Pr_{x∼D}[c(x) ≠ h(x)]
An algorithm (ε, δ)-PAC-learns C if:
∀c ∈ C ∀D : Pr[err_D(c, h) ≤ ε] ≥ 1 − δ
("err_D(c, h) ≤ ε" is the Approximately Correct part; "with probability ≥ 1 − δ" is the Probably part)
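
A small Python sketch (my own illustration, not from the talk) of how this definition can be checked empirically: err_D(c, h) is estimated by sampling from D, and the (ε, δ) criterion is the fraction of independent runs in which the learner's hypothesis ends up with error at most ε. The toy "dictator" concept class and the ERM learner below are hypothetical choices made only for the demo.

```python
import random

def err_D(c, h, sample_D, trials=10_000):
    """Monte-Carlo estimate of err_D(c, h) = Pr_{x ~ D}[c(x) != h(x)]."""
    return sum(c(x) != h(x) for x in (sample_D() for _ in range(trials))) / trials

def pac_success_rate(learner, c, sample_D, T, eps, runs=100):
    """Fraction of runs in which the learner's hypothesis has error <= eps.
    An (eps, delta)-PAC learner must achieve >= 1 - delta for every c in C and every D."""
    good = 0
    for _ in range(runs):
        examples = [(x, c(x)) for x in (sample_D() for _ in range(T))]
        h = learner(examples)
        if err_D(c, h, sample_D) <= eps:
            good += 1
    return good / runs

# Toy usage: n = 5 bits, D uniform, target is the first bit, learner = ERM over dictator functions
n = 5
sample_D = lambda: tuple(random.randint(0, 1) for _ in range(n))
c = lambda x: x[0]

def erm_dictator(examples):
    # pick the bit index with the fewest disagreements on the sample (empirical risk minimization)
    best = min(range(n), key=lambda i: sum(x[i] != y for x, y in examples))
    return lambda x, i=best: x[i]

print(pac_success_rate(erm_dictator, c, sample_D, T=30, eps=0.1))
```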

Complexity of learning

Recap
Concept: some function c : {0,1}^n → {0,1}
Concept class C: set of concepts
An algorithm (ε, δ)-PAC-learns C if:
∀c ∈ C ∀D : Pr[err_D(c, h) ≤ ε] ≥ 1 − δ

How to measure the efficiency of the learning algorithm?
Sample complexity: number of labeled examples used by the learner
Time complexity: number of time-steps used by the learner

This talk: focus on sample complexity
No need for complexity-theoretic assumptions
No need to worry about the format of the hypothesis h

Vapnik and Chervonenkis (VC) dimension

VC dimension of C ⊆ {c : {0,1}^n → {0,1}}
Let M be the |C| × 2^n Boolean matrix whose c-th row is the truth table of concept c : {0,1}^n → {0,1}
VC-dim(C): the largest d such that some |C| × d rectangle in M contains all of {0,1}^d among its rows
These d column indices are said to be shattered by C

Two examples (each concept is given by its truth table on the 2^n = 4 inputs):

Table: VC-dim(C) = 2
Concepts   Truth table
c1         0 1 0 1
c2         0 1 1 0
c3         1 0 0 1
c4         1 0 1 0
c5         1 1 0 1
c6         0 1 1 1
c7         0 0 1 1
c8         0 1 0 0
c9         1 1 1 1

Table: VC-dim(C) = 3
Concepts   Truth table
c1         0 1 1 0
c2         1 0 0 1
c3         0 0 0 0
c4         1 1 0 1
c5         1 0 1 0
c6         0 1 1 1
c7         0 0 1 1
c8         0 1 0 1
c9         0 1 0 0
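
The shattering condition can be checked mechanically for small classes. The sketch below (my own Python illustration, not from the talk) brute-forces VC-dim by testing every set of columns; it is exponential in n and only meant for toy examples like the two tables above.

```python
from itertools import combinations

def vc_dim(rows):
    """Brute-force VC dimension of a finite concept class.
    `rows` is a list of truth tables (tuples of 0/1), one per concept,
    all over the same set of column indices (inputs)."""
    num_cols = len(rows[0])
    best = 0
    for d in range(1, num_cols + 1):
        for cols in combinations(range(num_cols), d):
            patterns = {tuple(row[i] for i in cols) for row in rows}
            if len(patterns) == 2 ** d:      # all of {0,1}^d appears: these columns are shattered
                best = d
                break
        else:
            break                            # no d-set is shattered, so no larger set is either
    return best

# The two tables above:
C2 = [(0,1,0,1),(0,1,1,0),(1,0,0,1),(1,0,1,0),(1,1,0,1),(0,1,1,1),(0,0,1,1),(0,1,0,0),(1,1,1,1)]
C3 = [(0,1,1,0),(1,0,0,1),(0,0,0,0),(1,1,0,1),(1,0,1,0),(0,1,1,1),(0,0,1,1),(0,1,0,1),(0,1,0,0)]
print(vc_dim(C2), vc_dim(C3))   # expected: 2 3
```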

VC dimension characterizes PAC sample complexity

Recap: VC-dim(C) is the largest d such that some set of d inputs is shattered by C

Fundamental theorem of PAC learning
Suppose VC-dim(C) = d
Blumer-Ehrenfeucht-Haussler-Warmuth'86: every (ε, δ)-PAC learner for C needs Ω(d/ε + log(1/δ)/ε) examples
Hanneke'16: there exists an (ε, δ)-PAC learner for C using O(d/ε + log(1/δ)/ε) examples
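
To get a feel for how this bound scales, here is a tiny Python sketch; the Θ-notation hides constants that the theorem does not fix, so the constant 1 below is an arbitrary illustrative choice.

```python
import math

def pac_examples(d, eps, delta):
    """Theta(d/eps + log(1/delta)/eps) with the hidden constant set to 1 (illustration only)."""
    return math.ceil(d / eps + math.log(1 / delta) / eps)

# d = 10: halving eps roughly doubles the count, shrinking delta only adds a log(1/delta)/eps term
for eps, delta in [(0.1, 0.01), (0.05, 0.01), (0.05, 0.0001)]:
    print(eps, delta, pac_examples(10, eps, delta))
```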

Quantum PAC learning

Do quantum computers provide an advantage for PAC learning?

Quantum data
Bshouty-Jackson'95: a quantum example is the superposition
|E_{c,D}⟩ = Σ_{x ∈ {0,1}^n} √(D(x)) |x, c(x)⟩
Measuring this (n+1)-qubit state gives a classical example, so quantum examples are at least as powerful as classical examples
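
As a concrete picture, the following NumPy sketch (my own illustration) writes |E_{c,D}⟩ out as a state vector over the 2^{n+1} basis states |x, b⟩ and shows that a computational-basis measurement reproduces an ordinary labeled example (x, c(x)) with x ∼ D.

```python
import numpy as np

def quantum_example_state(n, c, D):
    """State vector of |E_{c,D}> = sum_x sqrt(D(x)) |x, c(x)>;
    the basis state |x, b> is stored at integer index 2*x + b."""
    psi = np.zeros(2 ** (n + 1))
    for x in range(2 ** n):
        psi[2 * x + c(x)] = np.sqrt(D(x))
    return psi

def measure(psi):
    """Computational-basis measurement, returning a classical labeled example (x, b)."""
    idx = np.random.choice(len(psi), p=psi ** 2)
    return idx >> 1, idx & 1

# Toy usage: n = 3, uniform D, c(x) = parity of x
n = 3
D = lambda x: 1 / 2 ** n
c = lambda x: bin(x).count("1") % 2
psi = quantum_example_state(n, c, D)
print([measure(psi) for _ in range(5)])   # i.i.d. classical examples (x, c(x)) with x ~ D
```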

Quantum PAC learning

Quantum data
Quantum example: |E_{c,D}⟩ = Σ_{x ∈ {0,1}^n} √(D(x)) |x, c(x)⟩
Quantum examples are at least as powerful as classical examples

Quantum is indeed more powerful for learning! (for a fixed distribution)

Learning the class of linear functions under uniform D:
Classical: Ω(n) classical examples needed
Quantum: O(1) quantum examples suffice (Bernstein-Vazirani'93)

Learning DNF under uniform D:
Classical: best known upper bound is quasi-polynomial time (Verbeurgt'90)
Quantum: polynomial time (Bshouty-Jackson'95)

But in the PAC model, the learner has to succeed for all D!
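
The linear-functions claim can be simulated directly. The NumPy sketch below (my own illustration, not code from the talk) prepares the quantum example for c(x) = a·x mod 2 under the uniform distribution and applies a Hadamard to every qubit: the measurement outcome is (a, 1) with probability 1/2, so a constant expected number of quantum examples reveals a, whereas each classical example gives only one random linear equation and Ω(n) of them are needed.

```python
import numpy as np

def hadamard_all(psi, num_qubits):
    """Apply H tensored num_qubits times: out[z] = 2^{-m/2} * sum_w (-1)^{popcount(w & z)} psi[w]."""
    H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
    U = np.array([[1.0]])
    for _ in range(num_qubits):
        U = np.kron(U, H)
    return U @ psi

n = 6
a = 0b101101                                  # hidden string (arbitrary choice for the demo)
c = lambda x: bin(x & a).count("1") % 2       # linear function c(x) = a.x mod 2

psi = np.zeros(2 ** (n + 1))                  # quantum example under uniform D, index 2*x + c(x)
for x in range(2 ** n):
    psi[2 * x + c(x)] = 1 / np.sqrt(2 ** n)

probs = hadamard_all(psi, n + 1) ** 2
for idx in np.nonzero(probs > 1e-9)[0]:
    i = int(idx)
    print(f"measured string {i >> 1:0{n}b}, label bit {i & 1}, prob {probs[i]:.2f}")
# Two outcomes, each with probability 0.50: the all-zero string (label 0) and a itself (label 1),
# so a constant expected number of quantum examples identifies a.
```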

Quantum sample complexity

Quantum upper bound
The classical upper bound O(d/ε + log(1/δ)/ε) carries over to quantum

Best known quantum lower bounds
Atici & Servedio'04: lower bound Ω(√d/ε + d + log(1/δ)/ε)
Zhang'10: improved the first term to d^{1−η}/ε for all η > 0

Quantum sample complexity = Classical sample complexity

Our result: tight lower bound
We show: Ω(d/ε + log(1/δ)/ε) quantum examples are necessary

Two proof approaches
Information theory: conceptually simple, nearly-tight bounds
Optimal measurement: tight bounds, some messy calculations

Proof approach: Pretty Good Measurement

State identification: ensemble E = {(p_z, |ψ_z⟩)}_{z∈[m]}
Given state |ψ_z⟩ ∈ E with probability p_z. Goal: identify z
The optimal measurement could be quite complicated, but we can always use the Pretty Good Measurement (PGM)
Crucial property: if P_opt is the success probability of the optimal measurement, then P_opt ≥ P_pgm ≥ P_opt²  (Barnum-Knill'02)

How does learning relate to identification?
Quantum PAC: given |ψ_c⟩ = |E_{c,D}⟩^⊗T, learn c approximately
Goal: show T ≥ d/ε, where d = VC-dim(C)
Suppose {s_0, ..., s_d} is shattered by C. Fix a nasty distribution D:
D(s_0) = 1 − 16ε, and D(s_i) = 16ε/d on s_1, ..., s_d
Let E : {0,1}^k → {0,1}^d be a good error-correcting code s.t. k ≥ d/4 and d_H(E(y), E(z)) ≥ d/8 for distinct y, z
Pick concepts {c_z}_{z∈{0,1}^k} ⊆ C: c_z(s_0) = 0, c_z(s_i) = E(z)_i for all i
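
For concreteness, here is a small NumPy sketch (my own illustration, not from the paper) of the Pretty Good Measurement for a finite pure-state ensemble: ρ = Σ_z p_z |ψ_z⟩⟨ψ_z|, the PGM operators are M_z = p_z ρ^{−1/2} |ψ_z⟩⟨ψ_z| ρ^{−1/2} with the inverse square root taken on ρ's support, and its average success probability can be compared against the two-state optimum to see the Barnum-Knill sandwich P_opt ≥ P_pgm ≥ P_opt².

```python
import numpy as np

def pgm_success_probability(states, probs):
    """Average success probability of the PGM for the pure-state ensemble {(probs[z], states[z])}:
    rho = sum_z p_z |psi_z><psi_z|,  M_z = p_z rho^{-1/2} |psi_z><psi_z| rho^{-1/2},
    P_pgm = sum_z p_z <psi_z| M_z |psi_z> = sum_z p_z^2 (<psi_z| rho^{-1/2} |psi_z>)^2,
    with the inverse square root taken on the support of rho."""
    dim = len(states[0])
    rho = sum(p * np.outer(v, v.conj()) for p, v in zip(probs, states))
    evals, evecs = np.linalg.eigh(rho)
    inv_sqrt = np.zeros((dim, dim), dtype=complex)
    for lam, u in zip(evals, evecs.T):
        if lam > 1e-12:
            inv_sqrt += np.outer(u, u.conj()) / np.sqrt(lam)
    return sum(p ** 2 * np.real(v.conj() @ inv_sqrt @ v) ** 2 for p, v in zip(probs, states))

# Toy ensemble: two non-orthogonal qubit states with unequal priors
psi0 = np.array([1.0, 0.0], dtype=complex)
psi1 = np.array([np.cos(0.3), np.sin(0.3)], dtype=complex)
p_pgm = pgm_success_probability([psi0, psi1], [0.8, 0.2])
# Helstrom optimum for two pure states with priors p, 1-p: (1 + sqrt(1 - 4 p (1-p) |<psi0|psi1>|^2)) / 2
p_opt = (1 + np.sqrt(1 - 4 * 0.8 * 0.2 * abs(psi0.conj() @ psi1) ** 2)) / 2
print(p_pgm, p_opt)   # Barnum-Knill sandwich: p_opt >= p_pgm >= p_opt**2
```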

Pick concepts {c_z} ⊆ C: c_z(s_0) = 0, c_z(s_i) = E(z)_i for all i

Suppose VC(C) = d + 1 and {s_0, ..., s_d} is shattered by C, i.e., the |C| × (d+1) rectangle of columns s_0, ..., s_d contains all of {0,1}^{d+1}

Concepts      s_0  s_1  ...  s_{d-1}  s_d   ... (remaining columns)
c_1            0    0   ...     0      0    ...
c_2            0    0   ...     1      0    ...
c_3            0    0   ...     1      1    ...
...
c_{2^d − 1}    0    1   ...     1      0    ...
c_{2^d}        0    1   ...     1      1    ...
c_{2^d + 1}    1    0   ...     0      1    ...
...
c_{2^{d+1}}    1    1   ...     1      1    ...
...

(The first 2^d rows, c_1, ..., c_{2^d}, are exactly the concepts with c(s_0) = 0.)

Among c_1, ..., c_{2^d}, pick the 2^k concepts that correspond to codewords of E : {0,1}^k → {0,1}^d on s_1, ..., s_d
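
The talk only needs the existence of such a code E; a Gilbert-Varshamov-type counting argument suffices for 2^{d/4} codewords at pairwise distance d/8. As a purely illustrative sketch (my own, with arbitrarily chosen parameters, not the construction in the paper), a greedy random search already finds codes of this flavour for small d:

```python
import random

def greedy_code(d, min_dist, target, tries=200_000, seed=1):
    """Greedily collect up to `target` strings in {0,1}^d with pairwise Hamming
    distance >= min_dist (an illustration of the kind of code the proof needs)."""
    rng = random.Random(seed)
    code = []
    for _ in range(tries):
        w = rng.getrandbits(d)
        if all(bin(w ^ c).count("1") >= min_dist for c in code):
            code.append(w)
            if len(code) == target:
                break
    return code

d = 32
code = greedy_code(d, min_dist=d // 8, target=2 ** (d // 4))   # want 2^(d/4) = 256 codewords
print(len(code))   # with distance d/8 = 4 this easily reaches 256 for d = 32
```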

Given |ψ_{c_z}⟩ = |E_{c_z,D}⟩^⊗T, learn c_z approximately. Show T ≥ d/ε

Learning c_z approximately (w.r.t. D) is equivalent to identifying z!
Reason: if err_D(c_z, h) ≤ ε, then h can disagree with c_z on at most d/16 of the points s_1, ..., s_d (each carries weight 16ε/d), while distinct codewords differ in at least d/8 positions, so decoding h restricted to s_1, ..., s_d to the nearest codeword recovers z.

Page 77: Optimal Quantum Sample Complexity of Learning Algorithms · 1/18/2017  · 1/ 23 Optimal Quantum Sample Complexity of Learning Algorithms Srinivasan Arunachalam (Joint work with Ronald

19/ 23

Sample complexity lower bound via PGM

Recap

Learning cz approximately (w.r.t. D) is equivalent to identifying z!

If the sample complexity is T, then there is a good learner (i.e., a measurement) that identifies z from |ψ_{cz}⟩ = |E_{cz,D}⟩^{⊗T} with probability ≥ 1 − δ

Goal: Show T = Ω(d/ε)

Analysis of the PGM

For the ensemble {|ψ_{cz}⟩ : z ∈ {0,1}^k} with uniform probabilities pz = 1/2^k, we have P_pgm ≥ P_opt² ≥ (1 − δ)²

P_pgm ≤ · · · 4-page calculation · · · ≤ exp(T²ε²/d + √(Tdε) − d − Tε)

This implies T = Ω(d/ε) (a sketch of this last step follows below)
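For intuition, here is a rough sketch of how the last implication follows, assuming the displayed upper bound on P_pgm (constants not optimized). Plugging T = c·d/ε for a constant c into the exponent gives

\[
  \frac{T^2\varepsilon^2}{d} + \sqrt{Td\varepsilon} - d - T\varepsilon
  \;=\; \bigl(c^2 + \sqrt{c} - 1 - c\bigr)\, d,
\]

which is −Ω(d) once c is a sufficiently small constant. But then (1 − δ)² ≤ P_pgm ≤ exp(−Ω(d)) is impossible for constant δ and large d, so T must be Ω(d/ε).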



21/ 23

Agnostic learning

Let's get real!

So far, examples were generated according to a target concept c ∈ C

In realistic situations we could have "noisy" examples for the target concept, or maybe no fixed target concept even exists

How do we model this? Agnostic learning

An unknown distribution D on pairs (x, ℓ) generates the examples

Suppose the "best" concept in C has error OPT = min_{c∈C} Pr_{(x,ℓ)∼D}[c(x) ≠ ℓ]

Goal of the agnostic learner: output h ∈ C with error ≤ OPT + ε (a sketch of the generic such learner follows below)

What about sample complexity?

Classical sample complexity: Θ(d/ε² + log(1/δ)/ε²)  [VC74, Tal94]

No quantum bounds were known before (unlike for the PAC model)

We show that quantum examples do not reduce the sample complexity
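For context, the classical upper bound is achieved by the generic agnostic learner, empirical risk minimization: draw T labeled examples and output the concept in C with the smallest empirical error. A minimal sketch (the concept class and the sampler below are toy stand-ins):

import random

def erm_learner(concepts, sample, T):
    """Empirical risk minimization: return the concept with smallest empirical error on T samples."""
    data = [sample() for _ in range(T)]
    def empirical_error(c):
        return sum(c(x) != label for x, label in data) / T
    return min(concepts, key=empirical_error)

# Toy instance: threshold concepts on {0, ..., 9}; labels of the threshold-6 rule are
# flipped with probability 0.1, so no concept fits D perfectly (the agnostic setting).
concepts = [lambda x, t=t: int(x >= t) for t in range(11)]

def sample():
    x = random.randrange(10)
    label = int(x >= 6)
    if random.random() < 0.1:
        label = 1 - label
    return x, label

h = erm_learner(concepts, sample, T=500)
print([h(x) for x in range(10)])  # typically [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]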


22/ 23

Conclusion and future work

Conclusion

PAC and agnostic: quantum examples are no better than classical ones

We also studied the model with random classification noise and showed that quantum examples are no better than classical there as well

Future work

Quantum machine learning is still young! We don't yet have convincing examples where quantum resources significantly improve machine learning

Theoretically, one could consider more optimistic PAC-like models where the learner need not succeed for every c ∈ C and every D


23/ 23

Buffer 1: Proof approach via information theory

Suppose {s0, . . . , sd} is shattered by C. By definition: ∀ a ∈ {0,1}^d ∃ c ∈ C s.t. c(s0) = 0 and c(si) = ai ∀ i ∈ [d]

Fix a nasty distribution D: D(s0) = 1 − 4ε, D(si) = 4ε/d on {s1, . . . , sd}

A good learner produces a hypothesis h s.t. h(si) = c(si) = ai for ≥ 3/4 of the i's

Think of c as a uniform d-bit string A, approximated by h ∈ {0,1}^d that depends on the examples B = (B1, . . . , BT)

1. I(A : B) ≥ I(A : h(B)) ≥ Ω(d)   [because h ≈ A]
2. I(A : B) ≤ Σ_{i=1}^{T} I(A : Bi) = T · I(A : B1)   [subadditivity]
3. I(A : B1) ≤ 4ε   [because the probability of a useful example is 4ε; a numeric check follows below]

This implies Ω(d) ≤ I(A : B) ≤ 4Tε, hence T = Ω(d/ε)

For analyzing quantum examples, only step 3 changes: I(A : B1) ≤ O(ε log(d/ε)) ⇒ T = Ω(d/(ε log(d/ε)))
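Step 3 can be verified exactly for this distribution: with A a uniform d-bit string and a single classical example B1 = (x, c_A(x)), x ∼ D, the mutual information comes out to exactly 4ε bits. A small sketch of that computation (exact entropies, no sampling; the parameter values are arbitrary):

import math

def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_info_one_example(d, eps):
    # Given A = a, the example (x, c_a(x)) is a bijective function of x,
    # so H(B1 | A = a) = H(D) for every a.
    h_b1_given_a = entropy([1 - 4 * eps] + [4 * eps / d] * d)
    # Averaged over the uniform A, each label bit is unbiased, so the outcomes
    # (s_i, 0) and (s_i, 1) each have probability 2*eps/d.
    h_b1 = entropy([1 - 4 * eps] + [2 * eps / d] * (2 * d))
    return h_b1 - h_b1_given_a

d, eps = 50, 0.01
print(mutual_info_one_example(d, eps), 4 * eps)  # both are 4*eps = 0.04 (up to rounding)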


24/ 23

Buffer 2: Proof approach in detail

Suppose we're given state |ψi⟩ with probability pi, i = 1, . . . , m. Goal: learn i

The optimal measurement could be quite complicated, but we can always use the Pretty Good Measurement

This has POVM operators Mi = pi ρ^{−1/2} |ψi⟩⟨ψi| ρ^{−1/2}, where ρ = Σ_i pi |ψi⟩⟨ψi|

Success probability of the PGM: P_PGM = Σ_i pi Tr(Mi |ψi⟩⟨ψi|)

Crucial property (BK'02): if P_OPT is the success probability of the optimal POVM, then P_OPT ≥ P_PGM ≥ P_OPT²

Let G be the m × m Gram matrix of the vectors √pi |ψi⟩; then P_PGM = Σ_i (√G)(i, i)², where √G is the positive semidefinite square root of G (a numeric cross-check follows below)
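The two expressions for P_PGM above, the POVM definition and the Gram-matrix formula, can be cross-checked numerically on a small random ensemble. The sketch below is only such a sanity check (numpy; the dimension, number of states and priors are arbitrary choices).

import numpy as np

def psd_power(A, power):
    """Matrix power of a PSD Hermitian matrix via eigendecomposition (zero eigenvalues stay zero)."""
    w, U = np.linalg.eigh(A)
    w = np.clip(w, 0, None)
    vals = np.zeros_like(w)
    vals[w > 1e-12] = w[w > 1e-12] ** power
    return (U * vals) @ U.conj().T

rng = np.random.default_rng(0)
m, dim = 4, 3
states = rng.normal(size=(m, dim)) + 1j * rng.normal(size=(m, dim))
states = [v / np.linalg.norm(v) for v in states]
priors = rng.random(m)
priors /= priors.sum()

rho = sum(p * np.outer(v, v.conj()) for p, v in zip(priors, states))
rho_inv_sqrt = psd_power(rho, -0.5)

# P_PGM from the POVM operators M_i = p_i rho^{-1/2} |psi_i><psi_i| rho^{-1/2}
p_pgm_povm = sum(
    p * np.real(np.vdot(v, (p * rho_inv_sqrt @ np.outer(v, v.conj()) @ rho_inv_sqrt) @ v))
    for p, v in zip(priors, states)
)

# P_PGM from the Gram matrix of the vectors sqrt(p_i) |psi_i>
V = np.array([np.sqrt(p) * v for p, v in zip(priors, states)])
G = V.conj() @ V.T                                   # G[i, j] = <phi_i | phi_j>
p_pgm_gram = float(np.sum(np.real(np.diag(psd_power(G, 0.5))) ** 2))

print(p_pgm_povm, p_pgm_gram)                        # the two values agree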


25/ 23

Buffer 3: Analysis of the PGM

For the ensemble {|ψ_{cz}⟩ : z ∈ {0,1}^k} with uniform probabilities pz = 1/2^k, we have P_PGM ≥ (1 − δ)²

Let G be the 2^k × 2^k Gram matrix of the vectors √pz |ψ_{cz}⟩; then P_PGM = Σ_z (√G)(z, z)²

G_{xy} = g(x ⊕ y) for a fixed function g, so G can be diagonalized using the Hadamard transform, and its eigenvalues are 2^k ĝ(s) (a numeric check of this step follows below). This gives

Σ_z (√G)(z, z)² ≤ · · · 4-page calculation · · · ≤ exp(T²ε²/d + √(Tdε) − d − Tε)

This implies T = Ω(d/ε)
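The diagonalization step can also be checked directly on a small example: for any function g on {0,1}^k, the matrix G[x, y] = g(x ⊕ y) is diagonalized by the normalized Hadamard transform, with eigenvalues 2^k·ĝ(s). A minimal numpy sketch (k and g are arbitrary choices):

import numpy as np

k = 3
n = 2 ** k
rng = np.random.default_rng(1)
g = rng.random(n)                                  # an arbitrary function g on {0,1}^k

G = np.array([[g[x ^ y] for y in range(n)] for x in range(n)])

parity = lambda s, x: bin(s & x).count("1") % 2    # inner product s.x over GF(2)
H = np.array([[(-1) ** parity(s, x) for x in range(n)] for s in range(n)]) / np.sqrt(n)

# Boolean Fourier coefficients: ghat(s) = 2^{-k} sum_x g(x) (-1)^{s.x}
ghat = np.array([sum(g[x] * (-1) ** parity(s, x) for x in range(n)) / n for s in range(n)])

D = H @ G @ H.T
print(np.allclose(D, np.diag(n * ghat)))           # True: the eigenvalues are 2^k * ghat(s)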