Learning Unitaries with gradient descent optimization

TRANSCRIPT

Page 1:

Learning Unitaries with gradient descent optimization

Reevu Maity (Oxford)

In progress with Bobak Kiani (MIT), Zi-Wen Liu (Perimeter), Seth Lloyd (MIT) & Milad Marvian (MIT)

It from Qubit 2019, June 13

Page 2:

Outline

• We consider classical optimization algorithms to learn / simulate parametrized unitary transformations generated by two Hamiltonians applied in alternation (QAOA).

• Gradient descent algorithms are first-order classical methods widely studied in the machine learning community for convex optimization.

• Recently, it was shown that QAOA can be used for universal quantum computation.

S. Lloyd (2018)

Any d-dimensional unitary can be simulated with O(d^2) parameters via the alternating operator method.

S. Lloyd & RM (2019)

• Aim: Study the learnability / simulability of unitaries under the alternating operator / QAOA formalism with gradient descent.

Page 3:

Problem

QAOA Unitary: U(t, τ) = e^{-i A t_p} e^{-i B τ_p} ⋯ e^{-i A t_1} e^{-i B τ_1}, where A and B are random matrices of dimension d sampled from the GUE, and (t_1, …, t_p, τ_1, …, τ_p) are the 2p real parameters (see the construction sketch at the end of this slide).

Learning problem

• Given access to a target unitary U_target and knowledge of A and B, can we simulate U_target by a sequence U(t, τ) using gradient descent on all 2p parameters, such that U(t, τ) ≈ U_target to accuracy ε?

What is the time complexity, i.e. the minimum number of parameters plus the total number of gradient descent steps?

• Suppose U_target is a shallow-depth unitary (say, depth 4 with a correspondingly small number of parameters); can we find a short sequence U(t, τ) such that U(t, τ) ≈ U_target?
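For concreteness, here is a minimal construction sketch of this setup in JAX (not taken from the slides): the helper names sample_gue and qaoa_unitary, the operator ordering, and the example values d = 32, p = 8 are illustrative assumptions.

```python
# A minimal construction sketch (not from the slides): the QAOA-style ansatz
# U(t, tau) = exp(-i A t_p) exp(-i B tau_p) ... exp(-i A t_1) exp(-i B tau_1),
# with A, B drawn from the GUE.
import jax
import jax.numpy as jnp
from jax.scipy.linalg import expm

def sample_gue(key, d):
    """Sample a d x d Hermitian matrix from the GUE (up to overall normalization)."""
    kr, ki = jax.random.split(key)
    x = jax.random.normal(kr, (d, d)) + 1j * jax.random.normal(ki, (d, d))
    return (x + x.conj().T) / 2.0

def qaoa_unitary(a, b, ts, taus):
    """Alternating-operator sequence: one exp(-i a t_k) exp(-i b tau_k) pair per layer."""
    u = jnp.eye(a.shape[0], dtype=jnp.complex64)
    for t, tau in zip(ts, taus):
        u = expm(-1j * a * t) @ expm(-1j * b * tau) @ u
    return u

key = jax.random.PRNGKey(0)
ka, kb = jax.random.split(key)
d, p = 32, 8                                 # d-dimensional unitary, p alternation layers
A, B = sample_gue(ka, d), sample_gue(kb, d)
ts, taus = 0.1 * jnp.ones(p), 0.1 * jnp.ones(p)
U = qaoa_unitary(A, B, ts, taus)             # d x d matrix, unitary by construction
```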

Page 4:

Non-Convex Optimization

QAOA Unitary: U(t, τ) as defined above.

• The space of unitaries is in general non-convex.

• Standard gradient descent algorithms are not guaranteed to converge to a global optimum in non-convex spaces.

• Gradient descent usually gets stuck at some local critical point, where ∇L = 0.

Page 5:

Non-Convex Optimization

• Second-order optimization techniques (e.g. Newton's method: compute the Hessian and then its inverse) require,

a) at least O(p^3) time to compute and invert a Hessian matrix of dimension p × p, for p parameters;

b) fine-tuning of hyperparameters.

• Gradient descent methods can be powerful due to their computational efficiency on both counts (see the sketch after this list):

they require O(p) time to calculate the gradients of p parameters, and fine-tuning is not required.

• Can gradient descent optimization enable us to learn parametrized / QAOA unitaries?
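As an illustration of the cost comparison above (not from the slides), the following sketch contrasts one Newton step with one gradient step on a toy loss; toy_loss and the parameter count p = 16 are arbitrary stand-ins.

```python
# Sketch: one Newton step versus one plain gradient descent step on a generic
# smooth loss. The Newton step forms a p x p Hessian and solves a linear system
# (~O(p^3)); the gradient step needs only the p gradient entries.
import jax
import jax.numpy as jnp

def toy_loss(theta):
    # Stand-in for L(theta): any differentiable real-valued function works here.
    return jnp.sum(jnp.sin(theta) ** 2) + 0.1 * jnp.sum(theta ** 2)

def newton_step(theta):
    g = jax.grad(toy_loss)(theta)           # p gradient entries
    h = jax.hessian(toy_loss)(theta)        # p x p Hessian
    return theta - jnp.linalg.solve(h, g)   # O(p^3) linear solve

def gd_step(theta, eta=0.001):
    return theta - eta * jax.grad(toy_loss)(theta)   # O(p) update

theta0 = jnp.linspace(0.1, 1.0, 16)         # p = 16 parameters
print(toy_loss(newton_step(theta0)), toy_loss(gd_step(theta0)))
```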

Page 6:

Results so far

QAOA Unitary: U(t, τ) as defined above.

• We find that gradient descent optimization requires at least d^2 parameters in U(t, τ) to approximate a target U_target (itself sampled from a parametrized manifold) to accuracy ε.

The rate of learning increases when gradient descent is done in overparametrized spaces of dimension at least d^2.

• We propose a greedy algorithm for learning low-depth U_target in time ≪ d^2. However, the success probability of efficient learning in non-convex spaces is not ideal.

Page 7:

Gradient Descent

Some basic equations for gradient descent optimization:

• Loss function: L(θ), a real scalar function of the parameters θ = (θ_1, …, θ_p).

• Gradients: ∂L/∂θ_j, for j = 1, …, p.

• Learning rate: η, fixed to a certain value during the entire iteration, e.g. η = 0.001.

• Parameter update: θ_j ← θ_j − η ∂L/∂θ_j.

Aim: Optimize L(θ) to a desired accuracy ε (written out in code below).
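Written as code, these equations amount to the loop below; this is a generic sketch, and gradient_descent with its default values is an illustrative helper rather than the authors' implementation.

```python
# The update rule above: theta <- theta - eta * grad L(theta), with a fixed
# learning rate, iterated until L(theta) falls below the desired accuracy eps.
# Generic over any differentiable real-valued loss written in JAX.
import jax

def gradient_descent(loss_fn, theta0, eta=0.001, eps=1e-3, max_steps=10_000):
    grad_fn = jax.grad(loss_fn)
    theta = theta0
    for _ in range(max_steps):
        if loss_fn(theta) < eps:                 # reached the desired accuracy
            break
        theta = theta - eta * grad_fn(theta)     # fixed-learning-rate update
    return theta, loss_fn(theta)
```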

Page 8:

Learning with gradient descent

In this work,

Loss Function: L(t, τ), a normalized distance between the sequence U(t, τ) and the target U_target.

Aim: Learn U_target to an accuracy ε, i.e. drive L(t, τ) below ε.

Simulations for 32-dimensional target unitaries built from 512 parameters, while varying the number of learning parameters in U(t, τ).
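A rough end-to-end sketch of such an experiment (not the authors' code): it reuses the sample_gue helper from the earlier sketch, takes the matrix exponentials through a one-time eigendecomposition of A and B for speed, and assumes a Hilbert-Schmidt fidelity loss L = 1 − |Tr(U_target† U)| / d, which may differ from the exact loss used in this work; the small dimension d = 8 is chosen only so the demo runs quickly.

```python
# Learning a target unitary by gradient descent on the QAOA parameters.
# exp(-i A t) is taken as V diag(e^{-i lam t}) V^dag from a one-time
# eigendecomposition of the fixed GUE generators. Assumed loss:
# L = 1 - |Tr(U_target^dag U)| / d (zero iff U matches U_target up to a phase).
import jax
import jax.numpy as jnp

def make_ansatz(a, b):
    la, va = jnp.linalg.eigh(a)                  # a = va diag(la) va^dag
    lb, vb = jnp.linalg.eigh(b)
    def u_of(params):                            # params shape (2, p): t's and tau's
        u = jnp.eye(a.shape[0], dtype=jnp.complex64)
        for t, tau in zip(params[0], params[1]):
            expa = (va * jnp.exp(-1j * la * t)) @ va.conj().T
            expb = (vb * jnp.exp(-1j * lb * tau)) @ vb.conj().T
            u = expa @ expb @ u
        return u
    return u_of

def fidelity_loss(u, u_target):
    d = u.shape[0]
    return 1.0 - jnp.abs(jnp.trace(u_target.conj().T @ u)) / d

key = jax.random.PRNGKey(1)
ka, kb, kt, kp = jax.random.split(key, 4)
d, p_target, p_learn = 8, 32, 64                 # target uses d^2 = 64 params; learner uses 128 (overparameterized)
A, B = sample_gue(ka, d), sample_gue(kb, d)      # helper from the earlier sketch
ansatz = make_ansatz(A, B)

U_target = ansatz(0.1 * jax.random.normal(kt, (2, p_target)))
loss = lambda params: fidelity_loss(ansatz(params), U_target)

params = 0.01 * jax.random.normal(kp, (2, p_learn))
grad_fn = jax.jit(jax.grad(loss))
for _ in range(1000):
    params = params - 0.01 * grad_fn(params)     # fixed learning rate eta = 0.01
print("final loss:", loss(params))
```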

Page 9:

Gradient Descent Numerics

A 'transition' occurs when gradient descent is performed in the overparameterized domain, i.e. with at least d^2 parameters.

The rate of learning increases as we do gradient descent on more parameters beyond d^2.

Page 10:

Gradient Descent Numerics (Contd.)

α = rate of learning. For the first 200 gradient descent steps, Loss = κ · (number of gradient descent steps)^(−α) (see the fitting sketch below).

The underparametrized models learn following a power law while the overparametrized models learn faster than the power law.

[Figure: loss vs. number of gradient descent steps for underparametrized and overparametrized models.]
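As an illustration of how such an exponent can be extracted (not the authors' analysis), a least-squares fit of log(loss) against log(step) over the first 200 steps; loss_history and fit_power_law are placeholder names for a recorded loss curve and the fitting helper.

```python
# Sketch: estimate the exponent alpha in Loss ~ kappa * step^(-alpha) by a
# linear fit in log-log space over the first 200 gradient descent steps.
import jax.numpy as jnp

def fit_power_law(loss_history, n_fit=200):
    losses = jnp.asarray(loss_history)[:n_fit]
    steps = jnp.arange(1, losses.shape[0] + 1)
    slope, intercept = jnp.polyfit(jnp.log(steps), jnp.log(losses), 1)
    return -slope, jnp.exp(intercept)            # (alpha, kappa)
```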

Page 11:

A Greedy Algorithm for a low-depth QAOA unitary

Can we learn a low-depth U_target with ≪ d^2 parameters?

A layer = e^{-i A t} e^{-i B τ} (one alternation of the two generators, with two parameters t, τ).

Pseudocode: Given access to U_target, with A and B known (a code sketch follows the list).

1. a_0 = initial loss, evaluated before any layer is added.

2. Add a layer with parameters (t_1, τ_1). Cost function = L(t_1, τ_1).

3. Perform gradient descent on (t_1, τ_1) to obtain the optimized parameters.

a_1 = updated loss, with a_1 < a_0.

4. Add a new layer with parameters (t_2, τ_2) to the layer from the previous step. Updated cost function = L(t_1, τ_1, t_2, τ_2).

5. Perform gradient descent on these parameters to obtain the optimized values.

a_2 = updated loss, with a_2 < a_1.

6. Repeat the above, one layer per step, until convergence, i.e. a_n < ε.
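A rough code rendering of this greedy loop, under one possible reading of the pseudocode (all current parameters are re-optimized after each new layer is added); it reuses the make_ansatz, fidelity_loss, and sample_gue helpers and the assumed fidelity loss from the earlier sketches, and greedy_learn with its defaults is illustrative.

```python
# Greedy layer-by-layer learning sketch: grow the sequence one layer (t_k, tau_k)
# at a time, run gradient descent on all current parameters, and stop once the
# loss drops below eps.
import jax
import jax.numpy as jnp

def greedy_learn(a, b, u_target, eps=1e-2, max_layers=12, eta=0.01, inner_steps=500):
    ansatz = make_ansatz(a, b)                       # helper from the earlier sketch
    loss = lambda params: fidelity_loss(ansatz(params), u_target)
    params = jnp.zeros((2, 0))                       # start with an empty sequence
    losses = [float(loss(params))]                   # a_0: initial loss
    for k in range(max_layers):
        new_layer = 0.01 * jax.random.normal(jax.random.PRNGKey(k), (2, 1))
        params = jnp.concatenate([params, new_layer], axis=1)   # add one layer
        grad_fn = jax.grad(loss)
        for _ in range(inner_steps):                 # gradient descent on all current params
            params = params - eta * grad_fn(params)
        losses.append(float(loss(params)))           # a_{k+1}, expected to decrease
        if losses[-1] < eps:                         # converged: a_n < eps
            break
    return params, losses
```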

Page 12:

Greedy Algorithm Performance

• Approximating depth-4 target unitaries U_target for n = 2, 3, 4, 5, 6 qubits.

• Succeeds in finding a sequence with at most 20-24 parameters.

• The success probability of learning in these non-convex spaces is not ideal: between 0.1 and 0.15.

• The algorithm usually gets stuck at some local critical point or saddle point.

Can we learn a low-depth U_target with ≪ d^2 parameters?

Page 13:

Learning with random local circuits

A general learning setting. Motivation: study many-body dynamics / many-body localization (MBL).

Goal: Learn / simulate U_target with a parametrized local circuit (shown below), without assuming knowledge of A and B.

A and B are random matrices sampled from the GUE.

Result: The local circuit simulates depth-4 U_target when gradient descent is done on all circuit parameters.

Can the local circuit model simulate low-depth U_target with ≪ d^2 parameters? Can it simulate Haar-random unitaries? (A circuit-construction sketch follows the figure.)

[Figure: brick-layer local circuit on qubits 1-6, alternating two-qubit unitaries U_1, V_1, U_2, V_2, …, U_N, V_N with parameters t_i^(j) and τ_i^(j).]
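For illustration, a minimal sketch of such a parametrized local circuit, under assumed simplifications (adjacent-pair bricks in an even/odd pattern, one two-qubit GUE generator and a single angle per brick); the layout and parameter count in the actual figure may differ, and sample_gue is the helper from the earlier sketch.

```python
# Sketch: a brick-layer local circuit on n qubits. Each brick is a two-qubit
# unitary exp(-i theta H) with H a fixed two-qubit GUE generator; even layers
# pair qubits (0,1),(2,3),..., odd layers pair (1,2),(3,4),...
import jax
import jax.numpy as jnp
from jax.scipy.linalg import expm

def embed_gate(gate, q, n):
    """Embed a 4x4 gate acting on adjacent qubits (q, q+1) into the 2^n-dim space."""
    return jnp.kron(jnp.kron(jnp.eye(2 ** q), gate), jnp.eye(2 ** (n - q - 2)))

def brick_positions(n, layers):
    positions = []
    for layer in range(layers):
        positions.extend(range(layer % 2, n - 1, 2))
    return positions

def local_circuit(bricks, thetas, n):
    """bricks: list of (4x4 Hermitian generator, left qubit index); one angle per brick."""
    u = jnp.eye(2 ** n, dtype=jnp.complex64)
    for (h, q), theta in zip(bricks, thetas):
        u = embed_gate(expm(-1j * theta * h), q, n) @ u
    return u

key = jax.random.PRNGKey(2)
n, layers = 6, 4
positions = brick_positions(n, layers)
keys = jax.random.split(key, len(positions))
bricks = [(sample_gue(k, 4), q) for k, q in zip(keys, positions)]   # sample_gue from the earlier sketch
thetas = 0.1 * jnp.ones(len(positions))
U_circuit = local_circuit(bricks, thetas, n)     # 64 x 64 unitary on 6 qubits
```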

Page 14:

Remarks

QAOA Unitary: U(t, τ) as defined above.

• Numerical simulations of the learnability of U_target with at least d^2 parameters by gradient descent.

• A greedy algorithm for simulating short-depth U_target with ≪ d^2 parameters. The success probability is not ideal.

In progress

• A rigorous justification of the requirement of more than d^2 parameters for learning U_target. Investigate the distribution of critical points in the loss-function landscape.

• A local circuit model algorithm that can efficiently simulate low-depth U_target with a higher success probability than the greedy one.

• Noise resilience of simulating constant-depth QAOA unitaries on NISQ devices.