


Quantum reservoir computing: a reservoir approach toward quantum machine learning on near-term quantum devices

Keisuke Fujii1, ∗ and Kohei Nakajima2, †

1 Graduate School of Engineering Science, Osaka University, 1-3 Machikaneyama, Toyonaka, Osaka 560-8531, Japan.
2 Graduate School of Information Science and Technology,

The University of Tokyo, Bunkyo-ku, 113-8656 Tokyo, Japan
(Dated: November 11, 2020)

Quantum systems have exponentially many degrees of freedom in the number of particles and hence provide rich dynamics that cannot be simulated on conventional computers. Quantum reservoir computing is an approach that uses such complex and rich quantum dynamics, as they are, for temporal machine learning. In this chapter, we explain quantum reservoir computing and related approaches, quantum extreme learning machine and quantum circuit learning, starting from a pedagogical introduction to quantum mechanics and machine learning. All these quantum machine learning approaches are experimentally feasible and effective on state-of-the-art quantum devices.

I. INTRODUCTION

Over the past several decades, we have enjoyed an exponential growth of computational power, namely, Moore's law. Nowadays even a smartphone or tablet is much more powerful than the supercomputers of the 1980s. Even so, people are still seeking more computational power, especially for artificial intelligence (machine learning), chemical and material simulations, and forecasting complex phenomena like economics, weather, and climate. In addition to improving the computational power of conventional computers, i.e., more Moore's law, a new generation of computing paradigms has started to be investigated to go beyond Moore's law. Among them, natural computing seeks to exploit natural physical or biological systems as computational resources. Quantum reservoir computing is an intersection of two different paradigms of natural computing, namely, quantum computing and reservoir computing.

Regarding quantum computing, the recent rapid experimental progress in controlling complex quantum systems motivates us to use quantum mechanical laws as a new principle of information processing, namely, quantum information processing [2, 3]. For example, certain mathematical problems, such as integer factorization, which are believed to be intractable on a classical computer, are known to be efficiently solvable by a sophisticatedly synthesized quantum algorithm [4]. Therefore, considerable experimental effort has been devoted to realizing full-fledged universal quantum computers [5, 6]. In the near future, quantum computers with more than 50 qubits and fidelity above 99% for each elementary gate are expected to achieve quantum computational supremacy, beating simulation on state-of-the-art classical supercomputers [7, 8].

[email protected]
[email protected]

While this does not directly mean that a quantum computer outperforms classical computers on a useful task like machine learning, applications of such near-term quantum devices to useful tasks, including machine learning, are now being widely explored. On the other hand, quantum simulators are thought to be much easier to implement than a full-fledged universal quantum computer. In this regard, existing quantum simulators have already shed new light on the physics of complex many-body quantum systems [9–11], and a restricted class of quantum dynamics, known as adiabatic dynamics, has also been applied to combinatorial optimisation problems [12–15]. However, complex real-time quantum dynamics, which is one of the most difficult tasks for classical computers to simulate [16–18] and has great potential to perform nontrivial information processing, is still waiting to be harnessed as a resource for more general-purpose information processing.

Physical reservoir computing, which is the main subject throughout this book, is another paradigm for exploiting complex physical systems for information processing. In this framework, the low-dimensional input is projected to a high-dimensional dynamical system, which is typically referred to as a reservoir, generating transient dynamics that facilitates the separation of input states [19]. If the dynamics of the reservoir involve both adequate memory and nonlinearity [20], emulating nonlinear dynamical systems only requires adding a linear and static readout from the high-dimensional state space of the reservoir. A number of different implementations of reservoirs have been proposed, such as abstract dynamical systems for echo state networks (ESNs) [21] or models of neurons for liquid state machines [22]. The implementations are not limited to programs running on a PC but also include physical systems, such as the surface of water in a laminar state [23], analogue circuits and optoelectronic systems [24–29], and neuromorphic chips [30]. Recently, it has been reported that the mechanical bodies of soft and compliant robots have also been successfully used as reservoirs [31–36].



In contrast to the refinements required by learning algorithms, such as in deep learning [37], the approach followed by reservoir computing, especially when applied to real systems, is to find an appropriate form of physics that exhibits rich dynamics, thereby allowing us to outsource a part of the computation.

Quantum reservoir computing (QRC) was born from the marriage of quantum computing and the physical reservoir computing described above, to harness complex quantum dynamics as a reservoir for real-time machine learning tasks [38]. Since the idea of QRC was proposed in Ref. [38], its proof-of-principle experimental demonstration for non-temporal tasks [39] and performance analysis and improvement [40–42] have been explored. The QRC approach to quantum tasks such as quantum tomography and quantum state preparation has also been garnering attention recently [71–73]. In this book chapter, we will provide a broad picture of QRC and related approaches, starting from a pedagogical introduction to quantum mechanics and machine learning.

The rest of this chapter is organized as follows. In Sec. II, we will provide a pedagogical introduction to quantum mechanics for those who are not familiar with it and fix our notation. In Sec. III, we will briefly review several machine learning techniques, such as linear and nonlinear regression, temporal machine learning tasks, and reservoir computing. In Sec. IV, we will explain QRC and related approaches, quantum extreme learning machine [39] and quantum circuit learning [43]. The former is a framework that uses a quantum reservoir for non-temporal tasks; that is, the input is fed into a quantum system, and generalization or classification tasks are performed by a linear regression on a quantum-enhanced feature space. In the latter, the parameters of the quantum system are further fine-tuned via gradient descent by measuring an analytically obtained gradient, just like back propagation for feedforward neural networks. Regarding QRC, we will also see chaotic time series predictions as demonstrations. Sec. V is devoted to conclusion and discussion.

II. PEDAGOGICAL INTRODUCTION TO QUANTUM MECHANICS

In this section, we would like to provide a pedagogical introduction to how quantum mechanical systems work, for those who are not familiar with quantum mechanics. If you are already familiar with quantum mechanics and its notation, please skip to Sec. III.

A. Quantum state

A state of a quantum system is described by a state vector

|ψ〉 = (c_1, ..., c_d)^T  (1)

on a complex d-dimensional system C^d, where the symbol |·〉 is called a ket and indicates a complex column vector. Similarly, 〈·| is called a bra and indicates a complex row vector; the two are related by complex conjugation,

〈ψ| = |ψ〉† = (c_1^*, ..., c_d^*).  (2)

With this notation, we can write the inner product of two quantum states |ψ〉 and |φ〉 as 〈ψ|φ〉. Let us define an orthogonal basis

|1〉 = (1, 0, ..., 0)^T, ..., |k〉 = (0, ..., 0, 1, 0, ..., 0)^T (with the 1 in the kth entry), ..., |d〉 = (0, ..., 0, 1)^T.  (3)

Then a quantum state in the d-dimensional system can be described simply by

|ψ〉 = Σ_{i=1}^{d} c_i |i〉.  (4)

The state is said to be a superposition of the states |i〉. The coefficients {c_i} are complex and are called complex probability amplitudes. If we measure the system in the basis {|i〉}, we obtain the measurement outcome i with probability

p_i = |〈i|ψ〉|^2 = |c_i|^2,  (5)

and hence the complex probability amplitudes have to be normalized as follows:

〈ψ|ψ〉 = Σ_{i=1}^{d} |c_i|^2 = 1.  (6)

In other words, a quantum state is represented as a normalized vector on a complex vector space.

Suppose the measurement outcome i corresponds to a certain physical value a_i, like energy, magnetization, and so on; then the expectation value of the physical variable is given by

Σ_i a_i p_i = 〈ψ|A|ψ〉 ≡ 〈A〉,  (7)

where we define a Hermitian operator

A = Σ_i a_i |i〉〈i|,  (8)

which is called an observable and carries the information of both the measurement basis and the physical variable.

The state vector in quantum mechanics is similar to a probability distribution, but it is essentially different from it, since it is much more primitive; it can take complex values and is more like a square root of a probability. The unique features of quantum systems come from this property.
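As a concrete illustration of Eqs. (4)-(8), here is a minimal NumPy sketch (our own illustration, not from the original chapter; the amplitudes and physical values are arbitrary choices). It normalizes a state vector, computes the outcome probabilities of Eq. (5), and checks that 〈ψ|A|ψ〉 agrees with Σ_i a_i p_i:

```python
import numpy as np

d = 4
c = np.array([1.0, 1.0j, -0.5, 0.25])      # unnormalized amplitudes c_i (arbitrary)
psi = c / np.linalg.norm(c)                 # enforce <psi|psi> = 1, Eq. (6)

p = np.abs(psi) ** 2                        # outcome probabilities, Eq. (5)

a = np.array([0.0, 1.0, 2.0, 3.0])          # physical values a_i (arbitrary)
A = np.diag(a)                              # observable A = sum_i a_i |i><i|, Eq. (8)

expval = np.real(np.conj(psi) @ A @ psi)    # <A> = <psi|A|psi>, Eq. (7)
print(p, expval, np.dot(a, p))              # <A> equals sum_i a_i p_i
```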


B. Time evolution

The time evolution of a quantum system is determined by a Hamiltonian H, which is a Hermitian operator acting on the system. Let us denote the quantum state at time t = 0 by |ψ(0)〉. The equation of motion of quantum mechanics, the so-called Schrödinger equation, is given by

i (∂/∂t)|ψ(t)〉 = H|ψ(t)〉.  (9)

This equation can be formally solved by

|ψ(t)〉 = e−iHt|ψ(0)〉. (10)

Therefore the time evolution is given by the operator e−iHt, which is a unitary operator; hence the norm of the state vector is preserved, meaning probability conservation. In general, the Hamiltonian can be time dependent. Regarding the time evolution, if you are not interested in the continuous time evolution but just in its input-output relation, then the time evolution is nothing but a unitary operator U:

|ψout〉 = U |ψin〉. (11)

In quantum computing, the time evolution U is sometimes called a quantum gate.
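A minimal NumPy sketch of Eq. (10) (our own illustration; the choice H = X is an arbitrary example Hamiltonian), computing e−iHt by diagonalizing the Hermitian H:

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)   # example Hamiltonian H = X (our choice)

def evolve(H, t, psi0):
    """Return e^{-iHt}|psi0> via eigendecomposition of the Hermitian H, Eq. (10)."""
    evals, evecs = np.linalg.eigh(H)
    U = evecs @ np.diag(np.exp(-1j * evals * t)) @ evecs.conj().T
    return U @ psi0

psi0 = np.array([1, 0], dtype=complex)          # |0>
psi_t = evolve(X, np.pi / 2, psi0)              # rotates |0> to (-i)|1>
print(np.abs(psi_t) ** 2)                       # probabilities stay normalized: [0, 1]
```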

C. Qubits

The smallest nontrivial quantum system is a two-dimensional quantum system C^2, which is called a quantum bit, or qubit:

α|0〉 + β|1〉, with |α|^2 + |β|^2 = 1.  (12)

Suppose we have n qubits. The n-qubit system is defined on the tensor product space (C^2)^⊗n of the two-dimensional systems as follows. A basis of the system is defined by a direct product of binary states |x_k〉 with x_k ∈ {0, 1},

|x1〉 ⊗ |x2〉 ⊗ · · · ⊗ |xn〉, (13)

which is simply denoted by

|x1x2 · · ·xn〉. (14)

Then a state of the n-qubit system can be described as

|ψ〉 = Σ_{x_1,x_2,...,x_n} α_{x_1,x_2,...,x_n} |x_1 x_2 · · · x_n〉.  (15)

The dimension of the n-qubit system is 2^n, and hence the tensor product space is nothing but a 2^n-dimensional complex vector space C^{2^n}. The dimension of the n-qubit system increases exponentially in the number n of qubits.
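A short NumPy sketch (our own illustration) showing the tensor-product construction of Eqs. (13)-(15) and the exponential growth of the dimension:

```python
import numpy as np

def product_state(single_qubit_states):
    """Tensor product |x1> ⊗ ... ⊗ |xn>, Eq. (13); the dimension grows as 2^n."""
    psi = np.array([1.0], dtype=complex)
    for s in single_qubit_states:
        psi = np.kron(psi, s)
    return psi

plus = np.array([1, 1], dtype=complex) / np.sqrt(2)   # (|0> + |1>)/sqrt(2)
n = 8
psi = product_state([plus] * n)                       # uniform superposition over 2^n basis states
print(psi.shape)                                      # (256,) = (2**8,)
```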

D. Density operator

Next, we would like to introduce the operator formalism of the quantum mechanics described above. It describes exactly the same physics, but the operator formalism is sometimes more convenient. Let us consider an operator ρ constructed from the state vector |ψ〉:

ρ = |ψ〉〈ψ|. (16)

If you choose the basis {|i〉} of the system for the matrix representation, then the diagonal elements of ρ correspond to the probability distribution p_i = |c_i|^2 obtained when the system is measured in the basis {|i〉}. Therefore the operator ρ is called a density operator. The probability distribution can also be written in terms of ρ as

pi = Tr[|i〉〈i|ρ], (17)

where Tr is the matrix trace. An expectation value of an observable A is given by

〈A〉 = Tr[Aρ]. (18)

The density operator can handle a more general situation, where a quantum state is sampled from a set of quantum states {|ψ_k〉} with a probability distribution {q_k}. In this case, if we measure the system in the basis {|i〉}, the probability of obtaining the measurement outcome i is given by

p_i = Σ_k q_k Tr[|i〉〈i|ρ_k],  (19)

where ρ_k = |ψ_k〉〈ψ_k|. By using the linearity of the trace, this reads

p_i = Tr[|i〉〈i| Σ_k q_k ρ_k].  (20)

Now we can interpret the density operator as being given by

ρ = Σ_k q_k |ψ_k〉〈ψ_k|.  (21)

In this way, a density operator can represent a classical mixture of quantum states as a convex mixture of density operators, which is convenient in many cases. In general, any positive Hermitian operator ρ with Tr[ρ] = 1 can be a density operator, since it can be interpreted as a convex mixture of quantum states via the spectral decomposition:

ρ = Σ_i λ_i |λ_i〉〈λ_i|,  (22)

where {|λ_i〉} and {λ_i} are the eigenstates and eigenvalues, respectively. Because of Tr[ρ] = 1, we have Σ_i λ_i = 1.

From its definition, the time evolution of ρ is given by

by

ρ(t) = e−iHtρ(0)eiHt (23)


or

ρout = UρinU†. (24)

Moreover, we can define more general operations on density operators. For example, if we apply unitary operators U and V with probabilities p and (1 − p), respectively, then we have

ρout = pUρU† + (1− p)V ρV †. (25)

As another example, if we perform a measurement of ρ in the basis {|i〉} and forget the measurement outcome, then the state is given by the density operator

Σ_i Tr[|i〉〈i|ρ] |i〉〈i| = Σ_i |i〉〈i|ρ|i〉〈i|.  (26)

Therefore, if we define a map from one density operator to another, which we call a superoperator,

M(· · ·) = Σ_i |i〉〈i| (· · ·) |i〉〈i|,  (27)

the above non-selective measurement (forgetting the measurement outcomes) is simply written as

M(ρ).  (28)

In general, any physically allowed quantum operation K that maps a density operator to another can be represented in terms of a set of operators {K_i} subject to Σ_i K_i† K_i = I, with the identity operator I:

K(ρ) = Σ_i K_i ρ K_i†.  (29)

The operators {Ki} are called Kraus operators.
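To make Eqs. (16)-(29) concrete, here is a minimal NumPy sketch (our own illustration; the choice of a single-qubit dephasing channel, i.e., the non-selective computational-basis measurement of Eq. (27), is our assumption) that applies a Kraus map and checks trace preservation:

```python
import numpy as np

def kraus_apply(kraus_ops, rho):
    """K(rho) = sum_i K_i rho K_i^dagger, Eq. (29)."""
    return sum(K @ rho @ K.conj().T for K in kraus_ops)

# Non-selective measurement in the computational basis, Eq. (27):
# the Kraus operators |i><i| fully dephase a single qubit.
M_ops = [np.diag([1.0, 0.0]).astype(complex), np.diag([0.0, 1.0]).astype(complex)]

psi = np.array([1, 1], dtype=complex) / np.sqrt(2)
rho = np.outer(psi, psi.conj())                  # rho = |psi><psi|, Eq. (16)
rho_out = kraus_apply(M_ops, rho)
print(rho_out)                                   # off-diagonal elements vanish
print(np.trace(rho_out).real)                    # trace is preserved: 1.0
```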

E. Vector representation of density operators

Finally, we would like to introduce a vector representation of the above operator formalism. The operators themselves satisfy the axioms of a linear space. Moreover, we can also define an inner product between two operators, the so-called Hilbert-Schmidt inner product,

Tr[A†B]. (30)

The operators on the n-qubit system can be spanned by the tensor products of the Pauli operators {I, X, Y, Z}^⊗n,

P(i) = ⊗_{k=1}^{n} σ_{i_{2k−1} i_{2k}},  (31)

where the σ_ij are the Pauli operators (the semicolon separates matrix rows):

I = σ_00 = (1 0; 0 1),  X = σ_10 = (0 1; 1 0),
Z = σ_01 = (1 0; 0 −1),  Y = σ_11 = (0 −i; i 0).  (32)

Since the Pauli operators constitute a complete basis of the operator space, any operator A can be decomposed into a linear combination of the P(i),

A = Σ_i a_i P(i).  (33)

The coefficient ai can be calculated by using the Hilbert-Schmidt inner product as follows:

a_i = Tr[P(i) A]/2^n,  (34)

by virtue of the orthogonality

Tr[P(i) P(j)]/2^n = δ_{i,j}.  (35)

The number of n-qubit Pauli operators {P(i)} is 4^n, and hence a density operator ρ of the n-qubit system can be represented as a 4^n-dimensional vector

r = (r_{00...0}, ..., r_{11...1})^T,  (36)

where r_{00...0} = 1/2^n because of Tr[ρ] = 1. Moreover, because each P(i) is Hermitian, r is a real vector. The superoperator K is a linear map on the operator space and hence can be represented as a matrix acting on the vector r:

ρ′ = K(ρ)⇔ r′ = Kr, (37)

where the matrix element is given by

K_{ij} = Tr[P(i) K(P(j))]/2^n.  (38)

In this way, a density operator ρ and a quantum operation K acting on it can be represented by a vector r and a matrix K, respectively.
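A small NumPy sketch (our own illustration, assuming a two-qubit example) that builds the Pauli basis of Eq. (31) and computes the vector representation r of Eq. (34):

```python
import numpy as np
from itertools import product

# Single-qubit Pauli operators, Eq. (32)
PAULIS = {
    'I': np.eye(2, dtype=complex),
    'X': np.array([[0, 1], [1, 0]], dtype=complex),
    'Y': np.array([[0, -1j], [1j, 0]], dtype=complex),
    'Z': np.array([[1, 0], [0, -1]], dtype=complex),
}

def pauli_basis(n):
    """All 4^n tensor products P(i) of Pauli operators, Eq. (31)."""
    ops = []
    for labels in product('IXYZ', repeat=n):
        P = np.array([1.0], dtype=complex)
        for l in labels:
            P = np.kron(P, PAULIS[l])
        ops.append(P)
    return ops

def to_pauli_vector(rho, n):
    """r_i = Tr[P(i) rho] / 2^n, Eq. (34); r is real since rho and P(i) are Hermitian."""
    return np.array([np.trace(P @ rho).real / 2**n for P in pauli_basis(n)])

n = 2
rho = np.eye(2**n, dtype=complex) / 2**n        # maximally mixed two-qubit state
r = to_pauli_vector(rho, n)
print(r.shape, r[0])                            # (16,), r_{00...0} = 1/2^n = 0.25
```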

III. MACHINE LEARNING AND RESERVOIR APPROACH

In this section, we briefly introduce machine learning and reservoir approaches.

A. Linear and nonlinear regression

Supervised machine learning is the task of constructing a model f(x) from a given set of teacher data {x^(j), y^(j)} and predicting the output for an unknown input x. Suppose x is d-dimensional data and f(x) is one-dimensional, for simplicity. The simplest model is linear regression, which models f(x) as a linear function with respect to the input:

f(x) = Σ_{i=1}^{d} w_i x_i + w_0.  (39)

The weights {w_i} and the bias w_0 are chosen such that the error between f(x) and the output of the teacher data, i.e., the loss, becomes minimum. If we employ a quadratic loss function for the given teacher data {{x_i^(j)}, y^(j)}, the problem we have to solve is as follows:

min_{w_i} Σ_j ( Σ_{i=0}^{d} w_i x_i^(j) − y^(j) )^2,  (40)

where we introduced a constant node x_0 = 1. This corresponds to solving the system of linear equations

y = Xw, (41)

where (y)_j = y^(j), (X)_{ji} = x_i^(j), and (w)_i = w_i. This can be solved by using the Moore-Penrose pseudo-inverse X^+, which can be defined from the singular value decomposition X = U D V^T to be

X^+ = V D^+ U^T,  (42)

where D^+ inverts the nonzero singular values on the diagonal of D.
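A minimal NumPy sketch of this pseudo-inverse regression (our own illustration; the synthetic data and the true weight vector are our assumptions). np.linalg.pinv computes X^+ via the singular value decomposition, exactly as in Eq. (42):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(size=(100, 3))                   # 100 samples, d = 3 features
X = np.hstack([np.ones((100, 1)), X])            # constant node x_0 = 1 for the bias
w_true = np.array([0.5, 1.0, -2.0, 3.0])         # hypothetical ground-truth weights
y = X @ w_true + 0.01 * rng.normal(size=100)     # noisy teacher data

w = np.linalg.pinv(X) @ y                        # Moore-Penrose solution of y = Xw, Eq. (41)
print(np.round(w, 2))                            # close to w_true
```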

Unfortunately, linear regression gives poor performance on complicated machine learning tasks, and some kind of nonlinearity is essentially required in the model. A neural network, inspired by the human brain, is one way to introduce nonlinearity into the model. In a neural network, the d-dimensional input data x is fed into N-dimensional hidden nodes through an N × d input matrix W^in:

W^in x.  (43)

Then each element of the hidden nodes is processed by a nonlinear activation function σ, such as tanh, which is denoted by

σ(W^in x).  (44)

Finally, the output is extracted through an output weight W^out (a 1 × N matrix):

W^out σ(W^in x).  (45)

The parameters in W^in and W^out are trained such that the error between the output and the teacher data becomes minimum. While this optimization problem is highly nonlinear, gradient-based optimization, so-called back propagation, can be employed. To improve the representation power of the model, we can concatenate linear transformations and activation functions as follows:

W^out σ( W^(l) · · · σ( W^(1) σ( W^in x ) ) ),  (46)

which is called a multi-layer perceptron or deep neural network.
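A minimal sketch of the forward pass of Eq. (46) (our own illustration; the layer sizes and random weights are arbitrary placeholders, and no training is performed here):

```python
import numpy as np

def mlp_forward(x, W_in, hidden_Ws, W_out):
    """Multi-layer perceptron of Eq. (46): alternate linear maps and tanh activations."""
    h = np.tanh(W_in @ x)
    for W in hidden_Ws:
        h = np.tanh(W @ h)
    return W_out @ h

rng = np.random.default_rng(1)
d, N = 2, 16
x = rng.uniform(size=d)
W_in = rng.normal(size=(N, d))
hidden = [rng.normal(size=(N, N)) for _ in range(2)]   # two hidden layers
W_out = rng.normal(size=(1, N))
print(mlp_forward(x, W_in, hidden, W_out))
```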

B. Temporal task

The above task is not a temporal task, meaning that the input data is not sequential but given all at once, as in the recognition of images of handwritten characters, pictures, and so on.

However, for the recognition of spoken language or the prediction of a time series such as the stock market, which are called temporal tasks, the network has to handle input data that is given sequentially. To do so, a recurrent neural network feeds the previous states of the nodes back into the states of the nodes at the next step, which allows the network to memorize past inputs. In contrast, a neural network without any recurrency is called a feedforward neural network.

Let us formalize a temporal machine learning task with a recurrent neural network. For a given input time series {x_k}_{k=1}^{L} and target time series {y_k}_{k=1}^{L}, temporal machine learning is the task of generalizing a nonlinear function

y_k = f({x_j}_{j=1}^{k}).  (47)

For simplicity, we consider one-dimensional input and output time series, but the generalization to the multi-dimensional case is straightforward. To learn the nonlinear function f({x_j}_{j=1}^{k}), a recurrent neural network can be employed as the model. Suppose the recurrent neural network consists of m nodes and is denoted by an m-dimensional vector

r = (r_1, ..., r_m)^T.  (48)

To process the input time series, the nodes evolve by

r(k + 1) = σ[W r(k) + W^in x_k],  (49)

where W is an m × m transition matrix and W^in is an m × 1 input weight matrix. Nonlinearity comes from the nonlinear function σ applied to each element of the nodes. The output time series of the network is defined in terms of 1 × m readout weights W^out by

ŷ_k = W^out r(k).  (50)

Then the learning task is to determine the parameters in W^in, W, and W^out by using the teacher data {x_k, y_k}_{k=1}^{L} so as to minimize the error between the teacher {y_k} and the output {ŷ_k} of the network.
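In code, the update of Eqs. (49)-(50) looks as follows (a minimal NumPy sketch of our own; the weights are random placeholders and nothing is trained here):

```python
import numpy as np

def run_rnn(xs, W, W_in, W_out):
    """Recurrent update r(k+1) = sigma[W r(k) + W_in x_k], Eq. (49), with sigma = tanh,
    followed by the linear readout y_k = W_out r(k), Eq. (50)."""
    r = np.zeros(W.shape[0])
    ys = []
    for x in xs:
        r = np.tanh(W @ r + W_in * x)
        ys.append(W_out @ r)
    return np.array(ys)

rng = np.random.default_rng(2)
m = 50
W = rng.normal(scale=0.1, size=(m, m))
W_in = rng.normal(size=m)
W_out = rng.normal(size=m)
ys = run_rnn(rng.uniform(size=200), W, W_in, W_out)
print(ys.shape)                                  # one output per input step
```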

C. Reservoir approach

While the representation power of a recurrent neural network can be improved by increasing the number of nodes, doing so makes the optimization of the weights hard and unstable. Specifically, back-propagation-based methods always suffer from the vanishing gradient problem. The idea of reservoir computing is to resolve this problem by mapping the input into a complex, higher-dimensional feature space, i.e., the reservoir, and performing a simple linear regression on it.

Let us first see the reservoir approach for a feedforward neural network, which is called extreme learning machine [44]. The input data x is fed into a network like a multi-layer perceptron, where all weights are chosen randomly.


The states of the hidden nodes at some layer are then regarded as basis functions of the input x in the feature space:

{φ1(x), φ2(x), ..., φN (x)}. (51)

Now the output is defined as a linear combination of these,

Σ_i w_i φ_i(x) + w_0,  (52)

and hence the coefficients are determined simply by linear regression, as mentioned before. If the dimension and the nonlinearity of the basis functions are high enough, we can model a complex task simply by linear regression.

The echo state network is similar, but it employs the reservoir idea for recurrent neural networks [21, 22, 45]; it was proposed before extreme learning machine appeared. To be specific, the input weights W^in and the weight matrix W are both chosen randomly, up to an appropriate normalization. Then the learning task is done by finding the readout weights W^out that minimize the mean square error

Σ_k (y_k − ŷ_k)^2.  (53)

This problem can be solved stably by using the pseudoinverse as we mentioned before.
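A minimal echo-state-network sketch in NumPy (our own illustration, not the chapter's implementation; the spectral-radius normalization of 0.9, the reservoir size, and the delay-2 toy target are our assumptions). Only the readout W^out is trained, via the pseudo-inverse of the collected reservoir states:

```python
import numpy as np

rng = np.random.default_rng(3)
m, L = 100, 1000
W = rng.normal(size=(m, m))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # normalize spectral radius below 1
W_in = rng.normal(size=m)

xs = rng.uniform(size=L)                          # input series x_k
ys = np.roll(xs, 2)                               # toy target: y_k = x_{k-2} (short-term memory)

R = np.zeros((L, m))                              # collected reservoir states
r = np.zeros(m)
for k, x in enumerate(xs):
    r = np.tanh(W @ r + W_in * x)                 # random, untrained reservoir dynamics
    R[k] = r

W_out = np.linalg.pinv(R[10:]) @ ys[10:]          # only the readout is trained, Eq. (53)
print(np.mean((R[10:] @ W_out - ys[10:]) ** 2))   # small training error
```

The first 10 steps are discarded to wash out the wraparound of np.roll and the initial transient of the reservoir.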

For both the feedforward and recurrent types, the reservoir approach does not need to tune the internal parameters of the network for each task, as long as the network possesses sufficient complexity. Therefore, the system to which the machine learning tasks are outsourced is not necessarily a neural network anymore; any nonlinear physical system with a large number of degrees of freedom can be employed as a reservoir for information processing, namely, physical reservoir computing [23–36].

IV. QUANTUM MACHINE LEARNING ON NEAR-TERM QUANTUM DEVICES

In this section, we will see QRC and related frameworks for quantum machine learning. Before going deep into the temporal tasks done on QRC, we first explain how complicated natural quantum dynamics can be exploited for generalization and classification tasks. This can be viewed as a quantum version of extreme learning machine [39]. While it goes in the opposite direction to reservoir computing, we will also see quantum circuit learning (QCL) [43], where the parameters of the complex dynamics are further tuned in addition to the linear readout weights. QCL is a quantum version of a feedforward neural network. Finally, we will explain quantum reservoir computing by extending quantum extreme learning machine to temporal learning tasks.

A. Quantum extreme learning machine

The idea of quantum extreme learning machine lies in using the Hilbert space, where quantum states live, as an enhanced feature space for the input data. Let us denote the set of input and teacher data by {x^(j), y^(j)}. Suppose we have an n-qubit system, which is initialized to

|0〉⊗n. (54)

In order to feed the input data into the quantum system, a unitary operation parameterized by x, say V(x), is applied to the initial state:

V (x)|0〉⊗n. (55)

For example, if x is one-dimensional and normalized such that 0 ≤ x ≤ 1, then we may employ the Y-basis rotation e−iθY with an angle θ = arccos(√x):

e−iθY |0〉 = √x |0〉 + √(1 − x) |1〉.  (56)

The expectation value of Z with respect to e−iθY |0〉 becomes

〈Z〉 = 2x − 1,  (57)

and hence it is linearly related to the input x. To enhance the power of the quantum-enhanced feature space, the input could first be transformed by a nonlinear function φ:

θ = arccos(√φ(x)). (58)

The nonlinear function φ could be, for example, a hyperbolic tangent, a Legendre polynomial, and so on. For simplicity, below we will use the simple linear input θ = arccos(√x).

If we apply the same operation on each of the n qubits, we have

V(x)|0〉^⊗n = (√x |0〉 + √(1 − x) |1〉)^⊗n = (1 − x)^{n/2} Σ_{i_1,...,i_n} Π_k ( √(x/(1 − x)) )^{i_k} |i_1, ..., i_n〉.

Therefore, we have coefficients that are nonlinear with respect to the input x because of the tensor product structure. Still, the expectation value of the single-qubit operator Z_k on the kth qubit is 2x − 1. However, if we measure a correlated operator like Z_1 Z_2, we can obtain a second-order nonlinear output

〈Z_1 Z_2〉 = (2x − 1)^2  (59)

with respect to the input x. To measure a correlated operator, it is enough to apply an entangling unitary operation like the CNOT gate Λ(X) = |0〉〈0| ⊗ I + |1〉〈1| ⊗ X:

〈ψ|Λ_{1,2}(X) Z_1 Λ_{1,2}(X)|ψ〉 = 〈ψ|Z_1 Z_2|ψ〉.  (60)

In general, an n-qubit unitary operation U transforms the observable Z under conjugation into a linear combination of Pauli operators:

U† Z_1 U = Σ_i α_i P(i).  (61)


FIG. 1. The expectation value 〈Z〉 of the output of a quantum circuit as a function of the input (x_0, x_1).

Thus, if you measure the output of the quantum circuit after applying a unitary operation U,

U V(x)|0〉^⊗n,  (62)

you can get a complex nonlinear output, which can be represented as a linear combination of exponentially many nonlinear functions. U should be chosen to be appropriately complex while keeping experimental feasibility, but it need not be fine-tuned.
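The following NumPy sketch is a minimal, classically simulated version of quantum extreme learning machine (our own illustration; the qubit number, the Haar-like random unitary built by QR decomposition, and the toy target sin(2πx) are all our assumptions, not the circuit of Fig. 2). It encodes x by Eq. (56), applies a fixed random unitary, reads out the features 〈Z_k〉, and trains only the linear readout:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6
dim = 2 ** n

# Fixed random unitary U via QR decomposition (a stand-in for a random circuit)
A = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
U, _ = np.linalg.qr(A)

def z_expectations(psi):
    """<Z_k> for each qubit, from the outcome probabilities |<i|psi>|^2."""
    p = (np.abs(psi) ** 2).reshape([2] * n)
    out = []
    for k in range(n):
        marg = p.sum(axis=tuple(j for j in range(n) if j != k))  # marginal of qubit k
        out.append(marg[0] - marg[1])                            # <Z_k> = p(0) - p(1)
    return np.array(out)

def features(x):
    """Encode x as V(x)|0>^n = (sqrt(x)|0> + sqrt(1-x)|1>)^n, apply U, read <Z_k>."""
    q = np.array([np.sqrt(x), np.sqrt(1 - x)])
    psi = np.array([1.0], dtype=complex)
    for _ in range(n):
        psi = np.kron(psi, q)
    return z_expectations(U @ psi)

xs = rng.uniform(size=200)
F = np.array([np.append(features(x), 1.0) for x in xs])   # feature vectors + bias node
y = np.sin(2 * np.pi * xs)                                # toy target function
w = np.linalg.pinv(F) @ y                                 # train the readout, Eq. (66)
print(np.mean((F @ w - y) ** 2))
```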

To see how the output behaves in a nonlinear way with respect to the input, in Fig. 1 we plot the output 〈Z〉 for the input (x_0, x_1) and n = 8, where the inputs are fed into the quantum state by Y-rotations with angles

θ_{2k} = k arccos(√x_0),  (63)
θ_{2k+1} = k arccos(√x_1),  (64)

on the 2kth and (2k + 1)th qubits, respectively. Regarding the unitary operation U, random two-qubit gates are applied sequentially on pairs of qubits of the 8-qubit system.

Suppose the Pauli Z operator is measured on each qubit as an observable. Then we have

zi = 〈Zi〉, (65)

for each qubit. In quantum extreme learning machine, the output is defined by taking a linear combination of these n outputs:

ŷ = Σ_{i=1}^{n} w_i z_i.  (66)

Now the linear readout weights {w_i} are tuned so that the quadratic loss function

L = Σ_j (ŷ^(j) − y^(j))^2  (67)

FIG. 2. (a) The quantum circuit for quantum extreme learning machine. The boxes with θ_k indicate Y-rotations by angles θ_k. The red and blue boxes correspond to X and Z rotations by random angles. Each dotted-line box represents a two-qubit gate consisting of two controlled-Z gates, 8 X-rotations, and 4 Z-rotations. As denoted by the dashed-line box, the sequence of the 7 dotted boxes is repeated twice. The readout is defined by a linear combination of 〈Z_i〉 with a constant bias term 1.0 and the input (x_0, x_1). (b) (Left) The training data for a two-class classification problem. (Middle) The readout after learning. (Right) Prediction from the readout with threshold at 0.5.

becomes minimum. As mentioned previously, this can be solved by using the pseudo-inverse. In short, quantum extreme learning machine is a linear regression on randomly chosen nonlinear basis functions, which come from a quantum state in a space of exponentially large dimension, namely the quantum-enhanced feature space. Furthermore, under typical settings of the nonlinear function and of the unitary operations transforming the observables, the output in Eq. (66) can approximate any continuous function of the input. This property is known as the universal approximation property (UAP), which implies that quantum extreme learning machine can handle a wide class of machine learning tasks with at least the same power as classical extreme learning machine [75].

Here we should note that a similar approach, quantum kernel estimation, has been taken in Ref. [46]. In quantum extreme learning machine, a classical feature vector φ_i(x) ≡ 〈Φ(x)|Z_i|Φ(x)〉 is extracted from observables on the quantum feature space |Φ(x)〉 ≡ V(x)|0〉^⊗n, and linear regression is performed on this classical feature vector. In quantum kernel estimation, on the other hand, the quantum feature space is fully employed by a support vector machine with the kernel function K(x, x′) ≡ 〈Φ(x)|Φ(x′)〉, which can be estimated on a quantum computer. While the classification power would be better for quantum kernel estimation, it requires more quantum computational cost, both for learning and for prediction, than quantum extreme learning machine.

In Fig. 2, we demonstrate quantum extreme learning machine on a two-class classification task.


The input is two-dimensional with 0 ≤ x_0, x_1 ≤ 1. Classes 0 and 1 are defined by (x_0 − 0.5)^2 + (x_1 − 0.5)^2 ≤ 0.15 and > 0.15, respectively. The linear readout weights {w_i} are learned from 1000 randomly chosen training data, and prediction is performed on 1000 randomly chosen inputs. Class 0 or 1 is assigned according to whether or not the output ŷ is larger than 0.5. Quantum extreme learning machine with the 8-qubit quantum circuit shown in Fig. 2(a) succeeds in predicting the class with 95% accuracy. In contrast, a simple linear regression on (x_0, x_1) results in 39%, and quantum extreme learning machine with U = I, i.e., without entangling gates, also performs poorly at 42%. In this way, the feature space enhanced by quantum entangling operations is important for obtaining good performance in quantum extreme learning machine.

B. Quantum circuit learning

In the spirit of reservoir computing, the dynamics of a physical system is not fine-tuned; rather, the natural dynamics of the system is harnessed for machine learning tasks. However, on state-of-the-art quantum computing devices, the parameters of quantum operations can be finely tuned, as is done for universal quantum computing. Therefore it is natural to extend quantum extreme learning machine by tuning the parameters in the quantum circuit, just like feedforward neural networks with back propagation.

Using parameterized quantum circuits for supervised machine learning tasks, such as the generalization of nonlinear functions and pattern recognition, has been proposed in Refs. [43, 47]; we call this quantum circuit learning. Let us consider the same situation as in quantum extreme learning machine. The state before the measurement is given by

UV (x)|0〉⊗n. (68)

In quantum extreme learning machine, the unitary operation providing a nonlinear transformation with respect to the input parameter x is randomly chosen. However, the unitary operation U may also be parameterized:

U({φ_k}) = Π_k u(φ_k).  (69)

Thereby, the output of the quantum circuit with respect to an observable A,

〈A({φ_k}, x)〉 = 〈0|^⊗n V†(x) U†({φ_k}) A U({φ_k}) V(x) |0〉^⊗n,

becomes a function of the circuit parameters {φ_k} in addition to the input x. Then the parameters {φ_k} are tuned so as to minimize the error between the teacher data and the output, for example by using the gradient, just like the output of a feedforward neural network.

Let us define a teacher dataset {x^(j), y^(j)} and a quadratic loss function

L({φ_k}) = Σ_j (〈A({φ_k}, x^(j))〉 − y^(j))^2.  (70)

The gradient of the loss function can be obtained as follows:

∂L({φ_k})/∂φ_l = ∂/∂φ_l Σ_j (〈A({φ_k}, x^(j))〉 − y^(j))^2 = Σ_j 2 (〈A({φ_k}, x^(j))〉 − y^(j)) ∂〈A({φ_k}, x^(j))〉/∂φ_l.

Therefore, if we can measure the gradient of the observable 〈A({φ_k}, x^(j))〉, the loss function can be minimized by gradient descent.

Suppose the unitary operation u(φ_k) is given by

u(φ_k) = W_k e^{−i(φ_k/2) P_k},  (71)

where W_k is an arbitrary unitary and P_k is a Pauli operator. Then the partial derivative with respect to the lth parameter can be calculated analytically from the outputs 〈A({φ_k}, x^(j))〉 with the lth parameter shifted by ±ε [43, 61]:

∂〈A({φ_k}, x^(j))〉/∂φ_l = [〈A({φ_1, ..., φ_l + ε, φ_{l+1}, ...}, x^(j))〉 − 〈A({φ_1, ..., φ_l − ε, φ_{l+1}, ...}, x^(j))〉] / (2 sin ε).

Considering the statistical error in measuring the observable 〈A〉, ε should be chosen as ε = π/2 so as to maximize the denominator. After measuring the partial derivatives for all parameters φ_k and calculating the gradient of the loss function L({φ_k}), the parameters are updated by gradient descent:

φ_l^{(m+1)} = φ_l^{(m)} − α ∂L({φ_k})/∂φ_l.  (72)

The idea of using parameterized quantum circuits for machine learning is now widespread. After the proposal of quantum circuit learning based on the analytical gradient estimation above [43] and a similar idea [47], several studies have been performed with various types of parameterized quantum circuits [48–52] and various models and types of machine learning, including generative models [54, 55] and generative adversarial models [56–58]. Moreover, the expression power of parameterized quantum circuits and its advantage over classical probabilistic models have been investigated [59]. Experimentally feasible ways to measure an analytic gradient of parameterized quantum circuits have been investigated [60–62]. An advantage of using such gradients for parameter optimization has also been argued in a simple setting [63], while parameter tuning becomes difficult because of the gradient vanishing in an exponentially large Hilbert space [64]. Software libraries for optimizing parameterized quantum circuits are now being developed [65, 66].


FIG. 3. (a) Quantum reservoir computing. (b) Virtual nodes and temporal multiplexing.

Quantum machine learning on near-term devices, especially for quantum optical systems, has been proposed in Refs. [67, 68]. Quantum circuit learning with parameterized quantum circuits has already been demonstrated experimentally on superconducting qubit systems [46, 69] and a trapped-ion system [70].

C. Quantum reservoir computing

Now we return to the reservoir approach and extend quantum extreme learning machine from non-temporal tasks to temporal ones, namely, quantum reservoir computing [38]. We consider a temporal task, as explained in Sec. III B. The input is given by a time series {x_k}_{k=1}^{L}, and the purpose is to learn a nonlinear temporal function

y_k = f({x_j}_{j=1}^{k}).  (73)

To this end, the target time series {y_k}_{k=1}^{L} is also provided as a teacher.

In contrast to the previous setting with non-temporal tasks, we have to feed the input into the quantum system sequentially. This requires us to perform an initialization process during the computation, and hence the quantum state of the system becomes a mixed state. Therefore, in the formulation of QRC, we will use the vector representation of density operators, which was explained in Sec. II E.

In the vector representation of density operators, the quantum state of an N-qubit system is given by a vector in a 4^N-dimensional real vector space, r ∈ R^{4^N}. In QRC, similarly to recurrent neural networks, each element of the 4^N-dimensional vector is regarded as a hidden node of the network. As we have seen in Sec. II E, any physical operation can be written as a linear transformation of this real vector by a 4^N × 4^N matrix W:

r′ = Wr. (74)

From Eq. (74), we see a time evolution similar to that of the recurrent neural network, r′ = tanh(Wr). However, there is no nonlinearity such as tanh in a quantum operation W. Instead, the time evolution W can be changed according to the external input x_k, namely W_{x_k}, which contrasts with the conventional recurrent neural network, where the input is fed additively as Wr + W^in x_k. This allows the quantum reservoir to process the input information {x_k} nonlinearly, by repetitively feeding the input.

Suppose the input {x_k} is normalized such that 0 ≤ x_k ≤ 1. To inject the input, we replace a part of the qubits with a quantum state whose density operator is

ρ_{x_k} = [I + (2x_k − 1)Z]/2.  (75)

For simplicity, below we consider the case where only one qubit is replaced for the input. The corresponding matrix S_{x_k} is given by

(S_{x_k})_{ji} = Tr{ P(j) ( [I + (2x_k − 1)Z]/2 ⊗ Tr_replace[P(i)] ) }/2^N,

where Tr_replace indicates the partial trace with respect to the replaced qubit. With this definition, we have

ρ′ = ρ_{x_k} ⊗ Tr_replace[ρ]  ⇔  r′ = S_{x_k} r.  (76)

The unitary time evolution, which is necessary to obtain nonlinear behavior with respect to the input variable x_k, is taken to be a Hamiltonian dynamics e−iHτ over a given time interval τ. Let us denote its representation on the vector space by U_τ:

ρ′ = e−iHτ ρ eiHτ  ⇔  r′ = U_τ r.  (77)

Then a unit time step is written as an input-dependent linear transformation:

r((k + 1)τ) = U_τ S_{x_k} r(kτ),  (78)

where r(kτ) indicates the hidden nodes at time kτ.

Since the number of hidden nodes is exponentially large, it is not feasible to observe all of them in experiments. Instead, a set of observed nodes {r_l}_{l=1}^{M}, which we call true nodes, is defined by an M × 4^N matrix R,

r_l(kτ) = Σ_i R_{li} r_i(kτ).  (79)

The number of true nodes M has to be polynomial in the number of qubits N. That is, from the exponentially many hidden nodes, a polynomial number of true nodes is obtained to define the output of the quantum reservoir (see Fig. 3(a)):

y_k = Σ_l W^out_l r_l(kτ),  (80)

where W^out contains the readout weights, which are obtained by using the training data.


For simplicity, we take the single-qubit Pauli Z operator on each qubit as the true nodes, i.e.,

r_l = Tr[Z_l ρ],  (81)

so that if there were no dynamics, these nodes would simply provide a linear output (2x_k − 1) with respect to the input x_k.

Moreover, in order to improve the performance, we also perform temporal multiplexing. Temporal multiplexing has been found to be useful for extracting the complex dynamics of the exponentially many hidden nodes through the restricted number of true nodes [38]. In temporal multiplexing, the true nodes are sampled not only just after the time evolution U_τ but also at each of V subdivided time intervals during the unitary evolution, constructing V virtual nodes, as shown in Fig. 3(b). After each input by S_{x_k}, the signals from the hidden nodes (via the true nodes) are measured at each subdivided interval after the time evolution by U_{vτ/V} (v = 1, 2, ..., V), i.e.,

r(kτ + (v/V)τ) ≡ U_{(v/V)τ} S_{x_k} r(kτ).  (82)

In total, we now have N × V nodes, and the output is defined as their linear combination:

y_k = Σ_{l=1}^{N} Σ_{v=1}^{V} W^out_{l,v} r_l(kτ + (v/V)τ).  (83)

By using the teacher data {y_k}_{k=1}^{L}, the linear readout weights W^out_{l,v} can be determined with the pseudo-inverse. In Ref. [38], the performance of QRC was investigated extensively for both binary and continuous inputs. The results show that even if the number of qubits is small, say 5-7 qubits, performance as powerful as echo state networks with 100-500 nodes is obtained, for both the short-term memory and the parity check capacities. Note that, although we do not go into detail in this chapter, a technique called spatial multiplexing [40], which exploits multiple quantum reservoirs injected with a common input sequence, has also been introduced to harness quantum dynamics as a computational resource. Recently, QRC has been further investigated in Refs. [41, 71, 74]. Specifically, in Ref. [71], the authors use quantum reservoir computing to detect many-body entanglement by estimating nonlinear functions of density operators, such as the entropy.

D. Emulating chaotic attractors using quantum dynamics

To see the performance of QRC, here we demonstrate the emulation of chaotic attractors. Suppose {x_k}_{k=1}^{L} is a discretized time sequence obeying a complex nonlinear equation, which may show chaotic behavior. In this task, the target that the network should output is defined to be

y_k = x_{k+1} = f({x_j}_{j=1}^{k}).  (84)

That is, the system learns the input of the next step. Once the system has successfully learned y_k, the output can be fed back as the input of the next step, and the system evolves autonomously.

Here we employ the following target time series from chaotic attractors: (i) the Lorenz attractor,

dx/dt = a(y − x),  (85)
dy/dt = x(b − z) − y,  (86)
dz/dt = xy − cz,  (87)

with (a, b, c) = (10, 28, 8/3); (ii) the chaotic attractor of the Mackey-Glass equation,

dx(t)/dt = β x(t − τ) / [1 + x(t − τ)^n] − γ x(t),  (88)

with (β, γ, n) = (0.2, 0.1, 10) and τ = 17; (iii) the Rössler attractor,

dx/dt = −y − z,  (89)
dy/dt = x + ay,  (90)
dz/dt = b + z(x − c),  (91)

with (a, b, c) = (0.2, 0.2, 5.7); and (iv) the Hénon map,

x_{t+1} = 1 − 1.4 x_t^2 + 0.3 x_{t−1}.  (92)

Regarding (i)-(iii), the time series is obtained by using the fourth-order Runge-Kutta method with step size 0.02, and only x(t) is employed as the target. For the time evolution of the quantum reservoir, we employ a fully connected transverse-field Ising model,

H = Σ_{ij} J_{ij} X_i X_j + h Σ_i Z_i,  (93)

where the coupling strengths J_{ij} are distributed randomly in [−0.5, 0.5] and h = 1.0. The time interval and the number of virtual nodes are chosen to be τ = 4.0 and V = 10 so as to obtain the best performance. The first 10^4 steps are used for training. After the linear readout weights are determined, several thousand steps are predicted by evolving the quantum reservoir autonomously. The results are shown in Fig. 4 for (a) the Lorenz attractor, (b) the chaotic attractor of the Mackey-Glass system, (c) the Rössler attractor, and (d) the Hénon map. All these results show that training works well and that the prediction is successful for several hundred steps. Moreover, the output of the quantum reservoir also successfully reconstructs the structures of these chaotic attractors, as can be seen from the delayed phase diagrams.
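As a minimal illustration of how the teacher data for task (iv) can be prepared (our own sketch; the initial condition and the [0, 1] normalization convention are our assumptions), the following generates the Hénon time series of Eq. (92) and pairs it into input/target sequences according to Eq. (84):

```python
import numpy as np

def henon_series(L, a=1.4, b=0.3):
    """Henon map x_{t+1} = 1 - a x_t^2 + b x_{t-1}, Eq. (92)."""
    xs = [0.1, 0.1]                          # arbitrary initial condition
    for _ in range(L - 2):
        xs.append(1 - a * xs[-1] ** 2 + b * xs[-2])
    return np.array(xs)

x = henon_series(10000)
x = (x - x.min()) / (x.max() - x.min())      # normalize the input to [0, 1]
X_in, y_target = x[:-1], x[1:]               # teacher pairs: y_k = x_{k+1}, Eq. (84)
print(X_in.shape, y_target.shape)
```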


FIG. 4. Demonstrations of chaotic attractor emulation. (a) Lorenz attractor. (b) Mackey-Glass system. (c) Rössler attractor. (d) Hénon map. The dotted line marks the time step at which the system is switched from the teacher-forced state to the autonomous state. On the right, delayed phase diagrams of the learned dynamics are shown.

V. CONCLUSION AND DISCUSSION

Here we reviewed quantum reservoir computing and related approaches, quantum extreme learning machine and quantum circuit learning. The idea of quantum reservoir computing comes from the spirit of reservoir computing, i.e., outsourcing information processing to natural physical systems. This idea is well suited to quantum machine learning on near-term quantum devices in the NISQ (noisy intermediate-scale quantum) era.

Since reservoir computing uses complex physical systems as a feature space and constructs a model by simple linear regression, this approach would be a good way to understand the power of a quantum-enhanced feature space.

ACKNOWLEDGEMENT

KF is supported by KAKENHI No. 16H02211, JST PRESTO JPMJPR1668, JST ERATO JPMJER1601, and JST CREST JPMJCR1673. KN is supported by JST PRESTO Grant Number JPMJPR15E7, Japan, and by JSPS KAKENHI Grant Numbers JP18H05472, JP16KT0019, and JP15K16076. KN would like to acknowledge Dr. Quoc Hoan Tran for his helpful comments. This work is supported by the MEXT Quantum Leap Flagship Program (MEXT Q-LEAP) Grant No. JPMXS0118067394.

[1] R. P. Feynman, Simulating physics with computers, Int. J. Theor. Phys. 21, 467 (1982).
[2] M. A. Nielsen and I. L. Chuang, Quantum computation and quantum information (Cambridge University Press, 2010).
[3] K. Fujii, Quantum Computation with Topological Codes -From Qubit to Topological Fault-Tolerance-, SpringerBriefs in Mathematical Physics (Springer-Verlag, 2015).
[4] P. W. Shor, Algorithms for quantum computation: Discrete logarithms and factoring, in Proceedings of the 35th Annual Symposium on Foundations of Computer Science, 124 (1994).
[5] R. Barends et al., Superconducting quantum circuits at the surface code threshold for fault tolerance, Nature 508, 500 (2014).
[6] J. Kelly et al., State preservation by repetitive error detection in a superconducting quantum circuit, Nature 519, 66 (2015).
[7] J. Preskill, Quantum Computing in the NISQ era and beyond, Quantum 2, 79 (2018).
[8] S. Boixo et al., Characterizing quantum supremacy in near-term devices, Nature Physics 14, 595 (2018).
[9] J. I. Cirac and P. Zoller, Goals and opportunities in quantum simulation, Nat. Phys. 8, 264 (2012).
[10] I. Bloch, J. Dalibard, and S. Nascimbene, Quantum simulations with ultracold quantum gases, Nat. Phys. 8, 267 (2012).
[11] I. M. Georgescu, S. Ashhab, and F. Nori, Quantum simulation, Rev. Mod. Phys. 86, 153 (2014).
[12] T. Kadowaki and H. Nishimori, Quantum annealing in the transverse Ising model, Phys. Rev. E 58, 5355 (1998).
[13] E. Farhi, J. Goldstone, S. Gutmann, J. Lapan, A. Lundgren, and D. Preda, A quantum adiabatic evolution algorithm applied to random instances of an NP-complete problem, Science 292, 472 (2001).
[14] T. F. Rønnow, Z. Wang, J. Job, S. Boixo, S. V. Isakov, D. Wecker, J. M. Martinis, D. A. Lidar, and M. Troyer, Defining and detecting quantum speedup, Science 345, 420 (2014).
[15] S. Boixo, T. F. Rønnow, S. V. Isakov, Z. Wang, D. Wecker, D. A. Lidar, J. M. Martinis, and M. Troyer, Evidence for quantum annealing with more than one hundred qubits, Nat. Phys. 10, 218 (2014).
[16] T. Morimae, K. Fujii, and J. F. Fitzsimons, Hardness of classically simulating the one-clean-qubit model, Phys. Rev. Lett. 112, 130502 (2014).
[17] K. Fujii, H. Kobayashi, T. Morimae, H. Nishimura, S. Tamate, and S. Tani, Power of Quantum Computation with Few Clean Qubits, Proceedings of the 43rd International Colloquium on Automata, Languages, and Programming (ICALP 2016), pp. 13:1-13:14.
[18] K. Fujii and S. Tamate, Computational quantum-classical boundary of complex and noisy quantum systems, Sci. Rep. 6, 25598 (2016).
[19] M. Rabinovich, R. Huerta, and G. Laurent, Transient dynamics for neural processing, Science 321, 48 (2008).
[20] J. Dambre, D. Verstraeten, B. Schrauwen, and S. Massar, Information processing capacity of dynamical systems, Sci. Rep. 2, 514 (2012).
[21] H. Jaeger and H. Haas, Harnessing nonlinearity: predicting chaotic systems and saving energy in wireless communication, Science 304, 78 (2004).
[22] W. Maass, T. Natschläger, and H. Markram, Real-time computing without stable states: a new framework for neural computation based on perturbations, Neural Comput. 14, 2531 (2002).
[23] C. Fernando and S. Sojakka, Pattern recognition in a bucket, in Lecture Notes in Computer Science 2801, p. 588 (Springer, 2003).
[24] L. Appeltant, M. C. Soriano, G. Van der Sande, J. Danckaert, S. Massar, J. Dambre, B. Schrauwen, C. R. Mirasso, and I. Fischer, Information processing using a single dynamical node as complex system, Nat. Commun. 2, 468 (2011).
[25] D. Woods and T. J. Naughton, Photonic neural networks, Nat. Phys. 8, 257 (2012).
[26] L. Larger, M. C. Soriano, D. Brunner, L. Appeltant, J. M. Gutierrez, L. Pesquera, C. R. Mirasso, and I. Fischer, Photonic information processing beyond Turing: an optoelectronic implementation of reservoir computing, Optics Express 20, 3241 (2012).
[27] Y. Paquot, F. Duport, A. Smerieri, J. Dambre, B. Schrauwen, M. Haelterman, and S. Massar, Optoelectronic Reservoir Computing, Sci. Rep. 2, 287 (2012).
[28] D. Brunner, M. C. Soriano, C. R. Mirasso, and I. Fischer, Parallel photonic information processing at gigabyte per second data rates using transient states, Nat. Commun. 4, 1364 (2013).
[29] K. Vandoorne, P. Mechet, T. V. Vaerenbergh, M. Fiers, G. Morthier, D. Verstraeten, B. Schrauwen, J. Dambre, and P. Bienstman, Experimental demonstration of reservoir computing on a silicon photonics chip, Nat. Commun. 5, 3541 (2014).
[30] A. Z. Stieg, A. V. Avizienis, H. O. Sillin, C. Martin-Olmos, M. Aono, and J. K. Gimzewski, Emergent criticality in complex Turing B-type atomic switch networks, Adv. Mater. 24, 286 (2012).
[31] H. Hauser, A. J. Ijspeert, R. M. Füchslin, R. Pfeifer, and W. Maass, Towards a theoretical foundation for morphological computation with compliant bodies, Biol. Cybern. 105, 355 (2011).
[32] K. Nakajima, H. Hauser, R. Kang, E. Guglielmino, D. G. Caldwell, and R. Pfeifer, Computing with a Muscular-Hydrostat System, Proceedings of 2013 IEEE International Conference on Robotics and Automation (ICRA), 1496 (2013).
[33] K. Nakajima, H. Hauser, R. Kang, E. Guglielmino, D. G. Caldwell, and R. Pfeifer, A soft body as a reservoir: case studies in a dynamic model of octopus-inspired soft robotic arm, Front. Comput. Neurosci. 7, 1 (2013).
[34] K. Nakajima, T. Li, H. Hauser, and R. Pfeifer, Exploiting short-term memory in soft body dynamics as a computational resource, J. R. Soc. Interface 11, 20140437 (2014).
[35] K. Nakajima, H. Hauser, T. Li, and R. Pfeifer, Information processing via physical soft body, Sci. Rep. 5, 10487 (2015).
[36] K. Caluwaerts, J. Despraz, A. Iscen, A. P. Sabelhaus, J. Bruce, B. Schrauwen, and V. SunSpiral, Design and control of compliant tensegrity robots through simulations and hardware validation, J. R. Soc. Interface 11, 20140520 (2014).
[37] Y. LeCun, Y. Bengio, and G. Hinton, Deep Learning, Nature 521, 436 (2015).
[38] K. Fujii and K. Nakajima, Harnessing Disordered-Ensemble Quantum Dynamics for Machine Learning, Phys. Rev. Applied 8, 024030 (2017).
[39] M. Negoro et al., Machine learning with controllable quantum dynamics of a nuclear spin ensemble in a solid, arXiv:1806.10910 (2018).
[40] K. Nakajima et al., Boosting computational power through spatial multiplexing in quantum reservoir computing, Phys. Rev. Applied 11, 034021 (2019).
[41] A. Kutvonen, K. Fujii, and T. Sagawa, Optimizing a quantum reservoir computer for time series prediction, Sci. Rep. 10, 14687 (2020).
[42] Q. H. Tran and K. Nakajima, Higher-order quantum reservoir computing, arXiv:2006.08999 (2020).
[43] K. Mitarai et al., Quantum circuit learning, Phys. Rev. A 98, 032309 (2018).
[44] G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, Extreme learning machine: theory and applications, Neurocomputing 70, 489 (2006).
[45] D. Verstraeten, B. Schrauwen, M. D'Haene, and D. Stroobandt, An experimental unification of reservoir computing methods, Neural Netw. 20, 391 (2007).
[46] V. Havlíček et al., Supervised learning with quantum-enhanced feature spaces, Nature 567, 209 (2019).
[47] E. Farhi and H. Neven, Classification with quantum neural networks on near term processors, arXiv:1802.06002 (2018).
[48] M. Schuld et al., Circuit-centric quantum classifiers, Phys. Rev. A 101, 032308 (2020).
[49] H. Chen et al., Universal discriminative quantum neural networks, arXiv:1805.08654 (2018).
[50] W. Huggins et al., Towards Quantum Machine Learning with Tensor Networks, Quantum Science and Technology 4, 024001 (2019).
[51] I. Glasser, N. Pancotti, and J. I. Cirac, From probabilistic graphical models to generalized tensor networks for supervised learning, arXiv:1806.05964 (2018).
[52] Y. Du et al., Implementable Quantum Classifier for Nonlinear Data, arXiv:1809.06056 (2018).
[53] M. Benedetti et al., Adversarial quantum circuit learning for pure state approximation, New J. Phys. 21, 043023 (2019).
[54] M. Benedetti et al., A generative modeling approach for benchmarking and training shallow quantum circuits, npj Quantum Information 5, 45 (2019).
[55] J.-G. Liu and L. Wang, Phys. Rev. A 98, 062324 (2018).
[56] H. Situ et al., Quantum generative adversarial network for generating discrete data, Information Sciences 538, 193 (2020).
[57] J. Zeng et al., Learning and Inference on Generative Adversarial Quantum Circuits, Phys. Rev. A 99, 052306 (2019).
[58] J. Romero and A. Aspuru-Guzik, Variational quantum generators: Generative adversarial quantum machine learning for continuous distributions, arXiv:1901.00848 (2019).
[59] Y. Du et al., The expressive power of parameterized quantum circuits, Phys. Rev. Research 2, 033125 (2020).
[60] M. Schuld et al., Evaluating analytic gradients on quantum hardware, Phys. Rev. A 99, 032331 (2019).
[61] K. Mitarai and K. Fujii, Methodology for replacing indirect measurements with direct measurements, Phys. Rev. Research 1, 013006 (2019).
[62] J. G. Vidal and D. O. Theis, Calculus on parameterized quantum circuits, arXiv:1812.06323 (2018).
[63] A. Harrow and J. Napp, Low-depth gradient measurements can improve convergence in variational hybrid quantum-classical algorithms, arXiv:1901.05374 (2019).
[64] J. R. McClean et al., Barren plateaus in quantum neural network training landscapes, Nature Communications 9, 4812 (2018).
[65] V. Bergholm et al., PennyLane: Automatic differentiation of hybrid quantum-classical computations, arXiv:1811.04968 (2018).
[66] Z.-Y. Chen et al., VQNet: Library for a Quantum-Classical Hybrid Neural Network, arXiv:1901.09133 (2019).
[67] G. R. Steinbrecher et al., Quantum optical neural networks, npj Quantum Information 5, 60 (2019).
[68] N. Killoran et al., Continuous-variable quantum neural networks, Phys. Rev. Research 1, 033063 (2019).
[69] C. M. Wilson et al., Quantum Kitchen Sinks: An algorithm for machine learning on near-term quantum computers, arXiv:1806.08321 (2018).
[70] D. Zhu et al., Training of Quantum Circuits on a Hybrid Quantum Computer, Science Advances 5, 9918 (2019).
[71] S. Ghosh et al., Quantum reservoir processing, npj Quantum Information 5, 35 (2019).
[72] S. Ghosh, T. Paterek, and T. C. H. Liew, Quantum neuromorphic platform for quantum state preparation, Phys. Rev. Lett. 123, 260404 (2019).
[73] S. Ghosh et al., Reconstructing quantum states with quantum reservoir networks, IEEE Trans. Neural Netw. Learn. Syst., pp. 1-8 (2020).
[74] J. Chen and H. I. Nurdin, Learning Nonlinear Input-Output Maps with Dissipative Quantum Systems, Quantum Information Processing 18, 198 (2019).
[75] T. Goto, Q. H. Tran, and K. Nakajima, Universal approximation property of quantum feature maps, arXiv:2009.00298 (2020).