arXiv:0803.2316v1 [quant-ph] 15 Mar 2008

On the CNOT-cost of TOFFOLI gates

Vivek V. Shende∗
Igor L. Markov†

Abstract

The three-input TOFFOLI gate is the workhorse of circuit synthesis for classical logic operations on quantum data, e.g., reversible arithmetic circuits. In physical implementations, however, TOFFOLI gates are decomposed into six CNOT gates and several one-qubit gates. Though this decomposition has been known for at least 10 years, we provide here the first demonstration of its CNOT-optimality.

We study three-qubit circuits which contain fewer than six CNOT gates and implement a block-diagonal operator, then show that they implicitly describe the cosine-sine decomposition of a related operator. Leveraging the canonicity of such decompositions to limit one-qubit gates appearing in respective circuits, we prove that the n-qubit analogue of the TOFFOLI requires at least 2n CNOT gates. Additionally, our results offer a complete classification of three-qubit diagonal operators by their CNOT-cost, which holds even if ancilla qubits are available.

∗Department of Mathematics, Princeton University, Princeton, NJ 08544.
†Department of EECS, University of Michigan, Ann Arbor, MI 48109.


Contents

1 Introduction
2 Preliminaries
   2.1 Notation and properties of standard quantum gates
   2.2 Operators commuting with Z
   2.3 Cartan decompositions in quantum logic
   2.4 Basic facts about CZ-counting
3 Deriving gate constraints from circuit equations
4 The CNOT-cost of the TOFFOLI gate
   4.1 CZ counting via the demultiplexing decomposition
   4.2 Circuit constraints from the cosine-sine decomposition
   4.3 Corollaries
5 Three-qubit diagonal operators
6 Circuits with ancillae
7 Conclusion

1 Introduction

The three-qubit TOFFOLI gate appears in key quantum logic circuits, such as those for modular exponentiation. However, in physical implementations it must be decomposed into one- and two-qubit gates. Figure 1 reproduces the textbook circuit from [14] with six CNOT gates, as well as Hadamard (H), T = exp(iπσz/8) and T† gates.

[Figure 1 circuit diagram not recoverable from the source.]

Figure 1: Decomposing the TOFFOLI gate into one-qubit and six CNOT gates.
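A six-CNOT decomposition of this kind can be checked numerically. The sketch below is an illustration rather than part of the original text. It assumes one common textbook gate ordering and the convention T = diag(1, e^{iπ/4}), which may differ from the T defined above by a global phase and by an exchange of T with T†. The helper names `embed` and `cnot` are hypothetical.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
T = np.diag([1, np.exp(1j * np.pi / 4)])   # assumed convention for T
Td = T.conj().T
P0 = np.diag([1, 0]).astype(complex)       # projector onto |0>
P1 = np.diag([0, 1]).astype(complex)       # projector onto |1>

def embed(gates, n=3):
    """Tensor product over n qubits; `gates` maps qubit index -> 2x2 matrix."""
    out = np.array([[1]], dtype=complex)
    for q in range(n):
        out = np.kron(out, gates.get(q, I2))
    return out

def cnot(c, t, n=3):
    """CNOT with control c and target t (qubit 0 is the most significant)."""
    return embed({c: P0}, n) + embed({c: P1, t: X}, n)

a, b, c = 0, 1, 2   # two controls and the target
circuit = [          # gates in time order; six of them are CNOTs
    embed({c: H}), cnot(b, c), embed({c: Td}), cnot(a, c), embed({c: T}),
    cnot(b, c), embed({c: Td}), cnot(a, c), embed({b: T}), embed({c: T}),
    embed({c: H}), cnot(a, b), embed({a: T}), embed({b: Td}), cnot(a, b),
]

U = np.eye(8, dtype=complex)
for g in circuit:
    U = g @ U        # later gates multiply from the left

TOFFOLI = np.eye(8, dtype=complex)
TOFFOLI[6:, 6:] = X  # flip the target only when both controls are 1
```

Comparing `U` with `TOFFOLI` using `np.allclose` confirms that this ordering reproduces the gate. The check shows that six CNOTs suffice, not that fewer would fail.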

The pursuit of efficient circuits for standard gates has a long and rich history. DiVincenzo and Smolin found numerical evidence [4] that five two-qubit gates are necessary and sufficient to implement the TOFFOLI. Margolus showed that a phase-modified TOFFOLI gate admits a three-CNOT implementation [6, 5], whose optimality was eventually demonstrated by Song and Klappenecker [20]. Unfortunately, this MARGOLUS gate can replace TOFFOLI only in rare cases. The detailed case analysis used in the optimality proof from [20] does not extend easily to circuits with four or five CNOTs. The omnibus Barenco et al. paper offers circuits for many standard gates, including an eight-CNOT circuit for the TOFFOLI [1, Corollary 6.2], as well as a six-CNOT circuit for the controlled-controlled-σz, which differs from the TOFFOLI only by one-qubit operators [1, Section 7]. Problem 4.4b of the textbook by Nielsen and Chuang asks whether the circuit of Figure 1 could be improved. The problem was marked as unsolved, and we report the following progress.

Theorem 1 A circuit consisting of CNOT gates and one-qubit gates which implements the n-qubit TOFFOLI gate without ancillae requires at least 2n CNOT gates. For n = 3, this bound holds even when ancillae are permitted, and is achieved by the circuit of Figure 1.

Our main tool is the Cartan decomposition in its “KAK” form, which provides a Lie-theoretic generalization of the singular-value decomposition [8]. Several special cases have previously proven useful for the synthesis and analysis of quantum circuits, notably the two-qubit magic decomposition [10, 11, 24, 23, 22, 16, 17], the cosine-sine decomposition [7, 2, 13, 18], and the demultiplexing decomposition [18]. The canonicity of the two-qubit canonical decomposition was used previously to perform CNOT-counting for two-qubit operators [16]. The magic decomposition is a two-qubit phenomenon,1 but the cosine-sine and demultiplexing decompositions hold for n-qubit operators and enjoy similar canonicity. Moreover, the components of these decompositions are multiplexors [18], that is, block-diagonal operators that commute with many common circuit elements. Commutation properties facilitate circuit restructuring that can dramatically reduce the number of circuit topologies to be considered in proofs. These results and observations allow us to perform CNOT-counting using the Cartan decomposition in a divide-and-conquer manner.

1 While the Cartan decomposition SU(n) = SO(n) · [diagonals] · SO(n) is general, the utility of the magic decomposition arises from the isomorphism SU(2) × SU(2) ≃ SO(4) being represented as an inner automorphism of SU(4). Such coincidental isomorphisms are few and confined to low dimensions.

In the remaining part of this paper, we first review basic properties of quantum gates in Section 2 and make several elementary simplifications to reduce the complexity of the subsequent case analysis. In particular, we pass from the CNOT and TOFFOLI gates to the symmetric, diagonal CZ and CCZ gates, and recall circuit decompositions which yield operators commuting with Z and CZ gates. We also define qubit-local CZ-costs, and observe that the total CZ-cost can be lower-bounded by half the sum of the local CZ counts for each qubit. Though weak, this bound suffices for our purposes and we can compute it in simple cases. Further technique is developed in Section 3, where we compute matrix entries to derive constraints on gates from circuit equations. This approach was employed by Song and Klappenecker in the two-qubit case, and we generalize several of their results to n-qubit circuits.

Section 4 is the heart of the present work, in which we prove our result on the CNOT-cost of the TOFFOLI gate. It starts by motivating and outlining the methods involved, previews key intermediate results, and proves that the CNOT-cost of the TOFFOLI is 6, based on these results. In Section 4.2, we use the canonicity of the cosine-sine decomposition to derive circuit constraints. Section 4.1, motivated by [17], employs the canonicity of the demultiplexing decomposition, captured by a spectral invariant, to lower-bound the number of CZ gates required in circuit implementations of operators. The results apply, mutatis mutandis, to CNOT-based implementations as well. Finally, in Section 4.3, we deduce as corollaries that the three-qubit PERES gate requires exactly 5 CNOTs and the n-qubit TOFFOLI gate requires at least 2n. In Section 5, we extend our techniques to all three-qubit diagonal operators, completely classifying them according to CZ-cost. Generalizations to circuits with ancillae are obtained in Section 6. Concluding discussion can be found in Section 7.

2 Preliminaries

We review notation and properties of useful quantum gates, then characterize operators that commute with Pauli-Z gates on multiple qubits. We then review circuit decompositions from [3, 13, 18]. Finally, we introduce terminology appropriate for quantifying gate costs of unitary operators in terms of the CNOT and CZ gates, and state elementary but useful observations about these costs.

2.1 Notation and properties of standard quantum gates

We write X, Y, Z for the Pauli operators, and CX, CCX for CNOT, TOFFOLI. Rotation gates exp(iZθ) are denoted by Rz(θ), and we analogously use Rx, Ry.2 We work throughout on some fixed number of qubits N. For a one-qubit gate g and a qubit q, we denote by $g^{(q)}$ the N-qubit operator implemented by applying the gate g on qubit q. Similarly, $C^{(i)}X^{(j)}$ is the operator implemented by a controlled-X with the control on qubit i and target on qubit j. The controlled-Z being symmetric with respect to exchanging qubits, we do not distinguish control from target in the notation $CZ^{(i,j)}$. We similarly denote the operator of a controlled-controlled-Z on qubits i, j, k by $CCZ^{(i,j,k)}$. In choosing qubit labels, we follow throughout the convention that the high-to-low significance order of qubits is the same as the lexicographic order of their labels.

2 We omit the factor of ±1/2 used by other authors.

                   CNOT and TOFFOLI                          CZ and CCZ
Advantages         With one-qubit gates added, either CNOT or CZ would be universal
                   Implement addition and multiplication     Symmetric
                   Universal for reversible computation      Fewer circuit topologies
                   Block-diagonal                            Diagonal
                   -                                         With 1-qubit diagonals, implement any diagonal
                   Commute with X on target                  Commute with Z on target
Other properties   Change direction after two H-conjugations
                   One can map back and forth by H-conjugation on target
Applications       Circuit synthesis                         Circuit analysis

Table 1: Relative advantages of standard controlled gates.

We follow the standard but sometimes confusing convention that typeset operators act on vectors from the left, but circuit diagrams process inputs from the right. Consistently with the established notation for the CNOT gate, we denote the X gate by “⊕” in circuit diagrams. We denote the Z gate by a “•” symbol, which does not lead to ambiguity in the matching notation for CZ because CZ is symmetric. Thus the following diagram expresses the identity $CZ^{(\ell,m)} X^{(\ell)} = Z^{(m)} X^{(\ell)} CZ^{(\ell,m)}$ and rearranges gates in quantum circuits, like de Morgan's law does in digital logic.

[Circuit diagram for Equation (1) not recoverable from the source; it depicts the identity above on qubits ℓ and m.]   (1)
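The identity of Equation 1 is easy to confirm on two qubits. The following sketch is an illustration under the assumption that qubit ℓ is the first tensor factor and qubit m the second. The variable names are hypothetical.

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.diag([1.0, -1.0])
CZ = np.diag([1.0, 1.0, 1.0, -1.0])   # symmetric in its two qubits

X_l = np.kron(X, I2)   # X acting on qubit l (first tensor factor)
Z_m = np.kron(I2, Z)   # Z acting on qubit m (second tensor factor)

lhs = CZ @ X_l          # CZ^(l,m) X^(l)
rhs = Z_m @ X_l @ CZ    # Z^(m) X^(l) CZ^(l,m)
```

The two products agree entry by entry, which is the statement that an X gate pushed through a CZ leaves a Z gate on the other qubit.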

Another standard identity relates the X, Z, and one-qubit HADAMARD (H) gates: HXH = Z. By case analysis on control qubits, one obtains the further identities $H^{(i)} C^{(j)}X^{(i)} H^{(i)} = CZ^{(i,j)}$ and $H^{(i)} CC^{(j,k)}X^{(i)} H^{(i)} = CCZ^{(i,j,k)}$. Despite this equivalence, we prefer the X family of gates for some applications and the Z family for others, as summarized in Table 1.

Circuits consisting entirely of one-qubit gates and CZ (respectively CNOT) gates will be called CZ-circuits (respectively CNOT-circuits). Using the above identities, CZ-circuits and CNOT-circuits can be interchanged at the cost of adding one-qubit H gates. It will also be convenient to consider $CZ^{(\ell)}$-circuits, which by definition are arbitrary circuits where all multi-qubit gates touching qubit ℓ are CZ. While these are not a subclass of CZ-circuits, a $CZ^{(\ell)}$-circuit can be converted into a CZ-circuit without any changes affecting qubit ℓ.
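The H-conjugation identities relating the X and Z families can be verified directly. The sketch below is illustrative only. It assumes the target is the least significant qubit, and the helpers `controlled` and `on_last` are hypothetical names.

```python
import numpy as np

X = np.array([[0.0, 1.0], [1.0, 0.0]])
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)

def controlled(gate, n_controls):
    """Apply `gate` to the last qubit when all preceding control qubits are 1."""
    out = np.eye(2 ** (n_controls + 1))
    out[-2:, -2:] = gate
    return out

def on_last(gate, n_qubits):
    """One-qubit gate acting on the last (target) qubit of n_qubits."""
    return np.kron(np.eye(2 ** (n_qubits - 1)), gate)

CNOT, CCX = controlled(X, 1), controlled(X, 2)
CZ = np.diag([1.0, 1.0, 1.0, -1.0])
CCZ = np.diag([1.0] * 7 + [-1.0])

# Conjugating the target by H turns the X family into the Z family.
cz_from_cnot = on_last(H, 2) @ CNOT @ on_last(H, 2)
ccz_from_ccx = on_last(H, 3) @ CCX @ on_last(H, 3)
```

Both conjugated operators match their diagonal counterparts. Because H is one-qubit, the conversion leaves the two-qubit gate count unchanged.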

2.2 Operators commuting with Z

We now recall terminology for operators commuting with Z on some qubits, but possibly not all qubits. Further background on the circuit theory of these quantum multiplexors can be found in [18].

The control-on-box notation of the following diagram indicates that the operator U commutes with $Z^{(\ell)}$. The backslash on the bottom line indicates an arbitrary number of qubits (a multi-qubit bus).

[Circuit diagram not recoverable from the source: a box U on a multi-qubit bus, with a control-on-box marker on qubit ℓ.]

These operators include the commonly-used positively and negatively controlled-U gates, although in our notation U also acts on the control qubits (and is thus “larger than the box in which it is contained”). In general, operators which commute with Z are block-diagonal:

Observation 2 For a unitary operator Q and qubit ℓ, consider the one-qubit values $|0\rangle^{(\ell)}$ and $|1\rangle^{(\ell)}$ on the ℓ-th input and output qubits of the operator. The following are equivalent.

• Q commutes with $Z^{(\ell)}$

• $\langle 0|^{(\ell)}\, Q\, |1\rangle^{(\ell)} = 0$

• $\langle 1|^{(\ell)}\, Q\, |0\rangle^{(\ell)} = 0$

• Q admits a decomposition $Q = |0\rangle\langle 0| \otimes Q_0 + |1\rangle\langle 1| \otimes Q_1$, where the projectors $|i\rangle\langle i|$ operate on qubit ℓ and the unitaries $Q_i$ operate on the qubits other than ℓ.

In an appropriate basis, the matrix of Q is block-diagonal. Its blocks represent the “then” and “else” branches of the quantum multiplexor Q with select qubit ℓ.
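The equivalences of Observation 2 can be illustrated numerically. This sketch is not from the original text. It assumes the select qubit ℓ is the most significant qubit, and it draws the blocks at random with a hypothetical helper `random_unitary`.

```python
import numpy as np

def random_unitary(dim, rng):
    """Random unitary from the QR decomposition of a complex Gaussian matrix."""
    m = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    q, _ = np.linalg.qr(m)
    return q

rng = np.random.default_rng(0)
Q0, Q1 = random_unitary(4, rng), random_unitary(4, rng)
P0, P1 = np.diag([1, 0]), np.diag([0, 1])

# Multiplexor with select qubit l taken as the most significant qubit.
Q = np.kron(P0, Q0) + np.kron(P1, Q1)
Z_l = np.kron(np.diag([1, -1]), np.eye(4))

commutes = np.allclose(Q @ Z_l, Z_l @ Q)
off_diagonal_blocks_vanish = bool(
    np.allclose(Q[:4, 4:], 0) and np.allclose(Q[4:, :4], 0)
)

# A generic unitary on the same qubits is not expected to pass the test.
G = random_unitary(8, rng)
generic_commutes = np.allclose(G @ Z_l, Z_l @ G)
```

The multiplexor commutes with Z on the select qubit and has vanishing off-diagonal blocks, whereas the generic unitary fails the commutation test.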

Notation. If Q commutes with $Z^{(\ell)}$ and ℓ is clear from context, we denote Q's diagonal blocks $\langle j|^{(\ell)}\, Q\, |j\rangle^{(\ell)}$ by $Q_j$. Similarly, if Q commutes with $Z^{(\ell_i)}$ on multiple qubits $\ell_1 \ldots \ell_k$, then for any bitstring $j_1 \ldots j_k$ we write $Q_{j_1 \ldots j_k}$ for $\langle j_1 \ldots j_k|^{(\ell_1 \ldots \ell_k)}\, Q\, |j_1 \ldots j_k\rangle^{(\ell_1 \ldots \ell_k)}$.

When the $\ell_i$ include all the qubits, Q is diagonal and the $Q_{j_1 \ldots j_k}$ are its diagonal entries. In general, the $Q_{j_1 \ldots j_k}$ capture diagonal blocks of Q with respect to an ordering of computational-basis vectors in which qubits $\ell_1 \ldots \ell_k$ are the most significant qubits.

We now point out the following commutability.

Observation 3 Let Q, R be two gates such that for every qubit ℓ, either one of them does not affect ℓ, or both of them commute with $Z^{(\ell)}$. Then QR = RQ. In picture:

[Circuit diagram not recoverable from the source: the gates Q and R drawn in either order on multi-qubit buses.]

We now recall the multiplexed rotation gates [13, 18], which generalize the Rx, Ry, Rz gates. Let ∆ be a diagonal Hermitian matrix acting on the qubits $\ell_1, \ldots, \ell_k$, and fix another qubit $m \neq \ell_i$. We define the operator $R_z^{(m)}(\Delta)$ on the qubits $\ell_1, \ldots, \ell_k, m$ by the conditions (1) that it commute with $Z^{(\ell_i)}$ for all i, and (2) that for any bitstring $j_1 \ldots j_k$, we have $R_z^{(m)}(\Delta)_{j_1 \ldots j_k} = R_z(\Delta_{j_1 \ldots j_k})$. Explicitly, $R_z^{(m)}(\Delta) = \exp(i Z^{(m)} \Delta^{(\ell_1 \ldots \ell_k)})$. Multiplexed Rx, Ry gates are defined similarly. Since such operators commute with $Z^{(\ell_i)}$, we depict them in circuit diagrams with the appropriate control-on-boxes.
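As a concrete illustration of the multiplexed rotation, the sketch below builds exp(iZ∆) for a diagonal ∆ and reads off its blocks. It assumes the select qubits are the most significant and qubit m the least significant. The angle values and the function names are hypothetical.

```python
import numpy as np

def rz(theta):
    """R_z(theta) = exp(i Z theta), with no factor of 1/2, as in the text."""
    return np.diag([np.exp(1j * theta), np.exp(-1j * theta)])

def multiplexed_rz(delta_diag):
    """exp(i Z^(m) Delta), select qubits most significant and qubit m last.

    delta_diag holds the real diagonal entries of Delta, one per select bitstring.
    """
    delta_diag = np.asarray(delta_diag, dtype=float)
    z_diag = np.array([1.0, -1.0])
    exponent = np.kron(delta_diag, z_diag)   # diagonal of Delta (x) Z
    return np.diag(np.exp(1j * exponent))    # exponential of a diagonal matrix

angles = [0.3, -1.1, 0.7, 2.0]               # two select qubits, illustrative values
R = multiplexed_rz(angles)

# Diagonal 2x2 blocks, indexed by the select bitstring j1 j2.
blocks = [R[2 * j:2 * j + 2, 2 * j:2 * j + 2] for j in range(len(angles))]
```

Each block equals `rz` of the matching diagonal entry of ∆, which is condition (2) of the definition. Condition (1) holds because `R` is diagonal.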

It is natural to ask when an operator commuting with various Z gates can be implemented in a CZ-circuit containing only gates commuting with the same Z gates. The answer is given in terms of the partial determinant.

Definition 4 Fix qubits $\ell_1 \ldots \ell_k$. We define the partial determinant map $\det_{\ell_1 \ldots \ell_k}$ from the operators commuting with $Z^{(\ell_1)}, \ldots, Z^{(\ell_k)}$ to the diagonal operators acting only on the qubits $\ell_i$. It is given by $(\det_{\ell_1 \ldots \ell_k}(U))_{j_1 \ldots j_k} = \det(U_{j_1 \ldots j_k})$.

When computing partial determinants of a single gate or subcircuit acting on m qubits, we first tensor the respective operators with $I_{2^{N-m}}$ to form operators acting on all N qubits (which may affect the determinants). When applied to such “full” operators, the partial determinant mapping is a group homomorphism.

Proposition 5 Fix qubits $\ell_1 \ldots \ell_k$ among N > k qubits. A unitary U commuting with $Z^{(\ell_1)}, \ldots, Z^{(\ell_k)}$ can be implemented by a CZ-circuit in which only diagonal gates operate on qubits $\ell_i$ if and only if $\det_{\ell_1 \ldots \ell_k}(U)$ is separable (can be implemented by one-qubit gates).

Proof. (⇒). It suffices to show the separability of $\det_{\ell_1 \ldots \ell_k}(U)$ for a generating set of operators. By definition, such a generating set is provided by CZs, one-qubit diagonals on the $\ell_i$, and gates not affecting any of the $\ell_i$.

Note first that any diagonal gate D acting on qubits $\ell_1, \ldots, \ell_k$ has partial determinant given by $\det_{\ell_1 \ldots \ell_k}(D) = D^{2^{N-k}}$, understood as an operator on qubits $\ell_1 \ldots \ell_k$. In particular, if D is separable, then so is $\det_{\ell_1 \ldots \ell_k}(D)$. If $D = CZ^{(\ell_i,\ell_j)}$, then from $CZ^2 = I$ and N > k we deduce $\det_{\ell_1 \ldots \ell_k}(CZ^{(\ell_i,\ell_j)}) = I$. The remaining gates we need to consider are:

(i) any gate not affecting qubits $\ell_i$ implements $U = Q^{(1..N)\setminus(\ell_1 \ldots \ell_k)}$ for some Q. In this case $U_{j_1 \ldots j_k} = Q$, and furthermore $\det_{\ell_1 \ldots \ell_k}(U) = \det(Q)\, I$.

(ii) CZ gates connecting qubits $\ell_i$ and $m \notin \{\ell_1, \ldots, \ell_k\}$. We compute $\det_{\ell_1 \ldots \ell_k}(CZ^{(\ell_i,m)}) = (Z^{(\ell_i)})^{2^{N-k-1}}$.

(⇐). This part of the result is not used in the rest of the paper, and we therefore defer the proof to the Appendix. □
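The partial determinant is simple to compute for small examples. The sketch below is an illustration that assumes the qubits $\ell_1 \ldots \ell_k$ are the k most significant ones. The function names are hypothetical, and the separability test relies on the elementary fact that diag(d00, d01, d10, d11) is a tensor product of one-qubit diagonals exactly when d00·d11 = d01·d10.

```python
import numpy as np

def partial_det(U, k):
    """Partial determinant over the k most significant qubits.

    Assumes U commutes with Z on those qubits, so U is block-diagonal with
    2**k blocks; returns the diagonal operator of block determinants.
    """
    block = U.shape[0] // 2 ** k
    dets = [np.linalg.det(U[j * block:(j + 1) * block, j * block:(j + 1) * block])
            for j in range(2 ** k)]
    return np.diag(dets)

def is_separable_diagonal_2q(D, tol=1e-9):
    """True when a two-qubit diagonal factors into one-qubit diagonals."""
    d = np.diag(D)
    return bool(abs(d[0] * d[3] - d[1] * d[2]) < tol)

CCZ = np.diag([1.0] * 7 + [-1.0])
CZ = np.diag([1.0, 1.0, 1.0, -1.0])

pd_ccz = partial_det(CCZ, 2)                      # N = 3, k = 2
pd_cz = partial_det(np.kron(CZ, np.eye(2)), 2)    # CZ on the two selected qubits
```

For N = 3 and k = 2, the partial determinant of the CCZ is the CZ on the selected qubits, which is not separable. A CZ between the two selected qubits has trivial partial determinant, as in the proof above.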

2.3 Cartan decompositions in quantum logic

This section recalls two important operator decompositions (cosine-sine and demultiplexing) and casts them as circuit decompositions. Readers willing to accept their use in our proofs may skip to Section 2.4.

Observe that an operator can be implemented with a single one-qubit gate if and only if it commutes with the Pauli operators Z and X on all other qubits. Thus to produce a CNOT-circuit for a given operator U, one may use the following algorithmic framework.

1. Decompose U into a circuit in which each non-CNOT gate, V, W, ..., commutes with X and Z on more qubits than U does.

2. Apply the algorithm recursively to V, W, ... until one-qubit gates are reached.

As Z is self-adjoint, the requirement that U commutes with $Z^{(i)}$ can be rephrased as the condition that U is fixed under the involution $U \mapsto Z^{(i)} U Z^{(i)}$. Given such an involution, a fundamental Lie-theoretic result produces an operator decomposition [8]. Here we recite the result for completeness, but do not require the reader to understand all terminology.

The Cartan Decomposition. Let G be a reductive Lie group, and ι : G → G an involution. Let K = {g : ι(g) = g} and let A be maximal over subgroups contained in {g : ι(g) = g^{-1}}. Then K is reductive, A is abelian, and G = KAK.


In order to restate decompositions of unitary operators as circuit decompositions, we employ the notation of set-valued quantum gates [18]. Completely unlabelled gates (as in Equation 4) denote the set of all gates satisfying all control-on-box commutativity conditions imposed by the diagram, and gates labelled Rx, Ry, Rz denote the appropriate set of (possibly multiplexed) rotations. An equivalence of circuits with set-valued gates means that if we pick an element from each set on one side, there is a way to choose elements on the other so that the two circuits compute the same operator. The backslashed wires which usually indicate multiple qubits may also carry zero qubits.

The involution $\phi_Z : U \mapsto Z^{(\ell)} U Z^{(\ell)}$ corresponds to the cosine-sine decomposition.3

[Circuit diagram for Equation (2) not recoverable from the source; the surviving label is an Ry gate on qubit ℓ.]   (2)

The involution $\phi_Y : U \mapsto Y^{(\ell)} U Y^{(\ell)}$ yields the demultiplexing decomposition [18].

[Circuit diagram for Equation (3) not recoverable from the source; the surviving label is an Rz gate.]   (3)

The map $\phi_Y$ restricts to the subgroup of diagonal operators. This group being abelian, the K and A factors commute, leaving the following decomposition of diagonal operators.

[Circuit diagram for Equation (4) not recoverable from the source; the surviving label is an Rz gate on qubit ℓ.]   (4)

The involution $\phi_Y$ further restricts to the subgroup of multiplexed Z rotations, which we can demultiplex again. The K and A factors again commute; the A factor is computed by the last 3 gates in the circuit below.

[Circuit diagram for Equation (5) not recoverable from the source; the surviving labels are three Rz gates and two CNOT gates.]   (5)

To establish the existence of these decompositions, it remains to verify in each case that the purported K and A satisfy the appropriate properties with respect to the relevant involution. This can be checked after passing to the Lie algebra, where it is easy. Alternatively, explicit constructions of the cosine-sine and demultiplexing decompositions are given in [15] and [18], respectively.

To decompose general n-qubit operators, Equation 2 can be applied iteratively until all remaining gates are either multiplexed Ry gates or diagonal. The Ry gates can be replaced by Rz gates at the cost of introducing some one-qubit operators; the Rz and other diagonal gates can be decomposed as described above; for details and optimizations see [13]. Smaller circuits are obtained by another algorithm, which alternates cosine-sine decompositions with demultiplexing decompositions; for details and optimizations, see [18].

3 The terminology comes from the numerical linear algebra literature; see [15] and references therein.

When circuit decompositions are applied recursively, some gates can be reduced by local circuit transformations. For example, when iteratively demultiplexing multiplexed Rz gates, some CNOTs may be cancelled as shown below.

[Circuit diagram not recoverable from the source: demultiplexed Rz gates interleaved with CNOTs, with cancelling CNOT pairs marked.]

This technique produces a circuit with $2^n$ CNOT gates for an n-ply multiplexed Rz gate. Using Equation 4, we obtain a circuit with $2^n - 2$ CNOT gates for an arbitrary n-qubit diagonal operator [3]. Applying this result to the CCZ gate leads to the circuit in Figure 1.
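The simplest instance of demultiplexing, a singly multiplexed Rz gate written with two CNOTs, can be checked as follows. This is a sketch under stated assumptions rather than the construction of [18]: the select qubit is the first tensor factor, Rz(θ) = exp(iZθ) as in Section 2.1, and the angle formulas α = (θ0 + θ1)/2 and β = (θ0 − θ1)/2 are derived here for illustration.

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
ZERO = np.zeros((2, 2))
CNOT = np.block([[I2, ZERO], [ZERO, X]])   # control first, target second

def rz(theta):
    """R_z(theta) = exp(i Z theta)."""
    return np.diag([np.exp(1j * theta), np.exp(-1j * theta)])

def multiplexed_rz(theta0, theta1):
    """R_z(theta0) on the target if the select qubit is 0, else R_z(theta1)."""
    return np.block([[rz(theta0), ZERO], [ZERO, rz(theta1)]])

def demultiplexed(theta0, theta1):
    """The same operator from two one-qubit rotations and two CNOTs."""
    alpha, beta = (theta0 + theta1) / 2, (theta0 - theta1) / 2
    on_target = lambda g: np.kron(I2, g)
    return on_target(rz(alpha)) @ CNOT @ on_target(rz(beta)) @ CNOT

direct = multiplexed_rz(0.4, -1.3)
from_cnots = demultiplexed(0.4, -1.3)
```

The two matrices coincide, matching the count of $2^n$ CNOTs for n = 1. Conjugation by the CNOT negates β exactly when the select qubit is 1, which is why the half-sum and half-difference appear.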

2.4 Basic facts about CZ-counting

The CZ-cost $|U|_{CZ}$ of an N-qubit operator U is the minimum number of CZs which appear in any N-qubit CZ-circuit for U; we define the CNOT-cost analogously. The identity $H^{(i)} C^{(j)}X^{(i)} H^{(i)} = CZ^{(i,j)}$ ensures that $|U|_{CZ} = |U|_{CNOT}$. The further identity $H^{(i)} CC^{(j,k)}X^{(i)} H^{(i)} = CCZ^{(i,j,k)}$ yields:

Observation 6 $|CCZ|_{CZ} = |CCX|_{CNOT} \le 6$.

By way of illustration, the following modification of the circuit in Figure 1 implements the CCZ in terms of CZs.

[Circuit diagram for Equation (6) not recoverable from the source: the circuit of Figure 1 rewritten with CZ gates and additional H gates.]   (6)

It shall prove more convenient to compute $|CCZ|_{CZ}$ rather than $|CCZ|_{CNOT}$. To do so, we are going to study the number of CZs which must touch a given qubit in any CZ-circuit for a given operator. More precisely, the $CZ^{(\ell)}$-cost $|U|_{CZ;\ell}$ is the minimum number of CZ gates incident on ℓ in any $CZ^{(\ell)}$-circuit for U. These cost functions are related through the following estimate.4

4 This bound is very weak in general. Dimension-counting shows that a generic N-qubit operator U requires on the order of $4^N$ CZ gates [9], whereas the results of [18] imply that |U|CZ;ℓ < 6N. At best we can establish that |U|CZ ≥ N(6N−1).

Observation 7 For any operator P,

$$|P|_{CZ} \ge \frac{1}{2} \sum_j |P|_{CZ;j}$$

Proof. Each CZ gate touches two qubits. □

As the costs $|CCZ|_{CZ;j}$ are the same for j = 1, 2, 3 (by symmetry),

$$|CCZ|_{CZ} \ge \frac{3}{2}\, |CCZ|_{CZ;j} \qquad (7)$$
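The arithmetic behind this estimate is worth making explicit, since gate counts are integers and the bound may therefore be rounded up. The sketch below is illustrative. The function name is hypothetical, and the input values are the qubit-local lower bounds that Section 4 establishes for the CCZ.

```python
import math

def cz_lower_bound(local_costs):
    """Observation 7: the total CZ-cost is at least half the sum of the
    qubit-local CZ-costs, rounded up because gate counts are integers."""
    return math.ceil(sum(local_costs) / 2)

# Local costs of at least 3 on each of the three qubits of the CCZ.
bound = cz_lower_bound([3, 3, 3])   # ceil(9 / 2)
```

With local costs of 3 the bound evaluates to 5, one short of the six gates in Figure 1. This gap is why qubit-local counting alone cannot settle the question.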

We emphasize that the number of qubits, N, is an unspecified parameter in both $|\cdot|_{CZ}$ and $|\cdot|_{CZ;\ell}$. In the presence of ancillae, we define $|U|^a_{CZ} := \min_t |U \otimes I_2^{\otimes t}|_{CZ}$. Obviously $|U|^a_{CZ} \le |U|_{CZ}$. While $|U|^a_{CZ} = |U|_{CZ}$ seems unlikely to always hold, we are not aware of any counterexamples. Indeed, we will show in Section 6 that this equality holds for all two-qubit operators and all three-qubit diagonal operators.

3 Deriving gate constraints from circuit equations

The circuit decompositions of Section 2.3 are essentially unique, and from this canonicity one can derive various constraints on which gates may appear in certain circuit equations. We will pursue this route in Section 4.2. However, the simplest cases are easier to treat from the more elementary point of view adopted by Song and Klappenecker in their classification of two-qubit controlled-U operators by CNOT-cost [19]. Considering the operator computed by a candidate circuit, they first focus on matrix elements which vanish if the operator is a controlled-U. In order to produce such zero elements, the gates in the candidate circuit must satisfy certain constraints. Below we derive a series of more general results for n-qubit circuits. One-qubit gates which become diagonal when multiplied by X occur frequently; we refer to them as anti-diagonal.

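Lemma 8 The following equation imposes at least one of the following constraints.

[Circuit diagram not recoverable from the source; in operator form, as used in the proof below, it reads $a^{(1)} P\, b^{(1)} = Q$.]

1. a, b are both diagonal or both anti-diagonal.

2. P takes the form $d \otimes P_0$ for some one-qubit diagonal d.

Proof.

$$0 = \langle 0|^{(1)}\, aPb\, |1\rangle^{(1)} = \langle 0|a|0\rangle \langle 0|b|1\rangle P_0 + \langle 0|a|1\rangle \langle 1|b|1\rangle P_1$$

Outside of Case 1, the coefficients do not vanish, so $P_0$ and $P_1$ are linearly dependent. It follows that $P = d \otimes P_0$ for some one-qubit diagonal d. □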

Corollary 9 If $a^{(i)}\, CZ^{(i,j)}\, b^{(i)}$ commutes with $Z^{(i)}$, then a, b are both diagonal or anti-diagonal.

Corollary 10 In the situation of Lemma 8, there exist one-qubit operators a′, b′ which are either diagonal or anti-diagonal, such that $a'^{(1)} P\, b'^{(1)} = Q$.

Proof. Apply Lemma 8; we need consider only Case 2. Take $a' = a\delta b\delta^{-1}$ and $b' = I$, where δ is the one-qubit diagonal of Case 2 (written d in Lemma 8); then $a'^{(1)} P\, b'^{(1)} = a^{(1)} P\, b^{(1)}$. As $a'^{(1)} = QP^\dagger$ commutes with $Z^{(1)}$, it is diagonal. □

We turn now to circuits with two CZ gates.

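Lemma 11 Suppose the following equation holds.

[Circuit diagram not recoverable from the source; in operator form, as used in the proof below, it reads aPb = Q, with qubits 1 and 2 marked.]

Then (I) $a_i b_j$ is diagonal for all i, j or (II) one of $P$, $X^{(2)}P$ commutes with $Z^{(2)}$.

Proof. We compute:

$$0 = \langle 0|^{(1)} \langle i|^{(2)}\, aPb\, |1\rangle^{(1)} |j\rangle^{(2)} = \langle 0|^{(1)}\, a_i b_j\, |1\rangle^{(1)}\; \langle i|^{(2)}\, P\, |j\rangle^{(2)}$$

Either $\langle i|^{(2)} P |j\rangle^{(2)} = 0$ for some i, j, or $\langle 0| a_i b_j |1\rangle$ vanishes for all i, j. □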

Corollary 12 Suppose the following equation holds.

[Circuit diagram not recoverable from the source: one-qubit gates t, s, r on qubit 1 alternate with two CZ gates; the gates T, S, R act on qubit 2 and the remaining qubits; the circuit is equated with M.]

Then either (I) an even number of r, s, t are anti-diagonal, and the remainder diagonal, or (II) $S$ or $SX^{(2)}$ commutes with $Z^{(2)}$.

Proof. In order to apply Lemma 11, we move R and T to the other side.

[Circuit diagram not recoverable from the source: the previous equation with R† and T† applied to M.]

The cases here will correspond to the cases of Lemma 11. Case II is preserved verbatim. For Case I, the “$a_i b_j$” which must be diagonal are rst, rsZt, rZst, rZsZt. Since $(rst)^\dagger rsZt = t^\dagger Z t$ is diagonal, we deduce that either t or tX is diagonal. Likewise, $rZst(rst)^\dagger = rZr^\dagger$ is diagonal, so either r or rX is diagonal. Finally, rst is diagonal, so from what we know about r, t, either s or sX is diagonal, and the number of r, s, t which are not diagonal is even. □

The following reformulation will be useful later.

Corollary 13 Suppose Q commutes with $Z^{(\ell)}$ and let C be a $CZ^{(\ell)}$-circuit computing Q in which exactly two CZs are incident on ℓ, say $CZ^{(\ell,m)}$ and $CZ^{(\ell,n)}$. Then all non-diagonal one-qubit gates may be eliminated from qubit ℓ at the cost of possibly (i) replacing $CZ^{(\ell,n)}$ with $CZ^{(\ell,m)}$ and (ii) adding one-qubit gates on qubits m, n.

Proof. By hypothesis, C takes the form

$$Q = [r \otimes R]\; CZ^{(\ell,m)}\; [s \otimes S]\; CZ^{(\ell,n)}\; [t \otimes T]$$

where r, s, t are subcircuits of one-qubit operators acting on ℓ, and R, S, T are subcircuits containing no gates acting on ℓ. We immediately replace r, s, t by the one-qubit operators they compute. Moreover, if m ≠ n, then replace S and T by $S \cdot SWAP^{(m,n)}$ and $SWAP^{(m,n)} \cdot T$, where SWAP is the gate which exchanges qubits. The swaps will be restored and canceled at the end of the proof. We are in the situation of Lemma 11.

Case I. We are done, with the exception that the r, s, t may be anti-diagonal rather than diagonal. In this case, Equation 1 allows the extraneous Xs to be pushed through and cancelled at the cost of introducing Z gates on qubit m. The diagonal gates remaining on qubit ℓ may be commuted through the CZs and conglomerated into one. Finally, the possible swap introduced between the S, T terms may be cancelled.

Case II. Using Equation 1 and replacing s by sZ if necessary, we commute S past one of the CZs. We now have:

$$Q = [r \otimes R]\; CZ^{(\ell,m)}\; s^{(\ell)}\; CZ^{(\ell,m)}\; [t \otimes ST]$$

Rearranging the equation,

$$[I \otimes R^\dagger]\; Q\; [I \otimes T^\dagger S^\dagger] = r^{(\ell)}\, CZ^{(\ell,m)}\, s^{(\ell)}\, CZ^{(\ell,m)}\, t^{(\ell)} \qquad (8)$$

Let V be the value of either side of the equation above. Then from the LHS we see that V commutes with $Z^{(\ell)}$, and from the RHS we see that V is a two-qubit operator commuting with $Z^{(m)}$. Thus V is a two-qubit diagonal, and admits the following decomposition.

[Circuit diagram not recoverable from the source: V is written with Rz(α) on qubit ℓ, the gates Rz(β), H, H, Rz(γ), H, H on qubit m, and two CZ gates.]

Substituting this decomposition for the RHS of Equation 8 and restoring the R, S, T gates completes the proof. □

4 The CNOT-cost of the TOFFOLI gate

So far we have reduced CNOT-counting for the TOFFOLI gate to CZ-counting for the CCZ gate, with the latter two being diagonal and symmetric. Having derived the inequality $3|CCZ|_{CZ;\ell}/2 \le |CCZ|_{CZ}$, we seek to determine the qubit-local costs $|CCZ|_{CZ;\ell}$.

The idea is to find an equivalence relation $\sim_\ell$ such that (i) $U \sim_\ell V \implies |U|_{CZ;\ell} = |V|_{CZ;\ell}$ and (ii) the equivalence classes of $\sim_\ell$ are easy to characterize.

Definition 14 For P, Q commuting with $Z^{(\ell)}$, we write $P \sim_\ell Q$ if there exist a, b, A, B satisfying the following equation.

[Circuit diagram for Equation (9) not recoverable from the source: one-qubit gates b and a on qubit ℓ, and gates B and A on the remaining qubits, surround P; the result is equated with Q.]   (9)

The fact that $|\cdot|_{CZ;\ell}$ is constant on equivalence classes is obvious; the ability to characterize the equivalence classes comes from a comparison between Equation 9 and the demultiplexing decomposition of Equation 3. We construct invariants of the equivalence classes in Theorem 17. The reductions of Section 4.2 provide circuit forms on which the invariants are easy to compute; as a consequence, we arrive at a complete characterization of U such that $|U|_{CZ;\ell} = 0, 1, 2$ in Theorem 18. The CCZ gate falls into none of these classes, and thus $|CCZ|_{CZ;\ell} \ge 3$, and hence $|CCZ|_{CZ} \ge 5$. Unfortunately, qubit-local CZ-counting can take us no further: one can show by construction that in fact $|CCZ|_{CZ;\ell} = 3$.

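We now consider a hypothetical five-CZ circuit for the CCZ and seek a contradiction, using a divide-and-conquer strategy. There are many possible arrangements of the CZs, and we do not deal with them case by case. Nonetheless, we fix one here for clarity.

[Circuit diagram for Equation (10) not recoverable from the source: the CCZ on qubits 1, 2, 3 is equated with a five-CZ circuit whose one-qubit gates are labelled f, e, d, c on qubit 1; k, j, i, h, g on qubit 2; and o, n, m, l on qubit 3.]   (10)

We define a, b, P, Q as follows.

[Circuit diagrams not recoverable from the source: a contains the gates d and c; b contains f and e; P contains j, i, h and n, m; Q contains k†, g† and o†, l†; each together with CZ gates.]

Our circuit decomposition now takes the following form.

[Circuit diagram for Equation (11) not recoverable from the source; it has the same layout as the equation of Lemma 11, with gates b, a, P and Q.]   (11)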

Up to some two-qubit diagonal fudge factors, this equation says that the cosine-sine decomposition of $b^\dagger \otimes I$ is $Q^\dagger [a \otimes I] P$. In Section 4.2, we translate the well-known canonicity of this Cartan decomposition into constraints on the components $a$, $b$, $P$ and $Q$. The formulae of Theorem 18 further strengthen these constraints in the $|\cdot|_{CZ;\ell} = 3$ case. Specifically, we show in Theorem 22 that if $|U|_{CZ;\ell} = 3$ and $\mathcal{C}$ computes $U$ using the minimum required three CZ gates incident on $\ell$, then all one-qubit gates on $\ell$ are diagonal or anti-diagonal. The anti-diagonal gates can be made diagonal at the cost of introducing $Z$ gates elsewhere in the circuit.

This is the last result needed to determine the CZ-cost of the CCZ. From $|CCZ|_{CZ;\ell} \geq 3$, we see that in any five-CZ circuit for the CCZ, two of the qubits, $m, n$, touch exactly three CZ gates and the remaining one touches four. By Theorem 22, we can assume all one-qubit operators on $m, n$ are diagonal. Proposition 5 would then require $\det_{m,n} CCZ = CZ^{(m,n)}$ to be separable, which it is not.


Theorem 15 $|CCZ|_{CZ} = 6$.

We show in Section 6 that the use of ancillae cannot lower the CZ-cost of the CCZ.

4.1 CZ counting via the demultiplexing decomposition

We now turn to the study of qubit-local CZ-cost. To apply $P \sim_\ell Q \implies |P|_{CZ;\ell} = |Q|_{CZ;\ell}$, we first seek to determine when $P \sim_\ell Q$. This will be done under the assumption that $P$ and $Q$ both commute with $Z^{(\ell)}$.

Definition 16 Let $U$ commute with $Z^{(\ell)}$. Then the $\ell$-mux-spectrum $\Im^{(\ell)}(U)$ is the multi-set of eigenvalues, taken with multiplicity, of $U_1^\dagger U_0$. Two multi-sets $S, T$ are said to be congruent, $S \cong T$, if there exists a nonzero scalar $\lambda$ such that either $\lambda S = T$ or $\lambda S = T^\dagger$.

We note that before taking the $\ell$-mux-spectrum of $U$, it is necessary to fix the number of qubits on which $U$ acts: $\Im^{(\ell)}(U \otimes I)$ contains $\dim I$ copies of $\Im^{(\ell)}(U)$.
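For diagonal operators the mux-spectrum can be read off directly, since the eigenvalues of $U_1^\dagger U_0$ are entrywise products of diagonal entries. The following is a minimal sketch under that assumption; the function names `mux_spectrum_diag` and `add_idle_qubit` are illustrative and not taken from the paper, and $\ell$ is assumed to be the most significant qubit.

```python
from typing import List, Sequence


def mux_spectrum_diag(diag: Sequence[complex]) -> List[complex]:
    """Mux-spectrum, along the most significant qubit, of a diagonal unitary.

    `diag` lists the diagonal entries: the first half is U_0 and the second
    half is U_1, so the eigenvalues of U_1^dagger U_0 are entrywise products.
    """
    half = len(diag) // 2
    return [complex(u1).conjugate() * complex(u0)
            for u0, u1 in zip(diag[:half], diag[half:])]


def add_idle_qubit(diag: Sequence[complex]) -> List[complex]:
    """Diagonal of U tensor I, with the extra qubit least significant."""
    return [d for d in diag for _ in range(2)]


# CCZ on three qubits: only the |111> entry is -1.
ccz = [1, 1, 1, 1, 1, 1, 1, -1]
spectrum_3 = mux_spectrum_diag(ccz)
spectrum_4 = mux_spectrum_diag(add_idle_qubit(ccz))
print(spectrum_3)
print(spectrum_4)
```

Adding the idle qubit doubles every multiplicity, which is the dependence on the number of qubits noted above.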

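Theorem 17 Suppose $P, Q$ commute with $Z^{(\ell)}$. Then $P \sim_\ell Q \iff \Im^{(\ell)}(P) \cong \Im^{(\ell)}(Q)$.

Proof. (⇒). As $P \sim_\ell Q$, there are gates $a, b, A, B$ such that

[Circuit diagram: one-qubit gates $b$ and $a$ on qubit $\ell$, and gates $B$ and $A$ on the remaining wires, surround $P$; the resulting circuit equals $Q$.]

By Corollary 9, we may assume that either $a, b$ or $aX, bX$ are diagonal. In the first case, $Q_0 = a_0 b_0 A P_0 B$ and $Q_1 = a_1 b_1 A P_1 B$. Thus $Q_1^\dagger Q_0 = (a_1 b_1)^\dagger a_0 b_0 B^\dagger P_1^\dagger P_0 B$, which has the same eigenvalues as $(a_1 b_1)^\dagger a_0 b_0 P_1^\dagger P_0$. Thus $\Im^{(\ell)}(P) \cong \Im^{(\ell)}(Q)$.

Otherwise, $a' = aX$ and $b' = Xb$ are diagonal. Now $Q_0^\dagger Q_1 = (a'_1 b'_1)^\dagger a'_0 b'_0 B^\dagger P_0^\dagger P_1 B$, which has the same eigenvalues as $(a'_1 b'_1)^\dagger a'_0 b'_0 P_0^\dagger P_1$, whose eigenvalues in turn are the complex conjugates of those of $a'_1 b'_1 (a'_0 b'_0)^\dagger P_1^\dagger P_0$; again $\Im^{(\ell)}(P) \cong \Im^{(\ell)}(Q)$.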

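(⇐). By supposition, $\Im^{(\ell)}(P) \cong \Im^{(\ell)}(Q)$. We note $\Im^{(\ell)}(X^{(\ell)} P X^{(\ell)}) = \Im^{(\ell)}(P)^\dagger$ and $\Im^{(\ell)}(R_z^{(\ell)}(\lambda) P) = e^{2i\lambda}\, \Im^{(\ell)}(P)$. Therefore we can readily find an operator $P' \sim_\ell P$ such that the $\ell$-mux-spectrum of $P'$ is identical, rather than merely congruent, to that of $Q$. It remains to show that $P' \sim_\ell Q$.

By the demultiplexing decomposition (Equation 3) there exist unitary operators $M_P, N_P$ and a real diagonal matrix $\delta_P$, all of which operate on the qubits other than $\ell$, such that $P' = [I \otimes M_P]\, R_z^{(\ell)}(\delta_P)\, [I \otimes N_P]$. Likewise we decompose $Q = [I \otimes M_Q]\, R_z^{(\ell)}(\delta_Q)\, [I \otimes N_Q]$. If we let $\Delta_P = \exp(i\delta_P)$ and $\Delta_Q = \exp(i\delta_Q)$, then the $\ell$-mux-spectra of $P'$ and $Q$ are respectively the entries of $\Delta_P^2$ and $\Delta_Q^2$. Since $\Im^{(\ell)}(P') = \Im^{(\ell)}(Q)$, there must exist a permutation matrix $\pi$ acting on the qubits other than $\ell$ such that $\pi \Delta_P^2 \pi^\dagger = \Delta_Q^2$. Rearranging, we have $\Delta_Q^\dagger \pi \Delta_P = \Delta_Q \pi \Delta_P^\dagger$. Writing $K$ for this term, $[I \otimes M_Q K M_P^\dagger]\, P'\, [I \otimes N_P^\dagger \pi^\dagger N_Q] = Q$. Thus $P' \sim_\ell Q$. □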

We now apply Theorem 17 to prove the following result relating $\Im^{(\ell)}(P)$ and $|P|_{CZ;\ell}$. We emphasize that the number of qubits on which $P$ acts is an unspecified parameter in both of these functions.

Theorem 18 Let $P$ commute with $Z^{(\ell)}$.

• $|P|_{CZ;\ell} = 0$ iff $\Im^{(\ell)}(P) \cong \{1, 1, \ldots\}$.

• $|P|_{CZ;\ell} = 1$ iff $\Im^{(\ell)}(P) \cong \{1, -1, 1, -1, \ldots\}$.

• $|P|_{CZ;\ell} \leq 2$ iff $\Im^{(\ell)}(P)$ is congruent to some multi-set $S$ of unit-norm complex numbers which come in conjugate pairs.

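Proof. The first and second statements follow immediately from Theorem 17 and the calculations $\Im^{(\ell)}(I) = \{1, 1, \ldots\}$ and $\Im^{(\ell)}(CZ^{(\ell,m)}) = \{1, -1, 1, -1, \ldots\}$. To perform the relevant calculation for the third statement, we will use Corollary 13.

Let $\ell$ be the most significant qubit. For $\delta$ a diagonal real operator acting on all qubits but $\ell$, define $\Phi(\delta)$ by

[Circuit diagram: $\Phi(\delta)$ consists of two CZ gates with upper contacts on qubit $\ell$ and lower contacts on the second wire, with $R_y(\delta)$ on the second wire between them.]

By construction, $|\Phi(\delta)|_{CZ;\ell} \leq 2$. We compute $\Im^{(\ell)}(\Phi(\delta)) = \{e^{2i\delta_0}, e^{-2i\delta_0}, e^{2i\delta_1}, e^{-2i\delta_1}, \ldots\}$.

(⇐) Write the entries of $S$ as $e^{i\phi} \cdot \{e^{i\theta_0}, e^{-i\theta_0}, e^{i\theta_1}, e^{-i\theta_1}, \ldots\}$, and let $\theta$ be the real diagonal operator acting on all qubits but $\ell$ whose diagonal entries are $\theta_0, \theta_1, \ldots$. By construction, $\Im^{(\ell)}(\Phi(\theta/2)) = S$, and $S \cong \Im^{(\ell)}(Q)$ by hypothesis. By Theorem 17, $\Phi(\theta/2)$ and $Q$ are $\ell$-equivalent. It follows that $|Q|_{CZ;\ell} = |\Phi(\theta/2)|_{CZ;\ell} \leq 2$.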

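(⇒) By hypothesis $|Q|_{CZ;\ell} \leq 2$. If in fact $|Q|_{CZ;\ell} = 0, 1$, note that by the first two statements of the Theorem, which have been proven, the $\ell$-mux-spectrum of $Q$ has the desired property. Thus we assume $|Q|_{CZ;\ell} = 2$. Let $\mathcal{C}$ be a circuit in which this minimal CZ count is achieved. By Corollary 13, we can find an equivalent circuit $\mathcal{C}'$ of the following form.

[Circuit diagram: $Q$ equals a circuit in which qubit $\ell$ carries $R_z(\theta)$ and the upper contacts of two CZ gates, while the remaining wires carry $A$, $B$, $C$ and the lower CZ contacts.]

We have drawn the CZs with different lower contacts, but of course they might be the same. Actually, we prefer the latter case, and ensure it by incorporating swaps into $B, C$ if necessary. We take a cosine-sine decomposition (see Equation 2) of $B$.

[Circuit diagram: the same circuit with $B$ replaced by $B_L$, $R_y(\beta)$, $B_R$, where $R_y(\beta)$ sits between the two lower CZ contacts.]

Note that the $B_L$ and $B_R$ gates commute with the CZs. Thus $Q \sim_\ell \Phi(\beta)$. By Theorem 17, $\Im^{(\ell)}(Q) \cong \Im^{(\ell)}(\Phi(\beta))$. But we have already seen that $\Im^{(\ell)}(\Phi(\cdot))$ always consists of conjugate pairs of unit-norm complex numbers. □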
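The three criteria of Theorem 18 can be tested mechanically on a given mux-spectrum. The sketch below is illustrative only: the name `local_cost_class`, the string labels, and the numerical tolerance `TOL` are assumptions of this example. It relies on the observation that if a rescaled multi-set splits into conjugate pairs, then the first element pairs with some other element $s_j$, which forces $\lambda^2 = \overline{s_0 s_j}$; the sign of $\lambda$ is irrelevant because negating a conjugate-paired multi-set keeps it conjugate-paired.

```python
import cmath
from typing import List, Sequence

TOL = 1e-9  # numerical tolerance (an assumption of this sketch)


def _pairs_up(values: Sequence[complex]) -> bool:
    """True if the multi-set splits into pairs {z, conj(z)}."""
    remaining: List[complex] = list(values)
    while remaining:
        z = remaining.pop()
        for idx, w in enumerate(remaining):
            if abs(w - z.conjugate()) < TOL:
                del remaining[idx]
                break
        else:
            return False
    return True


def local_cost_class(spectrum: Sequence[complex]) -> str:
    """Classify a mux-spectrum of unit-norm numbers as '0', '1', '2' or '>=3'."""
    s = [complex(x) for x in spectrum]
    n = len(s)
    # Cost 0: congruent to {1, 1, ...}.
    if all(abs(x - s[0]) < TOL for x in s):
        return "0"
    # Cost 1: congruent to {1, -1, 1, -1, ...}.
    scaled = [x / s[0] for x in s]
    plus = sum(abs(x - 1) < TOL for x in scaled)
    minus = sum(abs(x + 1) < TOL for x in scaled)
    if plus == minus and plus + minus == n:
        return "1"
    # Cost 2: some rescaling splits the multi-set into conjugate pairs.
    for j in range(1, n):
        lam = cmath.sqrt((s[0] * s[j]).conjugate())
        if _pairs_up([lam * x for x in s]):
            return "2"
    return ">=3"


print(local_cost_class([1, 1, 1, -1]))                # three-qubit CCZ
print(local_cost_class([1, 1, 1, -1, 1, 1, 1, -1]))   # CCZ with one idle qubit
```

The two printed cases correspond to the mux-spectra of the CCZ on three and on four qubits.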


4.2 Circuit constraints from the cosine-sine decomposition

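This section is devoted to the study of Equation 11. We take cosine-sine decompositions of $a, b$. Below, $A_L, A_R, B_L, B_R$ are two-qubit diagonal operators, and $\alpha, \beta$ are $2 \times 2$ real diagonal matrices of angular parameters.

[Circuit diagram: on qubits 1 and 2, $b$ equals the circuit $B_L$, $R_y(-\beta)$, $B_R$.]   (12)

[Circuit diagram: on qubits 1 and 2, $a$ equals the circuit $A_L$, $R_y(\alpha)$, $A_R$.]   (13)

Define $\tilde{P} = A_L P B_R$ and $\tilde{Q} = A_R^\dagger Q B_L^\dagger$ to obtain:

[Circuit diagram: $R_y(-\beta)$ and $R_y(\alpha)$ appear on either side of $\tilde{P}$, and the resulting circuit equals $\tilde{Q}$; in operator form, $\tilde{Q} = R_y(\alpha)\,\tilde{P}\,R_y(-\beta)$.]   (14)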

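We recall the standard argument used to measure the uniqueness of the KAK decomposition [8]. Throughout this discussion, we will write simply $R_y(\alpha)$ for $R_y^{(1)}(\alpha^{(2)})$, and similarly for $R_y(\beta)$. Rearrange the equation to obtain $\tilde{Q}^\dagger R_y(\alpha) \tilde{P} = R_y(\beta)$. Transforming the equation by $k \mapsto Z^{(1)} k^\dagger Z^{(1)}$, we get $\tilde{P}^\dagger R_y(\alpha) \tilde{Q} = R_y(\beta)$. Multiplying these equations yields $\tilde{P}^\dagger R_y(2\alpha) \tilde{P} = R_y(2\beta)$. Thus $R_y(2\alpha)$ and $R_y(2\beta)$ have the same eigenvalues. One can check that in fact they are conjugate under an element of the group $W$ generated by $X^{(2)}$ and $CZ^{(1,2)}$; note that these operators commute with $Z^{(1)}$. That is, there exists $w \in W$ such that $w R_y(2\alpha) w^\dagger = R_y(2\beta)$. Now let $t = w R_y(\alpha) w^\dagger R_y(-\beta)$. We have both $t = R_y(\xi)$ for some $2 \times 2$ real diagonal matrix $\xi$ acting on qubit 2, and $t^2 = I$; it follows that $t \in \{\pm I, \pm Z^{(2)}\}$. Defining $\hat{P} = \tilde{P} \cdot [tw \otimes I]$ and $\hat{Q} = \tilde{Q} \cdot [w \otimes I]$ reduces our equation to the following.

[Circuit diagram: $R_y(-\alpha)$ and $R_y(\alpha)$ appear on either side of $\hat{P}$, and the resulting circuit equals $\hat{Q}$; in operator form, $\hat{Q} = R_y(\alpha)\,\hat{P}\,R_y(-\alpha)$.]   (15)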

By an argument similar to that given for $\tilde{P}$ and $\tilde{Q}$, the operators $\hat{P}$ and $\hat{Q}$ both commute with $R_y(2\alpha)$. Conjugation by $R_y(\alpha)$ is an involution on the set of operators commuting with $R_y(2\alpha)$; Equation 15 says that $\hat{P}$ and $\hat{Q}$ are interchanged by this involution. In fact, this involution always has a simpler description:

Lemma 19 Equation 15 also holds for some $\alpha$ for which $\alpha_i$ is an integer or half-integer multiple of $\pi$. Half-integers occur if and only if $2\alpha_i$ is an odd integer multiple of $\pi$.

Proof. Decompose $2\alpha_i = \phi_i + \psi_i \pmod{2\pi}$ where $\phi_i \in (-\pi, \pi)$, where $\psi_i = 0$ unless $\phi_i = 0$, and $\psi_i \in \{0, \pi\}$ in any event. Then any operator which commutes with $R_y(2\alpha)$ also commutes with $R_y(\phi/2)$. Thus, on operators commuting with $R_y(2\alpha)$, conjugation by $R_y(\alpha)$ is the same as conjugation by $R_y(\alpha - \phi/2) = R_y(\alpha - \phi/2 - \psi/2)\, R_y(\psi/2)$. But $2(\alpha - \phi/2 - \psi/2) = 0 \pmod{2\pi}$. □

We also record the constraints imposed on possible $P, Q$ by the value of $\theta = 2\alpha$.


Lemma 20 Fix distinct qubits $\ell, m$. Let $U$ be a unitary operator commuting with $Z^{(\ell)}$, and let $\theta$ be a two-by-two real diagonal matrix of angular parameters which is understood to operate on $m$. Then $U$ commutes with $R_y^{(\ell)}(\theta)$ if and only if one of the following holds:

1. $\cos(\theta)$ is scalar, and either

(a) $\sin(\theta) = 0$.

(b) $\sin(\theta)$ is a nonzero scalar and $U_0 = U_1$.

(c) $Z \sin(\theta)$ is a nonzero scalar and $U_0 = Z^{(m)} U_1 Z^{(m)}$.

2. $\cos(\theta)$ is not scalar, $U$ commutes with $Z^{(m)}$, and either

(a) $\sin(\theta_0) = 0$ and $\sin(\theta_1) = 0$.

(b) $\sin(\theta_0) = 0$ and $\sin(\theta_1) \neq 0$ and $U_{01} = U_{11}$.

(c) $\sin(\theta_0) \neq 0$ and $\sin(\theta_1) = 0$ and $U_{00} = U_{10}$.

(d) $\sin(\theta_0) \neq 0$ and $\sin(\theta_1) \neq 0$ and $U_0 = U_1$.

Proof. The (⇐) direction is trivial. For (⇒), suppose $[R_y^{(\ell)}(\theta^{(m)}), U] = 0$ and expand using the expression $R_y^{(\ell)}(\theta^{(m)}) = \exp(iY^{(\ell)}\theta^{(m)}) = \cos(\theta)^{(m)} + iY^{(\ell)} \sin(\theta)^{(m)}$ in order to observe that $U_0$ and $U_1$ both commute with $\cos(\theta)^{(m)}$, and $U_0 \sin(\theta)^{(m)} = \sin(\theta)^{(m)} U_1$. Now repeatedly apply the fact that two-by-two matrices which commute with a two-by-two diagonal matrix with distinct entries are themselves diagonal. □
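The expansion used in this proof can be written out in block form with respect to qubit $\ell$. The following is a sketch assuming the usual convention $Y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}$, with the shorthand $c = \cos(\theta)^{(m)}$ and $s = \sin(\theta)^{(m)}$ introduced here for brevity:

```latex
R_y^{(\ell)}(\theta^{(m)}) = c + iY^{(\ell)} s
  = \begin{pmatrix} c & s \\ -s & c \end{pmatrix},
\qquad
U = \begin{pmatrix} U_0 & 0 \\ 0 & U_1 \end{pmatrix},
\\[1ex]
\left[ R_y^{(\ell)}(\theta^{(m)}),\, U \right]
  = \begin{pmatrix}
      c\,U_0 - U_0\,c & s\,U_1 - U_0\,s \\
      U_1\,s - s\,U_0 & c\,U_1 - U_1\,c
    \end{pmatrix}.
```

The commutator therefore vanishes exactly when $U_0$ and $U_1$ each commute with $c$, and $U_0 s = s U_1$ together with $U_1 s = s U_0$; the case analysis of the Lemma then follows by examining which entries of $c$ and $s$ coincide or vanish.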

Finally, we translate these results back to the original operators $P, Q$.

Lemma 21 In the situation of Equation 11, at least one of the following must hold.

1. Either $a, b$ are diagonal or $aX^{(1)}, bX^{(1)}$ are diagonal.

2. There exists a two-qubit operator $U$ and two-qubit diagonals $D, D'$ such that

[Circuit diagram: $P$ expressed as a circuit built from $D'$, $D$ and $U$.]

Similarly, there exists a two-qubit operator $V$ and two-qubit diagonals $C, C'$ such that

[Circuit diagram: $Q$ expressed as a circuit built from $C'$, $C$ and $V$.]

3. Either $P$ or $PX^{(2)}$ commutes with $Z^{(2)}$. There exist replacements $a', b'$ for $a, b$ which are in the subgroup generated by two-qubit diagonal operators on qubits 1 and 2, $C^{(2)}X^{(1)}$, and $X^{(1)}$, such that Equation 11 continues to hold.

Proof. This amounts to unwinding the above discussion in light of Lemma 20. Case I comes from Case 1.a of the Lemma; the $X$ appears because of the 2 in $\theta = 2\alpha$. Case II comes from Cases 1.b and 1.c. The first claim in Case III is just Case 2 of the Lemma; the possible $X$ here comes from the $w$ factor in $\hat{P} = \tilde{P} t w$ from the discussion above. The second claim follows from Lemma 19. □

While we cannot completely characterize operators with $|\cdot|_{CZ;\ell} = 3$, we can characterize $CZ^{(\ell)}$-minimal circuits which compute them.


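Theorem 22 Fix a qubit $\ell$, and suppose $M$ commutes with $Z^{(\ell)}$. Suppose $|M|_{CZ;\ell} = 3$, and let $\mathcal{C}$ be a CZ-circuit for $M$ exhibiting this bound. Then all one-qubit gates of $\mathcal{C}$ on $\ell$ are diagonal or anti-diagonal.

Proof. Consider $M, \mathcal{C}$ satisfying the hypothesis. Without loss of generality, $\ell = 1$ and $\mathcal{C}$ takes the form

[Circuit diagram: qubit 1 carries the one-qubit gates h, g, f, e separated by three CZ contacts; the remaining wires carry the gates $H, G, F, E$, with the lower CZ contacts on qubit 2.]

The CZs may have originally had different terminals, but we can incorporate swaps into $E, F, G, H$ to suppress this behavior. This affects neither the hypothesis nor the conclusion.

(*) Define $P$ by

[Circuit diagram: $P$ is the subcircuit consisting of $G$, the middle CZ between qubits 1 and 2, and $F$.]

If $PX^{(2)}$ commutes with $Z^{(2)}$, then return to (*) and replace $G$ by $GX^{(2)}$, $H$ by $X^{(2)}H$, and $h$ by $Z^{(1)}h$. This does not affect the conclusion, and by Equation 1, the resulting circuit still computes $M$. We have ensured that if one of $P, PX^{(2)}$ commutes with $Z^{(2)}$, then it is $P$.

Define $a, b, Q$ by

[Circuit diagrams: $a$ consists of f, a CZ between qubits 1 and 2, and e; $b$ consists of h, a CZ between qubits 1 and 2, and g; $Q$ is obtained from $M$ by composing with $H^\dagger$ and $E^\dagger$.]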

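Note $|Q|_{CZ;1} = |M|_{CZ;1}$. We also have $Q = [a \otimes I] P [b \otimes I]$, hence are in the situation of Equation 11. Lemma 21 allows us to reduce to the following cases.

Case I. $a, b$ are diagonal, or $aX^{(1)}, bX^{(1)}$ are diagonal. In either case, Corollary 9 applied to the circuits defining $a, b$ shows that $e, f, g, h$ are each diagonal or anti-diagonal.

Case II. $Q$ takes the form

[Circuit diagram: $Q$ expressed as a circuit built from the two-qubit diagonals $C'$, $C$ and the operator $V$.]

The cosine-sine decomposition (see Equation 2) of $V$ along qubit 2 determines unitary operators $R, S$ and a real diagonal operator $\delta$ such that:

[Circuit diagram: $V$ equals $R_y(\delta)$ on qubit 2 together with $S$ and $R$ on the remaining wires.]   (16)

We substitute, commute the $S, R$ outwards past $C, C'$, and decompose the diagonals $C, C'$.

[Circuit diagram: qubit 1 carries four CNOT controls and one $R_z$ gate; qubit 2 carries $R_z$, ⊕, $R_z(\theta)$, ⊕, $R_y(\delta)$, ⊕, $R_z(\phi)$, ⊕, $R_z$; $S$ and $R$ act on the remaining wires.]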


Evidently $\Im^{(1)}(Q)$ depends only on $\theta, \delta, \phi$. We calculate that, up to a global scalar multiple, $\Im^{(1)}(Q)$ consists of the roots of the following quadratics in $T$:

$$T^2 - 2T\left(\cos(2\theta + 2\phi)\cos(\delta_i)^2 + \cos(2\theta - 2\phi)\sin(\delta_i)^2\right) + 1$$

The equations being real, with a linear coefficient of absolute value at most 2, each has complex conjugate roots. By Theorem 18, $|M|_{CZ;1} = |Q|_{CZ;1} \leq 2$, contrary to hypothesis.

Case III. We have already ensured that $P$, rather than $PX^{(2)}$, commutes with $Z^{(2)}$. We replace $a, b$ by the $a', b'$ of Lemma 21. We demultiplex $P$ (see Equation 3) to obtain a decomposition of the following form, where $D$ is diagonal.

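[Circuit diagram: $P$ decomposed into the diagonal $D$ on qubits 1 and 2 together with the operators $S$ and $R$.]

The operators $S, R$ commute past $a', b'$ to the edges of the circuit, and thus do not affect the CZ-cost of $Q$. That is, $|Q|_{CZ;\ell} = |[a' \otimes I] D [b' \otimes I]|_{CZ;\ell}$.

By construction, $|P|_{CZ;\ell} = |D|_{CZ;\ell} = 1$. If $D = |0\rangle\langle 0|^{(\ell)} \otimes D_0 + |1\rangle\langle 1|^{(\ell)} \otimes D_1$, Theorem 18 asserts the entries of $D_0^\dagger D_1$ are $e^{i\theta}\{1, -1, 1, -1, \ldots\}$. Thus $D$ can be written as

[Circuit diagram: $D$ equals $R_z(-\theta/2)$ on qubit 1, a CZ between qubits 1 and 2 conjugated by $\pi$ and $\pi^\dagger$, and $D_0$.]

for some permutation $\pi$. We set $N := X^{(1)} ([a' \otimes I] D [b' \otimes I])^\dagger X^{(1)} [a' \otimes I] D [b' \otimes I]$, so that $\Im^{(1)}([a' \otimes I] D [b' \otimes I])$ is given by the entries of $\langle 0|^{(1)} N |0\rangle^{(1)}$. Evidently $D_0$ commutes past $a'$ and cancels with $D_0^\dagger$. Applying Equation 1 to eliminate $X$ gates, the following circuit computes $N$.

[Circuit diagram: a circuit for $N$ in which qubit 1 carries $b'$, $R_z(-\theta/2)$, a CZ contact, $a'$, $(a')^\dagger$, a CZ contact, $R_z(-\theta/2)$ and $(b')^\dagger$, interleaved with ⊕ contacts; the remaining wires carry two copies of $\pi$, a CZ contact, $\pi^\dagger$.]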

The condition on $a'$ implies that $(a')^\dagger X^{(1)} a' X^{(1)}$ is diagonal. It follows that the subcircuit sandwiched between the two CZs computes a diagonal operator, and so the CZs cancel. Then the $\pi, \pi^\dagger$ pair on the left cancel. The $\pi^\dagger Z^{(m)} \pi$ term on the right commutes past the $(b')^\dagger$. What remains is a circuit of the form

[Circuit diagram: a gate $F$ on qubits 1 and 2 together with a CZ contact conjugated by $\pi$ and $\pi^\dagger$ on the remaining wires.]

By construction, $N$ commutes with both $Z^{(1)}$ and $Z^{(2)}$. It follows that $F$ is diagonal. Then $f = \langle 0|^{(1)} F |0\rangle^{(1)}$ is some one-qubit diagonal acting on $m$. We have $\langle 0|^{(1)} N |0\rangle^{(1)} = \pi^\dagger Z^{(2)} \pi f^{(2)}$. Denote by $f_0, f_1$ the entries of $f$. Then the entries of $\langle 0|^{(1)} N |0\rangle^{(1)}$ are $f_0, f_1, -f_0, -f_1$, and moreover $f_0$ will occur with the same multiplicity as $-f_1$; likewise $-f_0$ will occur with the same multiplicity as $f_1$. We see that $-f_0/f_1\, \Im^{(1)}([a' \otimes I] D [b' \otimes I])$ come in conjugate pairs. By Theorem 18, $|[a' \otimes I] D [b' \otimes I]|_{CZ;1} \leq 2$. But now $|M|_{CZ;1} = |Q|_{CZ;1} = |[a' \otimes I] D [b' \otimes I]|_{CZ;1}$, contrary to hypothesis. □


4.3 Corollaries

The PERES gate implements a three-qubit transformation from classical reversible logic, $PERES^{(\ell;m;n)} = C^{(\ell)}X^{(m)} \cdot CC^{(\ell,m)}X^{(n)}$. As shown in [12], it can be a useful alternative to the TOFFOLI gate in reversible circuits.

Corollary 23 $|PERES|_{CZ} = 5$.

Proof. As is clear from its definition, the PERES gate can be implemented by the circuit of Figure 1, save the rightmost CNOT. Thus, $|PERES|_{CZ} \leq 5$. On the other hand, it also follows from the definition that any circuit for the PERES can, with the addition of a single CNOT, become a circuit for the TOFFOLI. Thus $|PERES|_{CZ} \geq |TOFFOLI|_{CZ} - 1 = 5$, and all inequalities are equalities. □
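The relation between the PERES and TOFFOLI gates used in this proof can be checked on classical bit strings. The sketch below assumes the operator product is read right to left, so that the TOFFOLI factor acts first; the function names are illustrative.

```python
from itertools import product
from typing import Tuple

Bits = Tuple[int, int, int]  # (l, m, n)


def cnot(bits: Bits) -> Bits:
    """C^(l) X^(m): flip m when l is set."""
    l, m, n = bits
    return (l, m ^ l, n)


def toffoli(bits: Bits) -> Bits:
    """CC^(l,m) X^(n): flip n when both l and m are set."""
    l, m, n = bits
    return (l, m, n ^ (l & m))


def peres(bits: Bits) -> Bits:
    """PERES = CNOT . TOFFOLI as an operator product (TOFFOLI acts first)."""
    return cnot(toffoli(bits))


all_inputs = list(product((0, 1), repeat=3))
# One extra CNOT turns any PERES circuit into a TOFFOLI circuit.
print(all(cnot(peres(b)) == toffoli(b) for b in all_inputs))
```

Because the CNOT is its own inverse, the same check read backwards shows that removing one CNOT from a TOFFOLI circuit ending in that CNOT yields a PERES circuit.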

In a different direction, we consider below multiply-controlled Z gates:

Corollary 24 $|(n-1)\text{-controlled-}Z|_{CZ} \geq 2n$ for any $n \geq 3$.

Proof. We proceed by induction on $n$. Suppose the Corollary is false; choose minimal falsifying $n$, and a falsifying circuit $\mathcal{C}$. By Theorem 15, $n > 3$. As before, at least three CZ gates are incident to each qubit, and counting shows that at least one, say $\ell$, touches exactly three. As before, we can assume that all one-qubit operators which appear on $\ell$ are diagonal. Form the circuit $\mathcal{C}' = \langle 1|^{(\ell)} \mathcal{C} |1\rangle^{(\ell)}$ by replacing every gate $g$ of $\mathcal{C}$ with $g' = \langle 1|^{(\ell)} g |1\rangle^{(\ell)}$. This has no effect on gates which do not touch $\ell$; it turns one-qubit gates on $\ell$ into scalars, and replaces $CZ^{(\ell,s)}$ with $Z^{(s)}$. At any rate, $\mathcal{C}'$ is a CZ-circuit on $(n-1)$ qubits which computes the $(n-2)$-controlled-$Z$. We deduce by induction that it contains at least $2(n-1)$ CZ gates. Adding the (at least) three CZs incident to $\ell$, there are at least $2n+1$ total CZs in $\mathcal{C}$. □

5 Three-qubit diagonal operators

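We give here a complete classification of three-qubit diagonal operators by their CZ-cost. Throughout this section, we assume no ancillae are available and label our qubits 1, 2, 3, from most significant to least significant. We abbreviate $\langle i|^{(1)} \langle j|^{(2)} \langle k|^{(3)}\, D\, |i\rangle^{(1)} |j\rangle^{(2)} |k\rangle^{(3)}$ by $D_{ijk}$. We also write $\Delta(\eta)$ for the one-qubit gate given by $|0\rangle\langle 0| + |1\rangle\langle 1|\,\eta$. Define

$$\lambda_1(D) = \frac{D_{011} D_{000}}{D_{001} D_{010}}, \quad \lambda_2(D) = \frac{D_{101} D_{000}}{D_{100} D_{001}}, \quad \lambda_3(D) = \frac{D_{110} D_{000}}{D_{100} D_{010}}, \quad \xi(D) = \frac{D_{111} D_{000}^2}{D_{100} D_{010} D_{001}}$$

Then any three-qubit diagonal $D$ admits the expansion

$$D = D_{000} \cdot \Delta\!\left(\frac{D_{100}}{D_{000}}\right)^{(1)} \cdot \Delta\!\left(\frac{D_{010}}{D_{000}}\right)^{(2)} \cdot \Delta\!\left(\frac{D_{001}}{D_{000}}\right)^{(3)} \cdot \mathrm{diag}(1, 1, 1, \lambda_1(D), 1, \lambda_2(D), \lambda_3(D), \xi(D))$$

The $\lambda_i(D)$ are multiplicative, $\lambda_i(DD') = \lambda_i(D)\lambda_i(D')$, and likewise for $\xi$. We denote by $S(D)$ the ordered quadruple $(\lambda_1(D), \lambda_2(D), \lambda_3(D), \xi(D))$.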
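The invariants and the expansion above can be checked numerically. The sketch below is illustrative; the function names `invariants` and `rebuild` and the indexing convention `d[4*i + 2*j + k]` for $D_{ijk}$ are assumptions of this example rather than notation from the paper.

```python
import cmath
from typing import List, Sequence, Tuple


def invariants(d: Sequence[complex]) -> Tuple[complex, complex, complex, complex]:
    """Return S(D) = (lambda_1, lambda_2, lambda_3, xi) for a 3-qubit diagonal.

    d[4*i + 2*j + k] holds D_{ijk}, with qubit 1 most significant.
    """
    d000, d001, d010, d011, d100, d101, d110, d111 = (complex(x) for x in d)
    lam1 = d011 * d000 / (d001 * d010)
    lam2 = d101 * d000 / (d100 * d001)
    lam3 = d110 * d000 / (d100 * d010)
    xi = d111 * d000 ** 2 / (d100 * d010 * d001)
    return lam1, lam2, lam3, xi


def rebuild(d: Sequence[complex]) -> List[complex]:
    """Reassemble D from D_000, three one-qubit factors and S(D)."""
    lam1, lam2, lam3, xi = invariants(d)
    d000 = complex(d[0])
    r1, r2, r3 = d[4] / d000, d[2] / d000, d[1] / d000  # Delta arguments
    core = [1, 1, 1, lam1, 1, lam2, lam3, xi]
    out = []
    for idx in range(8):
        i, j, k = (idx >> 2) & 1, (idx >> 1) & 1, idx & 1
        out.append(d000 * r1 ** i * r2 ** j * r3 ** k * core[idx])
    return out


ccz = [1, 1, 1, 1, 1, 1, 1, -1]
print(invariants(ccz))

# An arbitrary diagonal of phases, chosen only for illustration.
sample = [cmath.exp(1j * t) for t in (0.1, 0.7, 1.3, 2.9, 0.4, 2.2, 1.9, 0.5)]
print(max(abs(x - y) for x, y in zip(rebuild(sample), sample)) < 1e-12)
```

For the CCZ all three $\lambda_i$ equal 1 and $\xi = -1$, so the gate is not a tensor product of one-qubit diagonals.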

Observation 25 For $D, D'$ three-qubit diagonal operators, $S(D) = S(D')$ iff $S(D^\dagger D') = (1, 1, 1, 1)$ iff $D^\dagger D'$ is a tensor product of one-qubit diagonal operators. It follows that $S(D) = S(D') \implies |D|_{CZ;i} = |D'|_{CZ;i}$.

Observation 26 $\Im^{(i)}(D) = \{1, \lambda_j(D)^\dagger, \lambda_k(D)^\dagger, \xi(D)^\dagger \lambda_i(D)\}$ where $\{i, j, k\} = \{1, 2, 3\}$.
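Observation 26 can be sanity-checked numerically for $i = 1$. The sketch below assumes a diagonal with unit-modulus entries and compares the mux-spectrum, normalised by its first entry, with the predicted multi-set; the function name and the normalisation step are assumptions of this example.

```python
import cmath
from typing import Sequence


def check_observation_26(d: Sequence[complex], tol: float = 1e-9) -> bool:
    """Compare the 1-mux-spectrum of a unitary diagonal with Observation 26.

    d[4*i + 2*j + k] holds D_{ijk}; all entries must have unit modulus.
    """
    d = [complex(x) for x in d]
    d000, d001, d010, d011, d100, d101, d110, d111 = d
    lam1 = d011 * d000 / (d001 * d010)
    lam2 = d101 * d000 / (d100 * d001)
    lam3 = d110 * d000 / (d100 * d010)
    xi = d111 * d000 ** 2 / (d100 * d010 * d001)
    # Eigenvalues of D_1^dagger D_0, indexed by the values jk of qubits 2, 3.
    spectrum = [d[4 + r].conjugate() * d[r] for r in range(4)]
    normalised = [x / spectrum[0] for x in spectrum]
    predicted = [1, lam2.conjugate(), lam3.conjugate(), xi.conjugate() * lam1]
    return all(abs(p - q) < tol for p, q in zip(normalised, predicted))


# An arbitrary diagonal of phases, chosen only for illustration.
sample = [cmath.exp(1j * t) for t in (0.1, 0.7, 1.3, 2.9, 0.4, 2.2, 1.9, 0.5)]
print(check_observation_26(sample))
```

The normalisation reflects that mux-spectra are only compared up to congruence, that is, up to an overall scalar.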

Lemma 27 A three-qubit diagonal $D$ can be implemented in a three-qubit CZ-circuit with:

• 0 CZs touching qubit 1 iff $S(D) = (\xi, 1, 1; \xi)$.

• 1 CZ touching qubit 1 iff $S(D) = (\xi, -1, -1; \xi), (-\xi, 1, -1; \xi), (-\xi, -1, 1; \xi)$.

• 2 CZs touching qubit 1 iff $S(D) = (a, b, c; abc), (a, b, c; ab/c), (a, b, c; ac/b)$.

Proof. This is just a translation of Theorem 18 using Observation 26, involving a straightforward but tedious calculation which we omit. □

The two possibilities $S(D) = (a, b, c; abc), (a, b, c; ab/c)$ are quite different, and the following result helps distinguish between them.

Lemma 28 Let $D$ be a three-qubit diagonal operator and $u$ be a one-qubit gate. Suppose $|D u^{(3)} CZ^{(1,3)}|_{CZ;1} = 1$ or $|CZ^{(1,3)} u^{(3)} D|_{CZ;1} = 1$. Then $\lambda_1(D)\lambda_2(D) = \lambda_3(D)\xi(D)$.

Proof. The conclusion being stable under $D \to D^\dagger$, we assume $|D u^{(3)} CZ^{(1,3)}|_{CZ;1} = 1$. Decompose $u^\dagger = e^{i\theta} R_z(\alpha) R_y(\beta) R_z(\gamma)$. Then $\Im^{(\ell)}(A)$ is given by the roots of the polynomials

$$x^2 - \cos(2\beta)\,(1 - \lambda_2(D))\,x - \lambda_2(D)$$

$$x^2 - \cos(2\beta)\,(\lambda_3(D) - \xi(D)/\lambda_1(D))\,x - \lambda_3(D)\xi(D)/\lambda_1(D)$$

For these to have roots either $\{p, p, -p, -p\}$ or $\{p, p, p, p\}$, the two equations must have the same constant terms: either both $p^2$ or both $-p^2$. □

We turn to computing CZ-costs. These being invariant under relabelling of qubits, we write $s(D)$ for $(\lambda_1(D), \lambda_2(D), \lambda_3(D); \xi(D))$, where we ignore the order of the $\lambda_i$.

Observation 29 Given two three-qubit diagonals $D, D'$, $s(D) = s(D')$ if and only if there exist one-qubit diagonals $d, d', d''$ and a wire permutation $\omega$ such that $D = (d \otimes d' \otimes d'') \cdot \omega D' \omega^\dagger$. Thus $s(D) = s(D') \implies |D|_{CZ} = |D'|_{CZ}$.

Theorem 30 Let $D$ be a three-qubit diagonal operator. Then there exists a CZ-circuit for $D$ containing

• 0 CZs iff $s(D) = (1, 1, 1; 1)$.

• 1 CZ iff $s(D) = (1, 1, -1; -1)$.

• 2 CZs iff $s(D) = (1, 1, \xi; \xi), (1, -1, -1; 1)$.

• 3 CZs iff $s(D) = (1, 1, \xi; \xi), (\xi, -1, -1; \xi), (-\xi, 1, -1; \xi)$.

• 4 CZs iff $s(D) = (a, b, c; ab/c)$.

• 5 CZs iff $s(D) = (a, b, c; ab/c), (a, b, c; abc)$.

• 6 CZs always.
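Read from top to bottom, the case list of Theorem 30 gives the CZ-cost as the first count whose condition is met. The following sketch transcribes the list directly; the function name `cz_cost`, the tolerance `TOL`, and the brute-force search over orderings of the $\lambda_i$ are choices made for this illustration.

```python
from itertools import permutations
from typing import Sequence

TOL = 1e-9  # numerical tolerance (an assumption of this sketch)


def _eq(x: complex, y: complex) -> bool:
    return abs(x - y) < TOL


def _matches(lams: Sequence[complex], pattern: Sequence[complex]) -> bool:
    """True if some ordering of the lambdas equals the pattern entrywise."""
    return any(all(_eq(l, p) for l, p in zip(perm, pattern))
               for perm in permutations(lams))


def cz_cost(lams: Sequence[complex], xi: complex) -> int:
    """Smallest CZ count allowed by the case list, given s(D) = (lams; xi)."""
    lams = [complex(x) for x in lams]
    xi = complex(xi)
    if _matches(lams, (1, 1, 1)) and _eq(xi, 1):
        return 0
    if _matches(lams, (1, 1, -1)) and _eq(xi, -1):
        return 1
    if _matches(lams, (1, 1, xi)) or (_matches(lams, (1, -1, -1)) and _eq(xi, 1)):
        return 2
    if _matches(lams, (xi, -1, -1)) or _matches(lams, (-xi, 1, -1)):
        return 3
    if any(_eq(xi, a * b / c) for a, b, c in permutations(lams)):
        return 4
    a, b, c = lams
    if _eq(xi, a * b * c):
        return 5
    return 6


print(cz_cost((1, 1, 1), -1))   # CCZ, for which s(D) = (1, 1, 1; -1)
print(cz_cost((1, 1, -1), -1))  # a single CZ tensored with the identity
```

For the CCZ none of the first six conditions hold, so the function returns 6, in agreement with Theorem 15.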


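Proof. We assume without loss of generality that $D$ takes the form $\mathrm{diag}(1, 1, 1, \lambda_1, 1, \lambda_2, \lambda_3, \xi)$. We number the qubits 1, 2, 3 from most to least significant.

(⇐). We can assume that in fact $S(D)$ takes the form given. Our constructions will use the CX, which may be replaced by the CZ at the cost of inserting HADAMARD gates. In the circuits below, gates are listed left to right on each wire; • denotes a control or CZ contact and ⊕ a CX target.

Case 0. $S(D) = (1, 1, 1; 1) \implies D = I$.

Case 1. $S(D) = (1, 1, -1; -1) \implies D = CZ^{(1,2)}$.

Case 2a. $S(D) = (\xi, 1, 1; \xi)$. Fix $\eta = \sqrt{\xi}$; then $D$ is computed by

qubit 1: (no gates)
qubit 2: Δ(η), •, •
qubit 3: Δ(η), ⊕, Δ(1/η), ⊕

Case 2b. $S(D) = (1, -1, -1; 1) \implies D = CZ^{(1,3)} CZ^{(1,2)}$.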

Case 3a. $S(D) = (\xi, 1, 1; \xi)$. By Case 2a, the CZ can be implemented in a circuit containing 2 CZs. It follows that any operator that can be implemented with $n > 0$ CZs can be implemented with $n + 1$. Thus since $D$ can be implemented with 2 CZs, it can be implemented with 3.

Case 3b. $S(D) = (\xi, -1, -1; \xi)$. Fix $\eta = \sqrt{\xi}$; then $D$ is computed by

qubit 1: • (paired with the • on qubit 3)
qubit 2: Δ(η), •, •
qubit 3: Δ(η), ⊕, Δ(1/η), •, ⊕

Case 3c. $S(D) = (-\xi, 1, -1; \xi)$. Fix $\eta = \sqrt{-\xi}$; then $D$ is computed by

qubit 1: • (paired with one • on qubit 2)
qubit 2: Δ(η), •, •, •
qubit 3: Δ(η), ⊕, Δ(1/η), ⊕

Case 4. $S(D) = (a, b, c; ab/c)$. Fix square roots $\alpha, \beta, \gamma$ for $a, b, c$; then $D$ is computed by

qubit 1: Δ(β), •, •
qubit 2: Δ(α), •, •
qubit 3: Δ(αβ/γ), ⊕, Δ(γ/α), ⊕, Δ(1/γ), ⊕, Δ(γ/β), ⊕

Case 5a. $S(D) = (a, b, c; ab/c)$. As $D$ can be implemented with 4 CZs, it can be implemented with 5.

Case 5b. $S(D) = (a, b, c; abc)$. Fix square roots $\alpha, \beta, \gamma$ for $a, b, c$; then $D$ is computed by

qubit 1: Δ(βγ), •, •, •
qubit 2: Δ(αγ), •, ⊕, •, Δ(1/γ), ⊕
qubit 3: Δ(αβ), ⊕, Δ(1/α), ⊕, Δ(1/β), ⊕


Case 6. More generally, any $n$-qubit diagonal operator has CZ-cost bounded by $2^n - 2$. See [3] or Section 2.3.

(⇒).

Case 0. $D$ must be locally equivalent to $I$, hence $s(D) = (1, 1, 1; 1)$.

Case 1. $D$ must be locally equivalent to some CZ, hence $s(D) = (1, 1, -1; -1)$.

Case 2. Suppose there exists a minimal implementation of $D$ in which both CZ gates connect the same two qubits. Then $D$ is locally equivalent to a two-qubit diagonal, in which case one can compute $s(D) = (\xi, 1, 1; \xi)$.

Otherwise, there is a minimal implementation of $D$ in which the two CZ gates are $CZ^{(i,j)}, CZ^{(j,k)}$. By Corollary 13, we may pass to an implementation with only diagonal one-qubit gates along $j$; by Corollary 10, we may pass to an implementation with only diagonal one-qubit gates along $i, k$ as well. But then $D$ is locally equivalent to $CZ^{(i,j)} CZ^{(j,k)}$ and we may compute $s(D) = (1, -1, -1; 1)$.

Case 3. It suffices to show that $|D|_{CZ;j} \leq 1$ for some $j$. For, if $|D|_{CZ;j} = 0$, then $D$ is a two-qubit diagonal, with $s(D) = (\xi, 1, 1; \xi)$, and if $|D|_{CZ;j} = 1$, then by Lemma 27, $s(D) = (-\xi, 1, -1; \xi)$ or $(\xi, -1, -1; \xi)$.

Consider an implementation of $D$ containing three CZs. We have $|D|_{CZ;\ell} \leq 1$ for some $\ell$ unless the CZs are distributed so that each qubit touches exactly two. Let $j$ be a qubit touching the middle CZ. By Corollary 13, we can assume the circuit contains only diagonal gates on qubit $j$; it follows by inspection that $D \sim_j CZ^{(i,j)} CZ^{(j,k)}$. But we have already determined that $|CZ^{(i,j)} CZ^{(j,k)}|_{CZ;j} = 1$.

Case 4. Consider an implementation of $D$ containing four CZs. If any qubit touches fewer than two CZs, we reduce to the previous case and observe that the desired condition on $s$ holds. Thus suppose each qubit touches at least two CZs. Then there are only two possibilities for the number of CZs touched by each qubit: $(2, 2, 4)$ and $(2, 3, 3)$.

For the configuration $(2, 2, 4)$, say qubits $\ell, m$ touch two CZs and qubit $n$ touches four. Note that no CZs connect $\ell, m$. Thus we may assume by Corollary 13 that all one-qubit gates on $\ell, m$ are diagonal. By Proposition 5, $\det_{\ell,m} D$ is separable; this says precisely that $\lambda_\ell(D)\lambda_m(D) = \lambda_n(D)\xi(D)$.

For the configuration $(2, 3, 3)$, say qubit 1 touches two CZs and qubits 2, 3 touch three. Then there are two CZs connecting qubits 2 and 3, one connecting qubits 1 and 3 and one connecting qubits 1 and 2. By Corollary 13, we ensure that all one-qubit gates on qubit 1 are diagonal. If the CZs connecting qubits 2 and 3 are outermost, $D \sim_\ell CZ^{(1,2)} CZ^{(1,3)}$, hence can be implemented with three CZs by Case 3. Otherwise, one of the CZs incident on qubit 1 is outermost; without loss of generality let it be $CZ^{(1,3)}$. Then we have an equation of the form $D = u^{(3)} CZ^{(1,3)} A$ where by construction $A$ commutes with $Z^{(1)}$ and $|A|_{CZ;1} = 1$. Lemma 28 yields the desired result.

Case 5. It suffices by Lemma 27 to show that $|D|_{CZ;\ell} \leq 2$ for some $\ell$. Suppose not; then in any five-CZ implementation for $D$, each qubit must touch at least three CZs. It follows that two of the qubits, say $\ell, m$, touch exactly three CZs, and the remaining qubit touches four. By Theorem 22, all one-qubit gates on $\ell, m$ are diagonal or anti-diagonal. Enough applications of Equation 1 will ensure that all one-qubit gates on $\ell, m$ are in fact diagonal. Move the CZ which connects $\ell, m$ to the edge of the circuit. This yields $D = CZ^{(\ell,m)} A$, where $|A|_{CZ;\ell} \leq 2$. By Lemma 27, it follows that $|D|_{CZ;\ell} \leq 2$ as well. □


6 Circuits with ancillae

The proofs of Theorems 15 and 30 assume that only three qubits were present, and use this assumption when enumerating possible circuit configurations with a given total number of CZ gates. This dependency can be eliminated. Indeed, these cases involved so few CZs that one could eliminate configurations with ancillae by performing explicit checks.

More significant is the use of Proposition 5 and the characterization by Theorem 18 of $|D|_{CZ;\ell} \leq 2$. Both of these statements are true for any fixed $N$, but suffer when $N$ is allowed to vary. For example if only $N = 3$ qubits are available, then $\det_{1,2} CCZ^{(1,2,3)} = CZ^{(1,2)}$, so by Proposition 5, the CCZ cannot be implemented in any three-qubit CZ-circuit in which all gates commute with $Z^{(1)}, Z^{(2)}$. But if $N = 4$ qubits are present, $\det_{1,2}(CCZ^{(1,2,3)}) = I^{(1,2)}$, so $CCZ^{(1,2,3)} \otimes I^{(4)}$ can be implemented in a four-qubit CZ-circuit in which all one-qubit gates commute with $Z^{(1)}$ and $Z^{(2)}$.

Similarly, for $N = 3$ qubits, we have $\Im^{(\ell)}(CCZ) = \{1, 1, 1, -1\}$ and thus by Theorem 18 $|CCZ|_{CZ;\ell} \geq 3$. However, for $N = 4$ qubits, $\Im^{(\ell)}(CCZ^{(\ell,m,n)}) = \{1, 1, 1, -1, 1, 1, 1, -1\}$, so now Theorem 18 implies that $|CCZ^{(1,2,3)} \otimes I^{(4)}|_{CZ;1} = 2$. Indeed:

[Circuit diagram: a four-qubit circuit for $CCZ^{(1,2,3)} \otimes I^{(4)}$ in which only two CZ gates touch qubit 1; the other wires carry HADAMARD gates and further CZ contacts.]

On the other hand, the properties $\Im^{(\ell)}(U) \cong \{1, 1, \ldots\}$ and $\Im^{(\ell)}(U) \cong \{1, -1, 1, -1, \ldots\}$ are stable under adding ancillae. By Theorem 18, so are the properties $|U|_{CZ;\ell} = 0$ and $|U|_{CZ;\ell} = 1$. Since only these properties are used in the proof of Lemma 28, it too holds even in the presence of ancillae. This leads to an extension of the CZ-cost classification of three-qubit diagonals to the case where ancilla qubits are permitted.

Lemma 31 Let $A$ be a unitary operator; let $\mathcal{C}$ be qubit-minimal among CZ-circuits computing $A$, possibly with the use of ancillae, using only $|A|^a_{CZ}$ CZ gates. Then every ancilla in $\mathcal{C}$ touches at least three CZ gates.

Proof. Fix an ancilla qubit $\ell$. If no CZ gates touch $\ell$, then it may be removed. If one (respectively two) CZ touches $\ell$, then by Corollary 10 (respectively Corollary 13), there is a circuit with no more CZs in which the only one-qubit gates on $\ell$ are diagonal.

Now form the circuit $\langle 0|^{(\ell)} \mathcal{C} |0\rangle^{(\ell)}$ as in the proof of Corollary 24. This circuit computes the operator $A$ using one fewer ancilla and fewer CZs than $\mathcal{C}$. □

Corollary 32 For any two-qubit operator $V$, $|V|^a_{CZ} = |V|_{CZ}$.

Proof. If no ancillae are needed to minimize CZ-count, then the result holds. Otherwise, each ancilla used in a qubit-minimal CZ-minimal implementation must touch at least three CZ gates. Thus $|\cdot|_{CZ} \geq |\cdot|^a_{CZ} \geq 3$. However it is known [23, 22, 16] that two-qubit operators have $|\cdot|_{CZ} \leq 3$. Thus all the inequalities are equalities. □

Proposition 33 For any three-qubit diagonal operator $D$, $|D|^a_{CZ} = |D|_{CZ}$.


Proof. Suppose $|D|^a_{CZ} < |D|_{CZ}$. By Lemma 31, a qubit-minimal circuit for $D$ achieving the bound for $|D|^a_{CZ}$ contains at least three CZ gates incident on each ancilla. By assumption at least one ancilla is used, so $|D|_{CZ} > |D|^a_{CZ} \geq 3$. It follows from Theorem 30 and Lemma 27 that $|D|_{CZ;\ell} > 1$ for the three qubits $\ell = 1, 2, 3$. By Theorem 18, this property is stable under addition of ancillae. Thus a qubit-minimal circuit for $D$ achieving the bound for $|D|^a_{CZ}$ contains at least 3 CZs incident to each ancilla, and at least 2 CZs incident to each non-ancilla qubit. If $k$ ancillae are used, then we have $|D|^a_{CZ} \geq (3k + 6)/2$. From Theorem 30 and the supposition we have $|D|^a_{CZ} < |D|_{CZ} \leq 6$; it follows that $k = 1$, that $|D|^a_{CZ} = 5$, and that $|D|_{CZ} = 6$.

In any four-qubit, five-CZ circuit for $D$, we must have two of the non-ancillae, say $x_1, x_2$, touching two CZs, and both the remaining non-ancilla $z$ and the ancilla $a$ touching three. By Corollary 13, we can assume that the only one-qubit operators appearing on $x_1, x_2$ are diagonal. We may also assume that the graph where vertices are qubits and edges are CZ gates is connected; otherwise $D$ could be split into the tensor product of a two-qubit and a one-qubit diagonal, and hence would have $|D| \leq 2$. Then there are only three possibilities regarding which wires are connected by CZs.

I (x1,x2) (x1,z) (x2,a) (z,a) (z,a)

II (x1,z) (x1,z) (x2,a) (x2,a) (z,a)

III (x1,z) (x1,a) (x2,z) (x2,a) (z,a)

We will show that any circuit with those CZ gates can be transformed so that (*) a CZ which does not touch the ancilla is outermost among the CZs, and (**) one of the $x$-qubits on which this CZ gate acts has the property that all one-qubit gates acting on it are diagonal. As this $x$-qubit only touched 2 CZ gates to begin with, it follows from Lemma 28 that $s(D)$ takes the form $(a, b, c; ab/c)$. By Theorem 30, $|D|_{CZ} \leq 4$, which is a contradiction.

We return to checking (*) and (**). Eliminate non-diagonal one-qubit gates on $x_i$ using Corollary 13. In Case (I), the $(x_1, x_2)$ CZ can therefore only be prevented from moving by the $(x_1, a)$. This can be on only one side, so the $(x_1, x_2)$ can be moved outwards to the other. Similarly, in Case (II), an $(x, z)$ can only be blocked by $(z, a)$ and the other $(x, z)$. In this case, the second $(x, z)$ is blocked on only one side and can be moved to the edge. In Case (III), we use Corollary 13 to clear both the $x_1$ and $x_2$ qubits of non-diagonal gates; the possible additional one-qubit gates will only fall on the $z$ and $a$ qubits. Now the $(x_1, z)$ can only be blocked by the $(x_2, z)$ and the $(z, a)$, and also the $(x_2, z)$ can only be blocked by $(z, a)$ and $(x_1, z)$. Thus one of $(x_1, z)$ and $(x_2, z)$ can be made outermost. □

Corollary 34 $|CCZ|^a_{CZ} = |TOFFOLI|^a_{CZ} = 6$ and $|PERES|^a_{CZ} = 5$.

7 Conclusion

While our work is primarily focused on quantum circuit implementations, theTOFFOLIgate originally arose as a universal gate for classical reversible logic [21]. In contrast,the NOT andCNOT gates arenot universal for reversible logic: their action on bit-stringsis affine-linear over overF2, and thus the same is true for any operator computed by anycircuit containing only these gates.

AugmentingCNOT gates with single-qubit rotations to express theTOFFOLI gate pro-vides the lacking non-linearity. Thus the number of one-qubit gates (excluding inverters)

25

Page 26: On the CNOT-cost of gates - arXivrequired in circuit implementations of operators. The results apply, mutatis mutandis, to CNOT-based implementations as well. Finally, in Section 4.3,

needed to express theTOFFOLI, or more generally any reversible computation, can bethought of as a measure of its non-linearity. In this inverted cost model (also relevant tosome quantum implementation technologies) the following question remains open:howmany one-qubit gates are needed to implement theTOFFOLI? Furthermore,are there cir-cuits that simultaneously minimize the number ofCNOT and one-qubit gates ?

In a different direction, recall our results showing that diagonality and block-diagonalityof an operator impose strong constraints on small circuits that compute this operator. Webelieve other conditions may act in a similar way. In particular, we askwhat can be saidabout minimal quantum circuits for operators computable byclassical reversible circuits,i.e., operators expressed by 0-1 matrices?Very little is known even for three-qubit opera-tors. In particular, theCNOT-cost of the controlled-swap (Fredkin gate) remains unresolved.

Closest to our present work, the exactCNOT-cost of then-qubit analogue of theTOFFOLIgate remains unknown. We have shown that 2n CNOTs are necessary if ancillae are not per-mitted, but already forn= 4 we only know that 8≤ |CCCZ|CZ ≤ 14, where the upper boundis provided by a generic decomposition of diagonal operators [3]. Existing constructionsof then-qubit TOFFOLI gate require a quadratic number ofCNOT gates without the use ofancillae. With one ancilla, such constructions require linearly manyCNOTs, but the leadingcoefficient is in double-digits [1, 12].

Finally, we hope that our proof can be simplified and our techniques generalized. In particular, we have relied on repeated comparisons of various Cartan decompositions to each other. A careful study of the proof will reveal the simultaneous use of six Cartan decompositions: those corresponding to conjugation by X and Z on each of three wires. Keeping track of these decompositions in a more systematic manner may simplify the proof, while using additional decompositions may lead to new results. A related challenge is gauging the power of the qubit-by-qubit gate counting we have used. It follows from the results of [18] that $|U|_{CZ;\ell} < 6(n-1)$ for U an n-qubit operator, and hence no technique relying solely on this process can achieve better than a quadratic lower bound. On the other hand, we have only been able to characterize cases when $|U|_{CZ;\ell} > 2$, and thus have achieved only linear lower bounds.

Acknowledgements. We thank Mikko Möttönen, Jun Zhang, K. Birgitta Whaley, and Yaoyun Shi for helpful discussions. This work was sponsored in part by the Air Force Research Laboratory under Agreement No. FA8750-05-1-0282.

References

[1] A. Barenco, C. Bennett, R. Cleve, D. P. DiVincenzo, N. Margolus, P. Shor, T. Sleator, J. A. Smolin, and H. Weinfurter, Elementary gates for quantum computation, PRA 52, 3457 (1995).

[2] S. S. Bullock, A note on the Khaneja-Glaser decomposition, QIC 4.5, 396-400 (2004).

[3] S. S. Bullock and I. L. Markov, Asymptotically Optimal Circuits for Arbitrary n-qubit Diagonal Computations, QIC 4.1, 027-047 (2004).

[4] D. P. DiVincenzo and J. A. Smolin, Results on two-bit gate design for quantum computers, Proc. of the Workshop on Physics and Computation (1994).

[5] D. P. DiVincenzo, Quantum gates and circuits, Proc. R. Soc. Lond. A 454, 261-276 (1998).

[6] N. Margolus, Simple quantum gates, Unpublished manuscript (circa 1994).

[7] N. Khaneja and S. J. Glaser, Cartan decomposition of SU(n) and control of spin systems, Chem. Physics 267, 11-23 (2001).

[8] A. W. Knapp, Lie Groups Beyond an Introduction, Progress in Mathematics, vol. 140, Birkhäuser, 1996.

[9] E. Knill, Approximation by quantum circuits, LANL report LAUR-95-2225.

[10] M. Lewenstein, B. Kraus, J. I. Cirac, and P. Horodecki, Optimization of entanglement witnesses, PRA 62, 052310 (2000).

[11] Yu. G. Makhlin, Nonlocal properties of two-qubit gates and mixed states and optimizations of quantum computations, QIP 1, 243-252 (2002).

[12] D. Maslov and G. W. Dueck, IEE Electronics Letters 39.25, 1790-1791 (2003).

[13] M. Möttönen, J. J. Vartiainen, V. Bergholm, and M. M. Salomaa, Quantum circuits for general multiqubit gates, PRL 93, 130502 (2004).

[14] M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information, Cambridge University Press (2000).

[15] C. C. Paige and M. Wei, History and generality of the CS decomposition, Linear Algebra and Applications 208, 303 (1994).

[16] V. V. Shende, I. L. Markov, and S. S. Bullock, CNOT-optimal circuits for generic two-qubit operators, PRA 69, 062321 (2004).

[17] V. V. Shende, S. S. Bullock, and I. L. Markov, Recognizing small-circuit structure in two-qubit operators, PRA 70, 012310 (2004).

[18] V. V. Shende, S. S. Bullock, and I. L. Markov, Synthesis of quantum logic circuits, IEEE Trans. on CAD 25, 1000 (2006).

[19] G. Song and A. Klappenecker, Optimal realizations of controlled unitary gates, QIC 3, 139-155 (2003).

[20] G. Song and A. Klappenecker, The simplified Toffoli gate implementation by Margolus is optimal, QIC 4, 361-372 (2004).

[21] T. Toffoli, Reversible Computing, MIT Technical Report MIT/LCS/TM-151 (1980).

[22] F. Vatan and C. Williams, Optimal quantum circuits for general two-qubit gates, PRA 69, 032315 (2004).

[23] G. Vidal and C. M. Dawson, A universal quantum circuit for two-qubit transformations with three CNOT gates, PRA 69, 010301 (2004).

[24] J. Zhang, J. Vala, K. B. Whaley, and S. Sastry, A geometric theory of non-local two-qubit operations, PRA 67, 042313 (2003).

Appendix: Proof of Proposition 5

Below we restate Proposition 5 and complete its proof.

Proposition 5. Fix qubits $\ell_1, \ldots, \ell_k$ among $N > k$ qubits. A unitary $U$ commuting with $Z^{(\ell_1)}, \ldots, Z^{(\ell_k)}$ can be implemented by a CZ-circuit in which only diagonal gates operate on qubits $\ell_i$ if and only if $\det_{\ell_1 \ldots \ell_k}(U)$ is separable (can be implemented by one-qubit gates).

Proof. (⇒). It suffices to show the separability of $\det_{\ell_1 \ldots \ell_k}(U)$ for a small generating set of operators. Direct calculation confirms this for (i) CZ gates, (ii) diagonal one-qubit gates on the $\ell_i$, and (iii) any gate not affecting qubits $\ell_i$.

(⇐). By hypothesis, $\det_{\ell_1 \ldots \ell_k}(U)$, and hence $D = \det_{\ell_1 \ldots \ell_k}(U)^{-2^{k-N}}$, can be implemented using only one-qubit diagonal gates. It remains to implement $\tilde{U} = U/D$, which satisfies the normalization $\tilde{U}_{j_1 \ldots j_k} \in SU(2^{N-k})$. We will construct a circuit for $\tilde{U}$ by multiplexing circuits for $\tilde{U}_{j_1 \ldots j_k}$. Let $C$ be an $(N-k)$-qubit circuit containing only CZs and one-qubit $R_x, R_y, R_z$ gates such that any operator in $SU(2^{N-k})$ can be implemented by making the appropriate choice of parameter for the $R_x, R_y, R_z$ gates. Such universal circuits exist [1]; see Section 2.3 for modern constructions. Choose specifications $C_{j_1 \ldots j_k}$ implementing the $\tilde{U}_{j_1 \ldots j_k}$; let the $s$-th rotation gate in $C_{j_1 \ldots j_k}$ be given by $R_{d(s)}(\theta_{j_1 \ldots j_k}(s))^{(q(s))}$, where $q(s)$ is a qubit, $\theta_{j_1 \ldots j_k}(s)$ is an angle, and $d(s) \in \{x, y, z\}$. Define $\Theta(s)$ to be the real diagonal operator on qubits $\ell_1, \ldots, \ell_k$ such that $\Theta(s)_{j_1 \ldots j_k} = \theta_{j_1 \ldots j_k}(s)$. Form the $N$-qubit circuit $\tilde{C}$ by replacing the $s$-th rotation gate of $C$ by the multiplexed rotation $R_{d(s)}(\Theta(s))^{(q(s))}$; then $\tilde{C}$ implements $\tilde{U}$. Implement $R_{d(s)}(\Theta(s))^{(q(s))}$ by a CZ-circuit containing no one-qubit operator on any qubit save $q(s)$, which is not one of the $\ell_i$ (see [13] or Section 2.3). □
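For orientation, the block structure used in this argument can be written out explicitly. The display below is a sketch in notation that the proof itself does not use: it assumes, purely for readability, that the qubits $\ell_1, \ldots, \ell_k$ are ordered first, writes $j = j_1 \ldots j_k$ for a bit-string in $\{0,1\}^k$, and uses the projectors $|j\rangle\langle j|$ onto computational basis states of those qubits.

```latex
% Assumption (ours): qubits \ell_1,\ldots,\ell_k come first; j ranges over \{0,1\}^k.
% An operator commuting with Z on each \ell_i is block-diagonal in this basis,
% and a multiplexed rotation applies the angle \theta_j selected by j.
\[
  U \;=\; \sum_{j \in \{0,1\}^k} |j\rangle\langle j| \otimes U_{j},
  \qquad
  R_{d}(\Theta)^{(q)} \;=\; \sum_{j \in \{0,1\}^k} |j\rangle\langle j| \otimes R_{d}(\theta_{j})^{(q)} .
\]
```

Since the CZ gates of the template circuit act only on the remaining $N-k$ qubits, replacing each rotation by its multiplexed version yields a circuit that acts as $C_{j_1 \ldots j_k}$ on the block selected by $j_1 \ldots j_k$.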

Corollary 35. N-qubit operators which commute with Z on k qubits can be implemented using on the order of $2^k 4^{N-k}$ one-qubit and CZ gates.⁵

Proof. This follows from the construction in the proof of Proposition 5 and the known estimates in the cases k = 0, N−1 [13] and k = N [3]. □
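The order of magnitude in Corollary 35 matches a simple parameter count, which the following minimal sketch tabulates. It is an illustration only: the function name `parameter_count` is hypothetical, and the count assumes that an operator commuting with Z on k qubits consists of 2^k unitary blocks of dimension 2^(N-k), each carrying (2^(N-k))^2 real parameters. Constant factors and global phases are ignored, so the numbers are estimates rather than exact gate counts.

```python
def parameter_count(n_qubits, k):
    """Real parameters of 2**k unitary blocks, each of dimension 2**(n_qubits - k).

    Assumption: a d-dimensional unitary has d**2 real parameters, so the total
    is 2**k * 4**(n_qubits - k), the order of growth stated in Corollary 35.
    """
    if not 0 <= k <= n_qubits:
        raise ValueError("k must satisfy 0 <= k <= n_qubits")
    blocks = 2 ** k
    block_dim = 2 ** (n_qubits - k)
    return blocks * block_dim ** 2


if __name__ == "__main__":
    n = 3
    for k in range(n + 1):
        # k = 0 is a generic operator; k = n is a diagonal operator.
        print(k, parameter_count(n, k))
```

For N = 3 this interpolates between 64 parameters for a generic operator (k = 0) and 8 for a diagonal one (k = 3), the two extreme cases cited in the proof.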

⁵ Dimension-counting following [9] shows that roughly this many are necessary for almost all such operators.
