paper aquantum

13
Quantum Sci. Technol. 5 (2020) 024003 https://doi.org/10.1088/2058-9565/ab7559 PAPER A quantum-classical cloud platform optimized for variational hybrid algorithms Peter J Karalekas , Nikolas A Tezak 1 , Eric C Peterson , Colm A Ryan , Marcus P da Silva 2 and Robert S Smith Rigetti Computing, 2919 Seventh Street, Berkeley, CA 94710 United States of America 1 Current address: OpenAI, 3180 18th St, San Francisco, CA 94110 United States of America. 2 Current address: Microsoft Quantum, One Microsoft Way, Redmond, WA 98052 United States of America. E-mail: [email protected] Keywords: cloud-based quantum computing, near-term quantum algorithms, quantum software engineering Supplementary material for this article is available online Abstract In order to support near-term applications of quantum computing, a new compute paradigm has emergedthe quantum-classical cloudin which quantum computers (QPUs) work in tandem with classical computers (CPUs) via a shared cloud infrastructure. In this work, we enumerate the architectural requirements of a quantum-classical cloud platform, and present a framework for benchmarking its runtime performance. In addition, we walk through two platform-level enhance- ments, parametric compilation and active qubit reset, that specically optimize a quantum-classical architecture to support variational hybrid algorithms, the most promising applications of near-term quantum hardware. Finally, we show that integrating these two features into the Rigetti Quantum Cloud Services platform results in considerable improvements to the latencies that govern algorithm runtime. 1. Introduction The rst experimental realizations of quantum algorithms date back to over a decade ago [14], but in the last three years quantum computing has rapidly transitioned from a eld of scientic research to a full-edged technology industry. The recent demonstration of quantum supremacy over classical computing [5] is a considerable milestone, but there is still much progress to be made on the road to solving real-world problems with quantum computers and achieving quantum advantage. Improving the error rates of quantum devices [6, 7] and ultimately reaching the regime of fault tolerance [8] is necessary for unlocking the most powerful known applications of quantum computers. At the same time, the industry has increased its focus on nding ways to solve valuable problems using the noisy intermediate-scale quantum (NISQ) processors that are currently available [9]. The desire to provide the research community with access to scarce quantum hardware in order to shorten the path to quantum advantage resulted in the development of a new compute architecturethe quantum cloud. As part of this architecture, the concept of an Internet-accessible data center has been extended to include quantum devices. Infrastructure for the quantum cloud requires a slew of new specialized hardware, for example, dilution refrigerators to house superconducting qubits and racks of microwave instruments to control them. To build the quantum cloud, some developers of quantum computers have pivoted to being full-stack, using in-house infrastructure to offer cloud-based access to their quantum devices (e.g. IBM Quantum Experience). In addition, some traditional cloud providers have begun to add quantum backends through strategic hardware-software partnerships. The rst iteration of quantum cloud offerings employed a hybrid cloud model [10], in which users of the service submitted quantum programs using a web API to a queue hosted by the public cloud (e.g. Amazon Web Services). Then, a server colocated with a quantum processor would periodically pull jobs off of the queue, execute them, and return results back to the user. This approach was effective in offering worldwide, public RECEIVED 4 December 2019 REVISED 9 February 2020 ACCEPTED FOR PUBLICATION 12 February 2020 PUBLISHED 26 March 2020 © 2020 IOP Publishing Ltd

Upload: others

Post on 10-Jul-2022

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: PAPER Aquantum

QuantumSci. Technol. 5 (2020) 024003 https://doi.org/10.1088/2058-9565/ab7559

PAPER

A quantum-classical cloud platform optimized for variational hybridalgorithms

Peter JKaralekas ,Nikolas ATezak1 , Eric CPeterson , ColmARyan ,Marcus P da Silva2 andRobert S SmithRigetti Computing, 2919 Seventh Street, Berkeley, CA 94710United States of America1 Current address: OpenAI, 3180 18th St, San Francisco, CA 94110United States of America.2 Current address:Microsoft Quantum,OneMicrosoftWay, Redmond,WA98052United States of America.

E-mail: [email protected]

Keywords: cloud-based quantum computing, near-term quantum algorithms, quantum software engineering

Supplementarymaterial for this article is available online

AbstractIn order to support near-term applications of quantum computing, a new compute paradigmhasemerged—the quantum-classical cloud—inwhich quantumcomputers (QPUs)work in tandemwithclassical computers (CPUs) via a shared cloud infrastructure. In this work, we enumerate thearchitectural requirements of a quantum-classical cloud platform, and present a framework forbenchmarking its runtime performance. In addition, wewalk through two platform-level enhance-ments, parametric compilation and active qubit reset, that specifically optimize a quantum-classicalarchitecture to support variational hybrid algorithms, themost promising applications of near-termquantumhardware. Finally, we show that integrating these two features into the Rigetti QuantumCloud Services platform results in considerable improvements to the latencies that govern algorithmruntime.

1. Introduction

Thefirst experimental realizations of quantum algorithms date back to over a decade ago [1–4], but in the lastthree years quantum computing has rapidly transitioned froma field of scientific research to a full-fledgedtechnology industry. The recent demonstration of quantum supremacy over classical computing [5] is aconsiderablemilestone, but there is stillmuch progress to bemade on the road to solving real-world problemswith quantum computers and achieving quantum advantage. Improving the error rates of quantumdevices[6, 7] and ultimately reaching the regime of fault tolerance [8] is necessary for unlocking themost powerfulknown applications of quantum computers. At the same time, the industry has increased its focus onfindingways to solve valuable problems using the noisy intermediate-scale quantum (NISQ) processors that arecurrently available [9].

The desire to provide the research community with access to scarce quantumhardware in order to shortenthe path to quantumadvantage resulted in the development of a new compute architecture—the quantum cloud.As part of this architecture, the concept of an Internet-accessible data center has been extended to includequantumdevices. Infrastructure for the quantum cloud requires a slew of new specialized hardware, forexample, dilution refrigerators to house superconducting qubits and racks ofmicrowave instruments to controlthem. To build the quantum cloud, some developers of quantum computers have pivoted to being full-stack,using in-house infrastructure to offer cloud-based access to their quantumdevices (e.g. IBMQuantumExperience). In addition, some traditional cloud providers have begun to add quantumbackends throughstrategic hardware-software partnerships.

Thefirst iteration of quantum cloud offerings employed a hybrid cloudmodel [10], inwhich users of theservice submitted quantumprograms using awebAPI to a queue hosted by the public cloud (e.g. AmazonWebServices). Then, a server colocatedwith a quantumprocessor would periodically pull jobs off of the queue,execute them, and return results back to the user. This approachwas effective in offeringworldwide, public

RECEIVED

4December 2019

REVISED

9 February 2020

ACCEPTED FOR PUBLICATION

12 February 2020

PUBLISHED

26March 2020

© 2020 IOPPublishing Ltd

Page 2: PAPER Aquantum

access to quantum resources, but suffers in terms of runtime efficiency due to the overhead of networkconnections and the use of a shared queue. In addition, the traditional webAPImodel fails to capitalize on oradapt to any properties specific to using a quantumdevice for computation.

In particular, themost promising approach to effectively using near-term quantumdevices is throughvariational hybrid algorithms (VHAs) [11]which employ a quantum-classical architecture, essentially leveragingthe quantum computer as a co-processor alongside a powerful classical computer. These algorithms have beenapplied to areas such as combinatorial optimization [12, 13], quantum chemistry [14, 15], andmachine learning[16], and numerous proposals for applications of the variationalmethod continue to arise with increasingfrequency [17, 18]. However, VHAs require a tight coupling between quantum and classical resources, and usinga cloud-hosted queue is slowwith respect to the scale of quantumoperations, especially on a superconductingdevice [19]. In addition, a quantum cloud architecturemust be specifically optimized in order to efficientlysupport the variationalmodel of execution. In this work, we investigate architectural bottlenecks of this newquantum-classical cloud, and provide a benchmarking framework to analyze its runtime performance.We thenuse the benchmark to quantify the dramatic reduction in latency achieved by the Rigetti QuantumCloudServices (QCS) platform via the implementation of specialized techniques for quantumprogram compilationand qubit register reset.

2. Runtime bottlenecks in the quantum-classical cloud

The job of a quantum cloud platform is to ingest programswritten in a backend-independent high-levelquantumprogramming language [20, 21], compile them into a platform-specific representation, run themonan available quantumdevice, and return the results to the user. Specifically, a quantum cloud platformhas fouressential components:

(i) An apparatus that houses the physical objects that act as qubits (e.g. an optical table and trapping system forions or neutral atoms).

(ii) A control system containing instruments for manipulating that apparatus in order to drive the desiredevolution and read out qubitmeasurement results.

(iii) An executor that orchestrates the control system to run quantum programs and returnmeasurement resultsto the user.

(iv) A compiler that takes in quantumprograms and produces instrument binaries for the executor.

To be categorized as quantum-classical cloud, a platformmust also include access to classical computeresources. Depending on the particular qubit implementation used by the platform, theCPU-QPU interactioncould become the largest bottleneck in the variationalmodel of execution. For example, when usingsuperconducting qubits (with gate times in the tens of nanoseconds) the CPU andQPU should be physicallycolocated in order to enable a low-latency link between the user and the quantumdevice. Although colocatinguser and compute is not a new concept in cloud computing in general, it has yet to take hold broadly in quantumcomputing, and drastically reduces overhead inVHAs. For Rigetti QCS, which uses superconducting qubits asthe backend, users interact with theQPUvia a preconfigured development environment called the quantummachine image (QMI) (figure 1(a)). TheQMI is a virtualmachine running on a classical compute cluster locatedinside theRigetti quantumdata center in Berkeley, CA, and contains the Forest SDK for building applicationsusing the quantum instruction languageQuil [21]. Oncewritten, quantumprograms are sent for compilationinto pulse-level instructions (figure 1(b)). The information that encodes this gate-to-pulsemapping is containedwithin a calibration database, and is updatedwhenever the systemdrifts out of specification. The binaries thatare returned by the compiler are then sent to the executor (figure 1(c)), which loads themonto a collection ofRigetti-custommicrowave arbitrary waveform generators (AWGs) and receivers, and triggers the instruments tobegin execution (figure 1(d)). The Rigetti AWGs then sendmicrowave pulses into a dilution refrigerator tomanipulate and read out the state of the Aspen-4 16-qubit quantumprocessor (figure 1(e)). Finally, thebitstrings resulting from readout are returned to the user’s compute environment (figures 1(f)–(g)) forprocessing and analysis.

In the variationalmodel of execution, one seeks tominimize an objective function that is expensive tocompute on classical hardware, by embedding it as a subroutine on a quantum computer. To begin, a classicalcomputer takes an initial guess and instructs a quantum computer to perform a predetermined computation.Then, from the output statistics of sampling the quantumprogrammany times, a classical optimizer running ontheCPUupdates theQPU instructions for the next round of iteration. Depending on the difficulty of the

2

QuantumSci. Technol. 5 (2020) 024003 P J Karalekas et al

Page 3: PAPER Aquantum

problem, and the quality of theQPU, this will repeatmany times beforefinding a potential solution [22]. Thus,the structure of a variational hybrid algorithm is broken into two nested loops: an outer variational iteration loop,and an inner quantum execution loop.We denote each roundtrip completion of the variational iteration loop asan iteration step, and each roundtrip completion of the quantum execution loop as a shot. Each iteration stepincludes communicating with theQPU,waiting for it to return results, and using an optimizer to decide what torun on theQPUnext. Each shot is defined by running a single programof quantum instructions and gettingback a single bitstring ofmeasurement outcomes. This structure gives us away to think about the differentcomponents that contribute to hybrid algorithm runtime—they either occur once per step in the variationaliteration loop, or once per shot in the quantum execution loop. In order to improve the performance ofQCSand optimize it for the variationalmodel of execution, we began by constructing a latency budget for each of thenested loops (table 1).

The components of the per-step latency budget are instrument initialization, network communication, andcompilation, which is the largest contributor by at least one order ofmagnitude. The contribution frominstrument initialization has already been substantially reduced by using custom control hardware. In addition,the runtime of a single shot is split between gates (formanipulating and entangling qubit states), readout (formeasuring and extracting bitstring outcomes), and qubit reset. Qubit reset can be performed passively, bywaiting for all qubits to relax to their ground states. However, this relaxation process is the same decoherenceprocess that determinesT1, one of themetrics for qubit lifetime. Therefore, this passive qubit reset timewill onlyincrease as qubit performance improves [24, 25], and it already dominates the per-shot latency budget. Thus,from examining the two latency budgets, compilation and qubit reset are the areas towhich improvementswould have the largest impact on algorithm runtime.

3.Optimizing for the variational executionmodel

Having identified compilation and qubit reset as potential bottlenecks for a quantum-classical cloudarchitecture, wewalk through specific enhancementsmade tomitigate their contributions to the latency budgetsof Rigetti’s quantum cloud platform.

3.1. Parametric compilationThe underlying quantumprogram for a variational algorithm is parametric,meaning that from iteration step toiteration step, the sequence of instructions is static and only the instruction arguments change. Thus, aspecialized compiler that preserves this parametric ansatz structure can leverage it to improve runtime. To buildsuch a compiler, we need to understand the physical implementation of a high-level quantumprogram in orderto determinewhich instructions are easy tomodify parametrically at runtime. For superconducting quantumprocessors (andmany otherQPU implementations), qubits are controlled by shaped radio frequency (RF) ormicrowave pulses typically generated by anAWG.The pulse amplitude, duration, and phase control the rotationangle and axis effected on the qubits. Typically the phase of the pulse is determined by an abstract rotating

Figure 1. Schematic diagramof theQuantumCloud Services (QCS) data center. The classical compute (CPU) is connected over thenetwork to a collection of quantumprocessors (QPUs), which are composed of a classical host computer, control system rack, dilutionrefrigerator, and quantum integrated circuit (QuIC). (a)Users write quantumprograms usingQuil on theQMI,which is locatedinside the data center. (b)The quantumprogram is sent to the compiler. (c)The binaries produced in (b) are sent to the executor.(d)The executor loads the binaries onto a collection of Rigetti AWGs. (e)Rigetti AWGs sendmicrowave pulses into the dilutionrefrigerator containing theAspen-4 16Qdevice. (f)–(g)Bitstring results are returned to theQMI.

3

QuantumSci. Technol. 5 (2020) 024003 P J Karalekas et al

Page 4: PAPER Aquantum

reference frame set to a frequency determined by the physics of the qubits.Z-rotations of a qubit then correspondto instantaneous reference frame update events that change the phase of subsequentmicrowave pulses.Withoutloss of generality, we can directly relate each parameter of a variational algorithm toZ-rotations or phase updatesfor one ormore qubit reference frames. Previously, the arguments to theseSHIFT-PHASE [26] operationswere provided at compile-time, and hardcoded in the instrument binary. Thus, for every step in aVHA, newinstrument binaries would need to be compiled, such that these arguments could be updated.

To circumvent this need for re-compiling, we implemented a feature called parametric compilation. Like astandard executablefile, each instrument binary includes a header and a collection of sections [27]. Theinstructionmemory section contains executable code, and thewaveformmemory section contains pulse shapesthat are referenced by the instructionmemory. Rather than havingSHIFT-PHASE instructions use a static datafield as an argument, each instrument binary now additionally has a datamemory sectionwhich instructions canaccess by reference. Like classical sharedmemory, the datamemory section can also be updated at runtime by anexternal process in order to change the effect of the binary [28]. The datamemory section layout of a particularbinary is prescribed byQuil’sDECLARE syntax (figure 2), which can be used to initialize namedmemoryregisters of various data types (BIT,OCTET,INTEGER, andREAL). Once initialized, thememory registers canbe provided as arguments to parameterized gates such asRZ.When these gates are compiled (figure 1(b)) theybecomeSHIFT-PHASE operations that reference entries in an empty datamemory section. Upon execution,the user provides these parametric binaries alongwith amap ofmemory register assignments, for example,variational parameters for the current iteration step of aVHA (figure 1(c)). Then, the executorfills in the datamemory section using thememorymap to produce patched binaries (figure 1(d)). For each step in aVHA, thesebinaries can be re-patchedwith a new set of variational parameters, and thus the variational iteration loop canrepeat until terminationwithout the need for compilation. For details on how to implement a variationalalgorithmusing parametric compilation, see appendix A.

3.2. Active qubit resetRather thanwaiting for qubits to passively reset, we can implement a protocol called active qubit reset that sets allqubits to their ground states at the beginning of a computation. This has been previously demonstrated bycontrollably transferring qubit excited state population to a systemwith amuch faster decay rate (such as thereadout resonators) [5, 29, 30], or by performing ameasurement and then quickly feeding back to conditionallyapply anX gate [31, 32]. But, for this protocol to be useful, the conditional control flowneeds to happen on atimescale comparable to the quantumgates, and it cannot introduce significant error into proceedingcomputations.

We chose to implement this classical feedback loop on our platformbecause it additionally unlocksmorecomplex feedback/feedforward circuits that take advantage ofmid-circuitmeasurements. To support this, the

Table 1.Per-step and per-shot latency budgets for the Aspen-4QPUviaQCS, rounded to onesignificant figure. The per-step budget was collected by runningMax-CutQAOA [12] on twoqubits, andmeasuring the completion time for each of the components. The per-shot budgetcontains the average gate and readout times for Aspen-4 from the calibration database. Thereset time isfive times the longestT1 timemeasured onAspen-4 (seefigure 3 for details). Thecompilation task is further broken down into two components: the first which convertsarbitrary quantumprograms into ones that use the gateset and topology of the targetquantumdevice (about 150 ms) [23]; and the secondwhich converts this nativized programinto pulse-level instrument binaries (about 50 ms).

Per-step latency budget

Task Time

Compilation 200 ms

AWG load and arm 8 ms

AWG trigger 10 ms

Network comms 5 ms

Per-shot latency budget

Task Time

Single-qubit gates 60 ns

Two-qubit gates 300 ns

Readout and capture 2 μs

Passive reset 100 μs

4

QuantumSci. Technol. 5 (2020) 024003 P J Karalekas et al

Page 5: PAPER Aquantum

compilation toolchain can propagate control flow structures down to the hardware pulse sequencers. Coupledwith the ability to rapidly broadcast qubitmeasurement results across the control system, active reset is not aspecial-case operation but instead a simpleif-then-else control flowbranching off of qubitmeasurementresults. By providing theRESET instruction at the top of aQuil program, users can signal to the compilationtoolchain that theywould like to enable active qubit reset.When ingested by the compiler, theRESET directiveis translated to control-flow graphs (CFGs) [33] for each qubit that encapsulate the branching and loopingstructure of the active reset protocol.

Control-flowgraphs are composed of nodes called basic blocks, each of which includes a sequence ofinstructions without branching and up to two directed edges, or jump targets, to follow once the instructions arecomplete. If a block has two jump targets, it also contains a conditional expression for choosing between them.Each single-qubitRESETCFG contains four basic blocks—a header block, ameasurement block, an idle block,and a feedback block (figure 3(a)). The program starts at the header block, which initializes a counter to track thenumber of active reset rounds performed, and then jumps to themeasurement block and increments thecounter. Themeasurement block contains readout and capture instructions for the qubit, and conditionallyjumps to either the idle or feedback blocks dependent on the bit result produced by themeasurement. If theresult is 1, the qubit is in the excited state, and therefore the feedback block containing theX gate program isexecuted.Otherwise, if the result is 0, the program jumps to the idle block. After either the idle or feedback blockis completed, the program then jumps to themeasurement block again if the counter is less than the number ofreset rounds requested. After each single-qubit program traverses its active reset CFG, the program jumps to thefirst basic block of themain body quantum circuit and proceeds until completion.

The time required to perform the active reset protocol is dependent on the underlying architecture of aQPU.ForAspen-4, we require three rounds of feedback to achieve a reset performance comparable to that of passivequbit reset (figures 3(b)–(e)). All active reset sequencesmust be completed before themain body programbegins, and the readout operations onAspen-4 are in the 2–3 μs range. Combinedwith a broadcast and feedbacklatency of around 1 μs, this results in an active reset time of 9–12 μs, approximately a tenfold improvement overpassive reset times.

4. A volumetric framework for benchmarking runtime

Ausefulmetric for determining the runtime performance of a quantum-classical cloud platform isQPU latency,which tells us how long theCPUhas towait between requesting execution on a quantumdevice and receivingback the results. Generically, this latency depends on the number of qubits involved in the computation, thenumber of shots requested, and the programbeing run.We can define a function ( ) m n, , which returnsQPU latency and takes a number of qubitsm, a number of shots n, and a function that produces a family ofquantumprograms for a given number of qubits. Some examples of areGHZ_LINE, which produces thestraight-line program to create aGHZ state onm qubits [34], andMAXCUTQAOA_COMPLETE, which producesaQAOAprogram to solveMax-Cut on a fully-connected graphwithm nodes [12]. In addition, by sweeping nandmeasuring withfixedm and , we can determine two asymptotic latencies of a platform for a particular

Figure 2.Quantumprograms, written inQuil, forMax-CutQAOAon two qubits, and their corresponding circuit diagrams. ((a), top)The program, using a specific set of values for variational parameters (β, γ). These showup as arguments toRZ gates, withβ=π/4and γ=π/2. Thus, every iteration step, the program itself has to be updatedwith new variational parameters and re-compiled beforeexecution. ((b), bottom)The program, taking advantage of parametric compilation.Now, rather than providing values forβ and γwhenwriting the program,we instead initialize real-valued classicalmemory registersbeta andgammausingQuil’sDECLAREsyntax, and use them as arguments for theRZ gates. This allows for their assignment to be deferred from compile-time to run-time,and eliminates the need for re-compilation between variational iteration steps.

5

QuantumSci. Technol. 5 (2020) 024003 P J Karalekas et al

Page 6: PAPER Aquantum

family of quantumprograms and number of qubits: the single-shot limit (n=1) tells us the variational steplatency (V ) and the scaling of the latency in the limit of large n tells us the quantum shot latency (Q).Moreformally, we assume that the total latency is straightforwardly related to the two components in the followingmanner

( ) ( ) ( ) ( )= + m n m n m, , , , . 1V Q

However, V and Q are still functions that depend on the number of qubits and the requested quantumprogram, and therefore need further specification to be an effective cross-platformbenchmark. Naivelychoosingm=1 and to be circuits containing onlyMEASURE instructionsmakes it trivial to experimentallybenchmark the system. By sweeping the number of shots,measuring latency, and fitting the data to a linearmodel, the resulting latency-axis intercept gives MEASURE( ) 1,V and the resulting slope gives

MEASURE( ) 1,Q . These numbers well-encapsulate some parts of the performance of a particular architecture,but fail to capture how the architecture fares as the number of qubits increases or as the program complexitychanges. For example, potential pitfalls such as long gate times or a poorly scaling control system initializationroutine are entirely omitted by the benchmark. But, simply settingm to be themaximumdevice size available ona particular platform is equallymisleading, as currentNISQdevices often have error rates such that they cannotproduce useful entanglement across the entireQPU.

Although there is no unanimously supported benchmark forQPUperformancewithin the community (andtheremay never be), log quantum volume [35, 36] ( Vlog Q2 )has been proposed as a reasonable near-termmetricfor the number of qubits that can bemeaningfully used in a computation.We are interested in something that issimilar to quantum volume (so that the choice of number of qubits remains relevant), butmore appropriate forthe near-termVHAs. Variational algorithms contain structures known as phase gadgets [37], which areRZ gatessandwiched betweenCNOTs. These structures are often the cornerstone of parametric-ansatz-style programs,and therefore we propose using a volumetric family of circuits [38] that we call random phase gadgets (RPG) tobenchmark algorithm runtime. The RPG circuit family incorporates the permutation aspect of quantumvolume for exercising connectivity, parallelism, and gateset [39], but replaces the random2Qunitaries withphase gadgets that haveRZ gates with randomly chosen arguments (figure 4). In addition, each permutation isfollowed by a layer ofHadamard gates on all qubits, tomake itmore difficult to compile away the phase gadgets.Setting =m Vlog Q2 and RPG= gives us the computationally relevant step (TV) and shot (TQ) latency of aQPU

RPG RPG( ) ( ) ( )= = T V T Vlog , , log , . 2V V Q Q Q Q2 2

Fitting the resulting runtime data to the linearmodelT(n)=TV+nTQ then allows us to easily estimatevariational algorithm runtimes using the computationally relevantQPU latency for a particular device availableon a quantum cloud platform.

Figure 3.Active qubit reset on a single qubit. (a)TheRESET control-flow graph, containing loop and branch constructs. The programstarts at header block hc (which is initializedwith c, the number of active reset rounds to perform). It then jumps tomeasurementblockM and increments the feedback counterκ. If themeasurement result is a 0, the program then jumps to idle block I; otherwise, itjumps to feedback blockX. After I orX the program jumps back toM unlessκ=c. In that case it jumps toU, themain body block.(b)–(e)Optimal quadrature histograms of IQ signal data from three successive rounds of active reset (c)–(e) on a single Aspen-4 qubitstarting from an equal superposition input state (b). OnAspen-4,T1 times range from10 to 20 μs. Thus, in order to achieve a resetfidelity > 99% in the passive reset case, wemust wait for 5T1=100 μs between shots (as 1−e−5>0.99). If we instead use activequbit reset, we can perform three rounds ofmeasurement and feedback on the order of 10 μs. By the third round (e), we achieve a resetfidelity of = 99.6%.

6

QuantumSci. Technol. 5 (2020) 024003 P J Karalekas et al

Page 7: PAPER Aquantum

5.QPU latency results onQuantumCloud Services

With a volumetric framework for benchmarking runtime, we calculateTV andTQ for four different versions ofthe Rigetti quantum cloud platform. To demonstrate the initial performance improvements resulting fromsimply colocatingCPU andQPU,wefirst analyzed runtime data fromover 851 000 quantumprograms run byexternal users on the Acorn andAgaveQPUs via the ForestWebAPI, the initial version of Rigetti’s platform(figure 5(a)). The ForestWebAPI used the first-generationmodel of quantum cloud access, routing each jobthrough a queue onAmazonWeb Services (AWS), which resulted in considerable latencies. Because the ForestWebAPI programswere run by external users, there is no expectation that the average composition of theseprogramswouldmatch the composition of the benchmark programs described in section 4.However, onaverage the ForestWebAPI programs used 3 qubits, 2CZ gates, and 14RX gates. A program from RPG( )3 , uponcompilation to the native gateset available onAcorn andAgave, would havemore of both gates, and thereforewould take longer to execute. This, combinedwith the fact that the log quantum volume for Aspen-4 is

=Vlog 3Q2 , makes it reasonable to compare the data from the ForestWebAPI andQCS, and in fact skews thecomparison in favor of the former. Thus, usingmedian runtime data fromForestWebAPI, we calculateTQ=270 μs andTV=1.0 s. If we instead use the benchmarking framework from section 4 and collect data viaQCS’s colocated architecture,TV=410 ms andTQ=110 μs for the Aspen-4QPU (figure 5(b)). Usingparametric compilation (figure 5(c)), we can remove the compile step fromour runtime calculations, resultingin an improvement for small numbers of shots (TV=36 ms). For higher numbers of shots, passive reset timesstill dwarf the constant improvement fromparametric compilation. Finally, by enabling active qubit reset(figure 5(d)), we get an additional reduction in latencywithin the quantum execution loop. Thus, in this optimalconfiguration of the platform,TV=36 ms andTQ=21 μs, resulting in greater than 27× and 12×improvements, respectively, over the latencies of the first-generation accessmodel.

6. Conclusions

QuantumCloud Servicesmay be the first instance of a quantum-classical cloud platform, but this architecturalparadigmwill become increasingly common as the industry continues to progress toward useful applications ofquantum computers. Error rates and qubit count are well-known to be important systembenchmarks withinthefield, but asmore andmore hardware providers begin to offer access to quantum resources over the cloud,the latencies that govern this access and the resulting application runtimes will also be critical considerations forplatformperformance. Addressing these latencies requires approaching systembottlenecks with aninterdisciplinary quantum software engineeringmindset, bridging the knowledge bases of classical and quantumcomputing.We have shown that colocation, parametric compilation, and active qubit reset provideconsiderable improvements over thefirst-generation of quantum cloud offerings, but they are just a few of the

Figure 4.Randomphase gadget (RPG) family of volumetric circuits, used for benchmarking runtime on a quantum-classical cloudplatform. (left)The qubits and layers are indexed starting at zero, and the angle valuesαi,j for qubit j and layer i are chosen at random.Although the circuit family could be defined for an arbitrary number of layers d and qubitsm, we choose = =m d Vlog Q2 in order todetermine computationally relevant latencies. It is important to note, however, that this choice is arbitrary and onlymeant to simplifythe benchmark. As in quantumvolume, if the number of qubits is odd, the bottomqubit line has no gates. To benchmark a number ofqubitsm, we choose a set of permutations πi, run the resulting circuit r times, and compute the average runtime. For each run, werandomize all theα values and collect n shots. This effectively emulates a VHA [40], as the permutations arefixed ahead of time, andonly the phase gadgets themselves change. Then, repeating this entire process formultiple permutation sets ensures that we get a goodestimate of the average runtime for a particular number of qubits. (right)ExampleQuil circuit fromRPG(2), meaningm=d=2,and using parametric compilation to defer the assignment ofα.

7

QuantumSci. Technol. 5 (2020) 024003 P J Karalekas et al

Page 8: PAPER Aquantum

many potential platform optimizations for accelerating industry progress and enabling the achievement ofquantumadvantage.

Acknowledgments

RSS designed the language constructs that support parametric compilation. PJK, ECP,NAT, andRSS built theparametric compilation toolchain. NAT implemented the software for expressing active qubit reset as a control-flow graph. CAR coordinated the integration of active reset into theQCS platform. PJK, ECP, andCARformulated the benchmark, and PJK collected and analyzed all runtime data. PJK andMPS architected theframework for runningVHAs using parametric compilation. RSS supervised theQCS effort. PJK, CAR, andMPSwrote themanuscript and prepared the figures3.

This workwas funded by Rigetti &Co Inc., dba Rigetti Computing.We thank theRigetti quantum softwareteam for providing tooling support, the Rigetti fabrication team formanufacturing the device, the Rigettitechnical operations team for fridgemaintenance, the Rigetti cryogenic hardware team for providing the chippackaging, the Rigetti control systems and embedded software teams for creating the Rigetti AWGcontrolsystem, and the Rigetti quantum engineering team for building the infrastructure for automatedQPUbringupand recalibration.

In addition, the authors would like to specifically thank Lauren ECapelluto, StevenHeidel, andAnthonyMPolloreno for their critical contributions to theQPU compiler toolchain, Glenn E Jones, Rodney F Sinclair, andBlake R Johnson for their work on designing a control system capable of supporting active reset, AlexanderDHill, JosephAValery, andAlexaN Staley for theirmanagement ofQPUperformance and deployment, ZacharyPBeane, AdamDLynch,Nicolas JOchem, andChristopher BOsborn for their orchestration of theQMIcompute infrastructure, E Schuyler Fried, Diego Scarabelli, and Prasahnt Sivarajah for their crucial work onexperimental bringup of active qubit reset, JoshuaCombes, Kyle VGulshen, andNicholas CRubin for buildingthe readout errormitigation tools that enable the implementation of VHAs,MatthewPHarrigan andWilliam JZeng for their leadership in designing a quantumprogramming interface toQCS, andMatthew JReagor forinsightful discussions onmodeling and benchmarkingQPU latency.

AppendixA. Implementing hybrid algorithmswith parametric compilation

The structure of quantumprograms for variational hybrid algorithms can be segmented into two parts [42]:parameterized ansatz preparation, andmeasurement in a variety ofmulti-qubit Pauli bases. Parametric

Figure 5.BenchmarkingQPU latency for Rigetti’s quantum cloud, plotted on a log–log scale to aid in visualization of the asymptoticbehavior. (a)Median runtime data from various quantumprograms run by external users via the ForestWebAPI, across the top tennumbers of shots (1, 10, 50, 100, 500, 1000, 2000, 5000, 8000, 10 000) used on that version of the platform. (b)Median runtime datacollected according to the framework in section 4 via the Rigetti QuantumCloud Services (QCS) platform, using the Aspen-4QPUwhich has =Vlog 3Q2 . TheQCSdata is taken for 1, 2, 5, 10, 20, 50, 100, 200, 500, 1000, 2000, 5000, 10 000, 20 000, 50 000, and100 000 shots, and eachmedian is extracted from 100 runs of the benchmarkwith afixed permutation set. (c)Median runtime datacollected as in (b), but with parametric compilation enabled. (d)Median runtime data collected as in (c), but with active qubit resetenabled. For this optimal platform configuration, we additionally note that the critical shot number (nc), which is the turning pointbetweenTV-dominatedQPU latency andTQ-dominatedQPU latency, occurs at the 1700-shotmark.

3All of the plotting and analysis from the paper can be recreated using the supplementary Jupyter notebooks and datasets [41], which are also

available on the notebook hosting serviceBinder.

8

QuantumSci. Technol. 5 (2020) 024003 P J Karalekas et al

Page 9: PAPER Aquantum

compilation handles not only the parameterized ansatz component of aVHA, but can also be used toencapsulatemeasurement in arbitrary bases and symmetrization of readout error—all in a single quantumprogram. This allows for anyVHA to be expressed via a single parametric binarywithRZ arguments that areprovided and patched at runtime. To showhow this is possible, wewalk through three examples using pyQuil[43], the Python library forwriting and executingQuil programs.

A.1. Readout error symmetrization andmitigationWecan describemeasurement imperfections in terms of a confusionmatrixM, which describes probabilities ofreportedmeasurement outcomes conditioned on the values of the true state of the qubits. For the case of a singlequbit, the confusionmatrix can bewritten as

⎜ ⎟⎛⎝⎜

⎞⎠⎟

⎛⎝

⎞⎠

( ∣ ) ( ∣ )( ∣ ) ( ∣ )

( )= =-

-

M

Pr 0 0 Pr 1 0

Pr 0 1 Pr 1 11

1. A.10 1

0 1

While generallyMwill not be symmetric with respect to the exchange of 0s and 1s, one can enforce suchsymmetry by flipping bits beforemeasurement and subsequently flipping themeasurement outcomes. Thissymmetrization procedure corresponds to a simple formof twirling [44, 45], and in principle can be performedby randomly choosingwhich bits toflip independently each time ameasurement occurs4 .When averaging overall the shots, this results in a new effective confusionmatrixM′, with ò0 and ò1 replaced by ò′, which is thearithmeticmean of the two. This can be implemented inQuil (for a single qubit with index 0) by replacing themeasurement section of a programwith:

DECLARE symmetrization REAL 1DECLARE ro BIT 1RX symmetrization 0 0MEASURE 0 ro 0

[ ][ ]

( [ ])[ ].

During the execution of the program, thesymmetrizationmemory region assignments 0 andπmust beprovided, and the results from theπ assignmentmust beflipped (i.e.XORing the results with the bit value of 1).

It is straightforward to check that the effect of symmetricmeasurement errors is to scale the expectationvalue of a Pauli observable by factor that depends on the error rates. This immediately suggests how tomitigatethe effect of these errors: first characterize the error rates of the symmetrized readout, then rescale the observedexpectation values accordingly. Thefirst step is what we call readout calibration, while the second step is what we

Figure A1.Running 2QMax-CutQAOA for p=1 andfixedβ on qubits 1 and 2 of theAspen-4QPUusing parametric compilationand pyQuil’sExperiment framework. (a)Max-CutQAOA expressed as anExperiment object. Theprogram section containstheQuil program equivalent to theMax-CutQAOAansatz for p=1 andβ=π/8. The ansatz is composed of three parts: an initial∣++ñ state preparation, themixer unitary ( ) ( )b = b- +U em

X Xi 0 1 , and the cost unitary ( )g = g-U ecZ Zi 0 1. As theMax-CutQAOA cost

function uses onlyZZ expectation values, thesettings section contains only one entry. (b)Results from runningMax-CutQAOAfor fixedβ using qubits 1 and 2 on theAspen-4QPU, sweeping γ for 100 values in [−π/2,π/2]. The symmetrized data points arecollected using exhaustive readout symmetrization, butwithout correcting for imperfect readout. The corrected data points are thesymmetrized data points, but rescaled using the results frompyQuil’scalibrationmethod on theZZ expectation value for thequbits used in theExperiment.

4More sophisticated approaches to symmetrization can be taken bymaking use of orthogonal arrays, as implemented inforest-

benchmarking [46], but a detailed description of those techniques is beyond the scope of this article.

9

QuantumSci. Technol. 5 (2020) 024003 P J Karalekas et al

Page 10: PAPER Aquantum

call readout errormitigation. Since errors due to the single-qubit rotations associatedwithmapping one Pauliobservable to another are orders ofmagnitude smaller thanmeasurement errors, readout calibration can focuson the expectation of tensor products ofZ observables only. Ideally the expectation of any of these tensorproducts would be+1 for the ground state (the ∣ ñ00 0 state), but for symmetrized readout error theexpectation valuewill beλ. Simply dividing the expectation value of this same observable for other states byλcorrects for the bias introduced by themeasurement errors. Note that, in general, evenwith symmetrization, adifferentλ for each different tensor product will be necessary, due to qubit-dependent readout signal-to-noiseratio (SNR) and potential correlations in the readout errors.Moreover, while the bias in the estimation of theexpectation value is removed, the uncertainty is increased (as ∣ ∣l < 1), so this procedure is not scalable [47].That being said, it is remarkably effective for small numbers of qubits.

Rather than handling symmetrizationmanually, we can use pyQuil’sExperiment framework.Whendefining anExperiment in pyQuil, two arguments are required—aprogramwhich defines themain bodyquantum circuit, and a list ofsettingswhich specify the different state preparations andmeasurements thatwewant towrap around themain program. In addition, theExperiment object also specifies the number ofshots to take, whether or not to perform active qubit reset, and how to symmetrize and calibrate readout. Toshowhow these features are used, we implement readout symmetrization and correction for a 2QMax-CutQAOAprogram (figure A1).

A.2. Bell state tomographyAn arbitrary single-qubit gateU can always be decomposed intowhat is in essence an Euler angle decomposition[48]

( ) ( ) ( ) ( ) ( ) ( ) ( )a b g g p b p a= -U R R R R R, , 2 2 . A.2z x z x z

Although ourQPUs can only performmeasurements in the z basis (Mz), performing single-qubit rotations priortomeasurement allows us to effectivelymeasure in a different basis

( ) ( ) ( )p p= - = -M M R M U2 0, 2, 0 , A.3x z y z

( ) ( ) ( )p p p p= = -M M R M U2 2, 2, 2 . A.4y z x z

Thus, by pre-pending a qubitmeasurement with an Euler-decomposed single-qubit gate containingRz

arguments that can be changed at runtime, we can perform a collection of differentmeasurements with a single

Figure A2.Running Bell state tomography on qubits 1 and 2 of theAspen-4QPUusing parametric compilation and pyQuil’sExperiment framework. (a)Bell state tomography, expressed as anExperiment object. Theprogram section contains the gatesrequired to generate the Bell state ∣ (∣ ∣ )F ñ = ñ + ñ+ 00 111

2. Thesettings section contains the 15 different non-identity Pauli

measurements required to tomograph a 2Q state, generated by the software libraryforest-benchmarking [46]. (b)Hinton plot[49, 50] of the ideal densitymatrix ρ as defined by the state ∣F ñ+ . (c)Hinton plot of the estimated densitymatrix ρest extracted fromreadout-corrected experimental data using the linear inversionmethod [51].We calculate a Bell state fidelity of =F+ 99.35% bycomparing ρ and ρest using thefidelity function fromforest-benchmarking.

10

QuantumSci. Technol. 5 (2020) 024003 P J Karalekas et al

Page 11: PAPER Aquantum

parametric binary. InQuil, this is accomplished for a single qubit by adding the following snippet before themeasurement block:

DECLARE measurement alpha REAL 1DECLARE measurement beta REAL 1DECLARE measurement gamma REAL 1RZ measurement alpha 0 0RX pi 2 0RZ measurement beta 0 0RX pi 2 0RZ measurement gamma 0 0

[ ][ ]

[ ]( [ ])( )( [ ])( )( [ ])-

___

_

_

_ .

Additionally, this can be combinedwith readout symmetrization, allowing for any desired observable to besymmetrized, calibrated, and corrected. To showhowmeasurement bases can be changed parametrically, werunBell state tomography using a single binary (figure A2).

A.3. The variational quantumeigensolverFinally, we combine the techniques from the previous two examples in order to run a full variational algorithmusing a single parametric binary. The variational quantum eigensolver (VQE) [15], which is one of the leadingVHAs for applications in quantum chemistry, can be used to compute the ground state energy of the hydrogenmolecule (H2). To do so, VQE employs a classicalminimization routine inwhich the objective function isevaluated on theQPU. The procedure begins by preparing a parameterized ansatz wavefunction ∣ ( )qY ñusing aninitial guess for the variational parameter θ. The parameterized ansatz wavefunction can be chosen to becomposed of the unitary coupled cluster (UCC) ansatzU(θ) and theHartree–Fock (HF) reference state ∣Fñ [14],giving

∣ ( ) ( )∣ ∣ ( )q qY ñ = Fñ = ñq-U e 01 . A.5X Yi 0 1

Then, the procedure continues by computing the expectation value of theH2Hamiltonian, which can beformulated as a sumofmulti-qubit Pauli-basis expectation values with coefficients

Figure A3.Running the variational quantum eigensolver to compute the ground state energy of the hydrogenmolecule usingparametric compilation and pyQuil’sExperiment framework. (a)TheVQEH2 algorithm, expressed as anExperiment object.Theprogram section contains the parameterizedUCC ansatzU(θ) (equation (A.5)), and thesettings section contains the threemeasurement bases (expressed as Pauli strings) required to estimate the expectation value of theHamiltonian. TheHamiltonian(equation (A.6)) also contains single-qubitZ terms, but they can be determined frommeasurement outcomes of theZZ setting.(b)Results from running the VQEH2Experiment on theQuantumVirtualMachine (QVM) [52] aswell as qubits 1 and 2 on theAspen-4QPU.Weuse the table ofHamiltonian coefficients from the appendix ofO’Malley et al to convert our readout-correctedPauli expectations into expectation values for theH2Hamiltonian at different bond lengths. At theminimum-energy bond lengthRmin=0.750 Åwemeasure an energy differenceΔEmin=8±9 mHa. To demonstrate the framework, we simply scanned 250values of θ in the range [−π/2,π/2], but the latency numbers would not differ if we instead used an optimizer to update θ each step.Collecting the data took 248 s, compared to 266 s as predicted byT(2500)=88.5 ms repeated for 4 symmetrization settings, 3measurement bases, and 250 values of θ (for a total of 3000 executions on theQPU). This slight improvement over the prediction isconsistent, asT(n) from section 5 is calculated for a higher number of qubits (3) and number of two-qubit gates than are used in thisVHA.

11

QuantumSci. Technol. 5 (2020) 024003 P J Karalekas et al

Page 12: PAPER Aquantum

( )= + + + + +H g g Z g Z g Z Z g Y Y g X X . A.60 1 0 2 1 3 0 1 4 0 1 5 0 1

Finally, the expectation value á ñH is fed to theminimizer to choose the variational parameter θ for the nextround of iteration, and this repeats until convergence. Following the protocol inO’Malley et al, we useparametric compilation and pyQuil’sExperiment framework to simulate the ground state energy of theH2

molecule withVQE (figure A3).

ORCID iDs

Peter J Karalekas https://orcid.org/0000-0001-6031-4800Nikolas ATezak https://orcid.org/0000-0002-5178-2298Eric CPeterson https://orcid.org/0000-0002-1633-0050ColmARyan https://orcid.org/0000-0001-9923-3354Marcus P da Silva https://orcid.org/0000-0002-6641-8712Robert S Smith https://orcid.org/0000-0002-6340-9589

References

[1] Chuang I L, GershenfeldN andKubinecM1998 Experimental implementation of fast quantum searching Phys. Rev. Lett. 80 3408–11[2] DiCarlo L et al 2009Demonstration of two-qubit algorithmswith a superconducting quantumprocessorNature 460 240–4[3] Gulde S, RiebeM, Lancaster GPT, Becher C, Eschner J, HäffnerH, Schmidt-Kaler F, Chuang I L andBlatt R 2003 Implementation of

the deutsch-jozsa algorithmon an ion-trap quantum computerNature 421 48[4] Kwiat PG,Mitchell J R, Schwindt PDDandWhite AG2000Grover’s search algorithm: an optical approach J.Mod.Opt. 47 257–66[5] Arute F et al 2019Quantum supremacy using a programmable superconducting processorNature 574 505–10[6] BallanceC J,Harty TP, LinkeNM, SepiolMA andLucasDM2016High-fidelity quantum logic gates using trapped-ion hyperfine

qubitsPhys. Rev. Lett. 117 060504[7] Hong S S, Papageorge AT, Sivarajah P, CrossmanG,DiderN, PollorenoAM, Sete EA, Turkowski SW, da SilvaMP and JohnsonBR

2020Demonstration of a parametrically activated entangling gate protected from flux noise Physical ReviewA 101[8] Preskill J 2012Quantum computing and the entanglement frontier arXiv:1203.5813 [quant-ph][9] Preskill J 2018Quantum computing in theNISQ era and beyondQuantum 2 79[10] Mell PMandGrance T 2011TheNIST definition of cloud computingTechnical Report SP 800-145Gaithersburg,MD,United States[11] McClean J R, Romero J, Babbush R andAspuru-Guzik A 2016The theory of variational hybrid quantum-classical algorithmsNew J.

Phys. 18 023023[12] Farhi E,Goldstone J andGutmann S 2014A quantum approximate optimization algorithm arXiv:1411.4028 [quant-ph][13] Otterbach J S et al 2017Unsupervisedmachine learning on a hybrid quantum computer arXiv:1712.05771 [quant-ph][14] O’Malley P J J et al 2016 Scalable quantum simulation ofmolecular energies Phys. Rev.X 6 031007[15] PeruzzoA,McClean J, Shadbolt P, YungM-H, ZhouX-Q, Love P J, Aspuru-Guzik A andO’Brien J L 2014A variational eigenvalue

solver on a photonic quantumprocessorNat. Commun. 5 4213[16] WilsonCM,Otterbach J S, TezakN, Smith R S, PollorenoAM,Karalekas P J,Heidel S, AlamMS,CrooksGE and da SilvaMP2018

Quantumkitchen sinks: An algorithm formachine learning on near-term quantum computers arXiv:1806.08321 [quant-ph][17] Bravo-Prieto C, LaRose R, CerezoM, Subasi Y, Cincio L andColes P J 2019Variational quantum linear solver: A hybrid algorithm for

linear systems arXiv:1909.05820 [quant-ph][18] VerdonG,Marks J, Nanda S, Leichenauer S andHidary J 2019Quantumhamiltonian-basedmodels and the variational quantum

thermalizer algorithm arXiv:1910.02071 [quant-ph][19] KjaergaardM, SchwartzME, Braumüller J, Krantz P,Wang J I J, Gustavsson S andOliverWD2019 Superconducting qubits: Current

state of play arXiv:1905.13641 [quant-ph][20] Cross AW, Bishop L S, Smolin J A andGambetta JM2017Open quantum assembly language arXiv:1707.03429 [quant-ph][21] Smith R S, CurtisM J andZengW J 2016Apractical quantum instruction set architecture arXiv:1608.03355 [quant-ph][22] McClean J R, Boixo S, Smelyanskiy VN, BabbushR andNevenH2018 Barren plateaus in quantumneural network training landscapes

Nat. Commun. 9 4812[23] Smith R S, Peterson EC,Davis E J and SkilbeckMG2020Quilc: AnOptimizingQuil Compiler (Zenodo) (https://doi.org/10.5281/

zenodo.3677536)[24] Gyenis A,Mundada P S,Di Paolo A,Hazard TM, YouX, SchusterD I, Koch J, Blais A andHouckAA 2019 Experimental realization of

an intrinsically error-protected superconducting qubit arXiv:1910.07542 [quant-ph][25] NersisyanA et al 2019Manufacturing lowdissipation superconducting quantumprocessors arXiv:1901.08042 [quant-ph][26] Smith R S 2020Quil: A Portable Quantum Instruction Language (Zenodo) (https://doi.org/10.5281/zenodo.3677540)[27] TISCommittee et al 1995Tool interface standard (TIS) executable and linking format (ELF) specification version 1.2TISCommittee[28] IEEE Standard for Information Technology–PortableOperating System Interface (POSIX(R))Base Specifications, Issue 7, in IEEE Std

1003.1-2017 (Revision of IEEE Std 1003.1-2008), pp.1–3951 (Accessed: 31 Jan. 2018)[29] EggerD J,WerninghausM,GanzhornM, Salis G, Fuhrer A,Müller P and Filipp S 2018 Pulsed reset protocol forfixed-frequency

superconducting qubitsPhys. Rev. Appl. 10 044030[30] Magnard P et al 2018 Fast and unconditional all-microwave reset of a superconducting qubit Phys. Rev. Lett. 121 060502[31] RistèD, BultinkCC, Lehnert KWandDiCarlo L 2012 Feedback control of a solid-state qubit using high-fidelity projective

measurement Phys. Rev. Lett. 109 240502[32] RyanCA, JohnsonBR, RistèD,Donovan B andOhki TA 2017Hardware for dynamic quantum computingRev. Sci. Instrum. 88

104703[33] AhoAV, Sethi R andUllman JD1986Compilers: Principles, Techniques, and Tools (Boston,MA,USA: Addison-Wesley Longman

PublishingCo., Inc.)[34] GreenbergerDM,HorneMAandZeilinger A 2007Going beyond bellʼs theorem arXiv:0712.0921 [quant-ph]

12

QuantumSci. Technol. 5 (2020) 024003 P J Karalekas et al

Page 13: PAPER Aquantum

[35] Cross AW, Bishop L S, Sheldon S,Nation PD andGambetta JM2019Validating quantum computers using randomizedmodelcircuits Phys. Rev.A 100 032328

[36] MollN et al 2018Quantumoptimization using variational algorithms onnear-term quantumdevicesQuantumSci. Technol. 3 030503[37] CowtanA,Dilkes S, DuncanR, SimmonsWand Sivarajah S 2019 Phase gadget synthesis for shallow circuits arXiv:1906.01734

[quant-ph][38] Blume-Kohout R andYoungKC2019A volumetric framework for quantum computer benchmarks arXiv:1904.05546 [quant-ph][39] Peterson EC,CrooksGE and Smith R S 2019 Fixed-depth two-qubit circuits and themonodromy polytope arXiv:1904.10541

[quant-ph][40] CrooksGE 2018 Performance of the quantum approximate optimization algorithmon themaximumcut problem arXiv:1811.08419

[quant-ph][41] Karalekas P J 2020 SupplementaryNotebooks andDatasets for: AQuantum-Classical Cloud PlatformOptimized for Variational Hybrid

Algorithms (Zenodo) (https://doi.org/10.5281/zenodo.3635021)[42] McCaskeyA J, Parks Z P, Jakowski J,Moore SV,Morris TD,Humble T S and Pooser RC 2019Quantum chemistry as a benchmark for

near-term quantumcomputers npjQuantum Information 5 1–8[43] Karalekas P J et al 2020PyQuil: QuantumProgramming in Python (Zenodo) (https://doi.org/10.5281/zenodo.3631770)[44] Bartlett SD, Rudolph T and Spekkens RW2007Reference frames, superselection rules, and quantum informationRev.Mod. Phys. 79

555–609[45] Bennett CH,DiVincenzoDP, Smolin J A andWoottersWK1996Mixed-state entanglement and quantumerror correction Phys. Rev.

A 54 3824–51[46] GulshenKV et al 2019 Forest Benchmarking: QCVVusing PyQuil (Zenodo) (https://doi.org/10.5281/zenodo.3455848)[47] RyanCA, JohnsonBR,Gambetta JM, Chow JM, da SilvaMP,DialOE andOhki TA 2015Tomography via correlation of noisy

measurement records Phys. Rev.A 91 022118[48] NielsenMAandChuang I L 2011QuantumComputation andQuantum Information: 10thAnniversary Edition 10th edn (NewYork,

NY,USA: CambridgeUniversity Press)[49] Hunter J D 2007Matplotlib: a 2d graphics environmentComput. Sci. Eng. 9 90–5[50] Virtanen P et al 2020 SciPy 1.0: fundamental algorithms for scientific computing in PythonNatureMethods 17 261–72[51] WoodC J 2015 Initialization and characterization of open quantum systems (http://hdl.handle.net/10012/9557)[52] Smith R S 2020QuantumVirtualMachine (Zenodo) (https://doi.org/10.5281/zenodo.3677538)

13

QuantumSci. Technol. 5 (2020) 024003 P J Karalekas et al