DNoisy: Distributed noisy linear-algebra-based simulator

The DNoisy QPU builds upon the DLinAlg QPU to emulate a noisy quantum computer. As such, it behaves like the noiseless DLinAlg QPU, but also accepts a hardware model describing the noise to be applied.

Just like DLinAlg, the DNoisy simulator is a distributed linear-algebra simulator written in C++, with a Python interface. It is designed to take advantage of multi-core (with OpenMP parallelism) and multi-node architectures. The Message Passing Interface (MPI) protocol handles communication between the distributed resources used to store and manipulate quantum states. Currently, it is tested with the Open MPI implementation.

The current implementation is based on a deterministic simulation, where the full density matrix describing the qubits' state is stored. This makes DNoisy both a noisy QPU (i.e., it can simulate hardware noise) and an exact QPU (i.e., the quantum state is stored in such a way that its representation introduces no approximation other than the precision of floating-point types).

This simulator makes it possible to simulate larger circuits, because the complete representation of the quantum state can be distributed across multiple nodes in a cluster, which overcomes the memory limitation of a single-node architecture. Each node in the cluster encodes a chunk of the quantum amplitudes in a vector of \(2^{2n}/nbnodes\) complex numbers, where \(n\) is the total number of qubits in the circuit and \(nbnodes\) is the number of nodes in the cluster used for the simulation. Each node can be considered to contain \(2n-\log_2(nbnodes)\) local qubits. These vectors, located on multiple nodes, are modified by the application of quantum gates. When all the qubits that a quantum gate acts on are local, the operation is done locally and no communication is needed. Otherwise, the operation is remote.
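The chunk sizes described above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the DNoisy API: the helper name `chunk_layout` is hypothetical, and it only evaluates the two formulas given in the text, assuming the number of nodes is a power of 2.

```python
def chunk_layout(nqbits: int, nbnodes: int) -> tuple[int, int]:
    """Return (local_qubits, amplitudes_per_node) for an n-qubit noisy simulation.

    Hypothetical helper: it only evaluates the formulas 2n - log2(nbnodes)
    and 2^(2n) / nbnodes; it does not query the simulator.
    """
    # A power of 2 has exactly one bit set.
    if nbnodes < 1 or nbnodes & (nbnodes - 1):
        raise ValueError("nbnodes must be a power of 2")
    log2_nodes = nbnodes.bit_length() - 1
    local_qubits = 2 * nqbits - log2_nodes
    if local_qubits < 0:
        raise ValueError("more nodes than amplitudes to distribute")
    return local_qubits, 2 ** local_qubits


# 16 noisy qubits on 8 nodes: 2*16 - 3 = 29 local qubits per node.
print(chunk_layout(16, 8))  # (29, 536870912)
```

With 16 qubits on 8 nodes, each node holds \(2^{29}\) complex amplitudes, which is \(2^{32}/8\).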

By default, the simulator can only apply one- or two-qubit gates remotely (gates of any arity can be applied locally). The Localizer plugin can be used to make gates of higher arity local, which is useful when simulating circuits with large gates.

This QPU's performance can also be improved on some benchmarks with the FusionPlugin plugin.

Several constraints apply to the simulator at the current stage:

The state vector is distributed uniformly between all the nodes used by the simulator, so preferably all the nodes used by a simulator should be of similar architecture.

Currently, the number of nodes used for a simulation must be a power of 2. Consequently, the size of the distributed state vector in each node will also be a power of 2.

Only sampling jobs with a specified number of shots and observable jobs can be simulated with this simulator. Unlike other ideal exact simulators, it cannot return the entire amplitude vector in the Result (the behaviour obtained with an infinite number of shots, nbshots=0), because the entire state vector does not fit on a single node in most use cases.

It can accept any gate (standard named gates and general n-qubit gates), as long as the arity of the gate does not exceed the number of qubits that each distributed resource holds.

As a noisy exact linear-algebra simulator, it must store the full density matrix, which is of size \(2^n \times 2^n\), where \(n\) is the number of qubits. Hence, its memory use and run time are exponential in the number of qubits: the space and time complexity of simulating \(n\) noisy qubits is equivalent to that of simulating \(2n\) noiseless qubits. However, the number of qubits that can be simulated also scales with the number of compute nodes available in the cluster.
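To make the scaling concrete, here is a rough memory estimate. It is a sketch under stated assumptions: amplitudes are taken to be double-precision complex numbers (16 bytes each), which the text does not specify, and any workspace or communication buffers the simulator may need are ignored. The function names are hypothetical.

```python
BYTES_PER_AMPLITUDE = 16  # assumption: double-precision complex (2 x 8 bytes)


def memory_per_node_bytes(nqbits: int, nbnodes: int) -> int:
    """Bytes needed on each node to hold its share of the 2^n x 2^n density matrix."""
    return (4 ** nqbits * BYTES_PER_AMPLITUDE) // nbnodes


def max_noisy_qubits(nbnodes: int, node_memory_bytes: int) -> int:
    """Largest n whose density matrix fits in the combined memory of all nodes."""
    total = nbnodes * node_memory_bytes
    n = 0
    while 4 ** (n + 1) * BYTES_PER_AMPLITUDE <= total:
        n += 1
    return n


GIB = 2 ** 30
print(memory_per_node_bytes(16, 8) // GIB)  # 8 (GiB per node)
for nodes in (1, 8, 32):
    print(nodes, max_noisy_qubits(nodes, 8 * GIB))
```

Under these assumptions, with 8 GiB usable per node, 1 node fits 14 noisy qubits, 8 nodes fit 16, and 32 nodes fit 17: each additional noisy qubit multiplies the storage by 4, so it takes four times as many nodes to gain one qubit.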

Example of a simulation on a cluster

Two scripts are needed to run a distributed simulation:

A Python program describing the quantum program to be simulated and the hardware model containing the noise model to be emulated. This program resembles a program used with other Qaptiva simulators.

from qat.lang.AQASM import Program, H, CNOT
from qat.hardware import make_depolarizing_hardware_model
from qat.qpus import DNoisy

nqbits = 16

prog = Program()
qbits = prog.qalloc(nqbits)
prog.apply(H, qbits[0])
for i in range(0, nqbits - 1):
    prog.apply(CNOT, [qbits[i], qbits[i + 1]])
circ = prog.to_circ()

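# Depolarizing noise model passed to the QPU. The error rates below are
# illustrative placeholders (an assumption), not the values that produced
# the sample output shown further down.
depo_hardware = make_depolarizing_hardware_model(eps1=0.001, eps2=0.01)

job = circ.to_job(nbshots=10000)
res = DNoisy(depo_hardware).submit(job)
for sample in res:
    print(f"Sampled state {sample.state} with probability {sample.probability} and error {sample.err}")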

A SLURM batch script that reserves some nodes in a cluster and then submits the Python program to them.

#!/bin/bash

# Partition of the cluster to be used
#SBATCH --partition XXX
# Number of nodes
#SBATCH --nodes 8
# Time limit of the simulation job
#SBATCH --time 24:00:00
# Name of the job
#SBATCH --job-name dnoisy_ghz_16
# SLURM output file format
#SBATCH --output slurm-%A_%x_%N_%t.out
# SLURM error file format
#SBATCH --error slurm-%A_%x_%N_%t.err
# Reserve the nodes in exclusive mode
#SBATCH --exclusive

module load dnoisy
python3 -u dnoisy_ghz.py

The job can then be submitted with the sbatch command:

sbatch submit_dnoisy.sh

Example of the simulation result:

Sampled state |0000000000000000> with probability 0.4368 and error 0.004960144786559195
Sampled state |0000000100000000> with probability 0.0017 and error 0.00041198054905211204
Sampled state |0000001000000000> with probability 0.0015 and error 0.00038702710369934015
Sampled state |0000010000000000> with probability 0.0023 and error 0.0004790552675787414
   .
   .
   .
Sampled state |1111111111111111> with probability 0.442 and error 0.004966494398130402
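The error column in this output is numerically consistent with the standard error of a probability estimated from `nbshots` samples, \(\sqrt{p(1-p)/(nbshots-1)}\). This is an observation about the numbers above, not a formula stated in this documentation; the helper name `sampling_std_error` is hypothetical. A short sketch that reproduces the check:

```python
import math


def sampling_std_error(probability: float, nbshots: int) -> float:
    """Standard error of an estimated probability from nbshots samples."""
    return math.sqrt(probability * (1.0 - probability) / (nbshots - 1))


# (probability, reported error) pairs copied from the sample output above.
reported = [
    (0.4368, 0.004960144786559195),
    (0.0017, 0.00041198054905211204),
    (0.442, 0.004966494398130402),
]

for p, err in reported:
    estimate = sampling_std_error(p, nbshots=10000)
    print(f"p={p}: computed {estimate:.6e}, reported {err:.6e}")
```

Because the error shrinks roughly as \(1/\sqrt{nbshots}\), halving it requires about four times as many shots.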