DNoisy: Distributed noisy linear-algebra-based simulator
The DNoisy QPU builds upon the DLinAlg QPU to emulate a noisy quantum computer. As such, it behaves similarly to the noiseless DLinAlg QPU, but also accepts a hardware model describing the noise to be applied.
Just like DLinAlg, the DNoisy simulator is a distributed linear-algebra simulator written in C++, with a Python interface. It is designed to take advantage of multi-core (with OpenMP parallelism) and multi-node architectures. The Message Passing Interface (MPI) protocol is used to handle communication between the distributed resources used to store and manipulate quantum states. Currently, it is tested with the OpenMPI implementation.
The current implementation is based on a deterministic simulation in which the full density matrix describing the qubits' state is stored. This makes DNoisy both a noisy QPU (i.e., it can simulate hardware noise) and an exact QPU (i.e., the quantum state is represented without any approximation other than the finite precision of floating-point types).
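To illustrate what storing the full density matrix means, here is a minimal single-process NumPy sketch of exact density-matrix simulation with noise. It is not DNoisy's implementation: the two-qubit Bell circuit, the single-qubit depolarizing channel \(\rho \mapsto (1-p)\rho + \tfrac{p}{3}(X\rho X + Y\rho Y + Z\rho Z)\) and the error rate \(p = 0.01\) are assumptions chosen purely for illustration.

```python
import numpy as np

# Pauli matrices and the single-qubit identity
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def apply_unitary(rho, u):
    """Exact gate application on a density matrix: rho -> U rho U^dagger."""
    return u @ rho @ u.conj().T


def depolarize_qubit0(rho, p):
    """Illustrative depolarizing channel on qubit 0 of a 2-qubit state:
    rho -> (1 - p) rho + p/3 (X rho X + Y rho Y + Z rho Z)."""
    out = (1 - p) * rho
    for pauli in (X, Y, Z):
        op = np.kron(pauli, I2)  # Pauli on qubit 0, identity on qubit 1
        out = out + (p / 3) * (op @ rho @ op.conj().T)
    return out


n = 2
# Initial state |00><00|
psi = np.zeros(2**n, dtype=complex)
psi[0] = 1.0
rho = np.outer(psi, psi.conj())

# Bell-state preparation: H on qubit 0, then CNOT (control 0, target 1)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)
rho = apply_unitary(rho, np.kron(H, I2))
rho = apply_unitary(rho, CNOT)

# Noise turns the pure state into a mixed state
rho = depolarize_qubit0(rho, p=0.01)

# Flattening the 2^n x 2^n matrix gives a vector of 2^(2n) entries
vec = rho.reshape(-1)
print(rho.shape, vec.size)
print("trace:", np.trace(rho).real, "purity:", np.trace(rho @ rho).real)
```

Gates act exactly as \(U\rho U^\dagger\) and noise as a weighted sum of Pauli terms, so the only approximation is floating-point rounding; the flattened \(2^{2n}\)-entry vector is the kind of object a distributed simulator splits across nodes.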
This simulator can handle larger circuits because the complete representation of the quantum state (the \(2^n \times 2^n\) density matrix, stored as a vector of \(2^{2n}\) complex numbers) can be distributed across multiple nodes in a cluster, which overcomes the memory limitation of a single-node architecture. Each node in the cluster stores a chunk of this vector containing \(2^{2n}/\text{nbnodes}\) complex numbers, where \(n\) is the total number of qubits in the circuit and \(\text{nbnodes}\) is the number of nodes used for the simulation. Each node can therefore be considered to hold \(2n-\log_2(\text{nbnodes})\) local qubits. These distributed vectors are modified by the application of quantum gates. When all the qubits a gate acts on are local, the operation is performed locally and no communication is needed. Otherwise, the gate is applied remotely, which requires communication between nodes.
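The per-node arithmetic above can be sketched in a few lines of Python. This is a hypothetical helper (`dnoisy_layout` is not part of the Qaptiva API), and it assumes each density-matrix entry is a double-precision complex number of 16 bytes; the storage type actually used by DNoisy is not stated here.

```python
def dnoisy_layout(n_qubits: int, nb_nodes: int, bytes_per_entry: int = 16) -> dict:
    """Hypothetical helper: per-node layout of a vectorized n-qubit density matrix.

    bytes_per_entry=16 assumes double-precision complex entries (an assumption,
    not a documented DNoisy setting).
    """
    # The number of nodes must be a positive power of 2
    if nb_nodes < 1 or nb_nodes & (nb_nodes - 1):
        raise ValueError("the number of nodes must be a power of 2")
    total_entries = 2 ** (2 * n_qubits)          # 2^n x 2^n density matrix
    entries_per_node = total_entries // nb_nodes  # 2^(2n) / nbnodes
    log2_nodes = nb_nodes.bit_length() - 1        # exact log2 for a power of 2
    return {
        "entries_per_node": entries_per_node,
        "bytes_per_node": entries_per_node * bytes_per_entry,
        "local_qubits": 2 * n_qubits - log2_nodes,  # 2n - log2(nbnodes)
    }


layout = dnoisy_layout(16, 8)
print(layout)
print(layout["bytes_per_node"] / 2**30, "GiB per node")
```

For the 16-qubit GHZ example on 8 nodes used later on this page, each node holds \(2^{29}\) entries (8 GiB under the 16-byte assumption) and 29 local qubits.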
By default, the simulator can only apply one- and two-qubit gates remotely (gates of any arity can be applied locally).
The Localizer plugin can be used to make gates of higher arity local, which is useful when simulating circuits with large gates.
The performance of this QPU can also be improved on some benchmarks with the FusionPlugin plugin.
Several constraints apply to the simulator at its current stage:
The quantum state is distributed uniformly across all the nodes used by the simulator, so these nodes should preferably have similar architectures.
Currently, the number of nodes used for a simulation must be a power of 2. Consequently, the size of the chunk stored on each node is also a power of 2.
Only sampling jobs with a specified number of shots and observable jobs can be simulated. Unlike other ideal exact simulators, DNoisy cannot return the entire amplitude vector in the Result when an infinite number of shots (nbshots=0) is requested, because in most use cases the entire state does not fit on a single node.
The simulator accepts any gate (standard named gates and general n-qubit gates), as long as the arity of the gate does not exceed the number of qubits held by each distributed resource.
As a noisy, exact linear-algebra simulator, DNoisy must store the full density matrix, which has size \(2^{n} \times 2^{n}\), where \(n\) is the number of qubits. Hence, its memory usage and run time are exponential in the number of qubits: the space and time complexity of a noisy simulation of \(n\) qubits is equivalent to that of a noiseless simulation of \(2n\) qubits. However, the number of qubits that can be simulated also grows with the number of compute nodes available in the cluster.
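The scaling with the number of nodes can be made concrete with a short worked bound. It assumes each density-matrix entry takes \(b\) bytes and each node has \(M\) bytes available for its chunk; both are illustrative parameters, not documented DNoisy settings.

\[
\underbrace{2^{n}\times 2^{n}}_{\text{density matrix}} = 2^{2n}\ \text{entries}
\quad\Longrightarrow\quad
\frac{b\,2^{2n}}{\text{nbnodes}} \le M
\quad\Longleftrightarrow\quad
n \le \frac{1}{2}\left(\log_2 \text{nbnodes} + \log_2 \frac{M}{b}\right)
\]

Each extra noisy qubit multiplies the storage by 4, so at fixed memory per node it needs 4 times as many nodes: one doubling of the node count adds a single local qubit, i.e. only half a noisy qubit, and two doublings (for example from 8 to 32 nodes) are required per additional noisy qubit.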
Example of a simulation on a cluster
Two scripts are needed to run a distributed simulation:
A Python program describing the quantum program to be simulated and the hardware model containing the noise model to be emulated. This program resembles those used with other Qaptiva simulators.
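```python
from qat.lang.AQASM import Program, H, CNOT
from qat.hardware import make_depolarizing_hardware_model
from qat.qpus import DNoisy

# Hardware model with depolarizing noise (default error rates of the helper)
depo_hardware = make_depolarizing_hardware_model()

# 16-qubit GHZ state: H on the first qubit, then a chain of CNOTs
nqbits = 16
prog = Program()
qbits = prog.qalloc(nqbits)
prog.apply(H, qbits[0])
for i in range(0, nqbits - 1):
    prog.apply(CNOT, [qbits[i], qbits[i + 1]])
circ = prog.to_circ()

# DNoisy requires a finite number of shots (nbshots=0 is not supported)
job = circ.to_job(nbshots=10000)
res = DNoisy(depo_hardware).submit(job)

for sample in res:
    print(f"Sampled state {sample.state} with probability {sample.probability} and error {sample.err}")
```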
A SLURM batch script that reserves nodes in a cluster and then runs the Python program (here dnoisy_ghz.py) on them.
```bash
#!/bin/bash
# Partition of the cluster to be used
#SBATCH --partition XXX
# Number of nodes (must be a power of 2)
#SBATCH --nodes 8
# Time limit of the simulation job
#SBATCH --time 24:00:00
# Name of the job
#SBATCH --job-name dnoisy_ghz_16
# SLURM output and error file formats
#SBATCH --output slurm-%A_%x_%N_%t.out
#SBATCH --error slurm-%A_%x_%N_%t.err
# Reserve the nodes in exclusive mode
#SBATCH --exclusive

module load dnoisy
python3 -u dnoisy_ghz.py
```
The job can then be submitted with the sbatch command:
sbatch submit_dnoisy.sh
Example of the simulation result:
```
Sampled state |0000000000000000> with probability 0.4368 and error 0.004960144786559195
Sampled state |0000000100000000> with probability 0.0017 and error 0.00041198054905211204
Sampled state |0000001000000000> with probability 0.0015 and error 0.00038702710369934015
Sampled state |0000010000000000> with probability 0.0023 and error 0.0004790552675787414
...
Sampled state |1111111111111111> with probability 0.442 and error 0.004966494398130402
```
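As a sanity check on these numbers, the reported errors appear to match the standard error of a binomial frequency estimate, \(\sqrt{p(1-p)/(\text{nbshots}-1)}\) with nbshots = 10000. This formula is inferred from the printed values, not taken from the Qaptiva documentation, and `binomial_standard_error` is a hypothetical helper. The sketch recomputes the errors for the five samples shown and sums the weight of the two ideal GHZ outcomes.

```python
import math


def binomial_standard_error(p: float, nbshots: int) -> float:
    """Hypothetical helper: standard error of a frequency p estimated from
    nbshots samples, with the (nbshots - 1) denominator."""
    return math.sqrt(p * (1 - p) / (nbshots - 1))


# (state, probability, reported error) triples copied from the output above
samples = [
    ("|0000000000000000>", 0.4368, 0.004960144786559195),
    ("|0000000100000000>", 0.0017, 0.00041198054905211204),
    ("|0000001000000000>", 0.0015, 0.00038702710369934015),
    ("|0000010000000000>", 0.0023, 0.0004790552675787414),
    ("|1111111111111111>", 0.442, 0.004966494398130402),
]

for state, prob, err in samples:
    print(state, err, binomial_standard_error(prob, 10000))

# Weight of the two outcomes an ideal GHZ circuit would produce
ghz_weight = samples[0][1] + samples[-1][1]
print(f"P(|0...0>) + P(|1...1>) = {ghz_weight:.4f}")
```

In this run the two ideal outcomes carry about 0.879 of the sampled probability; the remaining weight is spread over states such as |0000000100000000> that a noiseless GHZ circuit would never produce.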