DLinAlg: Distributed Linear-algebra-based simulator

Extending the single-node CLinalg simulator, DLinAlg is a distributed linear-algebra simulator written in C++ with a Python interface. It is designed to take advantage of multi-core (through OpenMP parallelism) and multi-node architectures. The Message Passing Interface (MPI) handles communication between the distributed resources that store and manipulate quantum states. It is currently tested with the OpenMPI implementation.

This simulator makes it possible to simulate larger circuits: the complete state vector is distributed across multiple nodes in a cluster, which overcomes the memory limitation of a single-node architecture. Each node in the cluster stores a chunk of the quantum amplitudes in a vector of \(2^n/nbnodes\) complex numbers, where \(n\) is the total number of qubits in the circuit and \(nbnodes\) is the number of nodes in the cluster used for the simulation. Each node can be considered to hold \(n-\log_2(nbnodes)\) local qubits. These vectors, located on multiple nodes, are modified by the application of quantum gates. When all the qubits that a quantum gate acts on are local, the operation is done locally and no communication is needed. Otherwise, the operation is remote and requires communication between nodes.
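The partitioning described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not DLinAlg code: the helper names `partition_state` and `is_local` are hypothetical, and the rule that the lowest-indexed qubits are the local ones is an assumption made for the example.

```python
def partition_state(n_qubits, nb_nodes):
    """Return (amplitudes per node, local qubits per node)."""
    # The node count must be a power of 2 so that each node holds whole qubits.
    if nb_nodes < 1 or (nb_nodes & (nb_nodes - 1)) != 0:
        raise ValueError("nb_nodes must be a power of 2")
    global_qubits = nb_nodes.bit_length() - 1  # exact log2 of a power of 2
    if global_qubits > n_qubits:
        raise ValueError("more nodes than amplitudes")
    local_qubits = n_qubits - global_qubits
    return 2 ** local_qubits, local_qubits


def is_local(gate_qubits, local_qubits):
    """True if every qubit the gate acts on is local.

    Assumption for this sketch: qubits with index < local_qubits are local.
    """
    return all(q < local_qubits for q in gate_qubits)


chunk_size, local_qubits = partition_state(35, 8)
print(chunk_size, local_qubits)          # 4294967296 32
print(is_local([0, 5], local_qubits))    # True: no communication needed
print(is_local([3, 34], local_qubits))   # False: remote operation
```

For a 35-qubit circuit on 8 nodes, each node holds \(2^{32}\) amplitudes, that is, 32 local qubits.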

By default, the simulator can only apply one- or two-qubit gates remotely (and gates of any arity locally). The Localizer plugin can be applied to make gates of higher arity local, which is useful when simulating circuits with large gates.

The performance of this QPU can also be improved on some benchmarks with the FusionPlugin plugin.

Several constraints apply to the simulator at the current stage:

The state vector is distributed uniformly between all the nodes used by the simulator, so all the nodes used for a simulation should preferably have a similar architecture.

Currently, the number of nodes used for a simulation must be a power of 2. Consequently, the size of the distributed state vector in each node will also be a power of 2.

Only sampling jobs with a specified number of shots and observable jobs can be simulated with this simulator. Unlike other ideal exact simulators, it cannot return the entire amplitude vector in the Result for an infinite number of shots (nbshots=0), because the entire state vector does not fit on a single node in most use cases.

It can accept any gate (standard named gates and general n-qubit gates), as long as the arity of the gate does not exceed the number of qubits that each distributed resource holds.

Just like other ideal exact linear-algebra simulators, its memory use and run time are exponential in the number of qubits. However, the number of qubits that can be simulated also scales with the number of compute nodes available in the cluster.
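The constraints above can be made concrete with a rough sizing sketch. It assumes 16 bytes per amplitude (double-precision complex), which this page does not state, and it counts only the state-vector chunk, ignoring any working buffers. The helper names are hypothetical.

```python
BYTES_PER_AMPLITUDE = 16  # assumption: two 64-bit floats per complex amplitude


def is_power_of_two(value):
    return value >= 1 and (value & (value - 1)) == 0


def memory_per_node_gib(n_qubits, nb_nodes):
    """Memory (GiB) of the state-vector chunk held by each node."""
    if not is_power_of_two(nb_nodes):
        raise ValueError("the number of nodes must be a power of 2")
    return (2 ** n_qubits // nb_nodes) * BYTES_PER_AMPLITUDE / 2 ** 30


def gate_fits(arity, n_qubits, nb_nodes):
    """True if the gate arity does not exceed the local qubits of a node."""
    local_qubits = n_qubits - (nb_nodes.bit_length() - 1)
    return arity <= local_qubits


print(memory_per_node_gib(35, 8))   # 64.0
print(memory_per_node_gib(36, 8))   # 128.0
print(memory_per_node_gib(36, 16))  # 64.0
print(gate_fits(3, 35, 8))          # True
```

Under these assumptions, each extra qubit doubles the memory needed per node, and doubling the number of nodes compensates for exactly one extra qubit.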

Comparison with some other open-source simulators

Less memory overhead – some simulators reserve a large temporary buffer, either to perform MPI communication or to calculate the value of an observable, and some of them require twice as much memory as DLinAlg.

Fast simulation time – DLinAlg is optimized to run on multi-core and multi-node architectures; many optimization techniques have been applied and tested.

Flexibility to construct a simulation stack with other Qaptiva plugins – just like other Qaptiva simulators, DLinAlg can be combined with other Qaptiva plugins, whether to build the quantum circuit or to transpile it before simulation.

More complete functionality – some simulators on the market can only simulate one-qubit gates remotely (not even two-qubit gates). DLinAlg can simulate any gate, as long as its arity is less than the number of qubits stored on a node.

Some additional tools – DLinAlg also offers the possibility to dump the state vector after a simulation into a binary file, which is written in parallel by all the MPI processes.
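To illustrate how several processes can write one file in parallel, here is a small single-process sketch in which each simulated process writes its chunk at its own byte offset. DLinAlg's dump API and file format are not described on this page; the layout (real and imaginary parts as little-endian 64-bit floats, chunks ordered by rank) and all names below are assumptions for illustration only.

```python
import os
import struct
import tempfile

# Assumption: one amplitude = (real, imaginary) as two little-endian float64.
AMPLITUDE = struct.Struct("<dd")


def write_chunk(path, rank, chunk):
    """Write the chunk owned by process `rank` at its own offset in the file."""
    offset = rank * len(chunk) * AMPLITUDE.size
    with open(path, "r+b") as handle:
        handle.seek(offset)
        for amplitude in chunk:
            handle.write(AMPLITUDE.pack(amplitude.real, amplitude.imag))


def read_state(path):
    """Read the whole file back as a list of complex amplitudes."""
    with open(path, "rb") as handle:
        data = handle.read()
    return [complex(re, im) for re, im in AMPLITUDE.iter_unpack(data)]


# Toy example: 2 qubits (4 amplitudes) split over 2 simulated processes.
chunks = {0: [0.5 + 0j, 0.5j], 1: [-0.5 + 0j, -0.5j]}
path = os.path.join(tempfile.mkdtemp(), "state.bin")
with open(path, "wb") as handle:
    handle.truncate(4 * AMPLITUDE.size)  # pre-size the file
for rank in (1, 0):  # any order works: the byte ranges do not overlap
    write_chunk(path, rank, chunks[rank])
state = read_state(path)
```

Because every process owns a disjoint byte range, no process has to gather the full state vector before writing.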

Example of a simulation on a cluster

Two scripts are needed to run a distributed simulation:

A Python program that describes the quantum program to be simulated; it resembles a program used with other Qaptiva simulators.

from qat.core import Job
from qat.qpus import DLinAlg

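# Load a previously saved job (here, a 35-qubit QAOA job).
qaoa_job = Job.load("qaoa_35_qubits.job")
# Submit the job to the distributed simulator.
res = DLinAlg().submit(qaoa_job)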
print(res)

A SLURM batch script that reserves some nodes in a cluster and then submits the Python program to them.

#!/bin/bash

# partition of the cluster to be used
#SBATCH --partition XXX
# number of nodes
#SBATCH --nodes 8
# time limit of the simulation job
#SBATCH --time 24:00:00
# name of the job
#SBATCH --job-name dlinalg_qaoa_35
# SLURM output file format
#SBATCH --output slurm-%A_%x_%N_%t.out
# SLURM error file format
#SBATCH --error slurm-%A_%x_%N_%t.err
# reserve the nodes in exclusive mode
#SBATCH --exclusive

module load dlinalg
python3 -u dlinalg_qaoa.py

The job can then be submitted with the sbatch command:

sbatch submit_dlinalg.sh

Example of the simulation result:

Result(raw_data=[], _value=ComplexNumber(re=-19.28229108973577, im=0.0), error=None, value_data=None, error_data=None,
meta_data={'simulation_time': '147.557000'}, in_memory=False, data=None, qregs=[DefaultRegister(length=35, start=0, msb=None,
_subtype_metadata=None, key=None)], _parameter_map=None, _values=None, values_data=None, need_flip=False, nbqbits=None,
lsb_first=False, has_statevector=False, statevector=None)