This repository implements a sequence-to-sequence transformer for generating meta-solutions—Python programs that generalize experimental designs for entire classes of problems. The implementation builds on the simple and effective framework of NanoGPT for transformer training, but is tailored to tasks involving the meta-design of experiments. Specifically, it enables the creation of scalable, interpretable solutions for designing quantum systems and other structured tasks.
This project provides the full pipeline to:
- Generate synthetic data based on predefined rules and random program generation.
- Train a transformer model to map quantum or structured states to executable Python programs.
- Evaluate the model’s ability to extrapolate to unseen tasks by sampling and analyzing generated solutions.
This approach enables the discovery of interpretable solutions that generalize across complex problem spaces, offering insights and capabilities beyond conventional optimization methods.
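For orientation, here is a minimal sketch of the kind of encoder-decoder transformer used for this task, written with standard PyTorch modules. All sizes and names are placeholders for illustration; the repository's actual model, tokenization, and hyperparameters live in `seq2seq.py`.

```python
import torch
import torch.nn as nn

class Seq2SeqSketch(nn.Module):
    """Minimal encoder-decoder sketch; vocabulary sizes and dimensions
    are placeholders, not the hyperparameters used in seq2seq.py.
    (Positional encodings are omitted for brevity.)"""

    def __init__(self, src_vocab=128, tgt_vocab=128, d_model=256):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, d_model)
        self.tgt_emb = nn.Embedding(tgt_vocab, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=8,
            num_encoder_layers=4, num_decoder_layers=4,
            batch_first=True,
        )
        self.head = nn.Linear(d_model, tgt_vocab)

    def forward(self, src_tokens, tgt_tokens):
        # Causal mask: each target position attends only to earlier ones.
        mask = self.transformer.generate_square_subsequent_mask(
            tgt_tokens.size(1)
        ).to(tgt_tokens.device)
        hidden = self.transformer(
            self.src_emb(src_tokens), self.tgt_emb(tgt_tokens), tgt_mask=mask
        )
        return self.head(hidden)  # logits over the target vocabulary

# Example shapes: batch of 2, source length 10, target length 7.
logits = Seq2SeqSketch()(
    torch.zeros(2, 10, dtype=torch.long),
    torch.zeros(2, 7, dtype=torch.long),
)
print(logits.shape)  # torch.Size([2, 7, 128])
```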
Contains scripts and resources for generating and managing synthetic data for experimental setups:

- `generate_data.py`: The second step in the data generation pipeline.
- `generate_topologies.py`: The first step in the data generation pipeline.
- `graphdata.py`: Library for computing quantum states from graph-based representations.
- `reorganizedata.py`: Utility for restructuring data files into the required format.
- `shuffledata.py`: Script for randomizing the order of data entries.
- `tok.json`: Tokenization file for managing input and output sequences.
- `valpos_res.py`: Collection of valid terms required for code generation.
Synthetic data generation for quantum circuits:

- `datagenerator.py`: Generates quantum circuit-related data.
- `src_tok.json`: Tokenized input (source) data for training.
- `tgt_tok.json`: Tokenized output (target) data for training.
- `config_circuit.py`: Configuration for transformer training on circuit data (additional example).
- `config_main.py`: Configuration for transformer training on general experimental setup data (main task).
- `hdf5dataloader.py`: A utility for efficiently loading large datasets in HDF5 format (a generic sketch of this pattern follows the list).
- `helper.py`: Helper functions for data manipulation and processing.
- `sample.py`: Samples Python programs generated by the trained transformer and evaluates their correctness.
- `seq2seq.py`: Implements the transformer-based sequence-to-sequence model.
- `train.py`: The main training script for fitting the sequence-to-sequence transformer.
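For context, here is a generic sketch of lazily reading tokenized pairs from an HDF5 file with `h5py` and PyTorch. The dataset keys (`src`, `tgt`) and layout are assumptions made for illustration; `hdf5dataloader.py` is the authoritative implementation.

```python
import h5py
import torch
from torch.utils.data import Dataset

class H5PairDataset(Dataset):
    """Sketch of lazy HDF5 loading; the keys "src"/"tgt" are hypothetical."""

    def __init__(self, path):
        self.path = path
        self.file = None  # opened lazily so each DataLoader worker gets its own handle
        with h5py.File(path, "r") as f:
            self.length = len(f["src"])

    def __len__(self):
        return self.length

    def __getitem__(self, idx):
        if self.file is None:
            self.file = h5py.File(self.path, "r")
        src = torch.from_numpy(self.file["src"][idx])
        tgt = torch.from_numpy(self.file["tgt"][idx])
        return src, tgt
```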
This repository focuses on using transformer models for meta-design, enabling the generation of scalable solutions to classes of problems. For example:
- Generate Python programs for designing experimental setups for quantum states like GHZ and W-states.
- Extrapolate solutions to larger system sizes using patterns captured during training.
The synthetic data generation pipeline provides a large and diverse set of sequence pairs:
- Programs (sequence B) that generate experimental setups.
- Quantum states (sequence A) resulting from those setups.
This asymmetric generation process allows training models on challenging mappings from quantum states to Python programs.
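As a toy illustration of this asymmetry (all names here are hypothetical; the real pipeline lives in `generate_topologies.py` and `generate_data.py`): executing a randomly generated program to obtain its setup is cheap, and the resulting state can then be simulated, whereas the model is trained on the hard inverse direction.

```python
# Toy illustration: the forward direction program -> setup is easy to
# compute; the learning task is the inverse, state -> program.
program_b = "edges = [(i, (i + 1) % n) for i in range(n)]"  # sequence B

def run_program(program: str, n: int):
    """Execute a candidate program to obtain its experimental setup."""
    scope = {"n": n}
    exec(program, scope)  # a single dict serves as globals for the snippet
    return scope["edges"]

print(run_program(program_b, n=4))  # [(0, 1), (1, 2), (2, 3), (3, 0)]
```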
Below are instructions on how to reproduce our work based on the code provided here. We are in the process of uploading data and model checkpoint files to Zenodo for additional reproducibility. While these files will provide convenient access to pre-generated data and trained models, all necessary scripts and configurations are already included in this repository to allow complete reproduction of the data and models.
- Clone the repository:

  ```bash
  git clone https://github.com/artificial-scientist-lab/metadesign.git
  cd metadesign
  ```

- Install pytheus, the library used to simulate the quantum optics experiments relevant to our work:

  ```bash
  pip install pytheusQ
  ```
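As a quick sanity check that the dependency is available (assuming the `pytheusQ` distribution exposes the `pytheus` module, which is its usual import name):

```python
# Quick import check; the pytheusQ distribution installs the `pytheus` package.
import pytheus
print("pytheus imported:", pytheus.__name__)
```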
Run the training script with the desired configuration:

```bash
python train.py --config config_main.py
```

For quantum circuits:

```bash
python train.py --config config_circuit.py
```
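Since the project builds on the NanoGPT framework, the config files are plain Python modules whose variables set training options. The excerpt below is illustrative and hypothetical; the real fields and values are in `config_main.py`.

```python
# Hypothetical NanoGPT-style config values; consult config_main.py
# for the actual fields and numbers.
batch_size = 64
learning_rate = 3e-4
n_layer = 8        # transformer depth
n_head = 8         # attention heads
n_embd = 512       # embedding width
max_iters = 100_000
```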
Sampling is as simple as providing the configuration file:

```bash
python sample.py --config config_main.py
```
This repository employs a transformer-based sequence-to-sequence model trained on synthetic datasets of quantum states and Python programs. The transformer captures patterns in the data to generate interpretable solutions that generalize across problem domains. Synthetic data is generated by simulating quantum optics experiments with the `pytheusQ` library for the main task, or by simulating quantum circuits with the `qiskit` library. Sampling uses probabilistic techniques to generate multiple candidate solutions, which are then evaluated for fidelity to the target quantum states.
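For the circuit variant, the kind of fidelity evaluation involved can be illustrated with standard `qiskit` APIs. This is a sketch of the general idea, not the repository's exact evaluation code:

```python
import numpy as np
from qiskit import QuantumCircuit
from qiskit.quantum_info import Statevector, state_fidelity

# Candidate solution: a 3-qubit GHZ circuit.
qc = QuantumCircuit(3)
qc.h(0)
qc.cx(0, 1)
qc.cx(1, 2)

# Target state: (|000> + |111>) / sqrt(2).
target = np.zeros(8, dtype=complex)
target[0] = target[7] = 1 / np.sqrt(2)

print(state_fidelity(Statevector(qc), Statevector(target)))  # 1.0
```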
This project demonstrates the ability of transformer models to:
- Generate human-readable Python code that generalizes across problem domains.
- Rediscover known meta-solutions (e.g., GHZ state setups).
- Discover new meta-solutions for previously unsolved classes of quantum experiments, such as spin-½ states in photonic systems.
The interpretability of the generated solutions provides human-readable insights into the underlying patterns, enabling scientists to extend these solutions to larger, more complex systems.
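To make "meta-solution" concrete, here is a hypothetical example of the kind of program the model outputs: a short function that takes the system size `n` and returns a complete experimental design, so a single program covers a whole class of targets. This is illustrative only; actual generated programs come from `sample.py`.

```python
def setup(n):
    """Hypothetical meta-solution: given the photon number n, return a
    graph-based design as a list of (vertex, vertex, mode, mode) edges.
    Illustrative structure only, not a discovered solution."""
    edges = [(i, i + 1, 0, 0) for i in range(n - 1)]  # chain in mode 0
    edges.append((0, n - 1, 1, 1))                    # closing edge in mode 1
    return edges

print(setup(4))  # one program yields a design for every system size n
```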
If you use this repository in your work, please cite:
Arlt, S., Duan, H., Li, F., Xie, S. M., Wu, Y., & Krenn, M. (2024). Meta-Designing Quantum Experiments with Language Models. arXiv:2405.06107.