Scaling Up Parameter Generation: A Recurrent Diffusion Approach

🎥 Generating customized models with prompts in just minutes!


You can try our demo on Hugging Face.

Abstract

Parameter generation has long struggled to match the scale of today’s large vision and language models, curbing its broader utility. In this paper, we introduce Recurrent Diffusion for Large-Scale Parameter Generation (RPG), a novel framework that generates full neural network parameters—up to hundreds of millions—on a single GPU. Our approach first partitions a network’s parameters into non-overlapping ‘tokens’, each corresponding to a distinct portion of the model. A recurrent mechanism then learns the inter-token relationships, producing ‘prototypes’ which serve as conditions for a diffusion process that ultimately synthesizes the full parameters. Across a spectrum of architectures and tasks—including ResNets, ConvNeXts and ViTs on ImageNet-1K and COCO, and even LoRA-based LLMs—RPG achieves performance on par with fully trained networks while avoiding excessive memory overhead. Notably, it generalizes beyond its training set to generate valid parameters for previously unseen tasks, highlighting its flexibility in dynamic and open-ended scenarios. By overcoming the longstanding memory and scalability barriers, RPG serves as a critical advance in ‘AI generating AI’, potentially enabling efficient weight generation at scales previously deemed infeasible.
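As a rough sketch of the first stage described above (partitioning a network's parameters into non-overlapping tokens), the snippet below flattens a model's parameters and chops them into fixed-size tokens. It is illustrative only; names such as tokenize_parameters and dim_per_token are ours, and the real pipeline also keeps the metadata needed to map generated tokens back into a state_dict.

import torch
import torch.nn as nn
from torchvision.models import resnet18

def tokenize_parameters(model: nn.Module, dim_per_token: int) -> torch.Tensor:
    # Flatten every parameter tensor into one long vector.
    flat = torch.cat([p.detach().flatten() for p in model.parameters()])
    # Zero-pad so the length divides evenly into tokens.
    pad = (-flat.numel()) % dim_per_token
    flat = torch.cat([flat, flat.new_zeros(pad)])
    # Reshape into non-overlapping tokens: (sequence_length, dim_per_token).
    return flat.view(-1, dim_per_token)

tokens = tokenize_parameters(resnet18(num_classes=10), dim_per_token=8192)
print(tokens.shape)  # (number_of_tokens, 8192)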

Environment

Before you get started, you need to set up a conda environment first.

  1. Create your conda environment.
conda create -n rpg python=3.11
conda activate rpg
conda install pytorch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1 pytorch-cuda=12.1 -c pytorch -c nvidia
  2. Install mamba-ssm. (You may run into compilation issues; refer to the official mamba-ssm repository for details.)
pip install causal-conv1d
pip install mamba-ssm[causal-conv1d]
  3. Install other dependencies for this repository.
git clone https://github.com/NUS-HPC-AI-Lab/Recurrent-Parameter-Generation.git --depth=1
cd Recurrent-Parameter-Generation
pip install -r requirements.txt
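Optionally, you can run a quick sanity check (a convenience snippet of ours, not part of the repository) to confirm that PyTorch and mamba-ssm were installed correctly and that a GPU is visible:

# optional sanity check for the rpg environment
import torch
print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())

from mamba_ssm import Mamba  # raises ImportError if mamba-ssm did not build correctly
if torch.cuda.is_available():
    block = Mamba(d_model=64, d_state=16, d_conv=4, expand=2).cuda()
    out = block(torch.randn(1, 8, 64, device="cuda"))
    print("mamba-ssm forward pass OK:", tuple(out.shape))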

Quick Start

This section covers the entire process from preparing the checkpoint dataset to training and testing the RPG model.

  1. Modify your config file.
# Set up your configs interactively.
python ./workspace/set_configs.py
  2. Prepare checkpoint datasets.
cd ./dataset/cifar10_resnet18
# cd ./dataset/<dataset_tag>
CUDA_VISIBLE_DEVICES=0 python train.py
# CUDA_VISIBLE_DEVICES=<GPU_index> python train.py
cd ../..
  3. Train the RPG model. ('0' refers to the GPU index)
cd ./workspace
bash launch.sh ./example/cifar10_resnet18.py '0'
# bash launch.sh <./relative/path/to/launch/script.py> '<GPU_index>'
cd ..
  4. Generate and test.
CUDA_VISIBLE_DEVICES=0 python ./workspace/evaluate/generate.py example.cifar10_resnet18
# CUDA_VISIBLE_DEVICES=<GPU_index> python ./workspace/evaluate/generate.py <relative.path.to.launch.script>
  5. Evaluate in more detail.
CUDA_VISIBLE_DEVICES=0 python ./workspace/evaluate/evaluate.py cifar10_resnet18
# CUDA_VISIBLE_DEVICES=<GPU_index> python ./workspace/evaluate/evaluate.py <dataset_tag>

The training and testing process for the other datasets follows a similar pattern.
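If you want to inspect a generated checkpoint by hand instead of going through evaluate.py, something like the sketch below usually works. It assumes the generated .pth file stores a plain state_dict and that your model definition matches the one used to build the checkpoint dataset (the CIFAR-10 ResNet-18 in dataset/cifar10_resnet18 may differ from the stock torchvision model); adjust the path to whatever file appears in the generated folder after generation.

import torch
from torchvision.models import resnet18  # replace with the architecture you actually trained

# Hypothetical path; use the file produced by workspace/evaluate/generate.py for your run.
state_dict = torch.load("./dataset/cifar10_resnet18/generated/generated_model.pth", map_location="cpu")

model = resnet18(num_classes=10)
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print("missing keys:", missing)
print("unexpected keys:", unexpected)
model.eval()  # now run your own evaluation loop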

Advanced Usage

Reproduce Section 4: RPG Generalizes to Unseen Tasks

In this section, we show how to reproduce the paper's experiments on generalizing to unseen tasks.

  1. Modify your config file. (You can skip this step if you have already done so.)
python ./workspace/set_configs.py
  2. Prepare the checkpoint dataset. (Choose one of the two options below.)
# Option 1: download our dataset from Hugging Face (about 68 GB).
cd ./dataset/condition_classinput_vittiny
git lfs install
git clone https://huggingface.co/datasets/MTDoven/ViTTiny1022
mv ./ViTTiny1022/* ./
rm -r ./ViTTiny1022
cd ../..
# Option 2: train the checkpoints from scratch (takes a long time).
cd ./dataset/condition_classinput_vittiny
CUDA_VISIBLE_DEVICES=0 bash train.sh
sh split.sh
cd ../..
  3. Train the RPG model. ('1,2,3,4' refers to the GPU indices)
cd ./workspace
bash launch.sh ./condition/generalization.py '1,2,3,4'
cd ..
  4. Generate and test.
# Generate parameters for 20 random seen tasks
CUDA_VISIBLE_DEVICES=0 python ./workspace/condition/generate_seen.py

# Generate parameters for all unseen tasks
CUDA_VISIBLE_DEVICES=0 python ./workspace/condition/generate_unseen.py
  5. Check more detailed results.
cd ./dataset/condition_classinput_vittiny
CUDA_VISIBLE_DEVICES=0 python detail.py ./generated/generated_generalization_class0279.pth
cd ../..
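If you would like to run detail.py over every generated checkpoint rather than one file at a time, a small helper like the one below can loop over the folder. The helper and its glob pattern are ours (inferred from the file name shown above), not part of the repository.

# run from inside dataset/condition_classinput_vittiny
import glob
import subprocess

for path in sorted(glob.glob("./generated/generated_generalization_*.pth")):
    print("==>", path)
    subprocess.run(["python", "detail.py", path], check=True)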

Adapt to other downstream tasks

In this section, we use the DoRA adapter for Llama as an example to show how to apply our method to more downstream tasks.

  1. Create a separate conda environment for dora_llama by following the official repositories, in particular the commonsense reasoning setup. You also need to clone the dora_llama repositories to any path you like. You should then have a directory structure like this ("..." means there are other files or folders that are not important here):
└─DoRA
   ├─commonsense_reasoning
   │  ├─dataset
   │  │  ├─ARC-Challenge
   │  │  ├─ARC-Easy
   │  │  ├─boolq
   │  │  ├─hellaswag
   │  │  ├─openbookqa
   │  │  ├─piqa
   │  │  ├─social_i_qa
   │  │  ├─winogrande
   │  │  └─...
   │  ├─peft
   │  │  ├─src
   │  │  │  └─peft
   │  │  └─...
   │  ├─commonsense_170k.json
   │  ├─commonsense_evaluate.py
   │  ├─finetune.py
   │  ├─llama_7B_Dora.sh
   │  └─llama_7B_Dora_eval.sh
   └─...
  • You can try the official finetuning and testing code of dora_llama under /path/to/your/DoRA/commonsense_reasoning. If everything runs properly, your DoRA setup is correct. (Since finetuning takes a long time, you can skip this step for now and come back to verify your dora_llama configuration if problems arise later.)
# execute under /path/to/your/DoRA/commonsense_reasoning

# finetuning
sh llama_7B_Dora.sh 32 64 ./finetuned_result/dora_r32 0

# testing
sh llama_7B_Dora_eval.sh ./finetuned_result/dora_r32 0
  2. Modify your config file.
vim ./dataset/config.json

###################### content in config.json ######################
{
  "dataset_root": ...,
  "imagenet_root": ...,
+ "dora_root": "/ABSOLUTE/path/to/your/DoRA/commonsense_reasoning",
+ "dora_env_name": "your_DoRA_conda_envrionment_name"
}
###################### content in config.json ######################
  3. Prepare the checkpoint dataset.
cd ./dataset/downtask_dora_r4
CUDA_VISIBLE_DEVICES=0 python train.py
cd ../..
  4. Train the RPG model. ('0' refers to the GPU index)
cd ./workspace
bash launch.sh ./downtask/dora_r4.py '0'
cd ..
  5. Generate and test. (We recommend separating generation and testing, because testing is complex and time-consuming, and running it separately makes it easier to check the results.)
# Generate without testing
CUDA_VISIBLE_DEVICES=0 python ./workspace/evaluate/generate.py workspace.downtask.dora_r4 "need_test=False,num_generated=5"

# Test one by one manually.
cd ./dataset/downtask_dora_r4
CUDA_VISIBLE_DEVICES=0 python test.py ./generated/generated_downtask_dora_r4_001.pth
CUDA_VISIBLE_DEVICES=0 python test.py ./generated/generated_downtask_dora_r4_002.pth
CUDA_VISIBLE_DEVICES=0 python test.py ./generated/generated_downtask_dora_r4_003.pth
CUDA_VISIBLE_DEVICES=0 python test.py ./generated/generated_downtask_dora_r4_004.pth
CUDA_VISIBLE_DEVICES=0 python test.py ./generated/generated_downtask_dora_r4_005.pth
cd ../..

Please note that the steps above rely on dataset/downtask_dora_r4/train.py and dataset/downtask_dora_r4/test.py automatically activating the specified conda environment and executing the official dora_llama training and testing shell scripts. For more details, see dataset/downtask_dora_r4/test.py (lines 85-90) and dataset/downtask_dora_r4/train.py (lines 100-105).
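The snippet below sketches that general mechanism (running a shell script inside a named conda environment from Python via conda run); the script name and arguments are placeholders, and the actual commands live in the two files referenced above.

import subprocess

DORA_ROOT = "/ABSOLUTE/path/to/your/DoRA/commonsense_reasoning"  # "dora_root" in dataset/config.json
DORA_ENV = "your_DoRA_conda_environment_name"                    # "dora_env_name" in dataset/config.json

# Execute the official DoRA evaluation script inside its own conda environment.
# "conda run -n <env>" runs a command in that environment without activating it in the current shell.
subprocess.run(
    f"conda run -n {DORA_ENV} sh llama_7B_Dora_eval.sh ./finetuned_result/dora_r32 0",
    shell=True, cwd=DORA_ROOT, check=True,
)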

Adapt your own dataset

In this section, we introduce how to register your own model checkpoints in this framework and use RPG to generate their parameters.

  1. Create a dataset
mkdir ./dataset/your_dataset_name
cd ./dataset/your_dataset_name
  • In this directory, there are three necessary items.
    1. A checkpoint folder is used to store checkpoints used for training. All the pretrained checkpoints should be placed in this folder.
    2. A generated folder is used to store the checkpoints to be generated. Before you start training RPG models, this folder should be empty.
    3. A test.py script tests the specified checkpoint and prints the test results. It accepts a single CLI argument, the path to the checkpoint to be tested, so that the framework can call it automatically (a minimal skeleton is sketched at the end of this step).
  • The structure should be as follows ("..." means there are other files or folders that are not important here):
└─Recurrent-Parameter-Generation
   ├─dataset
   │  ├─your_dataset_name
   │  │  ├─checkpoint
   │  │  │  ├─your_1st_checkpoint_name.pth
   │  │  │  ├─your_2nd_checkpoint_name.pth
   │  │  │  ├─your_3rd_checkpoint_name.pth
   │  │  │  └─...
   │  │  ├─generated
   │  │  ├─test.py
   │  │  └─...
   │  └─...
   └─...
  • Make sure you can run this command properly.
# execute under /path/to/Recurrent-Parameter-Generation/dataset/your_dataset_name
python test.py ./checkpoint/your_1st_checkpoint_name.pth
  • Remember to go back to the root directory.
cd ../..
# You should now be in the Recurrent-Parameter-Generation directory
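For reference, a minimal test.py skeleton might look like the sketch below. Only the CLI contract matters to the framework (take one checkpoint path as an argument and print the result); build_model and evaluate are placeholders you must replace with your own logic.

# dataset/your_dataset_name/test.py -- minimal sketch, not a file from this repository
import sys
import torch

def build_model():
    # TODO: construct the architecture your checkpoints were trained with
    raise NotImplementedError

def evaluate(model):
    # TODO: run your own evaluation and return a metric, e.g. top-1 accuracy
    raise NotImplementedError

if __name__ == "__main__":
    checkpoint_path = sys.argv[1]  # the path passed in by RPG's test_command
    model = build_model()
    model.load_state_dict(torch.load(checkpoint_path, map_location="cpu"))
    model.eval()
    print("test result:", evaluate(model))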
  2. Register your dataset. Write your own dataset class in the dataset/register.py file; it must define three class variables.
vim ./dataset/register.py

####################################### add to the end of register.py #######################################
+ class Your_Dataset_Name(BaseDataset):
+     data_path = "./dataset/your_dataset_name/checkpoint"
+     generated_path = "./dataset/your_dataset_name/generated/generated_model.pth"
+     test_command = f"CUDA_VISIBLE_DEVICES={test_gpu_ids} python ./dataset/your_dataset_name/test.py " + \
+                    "./dataset/your_dataset_name/generated/generated_model.pth"
####################################### add to the end of register.py #######################################
  3. Create your training script. ('your_training_tag' is a name of your choice. See the appendix of our paper for more guidance on choosing the hyperparameters.)
cp ./workspace/example/cifar10_resnet18.py ./workspace/your_training_tag.py
vim ./workspace/your_training_tag.py

######################## on line 43 in your_training_tag.py ########################
- from dataset import Cifar10_ResNet18 as Dataset
+ from dataset import Your_Dataset_Name as Dataset
######################## on line 43 in your_training_tag.py ########################

###################### on line 49-91 in your_training_tag.py #######################
  config = {
      "seed": SEED,
      # dataset setting
      "dataset": Dataset,
-     "dim_per_token": 8192,
+     "dim_per_token": suitable_token_size_for_you,
      "sequence_length": 'auto',
      # train setting
-     "batch_size": 8,
+     "batch_size": suitable_batch_size_for_you,
      "num_workers": 16,
-     "total_steps": 80000,
+     "total_steps": the_number_of_steps_you_want_to_train,
-     "learning_rate": 0.00003,
+     "learning_rate": suitable_learning_rate_for_you,
      "weight_decay": 0.0,
-     "save_every": 80000//30,
+     "save_every": number_of_interval_steps_for_saving_and_testing,
      "print_every": 50,
      "autocast": lambda i: 5000 < i < 45000,
      "checkpoint_save_path": "./checkpoint",
      # test setting
      "test_batch_size": 1,  # fixed, don't change this
      "generated_path": Dataset.generated_path,
      "test_command": Dataset.test_command,
      # to log
      "model_config": {
          "num_permutation": 'auto',
          # mamba config
          "d_condition": 1,
-         "d_model": 8192,
+         "d_model": suitable_token_size_for_you,
          "d_state": 128,
          "d_conv": 4,
          "expand": 2,
          "num_layers": 2,
          # diffusion config
-         "diffusion_batch": 512,
+         "diffusion_batch": suitable_diffusion_batch_for_you,
-         "layer_channels": [1, 32, 64, 128, 64, 32, 1],
+         "layer_channels": suitable_layer_channels_for_you,
          "model_dim": "auto",
          "condition_dim": "auto",
          "kernel_size": 7,
          "sample_mode": DDPMSampler,
          "beta": (0.0001, 0.02),
          "T": 1000,
          "forward_once": True,
      },
-     "tag": "quick_start_cifar10_resnet18",
+     "tag": "your_training_tag",
  }
###################### on line 49-91 in your_training_tag.py #######################
  4. Train the RPG model. ('0' refers to the GPU index)
cd ./workspace
bash launch.sh your_training_tag.py '0'
cd ..
  5. Generate and test.
CUDA_VISIBLE_DEVICES=0 python ./workspace/evaluate/generate.py workspace.your_training_tag

Acknowledgment

We thank Zhiyuan Liang, Zhuang Liu, Gongfan Fang, Xuanlei Zhao, Yuhao Zhou, Mingjia Shi, Zangwei Zheng, Ziheng Qin, Tianlong Chen, and Zhangyang Wang for valuable discussions and feedback. This research is supported by the National Research Foundation, Singapore under its AI Singapore Programme (AISG Award No: AISG2-PhD-2021-08-008).

Citation

@misc{wang2025recurrent,
      title={Recurrent Diffusion for Large-Scale Parameter Generation},
      author={Wang, Kai and Tang, Dongwen and Zhao, Wangbo and You, Yang},
      year={2025},
}
