The Weights Transfer Playground is an experimental platform designed for cross-architecture analysis and weight transfer between diverse neural network models. This project facilitates the exploration of transferring weights from one architecture to another, enabling comparative studies and enhancing model performance through knowledge sharing.
- Cross-Architecture Weight Transfer: Transfer weights between different neural network architectures such as Transformers, LSTMs, Mamba, and LiquidS4.
- Unified Evaluation Framework: Evaluate multiple model architectures on synthetic and real-world datasets with comprehensive metrics.
- Visualization Tools: Visualize the convergence and differences in weights across various checkpoints and architectures using PCA and graph-based methods.
- Extensible Architecture: Easily integrate additional models and extend existing transfer mechanisms to accommodate new architectures.
- Automated Scripts: Utilize scripts for weight extraction, conversion, evaluation, and visualization to streamline experimentation.
To set up the Weights Transfer Playground, ensure you have Python 3.8+ installed. The project relies on several Python packages, including PyTorch and Hugging Face's Transformers library.
git clone https://github.com/your-username/weights-transfer-playground.git
cd weights-transfer-playground
You can install the required packages using pip:
pip install torch transformers matplotlib networkx scikit-learn fire numpy
Alternatively, if you prefer using a requirements.txt, create the file with the necessary dependencies and install them:
pip install -r requirements.txt
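If you go the requirements.txt route, a file matching the packages listed above would look like:

```text
torch
transformers
matplotlib
networkx
scikit-learn
fire
numpy
```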
Note: Ensure you have access to a GPU for efficient model training and inference.
The main.py script demonstrates the transfer of weights from a pretrained LLaMA model to various target architectures. It processes a sample input text and compares outputs between the transferred models and the original LLaMA model.
python main.py
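Conceptually, the transfer step boils down to copying every parameter whose name and shape exist in both models. The sketch below illustrates this on two toy modules; the class names and the `transfer_shared_weights` helper are hypothetical stand-ins, not the repository's actual API:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class TinySource(nn.Module):
    """Toy stand-in for a pretrained source model (e.g. LLaMA)."""
    def __init__(self, vocab=32, dim=16):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.proj = nn.Linear(dim, vocab)

class TinyTarget(nn.Module):
    """Toy stand-in for a target architecture sharing some parameter shapes."""
    def __init__(self, vocab=32, dim=16):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.mixer = nn.Linear(dim, dim)   # extra block the source lacks
        self.proj = nn.Linear(dim, vocab)

def transfer_shared_weights(src, dst):
    """Copy parameters whose names and shapes match; leave the rest untouched."""
    src_state = src.state_dict()
    dst_state = dst.state_dict()
    copied = []
    for name, tensor in src_state.items():
        if name in dst_state and dst_state[name].shape == tensor.shape:
            dst_state[name] = tensor.clone()
            copied.append(name)
    dst.load_state_dict(dst_state)
    return copied

src, dst = TinySource(), TinyTarget()
copied = transfer_shared_weights(src, dst)
print(copied)  # the shared embedding and projection tensors
```

After the transfer, the shared tensors are bitwise identical across the two models, while the target-only `mixer` block keeps its random initialization.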
To convert a LLaMA model to a different architecture (e.g., LiquidS4), use the corresponding script in the convert/ directory:
python convert/convert_llama_to_liquid_s4.py
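Where parameter names differ between the two architectures, a conversion script typically applies an explicit key-renaming map before loading the state dict. The sketch below shows that pattern only; the key names in `RENAME` are illustrative and do not match the real LLaMA or LiquidS4 layouts:

```python
import numpy as np

# Illustrative mapping from hypothetical source keys to hypothetical target keys.
RENAME = {
    "layers.0.attn.q_proj.weight": "layers.0.ssm.in_proj.weight",
    "layers.0.mlp.up_proj.weight": "layers.0.ssm.out_proj.weight",
}

def convert_state(src_state, rename=RENAME):
    """Return a new state dict with source keys mapped to target names."""
    return {rename.get(key, key): value for key, value in src_state.items()}

src = {name: np.zeros((4, 4)) for name in RENAME}
dst = convert_state(src)
print(sorted(dst))
```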
Evaluate different model architectures on synthetic datasets using the unified evaluation script:
python scripts/unifiedeval.py
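The evaluation idea is to run every architecture through the same synthetic task and compare a common metric. A minimal sketch of that loop, with a made-up copy task and two trivial "models" standing in for the real architectures:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_copy_task(n=256, length=8, vocab=4):
    """Synthetic sequence task: the target is the input's first token."""
    x = rng.integers(0, vocab, size=(n, length))
    y = x[:, 0]
    return x, y

def evaluate(models, x, y):
    """Score each named model (a callable returning predictions) by accuracy."""
    return {name: float((model(x) == y).mean()) for name, model in models.items()}

x, y = make_copy_task()
models = {
    "first_token": lambda x: x[:, 0],   # solves the task exactly
    "last_token": lambda x: x[:, -1],   # chance-level baseline
}
scores = evaluate(models, x, y)
print(scores)
```

The real unifiedeval.py presumably plugs trained UnifiedModel instances into the same kind of loop rather than these toy callables.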
Visualize the convergence of model weights across checkpoints or view the morphism graph:
python eval/view_checkpoints.py
python eval/view_graph.py
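The weight-convergence view amounts to flattening each checkpoint's parameters into a vector and projecting the stack of vectors to 2-D. A dependency-light sketch using an SVD-based projection (the repository's scripts use scikit-learn's PCA, which computes the same components) on synthetic checkpoint vectors:

```python
import numpy as np

rng = np.random.default_rng(0)

# Fake "checkpoints": flattened weight vectors drifting toward a common point,
# standing in for weights loaded from checkpoints/checkpoint_epoch_*.pth.
target = rng.normal(size=64)
checkpoints = np.stack(
    [target + rng.normal(scale=1.0 / (i + 1), size=64) for i in range(10)]
)

# 2-D PCA via SVD on the mean-centered matrix.
centered = checkpoints - checkpoints.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
coords = centered @ vt[:2].T  # (n_checkpoints, 2) projection, ready to plot

print(coords.shape)
```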
weights-transfer-playground/
├── README.md
├── main.py
├── llamaconfig
├── scripts/
│ ├── compute_distances.py
│ ├── dataset.py
│ ├── define_morphisms.py
│ ├── eval.py
│ ├── extract_weights.py
│ ├── unifiedeval.py
│ ├── unifiedmodel.py
│ └── view_graph.py
├── convert/
│ ├── convert_llama_to_liquid_s4.py
│ ├── convert_refined_conversion.py
│ ├── convert_llamatomamba.py
│ └── convert_llamatotransformer.py
├── eval/
│ ├── unifiedeval.py
│ ├── view_checkpoints.py
│ └── view_graph.py
├── checkpoints/
│ ├── checkpoint_epoch_100.pth
│ ├── checkpoint_epoch_500.pth
│ └── checkpoint_epoch_1000.pth
├── extracted_weights.npy
├── model_graph.gpickle
└── .gitignore
- README.md: Provides an overview and instructions for the project.
- main.py: The main script that demonstrates weight transfer between models.
- llamaconfig: Contains configuration files for the LLaMA model.
- scripts/:
  - compute_distances.py: Computes Fisher Information for each checkpoint.
  - dataset.py: Defines synthetic sequence datasets for training and evaluation.
  - define_morphisms.py: Defines morphisms for the model graph.
  - eval.py: Evaluates different models on test datasets.
  - extract_weights.py: Extracts weights from model checkpoints.
  - unifiedeval.py: Unified evaluation script.
  - unifiedmodel.py: Defines the UnifiedModel integrating various architectures.
  - view_graph.py: Script for viewing the morphism graph.
- convert/: Contains the model conversion scripts; additional converters can be added as needed.
  - convert_llama_to_liquid_s4.py: Converts a LLaMA model to a LiquidS4 model.
  - convert_refined_conversion.py: Refines the weight transfer process between models.
  - convert_llamatomamba.py: Converts a LLaMA model to the Mamba architecture.
  - convert_llamatotransformer.py: Converts a LLaMA model to a Transformer architecture.
- eval/:
  - unifiedeval.py: Evaluates multiple model architectures on synthetic datasets.
  - view_checkpoints.py: Extracts and visualizes weights from model checkpoints.
  - view_graph.py: Visualizes the morphism graph between checkpoints.
  - visualize_weight_space.py: Visualizes the convergence of architecture weights using PCA.
- checkpoints/: Stores model checkpoint files for different training epochs.
- extracted_weights.npy: NumPy file containing extracted weights from checkpoints.
- model_graph.gpickle: Pickle file storing the morphism graph.
- .gitignore: Specifies files and directories to ignore in Git.
Contributions are welcome! Please follow these steps to contribute:
- Fork the Repository: Click the fork button at the top right of this page.
- Clone Your Fork:
git clone https://github.com/peytontolbert/weights-transfer-playground.git
cd weights-transfer-playground
- Create a Branch:
git checkout -b feature/YourFeatureName
- Make Your Changes.
- Commit Your Changes:
git commit -m "Add some feature"
- Push to Your Fork:
git push origin feature/YourFeatureName
- Open a Pull Request: Navigate to the original repository and open a pull request.
Please ensure your contributions adhere to the project’s coding standards and include appropriate tests.
- LLaMA by Facebook Research for the foundational model.
- PyTorch for the deep learning framework.
- Hugging Face Transformers for model and tokenizer utilities.
- NetworkX and Matplotlib for graphing and visualization tools.
- Scikit-learn for evaluation metrics and PCA.
- Fire for creating CLIs.
Feel free to reach out with any questions or suggestions!