Uncover hidden geometry in Transformers
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
To reproduce the Llama2 results, download the Llama2 checkpoints and place them under src/llama2-ckpt.
Run the following commands to prepare a subset of the corpus (choose from openwebtext, wikitext, github).
Edit the DATASETS entry in scripts/data_corpus.sh to specify which corpus to use. The generated datasets will be placed under src/data/.
chmod +x ./scripts/data_corpus.sh
./scripts/data_corpus.sh
Run the following commands to perform the decompositions. Edit the DATASETS entry in scripts/data_corpus.sh to specify the corpus or the models.
chmod +x ./scripts/decompositions.sh
./scripts/decompositions.sh
Run the following command to train NanoGPT (6L6H384D) on the character-level Shakespeare dataset.
python src/train.py conf/train_nanogpt.py
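The shorthand 6L6H384D is read here as 6 layers, 6 attention heads, and an embedding width of 384. That reading is an assumption based on common usage, not something this README states. Under that assumption, a minimal sketch for decoding such a tag and deriving the per-head dimension might look like the following; ModelShape and parse_shape are hypothetical names and are not part of this repository.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelShape:
    """Hypothetical container for a tag such as '6L6H384D' (not a repo class)."""
    n_layer: int  # number of Transformer blocks
    n_head: int   # attention heads per block
    n_embd: int   # embedding (model) width

    @property
    def head_dim(self) -> int:
        # Each head works on an equal slice of the embedding width.
        return self.n_embd // self.n_head


def parse_shape(tag: str) -> ModelShape:
    """Parse '<layers>L<heads>H<width>D'; this format is assumed, not documented."""
    match = re.fullmatch(r"(\d+)L(\d+)H(\d+)D", tag)
    if match is None:
        raise ValueError(f"unrecognized shape tag: {tag!r}")
    n_layer, n_head, n_embd = (int(g) for g in match.groups())
    if n_head == 0 or n_embd % n_head != 0:
        raise ValueError("embedding width must be divisible by the head count")
    return ModelShape(n_layer, n_head, n_embd)


if __name__ == "__main__":
    shape = parse_shape("6L6H384D")
    print(shape, "head_dim =", shape.head_dim)
```

The same helper applies to the 8L8H512D tag used for the other training runs.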
Run the following commands to train a Transformer (8L8H512D) with various levels of input randomization.
chmod +x scripts/train_randomization.sh
./scripts/train_randomization.sh
Run the following commands to train a Transformer (8L8H512D) on an arithmetic task.
chmod +x scripts/train_addition.sh
./scripts/train_addition.sh
The notebooks/ directory contains Jupyter notebooks that reproduce the figures and tables in the paper. Be aware that they may depend on datasets or checkpoints that must be generated in advance.
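Before opening a notebook, it can help to confirm that the directories named in the steps above exist and are not empty. The following is a minimal sketch, assuming it is run from the repository root. The function name missing_prerequisites is hypothetical, and the check only looks at src/llama2-ckpt and src/data; it does not know which notebook needs which artifact, nor where training checkpoints are written.

```python
from pathlib import Path

# Paths taken from the setup steps in this README; adjust if your layout differs.
PREREQUISITES = {
    "Llama2 checkpoints": Path("src/llama2-ckpt"),
    "prepared corpora": Path("src/data"),
}


def missing_prerequisites(root=".", prerequisites=PREREQUISITES):
    """Return labels of prerequisite directories that are absent or empty."""
    missing = []
    for label, relative in prerequisites.items():
        path = Path(root) / relative
        # A directory counts as ready only if it exists and holds at least one entry.
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(label)
    return missing


if __name__ == "__main__":
    for label in missing_prerequisites():
        print(f"missing or empty: {label}")
```

An empty result only means the directories are populated; it does not verify that the files inside are complete.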