# Train a sparse autoencoder using your laptop on this TinyStories model
Docs are here. The rest of this readme is software engineering details.
## uv

This repo uses uv for packaging:

- Install it with `curl -LsSf https://astral.sh/uv/install.sh | sh`
- Run scripts using `uv run`, e.g. `uv run src/tiny_stories_sae/train_sae.py -h`. The first time you call uv, it will download all the necessary dependencies.
## Docker

uv doesn't work well on machines that don't follow the Filesystem Hierarchy Standard (e.g. NixOS). To run uv in this case, use the provided Dockerfile:

- Build the image with `./build.sh`
- Enter the container with `./run.sh`. If you have GPUs, instead use `./run.sh --gpus all`
- To mount a results directory, use `./run.sh -v /absolute/host/path/to/results/:/results`
- Then inside the container you can run `uv run ...` as before
## train_sae.py

Trains a sparse autoencoder on activations from the language model `roneneldan/TinyStories-33M`.

Example usage:

```bash
uv run src/tiny_stories_sae/train_sae.py \
    --cuda --l1_coefficient 50 \
    --sae_hidden_dim 10000 --max_step 105000
```
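
For orientation, here is a minimal sketch of the kind of model and loss such a script trains: a one-layer autoencoder with a ReLU bottleneck and an L1 penalty on the hidden activations (the penalty weight corresponding to `--l1_coefficient` above). The class and function names are illustrative, not the repo's actual code.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Illustrative sparse autoencoder; names don't match the repo's code."""

    def __init__(self, activation_dim: int, sae_hidden_dim: int):
        super().__init__()
        self.encoder = nn.Linear(activation_dim, sae_hidden_dim)
        self.decoder = nn.Linear(sae_hidden_dim, activation_dim)

    def forward(self, llm_activations: torch.Tensor):
        features = torch.relu(self.encoder(llm_activations))  # sparse codes
        return self.decoder(features), features

def sae_loss(acts, reconstruction, features, l1_coefficient: float):
    # Reconstruction error plus an L1 term that pushes features toward zero,
    # so only a few features fire on any given activation.
    mse = ((reconstruction - acts) ** 2).mean()
    return mse + l1_coefficient * features.abs().mean()
```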
## steer.py

This script uses the LM `roneneldan/TinyStories-33M` to generate text, but it adds a fixed vector (one of the autoencoder's features) to the activations, skewing the generated text towards a certain topic.

Example usage:

```bash
uv run src/tiny_stories_sae/steer.py \
    --checkpoint path/to/sparse_autoencoder.pt \
    --which_feature 1 --sae_hidden_dim 10000 \
    --feature_strength 5 --cuda
```
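
A minimal sketch of the underlying mechanism, under the assumption (mine, not necessarily the repo's) that steering is implemented as a PyTorch forward hook on one transformer block. The layer index and the random stand-in for a real SAE decoder direction are placeholders.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roneneldan/TinyStories-33M")
model = AutoModelForCausalLM.from_pretrained("roneneldan/TinyStories-33M")

# Stand-in steering direction; in the real script this would be one of the
# trained autoencoder's decoder vectors.
steering_vector = torch.randn(model.config.hidden_size)
feature_strength = 5.0

def add_feature(module, inputs, output):
    # GPT-Neo blocks return a tuple; the hidden states are the first element.
    hidden = output[0] + feature_strength * steering_vector
    return (hidden,) + output[1:]

# Hook a middle transformer block (the index here is an arbitrary choice).
handle = model.transformer.h[2].register_forward_hook(add_feature)
ids = tokenizer("Once upon a time", return_tensors="pt")
print(tokenizer.decode(model.generate(**ids, max_new_tokens=40)[0]))
handle.remove()
```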
## gather_high_activations.py

This runs the LM on the validation set and tracks how strongly the various autoencoder features activate. It saves a list of the validation examples that made the features activate the most.

Example usage:

```bash
uv run src/tiny_stories_sae/gather_high_activations.py \
    --checkpoint path/to/sparse_autoencoder.pt \
    --cuda --sae_hidden_dim 10000
```
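
A sketch of the bookkeeping this kind of pass needs: keep, per feature, a small min-heap of the examples with the highest activation seen so far. The names and the top-k size are illustrative; the real script's data structures and output format may differ.

```python
import heapq

sae_hidden_dim = 10000
top_k = 10
# One min-heap of (max_activation, example_text) per feature.
strongest: list[list[tuple[float, str]]] = [[] for _ in range(sae_hidden_dim)]

def record(feature_idx: int, activation: float, example: str) -> None:
    heap = strongest[feature_idx]
    if len(heap) < top_k:
        heapq.heappush(heap, (activation, example))
    elif activation > heap[0][0]:
        # Evict the weakest of the current top-k examples.
        heapq.heapreplace(heap, (activation, example))
```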
## gather_high_activations_llm.py

The same as gather_high_activations.py, except it instead tracks how strongly the LM's neurons activate.

Example usage:

```bash
uv run src/tiny_stories_sae/gather_high_activations_llm.py \
    --cuda --output_file path/to/log.json --make_positive_by abs
```
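
Unlike post-ReLU SAE features, raw LM neuron activations can be negative, so some sign convention is needed before ranking examples. My reading of `--make_positive_by abs` (an assumption, not confirmed by the source) is roughly:

```python
import torch

def make_positive(acts: torch.Tensor, how: str = "abs") -> torch.Tensor:
    # Assumed semantics: "abs" folds signed neuron activations into
    # nonnegative strengths so they can be ranked like SAE features.
    if how == "abs":
        return acts.abs()
    raise ValueError(f"unknown mode: {how}")
```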
## call_openai.py

Given a log file produced by gather_high_activations.py or gather_high_activations_llm.py, this script sends the examples to GPT-4 and asks it, for each feature/neuron separately, to look for a pattern and judge how clear that pattern is. This requires an `OPENAI_API_KEY` in `.env`.

Example usage:

```bash
uv run src/tiny_stories_sae/call_openai.py \
    --feature_lower 0 --feature_upper 100 \
    --path_to_feature_strengths path/to/log.json
```
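
A sketch of the per-feature GPT-4 call; the prompt wording, model name, and use of python-dotenv are assumptions, not copied from the repo.

```python
import os
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()  # reads OPENAI_API_KEY from .env
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

def rate_feature(examples: list[str]) -> str:
    # Hypothetical prompt: show the top-activating snippets for one feature
    # and ask for a pattern plus a clearness rating.
    prompt = (
        "Here are text snippets that strongly activate one feature:\n\n"
        + "\n---\n".join(examples)
        + "\n\nDescribe the common pattern and rate its clearness from 1 to 5."
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```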
## plot.py and combined_plot.py

Given GPT-4's ratings, these scripts plot them. combined_plot.py can graph multiple sets of ratings in different colors.

Example usage:

```bash
uv run src/tiny_stories_sae/plot.py \
    --response_json results/gpt4_api/20241001-123456 \
    --xlabel "Clearness (5 is most clear)" \
    --title "GPT-4o's ranking of 100 sparse autoencoder features"
```

and

```bash
uv run src/tiny_stories_sae/combined_plot.py \
    --response_jsons results/1.json results/2.json \
    --labels "1" "2"
```
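
For a sense of what the plotting step amounts to, here is a minimal histogram of 1-to-5 clearness ratings. The JSON structure assumed here (a flat list of integer ratings) and the file path are guesses, not the repo's actual format.

```python
import json
import matplotlib.pyplot as plt

with open("results/ratings.json") as f:  # hypothetical path and format
    ratings = json.load(f)  # assumed: a list of ints in 1..5

plt.hist(ratings, bins=range(1, 7), align="left", rwidth=0.8)
plt.xlabel("Clearness (5 is most clear)")
plt.ylabel("Number of features")
plt.title("GPT-4o's ranking of sparse autoencoder features")
plt.savefig("ratings_hist.png")
```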
## Tests

Run the test suite with:

```bash
uv run pytest tests
```