TinyStories SAE

Train a sparse autoencoder on your laptop, using activations from the TinyStories language model roneneldan/TinyStories-33M.

Docs are here. The rest of this readme is software engineering details.

Installation

With uv

This repo uses uv for packaging:

Install with curl -LsSf https://astral.sh/uv/install.sh | sh
Run scripts using uv run, e.g. uv run src/tiny_stories_sae/train_sae.py -h. The first time you call uv, it will download all the necessary dependencies.

With docker

uv doesn't work well on machines that don't follow the Filesystem Hierarchy Standard (e.g. NixOS). To run uv in this case, use the provided Dockerfile:

Build the image with ./build.sh
Enter the container with ./run.sh. If you have GPUs, instead use ./run.sh --gpus all
To mount a results directory, use ./run.sh -v /absolute/host/path/to/results/:/results
Then inside the container you can run uv run ... as before

Available scripts

train_sae.py

Trains a sparse autoencoder on activations from the language model roneneldan/TinyStories-33M.

Example usage:

uv run src/tiny_stories_sae/train_sae.py \
  --cuda --l1_coefficient 50 \
  --sae_hidden_dim 10000 --max_step 105000
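
To make the two main flags concrete, here is a minimal pure-Python sketch of a generic sparse autoencoder loss. It is not taken from train_sae.py: the ReLU encoder, the mean-squared reconstruction error, and the helper names (sae_loss, matvec) are assumptions chosen for illustration. Under those assumptions, --sae_hidden_dim corresponds to the number of features and --l1_coefficient to the weight on the sparsity penalty.

```python
import random


def relu(x):
    return x if x > 0.0 else 0.0


def matvec(matrix, vector):
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def sae_loss(activation, w_enc, b_enc, w_dec, b_dec, l1_coefficient):
    """Return (loss, features) for one activation vector (hypothetical helper)."""
    # Encoder: the ReLU keeps every feature non-negative.
    pre = matvec(w_enc, activation)
    features = [relu(p + b) for p, b in zip(pre, b_enc)]
    # Decoder: reconstruct the activation from the features.
    recon = [r + b for r, b in zip(matvec(w_dec, features), b_dec)]
    mse = sum((r - a) ** 2 for r, a in zip(recon, activation)) / len(activation)
    # Features are >= 0, so their plain sum is the L1 norm.
    l1 = sum(features)
    return mse + l1_coefficient * l1, features


# Toy sizes; the example command above uses a hidden dimension of 10000.
random.seed(0)
d_model, hidden = 4, 8
w_enc = [[random.gauss(0, 0.1) for _ in range(d_model)] for _ in range(hidden)]
w_dec = [[random.gauss(0, 0.1) for _ in range(hidden)] for _ in range(d_model)]
loss, features = sae_loss(
    [1.0, -0.5, 0.25, 2.0], w_enc, [0.0] * hidden, w_dec, [0.0] * d_model,
    l1_coefficient=50,
)
```

A larger coefficient pushes more features to exactly zero at the cost of a worse reconstruction.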

steer.py

This script uses the LM roneneldan/TinyStories-33M to generate text, but it adds a fixed vector (one of the autoencoder's features) to the activations, skewing the text responses towards a certain topic.

Example usage:

uv run src/tiny_stories_sae/steer.py \
  --checkpoint path/to/sparse_autoencoder.pt \
  --which_feature 1 --sae_hidden_dim 10000 \
  --feature_strength 5 --cuda
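
The steering idea can be sketched in a few lines of pure Python. This is an illustration, not the code in steer.py: the function name steer, the choice to add the vector at every token position, and the absence of any normalization are all assumptions.

```python
def steer(activations, feature_direction, feature_strength):
    """Add feature_strength * feature_direction to each token's activation.

    activations: one vector (list of floats) per token.
    feature_direction: one vector, e.g. a decoder direction of the autoencoder.
    """
    return [
        [a + feature_strength * d for a, d in zip(token_activation, feature_direction)]
        for token_activation in activations
    ]


# Two tokens with 3-dimensional activations (toy numbers).
activations = [[0.0, 1.0, 2.0], [1.0, 1.0, 1.0]]
direction = [1.0, 0.0, -1.0]
steered = steer(activations, direction, feature_strength=5)
print(steered)  # [[5.0, 1.0, -3.0], [6.0, 1.0, -4.0]]
```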

gather_high_activations.py

This runs the LM on the validation set and tracks how strongly the various autoencoder features activate. It saves a list of validation examples that made the features activate the most.

Example usage:

uv run src/tiny_stories_sae/gather_high_activations.py \
  --checkpoint path/to/sparse_autoencoder.pt \
  --cuda --sae_hidden_dim 10000
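
One way to keep only the strongest examples per feature is a bounded min-heap. The sketch below shows that technique with Python's standard heapq module. The class name TopExamples and its methods are hypothetical, and the script itself may track activations differently.

```python
import heapq


class TopExamples:
    """Keep the k most strongly activating examples per feature (hypothetical helper)."""

    def __init__(self, k):
        self.k = k
        self.heaps = {}  # feature index -> min-heap of (strength, text)

    def update(self, feature_index, strength, text):
        heap = self.heaps.setdefault(feature_index, [])
        if len(heap) < self.k:
            heapq.heappush(heap, (strength, text))
        elif strength > heap[0][0]:
            # The weakest kept example sits at heap[0]; replace it.
            heapq.heapreplace(heap, (strength, text))

    def strongest(self, feature_index):
        """Return the kept examples, strongest first."""
        return sorted(self.heaps.get(feature_index, []), reverse=True)


tracker = TopExamples(k=2)
tracker.update(0, 0.1, "a")
tracker.update(0, 0.9, "b")
tracker.update(0, 0.5, "c")
print(tracker.strongest(0))  # [(0.9, 'b'), (0.5, 'c')]
```

Each update costs O(log k), so memory stays bounded however large the validation set is.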

gather_high_activations_llm.py

The same as gather_high_activations.py, except it instead tracks how strongly the LM's neurons activate.

Example usage:

uv run src/tiny_stories_sae/gather_high_activations_llm.py \
  --cuda --output_file path/to/log.json --make_positive_by abs

call_openai.py

Given a log file produced by gather_high_activations.py or gather_high_activations_llm.py, this script sends the examples to GPT-4, and asks GPT-4 to look for a pattern (for each feature/neuron separately) and judge how clear the pattern is. This requires an OPENAI_API_KEY in .env.

Example usage:

uv run src/tiny_stories_sae/call_openai.py \
  --feature_lower 0 --feature_upper 100 \
  --path_to_feature_strengths path/to/log.json

plot.py and combined_plot.py

Given GPT-4's ratings, these scripts plot them. combined_plot.py can graph multiple sets of ratings in different colors.

Example usage:

uv run src/tiny_stories_sae/plot.py \
  --response_json results/gpt4_api/20241001-123456 \
  --xlabel "Clearness (5 is most clear)" \
  --title "GPT-4o's ranking of 100 sparse autoencoder features"

and

uv run src/tiny_stories_sae/combined_plot.py \
  --response_jsons results/1.json results/2.json \
  --labels "1" "2"

Running tests

uv run pytest tests
