
Training AutoRegressive Transformers

More capable than nanoGPT, just as much fun! This is my library for experimenting with transformers. I am particularly interested in exploring byte-level, tokenizer-free, hierarchical autoregressive models.

Uses FlashAttention v2 for maximum speed.
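
For illustration only (a minimal sketch assuming the flash-attn package's `flash_attn_func`, not this repo's actual wrapper code), a causal attention call routed through FlashAttention v2 looks roughly like this:

```python
import torch
from flash_attn import flash_attn_func

# flash_attn_func expects (batch, seqlen, nheads, headdim) tensors in fp16/bf16 on GPU.
batch, seqlen, nheads, headdim = 2, 1024, 8, 64
q = torch.randn(batch, seqlen, nheads, headdim, device="cuda", dtype=torch.bfloat16)
k = torch.randn_like(q)
v = torch.randn_like(q)

out = flash_attn_func(q, k, v, causal=True)  # same shape as q
```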

Included model types:

  • gpt2: Vanilla gpt2 architecture
  • ibt (improved baseline transformer): achieves lower validation loss than vanilla gpt2 with similar compute and memory requirements by incorporating recent techniques such as rotary embeddings, time shifting, GeGLU, improved initialization, RMSNorm, and sliding window attention (see the first sketch after this list).
  • hourglass (WIP): hourglass transformers for efficient character-level modeling with a long context window (see the second sketch after this list). Still working to match the performance/compute/memory of the models above.
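
To make the ibt ingredient list concrete, here is a minimal, self-contained sketch (illustrative only, not this repo's code; all module and argument names are assumptions) of a pre-norm block combining RMSNorm, a GeGLU feed-forward, and rotary position embeddings in causal self-attention. Time shifting, sliding window attention, and the custom initialization are omitted:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        # Scale by the reciprocal root-mean-square of the features (no mean subtraction).
        return x * x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt() * self.weight


class GeGLU(nn.Module):
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.proj = nn.Linear(dim, 2 * hidden, bias=False)
        self.out = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        # Split the up-projection into a value path and a GELU gate.
        a, gate = self.proj(x).chunk(2, dim=-1)
        return self.out(a * F.gelu(gate))


def apply_rotary(x, base: float = 10000.0):
    # Rotary position embeddings over (batch, heads, seq, head_dim), rotate-halves convention.
    *_, t, d = x.shape
    half = d // 2
    freqs = base ** (-torch.arange(half, device=x.device, dtype=x.dtype) / half)
    angles = torch.arange(t, device=x.device, dtype=x.dtype)[:, None] * freqs[None, :]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)


class IBTBlock(nn.Module):
    def __init__(self, dim: int, heads: int):
        super().__init__()
        self.heads = heads
        self.norm1, self.norm2 = RMSNorm(dim), RMSNorm(dim)
        self.qkv = nn.Linear(dim, 3 * dim, bias=False)
        self.proj = nn.Linear(dim, dim, bias=False)
        self.ff = GeGLU(dim, 4 * dim)

    def forward(self, x):
        b, t, d = x.shape
        q, k, v = self.qkv(self.norm1(x)).chunk(3, dim=-1)
        # (batch, seq, dim) -> (batch, heads, seq, head_dim)
        q, k, v = (z.view(b, t, self.heads, d // self.heads).transpose(1, 2) for z in (q, k, v))
        q, k = apply_rotary(q), apply_rotary(k)
        attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)  # causal self-attention
        x = x + self.proj(attn.transpose(1, 2).reshape(b, t, d))
        return x + self.ff(self.norm2(x))
```

As a smoke test, `IBTBlock(dim=512, heads=8)(torch.randn(2, 128, 512))` returns a tensor of the same shape as its input.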
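And a rough sketch of the shorten → process → upsample pattern that hourglass transformers use so that most layers run on a downsampled sequence (again illustrative assumptions, not the repo's implementation):

```python
import torch
import torch.nn as nn


class HourglassStage(nn.Module):
    def __init__(self, dim: int, factor: int, inner: nn.Module):
        super().__init__()
        self.factor = factor
        self.inner = inner  # e.g. a stack of transformer blocks run at the shortened length

    def forward(self, x):
        b, t, d = x.shape
        assert t % self.factor == 0, "pad the sequence to a multiple of the shortening factor"
        # Shorten: merge each group of `factor` positions by mean pooling.
        short = x.view(b, t // self.factor, self.factor, d).mean(dim=2)
        short = self.inner(short)
        # Upsample: repeat each shortened position back out and add a residual so
        # fine-grained (character-level) detail from the full-resolution path is kept.
        # Note: a real autoregressive version also shifts the pooled sequence so that
        # no position can see future tokens through the shortened path.
        up = short.repeat_interleave(self.factor, dim=1)
        return x + up
```

For example, `HourglassStage(dim=512, factor=4, inner=nn.Identity())(torch.randn(2, 1024, 512))` processes a length-1024 sequence while the inner module only ever sees length 256.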
