A PyTorch implementation of Hutchins & Schlag et al. Owes a great deal to Phil Wang's x-transformers. Very much a work in progress.
Dockerfile, requirements.txt, and environment.yaml because I love chaos.
- Keys and values are not shared between the "vertical" and "horizontal" directions (the standard input -> output information flow and the recurrent state flow, respectively).
- The state vectors are augmented with Rotary Embeddings for positional encoding, instead of using learned embeddings.
- The special LSTM gate initialization is not yet implemented.
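
To make the rotary embedding point above concrete, here is a minimal sketch of rotating a set of state vectors by position-dependent angles. It is not this repository's code: the function names `rotary_angles` and `apply_rotary`, the tensor sizes, and the base of 10000 are all assumptions chosen for illustration.

```python
import torch


def rotary_angles(seq_len: int, dim: int, base: float = 10000.0) -> torch.Tensor:
    """Return rotation angles of shape (seq_len, dim // 2). Hypothetical helper."""
    # One frequency per pair of features, decaying geometrically with the pair index.
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    positions = torch.arange(seq_len).float()
    return torch.outer(positions, inv_freq)


def apply_rotary(x: torch.Tensor, angles: torch.Tensor) -> torch.Tensor:
    """Rotate consecutive feature pairs of x (..., seq_len, dim) by the given angles."""
    x1, x2 = x[..., 0::2], x[..., 1::2]
    cos, sin = angles.cos(), angles.sin()
    # Standard 2D rotation applied to each (x1, x2) pair, then re-interleaved.
    rotated = torch.stack((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1)
    return rotated.flatten(-2)


# Assumed sizes for illustration: batch of 2, 8 state vectors, 16 features each.
state = torch.randn(2, 8, 16)
rotated_state = apply_rotary(state, rotary_angles(seq_len=8, dim=16))
```

Because each pair of features is only rotated, the norm of every state vector is unchanged, and the vector at position 0 is left as-is. Unlike learned positional embeddings, this adds no trainable parameters.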