PyTorch implementation of Infini-Transformer from "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" (https://arxiv.org/abs/2404.07143). Python, updated May 4, 2024.
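For orientation, here is a minimal sketch of the mechanism the repository above implements: per-segment causal softmax attention blended, through a learned sigmoid gate, with retrieval from a compressive linear-attention memory that is carried across segments. This follows the paper's simpler linear memory update; the function name, single-head/no-batch shapes, and the tiny demo at the end are illustrative assumptions, not the repository's API.

```python
import torch
import torch.nn.functional as F

def elu_plus_one(x):
    # Kernel feature map sigma(x) = ELU(x) + 1 from the linear-attention literature.
    return F.elu(x) + 1.0

def infini_attention_segment(q, k, v, memory, z, beta):
    """One Infini-attention segment (single head, no batch dim, for clarity).

    q, k: (seg_len, d_k); v: (seg_len, d_v)
    memory: (d_k, d_v) compressive memory carried across segments
    z: (d_k,) normalization accumulator; beta: scalar mixing gate
    """
    sq, sk = elu_plus_one(q), elu_plus_one(k)

    # Retrieval from memory: A_mem = sigma(Q) M / (sigma(Q) z).
    a_mem = (sq @ memory) / (sq @ z).clamp(min=1e-6).unsqueeze(-1)

    # Ordinary causal softmax attention within the current segment.
    scores = (q @ k.t()) / q.shape[-1] ** 0.5
    causal = torch.triu(torch.ones_like(scores, dtype=torch.bool), diagonal=1)
    a_local = scores.masked_fill(causal, float("-inf")).softmax(dim=-1) @ v

    # Learned sigmoid gate blends long-term memory with local context.
    g = torch.sigmoid(beta)
    out = g * a_mem + (1.0 - g) * a_local

    # Linear memory update: M <- M + sigma(K)^T V, z <- z + sum_t sigma(K_t).
    return out, memory + sk.t() @ v, z + sk.sum(dim=0)

# Tiny demo over two segments of a stream; d_k == d_v here, so q, k, v share a shape.
memory, z, beta = torch.zeros(16, 16), torch.zeros(16), torch.tensor(0.0)
for _ in range(2):
    q, k, v = torch.randn(3, 8, 16).unbind(0)
    out, memory, z = infini_attention_segment(q, k, v, memory, z, beta)
print(out.shape)  # torch.Size([8, 16])
```

Because the memory update is linear in the keys and values, its cost per segment is constant, which is what lets the context grow without the quadratic blow-up of full attention.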
One-stop solutions for Mixture of Experts (MoE) and Mixture of Depths (MoD) modules in PyTorch.
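As background on the MoE half of that description, a minimal top-k gated Mixture-of-Experts feed-forward layer: a router scores each token over the experts, the top-k experts process it, and their outputs are combined with renormalized gate weights. The class name, expert shapes, and hyperparameters below are illustrative assumptions, not the repository's code.

```python
import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    """Minimal top-k gated Mixture-of-Experts feed-forward layer (sketch)."""

    def __init__(self, d_model, d_ff, num_experts=4, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                            # x: (batch, seq, d_model)
        flat = x.reshape(-1, x.shape[-1])            # route per token
        gates = self.router(flat).softmax(dim=-1)    # (tokens, num_experts)
        weights, idx = gates.topk(self.k, dim=-1)    # each token's k experts
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize over chosen experts
        out = torch.zeros_like(flat)
        for e, expert in enumerate(self.experts):
            token_ids, slot = (idx == e).nonzero(as_tuple=True)  # tokens routed to expert e
            if token_ids.numel():
                out[token_ids] += weights[token_ids, slot].unsqueeze(-1) * expert(flat[token_ids])
        return out.reshape_as(x)

layer = TopKMoE(d_model=64, d_ff=256)
print(layer(torch.randn(2, 10, 64)).shape)  # torch.Size([2, 10, 64])
```

Production MoE layers add a load-balancing auxiliary loss and capacity limits per expert; those are omitted here to keep the routing idea visible.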
Open-source Mixture of Depths code, and the official implementation of the paper "Router-Tuning: A Simple and Effective Approach for Enabling Dynamic Depth in Transformers."
Unofficial implementation of Google DeepMind's Mixture of Depths.
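The two entries above center on token-level Mixture of Depths routing: a per-block router scores every token, only the top capacity fraction passes through the block, and the remaining tokens skip it on the residual stream. A hedged sketch of that routing follows; the wrapper name, the 12.5% default capacity, and the sigmoid scaling of the block output are illustrative choices (the scaling keeps the routing decision differentiable), not any of these repositories' code. Note the inner block here is a plain MLP, so token order inside the selected subset does not matter.

```python
import torch
import torch.nn as nn

class MoDBlock(nn.Module):
    """Mixture-of-Depths wrapper: only the top-capacity fraction of tokens
    passes through the inner block; the rest take the residual path (sketch)."""

    def __init__(self, block, d_model, capacity=0.125):
        super().__init__()
        self.block = block                     # any (tokens, d_model) -> (tokens, d_model) module
        self.router = nn.Linear(d_model, 1)
        self.capacity = capacity

    def forward(self, x):                      # x: (batch, seq, d_model)
        b, s, d = x.shape
        scores = self.router(x).squeeze(-1)    # (batch, seq) routing logits
        k = max(1, int(s * self.capacity))
        top_scores, top_idx = scores.topk(k, dim=-1)   # per-sequence top-k tokens

        gather_idx = top_idx.unsqueeze(-1).expand(-1, -1, d)
        picked = torch.gather(x, 1, gather_idx)
        # Scale the block output by the router score so routing stays differentiable.
        processed = self.block(picked) * torch.sigmoid(top_scores).unsqueeze(-1)

        out = x.clone()                        # skipped tokens ride the residual stream
        out.scatter_add_(1, gather_idx, processed)     # routed tokens: x + g(r) * f(x)
        return out

# Hypothetical usage with a simple MLP standing in for a transformer block.
mlp = nn.Sequential(nn.Linear(64, 256), nn.GELU(), nn.Linear(256, 64))
layer = MoDBlock(mlp, d_model=64, capacity=0.25)
print(layer(torch.randn(2, 32, 64)).shape)  # torch.Size([2, 32, 64])
```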