Feature request
See the technique described here and here. The essence is to use a JAX scan instead of a Python loop to iterate over layers that share the same structure.
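To make the request concrete, here is a minimal sketch of the pattern using flax.linen.scan. The Block and ScannedStack module names, the feature sizes, and the number of layers are all illustrative and not taken from the Transformers codebase:

```python
import flax.linen as nn
import jax
import jax.numpy as jnp


class Block(nn.Module):
    features: int

    @nn.compact
    def __call__(self, x, _):
        # A single "layer": a residual MLP block; every layer has this structure.
        h = nn.Dense(self.features)(x)
        x = x + nn.relu(h)
        return x, None  # (carry, per-step output)


class ScannedStack(nn.Module):
    features: int
    num_layers: int

    @nn.compact
    def __call__(self, x):
        # nn.scan stacks each layer's parameters along a leading "layer" axis
        # and scans the layer body, so it is traced once instead of num_layers times.
        ScanBlock = nn.scan(
            Block,
            variable_axes={"params": 0},   # give params a leading layer axis
            split_rngs={"params": True},   # independent init RNG per layer
            length=self.num_layers,
        )
        x, _ = ScanBlock(self.features)(x, None)
        return x


model = ScannedStack(features=128, num_layers=25)
params = model.init(jax.random.PRNGKey(0), jnp.ones((4, 128)))
out = jax.jit(model.apply)(params, jnp.ones((4, 128)))
```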
Motivation
The scan-over-layers technique lets JAX "see" that the computational structure of each iteration is the same. This can dramatically reduce compile time as well as the system memory occupied by the JAX compilation cache (e.g. with 25 layers in a model, the naive approach ends up with roughly 25 times as much JIT-compiled code, since each layer produces duplicative output code). My handwritten T5-like model uses about 1/50th of the system memory of the transformers Flax T5 models of similar size. With the current Flax implementation it's easy to hit system OOM errors if you end up with multiple versions of the model compiled for different sequence lengths. A naive-loop sketch after this paragraph illustrates the contrast.
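For contrast, a hypothetical naive stack (reusing the Block module from the sketch above) loops in Python, so every layer is unrolled into the traced graph and the compiled program grows roughly linearly with the number of layers:

```python
class NaiveStack(nn.Module):
    features: int
    num_layers: int

    @nn.compact
    def __call__(self, x):
        # Each iteration adds a distinct Block to the traced computation,
        # so compile time and compiled-code size scale with num_layers.
        for _ in range(self.num_layers):
            x, _ = Block(self.features)(x, None)
        return x
```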
Your contribution
It's possible I could submit a PR for this at some point in the future, but I can't be certain.
Hey @colehaus! Sorry for the late reply here. We've currently decided not to implement scan for the Flax models in Transformers. You can see a brief reason for this here: #24587 (comment)
Happy to re-open the conversation if you feel strongly about this! There was a WIP PR that shows how this could be done generally for Transformers models here: #18341
But currently I tend to view scan as a specific feature that can be built on top of the Transformers library by advanced users who require it.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.