Conditionals in EXLA graph create functions with much larger memory footprint #1003

seanmor5 · 2022-12-09T21:30:32Z

I am not sure if this is an XLA bug or our bug. Consider this example in Axon which implements gradient accumulation:

  defnp accumulate_gradients(
          gradients,
          model_state,
          new_state,
          optimizer_state,
          gradient_state,
          gradient_step,
          update_optimizer_fn,
          opts \\ []
        ) do
    opts = keyword!(opts, [:steps])
    steps = opts[:steps]

    # TODO: this explodes the graph
    if Nx.greater_equal(gradient_step, steps - 1) do
      {updates, new_optimizer_state} =
        update_optimizer_fn.(gradients, optimizer_state, model_state)

      new_gradient_state = zeros_like(model_state)
      new_model_state = Axon.Updates.apply_updates(model_state, updates, new_state)
      {new_model_state, new_optimizer_state, new_gradient_state, 0}
    else
      acc_gradients = deep_merge(gradient_state, gradients, fn x, y -> x + y end)
      {model_state, optimizer_state, acc_gradients, gradient_step + 1}
    end
  end

Leaving this as is causes an Axon training loop to OOM with batch size 4 and sequence length 16 (maybe even lower than that), whereas if I remove the conditional logic altogether and just do the update, I can run with 4x longer sequences or batch sizes.
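
Editorial sketch, not part of the original report: one way to express the accumulation without an `if` in the graph is to compute both outcomes and pick one with `Nx.select`. The module name `AccumulationSketch`, the single-tensor state, and the default of 4 steps are hypothetical stand-ins for the containers above. It leaves out the optimizer update entirely, and nothing in this thread establishes that it lowers memory use, since both outcomes are always computed.

```elixir
# Assumes Nx is available; the version requirement is only an example.
Mix.install([{:nx, "~> 0.4"}])

defmodule AccumulationSketch do
  import Nx.Defn

  # Hypothetical simplification: gradient_state and gradients are single
  # tensors of the same shape rather than nested containers.
  defn accumulate(gradient_state, gradients, gradient_step, opts \\ []) do
    opts = keyword!(opts, steps: 4)
    steps = opts[:steps]

    # Scalar predicate: 1 when this step should flush the accumulator.
    flush? = Nx.greater_equal(gradient_step, steps - 1)

    # Both outcomes are computed unconditionally, so no conditional is emitted.
    summed = gradient_state + gradients
    zeros = summed * 0

    new_gradient_state = Nx.select(flush?, zeros, summed)
    next_step = Nx.select(flush?, 0, gradient_step + 1)

    {new_gradient_state, next_step}
  end
end

# Example call with made-up values.
{_state, _step} =
  AccumulationSketch.accumulate(
    Nx.tensor([1.0, 2.0]),
    Nx.tensor([0.5, 0.5]),
    Nx.tensor(0),
    steps: 2
  )
```

The trade-off is that the work of both branches is always paid for, so this only makes sense when the branches are cheap relative to the rest of the step.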

josevalim · 2022-12-09T22:00:40Z

Does it happen on the GPU, the CPU, or both? Can you give me a single-file Elixir script that reproduces it? :)

seanmor5 · 2022-12-09T22:02:40Z

I looked at the generated expressions and they are definitely much larger for this `if`, but this might also be a result specific to my implementation.

I will see if it's possible to isolate it further.

josevalim · 2022-12-13T08:23:27Z

I believe this is addressed now. :) Please reopen if not.

seanmor5 mentioned this issue Dec 9, 2022

Got OOM message with GTX3060 elixir-nx/bumblebee#101

Closed

josevalim closed this as completed Dec 13, 2022
