Initial kernel changes to support GaLore #1137

Open · wants to merge 7 commits into main
Conversation

matthewdouglas (Member) commented Mar 18, 2024

This is a draft containing some of the initial changes to support GaLore. So far this covers 2-state optimizers.

Optimizer2State.update_step() now takes an additional argument, return_updates. When given a tensor to hold the updates, the raw update is written into it instead of being applied, so p is left unchanged; additionally, no weight decay is applied.
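
A minimal plain-PyTorch sketch of the new contract, with simplified names (the real method runs fused CUDA kernels and takes group, p, gindex, pindex); whether the learning rate is folded into the returned update is my assumption, inferred from the project-back hunk below:

import torch

def update_step(p, state, lr=1e-3, betas=(0.9, 0.999), eps=1e-8,
                weight_decay=1e-2, return_updates=None):
    # Hypothetical, simplified AdamW step illustrating the new semantics.
    state["step"] += 1
    exp_avg, exp_avg_sq = state["exp_avg"], state["exp_avg_sq"]
    exp_avg.mul_(betas[0]).add_(p.grad, alpha=1 - betas[0])
    exp_avg_sq.mul_(betas[1]).addcmul_(p.grad, p.grad, value=1 - betas[1])
    bc1 = 1 - betas[0] ** state["step"]
    bc2 = 1 - betas[1] ** state["step"]
    update = (exp_avg / bc1) / ((exp_avg_sq / bc2).sqrt() + eps)
    if return_updates is not None:
        # New path: write the (lr-scaled, by assumption) update into the
        # caller's buffer; p stays untouched and weight decay is skipped.
        return_updates.copy_(update).mul_(-lr)
        return
    if weight_decay > 0:
        p.data.mul_(1 - lr * weight_decay)
    p.data.add_(update, alpha=-lr)

p = torch.nn.Parameter(torch.randn(4, 4))
p.grad = torch.randn_like(p)
state = {"step": 0, "exp_avg": torch.zeros_like(p), "exp_avg_sq": torch.zeros_like(p)}
buf = torch.empty_like(p)
update_step(p, state, return_updates=buf)  # buf now holds the update; p unchanged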

Needs tests, feedback welcome.

cc: @TimDettmers @jiaweizzhao @Titus-von-Koeller


self.prefetch_state(p)

if "rank" in group:
self.update_step(group, p, gindex, pindex, return_updates=lor_update)
matthewdouglas (Member, Author) commented on the hunk above:

The main addition in this PR is the new return_updates kwarg. This gives us the update from AdamW in lor_update, while p.data is left unchanged.

Corresponds to this step in Algorithm 1 from the paper:
lor_update = update(lor_grad)

self.update_step(group, p, gindex, pindex, return_updates=lor_update)

# GaLore Projection Back
p.data.add_(state["projector"].project_back(lor_update))
matthewdouglas (Member, Author) commented on the hunk above:

From Algorithm 1 in the paper:

update = project_back(lor_update)
weight.data += update
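
For context, here is a self-contained sketch of the projector pair these hunks assume; it is an illustrative SVD-based low-rank projector in the spirit of Algorithm 1, not the actual galore_torch GaLoreProjector (which additionally handles update_proj_gap, scale, and the left- vs. right-projection choice):

import torch

class LowRankProjector:
    # Illustrative projector; not the galore_torch implementation.
    def __init__(self, rank):
        self.rank = rank
        self.ortho = None  # orthogonal factor; GaLore refreshes it periodically

    def project(self, full_grad):
        if self.ortho is None:
            u, _, _ = torch.linalg.svd(full_grad.float(), full_matrices=False)
            self.ortho = u[:, : self.rank].to(full_grad.dtype)
        return self.ortho.t() @ full_grad   # (m, n) -> (r, n) low-rank gradient

    def project_back(self, lor_update):
        return self.ortho @ lor_update      # (r, n) -> (m, n) full-rank update

grad = torch.randn(256, 128)
proj = LowRankProjector(rank=16)
lor_grad = proj.project(grad)              # fed to the optimizer update
full_update = proj.project_back(lor_grad)  # added to p.data, as in the hunk above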

Titus-von-Koeller (Collaborator) commented:

@matthewdouglas Tim said he could review your work this weekend.

matthewdouglas (Member, Author) commented:

Updated with changes for the 1-state optimizers (Momentum, RMSProp, Adagrad, Lion).

TimDettmers marked this pull request as ready for review on July 17, 2024.
TimDettmers (Collaborator) left a review:

This looks like a solid, straightforward implementation. Good work, @matthewdouglas!

I've wanted to overhaul the optimizers, since changing or adding implementations is a pain when everything is split into 1-state and 2-state optimizers. You probably ran into this too, Matthew. Still, it is probably fine to keep things as they are for now and refactor when the next change comes along.

Separating the update computation from applying the update could also make some new optimizers easier to implement. But I think we can leave that to future work and favor getting GaLore out quickly together with the QLoRA fix.

The last remaining thing is testing. The original tests are all green, but a GaLore test would be good. The best approach would be to test the original repo code against the bitsandbytes implementation.

The steps for that would be to add galore-torch to the dev dependencies and only execute the tests when the dependency is present, so that other devs are not forced to install galore-torch.

Otherwise, the tests can mirror the existing ones and check that the results stay close to each other. For that you can probably just add the original GaLore and your GaLore to the dictionary of optimizers in test_optim.py and check that the errors are in line with the other optimizer comparisons.

Let me know if you have any other concerns with this, but I think with a test this is all ready to go.
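
For concreteness, a rough sketch of what such a test could look like. The galore_torch side follows its documented param-group format; the bitsandbytes-side entry point, group keys, and tolerance are placeholders until this PR's public API settles:

import pytest
import torch

galore_torch = pytest.importorskip("galore_torch")  # skip when the dep is absent
import bitsandbytes as bnb

def test_galore_adamw_close_to_reference():
    torch.manual_seed(0)
    w_ref = torch.nn.Parameter(torch.randn(64, 32, device="cuda"))
    w_bnb = torch.nn.Parameter(w_ref.detach().clone())
    galore_args = dict(rank=8, update_proj_gap=200, scale=0.25, proj_type="std")

    opt_ref = galore_torch.GaLoreAdamW([{"params": [w_ref], **galore_args}], lr=1e-3)
    # Placeholder: the bnb-side optimizer and group keys depend on this PR's final API.
    opt_bnb = bnb.optim.AdamW([{"params": [w_bnb], **galore_args}], lr=1e-3)

    for _ in range(10):
        g = torch.randn(64, 32, device="cuda")
        w_ref.grad, w_bnb.grad = g.clone(), g.clone()
        opt_ref.step()
        opt_bnb.step()

    # Loose tolerance, in line with the other optimizer comparisons.
    assert torch.allclose(w_ref, w_bnb, atol=2e-2)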

matthewdouglas added the "enhancement" label Aug 15, 2024
matthewdouglas self-assigned this Aug 15, 2024
jiaweizzhao commented Sep 27, 2024
Hi @matthewdouglas, thanks for your great effort! I would like to follow up on this PR and finalize our integration as soon as possible.

To make benchmarking GaLore easy, I created a new branch, https://github.com/jiaweizzhao/GaLore/tree/bitsandbytes, where I integrated your GaLore implementation into the most recent GaLoreAdamW8bit: https://github.com/jiaweizzhao/GaLore/blob/bitsandbytes/galore_torch/adamw8bit.py

For the environment, I installed both your bitsandbytes branch and the modified GaLore locally. However, when I tried to test a baseline, the following error came up:

[rank0]: File "/data/home/jwzhao/.conda/envs/galore_new/lib/python3.8/site-packages/torch/optim/optimizer.py", line 484, in wrapper
[rank0]:   out = func(*args, **kwargs)
[rank0]: File "/data/home/jwzhao/.conda/envs/galore_new/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]:   return func(*args, **kwargs)
[rank0]: File "/opt/hpcaas/.mounts/fs-0565f60d669b6a2d3/home/jwzhao/projects/bitsandbytes/bitsandbytes/optim/optimizer.py", line 288, in step
[rank0]:   self.update_step(group, p, gindex, pindex)
[rank0]: File "/data/home/jwzhao/.conda/envs/galore_new/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]:   return func(*args, **kwargs)
[rank0]: File "/opt/hpcaas/.mounts/fs-0565f60d669b6a2d3/home/jwzhao/projects/bitsandbytes/bitsandbytes/optim/optimizer.py", line 552, in update_step
[rank0]:   F.optimizer_update_8bit_blockwise(
[rank0]: File "/opt/hpcaas/.mounts/fs-0565f60d669b6a2d3/home/jwzhao/projects/bitsandbytes/bitsandbytes/functional.py", line 1789, in optimizer_update_8bit_blockwise
[rank0]:   and len(str2optimizer8bit_blockwise[optimizer_name]) == 3
[rank0]: NameError: name 'str2optimizer8bit_blockwise' is not defined

Do you have any idea why this occurs? It seems to happen only on this old PR branch; I also tried the latest bitsandbytes with regular GaLore and it works.

matthewdouglas (Member, Author) commented:

Thanks @jiaweizzhao! This error indicates there was a problem loading the CUDA library. Were you able to run these build steps on this branch?

pip install -r requirements-dev.txt
cmake -DCOMPUTE_BACKEND=cuda -S .
cmake --build .
pip install -e .

If you run python -m bitsandbytes, you may get a better idea of why it did not load.

I plan to follow up shortly and rebase with main!

jiaweizzhao commented:
Not sure why I cannot load the CUDA library correctly using your steps, @matthewdouglas. Maybe it is due to the machine I am using. Could you try running a simple GaLore benchmark on your end? I have packed everything into this branch: https://github.com/jiaweizzhao/GaLore/tree/bitsandbytes. Once installed, you can simply run sh scripts/verify_bitsandbytes/llama_60m_galore_adam8bit_new.sh to verify that the new galore_adam8bit works. Once it works, I will make the changes across all optimizers.

jiaweizzhao commented:
It seems the problem is that I have no root access to compile from source (with pip install -e .) on my machine. Alternatively, if you could give me a compiled GaLore-enabled bitsandbytes build, I could try it from my end, @matthewdouglas.
