
build: install additional fms-acceleration plugins #350

Merged
merged 6 commits into foundation-model-stack:main on Sep 26, 2024

Conversation

Collaborator

@anhuong anhuong commented Sep 25, 2024

Description of the change

Users of the image will be able to automatically use padding free, multipack, and fast kernels via the fms-acceleration plugins.
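As a rough sketch of what installing the additional plugins could look like (the `fms-accel` extra and the plugin names `fms_acceleration_aadp` / `fms_acceleration_foak` are assumptions based on the fms-acceleration repo, not taken from this PR's diff):

```shell
# Illustrative sketch only; extra and plugin names are assumptions, not this PR's Dockerfile change.
pip install "fms-hf-tuning[fms-accel]"

# The fms-acceleration framework provides a small CLI for pulling in individual plugins:
python -m fms_acceleration.cli install fms_acceleration_aadp   # padding free + multipack
python -m fms_acceleration.cli install fms_acceleration_foak   # fused ops / fast kernels
```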

Related issue number

NA

How to verify the PR

Tested the installation and ran tuning with and without the flags. Just because the plugins are installed does not mean they are enabled; the user must still pass the necessary flags/configs.
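For reference, a minimal sketch of what running tuning with the flags might look like, assuming the `tuning/sft_trainer.py` entrypoint and the acceleration flag names (`padding_free`, `multipack`, `fast_kernels`) documented for fms-hf-tuning; paths and values are placeholders:

```shell
# Illustrative invocation; installing the plugins alone changes nothing until
# flags like these are passed to enable them.
accelerate launch --num_processes=8 tuning/sft_trainer.py \
  --model_name_or_path <base-model> \
  --training_data_path <train.jsonl> \
  --output_dir <output-dir> \
  --padding_free huggingface \
  --multipack 16 \
  --fast_kernels True True True
```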

Was the PR tested

  • I have added >=1 unit test(s) for every new method I have added.
  • I have ensured all unit tests pass

Signed-off-by: Anh Uong <anh.uong@ibm.com>
Signed-off-by: Anh Uong <anh.uong@ibm.com>

Thanks for making a pull request! 😃
One of the maintainers will review and advise on the next steps.

@github-actions github-actions bot added the build label Sep 25, 2024
Signed-off-by: Anh Uong <anh.uong@ibm.com>
@Ssukriti Ssukriti requested review from fabianlim and removed request for alex-jw-brooks September 25, 2024 21:35
@@ -671,6 +672,16 @@ Notes:
- works only for *multi-gpu*.
- currently only includes the version of *multipack* optimized for linear attention implementations like *flash-attn*.

Note: To pass the above flags via a JSON config, each flag expects a mixed-type list, so every value must be given as a list. For example:
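A minimal sketch of such a JSON config, using the acceleration flags discussed in this PR; the exact values are illustrative and the snippet is not copied from the README:

```json
{
    "fast_kernels": [true, true, true],
    "padding_free": ["huggingface"],
    "multipack": [16]
}
```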
Collaborator

In line 653, `- attention_and_distributed_packing (experimental)`: we have marked this as experimental, but we are talking about releasing it to product with OpenShift 2.14. Is it still experimental or ready for release? @fabianlim @anhuong

Collaborator Author
@anhuong anhuong Sep 25, 2024


From an earlier conversation with Fabian, I believe I can also mark these as ready (no longer experimental) in this PR. Will wait on @fabianlim to review as well.

Collaborator


Padding free is already upstreamed to HF main. InstructLab is using multipack, and this has been tested for up to about 500K samples in the dataset. Beyond that, I am not aware of the speed performance of multipack, as it runs through the lengths of each example before the start of every epoch.

Collaborator Author


Is there any issue with including these new plugins in the product if the fused-op-and-kernels plugin uses the Apache 2.0 license but contains code extracted from unsloth?

Collaborator


@anhuong yes that is a good point thanks for bringing this up.

  • unsloth is Apache 2.0, but we were disturbed by those "comments" peppered in the code.
  • we only extracted part of the unsloth code, and we did the extraction on a version that existed before those "comments" appeared (as far as we could tell)
  • all extracted portions contained the relevant License Notice headers credited to the owners of unsloth

Beyond what we have done, I am not knowledgeable enough to say what is permissible and what is not. This requires someone knowledgeable in these matters to go through it.

The peft plugin also contains a triton-only extraction of the ModelCloud fork of AutoGPTQ (https://github.com/foundation-model-stack/fms-acceleration/tree/main/plugins/accelerated-peft#gptq-loras-autogptq---current-implementation-vs-legacy-implementation). The fork is released as Apache 2.0.

@wynterl wynterl Sep 26, 2024


@anhuong Code scan should pass with no issues regarding the inclusion of the new plugins, and as noted by @fabianlim, unsloth is Apache 2.0.

@anhuong anhuong changed the title from "build: install fms-acceleration plugins to enable padding free, multipack, and fast kernels" to "build: install additional fms-acceleration plugins" Sep 25, 2024
fabianlim
fabianlim previously approved these changes Sep 25, 2024
Collaborator
@fabianlim fabianlim left a comment


LGTM. ILAB training uses multipack, so it is to some extent quite ready, but see my comment.

Signed-off-by: Anh Uong <anh.uong@ibm.com>
Signed-off-by: Anh Uong <anh.uong@ibm.com>
Signed-off-by: Anh Uong <anh.uong@ibm.com>
@anhuong anhuong merged commit 1350f8a into foundation-model-stack:main Sep 26, 2024
8 checks passed