Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Frontend][Feature] Add jamba tool parser #9154

Merged
merged 22 commits into from
Oct 18, 2024

Conversation

tomeras91
Copy link
Contributor

Add JambaToolParser to support tool calling for ai21labs/AI21-Jamba-1.5-Mini and ai21labs/AI21-Jamba-1.5-Large

Copy link

github-actions bot commented Oct 8, 2024

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add ready label to the PR
  • Enable auto-merge.

🚀

@DarkLight1337
Copy link
Member

@K-Mistele do you have time to review this?

@K-Mistele
Copy link
Contributor

@K-Mistele do you have time to review this?

Yes, happy to take a look!

Copy link
Collaborator

@mgoin mgoin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems reasonable to me but I will wait on Kyle to review

Comment on lines 84 to 85
# TODO: edit comment
# use a regex to find the tool call between the tags
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this TODO done?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Removed it now.
Thanks for catching this

@tomeras91
Copy link
Contributor Author

@K-Mistele do you have time to review this?

Yes, happy to take a look!

Ping? Anything I can do to help this getting merged?

Copy link
Contributor

@K-Mistele K-Mistele left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good! just a couple thoughts:

  1. Right now, tool use tests are already standardized across models in tests/tool_use, you can add a config for Jamba in vllm/tests/tool_use/utils.py so that the same tests that are run on other models are run on it. This is preferred since it keeps tool testing standardized to make sure that all models pass the same tests, rather than defining custom tests for each model
  2. Please update the docs at docs/source/serving/openai_compatible_server.md with the relevant information for jamba - you'll see that Llama 3.1, Hermes, Mistral, and InternLM already have entries here, so you should be able to use the same template.

@tomeras91
Copy link
Contributor Author

tomeras91 commented Oct 15, 2024

@K-Mistele

  1. RE tests - that was my initial thought as well. Problem is even Jamba-1.5-Mini is a pretty large model with ~52B params. It will take a long time to download every time the unittest is run. Moreover, even with quantization it needs an 80GB GPU, and AFAIU tests currently run on L4 GPUs.. so we'll have issues with memory as well. The Jamba-tiny-dev model used for other Jamba tests is a pretrained model and doesn't support tool calling.
    Other than the model size issues, the end2end behavior of tool calling is already tested in tool calling tests of other models. Also, that end2end logic hasn't changed in this PR. The only logic this PR adds is the JambaToolParser logic. And conveniently enough (AKA good code design by you guys), it can be thoroughly tested without initializing a model, as I did in tests/tool_use/test_jamba_tool_parser.py. All it needs is the tokenizer and a raw string treated as the model output.

  2. RE docs - sure! I'll update them soon

@K-Mistele
Copy link
Contributor

@K-Mistele

  1. RE tests - that was my initial thought as well. Problem is even Jamba-1.5-Mini is a pretty large model with ~52B params. It will take a long time to download every time the unittest is run. Moreover, even with quantization it needs an 80GB GPU, and AFAIU tests currently run on L4 GPUs.. so we'll have issues with memory as well. The Jamba-tiny-dev model used for other Jamba tests is a pretrained model and doesn't support tool calling.
    Other than the model size issues, the end2end behavior of tool calling is already tested in tool calling tests of other models. Also, that end2end logic hasn't changed in this PR. The only logic this PR adds is the JambaToolParser logic. And conveniently enough (AKA good code design by you guys), it can be thoroughly tested without initializing a model, as I did in tests/tool_use/test_jamba_tool_parser.py. All it needs is the tokenizer and a raw string treated as the model output.
  2. RE docs - sure! I'll update them soon

Sounds good! I didn't realize the model was so big. I will defer to @mgoin @DarkLight1337 then.

Copy link
Member

@DarkLight1337 DarkLight1337 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for your patience!

@DarkLight1337 DarkLight1337 added the ready ONLY add when PR is ready to merge/full CI is needed label Oct 18, 2024
@DarkLight1337 DarkLight1337 enabled auto-merge (squash) October 18, 2024 02:49
@DarkLight1337 DarkLight1337 merged commit d2b1bf5 into vllm-project:main Oct 18, 2024
55 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready ONLY add when PR is ready to merge/full CI is needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants