Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 handle MistralTokenizer special case #162

Merged
merged 5 commits into from
Oct 12, 2024
Merged

Conversation

prashantgupta24
Copy link
Contributor

@prashantgupta24 prashantgupta24 commented Oct 11, 2024

Description

Right now, MistralTokenizer doesn't support the encode_plus option. We use the encode option for now, and throw an error to the user if return_offset is requested.

Also, mistral models do NOT select mistral as the tokenizer by default. TOKENIZER_MODE=mistral has to be set in order to do that.

How Has This Been Tested?

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work

Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>
@codecov-commenter
Copy link

codecov-commenter commented Oct 11, 2024

Codecov Report

Attention: Patch coverage is 60.00000% with 4 lines in your changes missing coverage. Please review.

Project coverage is 61.54%. Comparing base (44cc0d5) to head (1fc31cd).

Files with missing lines Patch % Lines
src/vllm_tgis_adapter/grpc/grpc_server.py 55.55% 3 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #162      +/-   ##
==========================================
- Coverage   61.66%   61.54%   -0.13%     
==========================================
  Files          28       28              
  Lines        1693     1698       +5     
  Branches      208      210       +2     
==========================================
+ Hits         1044     1045       +1     
- Misses        553      556       +3     
- Partials       96       97       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>
@prashantgupta24 prashantgupta24 changed the title WIP mistral tokenizer fix 🐛 handle MistralTokenizer special case Oct 11, 2024
Copy link
Contributor

@njhill njhill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @prashantgupta24, looks great just a couple of comments.

src/vllm_tgis_adapter/grpc/grpc_server.py Outdated Show resolved Hide resolved
src/vllm_tgis_adapter/grpc/grpc_server.py Outdated Show resolved Hide resolved
Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>
Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>
Copy link
Contributor

@njhill njhill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One last nit

src/vllm_tgis_adapter/grpc/grpc_server.py Outdated Show resolved Hide resolved
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
Copy link
Contributor

@njhill njhill left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@njhill njhill added this pull request to the merge queue Oct 12, 2024
Merged via the queue into main with commit fa9b930 Oct 12, 2024
3 checks passed
@njhill njhill deleted the mistral-tokenizer-fix branch October 12, 2024 00:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants