🐛 handle MistralTokenizer special case #162
Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #162      +/-   ##
==========================================
- Coverage   61.66%   61.54%   -0.13%
==========================================
  Files          28       28
  Lines        1693     1698       +5
  Branches      208      210       +2
==========================================
+ Hits         1044     1045       +1
- Misses        553      556       +3
- Partials       96       97       +1
Thanks @prashantgupta24, looks great, just a couple of comments.
One last nit
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
Thanks @prashantgupta24!
Description
Right now, MistralTokenizer doesn't support the `encode_plus` method. We use `encode` instead for now, and return an error to the user if `return_offset` is requested.

Also, `mistral` models do NOT select `mistral` as the tokenizer by default. `TOKENIZER_MODE=mistral` has to be set in order to do that.

How Has This Been Tested?
Merge criteria:
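
For readers skimming the thread, here is a minimal sketch of the fallback outlined in the Description. It is not the code from this PR: `FakeMistralTokenizer`, `TokenizeResult`, and `tokenize` are hypothetical names, the toy `encode` is a stand-in for real tokenization, and detecting the special case via a missing `encode_plus` attribute is an assumption made only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


class FakeMistralTokenizer:
    """Hypothetical stand-in: exposes encode() but has no encode_plus()."""

    def encode(self, text: str) -> List[int]:
        # Toy tokenization: one id per whitespace-separated word.
        return [len(word) for word in text.split()]


@dataclass
class TokenizeResult:
    token_ids: List[int]
    offsets: Optional[List[Tuple[int, int]]] = None


def tokenize(tokenizer, text: str, return_offset: bool = False) -> TokenizeResult:
    """Tokenize text, falling back to encode() when encode_plus() is missing."""
    if not hasattr(tokenizer, "encode_plus"):
        # Special case: offsets cannot be produced, so fail loudly.
        if return_offset:
            raise ValueError("return_offset is not supported by this tokenizer")
        return TokenizeResult(token_ids=tokenizer.encode(text))

    # Regular Hugging Face style path: encode_plus can return offsets.
    batch = tokenizer.encode_plus(text, return_offsets_mapping=return_offset)
    offsets = batch["offset_mapping"] if return_offset else None
    return TokenizeResult(token_ids=batch["input_ids"], offsets=offsets)
```

With the stand-in tokenizer, `tokenize(FakeMistralTokenizer(), "some text")` returns token ids only, while passing `return_offset=True` raises a `ValueError` rather than silently dropping the offsets.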