weirdness with max tokens and openrouter #1
Comments
Thank you for such a detailed bug report and great suggestions! All three proposed solutions make perfect sense. I'll work on implementing these core improvements within the next week and release an updated version. Your thorough explanation and reproduction steps are incredibly helpful for addressing this issue effectively. Sérgio
Great news! I've implemented the fix for the max_tokens handling issue you reported. The changes are now available in the latest release. Key changes:

- Completely reworked token handling mechanism
- Removed custom token calculation logic
- Direct max_tokens passing to LLM APIs
- Added support for DeepSeek provider
- Integrated deepseek-chat and deepseek-reasoner models

Thanks to @estiens for reporting token handling issues and providing valuable feedback (#1).

This means max_tokens is now forwarded to the provider unchanged, so any limit errors come from the API itself rather than from a client-side pre-calculation.
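As a rough sketch (not the exact integration code — the client setup and helper name here are assumptions), direct passing amounts to forwarding the parameter untouched to the OpenAI-compatible endpoint:

```python
from openai import OpenAI

# OpenRouter speaks the OpenAI-compatible API, so the standard client
# can be pointed at its base URL.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="sk-or-...",  # your OpenRouter key
)

def complete(messages, model, max_tokens=None):
    """Hypothetical helper: forward max_tokens as-is; when the caller
    doesn't set it, omit the parameter and let the provider apply its
    own default instead of enforcing a local 1-4000 range."""
    kwargs = {"model": model, "messages": messages}
    if max_tokens is not None:
        kwargs["max_tokens"] = max_tokens  # no client-side clamp
    return client.chat.completions.create(**kwargs)
```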
Thank you again for bringing this up and providing such detailed feedback. Your report helped make the integration more robust and flexible. Please try the latest version and let me know if you encounter any issues. Best regards, Sérgio
Thanks for fixing this, and glad you could understand my ramblings — when I reread my report I thought, "you should rewrite that and make it clearer!" Testing now.
You're very welcome! Thank you for taking the time to test it and for your kind words - your detailed report was incredibly helpful in addressing the issue. If you encounter anything else or have further suggestions, please don't hesitate to reach out. Your feedback is always appreciated! Sérgio
Describe the bug
I haven't dug into it yet, but when I'm using an OpenRouter model I nearly always get an error that MAX_TOKENS must be between 1-4000, even if I don't set it at all, and sometimes I have to set it OVER 4000 to get a request to work. I can also almost never include any conversation context, but setting the context to 0 is a guaranteed error too.
I'm not sure if it is happening with open_ai and anthropic; I can check, but I'm not sure where the limit gets set, since many of the OpenAI models I use have giant context windows. As a feature request, it might be nice to truncate the conversation history to just the responses for something like regularly getting summaries of areas, because otherwise we are re-sending the same giant blob of states for the entire history (no problem to do that manually and just leave conversation history out of it) — a rough sketch of what I mean is below.
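For illustration only (the message shapes and helper name are made up, not this integration's API), truncating the history to the system prompt plus recent assistant replies could look like:

```python
def trim_history(messages, keep_last=4):
    """Hypothetical helper: keep the system prompt and only the most
    recent assistant replies, dropping earlier user turns that carried
    the big blobs of entity states."""
    system = [m for m in messages if m["role"] == "system"]
    replies = [m for m in messages if m["role"] == "assistant"]
    return system + replies[-keep_last:]
```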
My only clue is that the error always says between 1-4000. Also, it doesn't seem to matter what I set: I set it to 3000 to finally get a request through, and the number of tokens I actually sent over was something like 16k, so it doesn't seem to be limiting on the way out either. Personally, I'd rather the endpoint return an error when too many tokens are sent than have the integration try to precalculate it, especially for OpenRouter, where different models can have wildly different context sizes.
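A minimal sketch of that approach (the model id and prompt are placeholders; the error class is from the openai Python client):

```python
from openai import OpenAI, APIStatusError

client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="sk-or-...")

try:
    resp = client.chat.completions.create(
        model="anthropic/claude-3.5-sonnet",  # placeholder model id
        messages=[{"role": "user", "content": "Summarize the living room."}],
        max_tokens=16000,  # let the provider decide whether this is too large
    )
except APIStatusError as err:
    # Surface the provider's own limit error instead of pre-validating.
    print(f"Provider rejected the request: {err}")
```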
As I said, I didn't dig in, but it makes the integration super finicky to use at the moment: you kind of blindly try different max_tokens values and context history sizes until something goes through.
To Reproduce
Set up an OpenRouter model and try not passing in max_tokens at all, then try 1k, 4k, and 10k with different context lengths.
Expected behavior
Max tokens is pulled from the model being used (not sure OpenRouter returns that, though?), and/or can be overridden with your own max_tokens instead of getting an error when you set it above 4000. It doesn't look like the requests were ever sent across the wire, so I'm pretty sure they were short-circuited before the call.
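For what it's worth, OpenRouter's public model list does appear to include a per-model context_length field; a hypothetical lookup (endpoint shape as documented by OpenRouter, helper name made up):

```python
import requests

def model_context_length(model_id: str):
    """Fetch OpenRouter's model list and return the context window
    for the given model id, or None if it isn't listed."""
    resp = requests.get("https://openrouter.ai/api/v1/models", timeout=10)
    resp.raise_for_status()
    for model in resp.json()["data"]:
        if model["id"] == model_id:
            return model.get("context_length")
    return None  # unknown model: fall back to a user-supplied max_tokens

print(model_context_length("anthropic/claude-3.5-sonnet"))
```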