weirdness with max tokens and openrouter #1

Closed
estiens opened this issue Jan 25, 2025 · 4 comments

estiens commented Jan 25, 2025

Describe the bug
I haven't dug into it yet, but when I use an OpenRouter model I almost always get an error that MAX_TOKENS must be between 1-4000, even when I don't set it at all, and sometimes I have to set it OVER 4000 to get a request through. I can also almost never include any context, but including 0 context is a guaranteed error as well.

I'm not sure whether it also happens with open_ai and anthropic (I can check), but I'm not sure where the limit gets set, since many of the OpenAI models I use have huge context windows. As a feature request, it would be nice to truncate the conversation history to just the responses for things like regularly requesting summaries of areas; otherwise we re-send the same giant blob of states, etc. for the entire history. (No problem to do that manually and just leave conversation history out of it, though.)

My only clue is that the error always says between 1-4000. It also doesn't seem to matter what I set: I finally got one through by setting it to 3000, and the request I sent over was still something like 16k tokens, so the limit doesn't actually seem to be applied on the way out. Personally, I'd rather the endpoint return an error when too many tokens are sent than have the integration try to precalculate it, especially for OpenRouter, where different models can have wildly different context sizes.

As I said, I haven't dug in, but it makes the integration super finicky to use at the moment: you blindly try different max_tokens values and context history sizes until something goes through.
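If I had to guess, something like the following is happening before the request ever leaves the integration (a purely hypothetical sketch; I haven't read the code, and the function name is made up):

```python
def validate_max_tokens(max_tokens: int | None) -> int:
    """Hypothetical pre-flight check that would explain the error text:
    a hard-coded 1-4000 window applied regardless of which model or
    provider is actually being called."""
    if max_tokens is None:
        max_tokens = 0  # "not set at all" would still land outside the window
    if not 1 <= max_tokens <= 4000:
        raise ValueError("MAX_TOKENS must be between 1-4000")
    return max_tokens
```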

To Reproduce

Set up an OpenRouter model and try it without passing max_tokens, then with max_tokens of 1k, 4k, and 10k, combined with different context lengths.

Expected behavior
Max tokens is pulled from the model being used (not sure OpenRouter returns that, though?) and/or can be overridden with your own max_tokens, rather than getting an error when you set it above 4000. It doesn't look like the requests ever went across the wire, so I'm pretty sure they were short-circuited before being attempted.
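Something like this is what I have in mind (a rough sketch only; it assumes OpenRouter's model listing endpoint reports a context_length per model, and all names here are illustrative):

```python
import requests

OPENROUTER_MODELS_URL = "https://openrouter.ai/api/v1/models"

def resolve_max_tokens(model_id: str, user_override: int | None = None) -> int | None:
    """Prefer the user's explicit max_tokens; otherwise fall back to the model's
    advertised context length; otherwise return None and let the provider
    apply its own default."""
    if user_override is not None:
        return user_override
    try:
        resp = requests.get(OPENROUTER_MODELS_URL, timeout=10)
        resp.raise_for_status()
        models = resp.json().get("data", [])
    except requests.RequestException:
        return None  # don't short-circuit locally; let the API complain if needed
    for model in models:
        if model.get("id") == model_id:
            # context_length covers prompt + completion, so the caller still
            # needs to leave headroom for the prompt itself.
            return model.get("context_length")
    return None
```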

estiens added the bug label on Jan 25, 2025

smkrv (Owner) commented Jan 25, 2025

@estiens,

Thank you for such a detailed bug report and great suggestions! All three proposed solutions make perfect sense.

I'll work on implementing these core improvements within roughly the next week and release an updated version. Your thorough explanation and reproduction steps are incredibly helpful for addressing this issue effectively.

Sérgio

smkrv self-assigned this on Jan 25, 2025
smkrv pushed a commit that referenced this issue Jan 28, 2025
- Completely reworked token handling mechanism
- Removed custom token calculation logic
- Direct max_tokens passing to LLM APIs
- Added support for DeepSeek provider
- Integrated deepseek-chat and deepseek-reasoner models

Thanks to @estiens for reporting token handling issues and providing valuable feedback (#1).

smkrv (Owner) commented Jan 28, 2025

@estiens,

Great news! I've implemented the fix for the max_tokens handling issue you reported. The changes are now available in the latest release.

Key changes:

  • Removed the pre-calculation of tokens that was causing the artificial limits
  • Now passing max_tokens directly to the LLM API
  • Simplified the token handling logic

This means:

  1. No more arbitrary 4000 token limit
  2. Token limits are now handled natively by each model's API (see the sketch below)
  3. More reliable operation with OpenRouter and other providers
  4. Better support for models with large context windows
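For reference, the new behaviour boils down to something like this (a simplified sketch rather than the exact integration code; an OpenAI-compatible chat completion payload is assumed):

```python
def build_chat_payload(model: str, messages: list[dict], max_tokens: int | None = None) -> dict:
    """Build an OpenAI-compatible chat completion payload.

    max_tokens is forwarded untouched when set; there is no local 1-4000
    clamp, so any limit enforcement now comes back from the provider as an
    API error instead of a pre-flight rejection."""
    payload = {"model": model, "messages": messages}
    if max_tokens is not None:
        payload["max_tokens"] = max_tokens  # passed through as-is
    return payload
```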

Thank you again for bringing this up and providing such detailed feedback. Your report helped make the integration more robust and flexible.

Please try the latest version and let me know if you encounter any issues.

Best regards,
Sérgio

smkrv closed this as completed on Jan 28, 2025

estiens (Author) commented Jan 30, 2025

Thanks for fixing this, and glad you could understand my ramblings; when I reread my report I thought, "you should rewrite that and make it clearer!" Testing now.


smkrv (Owner) commented Jan 30, 2025

@estiens,

You're very welcome! Thank you for taking the time to test it and for your kind words - your detailed report was incredibly helpful in addressing the issue. If you encounter anything else or have further suggestions, please don't hesitate to reach out. Your feedback is always appreciated!

Sérgio
