weirdness with max tokens and openrouter #1

Closed
estiens opened this issue Jan 25, 2025 · 4 comments

estiens commented Jan 25, 2025

Describe the bug
I haven't dug into it yet, but when I use an OpenRouter model I almost always get an error that MAX_TOKENS must be between 1-4000, even when I don't set it at all, and sometimes I have to set it OVER 4000 to get a request through. I can also almost never include any context, but including 0 context is a guaranteed error as well.

I'm not sure whether it also happens with open_ai and anthropic (I can check), but I'm not sure where the limit gets set, since many of the OpenAI models I use have huge context windows. As a feature request, it would be nice to truncate the conversation history to just the responses for things like regularly requesting summaries of areas; otherwise we re-send the same giant blob of states, etc. for the entire history. (No problem to do that manually and just leave conversation history out of it, though.)

My only clue is that the error always says between 1-4000. It also doesn't seem to matter what I set: I finally got one through by setting it to 3000, and the request I sent over was still something like 16k tokens, so the limit doesn't actually seem to be applied on the way out. Personally, I'd rather the endpoint return an error when too many tokens are sent than have the integration try to precalculate it, especially for OpenRouter, where different models can have wildly different context sizes.

As I said, I haven't dug in, but it makes the integration super finicky to use at the moment: you blindly try different max_tokens values and context history sizes until something goes through.
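If I had to guess, something like the following is happening before the request ever leaves the integration (a purely hypothetical sketch; I haven't read the code, and the function name is made up):

```python
def validate_max_tokens(max_tokens: int | None) -> int:
    """Hypothetical pre-flight check that would explain the error text:
    a hard-coded 1-4000 window applied regardless of which model or
    provider is actually being called."""
    if max_tokens is None:
        max_tokens = 0  # "not set at all" would still land outside the window
    if not 1 <= max_tokens <= 4000:
        raise ValueError("MAX_TOKENS must be between 1-4000")
    return max_tokens
```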

To Reproduce

Set up an OpenRouter model and try it without passing max_tokens, then with max_tokens of 1k, 4k, and 10k, combined with different context lengths.

Expected behavior
Max tokens is pulled from the model being used (not sure OpenRouter returns that, though?) and/or can be overridden with your own max_tokens, rather than getting an error when you set it above 4000. It doesn't look like the requests ever went across the wire, so I'm pretty sure they were short-circuited before being attempted.
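Something like this is what I have in mind (a rough sketch only; it assumes OpenRouter's model listing endpoint reports a context_length per model, and all names here are illustrative):

```python
import requests

OPENROUTER_MODELS_URL = "https://openrouter.ai/api/v1/models"

def resolve_max_tokens(model_id: str, user_override: int | None = None) -> int | None:
    """Prefer the user's explicit max_tokens; otherwise fall back to the model's
    advertised context length; otherwise return None and let the provider
    apply its own default."""
    if user_override is not None:
        return user_override
    try:
        resp = requests.get(OPENROUTER_MODELS_URL, timeout=10)
        resp.raise_for_status()
        models = resp.json().get("data", [])
    except requests.RequestException:
        return None  # don't short-circuit locally; let the API complain if needed
    for model in models:
        if model.get("id") == model_id:
            # context_length covers prompt + completion, so the caller still
            # needs to leave headroom for the prompt itself.
            return model.get("context_length")
    return None
```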

estiens added the bug label on Jan 25, 2025

smkrv (Owner) commented Jan 25, 2025

@estiens,

Thank you for such a detailed bug report and great suggestions! All three proposed solutions make perfect sense.

I'll work on implementing these core improvements within roughly the next week and release an updated version. Your thorough explanation and reproduction steps are incredibly helpful for addressing this issue effectively.

Sérgio

smkrv self-assigned this on Jan 25, 2025
smkrv pushed a commit that referenced this issue Jan 28, 2025
- Completely reworked token handling mechanism
- Removed custom token calculation logic
- Direct max_tokens passing to LLM APIs
- Added support for DeepSeek provider
- Integrated deepseek-chat and deepseek-reasoner models

Thanks to @estiens for reporting token handling issues and providing valuable feedback (#1).

smkrv (Owner) commented Jan 28, 2025

@estiens,

Great news! I've implemented the fix for the max_tokens handling issue you reported. The changes are now available in the latest release.

Key changes:

  • Removed the pre-calculation of tokens that was causing the artificial limits
  • Now passing max_tokens directly to the LLM API
  • Simplified the token handling logic

This means:

  1. No more arbitrary 4000 token limit
  2. Token limits are now handled natively by each model's API (see the sketch below)
  3. More reliable operation with OpenRouter and other providers
  4. Better support for models with large context windows
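For reference, the new behaviour boils down to something like this (a simplified sketch rather than the exact integration code; an OpenAI-compatible chat completion payload is assumed):

```python
def build_chat_payload(model: str, messages: list[dict], max_tokens: int | None = None) -> dict:
    """Build an OpenAI-compatible chat completion payload.

    max_tokens is forwarded untouched when set; there is no local 1-4000
    clamp, so any limit enforcement now comes back from the provider as an
    API error instead of a pre-flight rejection."""
    payload = {"model": model, "messages": messages}
    if max_tokens is not None:
        payload["max_tokens"] = max_tokens  # passed through as-is
    return payload
```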

Thank you again for bringing this up and providing such detailed feedback. Your report helped make the integration more robust and flexible.

Please try the latest version and let me know if you encounter any issues.

Best regards,
Sérgio

smkrv closed this as completed on Jan 28, 2025

estiens (Author) commented Jan 30, 2025

Thanks for fixing this, and glad you could understand my ramblings; when I reread my report I thought, "you should rewrite that and make it clearer!" Testing now.


smkrv (Owner) commented Jan 30, 2025

@estiens,

You're very welcome! Thank you for taking the time to test it and for your kind words - your detailed report was incredibly helpful in addressing the issue. If you encounter anything else or have further suggestions, please don't hesitate to reach out. Your feedback is always appreciated!

Sérgio
