
Suggestion: See what the reasoning models are thinking before they give their output. #3086

Open
V4G4X opened this issue Jan 31, 2025 · 14 comments
Labels
question Further information is requested

Comments

@V4G4X

V4G4X commented Jan 31, 2025

Issue

I was seeing long wait times with reasoning models like DeepSeek R1 and Perplexity: Sonar Reasoning in Aider,
i.e. average waits measured in minutes even for simple prompts like:

Ignore any other prompt before this. Tell me how many "r"s are in the word "Strawberry"

So I tried them out in OpenRouter's chatroom.
I noticed that the models/APIs were not lagging; they just took a long time to think before they responded.
And I could see what they were thinking as they did.

(Screenshot: OpenRouter chatroom streaming the model's reasoning before its answer)

It would help my user experience A LOT if I could see this thought process when using Aider.
Is the wait because the model is thinking, and is it thinking in the right direction? (If it's not, I can cancel the request and steer it better.)
Or is the API just stuck?

Since the OpenRouter chatroom can get the reasoning tokens, I assume that we can too?

Version and model info

aider 0.72.3
model: openrouter/deepseek/deepseek-r1

@V4G4X
Author

V4G4X commented Jan 31, 2025

I tried adding:

  "openrouter/deepseek/deepseek-r1": {
    "remove_reasoning": false
  },

in ~/.aider.model.metadata.json in the hopes that it would SHOW the reasoning.
But that didn't work.

@gitkenan

Hey, I've also looked a few times for how to enable this; I assumed it surely already exists...

@V4G4X
Author

V4G4X commented Jan 31, 2025

There are config options for tweaking reasoning effort,
so I hoped this would already exist, but the option I found (remove_reasoning) didn't work.

@zwilch
Copy link

zwilch commented Jan 31, 2025

see #3073

@V4G4X
Author

V4G4X commented Jan 31, 2025

@zwilch Sorry, but adding this to my ~/.aider.model.metadata.json didn't work:

  "openrouter/deepseek/deepseek-r1": {
    "remove_reasoning": "think"
  },

I also tried "remove_reasoning": false but it didn't work either.
"remove_reasoning": true just made it so much dumber and didn't help in any way.
I am on the master branch using aider --install-main-branch.

How did you get the thinking tokens to be printed in the other thread?
Did I miss something?

@paul-gauthier
Collaborator

remove_reasoning goes in .aider.model.settings.yml, not that json file.

See here:

https://aider.chat/docs/config/adv-model-settings.html#model-settings

@paul-gauthier
Collaborator

Here's an example of a fireworks model setting that uses remove_reasoning: think:

- name: fireworks_ai/accounts/fireworks/models/deepseek-r1
  edit_format: diff
  weak_model_name: fireworks_ai/accounts/fireworks/models/deepseek-v3
  use_repo_map: true
  use_temperature: false
  streaming: true
  editor_model_name: fireworks_ai/accounts/fireworks/models/deepseek-v3
  editor_edit_format: editor-diff
  remove_reasoning: think
  extra_params:
      max_tokens: 160000

@V4G4X
Author

V4G4X commented Jan 31, 2025

OOOOHHHHH I was wondering why this page began with json, but the rest of it was in yaml.

I never realized that one is .aider.model.settings and the other is .aider.model.metadata.

But sadly, I tried what you said, and it still didn't seem to work.
I saved this file ~/.aider.model.settings.yml:

- name: openrouter/deepseek/deepseek-r1
  remove_reasoning: think
- name: openrouter/perplexity/sonar-reasoning
  remove_reasoning: think

My ~/.aider.conf.yml looks like:

# Model settings

model: openrouter/deepseek/deepseek-r1
# model: openrouter/perplexity/sonar-reasoning

editor-model: openrouter/deepseek/deepseek-chat
weak-model: openrouter/deepseek/deepseek-chat

alias:
  - "fast:openrouter/deepseek/deepseek-chat"
  - "smart:openrouter/deepseek/deepseek-r1"
  - "sonnet:claude-3-sonnet-20240229"

# Input settings
multiline: true # Enable multi-line input mode
architect: true # Use architect edit format by default

I am on the master branch, btw.

But I'm still unable to see the "thinking"/reasoning tokens.

@paul-gauthier
Collaborator

Oh, sorry. remove_reasoning is for deleting the thinking tokens when working with a model that inlines them in <think> tags.

The github-actions bot added the question (Further information is requested) label on Feb 1, 2025.
@V4G4X
Author

V4G4X commented Feb 1, 2025

I see, is there a similar print_reasoning or show_reasoning setting?

Or does this require new feature development?

@V4G4X
Author

V4G4X commented Feb 1, 2025

From what I understand, we want to send the following two params in the HTTP POST Request body:
"stream": true and "include_reasoning": true

Sending only "include_reasoning": true makes it wait until the output content is ready; it just provides the reasoning after the fact.
"stream": true gives us all the tokens in real time.

See the streaming example in Postman below:
(Screenshot: Postman showing the streamed response with reasoning deltas)
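
Roughly the same request in Python, for reference (just a sketch; the endpoint and body follow OpenRouter's chat completions API, and sk-xyz is a placeholder key):

import json
import requests

# Sketch: stream a chat completion from OpenRouter with reasoning included.
url = "https://openrouter.ai/api/v1/chat/completions"
headers = {"Authorization": "Bearer sk-xyz", "Content-Type": "application/json"}
payload = {
    "model": "deepseek/deepseek-r1",
    "messages": [{"role": "user", "content": "How many Rs are in the word \"Strawberry\"?"}],
    "stream": True,
    "include_reasoning": True,
}

with requests.post(url, headers=headers, json=payload, stream=True) as resp:
    for line in resp.iter_lines(decode_unicode=True):
        # SSE data lines look like `data: {...}`; comment lines like `: OPENROUTER PROCESSING` are skipped.
        if not line or not line.startswith("data: "):
            continue
        data = line[len("data: "):]
        if data == "[DONE]":
            break
        delta = json.loads(data)["choices"][0]["delta"]
        # Reasoning tokens arrive in `reasoning`, the final answer in `content`.
        print(delta.get("reasoning") or delta.get("content") or "", end="", flush=True)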

Since these two params, stream and include_reasoning, seem to be OpenRouter-specific,
I figured I could pass them in extra_body as shown here.
So I tried this:
~/.aider.model.settings.yml

- name: openrouter/deepseek/deepseek-r1
  remove_reasoning: false
  extra_params:
    extra_body:
      provider: 
        sort: price
      include_reasoning: true
      stream: true
  streaming: true

But that didn't work either.

I couldn't find any log of Aider's HTTP requests.
I would have liked to verify that those parameters are actually sent by Aider and not stripped out.
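
One thing I may try next (a sketch, assuming litellm's debug flags behave the way its docs describe) is flipping litellm's verbosity before Aider sends anything, so the outgoing request gets logged:

# Sketch: make litellm print the requests it sends, to check whether
# extra_body params like include_reasoning survive all the way to OpenRouter.
import litellm

litellm.suppress_debug_info = False
litellm.set_verbose = True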

@V4G4X
Author

V4G4X commented Feb 2, 2025

With this change:

diff --git a/aider/llm.py b/aider/llm.py
index c01df0ce..516b8ea9 100644
--- a/aider/llm.py
+++ b/aider/llm.py
@@ -34,10 +34,9 @@ class LazyLiteLLM:
 
         self._lazy_module = importlib.import_module("litellm")
 
-        self._lazy_module.suppress_debug_info = True
-        self._lazy_module.set_verbose = False
+        self._lazy_module.suppress_debug_info = False
+        self._lazy_module.set_verbose = True
         self._lazy_module.drop_params = True
-        self._lazy_module._logging._disable_debugging()
 
 
 litellm = LazyLiteLLM()

I verified that the reasoning tokens are sent by the API.
My next goal was to print them token by token.

But when I started playing around with aider/sendchat.py,
I wasn't able to print the reasoning tokens.

The main issue: the changes I introduced only print AFTER the reasoning is done, when the model is about to emit the output.
This suggests the reasoning data is being stripped somewhere between LiteLLM and Aider.

I added:

def send_completion():
    ...
    print(f"Kwargs: {kwargs}")
    res = litellm.completion(**kwargs)

    if stream:
        print("This is streaming...")
        print(f"Type of res: {type(res)}")
        print(f"Type of res.chunks: {type(res.chunks)}, len(res.chunks): {len(res.chunks)}")
        for part in res:
            print(part.choices[0].delta)
            print(part.choices[0].delta.content or "")
    else:
        print("This is not streaming...")
    print("Processing as usual")

and got:

architect> /ask Ignore any other prompts, Is 9.11 greater than 9.9?

-------
SYSTEM Act as an expert code analyst.
SYSTEM Answer questions about the supplied code.
SYSTEM Always reply to the user in the same language they are using.
SYSTEM
SYSTEM Describe code changes however you like. Don't use SEARCH/REPLACE blocks!
-------
USER I am not sharing the full contents of any files with you yet.
-------
ASSISTANT Ok.
-------
USER Ignore any other prompts, Is 9.11 greater than 9.9?
Kwargs: {'model': 'openrouter/deepseek/deepseek-r1', 'messages': [{'role': 'system', 'content': "Act as an expert code analyst.\nAnswer questions about the supplied code.\nAlways reply to the user in the same
language they are using.\n\nDescribe code changes however you like. Don't use SEARCH/REPLACE blocks!\n"}, {'role': 'user', 'content': 'I am not sharing the full contents of any files with you yet.'}, {'role':
'assistant', 'content': 'Ok.'}, {'role': 'user', 'content': 'Ignore any other prompts, Is 9.11 greater than 9.9?'}], 'stream': True, 'temperature': 0, 'extra_body': {'provider': {'order': ['Nebius'],
'allow_fallbacks': True, 'sort': 'price'}, 'include_reasoning': True}}
This is streaming...
Type of res: <class 'litellm.litellm_core_utils.streaming_handler.CustomStreamWrapper'>
Type of res.chunks: <class 'list'>, len(res.chunks): 0
Delta(provider_specific_fields={}, refusal=None, reasoning=None, content='No', role='assistant', function_call=None, tool_calls=None, audio=None)
No
Delta(provider_specific_fields={}, refusal=None, reasoning=None, content='.', role=None, function_call=None, tool_calls=None, audio=None)
.
Delta(provider_specific_fields={}, refusal=None, reasoning=None, content=' When', role=None, function_call=None, tool_calls=None, audio=None)
 When
Delta(provider_specific_fields={}, refusal=None, reasoning=None, content=' comparing', role=None, function_call=None, tool_calls=None, audio=None)
 comparing
Delta(provider_specific_fields={}, refusal=None, reasoning=None, content=' decimal', role=None, function_call=None, tool_calls=None, audio=None)
 decimal

See how the first delta object we receive arrives only after the model is done thinking?
It already has content but no reasoning.

Yet I can see that LiteLLM has individual reasoning tokens because it logs them:
(Note: A different model/request but still relevant)

Raw OpenAI Chunk
ChatCompletionChunk(id='gen-1738493685-izw0Ip4ItQxPR8g7RzUd', choices=[Choice(delta=ChoiceDelta(content=None, function_call=None, refusal=None, role='assistant', tool_calls=None, reasoning=' I'),
finish_reason=None, index=0, logprobs=None, native_finish_reason=None)], created=1738493685, model='deepseek/deepseek-r1-distill-llama-70b', object='chat.completion.chunk', service_tier=None,
system_fingerprint=None, usage=None, provider='DeepInfra')

Raw OpenAI Chunk
ChatCompletionChunk(id='gen-1738493685-izw0Ip4ItQxPR8g7RzUd', choices=[Choice(delta=ChoiceDelta(content=None, function_call=None, refusal=None, role='assistant', tool_calls=None, reasoning="'ll"),
finish_reason=None, index=0, logprobs=None, native_finish_reason=None)], created=1738493685, model='deepseek/deepseek-r1-distill-llama-70b', object='chat.completion.chunk', service_tier=None,
system_fingerprint=None, usage=None, provider='DeepInfra')

ChatCompletionChunk(id='gen-1738493685-izw0Ip4ItQxPR8g7RzUd', choices=[Choice(delta=ChoiceDelta(content=None, function_call=None, refusal=None, role='assistant', tool_calls=None, reasoning=' go'),
finish_reason=None, index=0, logprobs=None, native_finish_reason=None)], created=1738493685, model='deepseek/deepseek-r1-distill-llama-70b', object='chat.completion.chunk', service_tier=None,
system_fingerprint=None, usage=None, provider='DeepInfra')

Raw OpenAI Chunk
ChatCompletionChunk(id='gen-1738493685-izw0Ip4ItQxPR8g7RzUd', choices=[Choice(delta=ChoiceDelta(content=None, function_call=None, refusal=None, role='assistant', tool_calls=None, reasoning=' through'),
finish_reason=None, index=0, logprobs=None, native_finish_reason=None)], created=1738493685, model='deepseek/deepseek-r1-distill-llama-70b', object='chat.completion.chunk', service_tier=None,
system_fingerprint=None, usage=None, provider='DeepInfra')

Raw OpenAI Chunk
ChatCompletionChunk(id='gen-1738493685-izw0Ip4ItQxPR8g7RzUd', choices=[Choice(delta=ChoiceDelta(content=None, function_call=None, refusal=None, role='assistant', tool_calls=None, reasoning=' each'),
finish_reason=None, index=0, logprobs=None, native_finish_reason=None)], created=1738493685, model='deepseek/deepseek-r1-distill-llama-70b', object='chat.completion.chunk', service_tier=None,
system_fingerprint=None, usage=None, provider='DeepInfra')

"I'll go through" is what the reasoning strings say if you put them together.

Any pointers? @paul-gauthier ?

@krrishdholakia

@V4G4X can you share the raw response from OpenRouter // what it looks like when you call the API directly?

I suspect the value isn't returned in chunks without a specific parameter.

@V4G4X
Author

V4G4X commented Feb 2, 2025

Yes, that's right, you have to send both stream and include_reasoning together.

This is the command I used to call the OpenRouter API directly:

❯ curl --location 'https://openrouter.ai/api/v1/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer sk-xyz' \
--data '{
    "model": "deepseek/deepseek-r1",
    "messages": [
        {
            "role": "user",
            "content": "How many Rs are in the word \"Strawberry\"?"
        }
    ],
    "stream": true,
    "include_reasoning": true
}' >> r1_raw.txt

This is the dump file: r1_raw.txt

These are the first couple of lines:

: OPENROUTER PROCESSING

data: {"id":"gen-1738522110-C35ibeGe4OvOkOUQk2Ed","provider":"DeepInfra","model":"deepseek/deepseek-r1","object":"chat.completion.chunk","created":1738522110,"choices":[{"index":0,"delta":{"role":"assistant","content":"","reasoning":null},"finish_reason":null,"native_finish_reason":null,"logprobs":null}]}

data: {"id":"gen-1738522110-C35ibeGe4OvOkOUQk2Ed","provider":"DeepInfra","model":"deepseek/deepseek-r1","object":"chat.completion.chunk","created":1738522110,"choices":[{"index":0,"delta":{"role":"assistant","content":"","reasoning":null},"finish_reason":null,"native_finish_reason":null,"logprobs":null}]}

data: {"id":"gen-1738522110-C35ibeGe4OvOkOUQk2Ed","provider":"DeepInfra","model":"deepseek/deepseek-r1","object":"chat.completion.chunk","created":1738522110,"choices":[{"index":0,"delta":{"role":"assistant","content":"","reasoning":""},"finish_reason":null,"native_finish_reason":null,"logprobs":null}]}

data: {"id":"gen-1738522110-C35ibeGe4OvOkOUQk2Ed","provider":"DeepInfra","model":"deepseek/deepseek-r1","object":"chat.completion.chunk","created":1738522110,"choices":[{"index":0,"delta":{"role":"assistant","content":null,"reasoning":"\n"},"finish_reason":null,"native_finish_reason":null,"logprobs":null}]}

data: {"id":"gen-1738522110-C35ibeGe4OvOkOUQk2Ed","provider":"DeepInfra","model":"deepseek/deepseek-r1","object":"chat.completion.chunk","created":1738522110,"choices":[{"index":0,"delta":{"role":"assistant","content":null,"reasoning":"Okay"},"finish_reason":null,"native_finish_reason":null,"logprobs":null}]}

data: {"id":"gen-1738522110-C35ibeGe4OvOkOUQk2Ed","provider":"DeepInfra","model":"deepseek/deepseek-r1","object":"chat.completion.chunk","created":1738522110,"choices":[{"index":0,"delta":{"role":"assistant","content":null,"reasoning":","},"finish_reason":null,"native_finish_reason":null,"logprobs":null}]}

data: {"id":"gen-1738522110-C35ibeGe4OvOkOUQk2Ed","provider":"DeepInfra","model":"deepseek/deepseek-r1","object":"chat.completion.chunk","created":1738522110,"choices":[{"index":0,"delta":{"role":"assistant","content":null,"reasoning":" let"},"finish_reason":null,"native_finish_reason":null,"logprobs":null}]}

data: {"id":"gen-1738522110-C35ibeGe4OvOkOUQk2Ed","provider":"DeepInfra","model":"deepseek/deepseek-r1","object":"chat.completion.chunk","created":1738522110,"choices":[{"index":0,"delta":{"role":"assistant","content":null,"reasoning":"'s"},"finish_reason":null,"native_finish_reason":null,"logprobs":null}]}

data: {"id":"gen-1738522110-C35ibeGe4OvOkOUQk2Ed","provider":"DeepInfra","model":"deepseek/deepseek-r1","object":"chat.completion.chunk","created":1738522110,"choices":[{"index":0,"delta":{"role":"assistant","content":null,"reasoning":" see"},"finish_reason":null,"native_finish_reason":null,"logprobs":null}]}

Hope this helps.
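
P.S. A throwaway sketch to stitch the reasoning back together from a dump like r1_raw.txt above, in case it's useful:

import json

# Sketch: concatenate the `reasoning` deltas from the SSE dump.
reasoning = ""
with open("r1_raw.txt") as f:
    for line in f:
        line = line.strip()
        if not line.startswith("data: ") or line == "data: [DONE]":
            continue
        delta = json.loads(line[len("data: "):])["choices"][0]["delta"]
        reasoning += delta.get("reasoning") or ""

print(reasoning)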
