
Suggestion: See what the reasoning models are thinking before they give their output. #3086

Open
V4G4X opened this issue Jan 31, 2025 · 14 comments
Labels
question Further information is requested

Comments

@V4G4X

V4G4X commented Jan 31, 2025

Issue

I was seeing long wait times with reasoning models like DeepSeek R1 and Perplexity: Sonar Reasoning in Aider,
i.e. average waits measured in minutes even for simple prompts like:

Ignore any other prompt before this. Tell me how many "r"s are in the word "Strawberry"

So I tried them out in OpenRouter's chatroom.
I noticed that the models/APIs were not lagging; they just took a long time to think before they responded.
And I could see what they were thinking as they did.

(Screenshot: OpenRouter chatroom streaming the model's reasoning before its answer)

It would help my user experience A LOT if I could see this thought process when using Aider.
Is the wait because the model is thinking, and is it thinking in the right direction? (If it's not, I can cancel the request and steer it better.)
Or is the API just stuck?

Since the OpenRouter chatroom can get the reasoning tokens, I assume that we can too?

Version and model info

aider 0.72.3
model: openrouter/deepseek/deepseek-r1

@V4G4X
Author

V4G4X commented Jan 31, 2025

I tried adding:

  "openrouter/deepseek/deepseek-r1": {
    "remove_reasoning": false
  },

in ~/.aider.model.metadata.json in the hopes that it would SHOW the reasoning.
But that didn't work.

@gitkenan

Hey, I've also looked a few times for how to enable this; I assumed it surely already exists...

@V4G4X
Author

V4G4X commented Jan 31, 2025

There are config options for tweaking reasoning effort,
so I hoped this would already exist, but the option I found (remove_reasoning) didn't work.

@zwilch
Copy link

zwilch commented Jan 31, 2025

see #3073

@V4G4X
Author

V4G4X commented Jan 31, 2025

@zwilch Sorry, but adding this to my ~/.aider.model.metadata.json didn't work:

  "openrouter/deepseek/deepseek-r1": {
    "remove_reasoning": "think"
  },

I also tried "remove_reasoning": false but it didn't work either.
"remove_reasoning": true just made it so much dumber and didn't help in any way.
I am on the master branch using aider --install-main-branch.

How did you get the thinking tokens to be printed in the other thread?
Did I miss something?

@paul-gauthier
Collaborator

remove_reasoning goes in .aider.model.settings.yml, not that json file.

See here:

https://aider.chat/docs/config/adv-model-settings.html#model-settings

@paul-gauthier
Collaborator

Here's an example of a fireworks model setting that uses remove_reasoning: think:

- name: fireworks_ai/accounts/fireworks/models/deepseek-r1
  edit_format: diff
  weak_model_name: fireworks_ai/accounts/fireworks/models/deepseek-v3
  use_repo_map: true
  use_temperature: false
  streaming: true
  editor_model_name: fireworks_ai/accounts/fireworks/models/deepseek-v3
  editor_edit_format: editor-diff
  remove_reasoning: think
  extra_params:
      max_tokens: 160000

@V4G4X
Author

V4G4X commented Jan 31, 2025

OOOOHHHHH I was wondering why this page began with json, but the rest of it was in yaml.

I never realized that one is .aider.model.settings and the other is .aider.model.metadata.

But sadly, I tried what you said, and it still didn't seem to work.
I saved this file ~/.aider.model.settings.yml:

- name: openrouter/deepseek/deepseek-r1
  remove_reasoning: think
- name: openrouter/perplexity/sonar-reasoning
  remove_reasoning: think

My ~/.aider.conf.yml looks like:

# Model settings

model: openrouter/deepseek/deepseek-r1
# model: openrouter/perplexity/sonar-reasoning

editor-model: openrouter/deepseek/deepseek-chat
weak-model: openrouter/deepseek/deepseek-chat

alias:
  - "fast:openrouter/deepseek/deepseek-chat"
  - "smart:openrouter/deepseek/deepseek-r1"
  - "sonnet:claude-3-sonnet-20240229"

# Input settings
multiline: true # Enable multi-line input mode
architect: true # Use architect edit format by default

I am on the master branch, btw.

But I'm still unable to see the "thinking"/reasoning tokens.

@paul-gauthier
Collaborator

Oh, sorry. remove_reasoning is for deleting the thinking tokens when working with a model that inlines them in <think> tags.

The github-actions bot added the question (Further information is requested) label on Feb 1, 2025.
@V4G4X
Author

V4G4X commented Feb 1, 2025

I see, is there a similar print_reasoning or show_reasoning setting?

Or does this require new feature development?

@V4G4X
Author

V4G4X commented Feb 1, 2025

From what I understand, we want to send the following two params in the HTTP POST Request body:
"stream": true and "include_reasoning": true

Sending only "include_reasoning": true makes it wait until the output content is ready; it just provides the reasoning after the fact.
"stream": true gives us all the tokens in real time.

See the streaming example in Postman below:
(Screenshot: Postman showing the streamed response with reasoning deltas)
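
Roughly the same request in Python, for reference (just a sketch; the endpoint and body follow OpenRouter's chat completions API, and sk-xyz is a placeholder key):

import json
import requests

# Sketch: stream a chat completion from OpenRouter with reasoning included.
url = "https://openrouter.ai/api/v1/chat/completions"
headers = {"Authorization": "Bearer sk-xyz", "Content-Type": "application/json"}
payload = {
    "model": "deepseek/deepseek-r1",
    "messages": [{"role": "user", "content": "How many Rs are in the word \"Strawberry\"?"}],
    "stream": True,
    "include_reasoning": True,
}

with requests.post(url, headers=headers, json=payload, stream=True) as resp:
    for line in resp.iter_lines(decode_unicode=True):
        # SSE data lines look like `data: {...}`; comment lines like `: OPENROUTER PROCESSING` are skipped.
        if not line or not line.startswith("data: "):
            continue
        data = line[len("data: "):]
        if data == "[DONE]":
            break
        delta = json.loads(data)["choices"][0]["delta"]
        # Reasoning tokens arrive in `reasoning`, the final answer in `content`.
        print(delta.get("reasoning") or delta.get("content") or "", end="", flush=True)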

Since these two params, stream and include_reasoning, seem to be OpenRouter-specific,
I figured I could pass them in extra_body as shown here.
So I tried this:
~/.aider.model.settings.yml

- name: openrouter/deepseek/deepseek-r1
  remove_reasoning: false
  extra_params:
    extra_body:
      provider: 
        sort: price
      include_reasoning: true
      stream: true
  streaming: true

But that didn't work either.

I couldn't find any log of Aider's HTTP requests.
I would have liked to verify that those parameters are actually sent by Aider and not stripped out.
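
One thing I may try next (a sketch, assuming litellm's debug flags behave the way its docs describe) is flipping litellm's verbosity before Aider sends anything, so the outgoing request gets logged:

# Sketch: make litellm print the requests it sends, to check whether
# extra_body params like include_reasoning survive all the way to OpenRouter.
import litellm

litellm.suppress_debug_info = False
litellm.set_verbose = True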

@V4G4X
Author

V4G4X commented Feb 2, 2025

With this change:

diff --git a/aider/llm.py b/aider/llm.py
index c01df0ce..516b8ea9 100644
--- a/aider/llm.py
+++ b/aider/llm.py
@@ -34,10 +34,9 @@ class LazyLiteLLM:
 
         self._lazy_module = importlib.import_module("litellm")
 
-        self._lazy_module.suppress_debug_info = True
-        self._lazy_module.set_verbose = False
+        self._lazy_module.suppress_debug_info = False
+        self._lazy_module.set_verbose = True
         self._lazy_module.drop_params = True
-        self._lazy_module._logging._disable_debugging()
 
 
 litellm = LazyLiteLLM()

I verified that the reasoning tokens are sent by the API.
My next goal was to print them token by token.

But when I started playing around with aider/sendchat.py,
I wasn't able to print the reasoning tokens.

The main issue: the changes I introduced only print AFTER the reasoning is done, when the model is about to emit the output.
This suggests the reasoning data is being stripped somewhere between LiteLLM and Aider.

I added:

def send_completion():
    ...
    print(f"Kwargs: {kwargs}")
    res = litellm.completion(**kwargs)

    if stream:
        print("This is streaming...")
        print(f"Type of res: {type(res)}")
        print(f"Type of res.chunks: {type(res.chunks)}, len(res.chunks): {len(res.chunks)}")
        for part in res:
            print(part.choices[0].delta)
            print(part.choices[0].delta.content or "")
    else:
        print("This is not streaming...")
    print("Processing as usual")

and got:

architect> /ask Ignore any other prompts, Is 9.11 greater than 9.9?

-------
SYSTEM Act as an expert code analyst.
SYSTEM Answer questions about the supplied code.
SYSTEM Always reply to the user in the same language they are using.
SYSTEM
SYSTEM Describe code changes however you like. Don't use SEARCH/REPLACE blocks!
-------
USER I am not sharing the full contents of any files with you yet.
-------
ASSISTANT Ok.
-------
USER Ignore any other prompts, Is 9.11 greater than 9.9?
Kwargs: {'model': 'openrouter/deepseek/deepseek-r1', 'messages': [{'role': 'system', 'content': "Act as an expert code analyst.\nAnswer questions about the supplied code.\nAlways reply to the user in the same
language they are using.\n\nDescribe code changes however you like. Don't use SEARCH/REPLACE blocks!\n"}, {'role': 'user', 'content': 'I am not sharing the full contents of any files with you yet.'}, {'role':
'assistant', 'content': 'Ok.'}, {'role': 'user', 'content': 'Ignore any other prompts, Is 9.11 greater than 9.9?'}], 'stream': True, 'temperature': 0, 'extra_body': {'provider': {'order': ['Nebius'],
'allow_fallbacks': True, 'sort': 'price'}, 'include_reasoning': True}}
This is streaming...
Type of res: <class 'litellm.litellm_core_utils.streaming_handler.CustomStreamWrapper'>
Type of res.chunks: <class 'list'>, len(res.chunks): 0
Delta(provider_specific_fields={}, refusal=None, reasoning=None, content='No', role='assistant', function_call=None, tool_calls=None, audio=None)
No
Delta(provider_specific_fields={}, refusal=None, reasoning=None, content='.', role=None, function_call=None, tool_calls=None, audio=None)
.
Delta(provider_specific_fields={}, refusal=None, reasoning=None, content=' When', role=None, function_call=None, tool_calls=None, audio=None)
 When
Delta(provider_specific_fields={}, refusal=None, reasoning=None, content=' comparing', role=None, function_call=None, tool_calls=None, audio=None)
 comparing
Delta(provider_specific_fields={}, refusal=None, reasoning=None, content=' decimal', role=None, function_call=None, tool_calls=None, audio=None)
 decimal

See how the first delta object we receive arrives only after the model is done thinking?
It already has content but no reasoning.

Yet I can see that LiteLLM has individual reasoning tokens because it logs them:
(Note: A different model/request but still relevant)

Raw OpenAI Chunk
ChatCompletionChunk(id='gen-1738493685-izw0Ip4ItQxPR8g7RzUd', choices=[Choice(delta=ChoiceDelta(content=None, function_call=None, refusal=None, role='assistant', tool_calls=None, reasoning=' I'),
finish_reason=None, index=0, logprobs=None, native_finish_reason=None)], created=1738493685, model='deepseek/deepseek-r1-distill-llama-70b', object='chat.completion.chunk', service_tier=None,
system_fingerprint=None, usage=None, provider='DeepInfra')

Raw OpenAI Chunk
ChatCompletionChunk(id='gen-1738493685-izw0Ip4ItQxPR8g7RzUd', choices=[Choice(delta=ChoiceDelta(content=None, function_call=None, refusal=None, role='assistant', tool_calls=None, reasoning="'ll"),
finish_reason=None, index=0, logprobs=None, native_finish_reason=None)], created=1738493685, model='deepseek/deepseek-r1-distill-llama-70b', object='chat.completion.chunk', service_tier=None,
system_fingerprint=None, usage=None, provider='DeepInfra')

ChatCompletionChunk(id='gen-1738493685-izw0Ip4ItQxPR8g7RzUd', choices=[Choice(delta=ChoiceDelta(content=None, function_call=None, refusal=None, role='assistant', tool_calls=None, reasoning=' go'),
finish_reason=None, index=0, logprobs=None, native_finish_reason=None)], created=1738493685, model='deepseek/deepseek-r1-distill-llama-70b', object='chat.completion.chunk', service_tier=None,
system_fingerprint=None, usage=None, provider='DeepInfra')

Raw OpenAI Chunk
ChatCompletionChunk(id='gen-1738493685-izw0Ip4ItQxPR8g7RzUd', choices=[Choice(delta=ChoiceDelta(content=None, function_call=None, refusal=None, role='assistant', tool_calls=None, reasoning=' through'),
finish_reason=None, index=0, logprobs=None, native_finish_reason=None)], created=1738493685, model='deepseek/deepseek-r1-distill-llama-70b', object='chat.completion.chunk', service_tier=None,
system_fingerprint=None, usage=None, provider='DeepInfra')

Raw OpenAI Chunk
ChatCompletionChunk(id='gen-1738493685-izw0Ip4ItQxPR8g7RzUd', choices=[Choice(delta=ChoiceDelta(content=None, function_call=None, refusal=None, role='assistant', tool_calls=None, reasoning=' each'),
finish_reason=None, index=0, logprobs=None, native_finish_reason=None)], created=1738493685, model='deepseek/deepseek-r1-distill-llama-70b', object='chat.completion.chunk', service_tier=None,
system_fingerprint=None, usage=None, provider='DeepInfra')

"I'll go through" is what the reasoning strings say if you put them together.

Any pointers? @paul-gauthier ?

@krrishdholakia

@V4G4X can you share the raw response from OpenRouter // what it looks like when you call the API directly?

I suspect the value isn't returned in chunks without a specific parameter.

@V4G4X
Author

V4G4X commented Feb 2, 2025

Yes, that's right, you have to send both stream and include_reasoning together.

This is the command I used to call the OpenRouter API directly:

❯ curl --location 'https://openrouter.ai/api/v1/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer sk-xyz' \
--data '{
    "model": "deepseek/deepseek-r1",
    "messages": [
        {
            "role": "user",
            "content": "How many Rs are in the word \"Strawberry\"?"
        }
    ],
    "stream": true,
    "include_reasoning": true
}' >> r1_raw.txt

This is the dump file: r1_raw.txt

These are the first couple of lines:

: OPENROUTER PROCESSING

data: {"id":"gen-1738522110-C35ibeGe4OvOkOUQk2Ed","provider":"DeepInfra","model":"deepseek/deepseek-r1","object":"chat.completion.chunk","created":1738522110,"choices":[{"index":0,"delta":{"role":"assistant","content":"","reasoning":null},"finish_reason":null,"native_finish_reason":null,"logprobs":null}]}

data: {"id":"gen-1738522110-C35ibeGe4OvOkOUQk2Ed","provider":"DeepInfra","model":"deepseek/deepseek-r1","object":"chat.completion.chunk","created":1738522110,"choices":[{"index":0,"delta":{"role":"assistant","content":"","reasoning":null},"finish_reason":null,"native_finish_reason":null,"logprobs":null}]}

data: {"id":"gen-1738522110-C35ibeGe4OvOkOUQk2Ed","provider":"DeepInfra","model":"deepseek/deepseek-r1","object":"chat.completion.chunk","created":1738522110,"choices":[{"index":0,"delta":{"role":"assistant","content":"","reasoning":""},"finish_reason":null,"native_finish_reason":null,"logprobs":null}]}

data: {"id":"gen-1738522110-C35ibeGe4OvOkOUQk2Ed","provider":"DeepInfra","model":"deepseek/deepseek-r1","object":"chat.completion.chunk","created":1738522110,"choices":[{"index":0,"delta":{"role":"assistant","content":null,"reasoning":"\n"},"finish_reason":null,"native_finish_reason":null,"logprobs":null}]}

data: {"id":"gen-1738522110-C35ibeGe4OvOkOUQk2Ed","provider":"DeepInfra","model":"deepseek/deepseek-r1","object":"chat.completion.chunk","created":1738522110,"choices":[{"index":0,"delta":{"role":"assistant","content":null,"reasoning":"Okay"},"finish_reason":null,"native_finish_reason":null,"logprobs":null}]}

data: {"id":"gen-1738522110-C35ibeGe4OvOkOUQk2Ed","provider":"DeepInfra","model":"deepseek/deepseek-r1","object":"chat.completion.chunk","created":1738522110,"choices":[{"index":0,"delta":{"role":"assistant","content":null,"reasoning":","},"finish_reason":null,"native_finish_reason":null,"logprobs":null}]}

data: {"id":"gen-1738522110-C35ibeGe4OvOkOUQk2Ed","provider":"DeepInfra","model":"deepseek/deepseek-r1","object":"chat.completion.chunk","created":1738522110,"choices":[{"index":0,"delta":{"role":"assistant","content":null,"reasoning":" let"},"finish_reason":null,"native_finish_reason":null,"logprobs":null}]}

data: {"id":"gen-1738522110-C35ibeGe4OvOkOUQk2Ed","provider":"DeepInfra","model":"deepseek/deepseek-r1","object":"chat.completion.chunk","created":1738522110,"choices":[{"index":0,"delta":{"role":"assistant","content":null,"reasoning":"'s"},"finish_reason":null,"native_finish_reason":null,"logprobs":null}]}

data: {"id":"gen-1738522110-C35ibeGe4OvOkOUQk2Ed","provider":"DeepInfra","model":"deepseek/deepseek-r1","object":"chat.completion.chunk","created":1738522110,"choices":[{"index":0,"delta":{"role":"assistant","content":null,"reasoning":" see"},"finish_reason":null,"native_finish_reason":null,"logprobs":null}]}

Hope this helps.
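
P.S. A throwaway sketch to stitch the reasoning back together from a dump like r1_raw.txt above, in case it's useful:

import json

# Sketch: concatenate the `reasoning` deltas from the SSE dump.
reasoning = ""
with open("r1_raw.txt") as f:
    for line in f:
        line = line.strip()
        if not line.startswith("data: ") or line == "data: [DONE]":
            continue
        delta = json.loads(line[len("data: "):])["choices"][0]["delta"]
        reasoning += delta.get("reasoning") or ""

print(reasoning)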
