[Core] Roadmap for handling context overflow #156

yiranwu0 · 2023-10-08T21:44:01Z

Any help is appreciated!

Thit task demands a considerable amount of effort. If you have insights, suggestions, or can contribute in any way, your help would be immensely valued.

Problem Description

(Continue from #9) Current LLM have limited context size / token limit (gpt3.5turbo: 4096, gpt4 8192, etc). Although the current max_token limit from OpenAI is sufficient for many tasks, the token limit will be always exceeded with the conversation running. autogen.Completion will raise this InvalidRequestError that indicates the context size is exceeded since autogen doesn’t have a way to handle long context sizes.

Potential Methods

Compression: we can utilize LLMs to compress previous messages to reduce context size.
Retrieve related history messages: we can retrieve the most related messages based on the latest message.
Truncation: a simple way is keep the recent k messages and truncate all previous messages. We can also implement some truncation mechanisms, such as remove failed code executions.
A mixture of methods above.

Some References

Compression & Truncation

Give feedback

Add compression agent #131

feature long context handling
Add token_count_util #421
Add CompressibleAgent #443

long context handling
Allow async compression
[Core] Compression in GroupChat #497

group chat/teams long context handling
Add new mode "WINDOW" for compressible agent #685

long context handling
Options

Retrieval

Give feedback

Explore memGPT agent
Options

OrderAndCh4oS · 2023-10-10T14:45:59Z

Probably a slight tangent, but I'm finding error stack traces to a major culprit of context overflow

kazunator · 2023-10-12T00:51:48Z

Would using something like Llama Index help?

kazunator · 2023-10-14T02:21:41Z

could this work? https://memgpt.ai/

juanmacuevas · 2023-10-15T22:12:15Z

MemWalker processed long context intro a tree of summaries
Could this approach be applied for autogen? link to the paper

sonichi · 2023-10-16T04:41:17Z

@Hacker0912 Are you interested in this topic?

qidanrui · 2023-10-20T11:18:36Z

AutoGen is a great project! I'm very interested in how do you solve the context overflow.
When I was using AutoGen, sometimes I meet this situation:

Do you have any possible solution now? Thanks! @kevin666aa

yiranwu0 · 2023-10-20T14:51:18Z

AutoGen is a great project! I'm very interested in how do you solve the context overflow. When I was using AutoGen, sometimes I meet this situation: Do you have any possible solution now? Thanks! @kevin666aa

@qidanrui Here is a experimental PR for compression: #131 . It would be great if you can check it out and test it!

Just found a potential good solution for compression and I will look into this: https://arxiv.org/abs/2310.06839

qidanrui · 2023-10-21T02:00:19Z

Thanks for sharing! @kevin666aa I'm so interested in the AutoGen project!
I have two other more general questions: 1. what is the difference between AutoGen and ChatArena? 2. Can we customize our own tool/agent except python code writing like what we can do in LangChain?

aaronstevenson408 · 2023-10-21T04:44:32Z

i also am having issue with using a mistral model , using textgen webui as the api host

openai.error.InvalidRequestError: This model maximum context length is 2048 tokens. However, your messages resulted in over 2165 tokens.

MrXandbadas · 2023-10-21T05:01:03Z

could this work? https://memgpt.ai/

letta-ai/letta#65 (comment)

Edit:
Its been done! Fix your context issues by including a new bot. For the most up to date comment click the link but to save time here:
Please note this implementation replaces the coder from an example found in the examples folder

import os
import autogen
import asyncio
from absl import app, flags

config_list = [
    {
        'model': 'gpt-4',
        'api_key': os.getenv('OPENAI_API_KEY'),
    },
]

MEMGPT = True
if not MEMGPT:
    llm_config = {"config_list": config_list, "seed": 42}
    user_proxy = autogen.UserProxyAgent(
       name="User_proxy",
       system_message="A human admin.",
       code_execution_config={"last_n_messages": 2, "work_dir": "groupchat"},
       human_input_mode="TERMINATE"
    )
    coder = autogen.AssistantAgent(
        name="Coder",
        llm_config=llm_config,
    )
    pm = autogen.AssistantAgent(
        name="Product_manager",
        system_message="Creative in software product ideas.",
        llm_config=llm_config,
    )

else:
    import memgpt.autogen.memgpt_agent as memgpt_autogen
    import memgpt.autogen.interface as autogen_interface 
    import memgpt.agent as agent
    import memgpt.system as system
    import memgpt.utils as utils
    import memgpt.presets as presets
    import memgpt.constants as constants
    import memgpt.personas.personas as personas
    import memgpt.humans.humans as humans
    from memgpt.persistence_manager import InMemoryStateManager, InMemoryStateManagerWithPreloadedArchivalMemory, InMemoryStateManagerWithFaiss
    
    llm_config = {"config_list": config_list, "seed": 42}
    user_proxy = autogen.UserProxyAgent(
       name="User_proxy",
       system_message="A human admin.",
       code_execution_config={"last_n_messages": 2, "work_dir": "groupchat"},
    )

    interface = autogen_interface.AutoGenInterface()
    persistence_manager = InMemoryStateManager()
    memgpt_agent = presets.use_preset(presets.DEFAULT, 'gpt-4', personas.get_persona_text(personas.DEFAULT), humans.get_human_text(humans.DEFAULT), interface, persistence_manager)

    # MemGPT coder
    coder = memgpt_autogen.MemGPTAgent(
        name="MemGPT_coder",
        agent=memgpt_agent,
    )

    # non-MemGPT PM
    pm = autogen.AssistantAgent(
        name="Product_manager",
        system_message="Creative in software product ideas.",
        llm_config=llm_config,
    )

groupchat = autogen.GroupChat(agents=[user_proxy, coder, pm], messages=[], max_round=12)
manager = autogen.GroupChatManager(groupchat=groupchat, llm_config=llm_config)

user_proxy.initiate_chat(manager, message="First send the message 'Let's go Mario!'")

PriNova · 2023-10-21T09:23:57Z

I suggest starting with the simplest implementations without external dependencies first and later building on these.
This means to use truncation so that the history has a sliding window set by users if custom models will be used.

Later on, the user can switch between different implementations how to handle context window through truncation, vector embeddings DB or MemGPT.

yiranwu0 · 2023-10-21T21:18:45Z

Hello, @MrXandbadas Thanks for the heads up! It's great to have a memGPT agent in AutoGen. It requires much effort to add memgpt, but it seems that we can work with people from memGPT to make it happen. I guess the first step would be to make memgpt a built-in agent in AutoGen. Then we can think about how users can switch with different options for context overflow.

@PriNova Thanks for the advice! Your suggestion of truncating history is brought up here #195.
I am making a PR to add a compression agent for the user to choose here #131.
There are two parts to this PR:

A implementation where user can switch between two modes at a pre-set token limit:
TERMINATE: terminate the process before RateLimitError is raised; "COMPRESS": call a compression agent to compress previous messages.
A compression agent that does compression.

From the first part, it would be easy to add different ways to handle context later.
The compression agent is experimental but serves as a choice for now.

Please take a look at it if you are interested! @PriNova @MrXandbadas

sonichi · 2023-10-22T00:24:04Z

Sounds good, except "the first step would be to make memgpt a built-in agent in AutoGen." Before we make it a built-in agent, it'll be helpful to demonstrate one good use case of memgpt-based agent in autogen.

JonMike12341234 · 2023-10-22T01:14:46Z

I must be totally misunderstanding the request for "one good use case". Would this approach not give AutoGen Agents near limitless long term memory storage?

sonichi · 2023-10-22T04:39:28Z

Just suggesting doing it step by step. Test-driven development.

MrXandbadas · 2023-10-22T05:29:18Z

Just suggesting doing it step by step. Test-driven development.

I completely agree. Testing it in an more robust setup would garner more fruitful insight as to if it was going to be beneficial for the system or just a hinderance in the continual prompt generation that allows these agents to function so flawlessly.

ddwinhzy · 2023-10-24T08:51:54Z

I think it's gonna be a great start.

SDcodehub · 2023-10-31T16:25:34Z

I am interested in this topic, is anyone working on this? I can help can work together with others

yiranwu0 · 2023-11-01T01:30:14Z

I am interested in this topic, is anyone working on this? I can help can work together with others

Hello @SDcodehub, thanks for your interest! Currently I am working on adding a compressible agent #443. It could be used as an interface for different types of compression and truncations.
I think it is a good start pointing to know what we are doing.
I am starting a draft for compressible groupchat in #497.
The next step will be to allow async management of history.

On the other hand, it is also possible to utilize existing framework like memGPT. memGPT is actively supporting autogen agents. As @sonichi pointed out, we need to "demonstrate one good use case of memgpt-based agent in autogen". It is not hard to add a memGPT agent, but how to modify it to serve as a group memory requires more effort and thinking.

Grigorij-Dudnik · 2023-12-29T13:46:39Z

Hey guys!

The lack of possibility to see exact prompt which agent gets to a context and lack of ability of managing agent's contexts are the main problems of autogen in my opinion. Main reason for it is not even overflow of context window - the main reasons is the LLMs working much worse when had too long context with a lot of useless noise, and also generate unnecesary costs.

Maybe you remember, lack of input prompt visibility was big problem with Langchain, until they did a Langsmith. With autogen we have same problem again. In my opinion, we can't talk about any serious AI development if we can't see and edit input prompt of LLMs.

What do you think about it? Is such features of editing context (as summarizing or removing old messages) will be available in near future? Maybe there are already solutions exist I don't know about?

Cheers!

sonichi · 2024-01-01T02:18:00Z

I'm quite excited at #1091 by @rickyloynd-microsoft. It makes the teachability a composable capability to any conversable agent. More generally, the same mechanism may be used for solving other longstanding issues like long context handling and allowing other interesting capabilities to be defined. I like the extensibility and the composability of this approach. Reviews are welcome.
cc @kevin666aa

sgjohnson1981 · 2024-01-29T16:26:19Z

I'm quite excited at #1091 by @rickyloynd-microsoft. It makes the teachability a composable capability to any conversable agent. More generally, the same mechanism may be used for solving other longstanding issues like long context handling and allowing other interesting capabilities to be defined. I like the extensibility and the composability of this approach. Reviews are welcome. cc @kevin666aa

@sonichi
So does @rickyloynd-microsoft 's PR add a feature that eliminates the need for MemGPT or other context length solutions? I can't tell from "...mechanism may be used..." Is this mechanism used now for context length, or will that be implemented later?

rickyloynd-microsoft · 2024-01-29T16:47:00Z

I'm quite excited at #1091 by @rickyloynd-microsoft. It makes the teachability a composable capability to any conversable agent. More generally, the same mechanism may be used for solving other longstanding issues like long context handling and allowing other interesting capabilities to be defined. I like the extensibility and the composability of this approach. Reviews are welcome. cc @kevin666aa

@sonichi So does @rickyloynd-microsoft 's PR add a feature that eliminates the need for MemGPT or other context length solutions? I can't tell from "...mechanism may be used..." Is this mechanism used now for context length, or will that be implemented later?

Teachability is just one capability added through this new mechanism, and teachability is not designed to compress context or memorize general things like MemGPT. But other capabilities (like MemGPT or other ways of handling long context) could be added through this general capability-addition mechanism.

ekzhu · 2024-03-13T07:19:27Z

Shall we close this issue as several recent PRs related to long context handling have merged. @kevin666aa

yiranwu0 · 2024-03-13T19:01:05Z

Yes, thanks!

JingPush · 2024-04-25T21:25:18Z

Hi, I'm working on conversable agent flow with autogen. and really wants to know the status of handling context window length and truncate chat history.

I read the above conversation and have a few questions?

Is the status already completed and released?
Is there any docs or paper from autogen explaining how this is being handled specifically?

It would be really helpful if you can answer the question!

ekzhu · 2024-04-25T21:51:25Z

@JingPush current we use this: https://microsoft.github.io/autogen/docs/topics/long_contexts

yiranwu0 self-assigned this Oct 8, 2023

yiranwu0 added compression labels Oct 8, 2023

sonichi added the roadmap Issues related to roadmap of AutoGen label Oct 8, 2023

sonichi added this to AutoGen Priority Roadmap Oct 10, 2023

sonichi mentioned this issue Oct 11, 2023

Two errors about token limits and rate per minute limits in autogen_agentchat_groupchat_research, agentchat_teaching #198

Closed

sonichi mentioned this issue Oct 12, 2023

token size max #186

Closed

sonichi moved this to In Progress in AutoGen Priority Roadmap Oct 13, 2023

sonichi mentioned this issue Oct 14, 2023

InvalidRequestError Please reduce the length of message #238

Closed

tomsib2001 mentioned this issue Oct 18, 2023

Integration with AutoGen letta-ai/letta#20

Closed

rickyloynd-microsoft self-assigned this Oct 21, 2023

gagb mentioned this issue Oct 24, 2023

Loop and drop second interaction #195

Closed

rickyloynd-microsoft mentioned this issue Dec 28, 2023

[Feature Request]: Context window support #1080

Closed

ekzhu changed the title ~~Roadmap for handling context overflow~~ [Core] Roadmap for handling context overflow Dec 29, 2023

Grigorij-Dudnik mentioned this issue Jan 13, 2024

functionality of manual history cleaning by user proxy added #1230

Merged

3 tasks

yiranwu0 closed this as completed Mar 13, 2024

github-project-automation bot moved this from In Progress to Done in AutoGen Priority Roadmap Mar 13, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Core] Roadmap for handling context overflow #156

[Core] Roadmap for handling context overflow #156

yiranwu0 commented Oct 8, 2023 •

edited by ekzhu

Loading

Compression & Truncation

Retrieval

OrderAndCh4oS commented Oct 10, 2023

kazunator commented Oct 12, 2023

kazunator commented Oct 14, 2023

juanmacuevas commented Oct 15, 2023 •

edited

Loading

sonichi commented Oct 16, 2023

qidanrui commented Oct 20, 2023 •

edited

Loading

yiranwu0 commented Oct 20, 2023 •

edited

Loading

qidanrui commented Oct 21, 2023

aaronstevenson408 commented Oct 21, 2023 •

edited

Loading

MrXandbadas commented Oct 21, 2023 •

edited

Loading

PriNova commented Oct 21, 2023

yiranwu0 commented Oct 21, 2023

sonichi commented Oct 22, 2023

JonMike12341234 commented Oct 22, 2023

sonichi commented Oct 22, 2023

MrXandbadas commented Oct 22, 2023

ddwinhzy commented Oct 24, 2023

SDcodehub commented Oct 31, 2023

yiranwu0 commented Nov 1, 2023

Grigorij-Dudnik commented Dec 29, 2023

sonichi commented Jan 1, 2024

sgjohnson1981 commented Jan 29, 2024 •

edited

Loading

rickyloynd-microsoft commented Jan 29, 2024

ekzhu commented Mar 13, 2024

yiranwu0 commented Mar 13, 2024

JingPush commented Apr 25, 2024

ekzhu commented Apr 25, 2024

[Core] Roadmap for handling context overflow #156

[Core] Roadmap for handling context overflow #156

Comments

yiranwu0 commented Oct 8, 2023 • edited by ekzhu Loading

Any help is appreciated!

Problem Description

Potential Methods

Some References

Compression & Truncation

Retrieval

OrderAndCh4oS commented Oct 10, 2023

kazunator commented Oct 12, 2023

kazunator commented Oct 14, 2023

juanmacuevas commented Oct 15, 2023 • edited Loading

sonichi commented Oct 16, 2023

qidanrui commented Oct 20, 2023 • edited Loading

yiranwu0 commented Oct 20, 2023 • edited Loading

qidanrui commented Oct 21, 2023

aaronstevenson408 commented Oct 21, 2023 • edited Loading

MrXandbadas commented Oct 21, 2023 • edited Loading

PriNova commented Oct 21, 2023

yiranwu0 commented Oct 21, 2023

sonichi commented Oct 22, 2023

JonMike12341234 commented Oct 22, 2023

sonichi commented Oct 22, 2023

MrXandbadas commented Oct 22, 2023

ddwinhzy commented Oct 24, 2023

SDcodehub commented Oct 31, 2023

yiranwu0 commented Nov 1, 2023

Grigorij-Dudnik commented Dec 29, 2023

sonichi commented Jan 1, 2024

sgjohnson1981 commented Jan 29, 2024 • edited Loading

rickyloynd-microsoft commented Jan 29, 2024

ekzhu commented Mar 13, 2024

yiranwu0 commented Mar 13, 2024

JingPush commented Apr 25, 2024

ekzhu commented Apr 25, 2024

yiranwu0 commented Oct 8, 2023 •

edited by ekzhu

Loading

juanmacuevas commented Oct 15, 2023 •

edited

Loading

qidanrui commented Oct 20, 2023 •

edited

Loading

yiranwu0 commented Oct 20, 2023 •

edited

Loading

aaronstevenson408 commented Oct 21, 2023 •

edited

Loading

MrXandbadas commented Oct 21, 2023 •

edited

Loading

sgjohnson1981 commented Jan 29, 2024 •

edited

Loading