Multi-Agents will have redundant content across different sections #548

Closed

DandinPower opened this issue May 29, 2024 · 23 comments
@DandinPower
Contributor

Hello, I have tried the multi-agent implementation found in the multi-agents folder, and I read the multi-agent blog first. I found that even though each subsection focuses on a different topic, the content easily overlaps and discusses the same concepts across subsections. As a result, the final report often contains a lot of redundant content, which makes it much less useful.

I initially tried to add guidelines like "Each subsection must not have redundant content across different subsections." However, since the reviewer and reviser agents only work on individual subsections, the guidelines can only be applied at the subsection level, not to the entire report. Consequently, the reviewers and revisers are unaware of what is happening in other subsections, making it difficult to resolve the redundancy problem.

I am considering a solution where we have a chief reviewer and reviser. After each subsection completes its research, the chief reviewer and reviser would evaluate the final research across all subsections, ensure there is no redundant content, and provide revision directions to each subsection group to restart their research.

I think this kind of workflow will make the whole process more complex, waste more tokens, and add latency. However, I believe that if we can set global guidelines, such as "Each subsection must not have redundant content across different subsections," it will improve the final report's robustness and usefulness.

@assafelovic
Owner

Hey @DandinPower, that's a great discovery and definitely a huge improvement to the experience. At the moment (mostly for cost reasons) the report indeed does not take previous subtopics into consideration. To enable this, every iteration would need to see the entire report generated before it, which would be very costly in terms of LLM calls. I guess it's a tradeoff, but definitely something we can consider adding as an option.

@DandinPower
Contributor Author

Hello @assafelovic,

Thank you for your response! I understand that the current multi-agent workflow already incurs significant costs from LLM calls, and the trade-off between cost and utility always matters. However, I have a preliminary idea: we could sync the parallel research groups after each draft iteration. First, a chief reviewer reads all the drafts, then applies the "report-level guidelines" to produce a dynamic guideline for each subsection reviewer. For instance, if both Section A and Section B cover a certain concept, the chief reviewer could instruct one of the reviewers with a dynamic guideline like "Do not cover [this concept]." The original reviewer would then use both the standard and dynamic guidelines to provide notes to the reviser.

With this workflow, the additional cost would be just one LLM call per review iteration, plus a slight increase in latency for synchronization. It also enables "entire report"-level guidelines without restarting subsection research repeatedly, which would otherwise waste numerous LLM calls. I will attempt to develop an agent flow based on this concept and compare the average cost and latency with the current workflow to make sure the costs are acceptable. If you're interested, I would be happy to share the results with you!
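To make the sync step concrete, here is a minimal sketch of what the chief reviewer call could look like, assuming the official `openai` Python client; the `chief_review` function, the prompt, and the JSON output format are hypothetical and only illustrate the idea:

```python
# Hypothetical sketch of the "chief reviewer" sync step described above.
# The prompt and output format are illustrative, not the project's actual code.
import json
from openai import OpenAI

client = OpenAI()

def chief_review(drafts: dict[str, str]) -> dict[str, str]:
    """Read all subsection drafts and return one dynamic guideline per subsection."""
    prompt = (
        "You are the chief reviewer of a research report. Below are the current "
        "drafts of each subsection. Identify concepts covered by more than one "
        "subsection and decide which single subsection should keep each concept. "
        "Return a JSON object mapping each subsection name to one extra guideline, "
        'e.g. {"Section B": "Do not cover concept X; it is covered in Section A."}\n\n'
        + "\n\n".join(f"## {name}\n{text}" for name, text in drafts.items())
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)

# Each subsection reviewer would append its dynamic guideline to the standard
# guidelines before writing notes for the reviser.
```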

@antoremin

Hey, I played with the researcher and noticed the same issue. IMO the tradeoff is well worth it, since the alternative is to remove redundant sections by hand. Maintaining coherence across the entire report seems paramount for real-world applications and complex subjects.
Maybe it could be an option so users could choose. Excited about possible architecture iterations in this direction and would love to help!

@DandinPower
Contributor Author

Hello @antoremin, I have forked the repo and written a brief draft of the modification design. If you are interested, you are welcome to check the multi_agent_v2 folder for details!

@assafelovic
Owner

Hey @DandinPower, would love help with a PR for this! I'm currently working on a new front-end UX/UI.

@DandinPower
Contributor Author

Hello @assafelovic, thank you for the invitation! I am willing to help with a PR once I finish implementing the workflow I mentioned above.

@assafelovic
Owner

@DandinPower @antoremin Thought about it deeper and I think there's a pretty decent approach that can work well:
Basically, we can save all written content in a vector DB and, for every new section, retrieve relevant information from it and take it into account when generating new content. This way we're not overloading each section's context with redundant content, and it will most likely solve the issue. Wdyt?
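For illustration, here is a rough sketch of that idea, assuming LangChain's FAISS wrapper and OpenAI embeddings are available; the helper names and chunk size are purely illustrative:

```python
# Rough sketch of the vector-store idea: save each finished section as chunks,
# then retrieve relevant prior content before writing the next section.
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
store = None  # populated as sections are written

def save_section(section_text: str, section_name: str) -> None:
    """Chunk a finished section and add it to the store."""
    global store
    chunks = [section_text[i:i + 1000] for i in range(0, len(section_text), 1000)]
    metadatas = [{"section": section_name}] * len(chunks)
    if store is None:
        store = FAISS.from_texts(chunks, embeddings, metadatas=metadatas)
    else:
        store.add_texts(chunks, metadatas=metadatas)

def context_for(topic: str, k: int = 5) -> list[str]:
    """Retrieve previously written chunks relevant to the next section's topic."""
    if store is None:
        return []
    return [doc.page_content for doc in store.similarity_search(topic, k=k)]

# The retrieved chunks would be passed to the writer with an instruction such as
# "do not repeat the following already-covered content".
```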

@DandinPower
Contributor Author

Hello @assafelovic @antoremin,

I think this is a very good approach to reduce redundant content across different subsections. It also allows content to be generated fully in parallel, since each section can retrieve its data independently. Additionally, replacing extra LLM calls with embedding-model calls and vector searches is generally quicker and cheaper. However, I have some concerns about using a vector similarity-based approach:

  1. How do we set the similarity threshold so that each subsection's own content is covered well enough?
  2. How do we set the similarity threshold so that a subsection doesn't pull in other subsections' content?
  3. If we use each subsection's draft topic to retrieve relevant content, then since the subsections are all related to the same overall topic, it is likely that all three "subsection draft topic" embeddings will sit very close together in the vector space. This makes it difficult for a vector similarity-based method to retrieve exactly the content we want.

(figure: subsection topic embeddings and written-content chunks in the embedding space)

The figure shows the three subsection topic embeddings (red points) sitting close to each other, while the gray points are chunks of already-written content. The shallow yellow circles represent the retrieval range implied by the similarity threshold we set. It demonstrates how difficult it is to retrieve all the content we want without also pulling in redundant content from neighboring subsections.
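To make the threshold concern concrete, here is a small extension of the earlier sketch using score-based retrieval. Note that whether the returned score is a distance (lower = closer) or a similarity (higher = closer) depends on the vector-store backend, which is part of why a single cutoff is hard to tune; the 0.5 value below is purely illustrative:

```python
# Thresholded retrieval on top of the FAISS store sketched earlier.
# FAISS returns L2 distances by default, so smaller scores mean closer matches.
def context_below_distance(topic: str, max_distance: float = 0.5, k: int = 10) -> list[str]:
    results = store.similarity_search_with_score(topic, k=k)
    return [doc.page_content for doc, score in results if score <= max_distance]
```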

@assafelovic
Owner

This is a great point! But consider that what we want is for the LLM to take previously written sections into consideration when writing the next sections, so it might be enough to retrieve similar chunks purely as reasoning context for generating new content. I'd be happy to see how we can push this forward. Anyone want to take a stab at it? @antoremin @DandinPower

@DandinPower
Contributor Author

Hey @assafelovic @antoremin,
Sorry for the mix-up earlier. Here’s my updated take on your idea:
We’re looking at creating content section by section. For each section, we'll base the writing on gpt_researcher's detailed report, the section's topic, and some subsection guidelines, plus similar content pulled in from previous sections so we avoid repeating ourselves.
This keeps things simple and cost-effective: we don't need to feed all previous sections into the next one, which would make things far more complex and expensive. By tackling each section one at a time and only pulling in a bit of relevant prior content, we get good results with just a few extra embedding-model calls. It may take a bit longer overall since sections are written one after another, but that's worth it for a high-quality, non-redundant final product.

If I’ve got this right, I’m up for helping push this idea forward. Here’s a quick plan on how we could tweak the current multi_agent workflow:

  1. After planning each subsection, we get researchers to do depth_report research in parallel for the first draft.
  2. Start the subsection writing process. A reviewer checks the draft and looks at relevant results from previous sections by pulling content from the embedding storage. With the draft and the relevant content, the reviewer gives notes to the reviser, who makes changes based on those notes (similar to the current workflow). Once the reviewer confirms the section meets all guidelines, we split it into chunks, call the embedding model, and save the chunks to the embedding storage for future subsections.
  3. Once all sections are done, we use the original workflow to write and publish the final report.

This is just my initial idea on how to implement it (a rough sketch of step 2 is below). Do you think it's better to handle redundant content directly at the reviser stage, or do you have any other thoughts?
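A hypothetical shape for step 2, where `call_reviewer`, `call_reviser`, and the satisfaction check stand in for the existing multi_agents reviewer/reviser calls, and `context_for`/`save_section` are the vector-store helpers sketched earlier in this thread:

```python
# Hypothetical review loop for one subsection. The reviewer sees both the draft
# and relevant chunks from previously approved sections; once approved, the
# section is chunked and stored so later sections can retrieve it.
def review_section(draft: str, topic: str, guidelines: list[str], max_rounds: int = 3) -> str:
    prior_context = context_for(topic)            # relevant chunks from earlier sections
    for _ in range(max_rounds):
        notes = call_reviewer(draft, guidelines, prior_context)
        if notes is None:                         # reviewer is satisfied, no more notes
            break
        draft = call_reviser(draft, notes)
    save_section(draft, topic)                    # store so later sections can avoid repeating it
    return draft
```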

@danieldekay
Contributor

This sounds like a good plan, @DandinPower!

I have experience converting interview transcripts, which can jump between topics, into a coherent report that keeps the details (unlike what a "summarize this meeting" prompt would produce).

My workflow was:

  1. Chunk the interview into pieces (it can be hours long, and we use GPT-3.5-turbo).
  2. Create a headline for each chunk.
  3. Create a structure based on the headlines.
  4. Write each chapter, retrieving the relevant transcript chunks from a vector DB.
  5. Compile everything into the final report.

There is still some redundancy, but not as much as without the vector DB, and token efficiency increases a lot.

In your proposed workflow, I could imagine the editor (or another agent) gathering abstract summary knowledge from the researchers and writing a guided outline with headlines, noting which facts/questions need to be addressed where.
Each outline item could then be chunked and processed separately with knowledge from the vector DB.
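A loose sketch of that transcript-to-report pipeline; the helpers below are trivial placeholders standing in for LLM calls and a vector-store lookup, so only the overall data flow is meant to be real:

```python
# Placeholder pipeline: chunk -> headlines -> structure -> chapters -> report.
def split_into_chunks(text: str, size: int = 4000) -> list[str]:
    return [text[i:i + size] for i in range(0, len(text), size)]

def make_headline(chunk: str) -> str:
    return chunk[:60]                      # placeholder for an LLM "headline" call

def build_structure(headlines: list[str]) -> list[str]:
    return headlines                       # placeholder for an LLM "outline" call

def retrieve_chunks(headline: str) -> list[str]:
    return []                              # placeholder for a vector-store search

def write_chapter(headline: str, chunks: list[str]) -> str:
    return headline                        # placeholder for an LLM "write chapter" call

def transcript_to_report(transcript: str) -> str:
    chunks = split_into_chunks(transcript)                              # 1. chunk the long transcript
    headlines = [make_headline(c) for c in chunks]                      # 2. headline per chunk
    outline = build_structure(headlines)                                # 3. structure from headlines
    chapters = [write_chapter(h, retrieve_chunks(h)) for h in outline]  # 4. write each chapter
    return "\n\n".join(chapters)                                        # 5. compile the final report
```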

@assafelovic
Owner

This sounds great guys, who is helping with leading this PR? :)

@DandinPower
Contributor Author

@danieldekay Thank you for sharing your experience and workflow! Your approach with interview transcripts is quite interesting and offers some valuable insights.

@assafelovic I'm willing to take the lead on this PR and implement the improvements we've been discussing. Is there anything specific you'd like me to focus on or consider as I develop this feature?

@danieldekay
Contributor

@DandinPower, LangGraph also has a human-in-the-loop feature, and we might want to ask a human for editorial feedback at the highest abstraction level of the report structure. This could be what it takes to bring quality from 60% to 85%.

@DandinPower
Contributor Author

@danieldekay That sounds like a good feature! I think there are two ways to incorporate human feedback into the workflow:
(1) After the planner outlines each subsection topic, we allow humans to provide high-level feedback on the direction of each section.
(2) After each subsection reviser finishes its revision, the reviewer can not only give review notes but also ask a human for direction on further revision.
Perhaps we can include a configuration option to let users decide whether they want to activate human feedback or not.
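For the opt-in flag, one way this could be wired with LangGraph's built-in human-in-the-loop support is to compile the graph with `interrupt_before`, so it pauses ahead of a feedback node when the option is on; the node names and state schema below are illustrative, not the project's actual graph:

```python
# Illustrative LangGraph sketch: pause before a feedback node when the
# (imagined) include_human_feedback config flag is enabled.
from typing import TypedDict
from langgraph.graph import StateGraph, END
from langgraph.checkpoint.memory import MemorySaver

class ReportState(TypedDict):
    outline: str
    human_feedback: str

def plan_outline(state: ReportState) -> dict:
    return {"outline": "1. Intro\n2. Methods\n3. Findings"}   # placeholder planner output

def apply_feedback(state: ReportState) -> dict:
    return {"outline": state["outline"] + "\n# revised per: " + state.get("human_feedback", "")}

builder = StateGraph(ReportState)
builder.add_node("planner", plan_outline)
builder.add_node("apply_feedback", apply_feedback)
builder.set_entry_point("planner")
builder.add_edge("planner", "apply_feedback")
builder.add_edge("apply_feedback", END)

include_human_feedback = True  # imagined config flag
graph = builder.compile(
    checkpointer=MemorySaver(),
    interrupt_before=["apply_feedback"] if include_human_feedback else [],
)
# When the graph interrupts, the caller collects feedback, writes it back with
# graph.update_state(config, {"human_feedback": "..."}), and resumes with invoke(None, config).
```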

@assafelovic
Owner

@DandinPower ping me on Discord if you'd like, or we can open a channel for this feature and invite whoever would like to contribute/test it. Generally, keep in mind that GPT Researcher is in charge of generating a research report based on research tasks, and the long detailed report leverages it multiple times. I assume the logic should live under the backend/report_type/detailed_report/ path. Looking forward and excited to see what comes out of it!

@DandinPower
Contributor Author

@assafelovic Okay, I'll ping you on Discord later! Thanks for the guidance. I will make sure I understand the detailed-report logic before pushing forward.

@danieldekay
Contributor

@DandinPower, I am also reading the STORM paper (https://arxiv.org/pdf/2402.14207), which has loads more insights into possible processes.

@DandinPower
Contributor Author

@danieldekay Hey! Thanks for introducing this paper. I think the current GPT researcher's detailed report type is also inspired by this paper. I will take a look at it.

@assafelovic
Owner

Hey @DandinPower we're all eager to see this go live! :D

@DandinPower
Contributor Author

Hey @assafelovic, I was busy earlier, but now I have more time to push this forward. I'm eager to see it go live too!

@emilmirzayev

Additionally, I noticed that having long descriptions with explicit section logic helps (follow_guidelines does not work for me; it throws an error, which is reported in the discussions and also in #684).

@assafelovic
Owner

Hey all, we've released a proposed solution for this. Please try it out and reopen this issue if it's still a problem!
