Manually Added Keyword Ineffective in RAG Keyword Retrieval #12884

Closed
5 tasks done
hieheihei opened this issue Jan 20, 2025 · 2 comments · Fixed by #12908
Labels
👻 feat:rag Embedding related issue, like qdrant, weaviate, milvus, vector database.

Comments

@hieheihei
Contributor

Self Checks

  • This is only for bug report, if you would like to ask a question, please head to Discussions.
  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [FOR CHINESE USERS] Please be sure to submit the issue in English, otherwise it will be closed. Thank you! :)
  • Please do not modify this template :) and fill in all the required fields.

Dify version

0.15.1

Cloud or Self Hosted

Self Hosted (Source)

Steps to reproduce

1. Add a keyword to the segment.
Image

✔️ Expected Behavior

Testing the keyword recall function returns the paragraph to which 'Key1' was just added.

❌ Actual Behavior

Testing the keyword recall function returns no results.

Image
@dosubot dosubot bot added the 👻 feat:rag Embedding related issue, like qdrant, weaviate, milvus, vector database. label Jan 20, 2025
@hieheihei
Contributor Author

I read the source code and found the cause. There are two keyword indexes: the per-segment index in the document_segments table, and the knowledge base's inverted index in the dataset_keyword_tables table. The current implementation only updates document_segments when keywords are added, so the inverted index is left unchanged.
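
To illustrate the failure mode, here is a minimal sketch with two in-memory structures standing in for the two tables. The names `segment_keywords`, `dataset_keyword_table`, `build_inverted_index`, and `keyword_search` are hypothetical and are not Dify's actual code. The sketch also assumes that retrieval consults only the inverted index, which is what the observed behavior suggests.

```python
from collections import defaultdict

# Hypothetical stand-in for document_segments: segment id -> its keywords.
segment_keywords = {"seg-1": ["rag", "retrieval"]}


def build_inverted_index(seg_kw):
    """Build keyword -> set of segment ids from the per-segment lists."""
    index = defaultdict(set)
    for seg_id, keywords in seg_kw.items():
        for kw in keywords:
            index[kw].add(seg_id)
    return index


# Hypothetical stand-in for dataset_keyword_tables (the inverted index).
dataset_keyword_table = build_inverted_index(segment_keywords)


def keyword_search(query):
    """Assumed retrieval path: looks only at the inverted index."""
    return sorted(dataset_keyword_table.get(query, set()))


# The reported bug: only the per-segment list is updated.
segment_keywords["seg-1"].append("Key1")
stale_result = keyword_search("Key1")

# Updating the inverted index as well makes the keyword retrievable.
dataset_keyword_table["Key1"].add("seg-1")
fixed_result = keyword_search("Key1")
```

In this toy model `stale_result` is empty even though the segment lists 'Key1', which matches the behavior described in the report.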

I am working on a fix, and it seems to be effective.


dosubot bot commented Jan 20, 2025

Hey @hieheihei! 👋 I'm here to help you with bug fixes, answer questions, and even guide you on becoming a contributor. I'm a bot designed to assist you while waiting for a human maintainer.

To address the issue of manually added keywords not being effective in the RAG keyword retrieval, it's important to ensure that both the document_segments and dataset_keyword_tables are updated when keywords are added. The current implementation only updates the document_segments, which is why the keyword recall function is not retrieving the expected results.

To fix this, you should ensure that the keywords are also added to the dataset_keyword_tables. This involves:

  1. Updating the Document Segments: Use the _update_segment_keywords method to update the keywords in the document_segments table.

  2. Updating the Keyword Table: Use the _add_text_to_keyword_table method to add the keywords to the in-memory keyword table associated with the dataset_keyword_tables.

  3. Saving the Updated Keyword Table: Use the _save_dataset_keyword_table method to persist the changes to the dataset_keyword_tables, ensuring that the keywords are stored correctly in the database or file system, depending on your setup.

By ensuring both tables are updated, the keyword recall function should work as expected, retrieving the segments associated with the newly added keywords [1].
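
The three steps above can be sketched as follows. This is a toy model under stated assumptions, not Dify's implementation: the `KeywordStore` class and its methods are hypothetical and only loosely mirror the method names mentioned above, and persistence is simulated with a JSON string instead of a database or file.

```python
import json
from collections import defaultdict


class KeywordStore:
    """Hypothetical toy model of the two keyword indexes."""

    def __init__(self):
        self.segments = {}             # stand-in for document_segments
        self.table = defaultdict(set)  # in-memory keyword table
        self.saved_table = "{}"        # stand-in for dataset_keyword_tables

    def update_segment_keywords(self, segment_id, keywords):
        # Step 1: store the keywords on the segment itself.
        self.segments[segment_id] = list(keywords)

    def add_to_keyword_table(self, segment_id, keywords):
        # Step 2: register the segment under each keyword.
        for kw in keywords:
            self.table[kw].add(segment_id)

    def save_keyword_table(self):
        # Step 3: persist the table (simulated as a JSON string).
        serializable = {kw: sorted(ids) for kw, ids in self.table.items()}
        self.saved_table = json.dumps(serializable, sort_keys=True)

    def add_keywords(self, segment_id, keywords):
        # Run all three steps together so the indexes cannot drift apart.
        self.update_segment_keywords(segment_id, keywords)
        self.add_to_keyword_table(segment_id, keywords)
        self.save_keyword_table()

    def search(self, keyword):
        # Assumed retrieval path: reads only the persisted table.
        return json.loads(self.saved_table).get(keyword, [])


store = KeywordStore()
store.add_keywords("seg-1", ["Key1"])
hits = store.search("Key1")
```

Note that this sketch only adds entries. If a keyword is later removed from a segment, the matching entry would also have to be deleted from the keyword table, which the sketch does not cover.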
