-
Notifications
You must be signed in to change notification settings - Fork 332
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix deletion of documents with no chunks #1771
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❌ Changes requested. Reviewed everything up to f5ab69f in 1 minute and 7 seconds
More details
- Looked at
128
lines of code in2
files - Skipped
0
files when reviewing. - Skipped posting
1
drafted comments based on config settings.
1. py/core/main/services/management_service.py:144
- Draft comment:
The logic for extractingdocument_ids
fromfilters
is not robust enough to handle all possible filter structures. Consider using a recursive function to extract all potential document IDs from nested filter structures. - Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
The comment has a point - the code only handles simple cases and could miss document_ids in more complex filter structures like $and/$or operators or nested conditions. However, without seeing examples of the actual filter structures used in the codebase, we can't be certain this is a real issue. The current implementation may be sufficient for all actual use cases.
I may be overestimating the complexity needed - if the codebase only uses simple filters, making it more complex could be premature optimization. Also, the code already has a recursive transform_chunk_id_to_id function that could be adapted if needed.
While the concern is valid, the comment is somewhat speculative without evidence that complex nested filters are actually used. The existing code handles the basic cases cleanly.
Delete the comment since it's speculative and not based on clear evidence that more complex filter handling is needed. The current implementation may be perfectly sufficient for actual usage patterns.
Workflow ID: wflow_rqGRV3YULr5jgTLl
Want Ellipsis to fix these issues? Tag @ellipsis-dev
in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
if remaining["total_entries"] == 0: | ||
docs_to_delete.append(doc_id) | ||
|
||
# 7. Delete documents that no longer have associated chunks. | ||
# Also update graphs if needed (entities/relationships). | ||
# Delete documents that no longer have associated chunks | ||
for doc_id in docs_to_delete: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding error handling around the deletion of entities and relationships to ensure that any issues during these operations do not prevent the deletion of documents.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me! Incremental review on 5adf263 in 46 seconds
More details
- Looked at
68
lines of code in1
files - Skipped
0
files when reviewing. - Skipped posting
2
drafted comments based on config settings.
1. py/core/main/services/management_service.py:188
- Draft comment:
Raising an exception when a document is not found is a change in behavior. Ensure this change is documented in the PR description. - Reason this comment was not posted:
Comment did not seem useful.
2. py/core/main/services/management_service.py:191
- Draft comment:
Raising an exception for owner_id mismatch is a security measure. Ensure this behavior change is documented in the PR description. - Reason this comment was not posted:
Comment did not seem useful.
Workflow ID: wflow_3aTu7t34rS3EneUH
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me! Incremental review on abe12ac in 42 seconds
More details
- Looked at
54
lines of code in3
files - Skipped
0
files when reviewing. - Skipped posting
1
drafted comments based on config settings.
1. py/core/base/providers/ingestion.py:153
- Draft comment:
Ensure that all usages ofchunk_enrichment_settings
are compatible with theChunkEnrichmentSettings
instance, as the default value has been changed from a dictionary to an instance. - Reason this comment was not posted:
Comment did not seem useful.
Workflow ID: wflow_dJvzQ0IFUNUB5woD
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good to me! Incremental review on 1f30ec1 in 58 seconds
More details
- Looked at
73
lines of code in3
files - Skipped
0
files when reviewing. - Skipped posting
4
drafted comments based on config settings.
1. py/core/main/services/management_service.py:187
- Draft comment:
Consider revisiting the logic for checking document ownership and appending todocs_to_delete
to ensure it correctly handles cases where documents have no chunks. - Reason this comment was not posted:
Confidence changes required:50%
The change in thedelete_documents_and_chunks_by_filter
method removes a check that raises an exception if no document is found. This aligns with the PR description, which aims to ensure documents with no chunks are deleted. However, the logic for checking document ownership and appending todocs_to_delete
should be revisited to ensure it correctly handles cases where documents have no chunks.
2. py/core/main/services/management_service.py:694
- Draft comment:
Ensure that the response structure is consistent with expectations when using the walrus operator for assignment and conditional check. - Reason this comment was not posted:
Confidence changes required:33%
The change in thesummarize_collection
method uses the walrus operator for assignment and conditional check. This is a valid and concise way to handle the logic, but ensure that the response structure is consistent with expectations.
3. py/compose.yaml:17
- Draft comment:
Ensure that all dependent systems are updated to use the new environment variables, as the fallback to older variables has been removed. - Reason this comment was not posted:
Confidence changes required:33%
The change in thecompose.yaml
file simplifies the environment variable definitions by removing the fallback to older environment variables. This is a good practice for reducing complexity, but ensure that all dependent systems are updated to use the new variables.
4. py/core/configs/r2r_azure.toml:24
- Draft comment:
Ensure that the change tobase_model
for embeddings is intentional and that the new model is supported and tested. - Reason this comment was not posted:
Confidence changes required:33%
The change in ther2r_azure.toml
file updates thebase_model
for embeddings. Ensure that this change is intentional and that the new model is supported and tested.
Workflow ID: wflow_95bm8ZsgBGho3Iek
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
Important
Fixes deletion of documents with no chunks in
delete_documents_and_chunks_by_filter
by removing unnecessary checks inmanagement_service.py
.delete_documents_and_chunks_by_filter
inmanagement_service.py
by removing unnecessary checks.documents_router.py
.compose.yaml
.This description was created by for 1f30ec1. It will automatically update as commits are pushed.