Fix deletion of documents with no chunks #1771

NolanTrem · 2025-01-07T22:11:27Z

Important

Fixes deletion of documents with no chunks in delete_documents_and_chunks_by_filter by removing unnecessary checks in management_service.py.

Behavior:
- Fixes deletion of documents with no chunks in delete_documents_and_chunks_by_filter in management_service.py by removing unnecessary checks.
- Ensures documents with no chunks are deleted without prior checks for entries.
Misc:
- Minor docstring formatting change in documents_router.py.
- Simplifies environment variable handling in compose.yaml.

^{This description was created by}^{for 1f30ec1. It will automatically update as commits are pushed.}

ellipsis-dev

❌ Changes requested. Reviewed everything up to f5ab69f in 1 minute and 7 seconds

More details

Looked at 128 lines of code in 2 files
Skipped 0 files when reviewing.
Skipped posting 1 drafted comments based on config settings.

1. py/core/main/services/management_service.py:144

Draft comment:
The logic for extracting document_ids from filters is not robust enough to handle all possible filter structures. Consider using a recursive function to extract all potential document IDs from nested filter structures.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
The comment has a point - the code only handles simple cases and could miss document_ids in more complex filter structures like $and/$or operators or nested conditions. However, without seeing examples of the actual filter structures used in the codebase, we can't be certain this is a real issue. The current implementation may be sufficient for all actual use cases.
I may be overestimating the complexity needed - if the codebase only uses simple filters, making it more complex could be premature optimization. Also, the code already has a recursive transform_chunk_id_to_id function that could be adapted if needed.
While the concern is valid, the comment is somewhat speculative without evidence that complex nested filters are actually used. The existing code handles the basic cases cleanly.
Delete the comment since it's speculative and not based on clear evidence that more complex filter handling is needed. The current implementation may be perfectly sufficient for actual usage patterns.

Workflow ID: wflow_rqGRV3YULr5jgTLl

Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

ellipsis-dev · 2025-01-07T22:12:50Z

py/core/main/services/management_service.py

            if remaining["total_entries"] == 0:
                docs_to_delete.append(doc_id)

-        # 7. Delete documents that no longer have associated chunks.
-        #    Also update graphs if needed (entities/relationships).
+        # Delete documents that no longer have associated chunks
        for doc_id in docs_to_delete:


Consider adding error handling around the deletion of entities and relationships to ensure that any issues during these operations do not prevent the deletion of documents.

ellipsis-dev

👍 Looks good to me! Incremental review on 5adf263 in 46 seconds

More details

Looked at 68 lines of code in 1 files
Skipped 0 files when reviewing.
Skipped posting 2 drafted comments based on config settings.

1. py/core/main/services/management_service.py:188

Draft comment:
Raising an exception when a document is not found is a change in behavior. Ensure this change is documented in the PR description.
Reason this comment was not posted:
Comment did not seem useful.

2. py/core/main/services/management_service.py:191

Draft comment:
Raising an exception for owner_id mismatch is a security measure. Ensure this behavior change is documented in the PR description.
Reason this comment was not posted:
Comment did not seem useful.

Workflow ID: wflow_3aTu7t34rS3EneUH

You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

ellipsis-dev

👍 Looks good to me! Incremental review on abe12ac in 42 seconds

More details

Looked at 54 lines of code in 3 files
Skipped 0 files when reviewing.
Skipped posting 1 drafted comments based on config settings.

1. py/core/base/providers/ingestion.py:153

Draft comment:
Ensure that all usages of chunk_enrichment_settings are compatible with the ChunkEnrichmentSettings instance, as the default value has been changed from a dictionary to an instance.
Reason this comment was not posted:
Comment did not seem useful.

Workflow ID: wflow_dJvzQ0IFUNUB5woD

You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

ellipsis-dev

👍 Looks good to me! Incremental review on 1f30ec1 in 58 seconds

More details

Looked at 73 lines of code in 3 files
Skipped 0 files when reviewing.
Skipped posting 4 drafted comments based on config settings.

1. py/core/main/services/management_service.py:187

Draft comment:
Consider revisiting the logic for checking document ownership and appending to docs_to_delete to ensure it correctly handles cases where documents have no chunks.
Reason this comment was not posted:
Confidence changes required: 50%
The change in the delete_documents_and_chunks_by_filter method removes a check that raises an exception if no document is found. This aligns with the PR description, which aims to ensure documents with no chunks are deleted. However, the logic for checking document ownership and appending to docs_to_delete should be revisited to ensure it correctly handles cases where documents have no chunks.

2. py/core/main/services/management_service.py:694

Draft comment:
Ensure that the response structure is consistent with expectations when using the walrus operator for assignment and conditional check.
Reason this comment was not posted:
Confidence changes required: 33%
The change in the summarize_collection method uses the walrus operator for assignment and conditional check. This is a valid and concise way to handle the logic, but ensure that the response structure is consistent with expectations.

3. py/compose.yaml:17

Draft comment:
Ensure that all dependent systems are updated to use the new environment variables, as the fallback to older variables has been removed.
Reason this comment was not posted:
Confidence changes required: 33%
The change in the compose.yaml file simplifies the environment variable definitions by removing the fallback to older environment variables. This is a good practice for reducing complexity, but ensure that all dependent systems are updated to use the new variables.

4. py/core/configs/r2r_azure.toml:24

Draft comment:
Ensure that the change to base_model for embeddings is intentional and that the new model is supported and tested.
Reason this comment was not posted:
Confidence changes required: 33%
The change in the r2r_azure.toml file updates the base_model for embeddings. Ensure that this change is intentional and that the new model is supported and tested.

Workflow ID: wflow_95bm8ZsgBGho3Iek

You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Fix deletion of documents with no chunks

f5ab69f

ellipsis-dev bot reviewed Jan 7, 2025

View reviewed changes

Fix user auth on delete document

5adf263

ellipsis-dev bot reviewed Jan 7, 2025

View reviewed changes

Add offset and limit

abe12ac

ellipsis-dev bot reviewed Jan 7, 2025

View reviewed changes

Fix deletion logic, clean up compose

1f30ec1

ellipsis-dev bot reviewed Jan 7, 2025

View reviewed changes

NolanTrem merged commit 518138d into main Jan 7, 2025
13 of 14 checks passed

NolanTrem deleted the Nolan/DeleteEmptyChunkDocument branch January 7, 2025 22:54

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix deletion of documents with no chunks #1771

Fix deletion of documents with no chunks #1771

NolanTrem commented Jan 7, 2025 •

edited by ellipsis-dev bot

Loading

ellipsis-dev bot left a comment

ellipsis-dev bot Jan 7, 2025

ellipsis-dev bot left a comment

ellipsis-dev bot left a comment

ellipsis-dev bot left a comment

Fix deletion of documents with no chunks #1771

Fix deletion of documents with no chunks #1771

Conversation

NolanTrem commented Jan 7, 2025 • edited by ellipsis-dev bot Loading

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

ellipsis-dev bot Jan 7, 2025

Choose a reason for hiding this comment

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

ellipsis-dev bot left a comment

Choose a reason for hiding this comment

NolanTrem commented Jan 7, 2025 •

edited by ellipsis-dev bot

Loading