Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix deletion of documents with no chunks #1771

Merged
merged 4 commits into from
Jan 7, 2025

Conversation

NolanTrem
Copy link
Collaborator

@NolanTrem NolanTrem commented Jan 7, 2025

Important

Fixes deletion of documents with no chunks in delete_documents_and_chunks_by_filter by removing unnecessary checks in management_service.py.

  • Behavior:
    • Fixes deletion of documents with no chunks in delete_documents_and_chunks_by_filter in management_service.py by removing unnecessary checks.
    • Ensures documents with no chunks are deleted without prior checks for entries.
  • Misc:
    • Minor docstring formatting change in documents_router.py.
    • Simplifies environment variable handling in compose.yaml.

This description was created by Ellipsis for 1f30ec1. It will automatically update as commits are pushed.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Changes requested. Reviewed everything up to f5ab69f in 1 minute and 7 seconds

More details
  • Looked at 128 lines of code in 2 files
  • Skipped 0 files when reviewing.
  • Skipped posting 1 drafted comments based on config settings.
1. py/core/main/services/management_service.py:144
  • Draft comment:
    The logic for extracting document_ids from filters is not robust enough to handle all possible filter structures. Consider using a recursive function to extract all potential document IDs from nested filter structures.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable:
    The comment has a point - the code only handles simple cases and could miss document_ids in more complex filter structures like $and/$or operators or nested conditions. However, without seeing examples of the actual filter structures used in the codebase, we can't be certain this is a real issue. The current implementation may be sufficient for all actual use cases.
    I may be overestimating the complexity needed - if the codebase only uses simple filters, making it more complex could be premature optimization. Also, the code already has a recursive transform_chunk_id_to_id function that could be adapted if needed.
    While the concern is valid, the comment is somewhat speculative without evidence that complex nested filters are actually used. The existing code handles the basic cases cleanly.
    Delete the comment since it's speculative and not based on clear evidence that more complex filter handling is needed. The current implementation may be perfectly sufficient for actual usage patterns.

Workflow ID: wflow_rqGRV3YULr5jgTLl


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

if remaining["total_entries"] == 0:
docs_to_delete.append(doc_id)

# 7. Delete documents that no longer have associated chunks.
# Also update graphs if needed (entities/relationships).
# Delete documents that no longer have associated chunks
for doc_id in docs_to_delete:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding error handling around the deletion of entities and relationships to ensure that any issues during these operations do not prevent the deletion of documents.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on 5adf263 in 46 seconds

More details
  • Looked at 68 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 2 drafted comments based on config settings.
1. py/core/main/services/management_service.py:188
  • Draft comment:
    Raising an exception when a document is not found is a change in behavior. Ensure this change is documented in the PR description.
  • Reason this comment was not posted:
    Comment did not seem useful.
2. py/core/main/services/management_service.py:191
  • Draft comment:
    Raising an exception for owner_id mismatch is a security measure. Ensure this behavior change is documented in the PR description.
  • Reason this comment was not posted:
    Comment did not seem useful.

Workflow ID: wflow_3aTu7t34rS3EneUH


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on abe12ac in 42 seconds

More details
  • Looked at 54 lines of code in 3 files
  • Skipped 0 files when reviewing.
  • Skipped posting 1 drafted comments based on config settings.
1. py/core/base/providers/ingestion.py:153
  • Draft comment:
    Ensure that all usages of chunk_enrichment_settings are compatible with the ChunkEnrichmentSettings instance, as the default value has been changed from a dictionary to an instance.
  • Reason this comment was not posted:
    Comment did not seem useful.

Workflow ID: wflow_dJvzQ0IFUNUB5woD


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link
Contributor

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on 1f30ec1 in 58 seconds

More details
  • Looked at 73 lines of code in 3 files
  • Skipped 0 files when reviewing.
  • Skipped posting 4 drafted comments based on config settings.
1. py/core/main/services/management_service.py:187
  • Draft comment:
    Consider revisiting the logic for checking document ownership and appending to docs_to_delete to ensure it correctly handles cases where documents have no chunks.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The change in the delete_documents_and_chunks_by_filter method removes a check that raises an exception if no document is found. This aligns with the PR description, which aims to ensure documents with no chunks are deleted. However, the logic for checking document ownership and appending to docs_to_delete should be revisited to ensure it correctly handles cases where documents have no chunks.
2. py/core/main/services/management_service.py:694
  • Draft comment:
    Ensure that the response structure is consistent with expectations when using the walrus operator for assignment and conditional check.
  • Reason this comment was not posted:
    Confidence changes required: 33%
    The change in the summarize_collection method uses the walrus operator for assignment and conditional check. This is a valid and concise way to handle the logic, but ensure that the response structure is consistent with expectations.
3. py/compose.yaml:17
  • Draft comment:
    Ensure that all dependent systems are updated to use the new environment variables, as the fallback to older variables has been removed.
  • Reason this comment was not posted:
    Confidence changes required: 33%
    The change in the compose.yaml file simplifies the environment variable definitions by removing the fallback to older environment variables. This is a good practice for reducing complexity, but ensure that all dependent systems are updated to use the new variables.
4. py/core/configs/r2r_azure.toml:24
  • Draft comment:
    Ensure that the change to base_model for embeddings is intentional and that the new model is supported and tested.
  • Reason this comment was not posted:
    Confidence changes required: 33%
    The change in the r2r_azure.toml file updates the base_model for embeddings. Ensure that this change is intentional and that the new model is supported and tested.

Workflow ID: wflow_95bm8ZsgBGho3Iek


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

@NolanTrem NolanTrem merged commit 518138d into main Jan 7, 2025
13 of 14 checks passed
@NolanTrem NolanTrem deleted the Nolan/DeleteEmptyChunkDocument branch January 7, 2025 22:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant