Bug fix: passing document id to chunks #1815

Merged: 3 commits, Jan 14, 2025

@NolanTrem (Collaborator) commented Jan 13, 2025

Important

Fix document ID handling in create_document for chunk-based document creation and update related tests.

  • Behavior:
    • In create_document in documents_router.py, use the provided id when creating a document from chunks; generate a new one only if none is provided.
  • Tests:
    • Add test "Create a document from chunks with an id" in ChunksIntegrationSuperUser.test.ts to verify document creation with a specified ID.
    • Update the "Create document with file path" test in DocumentsIntegrationSuperUser.test.ts to use a specific document ID.
    • Add test "Delete a document" in ChunksIntegrationSuperUser.test.ts to verify document deletion by ID.

This description was created by Ellipsis for 3bc5b3c. It will automatically update as commits are pushed.
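
The behavior change summarized above can be sketched roughly as follows. This is a minimal illustration, not the code from documents_router.py: the helper name resolve_document_id, its parameters, and the UUIDv5 fallback are all assumptions made for the example.

```python
import uuid
from typing import Optional, Sequence, Union


def resolve_document_id(
    provided_id: Optional[Union[str, uuid.UUID]],
    chunks: Sequence[str],
    user_id: str,
) -> uuid.UUID:
    """Hypothetical helper: prefer the caller's ID, otherwise derive one."""
    if provided_id is not None:
        # Accept either a string or a UUID object; a malformed string
        # raises ValueError here instead of being silently replaced.
        return uuid.UUID(str(provided_id))
    # Assumed fallback: a deterministic UUIDv5 over the user and chunk text,
    # so the same input yields the same ID on every call.
    return uuid.uuid5(uuid.NAMESPACE_URL, f"{user_id}:{''.join(chunks)}")
```

With a helper like this, a request that supplies an id keeps it, and a request without one still gets a stable ID for the same chunks.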

@ellipsis-dev (bot, Contributor) left a comment

👍 Looks good to me! Reviewed everything up to 2348cc9 in 37 seconds

More details
  • Looked at 355 lines of code in 6 files
  • Skipped 0 files when reviewing.
  • Skipped posting 5 drafted comments based on config settings.
1. py/core/main/api/v3/documents_router.py:406
  • Draft comment:
    The document ID generation logic has changed from using chunks to using the filename. Ensure that filenames are unique to avoid ID collisions.
  • Reason this comment was not posted:
    Comment did not seem useful.
2. js/sdk/src/baseClient.ts:52
  • Draft comment:
    Trailing commas in function parameters improve readability and ease future modifications. Good practice!
  • Reason this comment was not posted:
    Confidence changes required: 10%
    The trailing commas added in the constructor and method parameters improve code readability and maintainability, especially when adding new parameters.
3. js/sdk/src/baseClient.ts:74
  • Draft comment:
    Trailing commas in function parameters improve readability and ease future modifications. Good practice!
  • Reason this comment was not posted:
    Confidence changes required: 10%
    The trailing commas added in the constructor and method parameters improve code readability and maintainability, especially when adding new parameters.
4. js/sdk/src/r2rClient.ts:56
  • Draft comment:
    Trailing commas in function parameters improve readability and ease future modifications. Good practice!
  • Reason this comment was not posted:
    Confidence changes required: 10%
    The trailing commas added in the constructor and method parameters improve code readability and maintainability, especially when adding new parameters.
5. js/sdk/src/r2rClient.ts:32
  • Draft comment:
    Trailing commas in function parameters improve readability and ease future modifications. Good practice!
  • Reason this comment was not posted:
    Confidence changes required: 10%
    The trailing commas added in the constructor and method parameters improve code readability and maintainability, especially when adding new parameters.
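
The collision concern in draft comment 1 can be illustrated with a small sketch. It assumes, purely for illustration, that an ID is derived deterministically from a user and a filename; the function id_from_filename is hypothetical and is not taken from the router.

```python
import uuid


def id_from_filename(filename: str, user_id: str) -> uuid.UUID:
    """Hypothetical: derive a deterministic ID from user and filename."""
    return uuid.uuid5(uuid.NAMESPACE_URL, f"{user_id}:{filename}")


# Two uploads with the same name map to the same ID, so the second one
# would collide with the first unless the caller supplies its own ID.
first = id_from_filename("report.pdf", "user-1")
second = id_from_filename("report.pdf", "user-1")
other = id_from_filename("report-v2.pdf", "user-1")
```

Under this assumption, letting the caller pass an explicit id is one way to sidestep such collisions.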

Workflow ID: wflow_G12yqi6t7keb8xq0


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

@ellipsis-dev (bot, Contributor) left a comment

👍 Looks good to me! Incremental review on 3bc5b3c in 42 seconds

More details
  • Looked at 13 lines of code in 1 file
  • Skipped 0 files when reviewing.
  • Skipped posting 1 drafted comment based on config settings.
1. py/core/main/api/v3/documents_router.py:407
  • Draft comment:
    Concatenating all chunks into a single string for document ID generation can be inefficient and may cause performance issues if the chunks list is large. Consider using a more efficient method, such as hashing the chunks.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable:
    Looking at the context:
    1. This is a document ID generation function call
    2. The chunks are already in memory since they're passed as a parameter
    3. String concatenation in Python is optimized for lists of strings
    4. The performance impact would be negligible compared to the rest of the document processing
    5. Hashing wouldn't necessarily be better since you'd still need to process all chunks
    The comment raises a valid concern about performance, but may be overestimating the impact. String concatenation in Python is actually quite efficient for lists.
    The performance impact would be minimal in practice, and the suggested solution of hashing wouldn't necessarily be better since it would still need to process all chunks.
    The comment should be deleted as it raises a theoretical performance concern that is unlikely to be significant in practice and suggests a solution that may not be better.
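
The trade-off weighed in this draft comment can be made concrete with a short sketch. It compares hashing the joined string against feeding chunks to the hash one at a time; both function names are hypothetical, and SHA-256 is an assumed choice rather than what the project uses.

```python
import hashlib
from typing import Iterable


def digest_joined(chunks: Iterable[str]) -> str:
    """Join every chunk into one string first, then hash it."""
    return hashlib.sha256("".join(chunks).encode("utf-8")).hexdigest()


def digest_incremental(chunks: Iterable[str]) -> str:
    """Feed chunks to the hash one by one, without building a joined string."""
    hasher = hashlib.sha256()
    for chunk in chunks:
        hasher.update(chunk.encode("utf-8"))
    return hasher.hexdigest()
```

Both variants read every chunk, which matches point 5 above; the incremental form only avoids allocating the single large joined string, and the two produce the same digest for the same chunks.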

Workflow ID: wflow_9l0szAJniNJPjvvV



@NolanTrem NolanTrem merged commit d4aa834 into main Jan 14, 2025
15 checks passed
@NolanTrem NolanTrem deleted the Nolan/DocumentIdOnChunks branch January 14, 2025 01:15