Bug fix: passing document id to chunks #1815
👍 Looks good to me! Reviewed everything up to 2348cc9 in 37 seconds
More details
- Looked at 355 lines of code in 6 files
- Skipped 0 files when reviewing.
- Skipped posting 5 drafted comments based on config settings.
1. py/core/main/api/v3/documents_router.py:406
- Draft comment:
The document ID generation logic has changed from using chunks to using the filename. Ensure that filenames are unique to avoid ID collisions.
- Reason this comment was not posted:
Comment did not seem useful.
2. js/sdk/src/baseClient.ts:52
- Draft comment:
Trailing commas in function parameters improve readability and ease future modifications. Good practice!
- Reason this comment was not posted:
Confidence changes required: 10%
The trailing commas added in the constructor and method parameters improve code readability and maintainability, especially when adding new parameters.
3. js/sdk/src/baseClient.ts:74
- Draft comment:
Trailing commas in function parameters improve readability and ease future modifications. Good practice!
- Reason this comment was not posted:
Confidence changes required: 10%
The trailing commas added in the constructor and method parameters improve code readability and maintainability, especially when adding new parameters.
4. js/sdk/src/r2rClient.ts:56
- Draft comment:
Trailing commas in function parameters improve readability and ease future modifications. Good practice!
- Reason this comment was not posted:
Confidence changes required: 10%
The trailing commas added in the constructor and method parameters improve code readability and maintainability, especially when adding new parameters.
5. js/sdk/src/r2rClient.ts:32
- Draft comment:
Trailing commas in function parameters improve readability and ease future modifications. Good practice!
- Reason this comment was not posted:
Confidence changes required: 10%
The trailing commas added in the constructor and method parameters improve code readability and maintainability, especially when adding new parameters.
Workflow ID: wflow_G12yqi6t7keb8xq0
You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
👍 Looks good to me! Incremental review on 3bc5b3c in 42 seconds
More details
- Looked at 13 lines of code in 1 file
- Skipped 0 files when reviewing.
- Skipped posting 1 drafted comment based on config settings.
1. py/core/main/api/v3/documents_router.py:407
- Draft comment:
Concatenating all chunks into a single string for document ID generation can be inefficient and may cause performance issues if the chunks list is large. Consider using a more efficient method, such as hashing the chunks.
- Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
Looking at the context:
- This is a document ID generation function call
- The chunks are already in memory since they're passed as a parameter
- Joining a list of strings in Python (str.join) takes time linear in the total length of the strings
- The performance impact would be negligible compared to the rest of the document processing
- Hashing wouldn't necessarily be better since you'd still need to process all chunks
The comment raises a valid concern about performance, but may be overestimating the impact. String concatenation in Python is actually quite efficient for lists.
The performance impact would be minimal in practice, and the suggested solution of hashing wouldn't necessarily be better since it would still need to process all chunks.
The comment should be deleted as it raises a theoretical performance concern that is unlikely to be significant in practice and suggests a solution that may not be better.
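The trade-off weighed in this draft comment can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in documents_router.py: the helper names `id_from_joined_chunks` and `id_from_hashed_chunks`, and the choice of `uuid.uuid5` with `uuid.NAMESPACE_DNS`, are hypothetical and picked only for the example. Both variants read every chunk once, which is the point made in the reasoning above.

```python
import hashlib
import uuid


def id_from_joined_chunks(chunks: list[str]) -> uuid.UUID:
    """Derive a deterministic ID by joining all chunk texts (hypothetical helper)."""
    joined = "".join(chunks)  # str.join makes a single pass over the list
    return uuid.uuid5(uuid.NAMESPACE_DNS, joined)


def id_from_hashed_chunks(chunks: list[str]) -> uuid.UUID:
    """Derive a deterministic ID by hashing chunks incrementally (hypothetical helper)."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk.encode("utf-8"))  # still touches every chunk
    return uuid.uuid5(uuid.NAMESPACE_DNS, digest.hexdigest())
```

The incremental variant avoids building one large intermediate string, but it does the same amount of reading, so any saving is in peak memory rather than in the number of chunks processed.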
Workflow ID: wflow_9l0szAJniNJPjvvV
Important
Fix document ID handling in create_document for chunk-based document creation and update related tests.
- create_document in documents_router.py: use the provided id for document creation from chunks, or generate a new one if not provided.
- "Create a document from chunks with an id" in ChunksIntegrationSuperUser.test.ts: verifies document creation with a specified ID.
- "Create document with file path" test in DocumentsIntegrationSuperUser.test.ts: uses a specific ID.
- "Delete a document" in ChunksIntegrationSuperUser.test.ts: verifies document deletion by ID.
This description was created by Ellipsis for 3bc5b3c. It will automatically update as commits are pushed.