Move from user_id to owner_id for contextual enrichment #1784

Merged: 1 commit, Jan 8, 2025

@NolanTrem (Collaborator) commented Jan 8, 2025

Important

Change filter_user_ids in the parse() method of HatchetIngestFilesWorkflow to use owner_id instead of user_id.

  • Behavior:
    • In ingestion_workflow.py, change filter_user_ids from document_info.user_id to document_info.owner_id in parse() method of HatchetIngestFilesWorkflow class.
    • Affects document overview retrieval, aligning with owner-based filtering instead of user-based.
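The behavior change can be pictured with a small in-memory sketch. DocumentInfo, DocumentOverview, get_document_overviews and overview_for are hypothetical stand-ins, not R2R's actual classes or database API. Only the filter_user_ids name and the switch from user_id to owner_id come from this PR.

```python
from dataclasses import dataclass
from uuid import UUID, uuid4


@dataclass
class DocumentInfo:
    # Hypothetical stand-in for the document record; only the fields used here.
    id: UUID
    owner_id: UUID


@dataclass
class DocumentOverview:
    # Hypothetical overview row as an overview lookup might return it.
    id: UUID
    owner_id: UUID
    title: str


def get_document_overviews(
    overviews: list[DocumentOverview],
    filter_user_ids: list[UUID],
    filter_document_ids: list[UUID],
) -> list[DocumentOverview]:
    # Keep overviews owned by one of the given users AND matching the given document IDs.
    return [
        o
        for o in overviews
        if o.owner_id in filter_user_ids and o.id in filter_document_ids
    ]


def overview_for(
    document_info: DocumentInfo, overviews: list[DocumentOverview]
) -> list[DocumentOverview]:
    # After this PR the filter uses the document's owner_id
    # (previously it used document_info.user_id).
    return get_document_overviews(
        overviews,
        filter_user_ids=[document_info.owner_id],
        filter_document_ids=[document_info.id],
    )
```

With owner-based filtering, a lookup only returns the overview when the filter ID matches the document's owner, which is the alignment the bullet above describes.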

This description was created by Ellipsis for b093014. It will automatically update as commits are pushed.

@ellipsis-dev bot (Contributor) left a comment

👍 Looks good to me! Reviewed everything up to b093014 in 29 seconds

More details
  • Looked at 13 lines of code in 1 file
  • Skipped 0 files when reviewing.
  • Skipped posting 1 drafted comments based on config settings.
1. py/core/main/orchestration/hatchet/ingestion_workflow.py:228
  • Draft comment:
    Consider changing filter_user_ids to use owner_id instead of user.id for consistency with the PR's intent.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.

Workflow ID: wflow_SelbpRB6mw40eKJ5


@NolanTrem merged commit 240047b into main on Jan 8, 2025
13 of 14 checks passed
@NolanTrem deleted the Nolan/OwnerID branch on January 8, 2025, 22:48

@qdrddr commented Jan 8, 2025

FYI: after applying this patch to the v3.3.22 Docker container, I see a new error message.
A bit about my config: I'm using the hatchet orchestration provider with the unstructured_local ingestion provider, and in r2r.toml my [ingestion.chunk_enrichment_settings] section has enable_chunk_enrichment = true.

Previously, with unmodified v3.3.22, I also got an error message, but the chunk was somehow still processed successfully.
Now, after this patch, the ingestion status is 'failed' in the UI, @NolanTrem:

2025-01-08 17:32:46 [ERROR]     🪓 -- 2025-01-08 23:32:46,471 - exception raised in action (ingest-files:parse, retry=0):
2025-01-08 17:32:46 500: Error during ingestion: 'chunk_id'
2025-01-08 17:32:46 Traceback (most recent call last):
2025-01-08 17:32:46   File "/app/core/main/orchestration/hatchet/ingestion_workflow.py", line 238, in parse
2025-01-08 17:32:46     await self.ingestion_service.chunk_enrichment(
2025-01-08 17:32:46   File "/app/core/main/services/ingestion_service.py", line 596, in chunk_enrichment
2025-01-08 17:32:46     chunk["chunk_id"]: chunk for chunk in list_document_chunks
2025-01-08 17:32:46     ~~~~~^^^^^^^^^^^^
2025-01-08 17:32:46 KeyError: 'chunk_id'
2025-01-08 17:32:46 
2025-01-08 17:32:46 During handling of the above exception, another exception occurred:
2025-01-08 17:32:46 
2025-01-08 17:32:46 Traceback (most recent call last):
2025-01-08 17:32:46   File "/usr/local/lib/python3.12/site-packages/hatchet_sdk/worker/runner/runner.py", line 213, in async_wrapped_action_func
2025-01-08 17:32:46     return await action_func(context)
2025-01-08 17:32:46            ^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-01-08 17:32:46   File "/app/core/main/orchestration/hatchet/ingestion_workflow.py", line 259, in parse
2025-01-08 17:32:46     raise HTTPException(
2025-01-08 17:32:46 fastapi.exceptions.HTTPException: 500: Error during ingestion: 'chunk_id'
2025-01-08 17:32:46 
2025-01-08 17:32:46 [ERROR]     🪓 -- 2025-01-08 23:32:46,472 - failed step run: ingest-files:parse/b828d488-60a1-4adc-b3a8-3b7f68b1feaf
2025-01-08 17:32:46 [DEBUG]     🪓 -- 2025-01-08 23:32:46,473 - tx: event: ingest-files:parse/3
2025-01-08 17:32:46 [INFO]      🪓 -- 2025-01-08 23:32:46,503 - rx: start step run: 47ffaf1c-24d4-45af-981d-9866cf152d51/ingest-files:on_failure
2025-01-08 17:32:46 [DEBUG]     🪓 -- 2025-01-08 23:32:46,504 - tx: event: ingest-files:on_failure/1
2025-01-08 17:32:46 [INFO]      🪓 -- 2025-01-08 23:32:46,504 - run: start step: ingest-files:on_failure/47ffaf1c-24d4-45af-981d-9866cf152d51
2025-01-08 17:32:46 [DEBUG]     🪓 -- 2025-01-08 23:32:46,505 - tx: event: ingest-files:on_failure/1
2025-01-08 17:32:46 [DEBUG]     🪓 -- 2025-01-08 23:32:46,505 - start time: 0.0010957717895507812
2025-01-08 17:32:46 [INFO]      🪓 -- 2025-01-08 23:32:46,510 - finished step run: ingest-files:on_failure/47ffaf1c-24d4-45af-981d-9866cf152d51
2025-01-08 17:32:46 [DEBUG]     🪓 -- 2025-01-08 23:32:46,510 - tx: event: ingest-files:on_failure/2
2025-01-08 17:32:47 2025-01-08 23:32:47 - INFO - 127.0.0.1:39996 - "GET /v3/health HTTP/1.1" 200
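The first traceback points at a dict comprehension keyed on chunk["chunk_id"]. The sketch below reproduces the failure under one assumption: that the chunk records returned for the document carry their identifier under a different key (here "id") rather than "chunk_id". It also shows why the 500 detail reads 'chunk_id' with quotes: str() of a single-argument KeyError is the repr of the missing key. build_chunk_index_tolerant and parse_step are hypothetical helpers, not R2R code, and RuntimeError stands in for fastapi's HTTPException to keep the example stdlib-only.

```python
from typing import Any


def build_chunk_index(chunks: list[dict[str, Any]]) -> dict[Any, dict[str, Any]]:
    # Mirrors the failing line in chunk_enrichment:
    # {chunk["chunk_id"]: chunk for chunk in list_document_chunks}
    return {chunk["chunk_id"]: chunk for chunk in chunks}


def build_chunk_index_tolerant(chunks: list[dict[str, Any]]) -> dict[Any, dict[str, Any]]:
    # Hypothetical workaround: accept either key name, ASSUMING chunk
    # records may expose their identifier as "id" instead of "chunk_id".
    index: dict[Any, dict[str, Any]] = {}
    for chunk in chunks:
        key = chunk.get("chunk_id", chunk.get("id"))
        if key is None:
            raise KeyError("chunk record has neither 'chunk_id' nor 'id'")
        index[key] = chunk
    return index


def parse_step(chunks: list[dict[str, Any]]) -> dict[Any, dict[str, Any]]:
    # Sketch of the wrapping seen in parse(): any exception is re-raised with
    # the detail f"Error during ingestion: {e}". Raising inside the except
    # block without "from" produces the implicit chaining shown in the log
    # ("During handling of the above exception, another exception occurred").
    try:
        return build_chunk_index(chunks)
    except Exception as e:
        raise RuntimeError(f"Error during ingestion: {e}")
```

With a record like {"id": "c1", "text": "hello"}, parse_step raises with the message Error during ingestion: 'chunk_id', matching the log, while build_chunk_index_tolerant indexes the same record under "c1".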
