🎉 Source Shopify: dynamically adjust the size of the slice for Bulk API streams #36788

Merged


@bazarnov bazarnov commented Apr 3, 2024

What

Resolving:

How

  • Increased retrieve_chunk_size from 5 MB to 10 MB to download the results of COMPLETED jobs faster.
  • Reduced the number of RUNNING messages logged for Bulk API streams: the RUNNING status is now shown every 3rd time it is received. The status check interval is unchanged, so the message appears every 15 sec while the status is checked every 5 sec.
  • Added a retry for the ShopifyBulkExceptions.BulkJobBadResponse and ShopifyBulkExceptions.BulkJobUnknownError errors; these should be retried at least once.
  • Added the ability to expand and reduce the slice size for Bulk API streams.
  • Added the main job_elapsed_time_threshold_sec threshold, which controls the slice size based on the time spent on the last COMPLETED job.
  • Kept the option to provide bulk_window_in_days in the config. In this case, the first request uses the value from the config, and the slice size is then optimized automatically based on the job timings.
  • Added the slice size of the job for readability.
  • An exception is now raised when the Bulk Job is CANCELED by the server (system error).
  • Long-running Bulk Jobs are now handled by canceling them intentionally and retrying with a smaller slice size.
  • Fixed the parent stream STATE regression for nested substreams, which caused the parent STATE to drop when a deleted record had a smaller cursor value.
  • Covered the slice size adjustment logic with unit tests.
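
For readers who want a picture of the slice size adjustment described in the list above, here is a minimal sketch. It is not the connector's code: the class name SliceSizer, the bounds, the multiplier, and the threshold value are all assumptions made for illustration. Only the names job_elapsed_time_threshold_sec and bulk_window_in_days come from this PR.

```python
from dataclasses import dataclass


@dataclass
class SliceSizer:
    """Hypothetical helper that tracks the slice size (in days) of a Bulk API stream."""

    slice_size_days: float = 30.0  # assumed start, e.g. taken from bulk_window_in_days
    min_days: float = 1.0  # assumed lower bound
    max_days: float = 365.0  # assumed upper bound
    job_elapsed_time_threshold_sec: float = 60.0  # assumed threshold value
    factor: float = 2.0  # assumed expand/reduce multiplier

    def adjust(self, last_job_elapsed_sec: float) -> float:
        """Expand the slice after a fast COMPLETED job, reduce it after a slow one."""
        if last_job_elapsed_sec < self.job_elapsed_time_threshold_sec:
            # The last job was quick: ask for more data next time.
            self.slice_size_days = min(self.slice_size_days * self.factor, self.max_days)
        else:
            # The last job was slow: ask for less data next time.
            self.slice_size_days = max(self.slice_size_days / self.factor, self.min_days)
        return self.slice_size_days


sizer = SliceSizer(slice_size_days=30.0)
print(sizer.adjust(10.0))   # fast job, slice grows
print(sizer.adjust(120.0))  # slow job, slice shrinks
```

Clamping to a minimum and maximum is a design choice of this sketch; it keeps repeated fast or slow jobs from driving the slice size to an extreme.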

User Impact

No impact is expected.

@bazarnov bazarnov self-assigned this Apr 3, 2024
@octavia-squidington-iii octavia-squidington-iii added the area/documentation Improvements or additions to documentation label Apr 3, 2024
@bazarnov bazarnov changed the title from "🎉 Source Shopify: dynamically adjust slice size" to "🎉 Source Shopify: dynamically adjust the size of the slice for Bulk API streams" Apr 3, 2024
@bazarnov

Another pre-release: 2.0.5-dev.8e6bf87c15. It adds more logs about the processed/emitted records for nested substreams such as order refunds, and guarantees at least 1 record for nested substreams so that they emit a state message and checkpoint the sync. This matters when the STATE value for the nested substream is greater than the record cursor values observed: in that case we might wait a long time before enough data is collected for the nested substream records buffer to flush.

Continue testing the fix using this connection

@bazarnov

bazarnov commented Apr 13, 2024

Another pre-release with Job cancellation and nested substream checkpointing has been published: 2.0.5-dev.92c677f79b


Given:

  • TIMEOUT: 420 sec (7 min)

We now CANCEL the Bulk Job when it takes longer than the source expects. Once the TIMEOUT for a RUNNING Bulk Job is reached:

  • Cancel the Bulk Job
  • Revert the slice to the previous period, now with a smaller slice size
  • Retry the Bulk Job
  • Repeat until the Bulk Job is COMPLETED within the TIMEOUT
  • Increase the slice size if the job is COMPLETED within the TIMEOUT
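
The steps above can be sketched as a small loop. This is an illustration under stated assumptions, not the connector's implementation: run_slice_with_timeout, the run_job callback, the halving factor, and the minimum slice size are hypothetical. Only the 420 sec TIMEOUT comes from this comment.

```python
from typing import Callable, List, Tuple

TIMEOUT_SEC = 420  # 7 minutes, as stated above


def run_slice_with_timeout(
    run_job: Callable[[float, int], bool],
    slice_size_days: float,
    min_slice_days: float = 1.0,  # assumed lower bound
    factor: float = 2.0,  # assumed reduce/expand multiplier
) -> Tuple[float, float]:
    """Retry one slice with smaller sizes until the job completes within TIMEOUT_SEC.

    run_job(slice_size_days, timeout_sec) is a hypothetical callback that returns
    True if the job COMPLETED in time and False if it had to be canceled.
    Returns (slice size that completed, slice size to try for the next slice).
    """
    while True:
        if run_job(slice_size_days, TIMEOUT_SEC):
            # Completed within the timeout: try a larger slice next time.
            return slice_size_days, slice_size_days * factor
        if slice_size_days <= min_slice_days:
            raise RuntimeError("Bulk Job timed out even at the minimum slice size")
        # Canceled on timeout: retry the same period with a smaller slice.
        slice_size_days = max(slice_size_days / factor, min_slice_days)


attempts: List[float] = []


def fake_job(days: float, timeout_sec: int) -> bool:
    """Stand-in for a real Bulk Job: pretend each day of data costs 30 sec."""
    attempts.append(days)
    return days * 30 <= timeout_sec


completed, next_size = run_slice_with_timeout(fake_job, 64.0)
print(attempts, completed, next_size)
```

With the fake job above, the 64-day slice is retried at 32, 16, and finally 8 days, which is the first size that fits within the timeout.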

@bazarnov

Another pre-release, 2.0.5-dev.3531956447, includes all the fixes from 2.0.5-dev.92c677f79b and adds a fix for the parent state regression for nested substreams.
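
To illustrate what a state regression means here, the sketch below shows one common way to keep a cursor from moving backwards. It is an assumption about the general technique, not the fix shipped in this pre-release; the function name next_parent_state and the timestamp values are made up for the example.

```python
from datetime import datetime


def next_parent_state(current_state: str, record_cursor: str) -> str:
    """Return the parent STATE after seeing a record, never moving it backwards.

    Both values are assumed to be ISO 8601 timestamps with a UTC offset.
    """
    if datetime.fromisoformat(record_cursor) > datetime.fromisoformat(current_state):
        return record_cursor
    # A record (for example a deleted one) with a smaller cursor value
    # must not lower the state that was already reached.
    return current_state


state = "2024-04-10T00:00:00+00:00"
state = next_parent_state(state, "2024-03-01T00:00:00+00:00")  # older, deleted record
print(state)  # state is unchanged
state = next_parent_state(state, "2024-04-12T08:30:00+00:00")  # newer record
print(state)  # state advances
```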

@bazarnov

A fresh pre-release is out: 2.0.5-dev.823efc6c17. It reduces log noise and finalizes the cancellation logic.
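
As an example of the log noise reduction, the PR description says the RUNNING status is shown every 3rd time it is received. A minimal sketch of that kind of throttling follows; the class RunningStatusLogger and its method are hypothetical, and whether the real connector logs on the 1st or the 3rd check is an assumption here (this sketch logs on the 3rd, 6th, and so on).

```python
import logging

logger = logging.getLogger("bulk_job_sketch")  # hypothetical logger name


class RunningStatusLogger:
    """Hypothetical helper: log the RUNNING status only every Nth time it is seen."""

    def __init__(self, log_every: int = 3) -> None:
        self.log_every = log_every
        self._seen = 0

    def on_running(self, job_id: str) -> bool:
        """Record one RUNNING status check; return True if a message was logged."""
        self._seen += 1
        if self._seen % self.log_every == 0:
            logger.info("Bulk Job %s is RUNNING", job_id)
            return True
        return False


throttle = RunningStatusLogger(log_every=3)
# With a status check every 5 sec, this produces one message every 15 sec.
logged = [throttle.on_running("job-1") for _ in range(6)]
print(logged)
```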

@bazarnov bazarnov merged commit dccb6c0 into master Apr 17, 2024
29 checks passed
@bazarnov bazarnov deleted the baz/source/shopify/adjust-stream-slice-size-dynamically branch April 17, 2024 09:13
Labels
area/connectors Connector related issues area/documentation Improvements or additions to documentation connectors/source/shopify team/critical-connectors