Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🎉 Source Shopify: migrate products, product images and product variants to GraphQL BULK #37767

Merged

Conversation

bazarnov
Copy link
Collaborator

@bazarnov bazarnov commented May 2, 2024

What

Resolving:

How

  • migrated Products, Product Images and Product Variants to Shopify GQL BULK
  • removed the parent dependency (Product) for the product images and product variants streams, they should be independent to have the proper migration process
  • extended the query.py (BULK Queries module) with 3 more classes for Product, ProductImage and ProductVariant queries
  • changed the references for the stream.py to make the migrated streams inherit from the IncrementalShopifyGraphQlBulkStream
  • added 3 test cases to cover the change
  • fixed the tests that were affected by the change

User Impact

There is a BREAKING CHANGE for the product variants stream STATE:

Before:

"id": 112233

Now:

"updated_at": "2024-01-01T01:01:01+00:00"

Can this PR be safely reverted and rolled back?

  • YES 💚,

Because the old STATE for id based values is still there for the STATE object, we can safely roll back when something is wrong.

@bazarnov bazarnov added breaking-change Don't merge me unless you are ready. team/critical-connectors labels May 2, 2024
@bazarnov bazarnov self-assigned this May 2, 2024
Copy link

vercel bot commented May 2, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
airbyte-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback May 14, 2024 8:38am

@octavia-squidington-iii octavia-squidington-iii added the area/documentation Improvements or additions to documentation label May 2, 2024
@bazarnov
Copy link
Collaborator Author

bazarnov commented May 2, 2024

@katmarkham Could you please take a look at the user-facing message about the breaking change? Thank you.

@bazarnov bazarnov marked this pull request as ready for review May 2, 2024 11:28
@bazarnov bazarnov requested a review from a team May 2, 2024 11:28
@octavia-squidington-iv octavia-squidington-iv requested a review from a team May 2, 2024 11:29
@bazarnov bazarnov changed the title 🎉 Source Shopify: migrate product images and product variants to GraphQL BULK 🎉 Source Shopify: migrate products, product images and product variants to GraphQL BULK May 2, 2024
@bazarnov

This comment was marked as outdated.

@bazarnov
Copy link
Collaborator Author

bazarnov commented May 9, 2024

The pre-release version has been published successfully: 2.1.0-dev.805cd2979d
Action run: https://github.com/airbytehq/airbyte/actions/runs/9014252053

@bazarnov

This comment was marked as outdated.

@maxi297
Copy link
Contributor

maxi297 commented May 10, 2024

I feel uneasy to accept those live testing results without explaining those gaps:

1adee9d6-d8b1-4425-85a7-e13aebf7331e and 80be7c99-7a87-41e9-bae9-b0e61e7be4c4

The HTTP requests are still all the same i.e. I still don't see any request to products.json in the control version which makes me wonder if the stream is tested at all... Can you explain why we don't see HTTP requests to /products.json

d82df6a0-1b94-4608-9e63-e9f0d57ec652

It says product_variants stream: records with primary key in target & control whose values differ but the diff does not highlight this change. I would also expect to see this discrepancy in 81a5acbb-4374-497d-9af2-297ea8e7f6a7 and a7d4dcfe-39bb-4803-8e49-18ae07e838b2. Do we know why this change is not picked up by the live testing?

@bazarnov
Copy link
Collaborator Author

Because of the inability to test using live-testing tool (permissions issues), the manual changes report was conducted.

CHANGES (REGRESSION) TEST PLAN

  1. Make common CONFIG
  2. Make common CATALOG with isolated target (affected) stream
  3. Run-Test each affected stream against the Test Criteria. Affected streams: Products, Product Images, Product Variants

COMMANDS:

FULL-REFRESH

## CONTROL

docker run --rm -v $(pwd)/secrets:/secrets -v $(pwd)/integration_tests:/integration_tests airbyte/source-shopify:latest read --config /secrets/config.json --catalog /integration_tests/configured_catalog.json --debug > control.json
## TARGET

docker run --rm -v $(pwd)/secrets:/secrets -v $(pwd)/integration_tests:/integration_tests airbyte/source-shopify:dev read --config /secrets/config.json --catalog /integration_tests/configured_catalog.json --debug > target.json

INCREMENTAL

## CONTROL

docker run --rm -v $(pwd)/secrets:/secrets -v $(pwd)/integration_tests:/integration_tests airbyte/source-shopify:latest read --config /secrets/config.json --catalog /integration_tests/configured_catalog.json --state /integration_tests/control_state.json --debug > control_inc.json
## TARGET

docker run --rm -v $(pwd)/secrets:/secrets -v $(pwd)/integration_tests:/integration_tests airbyte/source-shopify:dev read --config /secrets/config.json --catalog /integration_tests/configured_catalog.json --state /integration_tests/target_state.json --debug > target_inc.json

Test Criteria:

  1. Outgoing API request example:

    • 1.1) CONTROL
    • 1.2) TARGET
    • Explain the diff
  2. Sync mode: Full-Refresh

    • 2.1) Number of records:

      • 2.1.1) CONTROL
      • 2.1.2) TARGET
      • Explain the diff
    • 2.2) STATE comparison:

      • 2.2.1) CONTROL
      • 2.2.2) TARGET
      • Explain the diff
    • 2.3) Missing records:

      • 2.3.1) CONTROL
      • 2.3.2) TARGET
      • Explain the diff
  3. Sync mode: Incremental

    • 3.1) Number of records:

      • 3.1.1) CONTROL
      • 3.1.2) TARGET
      • Explain the diff
    • 3.2) STATE comparison:

      • 3.2.1) CONTROL
      • 3.2.2) TARGET
      • Explain the diff
    • 3.3) Missing records:

      • 3.3.1) CONTROL
      • 3.3.2) TARGET
      • Explain the diff
  4. Record example:

    • 4.1) CONTROL
    • 4.2) TARGET
  5. Schema changes:

    • 5.1) Added fields
    • 5.2) Changes types
    • 5.3) Removed fields
    • Explain the diff
  6. Is this a breaking change:

    • [] Yes
    • [] No

Test Results ( PII - free ):

cc @maxi297

@bazarnov bazarnov requested a review from maxi297 May 13, 2024 11:42
Copy link
Contributor

@maxi297 maxi297 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shit Baz! Big thumbs up on documenting the changes for this PR. I really appreciate it as it help review and go back to those changes so that we understand why it was done like that. Kudos!

Based on the report:

The difference in records is explained by the fact we don't have the possibility to fetch the deleted records, using GraphQL. The deleted records, are available using REST only.

Should we add that to the migration notes?

* In the `Product Variants` stream, the `presentment_prices.compare_at_price` property has changed from a `number` to an `object of strings`. This field was not populated in the `REST API` stream version, but it is correctly covered in the GraphQL stream version.
* The `Product Variants` stream's `inventory_policy` and `inventory_management` properties now contain `uppercase string` values, instead of `lowercase`.
* In the `Product Images` stream, the date-time fields, such as `created_at` and `updated_at`, now use `UTC` format without a timezone component.
* In the `Product Images` stream, the `variant_ids` property is no longer available. Refer to the `Product variants` stream instead.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think position is also missing based on the output you provided, right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not present in the record for the TARGET version, but schema didn't change. I'll add this mention.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello, position is not missing but it always returns null values. Could you please investigate?

@@ -121,8 +121,17 @@
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the sample you provided, the control version has field image_id while the target has image. Is this expected? Should we document this as well?

Copy link
Collaborator Author

@bazarnov bazarnov May 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, added the handling of that, we basically need to have image_id at the root level, but graphql nests it in "image":{"image_id": "gid://shopify/ProductImage/123456"}. We don't have this populated for our sandbox, therefore it was Null for the data and wasn't processed by far. Thanks for the notice!

Corrected. Updated the dedicated unit_test to reflect this and updated the branch.

@bazarnov bazarnov merged commit 80cf249 into master May 14, 2024
33 of 34 checks passed
@bazarnov bazarnov deleted the baz/source/shopify/migrate-product-images-variants-to-bulk branch May 14, 2024 09:50
@loris
Copy link

loris commented May 31, 2024

@bazarnov @maxi297 quick question: since there are breaking changes, it should have been included in a 3.0 release no? I can't find how Airbyte Cloud handles connector upgrade, is this automated?

@maxi297
Copy link
Contributor

maxi297 commented May 31, 2024

Hi @loris ! The breaking changes were introduced as part of 2.1.0. As this is a breaking change, the opt-in mechanism on cloud should kick in and you should see a prompt asking you to upgrade before 2024-06-10

@loris
Copy link

loris commented Jun 2, 2024

@maxi297 Thanks for the clarification, I'm in discussion with cloud support because this is clearly not working as expected. In our production workspace, connections have been upgraded to 2.2 without any prompt (and thus broke our pipeline), while on our other workspaces, connections are still running Shopify 2.0 connector and we don't see any Upgrade prompt popping up.

@loris
Copy link

loris commented Jun 4, 2024

@maxi297 Actually I might have found the issue:


      2.1.0:
        message:
          "This upgrade changes the `Products`, `Product Images` and `Product Variants` streams to use `Shopify GraphQL BULK`.
          More details here: https://github.com/airbytehq/airbyte/pull/37767."
        upgradeDeadline: "2024-06-10"
        scopedImpact:
          - scopeType: stream
            impactedScopes: ["product_variants"]

The breaking change message only states product_variants in the impactedScopes, while products was also clearly impacted. Our Airbyte connections only needed to sync products (as the variants where part of that stream too). But with the change, we lost the variant data and were not informed of the breaking change because the impactedScopes did not included products, does this seem right?

@bazarnov
Copy link
Collaborator Author

bazarnov commented Jun 4, 2024

@loris This message included the breaking change that breaks the source sync operations only, the info about the data was re-arranged based on the API changes are not included in this message.

In the migration guide here: https://github.com/airbytehq/airbyte/blob/e9e6cf3778caf804bc5d433ccda1530de1373a6d/docs/integrations/sources/shopify-migrations.md which comes with this release, we included all the details about the actual data change we discovered. Have you had a chance to observe this in the UI?

Hope that answers your question and thank you for your feedback.

@loris
Copy link

loris commented Jun 4, 2024

Very clear!

Have you had a chance to observe this in the UI?

This is the part which is confusing to me: No, I did not observe anything in the UI. I also have a workspace where Shopify connector are still running the 2.0 version and did not see any Upgrade banner. I have the email/webhook notifications set up, but nothing there too.
We are waiting for a reply of Airbyte Cloud support and will let you know

@maxi297
Copy link
Contributor

maxi297 commented Jun 4, 2024

This information will be helpful for us a well as maybe they is a gap we are not aware related to the opt-in mechanism. Please, keep us up to date on this one

class Product(ShopifyBulkQuery):
"""
{
products(query: "updated_at:>='2020-01-20T00:00:00+00:00' AND updated_at:<'2024-04-25T00:00:00+00:00'", sortKey:UPDATED_AT) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bazarnov Hi, out of curiosity, is this normal to have hard-coded updated_at values here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the 'docstring' query example, I believe - yes.

Does this look wrong to you?
Should we have dynamic docstring for such cases?

If you can contribute to have the examples like this constructed dynamically, we would appreciate such improvements.

Thank you for your question, and feel free to add more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/connectors Connector related issues area/documentation Improvements or additions to documentation breaking-change Don't merge me unless you are ready. connectors/source/shopify team/critical-connectors
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants