✨ Source Harvest: Migrate to Low Code #35863

Merged: lazebnyi merged 61 commits into master from pnilan/source-harvest-low-code-migration on Apr 15, 2024


pnilan
Contributor

pnilan commented Mar 6, 2024

What

  • Migrates Harvest to Low Code per the "How To Migrate A Python Connector To Low Code" guide
  • Unpins the CDK, setting the airbyte-cdk version to ^0
  • Adds config_migrations.py to add the new required auth_type property to the config
  • Adds unit tests for config_migrations.py
  • Updates test_source.py, unit_test.py, and test_streams.py tests. Adds freezegun dev dependency.
  • Bumps major version number, updates metadata, changelog, and harvest-migrations.md accordingly
  • Closes https://github.com/airbytehq/airbyte-internal-issues/issues/6620
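The config migration listed above can be pictured as a small, idempotent function. This is a minimal sketch under stated assumptions, not the contents of config_migrations.py: the name `migrate_config`, the `credentials` layout, the `"Client"`/`"Token"` values, and the rule for detecting each auth method (`client_id` vs `api_token`) are all hypothetical.

```python
import copy
from typing import Any, Dict, Mapping

# Hypothetical auth_type constants; the connector's actual spec values may differ.
OAUTH_AUTH_TYPE = "Client"
TOKEN_AUTH_TYPE = "Token"


def migrate_config(config: Mapping[str, Any]) -> Dict[str, Any]:
    """Return a copy of config with credentials.auth_type filled in when it is missing.

    Assumes (hypothetically) that OAuth credentials contain a "client_id" key and
    personal-access-token credentials contain an "api_token" key.
    """
    migrated = copy.deepcopy(dict(config))  # never mutate the caller's config
    credentials = migrated.get("credentials")
    if not isinstance(credentials, dict) or "auth_type" in credentials:
        # No credentials block, or already migrated: leave the config unchanged.
        return migrated
    if "client_id" in credentials:
        credentials["auth_type"] = OAUTH_AUTH_TYPE
    elif "api_token" in credentials:
        credentials["auth_type"] = TOKEN_AUTH_TYPE
    return migrated
```

Because an already-migrated config is returned unchanged, running the migration twice is harmless. Persisting the migrated config back to the platform is out of scope for this sketch.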

🚨 Breaking changes

  • Incremental substreams now use per-partition state, which is a breaking change affecting the following streams:
    • estimate_messages
    • invoice_messages
    • invoice_payments
    • project_assignments
  • Updates report stream slices to use 365-day slice durations. This affects the following streams:
    • expenses_clients
    • expenses_categories
    • expenses_projects
    • expenses_team
    • time_clients
    • time_projects
    • time_tasks
    • time_team
    • uninvoiced

vercel bot commented Mar 6, 2024

The latest updates on your projects.

airbyte-docs: ✅ Ready, last updated Apr 15, 2024 3:47pm (UTC)

octavia-squidington-iii added the area/documentation (Improvements or additions to documentation) label Mar 9, 2024
pnilan marked this pull request as ready for review March 9, 2024 00:18
octavia-squidington-iv requested a review from a team March 9, 2024 00:19
…:airbytehq/airbyte into pnilan/source-harvest-low-code-migration
@alafanechere
Contributor

@alafanechere

The shift in slicing logic was intentional.

The report endpoints allow a maximum 1-year (365-day) duration with inclusive boundaries, so valid slices would be [01/01/2021, 12/31/2021], [01/01/2022, 12/31/2022], etc.

Previously the connector was slicing from 01/01/2023 to 01/01/2024, a 366-day duration with inclusive boundaries, and then using the “to” date as the new “from” date, for example: [01/01/2021, 01/01/2022], [01/01/2022, 01/01/2023], etc. (In your example above, 01/01/2020 to 12/31/2020 is 366 days because 2020 was a leap year.) It is unclear whether the API actually allows 366-day durations in leap years, but the docs explicitly state a 365-day maximum.

    while start_date < end_date:
        # Max size of date chunks is 365 days
        # Docs: https://help.getharvest.com/api-v2/reports-api/reports/time-reports/
        end_date_slice = end_date if start_date >= end_date.subtract(days=365) else start_date.add(days=365)
        date_slice = {"from": start_date.strftime(self.date_param_template), "to": end_date_slice.strftime(self.date_param_template)}

        start_date = end_date_slice

        yield date_slice

Originally implemented slicing logic: it uses the previous end date as the new start date.
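To make the difference concrete, here is a plain-Python sketch of both slicing rules. It is an illustration under assumptions, not the connector's or the low-code CDK's code: the function names are hypothetical, `datetime.timedelta` stands in for pendulum's `add`/`subtract`, and the 365-day rule assumes each new slice starts the day after the previous one ends.

```python
from datetime import date, timedelta
from typing import Iterator, Tuple

Slice = Tuple[date, date]


def legacy_slices(start: date, end: date) -> Iterator[Slice]:
    """Pre-migration behaviour: each slice's "to" date becomes the next slice's "from" date."""
    while start < end:
        # Same rule as the original code, with timedelta in place of pendulum's add/subtract.
        slice_end = end if start >= end - timedelta(days=365) else start + timedelta(days=365)
        yield start, slice_end
        start = slice_end


def step_365_slices(start: date, end: date) -> Iterator[Slice]:
    """Inclusive slices covering at most 365 calendar days; the next slice starts the following day."""
    while start <= end:
        slice_end = min(start + timedelta(days=364), end)
        yield start, slice_end
        start = slice_end + timedelta(days=1)


def inclusive_days(slice_: Slice) -> int:
    """Number of calendar days covered when both boundaries are inclusive."""
    start, end = slice_
    return (end - start).days + 1


if __name__ == "__main__":
    for s in legacy_slices(date(2021, 1, 1), date(2023, 1, 1)):
        print("legacy ", s, inclusive_days(s))
    for s in step_365_slices(date(2021, 1, 1), date(2022, 12, 31)):
        print("365-day", s, inclusive_days(s))
```

The legacy generator yields two 366-day slices that share a boundary date (2022-01-01 is requested twice), while the 365-day version yields [2021-01-01, 2021-12-31] and [2022-01-01, 2022-12-31]. In a leap year a 365-day window starting on Jan 1 ends on Dec 30, so Dec 31 falls into a short trailing slice.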

Updating the low-code implementation will require me to override the low-code slicing logic.

Given the above, I think we should stick with the 365-day slice duration, but if you think it's more valuable to reflect the original implementation more closely, I'm happy to update. Let me know your thoughts.

I appreciate the detailed explanation. I'm in, as long as we are aware of the reason for this behavioral change and you think it's more correct 👍.
@pnilan, do you think the “to” and “from” fields, which are added to records from the request parameters, are of any use to customers? I'm wondering if we should remove them so that future changes to the slicing logic do not impact the destination.
I am asking now because removing them would be a breaking change, and it might be worth bundling both breaking changes in the same release to avoid disturbing users multiple times.

Contributor

alafanechere left a comment

Approving! From a behavioral standpoint, our regression test tool did not detect any alarming anomalies. Quite the opposite: I think you fixed missing records in the following streams (you might want to add that to the release notes):

  • invoice_payments
  • project_assignments
  • invoice_messages

I'll let @katmarkham validate the breaking change messaging is correct.

@pnilan
Contributor Author

pnilan commented Apr 2, 2024

@alafanechere

I think they are probably used in the destination: the report records do not otherwise provide any cursor values, so these fields could be used to track changes in reported values over time.

See the Expense Reports documentation.

Copy link
Contributor

maxi297 left a comment

I haven't rechecked, but I'll approve based on @alafanechere's approval to unblock the merge.

@artem1205
Collaborator

@pnilan, LGTM!
Could you please clarify something about the breaking changes? I don't see any changes in abnormal_state, and the test is still green. What changes have occurred in the state?

@pnilan
Contributor Author

pnilan commented Apr 9, 2024

@artem1205

Updated and confirmed the new per-partition state causes a breaking change.

Your comment uncovered a potential issue with test_state_with_abnormally_large_values: I was using the old state format, yet the test was passing. Even after I changed the date to a "reasonable" (non-abnormal?) value, it still passed. I could only get the test to fail once I switched to the correct new per-partition format and set the date to a "reasonable" value. The format is now updated for the relevant streams, and I have confirmed that the test is functioning properly and passing.

I'll document this in an issue and dig a little deeper, but it seems like state format validation may be necessary. My hypothesis is that the incorrect state format caused the read function to "error out" and return 0 records, which made the test pass.
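A state format check of the kind suggested above could be as small as the following sketch. It assumes the per-partition state has the shape `{"states": [{"partition": {...}, "cursor": {...}}]}`, which is my reading of the low-code per-partition format and should be verified against the CDK; the function names, the `updated_at` cursor field default, and the partition keys in the usage note are hypothetical.

```python
from collections.abc import Mapping
from typing import Any


def is_per_partition_state(state: Mapping[str, Any], cursor_field: str = "updated_at") -> bool:
    """Check that state has the assumed per-partition shape.

    Expected (assumed) shape: {"states": [{"partition": {...}, "cursor": {cursor_field: ...}}, ...]}
    An empty "states" list is accepted as a valid, empty state.
    """
    states = state.get("states")
    if not isinstance(states, list):
        return False
    for entry in states:
        if not isinstance(entry, Mapping):
            return False
        partition, cursor = entry.get("partition"), entry.get("cursor")
        if not isinstance(partition, Mapping) or not isinstance(cursor, Mapping):
            return False
        if cursor_field not in cursor:
            return False
    return True


def assert_per_partition_state(state: Mapping[str, Any], cursor_field: str = "updated_at") -> None:
    """Fail loudly instead of letting a stale fixture silently produce zero records."""
    if not is_per_partition_state(state, cursor_field):
        raise ValueError(f"state is not in per-partition format: {state!r}")
```

Run against an abnormal-state fixture before the test reads from the source, a flat legacy state such as `{"updated_at": "2024-01-01T00:00:00Z"}` would raise immediately rather than yielding 0 records and a misleading pass.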

Copy link
Collaborator

lazebnyi left a comment

lgtm

Copy link
Contributor

alafanechere left a comment

@pnilan I re-ran the regression test tool on the PR (to benefit from the latest features added to the tool).
I noticed a change in the catalog and wanted to make sure you are aware of it and that it's fine:

{
  "type_changes": {
    "root[1]['default_cursor_field']": {
      "old_type": "list",
      "new_type": "NoneType",
      "old_value": [
        "updated_at"
      ],
      "new_value": null
    },
    "root[1]['source_defined_cursor']": {
      "old_type": "bool",
      "new_type": "NoneType",
      "old_value": true,
      "new_value": null
    }
  },
  "iterable_item_removed": {
    "root[1]['supported_sync_modes'][1]": "incremental"
  }
}

root[1] refers to the second stream defined in the catalog.
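A catalog regression like the one above can also be caught with a targeted check on just these fields. This is a minimal sketch, not how the regression test tool works: the watched-field list is taken from the diff above, while the function names and the stream names used in testing are hypothetical.

```python
from typing import Any, Dict, List, Mapping, Tuple

# Catalog fields flagged in the regression-tool diff above.
WATCHED_FIELDS = ("default_cursor_field", "source_defined_cursor", "supported_sync_modes")


def stream_changes(old: Mapping[str, Any], new: Mapping[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {field: (old_value, new_value)} for watched fields that differ between two stream entries."""
    changes = {}
    for field in WATCHED_FIELDS:
        before, after = old.get(field), new.get(field)
        if before != after:
            changes[field] = (before, after)
    return changes


def catalog_changes(
    old_streams: List[Mapping[str, Any]], new_streams: List[Mapping[str, Any]]
) -> Dict[str, Dict[str, Tuple[Any, Any]]]:
    """Match streams by name (not list position) and report watched-field changes per stream."""
    new_by_name = {stream["name"]: stream for stream in new_streams}
    report = {}
    for old in old_streams:
        diff = stream_changes(old, new_by_name.get(old["name"], {}))
        if diff:
            report[old["name"]] = diff
    return report
```

Matching streams by name rather than by index (as in root[1]) keeps the report stable if the connector changes the order in which streams are declared.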

@pnilan
Contributor Author

pnilan commented Apr 11, 2024

@alafanechere Thanks for the catch. I believe this was a mistake on my end. I think it should be fixed now but would you be able to confirm?

@alafanechere
Contributor

I confirm the catalogs are matching now 👍

lazebnyi merged commit b16590e into master Apr 15, 2024
31 checks passed
lazebnyi deleted the pnilan/source-harvest-low-code-migration branch April 15, 2024 16:05
Labels
  • area/connectors: Connector related issues
  • area/documentation: Improvements or additions to documentation
  • connectors/source/harvest
  • low-code-migration: This connector has been migrated to the low-code CDK
9 participants