-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat: Add an Executor for Docker sources #209
Feat: Add an Executor for Docker sources #209
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Tarekk - Thanks so much for this contribution. We made a deliberate decision to not include Docker support in our initial launch, in order to prioritize the best-possible Python-native experience. That said, this contribution is an exciting one and would be welcomed by many users who do have a Docker runtime and who would like to leverage the broadest possible set of connectors.
In terms of path forward, we will discus internally on how best to deliver/position this, but for now I've left some inline feedback re: a few small comments on the implementation details. Lmk what you think on those.
On the bigger-picture roadmap/delivery topic, I will circle back later this week. I'm optimistic we can leverage this - and will confirm once I've got confirmation either way.
Thanks so much - and congrats on your first (BIG) open source contribution! 🙌
/test-pr
|
@Tarekk - Thanks for your patience. We're going to move forward and merge this, but we'll keep this functionality (for a short while) under a new I did see when invoking tests that the we're seeing failures on the Windows runners - but Ubuntu runners are passing all tests 🎉 . We already skip other docker-dependent user tests on Windows, so I would suggest the following:
|
Thank you @aaronsteers for reviewing this PR and bringing it up for review with the team, it's great to hear that it will be merged in 😃 And thank you fro the valuable feedback, I replied to your comments above and they all make sense to me! I am not very familiar with the testing infra but sounds like we would exclude running tests on Windows somewhere here? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Second round of feedback, just focused on testing this time.
I am not very familiar with the testing infra but sounds like we would exclude running tests on Windows somewhere here?
Yes, that's correct. There are probably many ways to do this (TIMTOWDI), but making distinct fixture names for docker-based source execution, and then riding the same rails to auto-skip those as you call out above is a good path, I think. 👍
Thank you for the guidance @aaronsteers! Updated to add the docker fixtures to the exclusion list. |
/test-pr
|
@Tarekk - I just ran the test suite and all tests are green ✅ 🎉 🙌 We'll try to merge this in within the next 24-48 hours. As noted above, I'll take a follow-on PR to move this into an CC @bindipankhudi - Do you mind taking a pass in the code review also? Any follow-ups we can optionally take internally before release. |
That's great to hear @aaronsteers 🥳 appreciate your help with getting this landed!! |
@aaronsteers, just took a pass at this PR. Looks great to me! No additional comments from me. The implementation feels very clean and loving the integration tests :). Nicely done @Tarekk |
What
Adding a
DockerExecutor
class to theget_source
factory.Why
I absolutely love this library as it allows me to rapidly pull data from a variety of different sources and load them into a "local DW" in duckDB for quick analysis using SQL. All this without having to deploy or startup any servers!! However, on numerous occasions I needed to pull in data from Java based Airbyte sources such as PostgreSQL and MySQL and couldn't use PyAirbyte to do that. Therefore, it would be great if the library supported the execution of Docker images that adhere to the Airbyte protocol. This will greatly increase the number of compatible sources and allow users to develop additional sources in their language of choice.
How
DockerExecutor
is almost identical toPathExecutor
but instead of receiving a path to the executable it expects a list of docker commands. These commands are passed down from the factory and include the mounting of the thetemp_dir
used for materializing the JSON config files.Example Usage
Tested via
test_docker_executable.py
which is a replica oftest_source_faker_integration.py
withdocker_executable=True
P.S. This is my first ever open source contribution 😃