Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat: Add an Executor for Docker sources #209

Merged
merged 3 commits into from
May 14, 2024

Conversation

Tarekk
Copy link
Contributor

@Tarekk Tarekk commented Apr 26, 2024

What

Adding a DockerExecutor class to the get_source factory.

Why

I absolutely love this library as it allows me to rapidly pull data from a variety of different sources and load them into a "local DW" in duckDB for quick analysis using SQL. All this without having to deploy or startup any servers!! However, on numerous occasions I needed to pull in data from Java based Airbyte sources such as PostgreSQL and MySQL and couldn't use PyAirbyte to do that. Therefore, it would be great if the library supported the execution of Docker images that adhere to the Airbyte protocol. This will greatly increase the number of compatible sources and allow users to develop additional sources in their language of choice.

How

DockerExecutor is almost identical to PathExecutor but instead of receiving a path to the executable it expects a list of docker commands. These commands are passed down from the factory and include the mounting of the the temp_dir used for materializing the JSON config files.

if docker_executable:
    version = version or "latest"

    temp_dir = tempfile.gettempdir()
    return Source(
        name=name,
        config=config,
        streams=streams,
        executor=DockerExecutor(
            name=name,
            executable=[
                "docker",
                "run",
                "--rm",
                "-i",
                "--volume",
                f"{temp_dir}:{temp_dir}",
                f"airbyte/{name}:{version}",
            ],
        ),
    )

Example Usage

source = ab.get_source(
    "source-faker",
    docker_executable=True,
    config={
        "count": 10,
    },
)

Tested via test_docker_executable.py which is a replica of test_source_faker_integration.py with docker_executable=True

P.S. This is my first ever open source contribution 😃

@aaronsteers aaronsteers self-requested a review May 1, 2024 01:59
Copy link
Contributor

@aaronsteers aaronsteers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Tarekk - Thanks so much for this contribution. We made a deliberate decision to not include Docker support in our initial launch, in order to prioritize the best-possible Python-native experience. That said, this contribution is an exciting one and would be welcomed by many users who do have a Docker runtime and who would like to leverage the broadest possible set of connectors.

In terms of path forward, we will discus internally on how best to deliver/position this, but for now I've left some inline feedback re: a few small comments on the implementation details. Lmk what you think on those.

On the bigger-picture roadmap/delivery topic, I will circle back later this week. I'm optimistic we can leverage this - and will confirm once I've got confirmation either way.

Thanks so much - and congrats on your first (BIG) open source contribution! 🙌

airbyte/sources/util.py Show resolved Hide resolved
airbyte/sources/util.py Show resolved Hide resolved
airbyte/_executor.py Show resolved Hide resolved
@aaronsteers
Copy link
Contributor

aaronsteers commented May 7, 2024

/test-pr

PR test job started... Check job output.

❌ Tests failed.

@aaronsteers
Copy link
Contributor

@Tarekk - Thanks for your patience. We're going to move forward and merge this, but we'll keep this functionality (for a short while) under a new airbyte.experimental module. No need to make any changes here - I'll tackle that work post-merge of this PR.

I did see when invoking tests that the we're seeing failures on the Windows runners - but Ubuntu runners are passing all tests 🎉 . We already skip other docker-dependent user tests on Windows, so I would suggest the following:

  1. We should try to catch docker-executor failures when on Windows is detected, we report these under an Exception class that messages that Windows isn't supported.
    • Technically it is possible with WSL, but by default, Windows docker backends will be Windows-based containers.
  2. We can skip integration tests related to docker executable if we detect we are running on Windows. We have this pattern in place for other tests, so we can replicate it for this case also.

@Tarekk
Copy link
Contributor Author

Tarekk commented May 7, 2024

Thank you @aaronsteers for reviewing this PR and bringing it up for review with the team, it's great to hear that it will be merged in 😃 And thank you fro the valuable feedback, I replied to your comments above and they all make sense to me!

I am not very familiar with the testing infra but sounds like we would exclude running tests on Windows somewhere here?

Copy link
Contributor

@aaronsteers aaronsteers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second round of feedback, just focused on testing this time.

I am not very familiar with the testing infra but sounds like we would exclude running tests on Windows somewhere here?

Yes, that's correct. There are probably many ways to do this (TIMTOWDI), but making distinct fixture names for docker-based source execution, and then riding the same rails to auto-skip those as you call out above is a good path, I think. 👍

tests/integration_tests/test_docker_executable.py Outdated Show resolved Hide resolved
tests/integration_tests/test_docker_executable.py Outdated Show resolved Hide resolved
tests/integration_tests/test_docker_executable.py Outdated Show resolved Hide resolved
@Tarekk
Copy link
Contributor Author

Tarekk commented May 8, 2024

Thank you for the guidance @aaronsteers! Updated to add the docker fixtures to the exclusion list.

@aaronsteers
Copy link
Contributor

aaronsteers commented May 8, 2024

/test-pr

PR test job started... Check job output.

✅ Tests passed.

@aaronsteers
Copy link
Contributor

aaronsteers commented May 8, 2024

@Tarekk - I just ran the test suite and all tests are green ✅ 🎉 🙌

We'll try to merge this in within the next 24-48 hours. As noted above, I'll take a follow-on PR to move this into an experimental module while we iterate and gather feedback from the community, but this won't disrupt your ability to use the functionality in the meanwhile post-release.

CC @bindipankhudi - Do you mind taking a pass in the code review also? Any follow-ups we can optionally take internally before release.

@Tarekk
Copy link
Contributor Author

Tarekk commented May 8, 2024

That's great to hear @aaronsteers 🥳 appreciate your help with getting this landed!!

@bindipankhudi
Copy link
Contributor

@aaronsteers, just took a pass at this PR. Looks great to me! No additional comments from me. The implementation feels very clean and loving the integration tests :). Nicely done @Tarekk

@aaronsteers aaronsteers merged commit ddde34c into airbytehq:main May 14, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants