Regression tests: run with airbyte-ci
#37440
Conversation
@clnoll this makes sense to me and I think this is where I'm missing details to give a review: how do you plan to orchestrate the CI run? I'd like to avoid the proliferation of commands on airbyte-ci and would prefer: …
This will make it easier to eventually integrate the regression tests as part of the automated test suite and avoid introducing a new command and code.
@alafanechere my hesitation with … If we're agreed on that, I'm happy either keeping them in …
I think a reasonable tradeoff could be to introduce a … We could also make the default …
Just closing the loop on the conversation: we've decided to keep regression tests in the main test pipeline. To avoid regression tests getting run unintentionally, we'll always add them to the …
@@ -96,10 +96,9 @@ async def get_connector_container(dagger_client: dagger.Client, image_name_with_
    # If a container_id.txt file is available, we'll use it to load the connector container
    # We use a txt file as container ids can be too long to be passed as env vars
    # It's used for dagger-in-dagger use case with airbyte-ci, when the connector container is built via an upstream dagger operation
    connector_container_id_path = Path("/tmp/container_id.txt")
    if connector_container_id_path.exists():
    # If the CONNECTOR_CONTAINER_ID env var is set, we'll use it to load the connector container
We're not getting it from the env var so I deleted this comment.
self.should_read_with_state = self._get_item_or_default(options, "should-read-with-state", True)

@staticmethod
def _get_item_or_default(options: Dict[str, Any], key: str, default: Any):
@alafanechere is this something we're already doing elsewhere?
I'm mainly requesting changes for the minor suggestions. LGTM logic-wise.
Can you please bump both project versions and changelogs? 🙏
if user_id:
    analytics.identify(user_id)
else:
    user_id = "airbyte-ci"
I would find it cleaner to explicitly pass airbyte-ci as the user id. The user_id could also be fetched from an environment variable, as that would make it possible to track different CI uses of the tool.
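A minimal sketch of that suggestion, assuming the Segment analytics client used in the snippet above and a hypothetical AIRBYTE_CI_USER_ID variable name:

import os

import analytics  # Segment client, as used in the snippet above

# AIRBYTE_CI_USER_ID is a hypothetical variable name: different CI pipelines could set it
# to distinct values, while local or unset environments fall back to "airbyte-ci".
user_id = os.environ.get("AIRBYTE_CI_USER_ID", "airbyte-ci")
analytics.identify(user_id)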
start_timestamp = int(time.time())
start_timestamp = int(config.getoption("--start-timestamp") or time.time())
I'm not sure why we want to override the start_timestamp? Is it to get a predictable session path?
I would suggest creating a function which would produce the test_artifacts_directory and use the github run id if we're in CI. Something like:

def get_artifacts_directory(start_timestamp):
    suffix = os.environ["GITHUB_RUN_ID"] if "GITHUB_RUN_ID" in os.environ else str(start_timestamp)
    return MAIN_OUTPUT_DIRECTORY / f"session_{suffix}"
Is it to get a predictable session path?
Yes, that's right.
Using GITHUB_RUN_ID will work when this is run in GHA, but when we run it locally with airbyte-ci we don't have that. I'm not seeing anything else in the environment that makes sense to use. I agree it feels a little hacky to pass in the start time, but we do need something that's known by both the external dagger process and the conftest.
To address this I made a change to pass --run-id to pytest instead of --start-timestamp. Then I let the dagger process look for GITHUB_RUN_ID and fall back to a start timestamp if it's not present. Does this look better to you?
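A rough sketch of that fallback on the dagger side (illustrative names, not the exact implementation):

import os
import time

def get_run_id() -> str:
    # Prefer the GitHub Actions run id when available; otherwise fall back to a
    # start timestamp so local airbyte-ci runs still get a unique, known id.
    return os.environ.get("GITHUB_RUN_ID", str(int(time.time())))

# The dagger process would then invoke pytest with something like:
# ["poetry", "run", "pytest", "--run-id", get_run_id(), ...]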
airbyte-ci/connectors/live-tests/src/live_tests/regression_tests/conftest.py
I'm not sure why this file changed
airbyte-ci/connectors/pipelines/pipelines/airbyte_ci/connectors/commands.py
self.should_read_with_state = self._get_item_or_default(options, "should-read-with-state", True)

@staticmethod
def _get_item_or_default(options: Dict[str, Any], key: str, default: Any):
I don't think so, but it can go to the helpers / utils. I'd be more inclined to use dpath instead of writing this kind of function.
Updated to use dpath and made it a staticmethod on RunStepOptions.
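For reference, a sketch of what the dpath-based lookup could look like (assuming dpath 2.x, where the key is treated as a path glob; the method and names here are illustrative, not the merged code):

from typing import Any, Dict

import dpath

def read_with_state(options: Dict[str, Any]) -> bool:
    # dpath.get raises KeyError when the path is missing, so fall back to the default.
    try:
        return dpath.get(options, "should-read-with-state")
    except KeyError:
        return True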
""" | ||
super().__init__(context) | ||
self.connector_image = context.docker_image.split(":")[0] | ||
print(self.context.run_step_options.step_params) |
Can we avoid printing here in case there's a secret?
Yep, deleted.
await container.directory(regression_tests_artifacts_dir).export(regression_tests_artifacts_dir)
path_to_report = f"{regression_tests_artifacts_dir}/session_{int(start_timestamp)}/report.html"
Can we just export the report in CI to minimize sensitive data exposure?
Sure, done.
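A sketch of what exporting just the report could look like with the dagger Python SDK, reusing the names from the snippet above (the exact wiring in the pipeline may differ):

import dagger

async def export_report_only(container: dagger.Container, artifacts_dir: str, start_timestamp: int) -> None:
    # Only pull the HTML report out of the test container in CI; the rest of the
    # artifacts directory (which may contain sensitive records) stays inside.
    path_to_report = f"{artifacts_dir}/session_{int(start_timestamp)}/report.html"
    await container.file(path_to_report).export(path_to_report)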
.with_mounted_file("/root/.ssh/id_rsa", self.dagger_client.host().file(str(Path("~/.ssh/id_rsa").expanduser())))  # TODO
.with_mounted_file(
    "/root/.ssh/known_hosts", self.dagger_client.host().file(str(Path("~/.ssh/known_hosts").expanduser()))  # TODO
)
not 💯 sure this will work in GHA runners
It doesn't 😄 . But I'll make the needed update in the gha PR instead of here.
container = (
    (
        container.with_exec(["apt-get", "update"])
        .with_exec(["apt-get", "install", "-y", "git", "openssh-client", "curl", "docker.io"])
It's a bit annoying that we have to install git and openssh-client just because we have a private dependency to install...
I agree. I'm trying to make this go away in the gha follow up PR.
But I did not mean I have a better suggestion. 😄
Thanks for your replies and changes 🙏 Please bump the versions and add changelog entries before merging.
# If a package from `airbyte-platform-internal` is required, modify the entry in pyproject.toml to use https instead of ssh, when run in Github Actions
if is_ci:
    container = container.with_exec(
        ["sed", "-i", "-E", r"s,git@github\.com:airbytehq/airbyte-platform-internal,https://github.com/airbytehq/airbyte-platform-internal.git,", "pyproject.toml"]
Can we make the URL https:// in the checked-in pyproject.toml?
If we did that, poetry install wouldn't work for regression tests locally (whether running with pytest or airbyte-ci) unless the user has a PAT in their environment.
).with_exec(
    ["poetry", "source", "add", "--priority=supplemental", "airbyte-platform-internal-source",
     "https://github.com/airbytehq/airbyte-platform-internal.git"]
).with_exec(
    [
        "poetry", "config", "http-basic.airbyte-platform-internal-source", "octavia-squidington-iii",
        pipeline_context_params["ci_github_access_token"],
    ]
)
Can you explain in a comment what this is for?
👍 done.
@@ -190,6 +191,20 @@ def prepare_container_for_poe_tasks(
    # Set working directory to the poetry package directory
    container = container.with_workdir(f"/airbyte/{poetry_package_path}")

    # If a package from `airbyte-platform-internal` is required, modify the entry in pyproject.toml to use https instead of ssh, when run in Github Actions
A simpler but hackier option would be to download the private package from the private repo outside of airbyte-ci in the GHA workflow, and edit the pyproject.toml to point to a local path instead of using a git URL.
The best approach would be to host the private packages in a private PyPI repo.
I'll look into moving it into the GHA when I'm wrapping up the GHA-specific PR.
Agree re hosting the private package, but it didn't seem worth it for just this one thing.
/format-fix
Suspect Issues: This pull request was deployed and Sentry observed the following issues:
[tool.airbyte_ci]
optional_poetry_groups = ["dev"]
poe_tasks = []
required_environment_variables = ["DOCKER_HUB_USERNAME", "DOCKER_HUB_PASSWORD"]
Can you tell me more about how we are using those credentials in live-tests? Are we using any private images today, and/or publishing? Are we logging in to dockerhub in the flow, and do we have to do it? /cc @alafanechere
@@ -1,4 +1,5 @@
# Copyright (c) 2023 Airbyte, Inc., all rights reserved.
from __future__ import annotations
What version of Python are we running this on, mainly? Seemingly that is obsolete in 3.10, right?
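For context, the import turns on postponed evaluation of annotations (PEP 563), which as of current CPython releases is still opt-in; a minimal illustration of what it enables, independent of whether this particular file needs it:

from __future__ import annotations

class Node:
    # Without the future import, evaluating this annotation at definition time would
    # raise NameError, because Node is not yet bound while the class body executes.
    def children(self) -> list[Node]:
        return []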
Adds a new command and pipeline for running regression tests in airbyte-ci. This requires us to make a couple of changes to the live tests to bypass prompts. For this reason, test artifacts are cleaned up immediately when the test run is complete. If artifacts are needed, the user should run this with pytest (as documented here). (When run in GitHub Actions, artifacts will be uploaded to GCS.)
TODO:
@alafanechere responding to a couple of your questions/comments:
PytestStep mostly concerns itself with running tests that exist in a connector container. We'd have to refactor/overwrite a lot of the existing functionality to make it reusable here.
To me it feels premature to have regression tests be part of the test suite, so IMO we should hold off until we've decided on a cadence to run them on and can verify that we're getting meaningful, non-noisy results. I'm not personally totally sure we'll ever want to run them as part of "normal" tests, so I would rather give them their own command than pollute test with conditional logic.