Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing oracle integration tests #5996

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

a-dubs
Copy link
Collaborator

@a-dubs a-dubs commented Jan 31, 2025

Proposed Commit Message

I should probably squash these once this is done being reviewed and ready to be merged right?

Additional Context

Test Steps

Tested locally using the following script:


declare -A tests
tests["focal"]="tests/integration_tests/modules/test_frequency_override.py::test_frequency_override \
tests/integration_tests/modules/test_ansible.py::test_ansible_pull_pip \
tests/integration_tests/modules/test_ubuntu_drivers.py::test_ubuntu_drivers_installed \
tests/integration_tests/modules/test_apt_functionality.py::test_install_missing_deps \
tests/integration_tests/modules/test_boothook.py::TestBoothook::test_boothook_header_runs_part_per_instance \
tests/integration_tests/modules/test_boothook.py::TestBoothook::test_boothook_waits_for_network"

tests["jammy"]="tests/integration_tests/modules/test_frequency_override.py::test_frequency_override \
tests/integration_tests/modules/test_ansible.py::test_ansible_pull_pip \
tests/integration_tests/modules/test_ubuntu_drivers.py::test_ubuntu_drivers_installed \
tests/integration_tests/modules/test_apt_functionality.py::test_install_missing_deps \
tests/integration_tests/modules/test_boothook.py::TestBoothook::test_boothook_header_runs_part_per_instance \
tests/integration_tests/modules/test_boothook.py::TestBoothook::test_boothook_waits_for_network"

tests["noble"]="tests/integration_tests/modules/test_ansible.py::test_ansible_pull_pip \
tests/integration_tests/modules/test_lxd.py::test_storage_lvm \
tests/integration_tests/modules/test_boothook.py::TestBoothook::test_boothook_header_runs_part_per_instance \
tests/integration_tests/modules/test_boothook.py::TestBoothook::test_boothook_waits_for_network \
tests/integration_tests/modules/test_ubuntu_drivers.py::test_ubuntu_drivers_installed \
tests/integration_tests/modules/test_ansible.py::test_ansible_pull_distro"

for suite in "${!tests[@]}"; do
    test_paths="${tests[$suite]}"
    log_file="integration-tests-logs/oracle-verification/$suite.log"
    # make sure directory exists
    mkdir -p "$(dirname "$log_file")"
    CLOUD_INIT_PLATFORM="oci" \
    CLOUD_INIT_OS_IMAGE="$suite" \
    CLOUD_INIT_CLOUD_INIT_SOURCE="ppa:cloud-init-dev/daily" \
    tox -e integration-tests-fast -- $test_paths \
    | tee "$log_file"
done

Merge type

  • Squash merge using "Proposed Commit Message"
  • Rebase and merge unique commits. Requires commit messages per-commit each referencing the pull request number (#<PR_NUM>)

This test failed due to thinpool error that also occurs on LXD. This
commit just copies that expected/ignored warning message from LXD to
Oracle in verify_clean_boot.
@a-dubs a-dubs force-pushed the fixing-oracle-integration-tests branch from de4d355 to cce143d Compare January 31, 2025 21:22
@TheRealFalcon TheRealFalcon self-assigned this Feb 3, 2025
Copy link
Member

@TheRealFalcon TheRealFalcon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates here. I left a few comments inline.

@@ -24,6 +27,20 @@

@pytest.mark.user_data(USER_DATA)
class TestBoothook:

@pytest.fixture(autouse=True)
def check_ci_version(self, class_client: IntegrationInstance):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two things here:

  1. We don't really have a need to test older versions of cloud-init. We should generally assume we're always testing the latest version.
  2. If possible, we should use pytest's skip decorator rather than skipping in the body of the test. In this case, we're spending the time launching an instance only to skip it and not really do anything.

If there's a reason a test needs to be different based on the platform we're running, we should skip like this instead.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like the assumption that we are always testing the newest version tbh. Would like to discuss this further face to face. Given that I might want to test how cloud-init behaves on a stable release / publicly available image, I think it would be nice to start adding these gates where applicable moving forward. And the reason the skip must occur inside the test like this is because how else are we supposed to know what version of cloud-init is being used besides for checking on the live instance itself :P

And don't worry, for 99% of use cases this will be a no-op and the test will run anyways (like Jenkins), unless you're a freak like me and like setting CLOUD_INIT_SOURCE="NONE" hehe

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should bring this one up at standup tomorrow

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeahh i forgot to today lol my b

tests/integration_tests/modules/test_ubuntu_drivers.py Outdated Show resolved Hide resolved
tests/integration_tests/util.py Show resolved Hide resolved
@TheRealFalcon
Copy link
Member

Oh also, mypy is angry. That'll need to be fixed too

@a-dubs a-dubs force-pushed the fixing-oracle-integration-tests branch from d68f5d1 to b7652f9 Compare February 5, 2025 22:07
Since the cc_ubuntu_drivers module runs the ubuntu-drivers command with
the "--gpgpu" flag, only a small subset of nvidia packages are installed
which do not include "linux-modules-nvidia" that the test originally
asserted were installed. Now, just look for any nvidia-*-server packages
that are installed.
Updated the test so that if running on Oracle Cloud, it will add a
command to the user-data for the initial instance where gpg gets
uninstalled to disable the oracle-cloud-agent snap to prevent it from
interfering with apt.
@a-dubs a-dubs force-pushed the fixing-oracle-integration-tests branch from b7652f9 to 4ccf16a Compare February 6, 2025 18:43
Copy link
Member

@TheRealFalcon TheRealFalcon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple more comments

instance type is used. The "VM.GPU.A10.1" instance type is used because
it is the most widely available GPU instance type on OCI.

Additionally, in case there is limited availability of GPU instances, this
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with this approach is that it turns ANY failure into an xfail, so then the test turns out to be not very useful.

What if instead it looked something like:

@pytest.mark.skipif(PLATFORM != "oci", reason="Test is OCI specific")
def test_ubuntu_drivers_installed(session_cloud: IntegrationCloud):
    """..."""
    try:
        client = session_cloud.launch(
            launch_kwargs={"instance_type": "VM.GPU.A10.1"},
            user_data=USER_DATA,
        )
    except Exception as e:
        pytest.xfail(f"...{e}...")

    try:
        _do_test(client)
    finally:
        client.destroy()


def _do_test(client: IntegrationInstance):
    log = client.read_from_file("/var/log/cloud-init.log")
    verify_clean_log(log)
    verify_clean_boot(client)
    assert 1 == log.count(
        "Installing and activating NVIDIA drivers "
        "(nvidia/license-accepted=True, version=latest)"
    )
    result = client.execute("dpkg -l | grep nvidia")
    assert result.ok, "No nvidia packages found"
    assert re.search(r"ii\s+nvidia.*-\d+-server", result.stdout), (
        f"Did not find specific nvidia drivers packages in:"
        f" {result.stdout}"
    )

def test_instance_json_ec2(
self,
class_client: IntegrationInstance,
session_cloud: Ec2Cloud,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't love introducing this fixture here just to fix some typing. Can we instead do something along the lines of assert v1_data["region"] == cast(Ec2Instance, client.cloud.cloud_instance).region for the assert?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally feel like that is a quite messy way to do it, and I would rather rely on using the actual underlying session_cloud fixture.

But if you disagree, I am happy to make your change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@holmanb , any strong opinions? Wanna be a tiebreaker?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants