-
Notifications
You must be signed in to change notification settings - Fork 909
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixing oracle integration tests #5996
base: main
Are you sure you want to change the base?
Conversation
This test failed due to thinpool error that also occurs on LXD. This commit just copies that expected/ignored warning message from LXD to Oracle in verify_clean_boot.
de4d355
to
cce143d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates here. I left a few comments inline.
@@ -24,6 +27,20 @@ | |||
|
|||
@pytest.mark.user_data(USER_DATA) | |||
class TestBoothook: | |||
|
|||
@pytest.fixture(autouse=True) | |||
def check_ci_version(self, class_client: IntegrationInstance): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two things here:
- We don't really have a need to test older versions of cloud-init. We should generally assume we're always testing the latest version.
- If possible, we should use pytest's skip decorator rather than skipping in the body of the test. In this case, we're spending the time launching an instance only to skip it and not really do anything.
If there's a reason a test needs to be different based on the platform we're running, we should skip like this instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like the assumption that we are always testing the newest version tbh. Would like to discuss this further face to face. Given that I might want to test how cloud-init behaves on a stable release / publicly available image, I think it would be nice to start adding these gates where applicable moving forward. And the reason the skip must occur inside the test like this is because how else are we supposed to know what version of cloud-init is being used besides for checking on the live instance itself :P
And don't worry, for 99% of use cases this will be a no-op and the test will run anyways (like Jenkins), unless you're a freak like me and like setting CLOUD_INIT_SOURCE="NONE"
hehe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should bring this one up at standup tomorrow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeahh i forgot to today lol my b
Oh also, mypy is angry. That'll need to be fixed too |
d68f5d1
to
b7652f9
Compare
Since the cc_ubuntu_drivers module runs the ubuntu-drivers command with the "--gpgpu" flag, only a small subset of nvidia packages are installed which do not include "linux-modules-nvidia" that the test originally asserted were installed. Now, just look for any nvidia-*-server packages that are installed.
Updated the test so that if running on Oracle Cloud, it will add a command to the user-data for the initial instance where gpg gets uninstalled to disable the oracle-cloud-agent snap to prevent it from interfering with apt.
b7652f9
to
4ccf16a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a couple more comments
instance type is used. The "VM.GPU.A10.1" instance type is used because | ||
it is the most widely available GPU instance type on OCI. | ||
|
||
Additionally, in case there is limited availability of GPU instances, this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem with this approach is that it turns ANY failure into an xfail, so then the test turns out to be not very useful.
What if instead it looked something like:
@pytest.mark.skipif(PLATFORM != "oci", reason="Test is OCI specific")
def test_ubuntu_drivers_installed(session_cloud: IntegrationCloud):
"""..."""
try:
client = session_cloud.launch(
launch_kwargs={"instance_type": "VM.GPU.A10.1"},
user_data=USER_DATA,
)
except Exception as e:
pytest.xfail(f"...{e}...")
try:
_do_test(client)
finally:
client.destroy()
def _do_test(client: IntegrationInstance):
log = client.read_from_file("/var/log/cloud-init.log")
verify_clean_log(log)
verify_clean_boot(client)
assert 1 == log.count(
"Installing and activating NVIDIA drivers "
"(nvidia/license-accepted=True, version=latest)"
)
result = client.execute("dpkg -l | grep nvidia")
assert result.ok, "No nvidia packages found"
assert re.search(r"ii\s+nvidia.*-\d+-server", result.stdout), (
f"Did not find specific nvidia drivers packages in:"
f" {result.stdout}"
)
def test_instance_json_ec2( | ||
self, | ||
class_client: IntegrationInstance, | ||
session_cloud: Ec2Cloud, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't love introducing this fixture here just to fix some typing. Can we instead do something along the lines of assert v1_data["region"] == cast(Ec2Instance, client.cloud.cloud_instance).region
for the assert?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally feel like that is a quite messy way to do it, and I would rather rely on using the actual underlying session_cloud fixture.
But if you disagree, I am happy to make your change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@holmanb , any strong opinions? Wanna be a tiebreaker?
Proposed Commit Message
I should probably squash these once this is done being reviewed and ready to be merged right?
Additional Context
Test Steps
Tested locally using the following script:
Merge type