Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RHOAIENG-18979: chore(test/containers): check workbench startup logs #886

Merged

Conversation

jstourac
Copy link
Member

@jstourac jstourac commented Feb 5, 2025

This adds a new check for the content of the workbench startup logs searching for some keywords that may indicate some problem or issue.

https://issues.redhat.com/browse/RHOAIENG-18979

Description

How Has This Been Tested?

DOCKER_HOST=unix:///run/user/$UID/podman/podman.sock poetry run pytest tests/containers --image quay.io/modh/odh-generic-data-science-notebook@sha256:0efbb3ad6f8f342360cf1f002d40716a39d4c58f69163e053d5bd19b4fe732d4 -k test_image_entrypoint_starts

DOCKER_HOST=unix:///run/user/$UID/podman/podman.sock poetry run pytest tests/containers --image quay.io/opendatahub/workbench-images@sha256:27d87f50f7aad6185344368b1b00c446773cff12bbea2f87e8d0c4ece2b13d87 -k test_image_entrypoint_starts

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work

Copy link
Contributor

openshift-ci bot commented Feb 5, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci bot added size/m and removed size/m labels Feb 5, 2025
for keyword in blocked_keywords:
if keyword in line:
if not any(allowed in line for allowed in allowed_messages):
logging.debug(f"Unexpected keyword '{keyword}' in the following message: '{line}'")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logging.info? or maybe logging.warning?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are right, I updated it to error actually


# let's check that logs don't contain any error or unexpected warning message
with subtests.test("Checking the log in the workbench..."):
failed_lines = 0
Copy link
Member

@jiridanek jiridanek Feb 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have three ideas to avoid counting lines ourselves

don't really have a preference, this counter is not exactly a bad solution either

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I kept this as one test since we probably don't want different number of subtests depending on how many log messages are there 🤔 I chose to alternate this part of code based on your other suggestion.

Comment on lines 68 to 77
for keyword in blocked_keywords:
if keyword in line:
if not any(allowed in line for allowed in allowed_messages):
logging.debug(f"Unexpected keyword '{keyword}' in the following message: '{line}'")
failed_lines += 1
# Let's go on another line now as we don't care if multiple blocked keywords are present in this line
break
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just this small idea that saves one level of indentation and comment is no longer necessary imo

Suggested change
for keyword in blocked_keywords:
if keyword in line:
if not any(allowed in line for allowed in allowed_messages):
logging.debug(f"Unexpected keyword '{keyword}' in the following message: '{line}'")
failed_lines += 1
# Let's go on another line now as we don't care if multiple blocked keywords are present in this line
break
if any(keyword in line for keyword in blocked_keywords):
if not any(allowed in line for allowed in allowed_messages):
logging.debug(f"Unexpected keyword '{keyword}' in the following message: '{line}'")
failed_lines += 1

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has a disadvantage as I am not able to print the keyword that caused this to fail... but yeah, I'll probably amend that this way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has a disadvantage as I am not able to print the keyword that caused this to fail...

you could run this twice maybe, when you have wrong line, do this because it's quick and easy

line = "error info warning"
blocked_keywords = ["error", "warning"]
print("failures:", {keyword: keyword in line for keyword in blocked_keywords})
>>> failures:  {'error': True, 'warning': True}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that would be too verbose... let's not complicate it for now, we can always improve

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Checking the log in the workbench...] SUBFAIL tests/containers/workbenches/workbench_image_test.py::TestWorkbenchImage::test_image_entrypoint_starts[ghcr.io/opendatahub-io/notebooks/workbench-images:jupyter-trustyai-ubi9-python-3.11-889_merge_ad1bfa2899f33cd9962985d35165c6b6b3b5caf3-sysctls1] - Failed: Log message(s) (3) that violate our checks occurred during the workbench startup:
[W 2025-02-06 08:26:45.534 LabApp] Failed to instantiate the extension manager pypi. Falling back to read-only manager.
    Traceback (most recent call last):
    TypeError: AsyncClient.__init__() got an unexpected keyword argument 'proxies'

This is good output, I like it. From https://github.com/opendatahub-io/notebooks/actions/runs/13174582770/job/36771027197?pr=889#step:25:151

skip_if_not_workbench_image(image)

# Here is a list of blocked keywords we don't want to see in the log messages during the container/workbench
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe put all this into a helper function (logs and subtests object can be a parameter to it) and also call it from the ipv6 only test below?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that is a good point, didn't think about that - updated per your suggestion now


# let's check that logs don't contain any error or unexpected warning message
with subtests.test("Checking the log in the workbench..."):
failed_lines = 0
Copy link
Member

@jiridanek jiridanek Feb 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on second thought, maybe I'd make this

Suggested change
failed_lines = 0
failed_lines: list[str] = []

and .append to it the failed lines

why should people hunt for them in the log when they can be kept here and produced in the final failure message

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup, I like it - updated

@jstourac jstourac force-pushed the checkWorkbenchStartupLogs branch from 96ccd47 to bba8309 Compare February 5, 2025 13:15
@openshift-ci openshift-ci bot added size/m and removed size/m labels Feb 5, 2025
finally:
# try to grab logs regardless of whether container started or not
# TODO - do we want to enforce couple of seconds wait to be sure nothing extra suspicious is logged for the workbench???
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

either that, or possibly we may even want to try to use jupyterlab a bit

every playwright test that we're going to eventually create could be checking logs like this, even, maybe

future work

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor nit - but i thought we were told we needed to write cypress tests (?)

Copy link
Member

@jiridanek jiridanek Feb 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jupyterlab and codeserver upstreams use playwright

elyra and odh dashboard (and notebooks 2.00 use cypress

dunno about rstudio

most importantly we need to use what's already present

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, so I put a 3 second wait there for now... we may alternate this in the future or remove eventually

@jstourac jstourac force-pushed the checkWorkbenchStartupLogs branch from bba8309 to cff8e0d Compare February 6, 2025 06:33
@openshift-ci openshift-ci bot added size/m and removed size/m labels Feb 6, 2025
@jstourac jstourac force-pushed the checkWorkbenchStartupLogs branch from cff8e0d to bb25555 Compare February 6, 2025 06:40
@openshift-ci openshift-ci bot added size/m and removed size/m labels Feb 6, 2025
@jstourac jstourac force-pushed the checkWorkbenchStartupLogs branch from bb25555 to 8bf8ab0 Compare February 6, 2025 07:03
This adds a new check for the content of the workbench startup logs
searching for some keywords that may indicate some problem or issue.
@openshift-ci openshift-ci bot added size/m and removed size/m labels Feb 6, 2025
@jstourac jstourac force-pushed the checkWorkbenchStartupLogs branch from 8bf8ab0 to 466b468 Compare February 6, 2025 07:03
@openshift-ci openshift-ci bot added size/m and removed size/m labels Feb 6, 2025
@jstourac jstourac marked this pull request as ready for review February 6, 2025 07:05
@openshift-ci openshift-ci bot requested review from atheo89 and harshad16 February 6, 2025 07:05
@jstourac
Copy link
Member Author

jstourac commented Feb 6, 2025

Thank you for your preliminary review, guys. I addressed all comments hopefully and marked this as ready for review now.

Also, I'll wait with merging this until the https://issues.redhat.com/browse/RHOAIENG-18930 is addressed also, otherwise we would have failing tests in our CI in the meantime.

@jiridanek
Copy link
Member

/lgtm

@jstourac
Copy link
Member Author

jstourac commented Feb 6, 2025

Okay, so the fix for the Jupyter is prepared here #889. I would go with the following steps:

  1. merge this thing
  2. then rebase the RHOAIENG-18930: update Jupyterlab package to 4.2.7 #889 and follow there

@atheo89
Copy link
Member

atheo89 commented Feb 7, 2025

/lgtm

@jstourac
Copy link
Member Author

jstourac commented Feb 7, 2025

Thank you for your reviews and comments, guys!

/approve

Copy link
Contributor

openshift-ci bot commented Feb 7, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jstourac

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved label Feb 7, 2025
@jstourac
Copy link
Member Author

jstourac commented Feb 7, 2025

/override ci/prow/images
/override "build (rocm-jupyter-tensorflow-ubi9-python-3.11) / build (ubuntu-22.04)"
/override "build (rocm-jupyter-pytorch-ubi9-python-3.11) / build (ubuntu-22.04)"
/override "build (jupyter-pytorch-ubi9-python-3.11) / build (ubuntu-22.04)"
/override "build (jupyter-trustyai-ubi9-python-3.11) / build (ubuntu-22.04)"
/override "build (cuda-jupyter-tensorflow-ubi9-python-3.11) / build (ubuntu-22.04)"

Copy link
Contributor

openshift-ci bot commented Feb 7, 2025

@jstourac: Overrode contexts on behalf of jstourac: build (cuda-jupyter-tensorflow-ubi9-python-3.11) / build (ubuntu-22.04), build (jupyter-pytorch-ubi9-python-3.11) / build (ubuntu-22.04), build (jupyter-trustyai-ubi9-python-3.11) / build (ubuntu-22.04), build (rocm-jupyter-pytorch-ubi9-python-3.11) / build (ubuntu-22.04), build (rocm-jupyter-tensorflow-ubi9-python-3.11) / build (ubuntu-22.04), ci/prow/images

In response to this:

/override ci/prow/images
/override "build (rocm-jupyter-tensorflow-ubi9-python-3.11) / build (ubuntu-22.04)"
/override "build (rocm-jupyter-pytorch-ubi9-python-3.11) / build (ubuntu-22.04)"
/override "build (jupyter-pytorch-ubi9-python-3.11) / build (ubuntu-22.04)"
/override "build (jupyter-trustyai-ubi9-python-3.11) / build (ubuntu-22.04)"
/override "build (cuda-jupyter-tensorflow-ubi9-python-3.11) / build (ubuntu-22.04)"

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-merge-bot openshift-merge-bot bot merged commit 8f2349b into opendatahub-io:main Feb 7, 2025
18 of 23 checks passed
@jstourac jstourac deleted the checkWorkbenchStartupLogs branch February 7, 2025 14:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants