Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat: Rework k8s_tshark functionality and apply the changes in tests #2097

Merged
merged 1 commit into from
Jul 9, 2024

Conversation

svteb
Copy link
Collaborator

@svteb svteb commented Jul 1, 2024

Description

Commit message:

  • Prior functionality was bound to fixed time of execution (120s), which introduced problems in testing (tshark session ending before the test began).
  • New functionality mainly implements infinite tshark execution along with the possibility of terminating it when deemed appropriate. This is complemented with robust error handling and termination of the tshark process on unexpected crashes during initialization. NOTE: The main tests currently do not handle states where a crash could occur elsewhere and thus a hanging tshark session can still happen (although unlikely).
  • The module is properly commented which should allow the user to get a quick understanding of its functionality.
  • The user functionality remains the same with easier-to-comprehend function names.
  • Handling of PIDs is rather problematic due to the nature of exec_by_node_bg function, which does not return the PID of the tshark process but rather the PID of the shell executing it (unverified). This is why the retrieval of PID may seem rather complicated (especially the pid_command variable). Possible solutions are listed in a comment, but these don't quite work for various reasons (globbing issues, return of incorrect PID, etc.). As for the kill -15 and kill -9 repetition, some tshark session would get stuck in a zombie state if the commands were not executed in this order.

Additional info for reviewers:

The spec tests that verify the correctness of this change are: 5g, oran, core. For some reason I cannot deploy the oran CNF properly so the testing will probably have to be left for github actions. Some of the changes that were made removed the necessity of nesting in the main workload functions (which was thus removed). The class oriented design was somewhat necessary to allow for saving the state of the capture and its future termination. This should also resolve #2072 .

As for the code, a non-trivial amount of time was spent fine-tuning the behavior around PIDs and this is the only configuration I got to behave correctly, although I admit some parts don't look all that pretty. The main problems stem from kubectl_client which is eventually called to spawn the process, since the Process.new/run functions do not return the correct PID/print output I could not really achieve this any other way.

I also discovered that the smf_upf_heartbeat task essentially tests nothing.

Issues:

Refs: #2072 #2087

How has this been tested:

  • Covered by existing integration testing
  • Added integration testing to cover
  • Verified all A/C passes
    • develop
    • master
    • tag/other branch
  • Test environment
    • Shared Packet K8s cluster
    • New Packet K8s cluster
    • Kind cluster
  • Have not tested

Types of changes:

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update

Checklist:

Documentation

  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • No updates required.

Code Review

  • Does the test handle fatal exceptions, ie. rescue block

Issue

  • Tasks in issue are checked off

Copy link
Collaborator

@kosstennbl kosstennbl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As was discussed online, changes for timeout design and possible nil value of "capture" variable

src/tasks/utils/cnf_manager.cr Outdated Show resolved Hide resolved
src/tasks/utils/oran_monitor.cr Outdated Show resolved Hide resolved
src/tasks/utils/k8s_tshark.cr Outdated Show resolved Hide resolved
src/tasks/workload/5g_validator.cr Outdated Show resolved Hide resolved
@svteb svteb force-pushed the k8stshark_refactor branch from 4b0b66b to 567bf60 Compare July 4, 2024 08:34
@martin-mat martin-mat requested a review from kosstennbl July 4, 2024 10:52
@svteb svteb force-pushed the k8stshark_refactor branch from 567bf60 to 1588ea1 Compare July 9, 2024 08:17
Refs: cnti-testcatalog#2072 cnti-testcatalog#2087
- Prior functionality was bound to fixed time of execution (120s), which introduced problems
in testing (tshark session ending before the test began).
- New functionality mainly implements infinite tshark execution along with the possibility
of terminating it when deemed appropriate. This is complemented with robust error handling
and termination of the tshark process on unexpected crashes during initialization.
NOTE: The main tests currently do not handle states where a crash could occur elsewhere and
thus a hanging tshark session can still happen (although unlikely).
- The module is properly commented which should allow the user to get a quick understanding
of its functionality.
- The user functionality remains the same with easier-to-comprehend function names.
- Handling of PIDs is rather problematic due to the nature of exec_by_node_bg function, which
does not return the PID of the tshark process but rather the PID of the shell executing it
(unverified). This is why the retrieval of PID may seem rather complicated (especially the
pid_command variable). Possible solutions are listed in a comment, but these don't quite
work for various reasons (globbing issues, return of incorrect PID, etc.).
As for the kill -15 and kill -9 repetition, some tshark session would
get stuck in a zombie state if the commands were not executed in this order.

Signed-off-by: svteb <slavo.valko@tietoevry.com>
@svteb svteb force-pushed the k8stshark_refactor branch from 1588ea1 to 237fc9b Compare July 9, 2024 08:34
Copy link
Collaborator

@kosstennbl kosstennbl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
Usage in CNF-Manager is different from usage in tests because of the flawed structure of sample_setup method. For the future - i think we should consider removing this O-RAN E2 connection check from CNF installation, as it seems to be too specific to be in core CNF-Testsuite functionality

@martin-mat martin-mat self-requested a review July 9, 2024 14:01
Copy link
Collaborator

@martin-mat martin-mat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Copy link
Collaborator

@Smitholi67 Smitholi67 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@martin-mat martin-mat merged commit 2a133f8 into cnti-testcatalog:main Jul 9, 2024
87 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Unstable 5G tests(suci_enabled) in github actions
5 participants