Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[HLD] Event Driven TechSupport Invocation & CoreDump Mgmt #818

Merged
merged 72 commits into from
Nov 11, 2021

Conversation

vivekrnv
Copy link
Contributor

@vivekrnv vivekrnv commented Jul 22, 2021

Signed-off-by: Vivek Reddy Karri vkarri@nvidia.com

The high-level design document for the "Event Driven TechSupport Invocation & CoreDump Mgmt" enhancement

Repo PR title State
sonic-buildimage [Auto Techsupport] Event driven Techsupport Changes GitHub issue/pull request detail
sonic-utilities [Auto Techsupport] Event driven Techsupport Changes GitHub issue/pull request detail

state = enabled|disabled; # Enable this to make the Techsupport Invocation event driven based on core-dump generation
rate_limit_interval = 300; # Minimum Time in seconds, between two successive techsupport invocations.
Manual Invocations will be considered as well in the cooloff calculation
max_techsupport_size = 10; # A perentage value should be specified.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vivekreddynv typo? Did you mean percentage?

This signifies maximum Size to which /var/dump/ directory can be grown until.
The actual value in bytes is calculate based on the available space in the filesystem hosting /var/dump
When the limit is crossed, the older techsupport dumps are incrementally deleted
max_core_size = 5; # A perentage value should be specified.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vivekreddynv typo? Did you mean percentage?


Where `<comm>` value in the command name associated with a process. comm value of a running process can be read from `/proc/[pid]/comm` file

## 4. Schema Additions
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vivekreddynv suggest to follow SONiC DB schema description conventions (ABNF):

  1. https://github.com/Azure/sonic-swss/blob/master/doc/swss-schema.md
  2. https://datatracker.ietf.org/doc/html/rfc5234

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

description "First Revision";
}

typedef enable-knob {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vivekreddynv we already have this type in SONiC:
https://github.com/Azure/sonic-buildimage/blob/master/src/sonic-yang-models/yang-models/sonic-types.yang#L54

    typedef admin_status {
        type enumeration {
            enum up;
            enum down;
        }
    }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel "enabled|disabled" makes better sense here over "up|down".

Copy link
Collaborator

@nazariig nazariig Sep 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vivekreddynv sorry, a typo:
https://github.com/Azure/sonic-buildimage/blob/master/src/sonic-yang-models/yang-models/sonic-types.yang#L119

    typedef admin_mode {
        type enumeration {
            enum enabled;
            enum disabled;
        }
    }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh okay, I'll change this

leaf feature_name {
description "The name of this feature";
/* TODO: Leafref once the FEATURE YANG is added*/
type string;
Copy link
Collaborator

@nazariig nazariig Sep 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vivekreddynv please add a key length constraint

nazariig
nazariig previously approved these changes Sep 2, 2021
@qiluo-msft qiluo-msft requested a review from ganglyu September 21, 2021 08:22
@liat-grozovik
Copy link
Collaborator

@nazariig do you have any further comments? if not can you please approve?

@liat-grozovik liat-grozovik merged commit 23545e2 into sonic-net:master Nov 11, 2021
qiluo-msft pushed a commit to sonic-net/sonic-utilities that referenced this pull request Nov 16, 2021
#### What I did

sonic-utilities changes required for feature "Event Driven TechSupport Invocation & CoreDump Mgmt". [HLD](sonic-net/SONiC#818 )

Summary of the changes:

- Added the AUTO GEN CLI for the CFG DB tables required for this feature
- Added the coredump_gen_handler.py & techsupport_cleanup.py scripts.
- Added the UT's required for these scripts.
- Enhanced coredump-compress & generate-dump scripts
qiluo-msft pushed a commit to sonic-net/sonic-buildimage that referenced this pull request Nov 16, 2021
#### Why I did it

Changes required for feature "Event Driven TechSupport Invocation & CoreDump Mgmt". [HLD](sonic-net/SONiC#818 )

Requires: sonic-net/sonic-utilities#1796.
Merging in any order would be fine.

Summary of the changes:

- Added the YANG Models for the new tables introduces as a part of this feature.
- Enhanced init_cfg.json with the default config required
- Added a compile Time flag which enables/disables the config required for this feature inside the init_cfg.json
- Enhanced the supervisor-proc-exit-listener script to populate `<feature>:<critical_proc> = <comm>:<pid>` info in the STATE_DB when it observes an proc exit notification for the critical processes running inside the docker.
malletvapid23 added a commit to malletvapid23/Sonic-Utility that referenced this pull request Aug 3, 2023
#### What I did

sonic-utilities changes required for feature "Event Driven TechSupport Invocation & CoreDump Mgmt". [HLD](sonic-net/SONiC#818 )

Summary of the changes:

- Added the AUTO GEN CLI for the CFG DB tables required for this feature
- Added the coredump_gen_handler.py & techsupport_cleanup.py scripts.
- Added the UT's required for these scripts.
- Enhanced coredump-compress & generate-dump scripts
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants