diff --git a/doc/config-generic-update-rollback/SONiC_Generic_Config_Update_and_Rollback_Design.md b/doc/config-generic-update-rollback/SONiC_Generic_Config_Update_and_Rollback_Design.md
index ac26a1f5c6..246b204d6b 100644
--- a/doc/config-generic-update-rollback/SONiC_Generic_Config_Update_and_Rollback_Design.md
+++ b/doc/config-generic-update-rollback/SONiC_Generic_Config_Update_and_Rollback_Design.md
@@ -908,19 +908,19 @@ N/A
 | 14 | Dynamic port breakout as described [here](https://github.com/Azure/SONiC/blob/master/doc/dynamic-port-breakout/sonic-dynamic-port-breakout-HLD.md).|
 | 15 | Remove an item that has a default value. |
 | 16 | Modifying items that rely depends on each other based on a `must` condition rather than direct connection such as `leafref` e.g. /CRM/acl_counter_high_threshold (check [here](https://github.com/Azure/sonic-buildimage/blob/master/src/sonic-yang-models/yang-models/sonic-crm.yang)). |
-| 17 | Updating Syslog configs. |
-| 18 | Updating AAA configs. |
-| 19 | Updating DHCP configs. |
+| 17 | [Updating Syslog configs.](https://github.com/Azure/sonic-mgmt/blob/master/tests/generic_config_updater/test_syslog.py) |
+| 18 | [Updating AAA configs.](https://github.com/Azure/sonic-mgmt/blob/master/tests/generic_config_updater/test_aaa.py) |
+| 19 | [Updating DHCP configs.](https://github.com/Azure/sonic-mgmt/blob/master/tests/generic_config_updater/test_dhcp_relay.py) |
 | 20 | Updating IPv6 configs. |
 | 21 | Updating monitor configs (EverflowAlaysOn). |
 | 22 | Updating BGP speaker configs. |
-| 23 | Updating BGP listener configs. |
-| 24 | Updating Bounce Back Routing configs. |
-| 25 | Updating control-plane ACLs (NTP, SNMP, SSH) configs. |
+| 23 | [Updating BGP listener configs.](https://github.com/Azure/sonic-mgmt/blob/master/tests/generic_config_updater/test_bgpl.py) |
+| 24 | ~~Updating Bounce Back Routing configs.~~ |
+| 25 | [Updating control-plane ACLs (NTP, SNMP, SSH) configs.](https://github.com/Azure/sonic-mgmt/blob/master/tests/generic_config_updater/test_cacl.py) |
 | 26 | Updating Ethernet interfaces configs. |
-| 27 | Updating VLAN interfaces configs. |
-| 28 | Updating port-channel interfaces configs. |
-| 29 | Updating loopback interfaces configs. |
+| 27 | [Updating VLAN interfaces configs.](https://github.com/Azure/sonic-mgmt/blob/master/tests/generic_config_updater/test_vlan_interface.py) |
+| 28 | [Updating port-channel interfaces configs.](https://github.com/Azure/sonic-mgmt/blob/master/tests/generic_config_updater/test_portchannel_interface.py) |
+| 29 | [Updating loopback interfaces configs.](https://github.com/Azure/sonic-mgmt/blob/master/tests/generic_config_updater/test_lo_interface.py) |
 | 30 | Updating BGP prefix hijack configs. |
 | 31 | Updating QoS headroom pool and buffer pool size. |
 | 32 | Add/Remove Rack. |
diff --git a/doc/crm/Critical-Resource-Monitoring-High-Level-Design.md b/doc/crm/Critical-Resource-Monitoring-High-Level-Design.md
index 90e4c12048..8aca334897 100644
--- a/doc/crm/Critical-Resource-Monitoring-High-Level-Design.md
+++ b/doc/crm/Critical-Resource-Monitoring-High-Level-Design.md
@@ -104,7 +104,7 @@ Monitoring process should periodically poll SAI counters for all required resour
 ```" WARNING : THRESHOLD_EXCEEDED for <%> Used count free count "```
-```" NOTICE : THRESHOLD_CLEAR for <%> Used count free count "```
+```" WARNING : THRESHOLD_CLEAR for <%> Used count free count "```
 ``` = ```
diff --git a/doc/event-alarm-framework/event-alarm-framework-alarm-lifecycle.png b/doc/event-alarm-framework/event-alarm-framework-alarm-lifecycle.png
new file mode 100644
index 0000000000..9773603e4b
Binary files /dev/null and b/doc/event-alarm-framework/event-alarm-framework-alarm-lifecycle.png differ
diff --git a/doc/event-alarm-framework/event-alarm-framework-blockdiag.png b/doc/event-alarm-framework/event-alarm-framework-blockdiag.png
new file mode 100644
index 0000000000..d735f9b1b3
Binary files /dev/null and b/doc/event-alarm-framework/event-alarm-framework-blockdiag.png differ
diff --git a/doc/event-alarm-framework/event-alarm-framework-seqdiag.png b/doc/event-alarm-framework/event-alarm-framework-seqdiag.png
new file mode 100644
index 0000000000..5473b2c63a
Binary files /dev/null and b/doc/event-alarm-framework/event-alarm-framework-seqdiag.png differ
diff --git a/doc/event-alarm-framework/event-alarm-framework.md b/doc/event-alarm-framework/event-alarm-framework.md
new file mode 100644
index 0000000000..eb475cb6fd
--- /dev/null
+++ b/doc/event-alarm-framework/event-alarm-framework.md
@@ -0,0 +1,1300 @@
+# Feature Name
+Event and Alarm Framework
+# High Level Design Document
+#### Rev 0.3
+
+# Table of Contents
+  * [Revision](#revision)
+  * [About This Manual](#about-this-manual)
+  * [Scope](#scope)
+  * [1 Feature Overview](#1-feature-overview)
+    * [1.1 Requirements](#11-requirements)
+      * [1.1.1 Functional Requirements](#111-functional-requirements)
+    * [1.2 Design Overview](#12-design-overview)
+      * [1.2.1 Basic Approach](#121-basic-approach)
+      * [1.2.2 Container](#122-container)
+  * [2 Functionality](#2-functionality)
+    * [2.1 Target Deployment Use Cases](#21-target-deployment-use-cases)
+    * [2.2 Functional Description](#22-functional-description)
+  * [3 Design](#3-design)
+    * [3.1 Overview](#31-overview)
+      * [3.1.1 Event Producers](#311-event-producers)
+        * [3.1.1.2 Development Process](#3112-development-process)
+      * [3.1.2 Event Consumer](#312-event-consumer)
+        * [3.1.2.1 Severity](#3121-severity)
+        * [3.1.2.2 Sequence-ID](#3122-sequence-id)
+        * [3.1.2.3 Revision](#3123-revision)
+      * [3.1.3 Alarm Consumer](#313-alarm-consumer)
+      * [3.1.4 Event Receivers](#314-event-receivers)
+        * [3.1.4.1 syslog](#3141-syslog)
+        * [3.1.4.2 REST](#3142-rest)
+        * [3.1.4.3 gNMI](#3143-gnmi)
+        * [3.1.4.4 System LED](#3144-system-led)
+        * [3.1.4.5 Event/Alarm flooding](#3145-eventalarm-flooding)
+        * [3.1.4.6 Eventd continuous restart](#3146-eventd-continuous-restart)
+      * [3.1.5 Event Profile](#315-event-profile)
+      * [3.1.6 CLI](#316-cli)
+      * [3.1.7 Event Table and Alarm Table](#317-event-table-and-alarm-table)
+      * [3.1.8 Pull Model](#318-pull-model)
+      * [3.1.9 Supporting third party containers](#319-supporting-third-party-containers)
+    * [3.2 DB Changes](#32-db-changes)
+      * [3.2.1 EVENT DB](#321-event-db)
+    * [3.3 User Interface](#33-user-interface)
+      * [3.3.1 Data Models](#331-data-models)
+      * [3.3.2 CLI](#332-cli)
+        * [3.3.2.1 Exec Commands](#3321-exec-commands)
+        * [3.3.2.2 Configuration Commands](#3322-configuration-commands)
+        * [3.3.2.3 Show Commands](#3323-show-commands)
+      * [3.3.3 REST API Support](#333-rest-api-support)
+  * [4 Flow Diagrams](#4-flow-diagrams)
+  * [5 Warm Boot Support](#5-warm-boot-support)
+    * [5.1 Application warm boot](#51-application-warm-boot)
+    * [5.2 eventd warm boot](#52-eventd-warm-boot)
+  * [6 Scalability](#6-scalability)
+  * [7 Showtech Support](#7-showtech-support)
+  * [8 Unit Test](#8-unit-test)
+
+
+# Revision
+| Rev | Date | Author | Change Description |
+|:---:|:-----------:|:------------------:|----------------------------------- |
+| 0.1 | 03/20/2021 | Srinadh Penugonda | Initial Version |
+| 0.2 | 04/30/2021 | Srinadh Penugonda | Updated with comments from HLD review |
+| 0.3 | 04/18/2022 | Bhavesh | Address review comments |
+
+# About this Manual
+This document provides general information on the implementation and functionality of the Event and Alarm Framework in SONiC.
+
+Note: Wherever a CLI is mentioned, it refers to the CLISH CLI. The SONiC native (CLICK) CLI is not updated for this feature.
+
+# Scope
+This document describes the high-level design of the Event and Alarm Framework.
+Updating ANY of the applications to raise events and alarms is outside the scope of the framework.
+
+# 1 Feature Overview
+
+The Event and Alarm Framework feature provides a centralized framework for applications in SONiC to raise notifications and store them, so that northbound interfaces can listen for and fetch them to monitor the device.
+
+Events and alarms are notifications that indicate a change in the state of the system that the operator may be interested in.
+Each such change carries an important attribute called *severity*, which indicates how critical it is to the health of the system.
+
+* Events
+
+  Events are "one shot" notifications that indicate an abnormal or important situation.
+
+  A user logging in, an authentication failure and a configuration change are all examples of events.
+
+* Alarms
+
+  Alarms are notifications raised for conditions that can be cleared by correcting or removing the condition.
+
+  Running out of memory and a temperature crossing a threshold are examples of conditions for which alarms are raised.
+  Such conditions are dynamic: a faulty software/hardware component encounters such a condition and **may** come out of it when the condition is resolved.
+
+  Events are sent as the condition progresses through being raised and cleared, and also when the operator acknowledges or unacknowledges it.
+  So, these events have a field called *action*: RAISE, CLEAR or ACKNOWLEDGE/UNACKNOWLEDGE.
+
+  Each such event for an alarm is characterized by an "action" in addition to a "severity".
+
+  An application *raises* an alarm when it encounters a faulty condition by sending an event with action *RAISE*.
+  After the application recovers from the condition, the alarm is *cleared* by sending an event with action *CLEAR*.
+  An operator can *ACKNOWLEDGE/UNACKNOWLEDGE* an alarm. Acknowledging indicates that the operator is aware of the faulty condition.
+
+  The set of alarms and their severities indicates the health of the various applications of the system, and the System LED state can be deduced from the alarms.
+  An acknowledged alarm means that the operator is aware of the condition, so an acknowledged alarm is taken out of consideration.
+
+Both events and alarms get recorded in a new DB called EVENT DB in a new redis instance.
+
+1. Event Table
+
+   All events get recorded in the event table, named "EVENT". The EVENT table contains the history of all events generated by the system.
+   This table is persisted across system restarts of any kind, including restore to factory defaults and SW upgrades and downgrades.
+
+2. Alarm Table
+
+   All events with an action field of *RAISE* get recorded in a table named "ALARM", in addition to being recorded in the Event Table (only events corresponding to an alarm have an action field).
+   When the application that raised the alarm clears it (by sending an event with action *CLEAR*), the alarm record is removed from the ALARM table.
+   A user acknowledging a particular alarm will NOT remove that alarm record from this table; the alarm is removed from the ALARM table only when the application clears it.
+
+   In effect, the ALARM table contains outstanding alarms that need to be cleared by the applications that raised them.
+   This table is NOT persisted and its contents are cleared with a reload.
+
+In summary, the framework provides both current and historical event status of software and physical entities of the system through the ALARM and EVENT tables.
+
+In addition to the above tables, the framework maintains various statistics.
+
+1. Event Statistics Table
+
+   Statistics on the number of events are maintained in the EVENT_STATS table.
+
+2. Alarm Statistics Table
+
+   Statistics on the number of alarms per severity are maintained in the ALARM_STATS table.
+   The ALARM_STATS table is not persistent, as the conditions that trigger alarms get cleared on bootup.
+   When an application raises an alarm, the counter corresponding to that alarm's severity is increased by 1.
+   When the alarm is cleared or acknowledged, the corresponding severity counter is reduced by 1.
+   This table categorizes "active" alarms per severity.
+
+As mentioned above, each event has an important characteristic: severity. SONiC uses the following severities, as defined in the openconfig alarm yang.
+
+- CRITICAL : Requires immediate action. A critical event may trigger if one or more hardware components fail, or one or more hardware components exceed temperature thresholds.
+  ( maps to log-alert )
+- MAJOR : Requires escalation or notification. For example, a major alarm may trigger if an interface failure occurs, such as a port channel being down.
+  ( maps to log-critical )
+- MINOR : If left unchecked, might cause system service interruption or performance degradation. An alarm with minor severity requires monitoring or maintenance.
+  ( maps to log-error )
+- WARNING : It may or may not result in an error condition.
+  ( maps to log-warning )
+- INFORMATIONAL : Does not impact performance. NOT applicable to alarms.
+  ( maps to log-notice )
+
+The following describes how an alarm transforms and how various tables are updated.
+![Alarm Life Cycle](event-alarm-framework-alarm-lifecycle.png)
+
+By default, every event has a severity assigned by the component. The framework provides Event Profiles to customize the severity of an event and also to disable an event.
+
+The template for an event profile is as below:
+```
+{
+    "events":[
+        {
+            "name" : ,
+            "revision" : ,
+            "severity" : ,
+            "enable" : ,
+            "message" :
+        }
+    ]
+}
+```
+An event profile only contains declarations of events and their characteristics. An application still has to raise these events using the eventnotify API.
+
+The framework maintains the default event profile at /etc/evprofile/default.json.
+The operator can download the default event profile to a remote host.
+The downloaded file can be modified by changing the severity or enable flag of one or more events.
+The modified file can then be uploaded to the device to /etc/evprofile/.
+The operator can select any of these custom event profiles to change the default properties of events.
+The selected profile is persistent across reboots and stays in effect until the operator selects either the default or another custom profile.
+
+In addition to storing events in the DB, the framework forwards log messages corresponding to all the events to syslog.
+The syslog message displays the type (ALARM or EVENT), the action (RAISE, CLEAR, ACKNOWLEDGE or UNACKNOWLEDGE, present only when the message corresponds to an event of an alarm), the name of the event and the detailed message.
+
+gNMI clients can subscribe to receive events as they are raised. Subscribing through REST is being evaluated.
+
+CLI and REST/gNMI clients can query either table with filters such as severity, time range (delta based on timestamp) and sequence-id.
+
+Application owners need to identify the various conditions that would be of interest to the operator and use the framework to raise events/alarms.
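The event-profile override described in this overview can be illustrated with a short Python sketch. It is only a sketch under stated assumptions, not the eventd implementation: the helper names `load_events` and `apply_custom_profile` are hypothetical, and the field-by-field override of a default entry by a custom entry is an assumed merge rule. The profile layout and the list of supported severities come from the document.

```python
import json

# Severities supported by the framework, as listed in this document.
VALID_SEVERITIES = {"CRITICAL", "MAJOR", "MINOR", "WARNING", "INFORMATIONAL"}


def load_events(profile_text):
    """Parse a profile document and index its event declarations by name."""
    return {ev["name"]: dict(ev) for ev in json.loads(profile_text)["events"]}


def apply_custom_profile(default_events, custom_events):
    """Return a new event map in which custom entries override defaults.

    Assumption: fields given in the custom profile replace the default
    values one by one, and fields it omits keep their default values.
    """
    merged = {name: dict(ev) for name, ev in default_events.items()}
    for name, custom in custom_events.items():
        merged[name] = {**merged.get(name, {}), **custom}
        severity = merged[name].get("severity")
        if severity not in VALID_SEVERITIES:
            raise ValueError("unsupported severity %r for %s" % (severity, name))
    return merged
```

With this rule, a custom profile that lists only `name`, `severity` and `enable` for an event changes those two properties and leaves the static message and revision from the default profile untouched.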
+
+## 1.1 Requirements
+
+
+### 1.1.1 Functional Requirements
+
+| ID | Requirement | Comment |
+| :--- | :---- | :--- |
+| 1 | Provide API via library for apps to publish events | |
+| 2 | Provide API via library for apps to publish alarms | |
+| 3 | Event Infra to write formatted syslog messages corresponding to all events to Syslog. | |
+| 4 | Event Infra to persist all events and alarms in DB. | |
+| 5 | Event Infra to read Event profile ( severity and enable/disable flag ) from a json file. | |
+| 6 | Event Infra to read Event table parameters (size and # of days) from a config file. | |
+| 7 | NBI interface (gNMI and REST) and CLI | |
+| 7.1 | Events | |
+| 7.1.1 | Openconfig interface to pull event information. | |
+| 7.1.2 | Openconfig interface to pull event summary information. | |
+| | Event summary information to contain cumulative counters for: | |
+| | - Raised-count (events) | |
+| 7.1.3 | Openconfig interface to pull events using following filters | |
+| | - ALL ( pull all events) | |
+| | - Severity. | |
+| | - Recent records (e.g., last 5 minutes, one hour, one day). | |
+| | - Records between two timestamps, one timestamp and end, and beginning and a timestamp. | |
+| | - All records between two Sequence Numbers (incl begin and end) | |
+| 7.2 | Alarms | |
+| 7.2.1 | Openconfig interface to pull alarm information. | |
+| 7.2.2 | Openconfig interface to pull alarm summary information. | |
+| | Counters for Total, Critical, Major, Minor, Warning, Acknowledged | |
+| 7.2.3 | Openconfig interface to pull alarms using following filters | |
+| | - All (pull all alarms) | |
+| | - Severity. | |
+| | - Recent alarms (e.g., last 5 minutes, one hour, one day). | |
+| | - Records between two timestamps, one timestamp and end, and beginning and a timestamp. | |
+| | - All records between two Sequence Numbers (incl begin and end) | |
+| 7.2.4 | Openconfig interface to acknowledge an alarm. | |
+| 8 | CLI commands | |
+| 8.1 | show alarm [ detail \| summary \| severity \| timestamp \| recent <5min\|1hr\|1day> \| sequence-number \| all] | |
+| 8.2 | show event [ detail \| summary \| severity \| timestamp \| recent <5min\|1hr\|1day> \| sequence-number ] | |
+| 8.3 | show event profile | |
+| 8.4 | alarm acknowledge | |
+| 8.5 | logging server [ log \| event ] | default is 'log' |
+| 8.6 | event profile [ default \| name-of-file ] | |
+| 9 | gNMI subscription | |
+| 9.1 | Subscribe to openconfig Event container and Alarm container. All events and alarms published to gNMI subscribed clients. | |
+| 10 | Clear all events | |
+| 11 | Any change in open source should be aligned and upstreamed. | |
+
+## 1.2 Design Overview
+
+![Block Diagram](event-alarm-framework-blockdiag.png)
+
+### 1.2.1 Basic Approach
+The feature involves new development.
+Applications act as producers by writing to a table with the help of the event notify library.
+Eventd reads each new record in the table and processes it as follows.
+It saves the entry in the event table; if the event has an action and the action is *RAISE*, the record is added to the alarm table and the counter for its severity in ALARM_STATS is increased by 1.
+If the received event action is *CLEAR*, the record in the ALARM table is removed and the severity counter of that alarm in ALARM_STATS is reduced by 1.
+If eventd receives an event with action *ACKNOWLEDGE* from mgmt-framework, the severity counter in ALARM_STATS is reduced by 1.
+If eventd receives an event with action *UNACKNOWLEDGE* from mgmt-framework, the severity counter in ALARM_STATS is increased by 1.
+Eventd then invokes the logging API to format the log message and send it to syslog.
+
+Any application, such as pmon, can subscribe to tables like ALARM_STATS and act accordingly.
+
+### 1.2.2 Container
+A new container named eventd is created to hold the event consumer logic.
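The bookkeeping rules in 1.2.1 can be sketched in Python as a toy in-memory model. This is an illustration under assumptions rather than eventd's actual code: the class name `AlarmTracker`, keying alarms by `(name, source)`, ignoring a repeated RAISE, and not decrementing the counter a second time when an already acknowledged alarm is cleared are all assumptions that the document does not spell out.

```python
from collections import Counter


class AlarmTracker:
    """Toy stand-in for the EVENT, ALARM and ALARM_STATS tables."""

    def __init__(self):
        self.events = []        # EVENT table: history of every event
        self.alarms = {}        # ALARM table: raised alarms keyed by (name, source)
        self.stats = Counter()  # ALARM_STATS: active alarms per severity

    def process(self, name, source, severity, action=None):
        """Record one event and apply its action, if any, to the alarm state."""
        self.events.append((name, source, severity, action))
        key = (name, source)
        alarm = self.alarms.get(key)
        if action == "RAISE" and alarm is None:
            self.alarms[key] = {"severity": severity, "acknowledged": False}
            self.stats[severity] += 1
        elif action == "CLEAR" and alarm is not None:
            del self.alarms[key]
            # Assumption: an acknowledged alarm was already subtracted once.
            if not alarm["acknowledged"]:
                self.stats[alarm["severity"]] -= 1
        elif action == "ACKNOWLEDGE" and alarm is not None and not alarm["acknowledged"]:
            alarm["acknowledged"] = True  # record stays in the ALARM table
            self.stats[alarm["severity"]] -= 1
        elif action == "UNACKNOWLEDGE" and alarm is not None and alarm["acknowledged"]:
            alarm["acknowledged"] = False
            self.stats[alarm["severity"]] += 1
```

A consumer such as pmon would then only need to read `stats` to see how many unacknowledged alarms exist per severity, which is the role ALARM_STATS plays in the design.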
+
+# 2 Functionality
+## 2.1 Target Deployment Use Cases
+
+The framework assigns a unique sequence number to each of the events sent by applications.
+
+In addition, the framework provides the following key management services:
+
+- Push model: Event/Alarm information to remote syslog hosts and subscribed gNMI clients
+- Pull model: Event/Alarm information from CLI, REST/gNMI interfaces
+- Ability to change severity of events, turn off a particular event
+- Ability to acknowledge an alarm
+
+## 2.2 Functional Description
+The Event Management Framework allows applications to store the "state" of the system for the user to query through various northbound interfaces.
+
+# 3 Design
+## 3.1 Overview
+There are three players in the event framework: producers, which raise events; a consumer, which receives and processes them as they are raised; and a set of receivers, one for each NBI type.
+
+Applications act as producers of events.
+
+The event consumer class in the eventd container receives and processes each event.
+It manages received events, updates the event table, alarm table, event_stats table and alarm_stats table, and invokes the logging API, which constructs the message and sends it to syslog.
+
+The operator can choose to change properties of events with the help of an event profile. The default
+event profile is available at */etc/evprofile/default.json*. The user can download the default event profile,
+modify it and upload it back to the switch to apply it.
+
+Through an event profile, the user can change the severity of any event and can also enable/disable an event.
+
+Through CLI, REST or gNMI, the event table and alarm table can be retrieved using various filters.
+
+### 3.1.1 Event Producers
+Applications that need to raise an event use the event notify API ( LOG_EVENT ).
+This API is part of the *libeventnotify* library that applications need to link.
+
+For one-shot events, applications need to provide event-id (name of the event), source, dynamic message, and event action set to NOTIFY.
+
+For alarms, applications need to provide event-id (name of the event), source, dynamic message, and event action (RAISE_ALARM / CLEAR_ALARM / ACK_ALARM / UNACK_ALARM).
+The ACK_ALARM/UNACK_ALARM action types are used only by mgmt-framework to provide the functionality to acknowledge/unacknowledge alarms through the NBI.
+
+Eventd maintains a json file of events and alarms at sonic-eventd/etc/evprofile/default.json. This is the default event profile that gets installed on the device at /etc/evprofile/default.json.
+Developers of new events or alarms need to update this file by declaring the name and other characteristics: severity, enable flag and the static message that gets appended to the dynamic message.
+
+```
+{
+    "__README__" : "This is default map of events that eventd uses. Developer can modify this file and send
+                    SIGINT to eventd to make it read and use the updated file. Alternatively developer can test
+                    the new event by adding it to a custom event profile and use 'event profile ' command
+                    to apply that profile without sending SIGINT to eventd. Developer need to commit default.json file
+                    with the new event after testing it out.
+                    Supported severities are: CRITICAL, MAJOR, MINOR, WARNING and INFORMATIONAL.
+                    Supported enable flag values are: true and false.",
+    "events":[
+        {
+            "name" : "CUSTOM_EVPROFILE_CHANGE",
+            "revision" : 0,
+            "severity" : "INFORMATIONAL",
+            "enable" : "true",
+            "message" : "Custom Event Profile is applied."
+        },
+        {
+            "name": "TEMPERATURE_EXCEEDED",
+            "revision" : 0,
+            "severity": "CRITICAL",
+            "enable": "true",
+            "message" : "Temperature threshold is 75 degrees."
+        }
+    ]
+}
+```
+The format of the event notify API is:
+
+definition:
+```
+    LOG_EVENT(name, source, action, MSG, ...)
+```
+- name is the name of the event
+- source is the object that is generating this event
+- action is either NOTIFY, RAISE_ALARM, CLEAR_ALARM, ACK_ALARM or UNACK_ALARM
+- MSG is the dynamic message, given as a format string followed by its arguments. It can be a json string; a json string is rendered as is in the syslog.
+
+
+Usage:
+For one-shot events:
+```
+    LOG_EVENT(CUSTOM_EVPROFILE_CHANGE, profile_name.c_str(), NOTIFY, "New event profile is %s", profile_name.c_str());
+```
+
+For alarms:
+```
+    if (current_temp >= THRESHOLD) {
+        LOG_EVENT(TEMPERATURE_EXCEEDED, sensor_name_p, RAISE_ALARM, "Temperature for sensor %s is %d degrees", sensor_name_p, current_temp);
+    } else {
+        LOG_EVENT(TEMPERATURE_EXCEEDED, sensor_name_p, CLEAR_ALARM, "Temperature for sensor %s is %d degrees", sensor_name_p, current_temp);
+    }
+```
+#### 3.1.1.2 Development Process
+
+Here is a typical development process to link the eventnotify library to a component and be able to send new events/alarms:
+
+a. Update buildimage/rules/*app*.mk
+
+   Add $(LIBEVENTNOTIFY_DEV) to compile dependency.
+
+   Add $(LIBEVENTNOTIFY) to runtime dependency.
+
+```
+    Ex: For rules/tam.mk,
+
+    $(SONIC_TAM)_DEPENDS += $(LIBEVENTNOTIFY_DEV)
+    $(SONIC_TAM)_RDEPENDS += $(LIBEVENTNOTIFY)
+```
+
+b. Update Makefile.am of the app to link to the event notify library.
+```
+    Ex: To let tammgr use event notify API, update src/sonic-tam/tammgr/Makefile.am as below:
+
+    tammgrd_LDADD += -leventnotify
+```
+c. Declare the name of the new event/alarm along with revision, severity, enable flag and static message in sonic-eventd/etc/evprofile/default.json
+
+d. In the source file where the event is to be raised, include eventnotify.h and invoke LOG_EVENT with action as NOTIFY/RAISE_ALARM/CLEAR_ALARM (ACK_ALARM/UNACK_ALARM are used by mgmt-framework to allow users to acknowledge/unacknowledge alarms).
+
+The event notifier takes the event properties, packs them into a field value tuple and writes it to a table named EVENTPUBSUB.
+
+The EVENTPUBSUB table uses the event-id and a sequence-id generated locally by the event notifier as the key, so that there won't be any conflicts across multiple applications trying to write to this table.
+
+### 3.1.2 Event Consumer
+The event consumer is a class in the sonic-eventd container that processes each incoming record.
+
+On initialization, the event consumer reads */etc/evprofile/default.json* and builds an internal map of events, called *static_event_map*.
+It then checks whether a custom event profile is configured and, if so, merges its contents into the static_event_map built from the default event profile.
+It then reads from the EVENTPUBSUB table. This table contains records that are published by applications and are waiting to be read by eventd.
+Whenever there is a new record, the event consumer reads the record, processes it and deletes it.
+
+On reading the field value tuple, the event consumer uses the event-id in the record to fetch static information from *static_event_map*.
+As mentioned above, static information contains severity, static message and event enable flag.
+If the enable flag is set to false, the event consumer ignores the event and logs a debug message.
+If the flag is set to true, it continues to process the event as follows:
+- Generate a new sequence-id for the event
+- Write the event to the Event Table
+- Verify whether the event corresponds to an alarm by checking the *action* field. If so, the alarm consumer API is invoked for further processing of the event.
+  - If action is RAISE_ALARM, add the record to the ALARM table
+  - If action is CLEAR_ALARM, remove the entry from the ALARM table
+  - If action is ACK_ALARM, update the *acknowledged* flag of the corresponding raised entry to true in the ALARM table and store the timestamp in *acknowledge_time*.
+  - If action is UNACK_ALARM, update the *acknowledged* flag of the corresponding raised entry to false in the ALARM table and store the timestamp in *acknowledge_time*.
+  - Event and Alarm Statistics tables are updated
+- Invoke the logging API to send a formatted message to syslog
+
+#### 3.1.2.1 Severity
+Supported event severities are CRITICAL, MAJOR, MINOR, WARNING and INFORMATIONAL, as defined in the openconfig alarm yang.
+The corresponding syslog severities are: log-alert, log-crit, log-error, log-warning and log-notice respectively.
+Severity INFORMATIONAL is not applicable to alarms.
+
+#### 3.1.2.2 Sequence-ID
+Every new event should have a unique sequential ID. The sequence-id is of the format <32 bit time_t><5 digit running sequence 00000 to 99999>. These semantics allow applications to lay out the logs chronologically.
+
+#### 3.1.2.3 Revision
+Every event/alarm defined in the profile has a numeric revision. If none is given, the default revision '0' is assigned to the event/alarm. The revision is to be incremented whenever the alarm parameters are updated.
+For example, if the TEMPERATURE_EXCEEDED alarm threshold is changed to 65 from the original 70, the revision is updated in default.json while the name is kept unchanged.
+This allows the event to be redefined on an image upgrade. Clients can distinguish events from different sources running different releases.
+
+### 3.1.3 Alarm Consumer
+On receiving the event record, the alarm consumer method verifies the event action. If it is RAISE_ALARM, it adds the record to the Alarm Table.
+The counter in ALARM_STATS corresponding to the severity of the incoming alarm is increased by 1.
+
+Eventd maintains a lookup map of *sequence-id* and pair of *event-id* and *resource* fields.
+An entry for the newly received event is added to this lookup map.
+
+- If the action is CLEAR_ALARM, it removes the previous record of the raised alarm using the above lookup map.
+  The counter in ALARM_STATS corresponding to the severity of the cleared alarm is reduced by 1.
+
+- If the action is ACK_ALARM, the alarm consumer finds the raised record of the alarm in the ALARM table using the above lookup map and updates the *acknowledged* flag to true. The *acknowledge-time* is updated with the timestamp of the ack event.
+  ALARM_STATS is updated by reducing the corresponding severity counter by 1.
+
+- If the action is UNACK_ALARM, the alarm consumer finds the raised record of the alarm in the ALARM table using the above lookup map and updates the *acknowledged* flag to false. The *acknowledge-time* is updated with the timestamp of the unack event.
+  ALARM_STATS is updated by increasing the corresponding severity counter by 1.
+
+pmon can use ALARM_STATS to update the system LED based on the severities of outstanding alarms:
+```
+    Red if any outstanding critical/major alarms, else Amber if any minor/warning alarms, else Green.
+```
+An outstanding alarm is an alarm that has been neither cleared by the application nor acknowledged by the user.
+
+The following illustrates how the ALARM table is updated as alarms go through their life cycle and how an application can use it.
+The example here is pmon using the ALARM_STATS table to control the system LED.
+
+| alarm | severity | acknowledged |
+|:-----:|:----------:|:-------------:|
+| | | |
+| | | |
+
+The alarm table is empty. All counters in ALARM_STATS are 0. System LED is Green.
+
+| alarm | severity | acknowledged |
+|:-----:|:----------:|:------------:|
+| ALM-1 | CRITICAL | |
+| ALM-2 | MINOR | |
+
+The alarm table now has two alarms, one with *CRITICAL* and the other with *MINOR* severity. ALARM_STATS is updated as: Critical as 1 and Minor as 1. As there is at least one alarm with *critical/major* severity, system LED is Red.
+
+| alarm | severity | acknowledged |
+|:-----:|:----------:|:------------:|
+| ALM-2 | MINOR | |
+
+The *CRITICAL* alarm is cleared by the application, so the alarm consumer removes it from the ALARM table. ALARM_STATS is updated as: Critical as 0 and Minor as 1. As there is at least one *minor/warning* alarm in the table, system LED is Amber.
+
+| alarm | severity | acknowledged |
+|:-----:|:----------:|:------------:|
+| ALM-2 | MINOR | |
+| ALM-9 | MAJOR | |
+
+Now there is an alarm with *MAJOR* severity. ALARM_STATS now reads as: Major as 1 and Minor as 1. So, system LED is Red.
+
+| alarm | severity | acknowledged |
+|:-----:|:----------:|:------------:|
+| ALM-2 | MINOR | |
+| ALM-9 | MAJOR | true |
+
+The *MAJOR* alarm is acknowledged by the user, so the alarm consumer sets the *acknowledged* flag to true and reduces the Major counter in ALARM_STATS by 1. ALARM_STATS now reads as: Major 0 and Minor 1. This way, the acknowledged major alarm has no effect on the system LED. There are no other *CRITICAL/MAJOR* alarms. There is, however, an alarm with *MINOR/WARNING* severity. System LED is Amber.
+
+| alarm | severity | acknowledged |
+|:-----:|:----------:|:------------:|
+| ALM-2 | MINOR | true |
+| ALM-9 | MAJOR | true |
+
+The *MINOR* alarm is also acknowledged by the user. ALARM_STATS reads: Major as 0, Minor as 0. So it is also taken out of consideration for the system LED. System LED is Green.
+
+| alarm | severity | acknowledged |
+|:-----:|:----------:|:------------:|
+| ALM-2 | MINOR | true |
+| ALM-9 | MAJOR | false |
+
+The *MAJOR* alarm is then unacknowledged by the user. ALARM_STATS reads: Major as 1, Minor as 0. So it is considered again for the system LED. System LED becomes Red.
+
+### 3.1.4 Event Receivers
+Supported NBIs are: syslog, REST and gNMI.
+
+#### 3.1.4.1 syslog
+The logging API contains logic to take the event record, augment it with any static information, format the message and
+send it to syslog.
+```
+    if (ev_act.empty()) {
+        const char LOG_FORMAT[] = "[%s], %%%s %s. %s";
+                                                    // event Type
+                                                    // Event Name
+                                                    // Dynamic Desc
+                                                    // Static Desc
+
+        // raise a syslog message
+        syslog(LOG_MAKEPRI(ev_sev, SYSLOG_FACILITY), LOG_FORMAT,
+            ev_type.c_str(),
+            ev_id.c_str(), ev_msg.c_str(), ev_static_msg.c_str());
+    } else {
+        const char LOG_FORMAT[] = "[%s] (%s), %%%s %s. %s";
+                                                    // event Type
+                                                    // event action
+                                                    // Event Name
+                                                    // Dynamic Desc
+                                                    // Static Desc
+        // raise a syslog message
+        syslog(LOG_MAKEPRI(ev_sev, SYSLOG_FACILITY), LOG_FORMAT,
+            ev_type.c_str(), ev_act.c_str(),
+            ev_id.c_str(), ev_msg.c_str(), ev_static_msg.c_str());
+    }
+```
+An example of the syslog message generated for an event raised when the user selects a custom event profile:
+```
+May 19 21:22:07.122786 2021 sonic WARNING eventd#eventd[2419]: [EVENT], %CUSTOM_EVPROFILE_CHANGE : handle_custom_evprofile: Custom Event Profile myprofile.json is applied.. Custom Event Profile is selected by user.
+```
+Syslog message for an alarm raised by a sensor:
+```
+May 19 21:42:14.373410 2021 sonic ALERT eventd#eventd[2453]: [ALARM] (RAISE), %TEMPERATURE_EXCEEDED : temperatureCrossedThreshold: Current temperature of sensor/2 is 76 degrees. Temperature threshold is 75 degrees.
+```
+Syslog message when the alarm is cleared:
+```
+May 19 21:46:34.373693 2021 sonic ALERT eventd#eventd[2453]: [ALARM] (CLEAR), %TEMPERATURE_EXCEEDED : temperatureCrossedThreshold: Current temperature of sensor/2 is 70 degrees. Temperature threshold is 75 degrees.
+```
+Syslog message when the alarm with id=4 is acknowledged:
+```
+May 19 21:48:05.870530 2021 sonic ALERT eventd#eventd[2453]: [ALARM] (ACKNOWLEDGE), Alarm id 4 ACKNOWLEDGE.
+```
+
+Syslog message when the alarm with id=4 is unacknowledged:
+```
+May 19 21:53:24.490545 2021 sonic ALERT eventd#eventd[2453]: [ALARM] (UNACKNOWLEDGE), Alarm id 4 UNACKNOWLEDGE.
+```
+The operator can configure a specific syslog host to receive either syslog messages corresponding to events or general log messages.
+Through the CLI, the operator can choose between the two with the 'logging server [log|event]' command.
+When the operator configures a host with the 'event' type, it receives *only* log messages corresponding to events.
+Support for VRF, source-interface and UDP port is also applicable to the 'event' type.
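To make the message layout and severity mapping used by the logging API more concrete, here is a minimal Python sketch. The two format strings and the severity-to-syslog mapping are taken from this document; the names `SYSLOG_SEVERITY` and `format_event_log` are hypothetical, and the numeric values are the standard syslog severity codes (alert=1, crit=2, err=3, warning=4, notice=5), not something eventd is confirmed to use in this form.

```python
# Event severity -> standard syslog severity code, following the mapping in
# section 3.1.2.1 (log-alert, log-crit, log-error, log-warning, log-notice).
SYSLOG_SEVERITY = {
    "CRITICAL": 1,       # alert
    "MAJOR": 2,          # crit
    "MINOR": 3,          # err
    "WARNING": 4,        # warning
    "INFORMATIONAL": 5,  # notice
}


def format_event_log(ev_type, name, dynamic_msg, static_msg, action=""):
    """Build the syslog text for an event; the action is shown only for alarms."""
    if action:
        return "[%s] (%s), %%%s %s. %s" % (ev_type, action, name, dynamic_msg, static_msg)
    return "[%s], %%%s %s. %s" % (ev_type, name, dynamic_msg, static_msg)


line = format_event_log(
    "ALARM",
    "TEMPERATURE_EXCEEDED",
    "Current temperature of sensor/2 is 76 degrees",
    "Temperature threshold is 75 degrees.",
    action="RAISE",
)
print(SYSLOG_SEVERITY["CRITICAL"], line)
```

The dynamic message supplied by the application comes first and the static message from the event profile is appended after it, which matches the order of the arguments passed to `syslog()` in the fragment above.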
+ +#### 3.1.4.2 REST +Subscribing through REST to receive event notifications is currently being evaluated. + +#### 3.1.4.3 gNMI +gNMI clients can subscribe to receive event notifications. Subscribed gNMI clients receive event fields as they are in the DB; +unlike syslog messages, these fields are not customized. + +TODO: add definitions of protobuf spec + +#### 3.1.4.4 System LED +The original requirement was to change the LED based on the severities of the events. But on most platforms the system/power/fan LEDs are managed by the BMC. +The BMC (baseboard management controller) is an embedded system that manages various platform elements like fan, PSU and temperature sensors. +There is an API that can be invoked to control the LED, but not all platforms will support that API if they are fully controlled by the BMC. +So, on certain platforms, the system LED cannot represent events on the system. + +Another issue is that pmon currently controls the LED, so having eventd change the very same LED would lead to conflicts. +A mechanism must exist for one of these to be the master, which, in this case, is pmon. + +The proposed solution is to have pmon use the ALARM_STATS counters in conjunction with its existing logic to update the system LED. + +#### 3.1.4.5 Event/Alarm flooding +There are scenarios in which the system enters a loop of a fault condition that makes an application trigger events continuously. To keep such +instances from flooding the EVENT or ALARM tables, eventd maintains a cache of the last event/alarm. Every new event/alarm is compared against this cache entry +to make sure it is not a flood. If it is found to be the same event/alarm, the newly raised entry is silently discarded. + +#### 3.1.4.6 Eventd continuous restart +In scenarios where eventd runs into an issue and restarts continuously, applications might keep writing to the eventpubsub table.
As the consumer, eventd, is not able to remove events from the pubsub table, the eventpubsub table could grow forever as applications keep raising events/alarms. +One way to fix this is to have the system monitor daemon periodically (at a very long polling interval) check the number of keys in the table and, if it exceeds a limit, delete all the entries. When the system monitor daemon does this, it logs a syslog message. + +### 3.1.5 Event Profile +The event profile maps each event-id to its severity and enable flag. +Through the event profile, an operator can change the severity of a particular event and can also enable/disable +a particular event. + +The default profile exists at */etc/evprofile/default.json*. +By default, every event is enabled. +The severity of an event is decided by the developer while adding the event. +``` +{ + "__README__" : "This is default map of events that eventd uses. Developer can modify this file and send + SIGINT to eventd to make it read and use the updated file. Alternatively developer can test + the new event by adding it to a custom event profile and use 'event profile ' command + to apply that profile without sending SIGINT to eventd. Developer need to commit default.json file + with the new event after testing it out. + Supported severities are: CRITICAL, MAJOR, MINOR, WARNING and INFORMATIONAL. + Supported enable flag values are: true and false.", + "events":[ + { + "name" : "CUSTOM_EVPROFILE_CHANGE", + "revision" : 0, + "severity" : "INFORMATIONAL", + "enable" : "true", + "message" : "Custom Event Profile is applied." + }, + { + "name": "TEMPERATURE_EXCEEDED", + "revision" : 0, + "severity": "CRITICAL", + "enable": "true", + "message" : "Temperature threshold is 75 degrees." + } + ] +} +``` +The user can download the default event profile to a remote host, modify the characteristics of +some or all events in the profile, upload it back to the switch and place the file at /etc/evprofile/.
+ +The uploaded profile is called a custom event profile. + +An example of a custom event profile is shown below. +With this particular custom event profile, the user wants to +- change the severity of the CUSTOM_EVPROFILE_CHANGE event (severity changed from INFORMATIONAL to MAJOR) +- suppress the TEMPERATURE_EXCEEDED alarm (enable flag is changed from true to false) +- introduce a new alarm named DUMMY_ALARM (there should be an application to raise/clear this new alarm). +``` +{ + "events": [ + { + "name" : "CUSTOM_EVPROFILE_CHANGE", + "revision" : 0, + "severity" : "MAJOR", + "enable" : "true" + }, + { + "name": "TEMPERATURE_EXCEEDED", + "revision" : 0, + "severity": "CRITICAL", + "enable": "false" + }, + { + "name" : "DUMMY_ALARM", + "revision": 0, + "severity" : "WARNING", + "enable" : "true" + } + ] +} +``` + +The user can have multiple custom profiles and can select any of the profiles under /etc/evprofile/ using the 'event profile' command. + +The framework sanity checks the user-selected profile and merges it with the map of events, *static_event_map*, maintained by eventd. + +After a successful sanity check, the framework generates an event indicating that a new profile is in effect. + +If there are any outstanding alarms in the alarm table, the framework removes those records for which enable is set to false in the new profile. +Severity counters in ALARM_STATS are reduced accordingly. + +Eventd starts using the merged map of characteristics for all the newly generated events. A CUSTOM_EVPROFILE_CHANGE event is generated. + +The event profile is upgrade and downgrade compatible because eventd accepts only those attributes that are *known* to it. +All the other attributes remain at their default values. + +The sanity check rejects the profile if any attribute contains a value that is not known to eventd. + +Config Migration hooks will be used to persist the current active profile across an upgrade. + +The profile can also be applied through ztp.
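
The sanity check and merge described in this section could be sketched roughly as follows. This is a minimal illustration, not eventd's implementation: the function names (`sanity_check`, `merge_profile`, `drop_disabled_alarms`), the choice of `severity` and `enable` as the only overridable attributes, and the in-memory dictionaries standing in for the ALARM and ALARM_STATS tables are all assumptions made for the example.

```python
VALID_SEVERITIES = {"CRITICAL", "MAJOR", "MINOR", "WARNING", "INFORMATIONAL"}
VALID_ENABLE = {"true", "false"}
OVERRIDABLE = ("severity", "enable")  # assumption: attributes a profile may change


def sanity_check(profile):
    """Reject a profile that carries values eventd would not know."""
    events = profile.get("events")
    if not isinstance(events, list):
        raise ValueError("profile has no 'events' list")
    for ev in events:
        if "name" not in ev:
            raise ValueError("event entry without a name")
        if "severity" in ev and ev["severity"] not in VALID_SEVERITIES:
            raise ValueError("unknown severity for %s" % ev["name"])
        if "enable" in ev and ev["enable"] not in VALID_ENABLE:
            raise ValueError("unknown enable flag for %s" % ev["name"])


def merge_profile(static_event_map, profile):
    """Return a new map: defaults overlaid with the profile's known attributes."""
    sanity_check(profile)
    merged = {name: dict(attrs) for name, attrs in static_event_map.items()}
    for ev in profile["events"]:
        entry = merged.setdefault(ev["name"], {})
        for attr in OVERRIDABLE:
            if attr in ev:
                entry[attr] = ev[attr]  # unknown attributes are simply ignored
    return merged


def drop_disabled_alarms(alarms, alarm_stats, event_map):
    """Remove outstanding alarms whose event is disabled; adjust counters.

    Simplification: acknowledged alarms, which the text says are already
    taken out of the severity counters, are not treated specially here.
    """
    for alarm_id in list(alarms):
        alarm = alarms[alarm_id]
        if event_map.get(alarm["type-id"], {}).get("enable") == "false":
            del alarms[alarm_id]
            alarm_stats["alarms"] -= 1
            alarm_stats[alarm["severity"].lower()] -= 1


static_event_map = {
    "CUSTOM_EVPROFILE_CHANGE": {"severity": "INFORMATIONAL", "enable": "true"},
    "TEMPERATURE_EXCEEDED": {"severity": "CRITICAL", "enable": "true"},
}
custom_profile = {"events": [
    {"name": "CUSTOM_EVPROFILE_CHANGE", "revision": 0,
     "severity": "MAJOR", "enable": "true"},
    {"name": "TEMPERATURE_EXCEEDED", "revision": 0,
     "severity": "CRITICAL", "enable": "false"},
    {"name": "DUMMY_ALARM", "revision": 0,
     "severity": "WARNING", "enable": "true"},
]}

merged = merge_profile(static_event_map, custom_profile)
alarms = {2: {"type-id": "TEMPERATURE_EXCEEDED", "severity": "CRITICAL"}}
alarm_stats = {"alarms": 1, "critical": 1, "major": 0, "minor": 0, "warning": 0}
drop_disabled_alarms(alarms, alarm_stats, merged)
print(merged["CUSTOM_EVPROFILE_CHANGE"]["severity"], alarms, alarm_stats)
```

Building the merged map as a copy keeps the defaults intact, so selecting a different profile later can start again from `static_event_map`.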
+ +### 3.1.6 CLI +The show CLI requires many filters with range specifiers. +Various filters are supported using RPC. + +e.g. +``` +rpc getEventBySequenceId{ +input { + from sequence-id; + to sequence-id; + } +output { + list event-table-entries; +} +} +``` + +The rpc callback needs to access the DB with the given set of sequence ids. + +The gNMI server (gnoi_client.go, gnoi.go, sonic_proto, transl_utils.go) needs to be extended with the RPC to support similar operations for gNMI. + +### 3.1.7 Event Table and Alarm Table +The Event Table (EVENT) and Alarm List Table (ALARM) are stored in EVENT_DB. +The size of the Event Table is 40k records or 30 days' worth of events, whichever limit is hit first. +A manifest file will be created with parameters that specify the record-count and number-of-days limits for +eventd to read and enforce. + + +``` +root@sonic:/etc# cat eventd.json +{ + "config" : { + "no-of-records": 40000, + "no-of-days": 30 + } +} +``` +'no-of-records' indicates the maximum number of records the EVENT table can hold. The range is 1-40000. +'no-of-days' indicates the maximum number of days an event can exist in the EVENT table. The range is 1-30. + +When either limit is reached, the framework wraps around the table by discarding older records. + +The user can send SIGINT to the eventd process to force it to read and apply the manifest limits. + +The EVENTPUBSUB table will be periodically monitored and flushed based on a pre-defined table limit. Based on discussions this can be plugged into existing system jobs. + +An example of an event in the EVENT table: +``` +EVENT Table: +============================== + +Key : id + +id : Unique sequential ID generated by the system for every event {uint64} +type-id : Name of the event generated {string} +text : Dynamic message describing the cause for the event {string} +time-created : Time stamp at which the event is generated {uint64} +action : Indicates action of the event; for one-shot events, it is empty.
For alarms it could be raise, clear, acknowledge or unacknowledge {enum} +resource : Object which generated the event {string} +severity : Severity of the event {string} +revision : Revision of the event {uint64} + + +127.0.0.1:6379[6]> hgetall "EVENT|1" + 1) "text" + 2) "handle_custom_evprofile: Custom Event Profile x.json is applied." + 3) "type-id" + 4) "CUSTOM_EVPROFILE_CHANGE" + 5) "id" + 6) "1" + 7) "time-created" + 8) "1621459327118629520" + 9) "resource" +10) "/etc/evprofile/x.json" +11) "severity" +12) "WARNING" +13) "Revision" +14) "0" +``` + +Schema for EVENT_STATS table is as follows: +``` +EVENT_STATS Table: +============================== + +Key : id + +id : key {state} +events : Total events raised {uint64} +raised : Total alarms raised {uint64} +cleared : Total alarms cleared {uint64} +acked : Total alarms acknowledged {uint64} + +127.0.0.1:6379[6]> hgetall "EVENT_STATS|state" +1) "events" +2) "1" +3) "raised" +4) "0" +5) "cleared" +6) "0" +7) "acked" +8) "0" +127.0.0.1:6379[6]> +``` +The Alarm Table does not have any limits as it only contains a snapshot of the alarms during the current run. + +Contents of an alarm record. In this case, the alarm was raised because the temperature crossed a threshold.
+``` +ALARM Table: +============================== + +Key : id + +id : Unique sequential ID generated by the system for every event {uint64} +revision : Revision of the alarm {uint64} +type-id : Name of the event generated {string} +text : Dynamic message describing the cause for the event {string} +time-created : Time stamp at which the event is generated {uint64} +acknowledged : Indicates if alarm has been acknowledged {boolean} +resource : Object which generated the event {string} +severity : Severity of the event {string} +acknowledge-time : Indicates when alarm has been acknowledged/unacknowledged {uint64} + + +127.0.0.1:6379[6]> hgetall "ALARM|2" + 1) "type-id" + 2) "TEMPERATURE_EXCEEDED" + 3) "text" + 4) "temperatureCrossedThreshold: Current temperature for sensor/2 is 76 degrees" + 5) "action" + 6) "RAISE" + 7) "resource" + 8) "sensor/2" + 9) "time-created" +10) "1621460371062299951" +11) "severity" +12) "CRITICAL" +13) "id" +14) "2" +15) "acknowledged" +16) "false" +17) "revision" +18) "0" +``` + +Schema for ALARM_STATS table is as below. When an alarm of a particular severity is cleared, +the corresponding severity counter is decremented. +``` +ALARM_STATS Table: +============================== + +Key : id + +id : key {state} +alarms : Number of active alarms {uint64} +critical : Number of alarms of severity 'critical' {uint64} +major : Number of alarms of severity 'major' {uint64} +minor : Number of alarms of severity 'minor' {uint64} +warning : Number of alarms of severity 'warning' {uint64} +informational : Number of alarms of severity 'informational' {uint64} + +127.0.0.1:6379[6]> hgetall "ALARM_STATS|state" + 1) "alarms" + 2) "1" + 3) "critical" + 4) "1" + 5) "major" + 6) "0" + 7) "minor" + 8) "0" + 9) "warning" +10) "0" + +``` +### 3.1.8 Pull Model +All NBIs - CLI, REST and gNMI - can pull contents of alarm table and event table. +The following filters are supported: +- ALL (pulls all alarms) +- Severity.
+- Recent alarms (e.g., last 5 minutes, one hour, one day). +- Records between two timestamps, from a timestamp to the end, or from the beginning to a timestamp. +- All records between two sequence numbers (inclusive of begin and end) + +### 3.1.9 Supporting third party containers +To support third party components (e.g. FRR, teamd, DHCP Relay, LLDPd, ntpd etc.) which cannot be modified to raise events, the following options are considered +and are being evaluated. +1. Patch the components + Create a patch for these components that adds the libeventnotify library and invokes the API. This, however, requires these patches to be maintained in the code forever. + +2. Listen to syslog messages + As many of these components raise syslog messages on an important event, a listener can be implemented to read incoming syslog messages and raise + events based on the message. + This, however, is heavy on performance because the listener has to parse each syslog message. The listener also needs to maintain a map of messages to + event-id and needs to be aware of the resource and other specific details. It needs to be aware of the nuances of alarm raising/clearing if the component follows + any specific logic. + +Approach 1 is preferred. + +## 3.2 DB Changes +### 3.2.1 EVENT DB +A new instance, redis4, is created and EVENT DB uses the new instance. +The following tables use EVENT DB. +Table EVENTPUBSUB is used for applications to write events and for eventd to access and process them. +Event Table (EVENT) and Alarm Table (ALARM) are used to house events and alarms respectively. +To maintain various statistics of events, these two tables are used: EVENT_STATS and ALARM_STATS. + +The EVPROFILE table is used by mgmt-framework to communicate the name of the custom event profile when configured through an NBI. +Eventd reads the file name from this table and merges the file with its static_event_map. + +## 3.3 User Interface +### 3.3.1 Data Models + +The following is SONiC yang for events.
+``` +module: sonic-event + +--rw sonic-event + +--rw EVENT + | +--rw EVENT_LIST* [id] + | +--rw id uint64 + | +--rw revision uint64 + | +--rw resource? string + | +--rw text? string + | +--rw time-created? timeticks64 + | +--rw type-id? string + | +--rw severity? severity-type + | +--rw action? action-type + +--rw EVENT_STATS + +--rw EVENT_STATS_LIST* [id] + +--rw id enumeration + +--rw events? uint64 + +--rw raised? uint64 + +--rw acked? uint64 + +--rw cleared? uint64 + + rpcs: + +---x show-events + +---w input + | +---w (option)? + | +--:(time) + | | +---w time + | | +---w begin? yang-types:date-and-time + | | +---w end? yang-types:date-and-time + | +--:(last-interval) + | | +---w interval? enumeration + | +--:(severity) + | | +---w severity? severity-type + | +--:(id) + | +---w id + | +---w begin? string + | +---w end? string + +--ro output + +--ro status? int32 + +--ro status-detail? string + +--ro EVENT + +--ro EVENT_LIST* [id] + +--ro id uint64 + +--ro revision uint64 + +--ro resource? string + +--ro text? string + +--ro time-created? timeticks64 + +--ro type-id? string + +--ro severity? severity-type + +--ro action? action-type +``` + +The following is SONiC yang for alarms. +``` +module: sonic-alarm + +--rw sonic-alarm + +--rw ALARM + | +--rw ALARM_LIST* [id] + | +--rw id uint64 + | +--rw revision uint64 + | +--rw resource? string + | +--rw text? string + | +--rw time-created? event:timeticks64 + | +--rw type-id? string + | +--rw severity? event:severity-type + | +--rw acknowledged? boolean + | +--rw acknowledge-time? event:timeticks64 + +--rw ALARM_STATS + +--rw ALARM_STATS_LIST* [id] + +--rw id enumeration + +--rw alarms? uint64 + +--rw critical? uint64 + +--rw major? uint64 + +--rw minor? uint64 + +--rw warning? uint64 + +--rw acknowledged? uint64 + + rpcs: + +---x acknowledge-alarms + | +---w input + | | +---w id* string + | +--ro output + | +--ro status? int32 + | +--ro status-detail? 
string + +---x unacknowledge-alarms + | +---w input + | | +---w id* string + | +--ro output + | +--ro status? int32 + | +--ro status-detail? string + +---x show-alarms + +---w input + | +---w (option)? + | +--:(time) + | | +---w time + | | +---w begin? yang-types:date-and-time + | | +---w end? yang-types:date-and-time + | +--:(last-interval) + | | +---w interval? enumeration + | +--:(severity) + | | +---w severity? event:severity-type + | +--:(id) + | +---w id + | +---w begin? string + | +---w end? string + +--ro output + +--ro status? int32 + +--ro status-detail? string + +--ro ALARM + +--ro ALARM_LIST* [id] + +--ro id uint64 + +--ro revision uint64 + +--ro resource? string + +--ro text? string + +--ro time-created? event:timeticks64 + +--ro type-id? string + +--ro severity? event:severity-type + +--ro acknowledged? boolean + +--ro acknowledge-time? event:timeticks64 +``` + +The following is SONiC yang to support event profiles. +``` +module: sonic-evprofile + + rpcs: + +---x get-evprofile + | +--ro output + | +--ro file-name? string + | +--ro file-list* string + +---x set-evprofile + +---w input + | +---w file-name? string + +--ro output + +--ro status? string +``` + +The openconfig alarms yang is defined [here](https://github.com/openconfig/public/blob/master/release/models/system/openconfig-alarms.yang). + +### 3.3.2 CLI +#### 3.3.2.1 Exec Commands +``` +sonic# alarm acknowledge +``` +An operator can acknowledge a raised alarm. This indicates that the operator is aware of the fault condition and considers the condition not catastrophic. +Acknowledging an alarm updates the alarm statistics, so applications like pmon can remove the particular alarm from status consideration. + +The alarm record in the ALARM table is marked with the acknowledged field set to true. The acknowledge-time field indicates when that alarm was acknowledged. + +``` +sonic# alarm unacknowledge +``` +An operator can un-acknowledge a previously acknowledged raised alarm.
+Un-acknowledging an alarm updates the alarm statistics, so applications like pmon can take the particular alarm into status consideration again. + +The alarm record in the ALARM table is marked with the acknowledged field set to false. +The acknowledge-time field indicates when that alarm was un-acknowledged. + +``` +sonic# event profile +``` +The command takes the name of the specified file, validates its syntax and values, and merges it with eventd's internal static map of events, *static_event_map*. + +``` +sonic# clear event +``` +This command clears all the records in the event table. All the event stats are cleared. +The command does not affect the alarm table or alarm statistics. +Eventd generates an event informing that the event table is cleared. + +#### 3.3.2.2 Configuration Commands +``` +sonic(config)# logging server [log|event] +``` +Note: The 'logging server' command is an existing, already supported command. +It is only enhanced to take either 'log' or 'event', indicating whether native syslog messages or only the syslog messages corresponding to events are sent to the remote host. +Support for VRF/source-interface and configuring the remote port is backward compatible and applies to both the 'log' and 'event' options. + +#### 3.3.2.3 Show Commands +``` +sonic# show event profile +-------------------------- +Active Event Profile +-------------------------- +myProfile.json +-------------------------- +Available Event Profiles +-------------------------- +default.json +myProfile.json +userProfile.json + +sonic# show event [ details | summary | severity | start end | recent <5min|60min|24hr> | id | from to ] + +'show event' commands would display all the records in EVENT table.
+ +sonic# show event +---------------------------------------------------------------------------------------------------------------------------- +Id Action Severity Name Timestamp Description +---------------------------------------------------------------------------------------------------------------------------- +1 - WARNING CUSTOM_EVPROFILE_CHANGE 2021-05-19T21:38:27.455Z handle_custom_evprofile: Custom Event Profile x.json is applied. +2 RAISE CRITICAL DUMMY_ALARM 2021-05-19T21:39:31.622Z signalHandler: Raising simulated alarm +3 CLEAR CRITICAL DUMMY_ALARM 2021-05-19T21:42:34.371Z signalHandler: Clearing simulated alarm +4 RAISE CRITICAL DUMMY_ALARM 2021-05-19T21:46:14.371Z signalHandler: Raising simulated alarm +5 ACKNOWLEDGE CRITICAL DUMMY_ALARM 2021-05-19T21:48:05.845Z Alarm id 4 ACKNOWLEDGE. +6 UNACKNOWLEDGE CRITICAL DUMMY_ALARM 2021-05-19T21:53:24.484Z Alarm id 4 UNACKNOWLEDGE. +7 CLEAR CRITICAL DUMMY_ALARM 2021-05-19T21:55:54.977Z signalHandler: Clearing simulated alarm + +sonic# show event details +---------------------------------------------- +Event Details - 1 +---------------------------------------------- +Id: 1 +Revision: 0 +Action: - +Severity: WARNING +Type: CUSTOM_EVPROFILE_CHANGE +Timestamp 2021-05-19T21:38:27.455Z +Description: handle_custom_evprofile: Custom Event Profile x.json is applied. 
+Source: /etc/evprofile/x.json + +---------------------------------------------- +Event Details - 2 +---------------------------------------------- +Id: 2 +Revision: 1 +Action: RAISE +Severity: CRITICAL +Type: DUMMY_ALARM +Timestamp 2021-05-19T21:39:31.622Z +Description: signalHandler: Raising simulated alarm +Source: simulation + +---------------------------------------------- +Event Details - 3 +---------------------------------------------- +Id: 3 +Revision: 0 +Action: CLEAR +Severity: CRITICAL +Type: DUMMY_ALARM +Timestamp 2021-05-19T21:42:34.371Z +Description: signalHandler: Clearing simulated alarm +Source: simulation + +sonic# show event summary +Event summary +--------------------------------- +Total: 14 +Raised: 4 +Acknowledged: 1 +Cleared: 3 +---------------------------------- + +sonic# show event severity critical +---------------------------------------------------------------------------------------------------------------------------- +Id Action Severity Name Timestamp Description +---------------------------------------------------------------------------------------------------------------------------- +2 RAISE CRITICAL DUMMY_ALARM 2021-05-19T21:39:31.622Z signalHandler: Raising simulated alarm +3 CLEAR CRITICAL DUMMY_ALARM 2021-05-19T21:42:34.371Z signalHandler: Clearing simulated alarm +4 RAISE CRITICAL DUMMY_ALARM 2021-05-19T21:46:14.371Z signalHandler: Raising simulated alarm +5 ACKNOWLEDGE CRITICAL DUMMY_ALARM 2021-05-19T21:48:05.845Z Alarm id 4 ACKNOWLEDGE. +6 UNACKNOWLEDGE CRITICAL DUMMY_ALARM 2021-05-19T21:53:24.484Z Alarm id 4 UNACKNOWLEDGE. 
+7 CLEAR CRITICAL DUMMY_ALARM 2021-05-19T21:55:54.977Z signalHandler: Clearing simulated alarm + +sonic# show event recent 24hr +---------------------------------------------------------------------------------------------------------------------------- +Id Action Severity Name Timestamp Description +---------------------------------------------------------------------------------------------------------------------------- +2 RAISE CRITICAL DUMMY_ALARM 2021-05-19T21:39:31.622Z signalHandler: Raising simulated alarm +3 CLEAR CRITICAL DUMMY_ALARM 2021-05-19T21:42:34.371Z signalHandler: Clearing simulated alarm +4 RAISE CRITICAL DUMMY_ALARM 2021-05-19T21:46:14.371Z signalHandler: Raising simulated alarm +5 ACKNOWLEDGE CRITICAL DUMMY_ALARM 2021-05-19T21:48:05.845Z Alarm id 4 ACKNOWLEDGE. +6 UNACKNOWLEDGE CRITICAL DUMMY_ALARM 2021-05-19T21:53:24.484Z Alarm id 4 UNACKNOWLEDGE. +7 CLEAR CRITICAL DUMMY_ALARM 2021-05-19T21:55:54.977Z signalHandler: Clearing simulated alarm + +sonic# show event id 2 +---------------------------------------------- +Event Details - 2 +---------------------------------------------- +Id: 2 +Revision: 1 +Action: RAISE +Severity: CRITICAL +Type: DUMMY_ALARM +Timestamp 2021-05-19T21:39:31.622Z +Description: signalHandler: Raising simulated alarm +Source: simulation + +sonic# show event from 2 to 5 +---------------------------------------------------------------------------------------------------------------------------- +Id Action Severity Name Timestamp Description +---------------------------------------------------------------------------------------------------------------------------- +2 RAISE CRITICAL DUMMY_ALARM 2021-05-19T21:39:31.622Z signalHandler: Raising simulated alarm +3 CLEAR CRITICAL DUMMY_ALARM 2021-05-19T21:42:34.371Z signalHandler: Clearing simulated alarm +4 RAISE CRITICAL DUMMY_ALARM 2021-05-19T21:46:14.371Z signalHandler: Raising simulated alarm +5 ACKNOWLEDGE CRITICAL DUMMY_ALARM 2021-05-19T21:48:05.845Z Alarm id 4 
ACKNOWLEDGE. + +sonic# show event start 2021-05-19T21:39:31.622Z end 2021-05-19T21:46:14.371Z +---------------------------------------------------------------------------------------------------------------------------- +Id Action Severity Name Timestamp Description +---------------------------------------------------------------------------------------------------------------------------- +3 CLEAR CRITICAL DUMMY_ALARM 2021-05-19T21:42:34.371Z signalHandler: Clearing simulated alarm + +sonic# show alarm [ acknowledged | all | detail | summary | severity | id | start end | recent <5min|1hr|1day> | from to ] + +'show alarm' command would display all the *active* alarm records in ALARM table. Acknowledged alarms wont be shown here. + +sonic# show alarm +---------------------------------------------------------------------------------------------------------------------------- +Id Severity Name Timestamp Description +---------------------------------------------------------------------------------------------------------------------------- +14 WARNING TEMPERATURE_EXCEEDED 2021-05-20T00:47:52.992Z temperatureCrossedThreshold: Current temperature of sensor/2 is 76 degrees +16 WARNING PSU_FAULT 2021-05-20T02:16:42.611Z :- /psu/2 has experienced a fault + +sonic# show alarm all +---------------------------------------------------------------------------------------------------------------------------- +Id Severity Name Timestamp Description +---------------------------------------------------------------------------------------------------------------------------- +14 WARNING TEMPERATURE_EXCEEDED 2021-05-20T00:47:52.992Z temperatureCrossedThreshold: Current temperature of sensor/2 is 76 degrees +15 WARNING DUMMY_ALARM 2021-05-20T02:16:41.637Z signalHandler: Raising simulated alarm +16 WARNING PSU_FAULT 2021-05-20T02:16:42.611Z /psu/2 has experienced a fault + + +sonic# show alarm detail + +alarm details - 14 +------------------------------------------- +Id: 14 +Revision: 0 
+Severity: CRITICAL +Source: /sensor/2 +Name: TEMPERATURE_EXCEEDED +Description: temperatureCrossedThreshold: Current temperature of sensor/2 is 76 degrees +Raise-time: Wed Feb 10 18:08:24 2021 +Ack-time: +New: true +Acknowledged: false + +sonic# show alarm from 14 to 16 +---------------------------------------------------------------------------------------------------------------------------- +Id Severity Name Timestamp Description +---------------------------------------------------------------------------------------------------------------------------- +14 WARNING TEMPERATURE_EXCEEDED 2021-05-20T00:47:52.992Z temperatureCrossedThreshold: Current temperature of sensor/2 is 76 degrees +15 WARNING DUMMY_ALARM 2021-05-20T02:16:41.637Z signalHandler: Raising simulated alarm +16 WARNING PSU_FAULT 2021-05-20T02:16:42.611Z /psu/2 has experienced a fault + +sonic# show alarm summary +Alarm summary +--------------------------------- +Total: 3 +Critical: 0 +Major: 0 +Minor: 0 +Warning: 3 +Acnowledged: 2 +---------------------------------- +``` + +### 3.3.3 REST API Support + +sonic REST links: +* /restconf/data/sonic-event:sonic-event/EVENT/EVENT_LIST +* /restconf/data/sonic-event:sonic-event/EVENT_STATS/EVENT_STATS_LIST +* /restconf/data/sonic-alarm:sonic-alarm/ALARM/ALARM_LIST +* /restconf/data/sonic-alarm:sonic-alarm/ALARM_STATS/ALARM_STATS_LIST +* /restconf/operations/sonic-evprofile:get-evprofile +* /restconf/operations/sonic-evprofile:set-evprofile +* /restconf/operations/sonic-alarm:acknowledge-alarms +* /restconf/operations/sonic-alarm:unacknowledge-alarms + +openconfig REST links: +* /restconf/data/openconfig-system:system/openconfig-events:events +* /restconf/data/openconfig-system:system/openconfig-events:event-stats +* /restconf/data/openconfig-system:system/alarms +* /restconf/data/openconfig-system:system/openconfig-alarms-ext:alarm-stats + +# 4 Flow Diagrams +![Sequence Diagram](event-alarm-framework-seqdiag.png) + +# 5 Warm Boot Support +## 5.1 
Application warm boot +Applications conforming to warm boot should have stored their state and should compare current values against previous values. +Such a compliant application also "remembers" that it raised an event before for a specific condition. +It would +* not raise alarms/events for the same condition that it raised before the warm boot +* clear those alarms once the current state of a particular condition has recovered (by comparing against the stored state). + +## 5.2 eventd warm boot +Records from applications are stored in a table called EVENTPUBSUB. +Records that are being written will be queued when the consumer (eventd) is down. + +During normal operation, eventd reads and processes a record whenever a new one is added to the table. + +When eventd is restarted, events and alarms raised by applications will be waiting in a queue while eventd is coming up. +When eventd eventually comes back up, it reads those records in the queue. + +# 6 Scalability +In this feature, scalability applies to the Event Table (EVENT). As it is persistent and records every event generated on the system, the user can limit its size through a manifest file +to protect against it growing indefinitely. +By default, the size of the Event Table is set to 40k events or events for 30 days, after which older records are discarded to make way for new records. + +# 7 Showtech support +The techsupport bundle is upgraded to include the output of "show event recent 60min" and "show alarm all". +The first command displays all the events that were sent by applications during the last hour. +The second command displays all the alarms that are waiting to be cleared by applications (this includes alarms that were acknowledged by the operator as well).
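
The Event Table limits described under Scalability (a record count and an age in days, with older records discarded first) can be sketched as below. This is only an illustration under stated assumptions: `prune_event_table` is a hypothetical name, the table is modelled as a plain dictionary keyed by sequence id, and `time-created` is assumed to be nanoseconds since the epoch, as the sample EVENT record suggests.

```python
NS_PER_DAY = 24 * 60 * 60 * 10**9


def prune_event_table(events, now_ns, no_of_records=40000, no_of_days=30):
    """Return the records that survive both manifest limits.

    events: dict mapping sequence id -> record with a 'time-created' field.
    """
    if not 1 <= no_of_records <= 40000:
        raise ValueError("no-of-records must be in the range 1-40000")
    if not 1 <= no_of_days <= 30:
        raise ValueError("no-of-days must be in the range 1-30")

    # Age limit: drop everything older than the cutoff.
    cutoff_ns = now_ns - no_of_days * NS_PER_DAY
    recent_ids = sorted(
        seq for seq, rec in events.items() if rec["time-created"] >= cutoff_ns
    )
    # Count limit: keep only the newest records (highest sequence ids).
    keep = recent_ids[-no_of_records:]
    return {seq: events[seq] for seq in keep}


now = 100 * NS_PER_DAY
table = {
    1: {"time-created": now - 40 * NS_PER_DAY},  # older than 30 days
    2: {"time-created": now - 10 * NS_PER_DAY},
    3: {"time-created": now - 5 * NS_PER_DAY},
    4: {"time-created": now - 1 * NS_PER_DAY},
}
by_age = prune_event_table(table, now)
by_count = prune_event_table(table, now, no_of_records=2)
print(sorted(by_age), sorted(by_count))
```

With the default limits only the 40-day-old record is discarded; lowering `no_of_records` to 2 additionally drops the oldest surviving record.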
+ +# 8 Unit Test +- Raise an event and verify the fields in EVENT table and EVENT_STATS table +- Raise an alarm and verify the fields in ALARM table and ALARM_STATS table +- Clear an alarm and verify that the record is removed from the ALARM table and the ALARM_STATS table is updated +- Ack an alarm and verify that acknowledged flag is set to true in ALARM table and acknowledge-time is set +- Un-Ack an alarm and verify that acknowledged flag is set to false in ALARM table and acknowledge-time is set +- Verify wrap around for EVENT table (change manifest file to a lower range and trigger that many events) +- Verify sequence-id for events is persistent by restarting +- Verify counters by raising various alarms with different severities +- Change severity of an event through custom event profile and verify it is logged at specified severity +- Change enable/disable of an event through custom event profile and verify it is suppressed +- Verify custom event profile with an invalid severity is rejected +- Verify custom event profile with an invalid enable/disable flag is rejected +- Verify custom event profile is persisted after a reboot +- Verify various show commands +- Verify 'logging server' command with the 'event' option forwards only event log messages to the host diff --git a/doc/hld_template.md b/doc/hld_template.md index 3a96629539..a32d3f99a7 100755 --- a/doc/hld_template.md +++ b/doc/hld_template.md @@ -66,6 +66,7 @@ Paste a preliminary manifest in a JSON format. This sub-section covers the addition/deletion/modification of CLI changes and YANG model changes needed for the feature in detail. If there is no change in CLI for HLD feature, it should be explicitly mentioned in this section. Note that the CLI changes should ensure downward compatibility with the previous/existing CLI. i.e. Users should be able to save and restore the CLI from previous release even after the new CLI is implemented. This should also explain the CLICK and/or KLISH related configuration/show in detail.
+https://github.com/sonic-net/sonic-utilities/blob/master/doc/Command-Reference.md needs to be updated with the corresponding CLI change. #### Config DB Enhancements @@ -88,4 +89,4 @@ Example sub-sections for unit test cases and system test cases are given below. ### Open/Action items - if any -NOTE: All the sections and sub-sections given above are mandatory in the design document. Users can add additional sections/sub-sections if required. \ No newline at end of file +NOTE: All the sections and sub-sections given above are mandatory in the design document. Users can add additional sections/sub-sections if required. diff --git a/doc/pins/p4rt_app_hld.md b/doc/pins/p4rt_app_hld.md index 5aa70ef738..a1da70af65 100644 --- a/doc/pins/p4rt_app_hld.md +++ b/doc/pins/p4rt_app_hld.md @@ -16,6 +16,7 @@ _Rev v0.1_ * [Response path](#response-path) - [APPL DB Schema High-Level Design](#appl-db-schema-high-level-design) - [Testing Requirements/Design](#testing-requirements-design) +- [Configuring P4RT Application](#configuring-p4rt-application) - [Open/Action items - if any](#open-action-items---if-any) ## Revision @@ -164,6 +165,29 @@ P4RT application introduces new tables that are written to APPL_DB for the table The P4RT application code will have unit & component tests that together will give >80% code coverage. +## Configuring P4RT Application + +The P4RT application is configured at start-up by reading the P4RT configuration from the CONFIG_DB. If no valid config exists in CONFIG_DB, it uses the default values. The configuration can be added to the CONFIG_DB by manually adding it to the config_db.json file. The P4RT container needs to be restarted if the configuration is changed. + +Below is an example of adding P4RT configuration to config_db.json. The user can modify this block based on their environment settings.
+ +``` +"P4RT": { + "certs": { + "server_crt": "/keys/server_cert.lnk", + "server_key": "/keys/server_key.lnk", + "ca_crt": "/keys/ca_cert.lnk", + "cert_crl_dir": "/keys/crl" + }, + "p4rt_app": { + "port": "9559", + "use_genetlink": "false", + "use_port_ids": "false", + "save_forwarding_config_file" : "/etc/sonic/p4rt_forwarding_config.pb.txt", + "authz_policy": "/keys/authorization_policy.json" + } +} +``` ## Open/Action items - if any diff --git a/doc/psud/PSU_daemon_design.md b/doc/psud/PSU_daemon_design.md index cc65015e9f..f3e9006263 100644 --- a/doc/psud/PSU_daemon_design.md +++ b/doc/psud/PSU_daemon_design.md @@ -1,27 +1,35 @@ # SONiC PSU Daemon Design # -### Rev 0.1 ### +### Rev 0.2 ### ### Revision ### | Rev | Date | Author | Change Description | |:---:|:-----------:|:------------------:|-----------------------------------| | 0.1 | | Chen Junchao | Initial version | - + | 0.2 | August 4st, 2022 | Stephen Sun | Update according to the current implementation | ## 1. Overview The purpose of PSU daemon is to collect platform PSU data and trigger proper actions if necessary. Major functions of psud include: - Collect constant PSU data during daemon boot up, such as PSU number. -- Collect variable PSU data periodically. -- Monitor PSU event, set LED color and trigger syslog according to event type. +- Collect variable PSU data periodically, including: + - PSU entity information + - PSU present status and power good status + - PSU power, current, voltage and voltage threshold + - PSU temperature and temperature threshold +- Monitor PSU event, set LED color and trigger syslog according to event type, including: + - PSU present status and power good status + - whether the PSU voltage exceeds the minimal and maximum thresholds + - whether the PSU temperature exceeds the threshold + - whether the total PSU power consumption exceeds the budget (modular switch only) ## 2. 
PSU data collection PSU daemon data collection flow diagram: -![](https://github.com/Azure/SONiC/blob/master/doc/pmon/daemon-flow.svg) +![](PSU_daemon_design_pictures/PSU-daemon-data-collection-flow.svg) Now psud collects PSU data via platform API, and it also support platform plugin for backward compatible. All PSU data will be saved to redis database for further usage. @@ -34,13 +42,23 @@ PSU information is stored in PSU table: ; Defines information for a psu key = PSU_INFO|psu_name ; information for the psu ; field = value - presence = BOOLEAN ; presence of the psu + presence = BOOLEAN ; presence state of the psu model = STRING ; model name of the psu serial = STRING ; serial number of the psu + revision = STRING ; hardware revision of the PSU status = BOOLEAN ; status of the psu change_event = STRING ; change event of the psu fan = STRING ; fan_name of the psu led_status = STRING ; led status of the psu + is_replaceable = STRING ; whether the PSU is replaceable + temp = 1*3.3DIGIT ; temperature of the PSU + temp_threshold = 1*3.3DIGIT ; temperature threshold of the PSU + voltage = 1*3.3DIGIT ; the output voltage of the PSU + voltage_min_threshold = 1*3.3DIGIT ; the minimal voltage threshold of the PSU + voltage_max_threshold = 1*3.3DIGIT ; the maximum voltage threshold of the PSU + current = 1*3.3DIGIT ; the current of the PSU + power = 1*3.3DIGIT ; the power of the PSU + Now psud only collect and update "presence" and "status" field. @@ -72,10 +90,10 @@ The current output for "show platform psustatus" looks like: ``` admin@sonic:~$ show platform psustatus -PSU Status ------ -------- -PSU 1 OK -PSU 2 OK +PSU Model Serial HW Rev Voltage (V) Current (A) Power (W) Status LED +----- ------------- ------------ -------- ------------- ------------- ----------- -------- ----- +PSU 1 MTEF-PSF-AC-A MT1629X14911 A3 12.09 5.44 64.88 OK green +PSU 2 MTEF-PSF-AC-A MT1629X14913 A3 12.02 4.69 56.25 OK green ``` ## 5. 
PSU LED management @@ -147,4 +165,4 @@ Supervisord takes charge of this daemon. This daemon will loop every 3 seconds a - The psu_num will store in "chassis_info" table. It will just be invoked one time when system boot up or reload. The key is chassis_name, the field is "psu_num" and the value is from get_psu_num(). - The psu_status and psu_presence will store in "psu_info" table. It will be updated every 3 seconds. The key is psu_name, the field is "presence" and "status", the value is from get_psu_presence() and get_psu_num(). -- The daemon query PSU event every 10 seconds via platform API. If any event detects, it should set PSU LED color accordingly and trigger proper syslog. +- The daemon query PSU event every 3 seconds via platform API. If any event detects, it should set PSU LED color accordingly and trigger proper syslog. diff --git a/doc/psud/PSU_daemon_design_pictures/PSU-daemon-data-collection-flow.svg b/doc/psud/PSU_daemon_design_pictures/PSU-daemon-data-collection-flow.svg new file mode 100644 index 0000000000..72081ced92 --- /dev/null +++ b/doc/psud/PSU_daemon_design_pictures/PSU-daemon-data-collection-flow.svg @@ -0,0 +1,4 @@ + + + +
[Figure: PSU daemon data collection sequence diagram. Participants: daemon, platform util, device driver, State DB. At init: wait for the device ready; get constant info; read constant info; return; update DB. Loop (until refresh time out): get variable info; read variable info; return; update DB with variable info. Other labels in the diagram: alt, update DB with dom info.]
\ No newline at end of file diff --git a/doc/srv6/SRv6_uSID.md b/doc/srv6/SRv6_uSID.md new file mode 100755 index 0000000000..c951a17577 --- /dev/null +++ b/doc/srv6/SRv6_uSID.md @@ -0,0 +1,112 @@ +# SONiC uSID + +## Table of Contents +- [Overview](#Overview) +- [Scope](#Scope) +- [Design](#Design) +- [Example](#Example) + +## Revision + +| Rev | Date | Author | Change Description | +| :--: | :-------: | :------------------------------: | :--------------------------: | +| 0.1 | 7/17/2022 | Shitanshu Shah, Reshma Sudarshan | Initial version | +| 0.2 | 7/24/2022 | Shitanshu Shah, Reshma Sudarshan | Incorporate review comments | + +## Overview +SRv6 uSID (micro-segment) is an extension of the SRv6 network programming model; refer to the IETF drafts [Compressed SRv6 Segment List Encoding]( draft-ietf-spring-srv6-srh-compression-02) and [SRv6 uSID instructions IETF draft](https://datatracker.ietf.org/doc/draft-filsfils-spring-net-pgm-extension-srv6-usid/). A uSID is a compressed SID value that can be carried in, for example, 16 bits (unlike a full IPv6 address used to represent a SID). By design, uSID scales well, with much lower MTU overhead required per uSID carrier. A uSID carrier is a 128-bit IPv6 address that can carry up to 6 uSIDs (refer to the Example section for more details). + +## Scope +The scope of this document is to enhance orchagent to support the uSID programming instructions in this IETF draft. Current SAI API definitions already support uSID instructions, so no SAI API change is required in the scope of this document. The current version of the routing protocols in SONiC does not support SRv6; adding such support to the FRR routing stack is not in the scope of this document. + +## Design +The current srv6orch is designed, per [SRv6 HLD](https://github.com/sonic-net/SONiC/blob/master/doc/srv6/srv6_hld.md), to support SRv6 programming instructions as described in RFC 8754 and RFC 8986. This design extends SRv6 Network Programming with a new type of SRv6 SID behavior, defined as uSID.
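The Overview's claim that one 128-bit carrier holds up to 6 uSIDs can be checked with a small sketch. It is illustrative only and not part of srv6orch: the function name `split_usid_carrier`, the 32-bit block length, and the 16-bit uSID width are assumptions chosen to match the `2001:41f0` block and 16-bit uSIDs used in this document's Example section, and the carrier value is inferred from the uSIDs listed there.

```python
import ipaddress

USID_BITS = 16            # assumed uSID width (the 16-bit case from the Overview)
END_OF_CARRIER = 0x0000   # reserved uSID that marks the end of the uSID list


def split_usid_carrier(carrier: str, block_bits: int = 32):
    """Split a uSID carrier into (block, list of uSIDs).

    Hypothetical helper: block_bits=32 is an assumption matching the
    2001:41f0 block of the Example section.
    """
    value = int(ipaddress.IPv6Address(carrier))
    usid_area_bits = 128 - block_bits
    if usid_area_bits % USID_BITS != 0:
        raise ValueError("block length must leave a whole number of uSID slots")

    block = value >> usid_area_bits
    slots = usid_area_bits // USID_BITS   # 96 / 16 = 6 slots for a 32-bit block

    usids = []
    for i in range(slots):
        shift = usid_area_bits - (i + 1) * USID_BITS
        usid = (value >> shift) & ((1 << USID_BITS) - 1)
        if usid == END_OF_CARRIER:
            break                         # remaining slots are End-of-Carrier padding
        usids.append(usid)
    return block, usids


# Carrier inferred from the Example section: four uSIDs plus two End-of-Carriers.
block, usids = split_usid_carrier("2001:41f0:100:200:500:a00::")
print(f"{block:08x}", [f"{u:04x}" for u in usids])
# prints: 200141f0 ['0100', '0200', '0500', '0a00']
```

With a 32-bit block, 96 bits remain for uSIDs, which gives the six 16-bit slots mentioned in the Overview; a longer block would leave fewer slots.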
+ +SRv6 uSID fully leverages current SRv6 control-plane, without any change, as is implemented by srv6orch. +Following uSID behaviors are added, +uN - uN behavior is implemented with PSP and USD flavor +uA - uA local behavior is implemented with PSP and USD flavor +uDT - uDT local behavior is implemented exactly same as that of End.DT4/End.DT6 +uDX - uDX local behavior is implemented exactly same as that of End.DX4/End.DX6 + +PSP and USD end behavior flavors are already supported by SAI API today. End.DT4/6 and End.DX4/6 end behaviors are supported by SAI APIs as well. Thus there is no additional change required in SAI to support uN, uA, uDT and uDX behaviors. + +Changes in orchagent, +- While processing MYSID entries, from SRV6_MY_SID_TABLE off of APPDB, handling of new actions uN, uA, uDT and uDX added in srv6orch. No APPDB schema changes required. +- SAI end behavior and end behavior flavor are determined, before calling SAI APIs, to program MYSID entries + +```text +const map end_behavior_map = +{ + {"end", SAI_MY_SID_ENTRY_ENDPOINT_BEHAVIOR_E}, + {"end.x", SAI_MY_SID_ENTRY_ENDPOINT_BEHAVIOR_X}, + {"end.t", SAI_MY_SID_ENTRY_ENDPOINT_BEHAVIOR_T}, + {"end.dx6", SAI_MY_SID_ENTRY_ENDPOINT_BEHAVIOR_DX6}, + {"end.dx4", SAI_MY_SID_ENTRY_ENDPOINT_BEHAVIOR_DX4}, + {"end.dt4", SAI_MY_SID_ENTRY_ENDPOINT_BEHAVIOR_DT4}, + {"end.dt6", SAI_MY_SID_ENTRY_ENDPOINT_BEHAVIOR_DT6}, + {"end.dt46", SAI_MY_SID_ENTRY_ENDPOINT_BEHAVIOR_DT46}, + {"end.b6.encaps", SAI_MY_SID_ENTRY_ENDPOINT_BEHAVIOR_B6_ENCAPS}, + {"end.b6.encaps.red", SAI_MY_SID_ENTRY_ENDPOINT_BEHAVIOR_B6_ENCAPS_RED}, + {"end.b6.insert", SAI_MY_SID_ENTRY_ENDPOINT_BEHAVIOR_B6_INSERT}, + {"end.b6.insert.red", SAI_MY_SID_ENTRY_ENDPOINT_BEHAVIOR_B6_INSERT_RED}, ++ {"udx6", SAI_MY_SID_ENTRY_ENDPOINT_BEHAVIOR_DX6}, ++ {"udx4", SAI_MY_SID_ENTRY_ENDPOINT_BEHAVIOR_DX4}, ++ {"udt6", SAI_MY_SID_ENTRY_ENDPOINT_BEHAVIOR_DT6}, ++ {"udt4", SAI_MY_SID_ENTRY_ENDPOINT_BEHAVIOR_DT4}, ++ {"udt46", SAI_MY_SID_ENTRY_ENDPOINT_BEHAVIOR_DT46}, ++ 
{"un", SAI_MY_SID_ENTRY_ENDPOINT_BEHAVIOR_UN}, ++ {"ua", SAI_MY_SID_ENTRY_ENDPOINT_BEHAVIOR_UA} +}; + +const map end_flavor_map = +{ + {"end", SAI_MY_SID_ENTRY_ENDPOINT_BEHAVIOR_FLAVOR_PSP_AND_USD}, + {"end.x", SAI_MY_SID_ENTRY_ENDPOINT_BEHAVIOR_FLAVOR_PSP_AND_USD}, + {"end.t", SAI_MY_SID_ENTRY_ENDPOINT_BEHAVIOR_FLAVOR_PSP_AND_USD}, ++ {"un", SAI_MY_SID_ENTRY_ENDPOINT_BEHAVIOR_FLAVOR_PSP_AND_USD}, ++ {"ua", SAI_MY_SID_ENTRY_ENDPOINT_BEHAVIOR_FLAVOR_PSP_AND_USD} +}; +``` + +## Example +uSID carrier is 128-bit IPv6 address which is specified in following format: +```text +...[...] +``` +- uSID Block: An IPv6 prefix (defines a block of SRv6 uSIDs) +- Active uSID: The first uSID +- Next uSID: The next uSID after the Active uSID. +- Last uSID: The last uSID in the carrier before the End-of-Carrier +- End-of-Carrier: A globally reserved uSID that marks the end of a uSID list. The End-of-Carrier ID is 0000. As many End-of-Carriers as required to complete full 128-bits IPv6 address + +![](images/SRv6_uSID_Example.png) + +- uSID block: 2001:41f0 +- Active uSID: 0100 +- Next uSID: 0200 +- Last uSID: 0A00 +- 2 End-of-Carriers (0000) to complete full 128-bits IPv6 address + +A node with local uSID of 2001:41f0:0100 is to be programmed with following SRV6_MY_SID_TABLE entry, with appropriate uSID end behavior. Following shown two separate examples with 2 different end behaviors. + +```text +Note: prefix of "16:8:8:8" is (locator_block_len:locator_node_len:function_len:args_len) as is currently consumed by srv6orch. 
+ +If end-behavior is "un" +"SRV6_MY_SID_TABLE" : { + "16:8:8:8:2001:41f0:0100::" : { + "action": "un" + } +} + +If end-behavior is "udt46" +"SRV6_MY_SID_TABLE" : { + "16:8:8:8:2001:41f0:0100::" : { + "action": "udt46", + "vrf": "VRF-1001" + } +} + +A node with local uSID of 2001:41f0:0200 is to be programmed with the appropriate uSID end behavior; the same applies to the node with 2001:41f0:0500 and the node with 2001:41f0:0A00. +``` diff --git a/doc/srv6/images/SRv6_uSID_Example.png b/doc/srv6/images/SRv6_uSID_Example.png new file mode 100755 index 0000000000..3885802945 Binary files /dev/null and b/doc/srv6/images/SRv6_uSID_Example.png differ diff --git a/index.html b/index.html index 75f120a000..2c088d2205 100644 --- a/index.html +++ b/index.html @@ -95,7 +95,7 @@
  • Building Guide
  • Testing Guide
  • Technical FAQ
  • -
  • SONiC Latest Images
  • +
  • SONiC Latest Images