Maximum Retry Limit for Firmware Upgrades #109

Open
wants to merge 13 commits into
base: develop

dmccoystephenson
Member

PR Details

Description

Problem

The current firmware manager continuously attempts to upgrade the firmware for an RSU, regardless of how many failures occur. This behavior can clog the upgrade queue and waste CPU resources, especially when an RSU is unreachable.

Solution

To address this, a new environment variable has been introduced to set a maximum retry limit for firmware upgrades per RSU. Each time a firmware upgrade fails, the system will increase a 'consecutive failure count' for that RSU. Once the count reaches the maximum retry limit, the target_firmware_version in the rsus table will be updated to match the current firmware_version, effectively halting further upgrades for that RSU. The failure will also be logged in a new Postgres table.
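The retry bookkeeping described above can be sketched roughly as follows. This is a minimal in-memory illustration, not the code from this PR: the environment variable name `FW_UPGRADE_MAX_RETRY_LIMIT`, its default of 3, the `Rsu` dataclass, and both function names are assumptions made for the example. A plain list stands in for the new Postgres table. Resetting the counter after a success is also an assumption, inferred from the word "consecutive".

```python
import os
from dataclasses import dataclass

# Hypothetical variable name and default; the PR text does not state the real ones.
MAX_RETRY_LIMIT = int(os.environ.get("FW_UPGRADE_MAX_RETRY_LIMIT", "3"))


@dataclass
class Rsu:
    """Stand-in for one row of the rsus table."""
    ip: str
    firmware_version: str
    target_firmware_version: str
    consecutive_failures: int = 0


def record_upgrade_failure(rsu, incident_log, max_retries=MAX_RETRY_LIMIT):
    """Count one failed upgrade; return True if the retry limit was reached."""
    rsu.consecutive_failures += 1
    if rsu.consecutive_failures < max_retries:
        return False

    # Stand-in for an INSERT into the new Postgres table.
    incident_log.append({
        "rsu_ip": rsu.ip,
        "abandoned_target": rsu.target_firmware_version,
        "failures": rsu.consecutive_failures,
    })
    # Stand-in for: UPDATE rsus SET target_firmware_version = firmware_version
    rsu.target_firmware_version = rsu.firmware_version
    rsu.consecutive_failures = 0
    return True


def record_upgrade_success(rsu, new_version):
    """Assumed behavior: a success updates the version and clears the counter."""
    rsu.firmware_version = new_version
    rsu.consecutive_failures = 0
```

Because the target version is set equal to the current version once the limit is hit, any scheduler that only queues RSUs whose two versions differ will stop picking that RSU up without needing a separate "disabled" flag.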

Additionally, the firmware manager will now skip upgrade attempts for RSUs that have returned a negative ping result.
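A rough sketch of the ping-based skip, under stated assumptions: the function name `select_upgradable`, the dictionary-shaped RSU records, and the `last_ping` mapping from IP address to a boolean result are all hypothetical. The PR does not say how ping results are stored or what happens to RSUs with no ping record. Here an RSU is skipped only when its most recent result is explicitly negative.

```python
def select_upgradable(rsus, last_ping):
    """Return the RSUs that still need an upgrade and have not failed a ping.

    rsus: iterable of dicts with "ip", "firmware_version", "target_firmware_version".
    last_ping: dict mapping an RSU IP to True (reachable) or False (unreachable).
    """
    upgradable = []
    for rsu in rsus:
        needs_upgrade = rsu["firmware_version"] != rsu["target_firmware_version"]
        # Assumption: a missing ping record does not block the upgrade attempt.
        ping_failed = last_ping.get(rsu["ip"]) is False
        if needs_upgrade and not ping_failed:
            upgradable.append(rsu)
    return upgradable
```

Filtering before an upgrade is queued keeps unreachable RSUs from consuming a retry, so the failure count reflects upgrade failures and not network outages.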

How Has This Been Tested?

Unit tests have been implemented to verify the function calls and behavior of the new methods. Additionally, the firmware manager was spun up in a local Docker network with an Ubuntu container serving as a mock RSU. The container had a signedUpgrade.sh script in its /bin directory that only printed a hard-coded failure alert. The container's IP address was set as the IP address of a sample RSU, and three firmware upgrades were triggered manually by calling the relevant endpoint.

The firmware manager modified the target_firmware_version to firmware_version as expected and logged the incident to the 'max_retry_limit_reached_instances' table.

image

Types of changes

  • Defect fix (non-breaking change that fixes an issue)
  • [ x ] New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that causes existing functionality to change)

Checklist:

  • [ x ] My changes require new environment variables:
    • [ / ] I have updated the docker-compose, K8s YAML, and all dependent deployment configuration files.
  • My changes require updates to the documentation:
    • I have updated the documentation accordingly.
  • [ x ] My changes require updates and/or additions to the unit tests:
    • [ x ] I have modified/added tests to cover my changes.
  • [ x ] All existing tests pass.

Collaborator

@drewjj drewjj left a comment

This looks great! A lot for such a simple check (conceptually) but I think the corner cases covered are all necessary additions so I have no complaints. All unit tests pass.

Collaborator

@mwodahl mwodahl left a comment

This all looks good to me!

4 participants