-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve service checker for gnmi/telemetry container #19153
Conversation
/azpw ms_conflict |
@qiluo-msft , can you please review this PR ? an if it is approved, can you please merge ? |
@qiluo-msft kindly reminder to review. it should go to 202311 and 202405 |
""" | ||
try: | ||
DOCKER_CLIENT = docker.DockerClient(base_url='unix://var/run/docker.sock') | ||
DOCKER_CLIENT.images.get(image_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there an assumption that images.get function will drop an exception when image is not there? Should we also check what it returns other than depend the logic of images.get function? For example, if the images.get refined to return a null for a search miss, it would still return True by check_docker_image. @ganglyu
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function will raise exception if the image does not exist:
https://docker-py.readthedocs.io/en/stable/images.html#docker.models.images.ImageCollection.get
Raises:
docker.errors.ImageNotFound – If the image does not exist.
docker.errors.APIError – If the server returns an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this change looks like more a work around, but I'm ok with it to deal with the difficulty of warm reboot case.
@ganglyu could you help to update the description about the details of difficulty of warm reboot case?
@qiluo-msft to take a look before merge. |
@qiluo-msft , Can you please approve and merge ? |
Why I did it Fix sonic-net#19081 We have used gnmi container to replace telemetry container, and telemetry is still enabled after upgrade. service_checker script reads from features table and check if the container is running, telemetry is enabled but there's no telemetry container. It's difficult to disable telemetry in feature table for warm reboot and cold reboot, we need to check docker image in db migrator and minigraph.py. When we use warm reboot to upgrade from 202305 to 202311, config_db still has telemetry configuration, and we can't simply remove related configuration. Work item tracking Microsoft ADO (number only): How I did it I modify service_checker script: If there's docker-sonic-telemetry image, check telemetry container. If there's no docker-sonic-telemetry image, check gnmi container instead. If there's no docker-sonic-telemetry image and docker-sonic-gnmi image, do not check telemetry. How to verify it Run unit test and end to end test.
Cherry-pick PR to 202405: #19332 |
Why I did it Fix #19081 We have used gnmi container to replace telemetry container, and telemetry is still enabled after upgrade. service_checker script reads from features table and check if the container is running, telemetry is enabled but there's no telemetry container. It's difficult to disable telemetry in feature table for warm reboot and cold reboot, we need to check docker image in db migrator and minigraph.py. When we use warm reboot to upgrade from 202305 to 202311, config_db still has telemetry configuration, and we can't simply remove related configuration. Work item tracking Microsoft ADO (number only): How I did it I modify service_checker script: If there's docker-sonic-telemetry image, check telemetry container. If there's no docker-sonic-telemetry image, check gnmi container instead. If there's no docker-sonic-telemetry image and docker-sonic-gnmi image, do not check telemetry. How to verify it Run unit test and end to end test.
@yxieca , Can you please cherry pick to 202311 ? |
@ganglyu can you provide the ADO number? |
Why I did it Fix sonic-net#19081 We have used gnmi container to replace telemetry container, and telemetry is still enabled after upgrade. service_checker script reads from features table and check if the container is running, telemetry is enabled but there's no telemetry container. It's difficult to disable telemetry in feature table for warm reboot and cold reboot, we need to check docker image in db migrator and minigraph.py. When we use warm reboot to upgrade from 202305 to 202311, config_db still has telemetry configuration, and we can't simply remove related configuration. Work item tracking Microsoft ADO (number only): How I did it I modify service_checker script: If there's docker-sonic-telemetry image, check telemetry container. If there's no docker-sonic-telemetry image, check gnmi container instead. If there's no docker-sonic-telemetry image and docker-sonic-gnmi image, do not check telemetry. How to verify it Run unit test and end to end test.
Cherry-pick PR to 202311: #19360 |
Why I did it Fix #19081 We have used gnmi container to replace telemetry container, and telemetry is still enabled after upgrade. service_checker script reads from features table and check if the container is running, telemetry is enabled but there's no telemetry container. It's difficult to disable telemetry in feature table for warm reboot and cold reboot, we need to check docker image in db migrator and minigraph.py. When we use warm reboot to upgrade from 202305 to 202311, config_db still has telemetry configuration, and we can't simply remove related configuration. Work item tracking Microsoft ADO (number only): How I did it I modify service_checker script: If there's docker-sonic-telemetry image, check telemetry container. If there's no docker-sonic-telemetry image, check gnmi container instead. If there's no docker-sonic-telemetry image and docker-sonic-gnmi image, do not check telemetry. How to verify it Run unit test and end to end test. Co-authored-by: ganglv <88995770+ganglyu@users.noreply.github.com>
Why I did it Fix sonic-net#19081 We have used gnmi container to replace telemetry container, and telemetry is still enabled after upgrade. service_checker script reads from features table and check if the container is running, telemetry is enabled but there's no telemetry container. It's difficult to disable telemetry in feature table for warm reboot and cold reboot, we need to check docker image in db migrator and minigraph.py. When we use warm reboot to upgrade from 202305 to 202311, config_db still has telemetry configuration, and we can't simply remove related configuration. Work item tracking Microsoft ADO (number only): How I did it I modify service_checker script: If there's docker-sonic-telemetry image, check telemetry container. If there's no docker-sonic-telemetry image, check gnmi container instead. If there's no docker-sonic-telemetry image and docker-sonic-gnmi image, do not check telemetry. How to verify it Run unit test and end to end test.
Why I did it
Fix #19081
We have used gnmi container to replace telemetry container, and telemetry is still enabled after upgrade.
service_checker script reads from features table and check if the container is running, telemetry is enabled but there's no telemetry container.
It's difficult to disable telemetry in feature table for warm reboot and cold reboot, we need to check docker image in db migrator and minigraph.py.
When we use warm reboot to upgrade from 202305 to 202311, config_db still has telemetry configuration, and we can't simply remove related configuration.
Work item tracking
How I did it
I modify service_checker script:
If there's docker-sonic-telemetry image, check telemetry container.
If there's no docker-sonic-telemetry image, check gnmi container instead.
If there's no docker-sonic-telemetry image and docker-sonic-gnmi image, do not check telemetry.
How to verify it
Run unit test and end to end test.
Which release branch to backport (provide reason below if selected)
Tested branch (Please provide the tested image version)
Description for the changelog
Link to config_db schema for YANG module changes
A picture of a cute animal (not mandatory but encouraged)