Fix guest-api ConnectionClosedError: Reader at end of file #739
Sentry Issue: ALEPH-VM-STAGING-41
Jira Issue: ALEPH-353
This error was constantly taking the diagnostic VM down, raising 3K errors in 48h on Sentry.
The error is raised in aleph.vm.guest_api.main.put_in_cache.
Investigation
The error started on Jan 12, 2025 at 7:26:47 AM CET.
The Redis server was restarted around the same time by the server's unattended-upgrades (apt).
Analysis
The guest API of the diagnostic VM lost its connection to the Redis server (over a unix socket) when Redis was restarted. Since the guest API always reuses the same connection, every subsequent call triggered the error.
In addition, because the diagnostic VM is called regularly by monitoring services, it never times out and stops, so the init process that establishes the Redis connection was never rerun.
Solution
Check whether the Redis connection is still alive by pinging the server; if the ping raises an error, create a new connection.
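A minimal sketch of the ping-then-reconnect check, assuming an async client that exposes a `ping()` coroutine. `ReconnectingClient`, `FakeRedis` and `simulate_restart` are hypothetical names for illustration, not the code merged in aleph-vm; `FakeRedis` stands in for a real Redis client so the sketch runs without a server.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Protocol


class PingableClient(Protocol):
    """Anything exposing an async ping(), e.g. a Redis client (assumption)."""

    async def ping(self) -> object: ...


class ReconnectingClient:
    """Hypothetical helper: cache one client and rebuild it when ping() fails."""

    def __init__(self, factory: Callable[[], Awaitable[PingableClient]]) -> None:
        self._factory = factory
        self._client: Optional[PingableClient] = None
        self._lock: Optional[asyncio.Lock] = None  # created lazily inside the event loop
        self.connections_created = 0

    async def get(self) -> PingableClient:
        if self._lock is None:
            self._lock = asyncio.Lock()
        # Serialize callers so concurrent requests do not open duplicate connections.
        async with self._lock:
            if self._client is not None:
                try:
                    # Cheap liveness check before reusing the cached connection.
                    await self._client.ping()
                    return self._client
                except Exception:
                    # The server went away (e.g. Redis restarted by unattended-upgrades).
                    # Real code should catch the client library's specific connection
                    # error and close the stale client here.
                    self._client = None
            self._client = await self._factory()
            self.connections_created += 1
            return self._client


class FakeRedis:
    """Stand-in for a Redis client so the sketch runs without a server."""

    def __init__(self) -> None:
        self.alive = True

    async def ping(self) -> bool:
        if not self.alive:
            raise ConnectionError("Reader at end of file")
        return True


async def simulate_restart() -> int:
    async def factory() -> FakeRedis:
        return FakeRedis()

    conn = ReconnectingClient(factory)
    first = await conn.get()
    assert await conn.get() is first  # healthy: the connection is reused
    first.alive = False               # simulate the Redis restart
    second = await conn.get()         # ping fails, so a new connection is created
    assert second is not first
    return conn.connections_created


if __name__ == "__main__":
    print(asyncio.run(simulate_restart()))  # prints 2
```

Pinging before each use costs one extra round trip per request, which should be cheap over a local unix socket; an alternative is to catch the connection error on the actual command and retry once with a fresh connection.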
How to test
Start the CRN and call the diagnostic VM Redis endpoint:
http://localhost:4020/vm/63faf8b5db1cf8d965e6a464a0cb8062af8e7df131729e48738342d956f29ace/cache/get/a
Then restart the Redis service on the CRN and call the diagnostic VM Redis endpoint again. With this fix, the second call should succeed instead of raising ConnectionClosedError.