Fix guest-api ConnectionClosedError: Reader at end of file #739

olethanh
Collaborator

Sentry Issue: ALEPH-VM-STAGING-41
Jira Issue: ALEPH-353

This error was constantly making the diagnostic VM fail, raising about 3K errors in 48h on Sentry.

In aleph.vm.guest_api.__main__.put_in_cache

ConnectionClosedError: Reader at end of file
  File "aiohttp/web_app.py", line 569, in _handle
    return await handler(request)
  File "aleph/vm/guest_api/__main__.py", line 128, in put_in_cache
    return web.json_response(await redis.set(f"{prefix}:{key}", value, expire=CACHE_EXPIRES_AFTER))

Investigation
The error started on Jan 12, 2025 at 7:26:47 AM CET.
The Redis server was restarted around the same time by the server's unattended-upgrades (apt).

Analysis
The guest API for the diagnostic VM lost its connection to the Redis server (a Unix socket connection) when Redis was restarted. Since the guest API always reuses the same connection, every later request triggered the error.

In addition, because the diagnostic VM is called regularly by monitoring services, it never times out and stops, so the init process that establishes the Redis connection was never run again.

Solution
Check whether the Redis connection is still alive by pinging the service. If the ping raises an error, create a new connection.
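
The ping-then-reconnect idea can be sketched as follows. This is a minimal illustration, not the code merged in this PR: `ReconnectingClient` and the `connect` factory are hypothetical names, and the only assumption about the connection object is that it exposes an awaitable `ping()` that raises when the connection is dead.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional


class ReconnectingClient:
    """Hypothetical helper: keep one connection, replace it when a ping fails."""

    def __init__(self, connect: Callable[[], Awaitable[Any]]) -> None:
        # `connect` is an assumed async factory that opens a fresh connection.
        self._connect = connect
        self._conn: Optional[Any] = None

    async def get(self) -> Any:
        """Return a connection that answered a ping, reconnecting if needed."""
        if self._conn is not None:
            try:
                await self._conn.ping()
                return self._conn
            except Exception:
                # Broad on purpose in this sketch; real code would catch only
                # the client library's connection errors.
                self._conn = None
        self._conn = await self._connect()
        return self._conn
```

A handler would then call `await client.get()` before each cache operation instead of reusing a connection created once at startup. The cost is one extra round trip per request, which seems acceptable for a local connection.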

How to test
Start the CRN, then call the diagnostic VM Redis endpoint:
http://localhost:4020/vm/63faf8b5db1cf8d965e6a464a0cb8062af8e7df131729e48738342d956f29ace/cache/get/a

Then restart the redis service on the CRN

systemctl restart redis-server

and call the diagnostic VM Redis endpoint again.
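
The two endpoint calls can be scripted so the result is easy to compare before and after the restart. This is only a sketch: `cache_get_url` and `endpoint_status` are hypothetical helpers, and the URL layout is taken from the example above.

```python
import urllib.error
import urllib.request


def cache_get_url(base_url: str, vm_hash: str, key: str) -> str:
    """Build the cache 'get' URL using the layout shown in the test steps."""
    return f"{base_url.rstrip('/')}/vm/{vm_hash}/cache/get/{key}"


def endpoint_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code, including for 4xx/5xx responses."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises on error statuses; report the code instead.
        return err.code
```

Run `endpoint_status(cache_get_url("http://localhost:4020", vm_hash, "a"))` once before and once after restarting Redis. With the fix applied, the second call should not return a server error.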

Collaborator

@Psycojoker Psycojoker left a comment

LGTM

@olethanh olethanh merged commit 76c6897 into main Jan 16, 2025
20 of 22 checks passed