Fix guest-api ConnectionClosedError: Reader at end of file
Sentry Issue: ALEPH-VM-STAGING-41
Jira Issue: ALEPH-353

This error was making the diagnostic VM check fail constantly, raising 3K errors
in 48h on Sentry.

In aleph.vm.guest_api.__main__.put_in_cache
```
ConnectionClosedError: Reader at end of file
  File "aiohttp/web_app.py", line 569, in _handle
    return await handler(request)
  File "aleph/vm/guest_api/__main__.py", line 128, in put_in_cache
    return web.json_response(await redis.set(f"{prefix}:{key}", value, expire=CACHE_EXPIRES_AFTER))
```

*Investigation*
The error started on Jan 12, 2025 at 7:26:47 AM CET.
The redis server was restarted around the same time by the
server's unattended-upgrades (apt).

*Analysis*
The guest api for the diagnostic VM lost its connection to the redis server (via unix
socket) when redis was restarted. Since the guest api always reuses
the same connection, the error was triggered on every call.

In addition, as the diagnostic VM is called regularly by monitoring
services, it never times out and stops, so the init process that
establishes the redis connection was never run again.

*Solution*
Check that the redis connection is still ok by pinging the service. If the ping
raises an error, create a new connection.

*How to test*
Start the CRN and call the diagnostic VM redis endpoint:
http://localhost:4020/vm/63faf8b5db1cf8d965e6a464a0cb8062af8e7df131729e48738342d956f29ace/cache/get/a

Then restart the redis service on the CRN

```bash
systemctl restart redis
```

and call the diagnostic VM redis endpoint again.
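
The manual check above could also be scripted. The helper below is only a sketch: `endpoint_status` and `CACHE_URL` are hypothetical names, and it assumes that a failing redis connection shows up as a non-200 HTTP status on this endpoint, which the commit message does not state.

```python
import urllib.error
import urllib.request

# URL taken from the test instructions above.
CACHE_URL = (
    "http://localhost:4020/vm/"
    "63faf8b5db1cf8d965e6a464a0cb8062af8e7df131729e48738342d956f29ace"
    "/cache/get/a"
)


def endpoint_status(url=CACHE_URL, opener=urllib.request.urlopen, timeout=10.0):
    """Return the HTTP status code of the endpoint.

    HTTP error responses are returned as their status code instead of
    being raised, so the result can be compared before and after the
    redis restart.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code
```

Call `endpoint_status()` once before `systemctl restart redis` and once after; with the fix applied, both calls would be expected to return the same status.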
olethanh committed Jan 14, 2025
1 parent c2ad82a commit a2b9ed0
Showing 1 changed file with 8 additions and 1 deletion.
9 changes: 8 additions & 1 deletion src/aleph/vm/guest_api/__main__.py
@@ -24,8 +24,15 @@

 async def get_redis(address: str = REDIS_ADDRESS) -> aioredis.Redis:
     global _redis
-    if _redis is None:
+    # Ensure the redis connection is still up before returning it
+    if _redis:
+        try:
+            await _redis.ping()
+        except aioredis.ConnectionClosedError:
+            _redis = None
+    if not _redis:
         _redis = await aioredis.create_redis(address=address)
+
     return _redis

