spawn failing after stopping and starting redis container #242

Closed
kellyrowland opened this issue May 17, 2024 · 13 comments
Labels
bug Something isn't working

Comments

@kellyrowland

Bug description

I'm running a local JupyterHub instance using TraefikRedisProxy, where JupyterHub, Traefik, and Redis each have their own container and all three containers are brought up with Docker Compose. After bringing everything up, if I stop and start the Redis container, a server fails to spawn.

How to reproduce

  1. Build jupyterhub container with docker build -t jupyterhub-demo-redis .
  2. Run docker-compose up redis traefik jupyterhub
  3. Navigate to http://127.0.0.1:8000/ in a browser
  4. Enter arbitrary credentials and click on "Sign In"
  5. Log out of Jupyterhub
  6. Run docker-compose stop redis
  7. Run docker-compose start redis
  8. Navigate to http://127.0.0.1:8000/ in a browser
  9. Sign in with a different username, or spawn a differently-named server, than in step 4

I've included the Dockerfile, docker-compose.yml, jupyterhub_config.py, and traefik.toml files in the files.tgz archive attached:

files.tgz

Expected behaviour

Redis should be able to stop and start without interfering with server spawning.

Actual behaviour

Stopping and starting Redis causes servers with new routes to fail to spawn.

Your personal set up

  • Installing the latest commit of the traefik-proxy repo main branch via pip in the Jupyterhub image
  • DummyAuthenticator
  • SimpleSpawner
  • Traefik 3
  • Redis 7.2
  • Using Docker Desktop 4.17.0 on macOS Monterey 12.7.2

Jupyterhub container log:

jupyterhub.log

@kellyrowland kellyrowland added the bug Something isn't working label May 17, 2024
@minrk
Member

minrk commented May 22, 2024

I think I know what's happening here. Since redis doesn't have persistent storage, the config created by the _setup_traefik_dynamic_config step gets lost when redis restarts, and isn't recreated until JupyterHub itself restarts.

I think the fix is to call _setup_traefik_dynamic_config at the right time to recreate it. If that's the case, enabling redis data persistence should help.

@rcthomas
Contributor

Hi @minrk, I think I can see why that might cause new server routes not to work, but why the Redis ConnectionError?

@manics
Member

manics commented May 22, 2024

The Redis client is initialised once:

redis = Any()

@default("redis")
def _connect_redis(self):
    try:
        from redis.asyncio import Redis
    except ImportError:
        raise ImportError(
            "Please install `redis` package to use traefik-proxy with redis"
        )

    url = urlparse(self.redis_url)
    if url.port:
        port = url.port
    else:
        # default port
        port = 6379

    kwargs = dict(
        host=url.hostname,
        port=port,
        decode_responses=True,
    )
    if self.redis_password:
        kwargs["password"] = self.redis_password
    if self.redis_username:
        kwargs["username"] = self.redis_username
    kwargs.update(self.redis_client_kwargs)
    return Redis(**kwargs)

so maybe the connection is invalidated by the restart?
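One way to see that effect in isolation is a small script against a plain redis.asyncio client (a sketch, not taken from traefik-proxy; it assumes a Redis instance reachable on localhost:6379):

# Sketch: show that a pooled connection goes stale across a server restart.
# Not part of traefik-proxy; assumes a local Redis on localhost:6379.
import asyncio

from redis.asyncio import Redis
from redis.exceptions import ConnectionError


async def main():
    r = Redis(host="localhost", port=6379, decode_responses=True)
    await r.set("demo", "1")  # works while Redis is up
    input("stop and start the Redis container, then press Enter...")
    try:
        # the first command after the restart reuses the old pooled connection
        await r.mset({"demo": "2"})
    except ConnectionError as e:
        print("pooled connection was invalidated by the restart:", e)


asyncio.run(main())

With redis-py's default settings (no retry_on_error configured), that first command after the restart surfaces the same "Connection closed by server." error seen in the traceback below.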

@rcthomas
Contributor

rcthomas commented May 22, 2024

@manics the asyncio Redis client uses a connection pool by default right?

@manics
Member

manics commented May 22, 2024

@minrk
Member

minrk commented May 22, 2024

We should be using a default retry, which would fix the ConnectionError, I think.
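For illustration, a default retry on the client might look roughly like the following (a sketch only: the host, backoff, and attempt count are placeholders, not the proxy's actual defaults):

# Sketch: what a built-in default retry could look like for the redis.asyncio client.
# The host and the backoff/attempt values are placeholders.
from redis.asyncio import Redis
from redis.asyncio.retry import Retry
from redis.backoff import ExponentialBackoff
from redis.exceptions import BusyLoadingError, ConnectionError, TimeoutError

retry = Retry(ExponentialBackoff(), 3)
redis = Redis(
    host="redis",
    port=6379,
    decode_responses=True,
    retry=retry,
    # retry (and reconnect) instead of raising on the first failed command
    retry_on_error=[BusyLoadingError, ConnectionError, TimeoutError],
)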

@kellyrowland
Author

Thanks for the suggestion on trying Redis persistence - unfortunately it doesn't seem to change the behavior. I changed the service to:

  redis:
    image: redis/redis-stack:latest
    volumes:
      - $PWD/redis-data:/data

in docker-compose.yml but am still seeing a ConnectionError after a stop/start of redis and trying to spawn a server with a new name:

github-jupyterhub-1  | [D 2024-05-23 17:35:12.165 JupyterHub redis:93] Setting redis keys dict_keys(['traefik/http/routers/router__2Fuser_2Ft9_2Fnew_2F/service', 'traefik/http/routers/router__2Fuser_2Ft9_2Fnew_2F/rule', 'traefik/http/routers/router__2Fuser_2Ft9_2Fnew_2F/entryPoints/0', 'traefik/http/services/service__2Fuser_2Ft9_2Fnew_2F/loadBalancer/servers/0/url', 'traefik/http/services/service__2Fuser_2Ft9_2Fnew_2F/loadBalancer/passHostHeader', 'jupyterhub/routes/router__2Fuser_2Ft9_2Fnew_2F/data/user', 'jupyterhub/routes/router__2Fuser_2Ft9_2Fnew_2F/data/server_name', 'jupyterhub/routes/router__2Fuser_2Ft9_2Fnew_2F/routespec', 'jupyterhub/routes/router__2Fuser_2Ft9_2Fnew_2F/target', 'jupyterhub/routes/router__2Fuser_2Ft9_2Fnew_2F/router', 'jupyterhub/routes/router__2Fuser_2Ft9_2Fnew_2F/service'])
github-jupyterhub-1  | [E 2024-05-23 17:35:12.166 JupyterHub base:994] Failed to add t9:new to proxy!
github-jupyterhub-1  |     Traceback (most recent call last):
github-jupyterhub-1  |       File "/usr/local/lib/python3.10/dist-packages/jupyterhub/handlers/base.py", line 987, in finish_user_spawn
github-jupyterhub-1  |         await self.proxy.add_user(user, server_name)
github-jupyterhub-1  |       File "/usr/local/lib/python3.10/dist-packages/jupyterhub/proxy.py", line 343, in add_user
github-jupyterhub-1  |         await self.add_route(
github-jupyterhub-1  |       File "/usr/local/lib/python3.10/dist-packages/jupyterhub_traefik_proxy/proxy.py", line 687, in add_route
github-jupyterhub-1  |         await self._apply_dynamic_config(traefik_config, jupyterhub_config)
github-jupyterhub-1  |       File "/usr/local/lib/python3.10/dist-packages/jupyterhub_traefik_proxy/kv_proxy.py", line 127, in _apply_dynamic_config
github-jupyterhub-1  |         await self._kv_atomic_set(to_set)
github-jupyterhub-1  |       File "/usr/local/lib/python3.10/dist-packages/jupyterhub_traefik_proxy/redis.py", line 94, in _kv_atomic_set
github-jupyterhub-1  |         await self.redis.mset(to_set)
github-jupyterhub-1  |       File "/usr/local/lib/python3.10/dist-packages/redis/asyncio/client.py", line 612, in execute_command
github-jupyterhub-1  |         return await conn.retry.call_with_retry(
github-jupyterhub-1  |       File "/usr/local/lib/python3.10/dist-packages/redis/asyncio/retry.py", line 62, in call_with_retry
github-jupyterhub-1  |         await fail(error)
github-jupyterhub-1  |       File "/usr/local/lib/python3.10/dist-packages/redis/asyncio/client.py", line 599, in _disconnect_raise
github-jupyterhub-1  |         raise error
github-jupyterhub-1  |       File "/usr/local/lib/python3.10/dist-packages/redis/asyncio/retry.py", line 59, in call_with_retry
github-jupyterhub-1  |         return await do()
github-jupyterhub-1  |       File "/usr/local/lib/python3.10/dist-packages/redis/asyncio/client.py", line 586, in _send_command_parse_response
github-jupyterhub-1  |         return await self.parse_response(conn, command_name, **options)
github-jupyterhub-1  |       File "/usr/local/lib/python3.10/dist-packages/redis/asyncio/client.py", line 633, in parse_response
github-jupyterhub-1  |         response = await connection.read_response()
github-jupyterhub-1  |       File "/usr/local/lib/python3.10/dist-packages/redis/asyncio/connection.py", line 541, in read_response
github-jupyterhub-1  |         response = await self._parser.read_response(
github-jupyterhub-1  |       File "/usr/local/lib/python3.10/dist-packages/redis/_parsers/resp2.py", line 82, in read_response
github-jupyterhub-1  |         response = await self._read_response(disable_decoding=disable_decoding)
github-jupyterhub-1  |       File "/usr/local/lib/python3.10/dist-packages/redis/_parsers/resp2.py", line 90, in _read_response
github-jupyterhub-1  |         raw = await self._readline()
github-jupyterhub-1  |       File "/usr/local/lib/python3.10/dist-packages/redis/_parsers/base.py", line 221, in _readline
github-jupyterhub-1  |         raise ConnectionError(SERVER_CLOSED_CONNECTION_ERROR)
github-jupyterhub-1  |     redis.exceptions.ConnectionError: Connection closed by server.
github-jupyterhub-1  |     
github-jupyterhub-1  | [E 2024-05-23 17:35:12.179 JupyterHub base:995] Stopping t9:new to avoid inconsistent state
github-jupyterhub-1  | [D 2024-05-23 17:35:12.179 JupyterHub user:931] Stopping t9:new
github-jupyterhub-1  | [D 2024-05-23 17:35:12.179 JupyterHub spawner:1760] Interrupting 62
github-jupyterhub-1  | [I 2024-05-23 17:35:12.180 SingleUserLabApp serverapp:3108] Interrupted...
github-jupyterhub-1  | /usr/local/lib/python3.10/dist-packages/jupyterhub/orm.py:664: SAWarning: The argument signature for the "ConnectionEvents.engine_connect" event listener has changed as of version 2.0, and conversion for the old argument signature will be removed in a future release.  The new signature is "def engine_connect(conn)" (This warning originated from the Session 'autoflush' process, which was invoked automatically in response to a user-initiated operation.)
github-jupyterhub-1  |   for orm_token in prefix_match:
github-jupyterhub-1  | [D 2024-05-23 17:35:13.138 JupyterHub user:951] Deleting oauth client jupyterhub-user-t9-new
github-jupyterhub-1  | [D 2024-05-23 17:35:13.156 JupyterHub user:954] Finished stopping t9:new
github-jupyterhub-1  | [W 2024-05-23 17:35:13.174 JupyterHub users:764] Server t9:new didn't start for unknown reason

@manics
Member

manics commented May 24, 2024

There are two issues here:

  • Loss of persistent state, solved by adding a volume for Redis
  • Loss of the connection to Redis; this requires a retry on failure, as mentioned above.

@kellyrowland
Author

This does not appear to address the issue. I added:

from redis.backoff import ExponentialBackoff
from redis.retry import Retry
from redis.exceptions import (
   BusyLoadingError,
   ConnectionError,
   TimeoutError
)

retry = Retry(ExponentialBackoff(), 3)
c.TraefikRedisProxy.redis_client_kwargs = dict(
    retry=retry,
    retry_on_error=[BusyLoadingError, ConnectionError, TimeoutError]
) 

to jupyterhub_config.py, in addition to the Redis container volume in docker-compose.yml, and am still seeing `redis.exceptions.ConnectionError: Connection closed by server.` after a stop and start of Redis when trying a differently-named server spawn.

@minrk
Member

minrk commented May 27, 2024

The retry fix should work if you import Retry from redis.asyncio.retry instead of from redis.retry.

But the root problem seems to be on the traefik side, where we seem to be hitting traefik/traefik#6832, which was closed optimistically based on a dependency update but without a confirmed fix. traefik/traefik#8749 (comment) suggests that we need to enable keyspace events in order for traefik's watch to work properly. Why the issue only occurs after a disconnect rather than always is unclear. 🤷

Combining all that:

  1. adding retry to redis:
from redis.backoff import ExponentialBackoff
from redis.asyncio.retry import Retry
from redis.exceptions import (
   BusyLoadingError,
   ConnectionError,
   TimeoutError
)

retry = Retry(ExponentialBackoff(), 3)
c.TraefikRedisProxy.redis_client_kwargs = dict(
    retry=retry,
    retry_on_error=[BusyLoadingError, ConnectionError, TimeoutError]
) 
  2. enabling persistence and keyspace notifications in redis:
  redis:
    environment:
      REDIS_ARGS: "--appendonly yes --notify-keyspace-events KEA"

the problem seems to go away for me.
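For reference, one way to confirm that the keyspace-notification flag actually took effect is to query it through the same redis-py asyncio client (a sketch; it assumes the redis host/port used above):

# Sketch: verify (or set) notify-keyspace-events at runtime via redis-py.
# Assumes Redis is reachable at redis:6379, as in the compose setup above.
import asyncio

from redis.asyncio import Redis


async def main():
    r = Redis(host="redis", port=6379, decode_responses=True)
    # shows the current flags; "KEA" (or an equivalent superset) enables the
    # generic keyspace/keyevent notifications that traefik's watch relies on
    print(await r.config_get("notify-keyspace-events"))
    # the flag can also be flipped at runtime instead of via REDIS_ARGS,
    # but it won't survive a restart unless it is in the server config
    await r.config_set("notify-keyspace-events", "KEA")


asyncio.run(main())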

@minrk
Member

minrk commented May 29, 2024

#244 adds default retry config, while #246 documents the requirement for persistence, and with those two everything seems to work for me.

I opened #247 as a separate issue for the loss/restore of the initial dynamic config, which is part of why we currently have the implicit assumption of persistence.

@kellyrowland
Author

Thanks! The latest commit looks good on my end, both locally and with a k8s setup that we've got, so I'll close this out with those two PRs in.

@minrk
Member

minrk commented May 30, 2024

Thanks for reporting and testing!
