Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add redis implementation #230

Merged
merged 8 commits into from
Jan 25, 2024
Merged

add redis implementation #230

merged 8 commits into from
Jan 25, 2024

Conversation

minrk
Copy link
Member

@minrk minrk commented Jan 18, 2024

based on the situation described in #155, it appears that redis is still the only kv store with a well supported Python client.

After the refactor of the KV Store base class, adding a new kv store backend is very easy. This is the first one implemented since the refactor, and nothing needed to change (other than handling when cleanup may be async). So that worked well!

When I first ran the benchmarks, most things were fine, but delete was extremely slow and O(number of routes).

Screenshot 2024-01-18 at 14 31 34

It was implemented as:

async for sub_key in self.redis.scan_iter(key + "*"):
    to_delete.append(sub_key)
await self.redis.delete(*to_delete)

However, redis performance recommendations suggested a lua script to run essentially the exact same code on the redis host itself. This seems to dramatically improve performance, and bring it in line or better than the other KV stores (see the benchmark notebook):

Screenshot 2024-01-18 at 16 16 27

I don't know if the Python delete implementation could be improved much, but the lua is simple enough and very quick, I don't see much reason it would need to be modified.

The only place where performance wasn't great was the add_route measurement with concurrency=50. I don't see 50 simultaneous transactions on the routing table as a super likely situation, so I'm not too worried about it. And at least the performance wasn't outrageous from a JupyterHub perspective (up to a few seconds at the longest). Performance was good up to concurrency=20, and significantly better than others for get_all.

Given the relative ease and support level of using redis, I think redis should be the default traefik backend, and the one we recommend with this package.

TODO:

  • implement redis
  • tests
  • benchmarks
  • update CI for redis tests
  • update docs

@minrk minrk added the new new features label Jan 18, 2024
@manics
Copy link
Member

manics commented Jan 18, 2024

Sounds fine to me. If we make Redis the default backend can we also make it a mandatory requirement so it works out-of-the-box? The dependencies of redis-py are pretty minimal
https://github.com/redis/redis-py/blob/v5.0.1/setup.py#L38-L42

separate each backend installation
@minrk
Copy link
Member Author

minrk commented Jan 19, 2024

Yeah, I think that's a sensible idea.

@minrk
Copy link
Member Author

minrk commented Jan 19, 2024

I added docs, and ultimately decided not to add it to default dependencies. The reason being that the most common place this package is used in tljh, where redis is not needed. So it's not quite true that it's the default backend, it's the default distributed backend. I don't think documenting pip install jupyterhub-traefik-proxy redis is a huge burden, though w could have extras for the backends (especially since etcd/consul clients appear to be a moving target).

Copy link
Member

@manics manics left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I forgot the main use case for a distributed proxy will involve containers. Based on this I don't think it's worth adding extras since that can be handled in the container build file.

I've made a pass through the docs, but I haven't looked/tried the code yet.

docs/source/redis.md Outdated Show resolved Hide resolved
docs/source/index.md Outdated Show resolved Hide resolved
@@ -70,6 +72,7 @@ you can get etcd or consul via their respective release pages:
Or, more likely, select the appropriate container image.
You will also need to install a Python client for the Key-Value store of your choice:

- `redis`
- `etcdpy`
- `python-consul2`

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind updating the section below (https://jupyterhub-traefik-proxy--230.org.readthedocs.build/en/230/install.html#enabling-traefik-proxy-in-jupyterhub) too, to add redis, and to make it the example config instead of etcd? I think it'd be nice to replace the GitHub source file links with links to the Using TraefikXXXProxy pages instead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated, thanks

docs/source/redis.md Outdated Show resolved Hide resolved
docs/source/redis.md Outdated Show resolved Hide resolved
docs/source/redis.md Outdated Show resolved Hide resolved
docs/source/redis.md Outdated Show resolved Hide resolved
docs/source/redis.md Outdated Show resolved Hide resolved
docs/source/redis.md Outdated Show resolved Hide resolved
docs/source/redis.md Outdated Show resolved Hide resolved
Co-authored-by: Simon Li <orpheus+devel@gmail.com>
Copy link
Member

@consideRatio consideRatio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd love to see that redis client become an extra dependency, because there can be version constraints we want to encode and we can if we establish a pattern of installing redis client as part of installing this packge

docs/source/install.md Outdated Show resolved Hide resolved
Copy link
Member

@consideRatio consideRatio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to see redis client installed as an extra dep to help us manage constraints long term, but besides that this looks great from what i can say.

minrk and others added 2 commits January 25, 2024 09:08
Co-authored-by: Erik Sundell <erik.i.sundell@gmail.com>
so you can pip install jupyterhub-traefik-proxy[redis]
@consideRatio
Copy link
Member

Everything seems okay to go, nice work!

@consideRatio consideRatio merged commit 2af884a into jupyterhub:main Jan 25, 2024
9 checks passed
Copy link
Member

@GeorgianaElena GeorgianaElena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, the performance difference with using lua for deleting routes. Also, it's so nice to see how straightforward it is right now, after the refactoring to add support for a new kv store.

It's a bummer that etcd and consul python clients don't have better support, so having this redis implementation and recommending it as the default sounds like a very good decision. Thank you!

async def _kv_get_tree(self, prefix):
"""Return all data under prefix as a dict

Should probably use `unflatten_dict_from_kv`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that unflatten_dict_from_kv is already used here, so maybe it's ok for this comment to go?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add a commit about this in #231!

@minrk minrk deleted the redis branch January 25, 2024 09:52
manics added a commit to manics/traefik-proxy that referenced this pull request May 24, 2024
Etcd and Consul have limited support jupyterhub#230
manics added a commit to manics/traefik-proxy that referenced this pull request May 24, 2024
Etcd and Consul have limited support jupyterhub#230
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
new new features
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants