Provide memory usage limits on session creation from startup params #545

Closed
scsmithr opened this issue Jan 25, 2023 · 12 comments · Fixed by #890
Labels
feat 🎇 New feature or request

Comments

@scsmithr
Member

scsmithr commented Jan 25, 2023

Summary

When making a request to Cloud for connection authentication, Cloud responds with the database ID. Eventually Cloud should also respond with an optional set of resource limits. For example, a session connecting to a database on a shared node should be limited to 0.5 GB of memory. This would then be used during session creation to create a RuntimeEnv with that limit.

We'll want to limit disk/CPU usage in the future as well, but I'm not yet sure how we'd accomplish that at the session level.
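
To illustrate, here's a minimal sketch of what creating a RuntimeEnv with such a limit could look like, assuming DataFusion's `RuntimeConfig::with_memory_limit` API (exact names may differ by version); `mem_limit_bytes` is a hypothetical value taken from the Cloud response:

```rust
use std::sync::Arc;

use datafusion::error::Result;
use datafusion::execution::runtime_env::{RuntimeConfig, RuntimeEnv};

/// Hypothetical helper: build the session's runtime, capping memory only if
/// a limit was provided by Cloud.
fn build_runtime(mem_limit_bytes: Option<usize>) -> Result<Arc<RuntimeEnv>> {
    let mut config = RuntimeConfig::new();
    if let Some(limit) = mem_limit_bytes {
        // Limit memory available to DataFusion's memory-tracked operators.
        config = config.with_memory_limit(limit, 1.0);
    }
    Ok(Arc::new(RuntimeEnv::new(config)?))
}
```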

See what we're currently doing for the database id:

https://github.com/glaredb/glaredb/blob/8d758de55c8112623e13c24de9341b29ea63e5ce/crates/pgsrv/src/handler.rs#L105-L116

Specifications

  • Extend the response body that we get back from Cloud to include the memory limit.
  • Put that limit (if it exists) in the startup params that get forwarded to the underlying node.
  • Parse the limit on the GlareDB side (if it exists) and include it during session creation (see the sketch below).
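
A rough sketch of that parsing step, assuming the limit arrives as a hypothetical `memory_limit_bytes` startup parameter (the actual key is still to be decided):

```rust
use std::collections::HashMap;

/// Hypothetical: pull an optional memory limit out of the startup params
/// forwarded by pgsrv. A missing or unparsable value means "no limit".
fn parse_mem_limit(params: &HashMap<String, String>) -> Option<usize> {
    params
        .get("memory_limit_bytes")
        .and_then(|v| v.parse::<usize>().ok())
}
```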

Rationale

Limit session resource usage on shared nodes.

Impact

Cloud needs to expand the endpoint to optionally include the limits when it sends back a response to pgsrv.

cc @greyscaled @f0ssel

Tasks

Related issues:
apache/datafusion#587
apache/datafusion#3941

@scsmithr scsmithr added the feat 🎇 New feature or request label Jan 25, 2023
@greyscaled
Contributor

Could we start with a hard-coded limit?

@scsmithr
Member Author

Yes, I was thinking we could do:

  • No limit if connecting to a "dedicated" node.
  • 500 MB memory limit if connecting to a "shared" node (see the sketch below).
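
For illustration, the hard-coded policy could be as simple as the following sketch (the node-type names are placeholders, not existing types in the codebase):

```rust
/// Placeholder node types for the hard-coded alpha policy.
enum NodeType {
    Dedicated,
    Shared,
}

/// Hard-coded alpha policy: no limit on dedicated nodes, 500 MB on shared.
fn default_mem_limit(node: NodeType) -> Option<usize> {
    match node {
        NodeType::Dedicated => None,
        NodeType::Shared => Some(500 * 1024 * 1024),
    }
}
```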

@greyscaled
Contributor

@scsmithr I like what's described here:

  • alpha = hard-coded limit of 500M for shared, no such limit for dedicated

I think we can implement that in cloud and actually include that information to accomplish https://github.com/GlareDB/cloud/issues/578

I imagine seeing this possibly show up in the header, and for sure in the settings (now that we have them 😎).

@greyscaled
Contributor

How do we actually limit resource utilization?

@greyscaled
Contributor

Garrett: Can we infer it from disk usage?
Sean: How do you limit tokio per session?

@greyscaled
Contributor

Memory and disk are more important than CPU.

@greyscaled
Contributor

I think we can start by stress testing and getting a reasonable estimate for the number of people we can roll the alpha out to without causing problems.

Again, what matters most is memory and disk.

@greyscaled
Contributor

Sean: What if we attached a disk to each node?

@greyscaled
Contributor

Garrett: Does it need to be shared?
Sean: It doesn't matter; we don't care to save the data, it's to prevent spillage.

@greyscaled
Contributor

greyscaled commented Apr 17, 2023

2023-04-17:

Adding the param is easy, but we need to be able to actually limit the memory too.

DataFusion?

  • We don't know how well its memory limiting works (there are some open issues on GitHub).
  • Sometimes it doesn't actually limit memory, or doesn't record the memory used.
  • But we can still give it a go and figure it out (see the sketch below)?
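
For reference, a sketch of enforcing the limit through DataFusion's memory pool, assuming the `GreedyMemoryPool` and `RuntimeConfig::with_memory_pool` APIs are available in the version we ship. The caveat above is that the pool only constrains operators that register their allocations with it:

```rust
use std::sync::Arc;

use datafusion::error::Result;
use datafusion::execution::memory_pool::GreedyMemoryPool;
use datafusion::execution::runtime_env::{RuntimeConfig, RuntimeEnv};

/// Hypothetical: build a runtime whose memory-tracked operators (sorts,
/// joins, aggregations) share a 500 MB budget. Operators that don't report
/// their allocations to the pool are not limited, which is the concern above.
fn limited_runtime() -> Result<Arc<RuntimeEnv>> {
    let pool = Arc::new(GreedyMemoryPool::new(500 * 1024 * 1024));
    let config = RuntimeConfig::new().with_memory_pool(pool);
    Ok(Arc::new(RuntimeEnv::new(config)?))
}
```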

@scsmithr
Member Author

@f0ssel
Contributor

f0ssel commented Apr 19, 2023

Really appreciate the guidance here 👍
