
proposal: reflect: add mechanisms to allocate large slices only if possible #33060

Closed
ianlancetaylor opened this issue Jul 11, 2019 · 18 comments

@ianlancetaylor
Contributor

We've seen several discussions about how to handle running out of memory in a Go program. In a quick look I found #5049 (closed), #14162 (closed), #16843 (open).

One way to think about memory allocations in a program is to observe that there are fixed-size allocations (new(int), or simply letting the address of a variable escape) and there are variable-size allocations (make([]int, N), append(a, b...)). If your program does too many fixed-size allocations, then you have a garbage collection problem; how to handle GC backpressure is discussed in the issues cited above. However, many server programs need to read variable-length data from a client, and they don't know how much space they will need when a request arrives. It seems useful to permit those programs to have some way of checking, as they allocate memory, whether that memory is available.

Any approach that relies on asking the garbage collector how much memory is available is prone to race conditions in a server, and relies on details of garbage collection. I propose a simpler mechanism:

package reflect

// TryMakeSlice creates a new zero-initialized slice value
// for the specified slice type, length, and capacity.
// If the memory is not available, it returns a zero Value.
func TryMakeSlice(typ Type, len, cap int) Value

// TryAppend appends the values x to a slice s and returns the resulting slice.
// As in Go, each x's value must be assignable to the slice's element type.
// If the memory is not available, it returns a zero Value.
func TryAppend(s Value, x ...Value) Value

(I note in passing that if we had generics, we could write these functions in a more natural way and put them in the runtime package. But the underlying idea remains the same.)

These functions in the reflect package are not the easiest to use, but they will permit servers to handle certain common cases of memory overload without simply crashing. We should set the criteria under which they would fail. For example, perhaps they should fail if after allocation less than 5% of available memory would remain unallocated.

We could then use these functions in packages like encoding/json and encoding/gob, so that those packages could reliably fail with an out of memory error when decoding a large incoming stream.
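
To make the intended use concrete, here is a sketch of how a decoding package might wrap the proposed function. TryMakeSlice is the proposed (not yet existing) API; the package name and the decodeSlice helper are invented for illustration.

package codec

import (
	"errors"
	"reflect"
)

// decodeSlice reserves room for n elements of the given slice type,
// turning allocation failure into an error that the server can return
// to the offending client instead of crashing.
func decodeSlice(typ reflect.Type, n int) (reflect.Value, error) {
	v := reflect.TryMakeSlice(typ, n, n) // proposed API; does not exist today
	if !v.IsValid() {
		return reflect.Value{}, errors.New("out of memory decoding slice")
	}
	return v, nil
}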

@gopherbot gopherbot added this to the Proposal milestone Jul 11, 2019
@networkimprov

networkimprov commented Jul 11, 2019

Can anything be done to help existing programs fail gracefully?

I imagine the proposed API would alter best practices for long-running programs to: "Always use reflect.Try*() instead of make() and append()."

Since the goal is to keep allocated memory < N% of available mem, have you considered a runtime API that makes normal allocations optionally return nil, or panic with a recoverable "almost-OOM" error, when an allocation would exceed the limit?

@ianlancetaylor
Contributor Author

If we change encoding/gob and encoding/json, that will automatically help a number of existing programs: they will see and report an error rather than running out of memory. A server will report an error back to the client, and continue serving other clients rather than crashing. There are likely a few other packages that could be changed similarly, such as, perhaps, archive/zip.

I don't think it's necessary or appropriate to always use the new functions. It's appropriate to use the new functions in a server serving multiple concurrent clients where the size of the allocated memory is controlled by the client. When memory usage is not controlled by an external client, other mechanisms to limit memory use are almost certainly more appropriate.

I don't think we can reasonably change existing allocations (by which I assume you mean the pre-declared make and append functions). Existing programs do not expect those functions to fail or panic. Changing the functions would likely cause the existing programs either to fail or to crash. Since the current behavior is to crash anyhow, nothing would be gained by the change.

@ianlancetaylor
Contributor Author

ianlancetaylor commented Jul 11, 2019

Let me add something:

Since the goal is to keep allocated memory < N% of available mem

I want to clarify that although that may be a reasonable goal, it's not the goal of these proposed functions. The goal of these proposed functions is to permit a program to avoid some crash cases when a client makes a request that requires a lot of memory to serve. We don't currently have any way to ask whether an allocation can succeed. This proposal adds that capability.

@bcmills
Contributor

bcmills commented Jul 12, 2019

You suggest a 5% threshold of remaining free memory for failure, but it seems to me that the appropriate threshold in a real server depends very strongly on the implementation details of that server.

If the caller already knows that the allocation is large enough that it may fail, it seems simpler for them to apply a (weighted) semaphore before the allocation rather than a check after it — and that makes the overall memory fraction much easier to tailor to their specific server. What advantages does the post-hoc approach provide over the explicit semaphore?
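
For concreteness, a minimal sketch of the semaphore approach, using the existing golang.org/x/sync/semaphore package; the budget size and the handler shape are invented for illustration.

package server

import (
	"context"

	"golang.org/x/sync/semaphore"
)

// memBudget caps the total bytes of client-controlled allocations in
// flight at once; 1 GiB is an arbitrary example budget.
var memBudget = semaphore.NewWeighted(1 << 30)

// handleRequest reserves n bytes from the budget before allocating
// and releases them when the request completes.
func handleRequest(ctx context.Context, n int64) error {
	if err := memBudget.Acquire(ctx, n); err != nil {
		return err // request canceled while waiting for budget
	}
	defer memBudget.Release(n)

	buf := make([]byte, n) // now within the server's declared budget
	_ = buf                // ... decode the request into buf ...
	return nil
}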

@ianlancetaylor
Contributor Author

The advantage of the post-hoc approach is that it can be added to existing packages like encoding/json without complicating their API. A server that takes JSON-encoded data from a client has no way of knowing how much memory will be required to represent that information.

Another approach would be to invent a new interchange format that is easier to control, but JSON is what we have today.

We can always add more knobs, either to encoding/json or to the entirely arbitrary 5% threshold, but I think it's worth thinking about what we can do without knobs. Especially since even if we had knobs most programs wouldn't use them.

@bcmills
Contributor

bcmills commented Jul 12, 2019

It seems to me that this would allow binary maintainers to paper over flow-control bugs under moderate load in much the same way that increasing channel buffers allows maintainers to paper over concurrency bugs, and under similar conditions: it's an easy hammer to make the symptom go away without actually understanding or addressing the underlying problem.

In particular, it's easy to use such a tool to “fix” flakiness in a simplified load-test without actually fixing the production risk that the test is intended to detect: it would produce recoverable errors for large overallocations (which are easy to provoke in a synthetic test), but still result in crashes for more modest ones (which are more likely to occur under organic growth).

@rsc
Contributor

rsc commented Jul 16, 2019

Reflect seems pretty clunky; what about x, ok = make([]int, N) or x, ok = append(x, y...)?
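
Under that hypothetical language change the same logic would read directly; neither form compiles today, and n, data, and errNoMem are placeholders.

buf, ok := make([]byte, n) // hypothetical two-value make
if !ok {
	return errNoMem
}
buf, ok = append(buf, data...) // hypothetical two-value append
if !ok {
	return errNoMem
}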

@rsc
Contributor

rsc commented Jul 16, 2019

(Also how do we know for sure whether a particular allocation should fail as out of memory? Overcommit means sometimes you think it's going to work and you get killed anyway.)

@RLH
Contributor

RLH commented Jul 18, 2019

The amount of memory available for allocation is a global property, yet this proposal tries to deal with it using only local information. The proposal invites races between allocations: which ones succeed, which ones fail via this proposal's functions, and which ones fail with an OOM. Using these functions will increase technical debt as applications solve localized coding problems instead of taking a more global approach such as envisioned by SetMaxHeap.

Pragmatically, the proposal would make it difficult to create reliable reproducers, would decrease the robustness of tests, and would increase the maintenance load on the core runtime team responding to error reports. Creating reproducers will be further complicated by the fact that the proposal is sensitive to co-tenancy problems, so global is even “more global” (sorry) than reachability. Finally, any allocation not using this API can still result in an OOM, so it would not be possible for a high-level package API, say the JSON API, to claim that it will return an error instead of an OOM.

The SetMaxHeap alternative should be allowed to mature and be used for several releases to see if it can address some of the problems that motivated this proposal. If a client can force a server to allocate unbounded amounts of memory, then perhaps the server can be architected to use a series of allocations with bounded size along with SetMaxHeap to fail gracefully.
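
As a sketch of the bounded-allocation architecture suggested here (SetMaxHeap itself was an experimental runtime proposal and is not shown; the chunk size, limit, and function name are invented):

package server

import (
	"errors"
	"io"
)

// readBounded reads from r in fixed-size chunks so that no single
// allocation is sized by the client, and fails gracefully once the
// total exceeds max.
func readBounded(r io.Reader, max int) ([]byte, error) {
	const chunk = 64 << 10 // 64 KiB per allocation
	var out []byte
	for {
		if len(out) >= max {
			return nil, errors.New("input exceeds memory budget")
		}
		buf := make([]byte, chunk)
		n, err := r.Read(buf)
		out = append(out, buf[:n]...)
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return nil, err
		}
	}
}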

@ianlancetaylor
Contributor Author

If a client can force a server to allocate unbounded amounts of memory, then perhaps the server can be architected to use a series of allocations with bounded size along with SetMaxHeap to fail gracefully.

I think this is the key point. Does SetMaxHeap permit such a server to fail gracefully?

@bcmills
Contributor

bcmills commented Aug 9, 2019

In the context of “encoding/json without adding knobs”, I don't see how SetMaxHeap would help: the API would need some way to allow the caller to cancel a call to Decode or Unmarshal in response to heap growth, and plumbing in any explicit cancellation mechanism would require an API change.

@RLH
Contributor

RLH commented Aug 10, 2019 via email

@rsc
Contributor

rsc commented Dec 11, 2019

There is a larger problem here than just allocating large slices. The difficult part is the "only if possible". It's not clear what the conditions are under which an allocation should fail. What if it leaves 1 byte of memory, or even 1MB of memory, and the next new([2<<20]byte) crashes instead?

C lets malloc fail and many people do check it.
C++ throws an exception when new fails.
Java throws an exception too.
But I bet most C++ and Java programs do not handle those exceptions well.

We have a hard-coded limit in bytes.Buffer.
We also have a hard-coded limit in encoding/gob, in part to protect against malicious gobs that say "here comes a billion entries" and then cut off.
There's no prefixed count in encoding/json or encoding/xml that would cause an arbitrarily large allocation from an arbitrarily small input.

For streaming output like compress/gzip, the decompressed stream can be capped with io.LimitReader.
There's no equivalent for data structure unmarshalers.
At least json and xml generate data proportional to the input size.
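
(That proportionality suggests a workaround available today: bounding the input bounds the decoder's allocations. A minimal sketch, where the 1 MiB cap is an arbitrary example; a truncated stream surfaces as a decode error rather than an OOM crash.)

package server

import (
	"encoding/json"
	"io"
)

// decodeLimited caps how much input the decoder may consume, which in
// turn bounds how much memory it can allocate.
func decodeLimited(r io.Reader, v interface{}) error {
	return json.NewDecoder(io.LimitReader(r, 1<<20)).Decode(v)
}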

This specific proposal of reflect handling just slice allocation is probably not enough to solve this larger problem. It's unclear how to move forward here.

@josharian
Contributor

We also have a hard-coded limit in encoding/gob, in part to protect against malicious gobs that say "here comes a billion entries" and then cut off.

IIRC, the protections for encoding/gob are incomplete. See #20221.

@ianlancetaylor
Contributor Author

Related issues: #5050 #20169.

@rsc
Contributor

rsc commented Jan 8, 2020

It's still unclear how to solve the underlying problem - give Go programs that care some way to limit allocation in code that is cooperating with the limits. Perhaps some kind of memory allocation context threaded through code (a little like an arena) would work, but that's getting pretty far from this specific proposal.
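
Purely to illustrate the shape such an allocation context might take, in the declaration style of the original proposal; every name here is invented and no such API exists:

// An AllocLimit tracks a byte budget shared by a group of allocations
// (hypothetical; for illustration only).
type AllocLimit struct{ /* opaque */ }

// NewAllocLimit returns a limit permitting up to max bytes in total.
func NewAllocLimit(max int64) *AllocLimit

// MakeBytes allocates a byte slice against the limit, reporting
// ok=false once the group's budget would be exceeded.
func (l *AllocLimit) MakeBytes(len, cap int) (b []byte, ok bool)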

Perhaps @ianlancetaylor will write more about an allocation-context-with-max in a different proposal. The difference is that the explicit context would be for a group of allocations, not one, and users would be in control of the limit.

It seems like this one is a likely decline.

Leaving open for a week for final comments.

@networkimprov

Regarding a limit for a group of allocations, here's that idea for channel buffers: #20868

It would be great if an "arena" proposal encompassed that use case.

@rsc
Contributor

rsc commented Jan 15, 2020

No change in consensus; closing.
